Knowledgebase:
Recommended Settings for GFS Mounted File Systems
28 April 2015 02:56 PM

Summary

When GFS is used as the file system hosting MarkLogic Server data, it needs to be tuned for optimal performance.

Recommendations

Specifically, we recommend tuning the demote_secs and statfs_fast parameters. The demote_secs parameter determines how long GFS will wait before demoting a lock on a file that is not in use. (GFS uses a time-based locking system.) One of the ways MarkLogic Server speeds up queries is by memory-mapping its index files. When index files are stored on a GFS filesystem, the server reads them through the memory map rather than accessing the on-disk files directly, so those reads do not keep the GFS locks active. As a result, locks on memory-mapped index files are demoted purely on the basis of demote_secs, regardless of how heavily the files are used.

When a GFS lock is demoted, pages from the memory-mapped index files are removed from cache. When the server next touches the memory-mapped file, GFS must acquire another lock, and the requested page(s) must be read back into cache from disk. The lock reacquisition, together with the I/O needed to reload the data, may cause noticeable performance degradation.

Starting with MarkLogic Server 4.0-4, MarkLogic introduced an optimization for GFS. From that maintenance release forward, the server gets the status of its memory-mapped files every hour, which keeps the GFS locks on those files active so that they do not get demoted. It is therefore important that demote_secs be equal to or greater than one hour (3600 seconds). We also recommend setting the tuning parameter statfs_fast to "1" (true), which makes statfs calls on GFS faster.

Using gfs_tool, you should be able to set the demote_secs and statfs_fast parameters to the following values:

demote_secs 3600

statfs_fast 1
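
For example, these tunables can be applied with gfs_tool settune against the mounted file system. The commands below are a sketch: /marklogic-data is a hypothetical mount point, so substitute your own. Note that settune values do not survive a remount, so they are typically reapplied from a mount or init script.

# Apply the recommended tunables to a mounted GFS file system
# (/marklogic-data is a placeholder mount point).
gfs_tool settune /marklogic-data demote_secs 3600
gfs_tool settune /marklogic-data statfs_fast 1

# Confirm the values took effect.
gfs_tool gettune /marklogic-data | grep -E 'demote_secs|statfs_fast'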

While we're discussing tuning a Linux filesystem, the following Linux tuning tips are also worth noting:

  • Use the deadline elevator (aka I/O scheduler), rather than cfq, on all hosts in the cluster. This has been added to our installation requirements for RHEL. With RHEL-4, this requires the elevator=deadline option at boot time. With RHEL-5, this can be changed at any time via /sys/block/*/queue/scheduler (an example appears after this list).
  • If you are running on a VM slice, then the noop I/O scheduler is recommended.
  • Set the following kernel tuning parameters:

Edit /etc/sysctl.conf:

vm.swappiness = 0

vm.dirty_background_ratio = 1

vm.dirty_ratio = 40

Use sudo sysctl -p to apply these changes.

  • It is very important to have at least one journal per host that will mount the filesystem. If the number of hosts exceeds the number of journals, performance will suffer. It is, unfortunately, impossible to add more journals without rebuilding the entire filesystem, so be sure to set up journals for each host during your initial build (an example appears below).
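
To illustrate the I/O scheduler setting from the first tip above: on RHEL-5 the elevator can be inspected and changed per block device through sysfs. This is a sketch only; sda is a hypothetical device name, the change must be repeated for each device, and it does not persist across reboots (use elevator=deadline on the kernel command line for a permanent setting).

# Show the available schedulers; the active one appears in brackets.
cat /sys/block/sda/queue/scheduler

# Switch this device to the deadline elevator (takes effect immediately).
echo deadline > /sys/block/sda/queue/scheduler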
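
And to illustrate the journal advice in the last tip: the journal count is fixed with the -j option when the file system is first created. The command below is a sketch, not a definitive recipe; the cluster name (alpha), file system name (mldata), journal count, and device path are all hypothetical placeholders.

# Create a GFS filesystem with DLM locking and one journal per host
# (four hosts in this example). alpha:mldata is the cluster:fsname pair.
gfs_mkfs -p lock_dlm -t alpha:mldata -j 4 /dev/vg0/mldata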

 

Working with Red Hat

Should you run into GFS-related problems, running the following script will gather all the information you need in order to work with the Red Hat Support Team:


# Mount the kernel debug filesystem to capture DLM and GFS2 lock state.
mkdir /tmp/debugfs
mount -t debugfs none /tmp/debugfs
mkdir /tmp/$(hostname)-hangdata
cp -rf /tmp/debugfs/dlm/ /tmp/$(hostname)-hangdata
cp -rf /tmp/debugfs/gfs2/ /tmp/$(hostname)-hangdata

# Dump the stack of every task to /var/log/messages via magic SysRq,
# then give the kernel a minute to finish writing it out.
echo 1 > /proc/sys/kernel/sysrq
echo 't' > /proc/sysrq-trigger
sleep 60
cp /var/log/messages /tmp/$(hostname)-hangdata/

# Capture cluster and system state, each to its own file.
clustat > /tmp/$(hostname)-hangdata/clustat.out
cman_tool services > /tmp/$(hostname)-hangdata/cman_tool-services.out
mount -l > /tmp/$(hostname)-hangdata/mount-l.out
ps aux > /tmp/$(hostname)-hangdata/ps-aux.out

# Bundle everything up, then clean up the working files.
tar cjvf /tmp/$(hostname)-hangdata.tar.bz2 /tmp/$(hostname)-hangdata/
umount /tmp/debugfs/
rm -rf /tmp/debugfs
rm -rf /tmp/$(hostname)-hangdata


Comments (1)
Peter Kester
04 June 2015 08:53 AM
Do these settings also apply to GFS2 filesystems?