Solutions

MarkLogic Data Hub Service

Fast data integration + improved data governance and security, with no infrastructure to buy or manage.

Learn More

Learn

Stay On Top Of Everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Community

Stay On Top Of Everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Company

Stay On Top Of Everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

 
Knowledgebase:
Linux FileSystem Tuning - Performance (data=writeback) vs DataIntegrity (data=ordered)
30 December 2014 01:28 PM

Summary

MarkLogic recommends the default "ordered" option for Linux ext3 and ext4 file-systems.

File System administrators in Linux are tempted to use the data=writeback option to achieve higher throughput from their file-system, but this comes with the side-effects of potential data corruption and data-secuity breach. This article explains both file system options with respect to MarkLogic Server. 

"data=ordered"

Linux ext3 and ext4 file system has default data option of "ordered", which writes to the main file system before committing to the journal.

https://www.kernel.org/doc/Documentation/filesystems/ext4.txt

https://www.kernel.org/doc/Documentation/filesystems/ext3.txt

Both of these file-system goes the extra mile to protect your files and writes data associated with that meta data by default with data=ordered, thus assuring file-system integrity to application layer - essential for MarkLogic Server data integrity. 

"data=writeback"

Other journaled file systems like XFS and JFS write meta data to the disk;  to make ext3 and ext4 behave like XFS and other journal file system, an administrator could set 'data=writeback' in their mount options.

The 'data=writeback' mode does not preserve data ordering when writing to the disk, so commits to the journal may happen before the data is written to the file system. This method is faster because only the meta data is journaled, but is not good at protecting data integrity in the face of a system failure.

If there is a crash between the time when metadata is commited to the journal and when data is written to disk, the post-recovery metadata can point to incomplete, partially written or incorrect data on disk; which can lead to corrupt data files. Additionally, data which was supposed to be overwritten in the filesystem could be exposed to users - resulting in a security risk.

Linus Torvalds comments on 'data=writeback'

"it makes things much smoother, since now the actual data is no longer in the critical path for any journal writes, but anybody who thinks that's a solution is just incompetent.  We might as well go back to ext2 then. If your data gets written out long after the metadata hit the disk, you are going to hit all kinds of bad issues if the machine ever goes down."   - http://thread.gmane.org/gmane.linux.kernel/811167/focus=811654

 

(21 vote(s))
Helpful
Not helpful

Comments (0)