Knowledgebase:
Fast vs Strict Locking
14 September 2015 04:17 PM

Introduction

The Performance Considerations section of the Loading Content Into MarkLogic Server documentation states:

"When you load content, MarkLogic Server performs updates transactionally, locking documents as needed and saving the content to disk in the journal before the transaction commits. By default, all documents are locked during an update and the journal is set to preserve committed transactions, even if the MarkLogic Server process ends unexpectedly."

There are two types of locking which are specified at the database level:

  • Fast locking uses a hashed locking scheme based on the URI: each fragment URI has a designated forest, so the lock created during an insert is restricted to that forest.
  • Strict locking coordinates the update lock across all forests in the database (and therefore across the cluster) until the insert has taken place.
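The difference between the two settings can be illustrated with a minimal Python sketch. This is not MarkLogic code: the hash (SHA-1 modulo the forest count), the forest names, and the helpers `assigned_forest` and `forests_to_lock` are all hypothetical, and the server's real assignment algorithm is internal. The sketch only shows which forests take part in the lock under each setting.

```python
import hashlib

# Hypothetical forest names for a database with four forests.
FORESTS = ["forest-1", "forest-2", "forest-3", "forest-4"]


def assigned_forest(uri, forests=FORESTS):
    """Map a URI to one forest with a stable hash (illustrative only)."""
    digest = hashlib.sha1(uri.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(forests)
    return forests[index]


def forests_to_lock(uri, locking, forests=FORESTS):
    """Return the forests that take part in the update lock for a URI."""
    if locking == "fast":
        # Fast: only the forest designated by the URI hash is involved.
        return [assigned_forest(uri, forests)]
    if locking == "strict":
        # Strict: the lock is coordinated across every forest in the database.
        return list(forests)
    raise ValueError("locking must be 'fast' or 'strict'")


if __name__ == "__main__":
    uri = "/orders/1001.xml"
    print("fast:  ", forests_to_lock(uri, "fast"))
    print("strict:", forests_to_lock(uri, "strict"))
```

Under "fast" the list always has exactly one entry, and the same URI always maps to the same forest; under "strict" the list grows with the number of forests in the database.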

Fast locking has been the default setting for newly created MarkLogic databases since MarkLogic 5 (released October 2011).

When should I use strict locking?

If at any point your code specifies the forest into which a document or fragment is inserted (a technique commonly referred to as in-forest evaluation), setting that database's locking to "strict" is the safest choice. If your code always lets the server determine the target forest for the document or fragment, fast locking is safe.
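As a rough sketch of how the database-level setting might be changed from a script, the Python below builds (but does not send) a request to the MarkLogic Management REST API. Several details are assumptions to verify against the Management API documentation for your server version: the default manage port 8002, the `PUT /manage/v2/databases/{name}/properties` endpoint, and the `locking` property name. The helper `build_locking_request` is hypothetical, and authentication is deliberately omitted.

```python
import json
import urllib.request


def build_locking_request(host, database, locking):
    """Build (but do not send) a PUT request that sets a database's locking mode."""
    # Only the two modes discussed in this article are accepted here.
    if locking not in ("fast", "strict"):
        raise ValueError("locking must be 'fast' or 'strict'")
    # Assumed endpoint and port; check them against your server's documentation.
    url = f"http://{host}:8002/manage/v2/databases/{database}/properties"
    body = json.dumps({"locking": locking}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    request = build_locking_request("localhost", "Documents", "strict")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Sending the request would additionally require credentials for a user with the appropriate management privileges. The same setting is also available on the database configuration page of the Admin Interface.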

If two different users create the same document (with the same URI) while fast locking is in effect, the result is:

  • A transaction that inserts the first fragment into a given forest (as assigned by the MarkLogic node servicing the request)
  • An "update" transaction (in the same forest) in which the first fragment is marked as deleted
  • A new fragment that takes the place of the first fragment, completing the second transaction

Subsequent merges would then remove the stand entry for the first fragment (now deleted and replaced by the later transaction).

Fast locking does not create a dangerous race condition unless your application allows two different users to insert a document with the same URI into two different forests as two separate transactions, with the URI-to-forest assignment handled by your XQuery/application layer. If the code responsible for those transactions were to inadvertently assign the same URI to two different forests in a cluster, this could cause a problem that strict locking would guard against. If your application always allows MarkLogic to assign the forest for the document, there is no danger in keeping the server default of "fast" locking.
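The scenario described above can be reproduced in a toy Python model. This is a simplified illustration, not MarkLogic's implementation: the `Database` class, its hash, and the way "strict" resolves the conflict (the existing copy is seen and replaced) are assumptions made for the example. It only demonstrates why a lock scoped to one forest cannot notice the same URI in another forest.

```python
import hashlib


class Database:
    """Toy model: each forest is a dict of URI -> content (not MarkLogic code)."""

    def __init__(self, forest_names, locking="fast"):
        self.forests = {name: {} for name in forest_names}
        self.locking = locking

    def assigned_forest(self, uri):
        # Illustrative stable hash; the server's real algorithm is internal.
        names = sorted(self.forests)
        digest = hashlib.sha1(uri.encode("utf-8")).digest()
        return names[int.from_bytes(digest[:8], "big") % len(names)]

    def insert(self, uri, content, forest=None):
        """Insert a document; `forest` models an application-chosen placement."""
        target = forest or self.assigned_forest(uri)
        if self.locking == "strict":
            # Strict (assumed behavior): the lock spans all forests, so an
            # existing copy in any forest is seen and replaced.
            for docs in self.forests.values():
                docs.pop(uri, None)
        # Fast: only the target forest is locked, so only a copy already
        # stored in that forest is replaced.
        self.forests[target][uri] = content
        return target

    def copies(self, uri):
        """List the forests that currently hold a document with this URI."""
        return [name for name, docs in self.forests.items() if uri in docs]


if __name__ == "__main__":
    for mode in ("fast", "strict"):
        db = Database(["forest-1", "forest-2"], locking=mode)
        db.insert("/doc.xml", "first", forest="forest-1")
        db.insert("/doc.xml", "second", forest="forest-2")
        print(mode, db.copies("/doc.xml"))
```

In this model, "fast" with application-chosen forests leaves the same URI in both forests, while "strict" leaves a single copy. When the `forest` argument is omitted, both inserts hash to the same forest and "fast" also ends with a single copy, which matches the case where the server assigns the forest.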

Additionally, consider what kind of failover your system is using, because it affects the choice of journaling setting. With fast journaling and local-disk replication, the journal disk write must fail on both the master and replica nodes for data loss to occur, so strict journaling is not needed in this scenario. In contrast, strict journaling should be used with shared-disk failover, because data loss is possible if fast journaling is in use and a single node fails before the OS flushes the buffer to disk.

Is there a performance implication in switching to strict locking?

Fast locking is faster than strict locking, but the size of the penalty depends on several factors: the number of forests in the database, the number of nodes across which those forests are spread, and the speed at which the nodes in the cluster can coordinate a transaction (network and I/O). Each will have some (potentially minimal) impact.

If the conditions of your application suit, we recommend staying with the default of fast locking on all your databases.

There may be reasons to use "strict" locking, especially if your code loads documents using in-forest evaluation.

Further reading

https://docs.marklogic.com/guide/ingestion/performance
