Knowledgebase:
Read Only Queries Run at a Timestamp & Update Transactions use Locks
22 March 2017 03:54 PM

Overview

Update transactions run with readers/writers locks, obtaining locks as needed for documents accessed in the transaction. Because update transactions only obtain locks as needed, update statements always see the latest version of a document. The view is still consistent for any given document from the time the document is locked. Once a document is locked, any update statements in other transactions wait for the lock to be released before updating the document.
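The readers/writers rule described above can be illustrated with a small Python model. This is a minimal sketch, not MarkLogic's lock manager. The class and method names (`LockTable`, `try_read_lock`, `try_write_lock`, `release_all`) are hypothetical. The model assumes that any number of transactions may share a read lock on a document, while a write lock is exclusive. A `False` return stands in for "this transaction would have to wait".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DocLock:
    readers: Set[str] = field(default_factory=set)  # transactions holding a shared (read) lock
    writer: Optional[str] = None                    # transaction holding the exclusive (write) lock


class LockTable:
    """Hypothetical readers/writers lock table keyed by document URI."""

    def __init__(self) -> None:
        self.locks: Dict[str, DocLock] = {}

    def _lock(self, uri: str) -> DocLock:
        return self.locks.setdefault(uri, DocLock())

    def try_read_lock(self, txn: str, uri: str) -> bool:
        lock = self._lock(uri)
        # Shared locks coexist with other readers, but not with another transaction's write lock.
        if lock.writer is not None and lock.writer != txn:
            return False  # the caller would have to wait
        lock.readers.add(txn)
        return True

    def try_write_lock(self, txn: str, uri: str) -> bool:
        lock = self._lock(uri)
        # An exclusive lock is refused while any other transaction holds a read or write lock.
        if (lock.writer is not None and lock.writer != txn) or (lock.readers - {txn}):
            return False  # the caller would have to wait
        lock.writer = txn
        return True

    def release_all(self, txn: str) -> None:
        # In this model, all of a transaction's locks are dropped when it finishes.
        for lock in self.locks.values():
            lock.readers.discard(txn)
            if lock.writer == txn:
                lock.writer = None


# Usage: T2 cannot update /doc.xml until T1 releases its lock.
table = LockTable()
print(table.try_write_lock("T1", "/doc.xml"))  # True
print(table.try_write_lock("T2", "/doc.xml"))  # False
table.release_all("T1")
print(table.try_write_lock("T2", "/doc.xml"))  # True
```

Because T2 only obtains the lock after T1 has finished, anything T2 reads under that lock reflects T1's committed changes. This is why update statements see the latest version of a document.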

Read-only query transactions run at a particular system timestamp instead of acquiring locks, and so have a read-consistent view of the database. That is, the query transaction runs at a point in time at which all documents are in a consistent state.

The system timestamp is a number maintained by MarkLogic Server that increases every time a change or a set of changes occurs in any of the databases in a system (including configuration changes from any host in a cluster). Each fragment stored in a database has system timestamps associated with it to determine the range of timestamps during which the fragment is valid.
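The following sketch shows how a range of valid timestamps on each fragment gives a query a point-in-time view. It is a simplified Python model. The field names `valid_from` and `valid_to` are hypothetical, and the half-open range `[valid_from, valid_to)` is an assumption made for illustration. The sketch does not describe MarkLogic's on-disk format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fragment:
    uri: str
    content: str
    valid_from: int          # hypothetical: system timestamp at which this version became visible
    valid_to: Optional[int]  # hypothetical: timestamp at which it was superseded (None = current)


def visible_at(fragment: Fragment, ts: int) -> bool:
    # Assumed half-open validity range [valid_from, valid_to).
    return fragment.valid_from <= ts and (fragment.valid_to is None or ts < fragment.valid_to)


def read_at(fragments: List[Fragment], uri: str, ts: int) -> Optional[str]:
    """Return the content of `uri` as seen by a query running at timestamp `ts`."""
    for fragment in fragments:
        if fragment.uri == uri and visible_at(fragment, ts):
            return fragment.content
    return None  # the document did not exist at that timestamp


# An update at timestamp 200 replaced version 1 of /doc.xml with version 2.
store = [
    Fragment("/doc.xml", "version 1", valid_from=100, valid_to=200),
    Fragment("/doc.xml", "version 2", valid_from=200, valid_to=None),
]
print(read_at(store, "/doc.xml", 150))  # version 1
print(read_at(store, "/doc.xml", 250))  # version 2
print(read_at(store, "/doc.xml", 50))   # None
```

A query pinned at timestamp 150 keeps seeing "version 1" even after the update commits. No locks are needed because the old fragment remains valid for that timestamp.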

On a clustered system with multiple hosts, timestamps need to be coordinated across all hosts. MarkLogic Server does this by passing the timestamp in every message exchanged between hosts of the cluster, including the heartbeat message. Typically, the message carries two important pieces of information:

  • The origin host id
  • The precise time on the host at the moment the heartbeat took place
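The piggybacking scheme above can be modelled as follows. Each host remembers the newest timestamp it has seen, and every incoming message can only move that value forward. This is a minimal Python sketch under stated assumptions. The `Host` and `Message` classes and their fields are hypothetical stand-ins for the origin host id and timestamp carried by real inter-host messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    origin: str     # hypothetical: id of the sending host
    timestamp: int  # hypothetical: newest system timestamp known to the sender


class Host:
    """Hypothetical host that tracks the newest system timestamp it is aware of."""

    def __init__(self, host_id: str, known_ts: int = 0) -> None:
        self.host_id = host_id
        self.known_ts = known_ts

    def commit(self, ts: int) -> None:
        # A local update committed at timestamp ts.
        self.known_ts = max(self.known_ts, ts)

    def send(self) -> Message:
        # Every message, heartbeats included, carries the sender's current timestamp.
        return Message(origin=self.host_id, timestamp=self.known_ts)

    def receive(self, message: Message) -> None:
        # Never move backwards: keep the larger of the two timestamps.
        self.known_ts = max(self.known_ts, message.timestamp)


a = Host("host-a", known_ts=100)
b = Host("host-b", known_ts=100)
a.commit(101)
print(b.known_ts)    # 100: host-b has not heard about the update yet
b.receive(a.send())  # e.g. the next heartbeat from host-a
print(b.known_ts)    # 101
```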

In addition to the heartbeat information, the "Label" file for each forest in the database is written as changes are made. The Label file also contains timestamp information; this is what each host uses to ascertain the current "view" of the data at a given moment in time. This technique is what allows queries to be executed at a 'point in time' to give insight into the data within a forest at that moment.

You can learn more about transactions in MarkLogic Server by reading the Understanding Transactions in MarkLogic Server section of the MarkLogic Server Application Developer's Guide.

The distribute timestamps option of an application server specifies how the latest timestamp is distributed after updates. This setting affects the performance of updates and the timeliness of read-after-write query results from other hosts in the group.

When set to fast, updates return as quickly as possible. No special timestamp notification messages are broadcast to other hosts. Instead, timestamps are distributed to other hosts whenever any other message is sent. At most one second can pass before other hosts see the update timestamp, because a heartbeat message is sent to other hosts every second.

When set to strict, updates immediately broadcast timestamp notification messages to every other host in the group. Updates do not return until their timestamp has been distributed. This ensures timeliness of read-after-write query results from other hosts in the group.

When set to cluster, updates immediately broadcast timestamp notification messages to every other host in the cluster. Updates do not return until their timestamp has been distributed. This ensures timeliness of read-after-write query results from any host in the cluster, so requests made to any app server on any host in the cluster will see immediately consistent results.
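The trade-off between the three settings can be summarised as a bound on how long a remote host may go without knowing an update's timestamp after the update returns. The following is a simplified Python model, not MarkLogic code. The function name and parameters are hypothetical. One point is an assumption not stated above: in strict mode, hosts outside the group are treated as behaving as in fast mode.

```python
from typing import Optional

HEARTBEAT_INTERVAL = 1.0  # seconds; heartbeats are sent every second


def max_visibility_delay(mode: str, same_group: bool = True,
                         next_message_in: Optional[float] = None) -> float:
    """Hypothetical upper bound, in seconds after an update returns, before a
    remote host learns the update's timestamp."""
    if mode not in ("fast", "strict", "cluster"):
        raise ValueError(f"unknown distribute timestamps mode: {mode!r}")
    if mode == "cluster" or (mode == "strict" and same_group):
        # The timestamp is broadcast before the update is allowed to return.
        return 0.0
    # fast mode (and, by assumption, strict mode for hosts in another group):
    # the timestamp rides on the next message, and a heartbeat goes out within a second.
    if next_message_in is None:
        return HEARTBEAT_INTERVAL
    return min(next_message_in, HEARTBEAT_INTERVAL)


for mode in ("fast", "strict", "cluster"):
    print(mode, max_visibility_delay(mode), max_visibility_delay(mode, same_group=False))
# fast 1.0 1.0
# strict 0.0 1.0
# cluster 0.0 0.0
```

The model makes the cost visible. strict and cluster pay for a zero bound by delaying the return of every update, while fast returns immediately and accepts up to one heartbeat interval of staleness on other hosts.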

The default value for the "distribute timestamps" option is fast. The remainder of this article applies when fast mode is used.

Read after Write in Fast Mode

We will look at the different scenarios in which a read occurs in a transaction immediately following an update transaction.

  • If the read transaction is executed against an application server on the same node of the cluster (or any node that participated in the update) then the read will execute at a timestamp equal to or greater than the time that the update occurred.
  • If the read is executed in the context of an update transaction, then, by acquiring locks, the view of the documents will be the latest version of the documents.
  • If the read is executed in a query transaction, then the query will execute at the latest timestamp that the host on which it was executed is aware of. Although this will always produce a transactionally consistent view of the database, it may not return the latest updates. The remainder of this article addresses this case.
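The three cases above can be condensed into a single decision function. This is a minimal sketch for fast mode only. The function `read_sees_update` and its parameters are hypothetical. The function assumes each host's newest known timestamp is available in a dictionary, which MarkLogic does not expose in this form.

```python
from typing import Dict, Set


def read_sees_update(txn_type: str, read_host: str, update_ts: int,
                     update_hosts: Set[str], known_ts: Dict[str, int]) -> bool:
    """Hypothetical model: does a read that follows an update see it (fast mode)?"""
    if txn_type not in ("update", "query"):
        raise ValueError(f"unknown transaction type: {txn_type!r}")
    if txn_type == "update":
        # Update transactions lock what they read, so they see the latest version.
        return True
    if read_host in update_hosts:
        # A host that took part in the update already runs at a timestamp >= update_ts.
        return True
    # A query runs at the newest timestamp its own host knows about, which can
    # lag behind update_ts until the next message or heartbeat arrives.
    return known_ts.get(read_host, 0) >= update_ts


known = {"host-a": 205, "host-b": 199}
print(read_sees_update("query", "host-a", 205, {"host-a"}, known))   # True
print(read_sees_update("query", "host-b", 205, {"host-a"}, known))   # False: stale view
print(read_sees_update("update", "host-b", 205, {"host-a"}, known))  # True
known["host-b"] = 205  # after the next heartbeat from host-a
print(read_sees_update("query", "host-b", 205, {"host-a"}, known))   # True
```

Only the last case can miss the update, and only until the reading host receives a newer timestamp. In fast mode, that happens within one heartbeat interval.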

Consider the following code: