Solutions

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Learn

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Community

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Company

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

 
Knowledgebase:
Read Only Queries Run at a Timestamp & Update Transactions use Locks
22 March 2017 03:54 PM

Overview

Update transactions run with readers/writers locks, obtaining locks as needed for documents accessed in the transaction. Because update transactions only obtain locks as needed, update statements always see the latest version of a document. The view is still consistent for any given document from the time the document is locked. Once a document is locked, any update statements in other transactions wait for the lock to be released before updating the document.

Read only query transactions run at a particular system timestamp, instead of acquiring locks, and have a read-consistent view of the database. That is, the query transaction runs at a point in time where all documents are in a consistent state.

The system timestamp is a number maintained by MarkLogic Server that increases every time a change or a set of changes occurs in any of the databases in a system (including configuration changes from any host in a cluster). Each fragment stored in a database has system timestamps associated with it to determine the range of timestamps during which the fragment is valid.

On a clustered system where there are multiple hosts, the timestamps need to be coordinated accross all hosts. Marklogic Server does this by passing the timestamp in every message communicated between hosts of the cluster, including the heartbeat message. Typically, the message carries two important pieces of information:

  • The origin host id
  • The precise time on the host at the time that heartbeat took place

In addition to the heartbeat information, the "Label" file for each forest in the database is written as changes are made. The Label file also contains timestamp information; this is what each host uses to ascertain the current "view" of the data at a given moment in time. This technique is what allows queries to be executed at a 'point in time' to give insight into the data within a forest at that moment.

You can learn more about transactions in MarkLogic Server by reading the Understanding Transactions in MarkLogic Server section of the MarkLogic Server Application Developers Guide.

The distribute timestamps option on Application Server can specify how the latest timestamp is distributed after updates. This affects performance of updates and the timeliness of read-after-write query results from other hosts in the group.

When set to fast, updates return as quickly as possible. No special timestamp notification messages are broadcasted to other hosts. Instead, timestamps are distributed to other hosts when any other message is sent. The maximum amount of time that could pass before other hosts see the update timestamp is one second, because a heartbeat message is sent to other hosts every second.

When set to strict, updates immediately broadcast timestamp notification messages to every other host in the group. Updates do not return until their timestamp has been distributed. This ensures timeliness of read-after-write query results from other hosts in the group.

When set to cluster, updates immediately broadcast timestamp notification messages to every other host in the cluster. Updates do not return until their timestamp has been distributed. This ensures timeliness of read-after-write query results from any host in the cluster, so requests made to any app server on any host in the cluster will see immediately consistent results.

The default value for "distribute timestamps" option is fast. The remainder of this article is applicable when fast mode is used.

Read after Write in Fast Mode

We will look at the different scenario for the case where a read occurs in a transaction immediately following an update transaction.

  • If the read transaction is executed against an application server on the same node of the cluster (or any node that participated in the update) then the read will execute at a timestamp equal to or greater than the time that the update occurred.
  • If the read is executed in the context of an update transaction, then, by acquiring locks, the view of the documents will be the latest version of the documents.
  • If the read is executed in a query transaction, then the query will execute at the latest timestamp that the host on which it was executed is aware of. Although this will always produce a transactionally consistent view of the database, it may not return the latest updates. The remainder of this article addresses this case.

Consider the following code:

The above example performs the following steps:

  • Instantiates two XCC ContentSource Objects - each connecting to a different host in the cluster.
  • Establishes a short loop (which runs the enclosed steps 10 times)
    • Creates a unique UUID which is used as a URI for the Document
    • Establishes a session with the first host in the cluster and performs he following:
      • Gets the timestamp (session.getCurrentServerPointInTime()) and writes it out to the console / stdout
      • Inserts a simple, single element () as a document-node into a given database
      • Gets the timestamp again and writes it out to the console / stdout
    • The session with the first host is then closed. A new session is established with the second host and the following steps are performed:
      • Gets the timestamp at the start of the session and writes it out to the console / stdout
      • An attempt is made to retrieve the document which was just inserted
    • On success the second session will be closed.
    • If the document could not be read successfully, an immediate retry attempt follows thereafter - which will result a successful retrieval.

Running this test will yield one of two results for each iteration of the loop:

Query Transaction at Timestamp that includes Update

Most of the time, you will find that the timestamps will be in lockstep with the host before - note that there is no time difference between the output from getCurrentServerPointInTime() after the document has been inserted and before the attempt is made to retrieve the document from the connection to the second host in the cluster.

----------------- START OF INSERT / READ CYCLE (1) -----------------
First host timestamp before document is inserted: 	13673327800295300
First host timestamp after document is inserted: 	13673328229180040
Second host timestamp before document is read: 	13673328229180040
------------------ END OF INSERT / READ CYCLE (1) ------------------

However, you may also see this:

----------------- START OF INSERT / READ CYCLE (10) -----------------
First host timestamp before document is inserted: 	13673328311216780
First host timestamp after document is inserted: 	13673328322546380
Second host timestamp before document is read: 	13673328311216780
------------------ END OF INSERT / READ CYCLE (10) ------------------

Note that on this run, the timestamps are out of sync; at the point where getCurrentServerPointInTime() is called, the timestamp for the second connection is at that point just before the document is inserted.

Yet this also returns results that include the updates; in the interval between the timestamp being written to the console and the construction and submission of the newAdhocQuery(), the document has become available and was successfully retrieved during the read process.

The path with an immediate retry

Now let's explore what happens when the read only query transaction runs at a point in time that does not include the updates:

----------------- START OF INSERT / READ CYCLE (2) -----------------
First host timestamp before document is inserted: 	13673328229180040
First host timestamp after document is inserted: 	13673328240679460
Second host timestamp before document is read: 		13673328229180040
WARNING: Immediate read failed; performing an immediate retry
Second host timestamp for read retry: 		13673328240679460
Result Sequence below:
<?xml version="1.0" encoding="UTF-8"?>
<ok/>
------------------ END OF INSERT / READ CYCLE (2) ------------------

Note that on this occasion, we see an outcome that starts much like the previous example; the timestamps mismatch and we see that we've hit the point in the code where our validation of the response fails.

Also note that the timestamp at the point where the retry takes place is now back in step; from this, we can see that the document should be available even before the retry request is executed. Under these conditions, the response (the result) is also written to stdout so we can be sure the document was available on this attempt.

Multi Version Concurrency Control

In order to gurarantee that the "holistic" view of the data is current  and available in a read only query transaction across each host in the cluster, two things need to take place:

  • All forests need to be up-to-date and all pending transactions need to be committed.
  • Each host must be in complete agreement as to the 'last known good' (safest) timestamp from which the query can be allowed to take place.

In all situations, to ensure a complete (and reliable) view of the data, the read only query transaction must take place at the lowest known timestamp across the cluster

With every message between nodes in the cluster, the latest timestamp information is communicated across each host in the cluster - the first "failed" attempt to read the document necessitates communication between each host in the cluster - and by doing so, this action propagates a new "agreed" timestamp across every node in the cluster.

It is because of this, the retry will always work; at the point where the immediate read after write fails, timestamp changes are propagated, and the new timestamp is now at a waypoint for the retry query to take place. This is why the single retry is always guaranteed to work.

(7 vote(s))
Helpful
Not helpful

Comments (0)