Read Only Queries Run at a Timestamp & Update Transactions use Locks
12 May 2020 10:40 AM
Update transactions run with readers/writers locks, obtaining locks as needed for documents accessed in the transaction. Because update transactions only obtain locks as needed, update statements always see the latest version of a document. The view is still consistent for any given document from the time the document is locked. Once a document is locked, any update statements in other transactions wait for the lock to be released before updating the document.
Read only query transactions run at a particular system timestamp, instead of acquiring locks, and have a read-consistent view of the database. That is, the query transaction runs at a point in time where all documents are in a consistent state.
The system timestamp is a number maintained by MarkLogic Server that increases every time a change or a set of changes occurs in any of the databases in a system (including configuration changes from any host in a cluster). Each fragment stored in a database has system timestamps associated with it to determine the range of timestamps during which the fragment is valid.
On a clustered system where there are multiple hosts, the timestamps need to be coordinated accross all hosts. Marklogic Server does this by passing the timestamp in every message communicated between hosts of the cluster, including the heartbeat message. Typically, the message carries two important pieces of information:
In addition to the heartbeat information, the "Label" file for each forest in the database is written as changes are made. The Label file also contains timestamp information; this is what each host uses to ascertain the current "view" of the data at a given moment in time. This technique is what allows queries to be executed at a 'point in time' to give insight into the data within a forest at that moment.
You can learn more about transactions in MarkLogic Server by reading the Understanding Transactions in MarkLogic Server section of the MarkLogic Server Application Developers Guide.
The distribute timestamps option on Application Server can specify how the latest timestamp is distributed after updates. This affects performance of updates and the timeliness of read-after-write query results from other hosts in the group.
When set to fast, updates return as quickly as possible. No special timestamp notification messages are broadcasted to other hosts. Instead, timestamps are distributed to other hosts when any other message is sent. The maximum amount of time that could pass before other hosts see the update timestamp is one second, because a heartbeat message is sent to other hosts every second.
When set to strict, updates immediately broadcast timestamp notification messages to every other host in the group. Updates do not return until their timestamp has been distributed. This ensures timeliness of read-after-write query results from other hosts in the group.
When set to cluster, updates immediately broadcast timestamp notification messages to every other host in the cluster. Updates do not return until their timestamp has been distributed. This ensures timeliness of read-after-write query results from any host in the cluster, so requests made to any app server on any host in the cluster will see immediately consistent results.
The default value for "distribute timestamps" option is fast. The remainder of this article is applicable when fast mode is used.
Read after Write in Fast Mode
We will look at the different scenario for the case where a read occurs in a transaction immediately following an update transaction.
Consider the following code:
The above example performs the following steps:
Running this test will yield one of two results for each iteration of the loop:
Query Transaction at Timestamp that includes Update
Most of the time, you will find that the timestamps will be in lockstep with the host before - note that there is no time difference between the output from getCurrentServerPointInTime() after the document has been inserted and before the attempt is made to retrieve the document from the connection to the second host in the cluster.
----------------- START OF INSERT / READ CYCLE (1) ----------------- First host timestamp before document is inserted: 13673327800295300 First host timestamp after document is inserted: 13673328229180040 Second host timestamp before document is read: 13673328229180040 ------------------ END OF INSERT / READ CYCLE (1) ------------------
However, you may also see this:
----------------- START OF INSERT / READ CYCLE (10) ----------------- First host timestamp before document is inserted: 13673328311216780 First host timestamp after document is inserted: 13673328322546380 Second host timestamp before document is read: 13673328311216780 ------------------ END OF INSERT / READ CYCLE (10) ------------------
Note that on this run, the timestamps are out of sync; at the point where getCurrentServerPointInTime() is called, the timestamp for the second connection is at that point just before the document is inserted.
Yet this also returns results that include the updates; in the interval between the timestamp being written to the console and the construction and submission of the newAdhocQuery(), the document has become available and was successfully retrieved during the read process.
The path with an immediate retry
Now let's explore what happens when the read only query transaction runs at a point in time that does not include the updates:
----------------- START OF INSERT / READ CYCLE (2) ----------------- First host timestamp before document is inserted: 13673328229180040 First host timestamp after document is inserted: 13673328240679460 Second host timestamp before document is read: 13673328229180040 WARNING: Immediate read failed; performing an immediate retry Second host timestamp for read retry: 13673328240679460 Result Sequence below: <?xml version="1.0" encoding="UTF-8"?> <ok/> ------------------ END OF INSERT / READ CYCLE (2) ------------------
Note that on this occasion, we see an outcome that starts much like the previous example; the timestamps mismatch and we see that we've hit the point in the code where our validation of the response fails.
Also note that the timestamp at the point where the retry takes place is now back in step; from this, we can see that the document should be available even before the retry request is executed. Under these conditions, the response (the result) is also written to stdout so we can be sure the document was available on this attempt.
Multi Version Concurrency Control
In order to gurarantee that the "holistic" view of the data is current and available in a read only query transaction across each host in the cluster, two things need to take place:
In all situations, to ensure a complete (and reliable) view of the data, the read only query transaction must take place at the lowest known timestamp across the cluster
With every message between nodes in the cluster, the latest timestamp information is communicated across each host in the cluster - the first "failed" attempt to read the document necessitates communication between each host in the cluster - and by doing so, this action propagates a new "agreed" timestamp across every node in the cluster.
It is because of this, the retry will always work; at the point where the immediate read after write fails, timestamp changes are propagated, and the new timestamp is now at a waypoint for the retry query to take place. This is why the single retry is always guaranteed to work.