How do updates work in MarkLogic Server?
29 November 2012 01:29 PM


A database consists of one or more forests. A forest is a collection of documents (mostly XML trees, thus the name), implemented as a physical directory on disk. Each forest holds a set of documents and all their indexes. 

When a new document is loaded into MarkLogic Server, the server puts this document in an in-memory stand and writes the action to an on-disk journal to maintain transactional integrity in case of system failure. After enough documents are loaded, the in-memory stand will fill up and be flushed to disk, written out as an on-disk stand. As more document are loaded, they go into a new in-memory stand. At some point this in-memory stand fills up as well, and the in-memory stand gets written as yet another new on-disk stand.

To read a single term list, MarkLogic must read the term list data from each individual stand and unify the results. To keep the number of stands to a manageable level where that unification isn't a performance concern, MarkLogic runs merges in the background. A merge takes some of the stands on disk and creates a new singular stand out of them, coalescing and optimizing the indexes and data, as well as removing any previously deleted fragments
Each forest has its own in-memory stand and set of on-disk stands. Loading and indexing content is a largely parallelizable activity so splitting the loading effort across forests and potentially across machines in a cluster can help scale the ingestion work.

Deletions and Multi-Version Concurrency Control (MVCC)

What happens if you delete or change a document? If you delete a document, MarkLogic marks the document as deleted but does not immediately remove it from disk. The deleted document will be removed from query results based on its deletion markings, and the next merge of the stand holding the document will bypass the deleted document when writing the new stand. MarkLogic treats any changed document like a new document, and treats the old version like a deleted document.

This approach is known in database circles as which stands for Multi-Version Concurrency Control (or MVCC).
In an MVCC system changes are tracked with a timestamp number which increments for each transaction as the database changes. Each fragment gets its own creation-time (the timestamp at which it was created) and deletion-time (the timestamp at which it was marked as deleted, starting at infinity for fragments not yet deleted).

For a request that doesn't modify data the system gets a performance boost by skipping the need for any URI locking. The query is viewed as running at a certain timestamp, and throughout its life it sees a consistent view of the database at that timestamp, even as other (update) requests continue forward and change the data.

Updates and Deadlocks

An update request, because it isn't read-only, has to use read/write locks to maintain system integrity while making changes. Read-locks block for write-locks; write-locks block for both read and write-locks. An update has to obtain a read-lock before reading a document and a write-lock before changing (adding, deleting, modifying) a document. Lock acquisition is ordered, first-come first-served, and locks are released automatically at the end of a request.

In any lock-based system you have to worry about deadlocks, where two or more updates are stalled waiting on locks held by the other. In MarkLogic deadlocks are automatically detected with a background thread. When the deadlock happens on the same host in a cluster, the update farthest along (with the most locks) wins and the other update gets restarted. When it happens on different hosts, because lock count information isn't in the wire protocol, both updates start over. MarkLogic differentiates queries from updates using static analysis. Before running a request, it looks at the code to determine if it includes any calls to update functions. If so, it's an update. If not, it's a query. Even if at execution time the update doesn't actually invoke the updating function, it still runs as an update.

For the most part it's not under the control of the user. The one exception is there's an xdmp:lock-for-update($uri) call that requests a write-lock on a document URI, without actually having to issue a write and in fact without the URI even having to exist.

When a request potentially touches millions of documents (such as sorting a large data set to find the most recent items), a query request that runs lock-free will outperform an update request that needs to acquire read-locks and writelocks. In some cases you can speed up the query work by isolating the update work to its own transactional context. This technique only works if the update doesn't have a dependency on the outer query, but that turns out to be a common case. For example, let's say you want to execute a content search and record the user's search string to the database for tracking purposes. The database update doesn't need to be in the same transactional context as the search itself, and would slow things down if it were. In this case it's better to run the search in one context (read-only and lock-free) and the update in a different context. See the xdmp:eval() and xdmp:invoke() functions for documentation on how to invoke a request from within another request and manage the transactional contexts between the two.

Document Lifecycle

Let's track the lifecycle of a document from first load to deletion until the eventual removal from disk. A document load request acquires a write-lock for the target URI as part of the xdmp:document-load() function call. If any other request is already doing a write to the same URI, our load will block for it, and vice versa. At some point, when the full update request completes successfully (without any errors that would implicitly cause a rollback), the actual insertion work begins, processing the queue of update work orders. MarkLogic starts by parsing and indexing the document contents, converting the document from XML to a compressed binary fragment representation. The fragment gets added to the in-memory stand. At this point the fragment is considered a nascent fragment, a term you'll see sometimes on the administration console status pages. Being nascent means it exists in a stand but hasn't been fully committed. (On a technical level, nascent fragments have creation and deletion timestamps both set to infinity, so they can be managed by the system while not appearing in queries prematurely.) If you're doing a large transactional insert you'll accumulate a lot of nascent fragments while the documents are being processed. They stay nascent until they've been committed. Once the fragment is placed into the in-memory stand, the request is ready to commit. It obtains the next timestamp value, journals its intent to commit the transaction, and then makes the fragment available by setting the creation timestamp for the new fragment to the transaction's timestamp. At this point it's a durable transaction, replayable in event of server failure, and it's available to any new queries that run at this timestamp or later, as well as any updates from this point forward (even those in progress). As the request terminates, the write-lock gets released.

Our document lives for a time in the in-memory stand, fully queryable and durable, until at some point the in-memory stand fills up and gets written to disk. Our document is now in an on-disk stand. Sometime later, based on merge algorithms, the on-disk stand will get merged with some other on-disk stands to produce a new on-disk stand. The fragment will be carried over, its tree data and indexes incorporated into the larger stand. This might happen several times.

At some point a new request makes a change to the document, such as with an xdmp:node-replace() call. The request making the change first obtains a read-lock on the URI when it first accesses the document, then promotes the read-lock to a write-lock when executing the xdmp:node-replace() call. If another write-lock were already present on the URI from another executing update, the read-lock would have blocked until the other write-lock released. If another read-lock were already present, the lock promotion to a write-lock would have blocked. Assuming the update request finishes successfully, the work runs similar to before: parsing and indexing the document, writing it to the in-memory stand as a nascent fragment, acquiring a timestamp, journaling the work, and setting the creation timestamp to make the fragment live. Because it's an update, it has to mark the old fragment as deleted also, and does that by setting the deletion timestamp of the original fragment to the transaction timestamp. This combination effectively replaces the old fragment with the new. When the request concludes, it releases its locks. Our document is now deleted, replaced by the new version.

The old fragment still exists on disk, of course. In fact, any query that was already in progress before the update incremented the timestamp, or any query doing time travel with an old timestamp, can still see it. Eventually the on-disk stand holding the fragment will be merged again, at which point the old fragment will be completely removed from the system. It won't be written into the new on-disk stand. That is, unless the administration "merge timestamp" was set to allow deep time travel. In that case it will live on, sticking around in case any new queries want to time travel to see old fragments.

(12 vote(s))
Not helpful

Comments (0)