Solutions

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Learn

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Community

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Company

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

 
Knowledgebase:
MarkLogic Server Isolation Levels
11 January 2016 04:42 PM

Introduction

MarkLogic automatically provides 

  • ANSI REPEATABLE READ level of isolation for update transactions, and 
  • Serializable isolation for read-only (query) transactions.

MarkLogic can be made to provide ANSI SERIALIZABLE isolation for update transactions, but doing so requires developers to manage their own predicate locks.

Isolation Levels - Background

There are many possible levels of isolation, and many different taxonomies of isolation levels. The most common taxonomy (familiar to those with a RDBMS background) is the one defined by ANSI SQL, which defines four levels of isolation based on read phenomena that are possible at each level. ANSI has a definition for each phenomenon, but these definitions are open to interpretation. Broad interpretation results in more rigorous criteria for each isolation level (and therefore better isolation at each level), whereas strict interpretation results in less rigorous isolation at each level. Here I’ll use a shorthand notation to describe these phenomena, and will use the broad rather than the strict interpretation. The notation specifies the operation, the transaction performing the operation, and the item or domain on which the operation is performed. Operations in my notation are:

  • Write (w)
  • Read (r)
  • Commit (c)
  • Abort/rollback (a)

An example of this shorthand: w1[x] means transaction1 writes to item x.

Now the phenomena:

  • A dirty read happens when a transaction T2 reads an item that is being written by concurrently running transaction T1. In other words: w1[x]…r2[x]…((c1 or a1) and (c2 or a2) in any order). This phenomenon could lead to an anomaly in the case where T1 later aborts, and T2 has then read a value that never existed in the database.
  •  A non-repeatable read happens when a transaction T2 writes an item that was read by a transaction T1 prior to T1 completing. In other words: r1[x]…w2[x]…((c1 or a1) and (c2 or a2) in any order). Non-repeatable reads don’t produce the same anomalies as dirty reads, but can produce errors in cases where T1 relies on the value of x not changing between statements in a multi-statement transaction (e.g. reading and then updating a bank account balance).
  • A phantom read happens when a transaction T1 retrieves a set of data items matching some search condition and concurrently running transaction T2 makes a change that modifies the set of items that match that condition. In other words: (r1[P] and w2[x in P] in any order)…((c1 or a1) and (c2 or a2) in any order), where P is a set of results. Phantom reads are usually less serious than dirty or non-repeatable reads because it generally doesn’t matter if item x in P is written before or after T1 finishes unless T1 is itself explicitly reading x. And in this case the phenomenon would no longer be a phantom, but would instead be a dirty or non-repeatable read per the definitions above. That said, there are some cases where phantom reads are important.

 The isolation levels ANSI defines are based on which of these three phenomena are possible at that isolation level. They are:

  • READ UNCOMMITTED – all three phenomena are possible at this isolation level.
  • READ COMMITTED – Dirty reads are not possible, but non-repeatable and phantom reads are.
  • REPEATABLE READ – Dirty and non-repeatable reads are not possible, but phantom reads are.
  • SERIALIZABLE – None of the three phenomena are possible at this isolation level.

Note that as defined above, ANSI SERIALIZABLE is not sufficient for transactions to be truly serializable (in the sense that running them concurrently and running them in series would in all cases produce the same result), so SERIALIZABLE is an unfortunate choice of names for this isolation level, but that’s what ANSI called it.

Update Transaction Locks

Typically, a DBMS will avoid dirty and non-repeatable reads by taking locks on records (called item locks). Locks are either shared locks (which can be held by more than one transaction) or exclusive locks (which can be held by only one transaction at a time). In most DBMSes (including MarkLogic), locks taken when reading an item are shared and locks taken when writing an item are exclusive.

MarkLogic prevents dirty and non-repeatable reads in update transactions by taking item locks on items that are being read or written during a transaction and releasing those locks only on completion of the transaction (post-commit or post-abort). When a transaction needs to lock an item on which another transaction has an exclusive lock, that transaction waits until either the lock is released or the transaction times out. Deadlock detection prevents cases where two transactions are waiting on each other for exclusive locks. In this case one of the transactions will abort and restart.

In addition, MarkLogic prevents some types of phantom reads by taking item locks on the set of items in a search result. This prevents phantom reads involving T2 removing an item in a set that T1 previously searched, but does not prevent phantom reads involving T2 inserting an item in a set that T1 previously searched, or those involving T2 searching for items and seeing a deletion caused by T1.

Avoiding All Phantom Reads

To avoid all phantom reads via locking, it is necessary to take locks not just on items that currently match the search criteria, but also on all items that could match the search criteria, whether they currently exist in the database or not. Such locks are called predicate locks. Because you can search for pretty-much anything in MarkLogic, guaranteeing a predicate lock for arbitrary searches would require locking the entire database. From a concurrency and throughput perspective, this is obviously not desirable. MarkLogic therefore leaves the decision to take predicate locks and the scope of those locks in the hands of application developers. Because the predicate domain can frequently be narrowed down with some application-specific knowledge, this provides the best balance between isolation and concurrency. To take a predicate lock, you lock a synthetic URI representing the predicate domain in every transaction that reads from or writes to that domain. You can take shared locks on a synthetic URI via fn:doc(URI). Exclusive locks are taken via xdmp:lock-for-update(URI).

Note that predicate locks should only be taken in situations where phantom reads are intolerable. If your application can get by with REPEATABLE READ isolation, you should not take predicate locks, because any additional locking results in additional serialization and will impact performance.

Summary

To summarize, MarkLogic automatically provides ANSI REPEATABLE READ level of isolation for update transactions and true serializable isolation for read-only (query) transactions. MarkLogic can be made to provide ANSI SERIALIZABLE isolation for update transactions, but doing so requires developers to manage their own predicate locks.

(2 vote(s))
Helpful
Not helpful

Comments (0)