Knowledgebase:
Database Replication Lag Limit Explained
29 November 2012 01:56 PM

INTRODUCTION

From the documentation:

Queries on a Replica database must run at a timestamp that lags the current cluster commit timestamp due to replication lag. Each forest in a Replica database maintains a special timestamp, called a Non-blocking Timestamp, that indicates the most current time at which it has complete state to answer a query. As the Replica forest receives journal frames from its Master, it acknowledges receipt of each frame and advances its nonblocking timestamp to ensure that queries on the local Replica run at an appropriate timestamp. Replication lag is the difference between the current time on the Master and the time at which the oldest unacknowledged journal frame was queued to be sent to the Replica.

To read more:

http://tinyurl.com/7zwq4l2

SCENARIO

Consider the following customer scenario:

  • The storage that the database resides on fails at one site.
  • The customer must therefore run on a single site for a period of time.
  • The storage and MarkLogic Server are recovered at the site where the failure occurred.
  • The customer needs to re-establish replication between the two sites.

QUESTIONS AND ANSWERS

Q: Should we tune the lag limit to suit our application?

A: We have found in our own performance testing that increasing the lag limit beyond the default is typically not helpful.

When the master has a sustained rate of updates, a large lag limit causes it to run quickly ahead of the replica, then stall for an extended period of time until the replica catches up. This pattern repeats over and over and gives inconsistent performance on the master.

A smaller lag limit causes the master to suspend updates more frequently but for shorter periods of time, resulting in more consistent perceived performance.

Q: Is there any option to restore the replica database to a point in time from a backup of the master database and then re-initiate replication from that point onwards?

A: It's fine to restore a backup to the failed system when it comes back online and before configuring replication in the reverse direction.

Q: Is there a limit to how old a backup of the replica database can be (e.g. can a replica be restored from months back in comparison to the master) and will it still sync back to the master without issue? And does this depend on what journal data is available?

A: There is no limit to how old a backup can be; the system will calculate all the deltas and apply them.

Q: Are there any documented API built-ins for any of these things?

A: Indeed; all the replication information is available through a call to xdmp:forest-status():

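(: Status of every forest in "MyDatabase" (a placeholder name). :)
(: fn:true() asks xdmp:database-forests to include replica forests too. :)
xdmp:forest-status(
  xdmp:database-forests(
    xdmp:database("MyDatabase"),
    fn:true()))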

For further information:

http://tinyurl.com/d6vbpk4
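
The full forest status document is large, so it can help to pull out only the fields of interest. The following is a minimal sketch, not an official recipe. It assumes a database named "MyDatabase" (a placeholder) and reports each forest's name and state, which is one way to check whether a forest has reached the "open replica" state. Wildcard name tests are used so that the sketch does not depend on a namespace declaration.

```xquery
xquery version "1.0-ml";

(: Sketch only: "MyDatabase" is a placeholder database name. :)
for $forest in
  xdmp:database-forests(xdmp:database("MyDatabase"), fn:true())
(: Fetch the status of one forest at a time. :)
let $status := xdmp:forest-status($forest)
return
  fn:concat(
    fn:string($status/*:forest-name), ": ",
    fn:string($status/*:state))
```

Each returned string pairs a forest name with its current state, so the output can be scanned quickly while replication is being re-established.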

Q: Can you also advise whether the replication lag limit mentioned in section 1.2.5, and the related possibility of transactions stalling on the master database, apply during the bulk replication phase?

A: As long as the replica's forests are in the "open replica" state, the replica will respond to queries at any commit timestamp it is able to support, irrespective of whether replication is lagging.

MarkLogic 5 introduced an application server setting for multi-version concurrency control. By default it is set to contemporaneous, meaning that queries run at the latest timestamp at which any transaction has committed, irrespective of whether other transactions are still in flight.

Conversely, if nonblocking is chosen (i.e. if you create an application server to query a replica database and you set multi-version concurrency control to nonblocking), the server will choose the last timestamp where all pending transactions are known to have successfully committed.
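
The setting can be changed in the Admin Interface. As a hedged sketch of a scripted alternative, the example below uses the Admin API function admin:appserver-set-multi-version-concurrency-control. The group name "Default" and the application server name "MyReplicaAppServer" are placeholders chosen for illustration, and the function's availability should be confirmed against the Admin API documentation for your release.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Sketch only: "Default" and "MyReplicaAppServer" are placeholder names. :)
let $config := admin:get-configuration()
let $group := admin:group-get-id($config, "Default")
let $appserver := admin:appserver-get-id($config, $group, "MyReplicaAppServer")
(: Switch the app server from the default (contemporaneous) to nonblocking. :)
let $config :=
  admin:appserver-set-multi-version-concurrency-control(
    $config, $appserver, "nonblocking")
return
  admin:save-configuration($config)
```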

If you wish to evaluate a query against a replica database you can use xdmp:database-nonblocking-timestamp() to determine the most current query timestamp that will not block.
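
As an illustration, the sketch below looks up the nonblocking timestamp for a replica database and then evaluates a simple query at that timestamp. It assumes a database named "MyReplicaDatabase" (a placeholder), and the document-count query is only an example. Running a query at an explicit timestamp may require additional privileges for the calling user.

```xquery
xquery version "1.0-ml";

(: Sketch only: "MyReplicaDatabase" is a placeholder database name. :)
let $db := xdmp:database("MyReplicaDatabase")
(: Most current timestamp at which a query on this database will not block. :)
let $ts := xdmp:database-nonblocking-timestamp($db)
return
  xdmp:eval(
    "xdmp:estimate(fn:doc())",
    (),
    <options xmlns="xdmp:eval">
      <database>{$db}</database>
      <timestamp>{$ts}</timestamp>
    </options>)
```

Because the query is pinned to a timestamp for which the replica already has complete state, it should return without waiting for outstanding journal frames to arrive.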
