Database Replication Indexing on Replica Explained
04 May 2020 03:04 PM

Introduction

Prior to MarkLogic 9.0-7

For optimum efficiency, indexing information was not replicated over the network between the Master and Replica databases; instead, it was regenerated by the Replica database. If you want the option to switch over to the Replica database after a disaster and have queries that perform as expected, your index settings must be identical on both the Master and Replica clusters. See the documentation on Master and Replica Database Index Settings.

If you need to update index settings after configuring Database Replication, update the settings on the Replica database before updating those on the Master database. Changes to the index settings on the Replica database only affect newly replicated documents and will not trigger reindexing of existing documents.

MarkLogic 9.0-7 and later

In MarkLogic 9.0-7 and later, synchronization between Master and Replica has been further improved, and index data is now replicated automatically from the master. There is also a function, xdmp:forest-validate-replica-index, that validates and optionally repairs replica index data so that it matches the master index data.

If you need to update index settings after configuring Database Replication, make sure they are updated on both the Master and Replica databases.

Note: In all of the versions described above, index configuration is not automatically replicated. It is still the responsibility of the database administrator to ensure that the replica index configuration matches the master index configuration.

Changes to the index settings on the Master database will trigger reindexing, after which the reindexed documents will be replicated to the Replica.

When a Database Replication configuration is removed for the Replica database (such as after a disaster), the Replica database will reindex, if necessary.

For more information about this feature, see the MarkLogic Server Database Replication Guide.

Negative impact of reindexing replica cluster

If a replica database is reindexed after being decoupled from the master and is then re-coupled at a later time:

  1. When the databases are reconnected, every reindexed document in the database will be replicated during the bulk synchronization process. Internally, MarkLogic Server generates a "document id" for each document at reindex time. The "document id" is random (i.e. not deterministic), so when database replication bulk synchronization occurs, the manifests from the master and the replica will have different "document ids" for the same document, resulting in the document being replicated.
  2. Because every reindexed document is replicated again, there will be transient duplicate URIs on the replica cluster during bulk replication, as the document deletes will not be coordinated with the document inserts. If the replica database is used for (read-only) queries, this could result in query errors. Once the databases complete the asynchronous bulk replication process, the replica database will be in a good state again.
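
The effect of random "document ids" can be illustrated with a toy model. The Python sketch below is not MarkLogic code and does not reflect the server's internal data structures; the manifest is modeled, purely as an assumption for illustration, as a dictionary mapping each URI to a random id, and the helper names are hypothetical.

```python
import uuid

def reindex(manifest):
    """Toy reindex: every URI keeps its content but gets a fresh random id."""
    return {uri: uuid.uuid4().hex for uri in manifest}

def documents_to_replicate(master, replica):
    """URIs whose id on the replica is missing or differs from the master's id."""
    return sorted(uri for uri, doc_id in master.items()
                  if replica.get(uri) != doc_id)

# Hypothetical manifests: five documents, replica initially in sync with master.
master = {f"/{i}.xml": uuid.uuid4().hex for i in range(1, 6)}
replica_in_sync = dict(master)

# The replica is decoupled and reindexed, so all of its ids change.
replica_reindexed = reindex(replica_in_sync)

in_sync_count = len(documents_to_replicate(master, replica_in_sync))
reindexed_count = len(documents_to_replicate(master, replica_reindexed))
print(in_sync_count, reindexed_count)
```

In this model nothing needs to be sent while the ids agree, but after the replica is reindexed every document is selected for replication even though its content has not changed.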

Additional Recommendations

  1. If a new index is added to the database replica cluster, add a verification step to make sure the indexes are available on the replica cluster once reindexing is complete on the master. This will allow you to avoid any index usage issues after DR failover, reducing the time needed to bring the replica cluster up to a usable state.
  2. Document and practice DR failover to optimize the procedure. 
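
One possible form of such a verification step is a comparison of the index definitions exported from each cluster. The Python sketch below assumes that the range index settings of both databases have already been exported into lists of dictionaries; the key names (scalar-type, namespace-uri, localname, collation, range-value-positions) and the sample index definitions are assumptions made for illustration and should be adapted to whatever your configuration export actually produces. It compares configuration only and does not check whether reindexing has finished.

```python
def index_key(index):
    """Fields that identify a range index and must match on both clusters."""
    return (
        index.get("scalar-type"),
        index.get("namespace-uri", ""),
        index.get("localname"),
        index.get("collation", ""),
        index.get("range-value-positions", False),
    )

def compare_indexes(master_indexes, replica_indexes):
    """Report index definitions that exist on only one of the two clusters."""
    master_keys = {index_key(i) for i in master_indexes}
    replica_keys = {index_key(i) for i in replica_indexes}
    return {
        "missing_on_replica": sorted(master_keys - replica_keys, key=repr),
        "missing_on_master": sorted(replica_keys - master_keys, key=repr),
    }

# Hypothetical exported settings: the replica lacks the "title" index.
master_indexes = [
    {"scalar-type": "int", "namespace-uri": "", "localname": "index"},
    {"scalar-type": "string", "namespace-uri": "", "localname": "title",
     "collation": "http://marklogic.com/collation/"},
]
replica_indexes = [
    {"scalar-type": "int", "namespace-uri": "", "localname": "index"},
]

report = compare_indexes(master_indexes, replica_indexes)
for side, keys in report.items():
    print(side, keys)
```

A check like this could be run as part of a DR failover rehearsal, with any non-empty entry in the report treated as a configuration mismatch to resolve before failover.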

Summary

The MarkLogic DBA is responsible for keeping the index settings the same across both the Master and Replica hosts. We recommend using the Admin API to script the configuration settings for your databases and storing these configuration scripts in a version control system such as Subversion or Git.

Questions

Q: If the Replica is used for query resolution, does an index have to be configured on the Master before queries on the Replica can use it, even though the data has already been replicated to the Replica?

A: Yes - here's a simple test to demonstrate this working:

  1. Create some sample data on the master database
    for $i in (1 to 100) 
    return xdmp:document-insert(fn:concat("/",$i,".xml"), element index {$i})
  2. Confirm that replication is taking place across both databases.
  3. On the replica host, create an integer element range index on the index element.
  4. Confirm that a cts:element-values query on the replica returns an XDMP-ELEMRIDXNOTFOUND error:
    cts:element-values(xs:QName("index"))
  5. Create the same element range index on the master.
  6. Confirm that both environments can now run the query successfully:
    1 2 3 [...] 100