Rebalancing, replication and forest reordering
02 September 2020 10:07 PM
This KB article describes how the rebalancer interacts with database replication, and how to resolve the issues that can arise when the two are not configured consistently.
For a general discussion on how rebalancing works in MarkLogic, refer to this article and the server documentation.
Rebalancing and Database Replication
When database replication is configured for a database, rebalancing is disabled by default on the Replica database, and no rebalancing occurs until the database replication configuration is deleted. As long as the Master is available, the forest-to-forest mapping between the Master and the Replica remains in place.
Note that the Replica database must have at least as many forests as the Master database. Otherwise, not all of the data on the Master database will be replicated.
Make sure that the assignment policy on the Replica is the same as on the Master, so that in a disaster recovery (DR) situation, when the Replica takes over as the Primary, rebalancing is not triggered.
Forest order mismatch can cause Rebalancing
Forest order is the order in which forests are attached to the database. When the document assignment policy is set to 'Segment', 'Legacy' or 'Bucket', the Replica database must have the same forest order as the Master, so that rebalancing does not occur if or when the replication configuration is removed.
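To see why forest order matters for an order-sensitive policy, consider the following toy model. It is only a sketch: the `assign` function and its hash-and-index scheme are assumptions made for illustration and are not MarkLogic's actual Segment, Legacy or Bucket algorithms. It shows only that when a document's forest is chosen by position in the forest list, attaching the same forests in a different order changes where documents are expected to live.

```python
import hashlib


def assign(uri: str, forests: list[str]) -> str:
    """Toy position-based assignment (hypothetical, not MarkLogic's algorithm).

    Hash the URI and use the result as an index into the ordered forest list.
    """
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(forests)
    return forests[index]


# "forest-N" on the replica side stands for the replica forest that is paired
# with master forest N; only the attachment order differs.
master_order = ["forest-1", "forest-2", "forest-3"]
replica_order = ["forest-1", "forest-3", "forest-2"]

uris = [f"/doc/{i}.xml" for i in range(1000)]

# Documents whose expected forest differs between the two orderings are the
# ones a rebalancer would want to move once it is allowed to run.
misplaced = [u for u in uris if assign(u, master_order) != assign(u, replica_order)]

print(f"{len(misplaced)} of {len(uris)} documents would need to move")
```

In this model, documents that hash to the first position are unaffected because `forest-1` is first in both lists, while every document that hashes to the second or third position is now expected in a different forest.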
If the forest orders differ between the Master and the Replica, a Warning-level message is logged on the Replica, which looks like this:
In this state, when database replication is deleted between the clusters, the database on the Replica cluster starts to rebalance right away. How long this takes depends on how many documents need to be rebalanced.
Fixing the forest order:
On clusters where database replication is enabled and the Master and Replica databases are in sync (document counts match and all primary forests on the Replica database are in the 'open replica' state), the following steps remove the mismatch and make the forest order the same on both the Master and the Replica:
i. Make sure that both Master and Replica databases have the same rebalancer assignment policy.
ii. Disable the rebalancer and the reindexer on both clusters for the database in question, if they are enabled.
iii. Obtain the forest order from the Master cluster - below is the query for an example database:
Example output for this query is
iv. On the Replica cluster, reorder the forests according to the order returned on the Master from step iii:
v. Re-enable the rebalancer and the reindexer on both clusters, if they were enabled previously.
vi. Verify that the Warning messages no longer appear on the Replica cluster (these messages are logged once every hour).
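The planning part of steps iii and iv can be sketched in a few lines. This is a minimal illustration, not the procedure's actual query: the function names, the forest names, and the `master_to_replica` mapping are all hypothetical, and the sketch makes no calls to MarkLogic. It assumes the Master's forest order and the forest-to-forest replication mapping have already been collected, and it assumes that any extra Replica forests without a Master counterpart are placed at the end.

```python
def replica_target_order(master_order, master_to_replica, replica_forests):
    """Return the Replica forest order that mirrors the Master order.

    master_order      -- Master forest names, in attachment order
    master_to_replica -- dict mapping each Master forest to its Replica forest
    replica_forests   -- Replica forest names, in their current order
    """
    missing = [m for m in master_order if m not in master_to_replica]
    if missing:
        raise ValueError(f"no replica forest mapped for: {missing}")

    paired = [master_to_replica[m] for m in master_order]
    unknown = [r for r in paired if r not in replica_forests]
    if unknown:
        raise ValueError(f"mapped forests not attached on the replica: {unknown}")

    # Assumption: extra Replica forests keep their relative order at the end.
    extras = [r for r in replica_forests if r not in paired]
    return paired + extras


def first_mismatch(current, target):
    """Return the first position where the two orders differ, or None."""
    for position, (cur, tgt) in enumerate(zip(current, target)):
        if cur != tgt:
            return position
    return None


# Hypothetical forest names for an example database.
master_order = ["m-f1", "m-f2", "m-f3"]
master_to_replica = {"m-f1": "r-f1", "m-f2": "r-f2", "m-f3": "r-f3"}
replica_current = ["r-f2", "r-f1", "r-f3", "r-f4"]

target = replica_target_order(master_order, master_to_replica, replica_current)
print("target order:", target)
print("first mismatch at position:", first_mismatch(replica_current, target))
```

Running `first_mismatch` again after the reorder has been applied on the Replica gives a quick confirmation, alongside the log check in step vi, that the two orders now agree.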
Understanding what work the rebalancer will do
Using the rebalancer to move the content in one forest to another location