Moving Forests Across Storage Devices
27 April 2016 02:14 PM
Since this article was originally written, MarkLogic has added Forest Rebalancing and Forest Retiring features to more recent versions of MarkLogic Server. For zero-downtime movement of forests, please refer to our documentation for these features: http://docs.marklogic.com/guide/admin/database-rebalancing.
The legacy article follows:
There are many reasons why you may need to move a forest from one storage device to another. For example:
No matter what the reason, the action of moving forests should be well planned and deliberate, and the procedure should be well tested. This article lists both the steps that should be followed and the issues to be considered when planning a move.
We will present two different techniques for moving a forest. The first is appropriate for databases that can be restricted from updates for the duration of the forest move. The second is appropriate for production databases where database downtime needs to be minimized.
Simple Procedure to Move a Forest
The simple procedure to move a forest can be used on any forest whose database can be restricted from updates for the duration of the process. This will typically be for test, development and staging systems, but may also include production environments that can be disabled for extended maintenance windows.
To retain data integrity, this procedure requires that the associated database is restricted from updates. The update restriction can be enforced in a variety of ways:
The following steps can be used to move a forest:
Step 1: Begin enforcement of update restriction;
Step 2: Create a backup of the forest you would like to move;
Step 3: Create a new forest, specifying the new location for the forest;
Step 4: Restore the forest data from step 2 to the newly created forest;
Step 5: Verify that the forest data is restored successfully;
Step 6: Switch forests attached to the database;
a. Detach the original forest from the database;
b. Attach the new forest to the database;
Step 7: Remove update restriction (from step 1);
Step 8: (Optional) Remove/delete the original forest.
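The ordering of the steps above can be sketched as a small Python driver. This is an illustration only: `ForestOps` and every method on it are hypothetical placeholders for whatever tooling you use (Admin UI actions, scripts, and so on), not MarkLogic APIs, and the forest, database, and path names in the usage note are made up. The sketch assumes that if verification fails, the original forest is still attached, so updates can be re-enabled before the error is reported. The optional deletion of the original forest is left as a manual step.

```python
from typing import Protocol


class ForestOps(Protocol):
    """Hypothetical administrative operations; NOT a MarkLogic API."""

    def restrict_updates(self, database: str) -> None: ...
    def allow_updates(self, database: str) -> None: ...
    def backup_forest(self, forest: str, backup_dir: str) -> None: ...
    def create_forest(self, forest: str, data_dir: str) -> None: ...
    def restore_forest(self, forest: str, backup_dir: str) -> None: ...
    def verify_forest(self, original: str, new: str) -> bool: ...
    def detach_forest(self, database: str, forest: str) -> None: ...
    def attach_forest(self, database: str, forest: str) -> None: ...


def move_forest_simple(ops: ForestOps, database: str, original: str,
                       new: str, new_data_dir: str, backup_dir: str) -> None:
    """Run steps 1 to 7 of the simple procedure in order."""
    # Both forests exist at the same time, so their names must differ.
    if original == new:
        raise ValueError("the new forest needs a different name than the original")

    ops.restrict_updates(database)            # Step 1
    ops.backup_forest(original, backup_dir)   # Step 2
    ops.create_forest(new, new_data_dir)      # Step 3
    ops.restore_forest(new, backup_dir)       # Step 4

    if not ops.verify_forest(original, new):  # Step 5
        # The original forest is still attached, so the database is intact.
        ops.allow_updates(database)
        raise RuntimeError("restore verification failed; forests were not switched")

    ops.detach_forest(database, original)     # Step 6a
    ops.attach_forest(database, new)          # Step 6b
    ops.allow_updates(database)               # Step 7
```

A call might look like `move_forest_simple(ops, "Documents", "forest-old", "forest-new", "/new/disk", "/backups")`, where `ops` is your own object implementing the placeholder methods.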
Moving a Forest Minimizing Downtime
The following steps can be used to move a forest while minimizing downtime:
Step 1: Create a new forest, specifying the new location for the forest.
Step 2: (Optional) Seed the new forest from backup. Although we will be using the local disk failover feature to synchronize forest content, seeding the new forest from a recent backup will result in faster synchronization and will use fewer resources (i.e., it will be less disruptive to the production system).
Step 3: If you do not have a recent forest backup of the forest you would like to move, create one.
Step 4: Perform a forest level restore to the newly created forest.
Step 5: Configure the new forest as a forest replica of the original forest.
Step 6: Wait until the forest is in the “sync replicating” state. You can use the Admin UI Forest status page to check for sync replicating.
Step 7: Switch forests. This step requires that the database is OFFLINE for a short period of time.
a. Detach the original forest from the database;
b. Remove the forest replica configuration created in step 5;
c. Attach the new forest to the database;
Step 8: (Optional) Remove/delete the original forest.
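If you prefer to script the wait for the “sync replicating” state rather than watch the Admin UI, a polling loop along the following lines may help. This is a minimal sketch: `get_state` is a hypothetical callable that you supply to return the new forest's current state as a string, and the timeout and polling interval are arbitrary defaults. How the state is actually obtained is outside the scope of this sketch.

```python
import time
from typing import Callable

TARGET_STATE = "sync replicating"  # state name as given in this article


def wait_for_sync_replicating(get_state: Callable[[], str],
                              timeout_s: float = 3600.0,
                              poll_s: float = 10.0,
                              sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_state() until it reports the target state or the timeout expires.

    Returns True if the target state was seen, False on timeout.
    get_state is a hypothetical, caller-supplied function.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state() == TARGET_STATE:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

Only proceed to switching forests when the function returns True; a False result means the replica had not caught up within the timeout.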
Retaining Forest Name
Both forest move procedures presented require the new forest to have a different name than the original because forest names must be unique within a MarkLogic Server cluster and both procedures have the original and new forests existing in the system at the same time. Although rare, some applications have forest name dependencies (i.e. applications that perform in-forest query evaluations or in-forest placement of document inserts). If this is the case, you will either need to update your application, or change the method used to move the forest (since MarkLogic Server does not provide a mechanism to change the name of a forest).
Forest Replicas and Failover Hosts
If the original forest has ‘forest replicas’ or ‘failover hosts’ configured, you will need to detach these configurations before you can delete the original forest.
If you would like the new forest to be configured with ‘forest replicas’ or ‘failover hosts’, you must first detach these configurations from the original forest before reattaching them to the new forest.
The majority of the time will be spent transferring content from the original forest to the new forest. You can estimate how long this will take as follows:
Estimated time = (Forest-size / read-rate) + (Forest-size / write-rate)
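As a worked example of the estimate, the following sketch evaluates the formula in Python. The function name and units (megabytes and megabytes per second) are choices made for this illustration, and the sample figures are hypothetical: a 200,000 MB forest read and written at 20 MB/sec in each direction.

```python
def estimate_move_seconds(forest_size_mb: float,
                          read_rate_mb_s: float,
                          write_rate_mb_s: float) -> float:
    """Estimated time = (size / read rate) + (size / write rate), in seconds."""
    if read_rate_mb_s <= 0 or write_rate_mb_s <= 0:
        raise ValueError("read and write rates must be positive")
    return forest_size_mb / read_rate_mb_s + forest_size_mb / write_rate_mb_s


# Hypothetical figures: 200,000 MB forest, 20 MB/sec reads and writes.
seconds = estimate_move_seconds(200_000, 20, 20)
print(seconds)                   # 20000.0
print(round(seconds / 3600, 2))  # 5.56 (hours)
```

Measured rates on your own storage devices will give a better estimate than nominal figures.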
Sizing Rules and Recommendations
When determining the resources allocated to forest data, it is recommended that you stay within the following guidelines:
[MarkLogic Recommendation] “The I/O subsystem should have capacity for sustained I/O at 20-MB/sec per content forest in each direction (i.e., 20-MB/sec reads and 20-MB/sec writes at the same time).”
[MarkLogic Recommendation] “The size of all forest data on a server should not exceed 1/3 of the available disk space. The other 2/3rds should be available for forest merges and reindexing, otherwise you will risk merge or reindex failures.”
(The 3x disk space requirement was always true for MarkLogic 6 and earlier releases. However, beginning in MarkLogic 7, the 3x disk space requirement can be reduced if configured and managed.)
[MarkLogic Rule of thumb] “Provision at least 2 CPU cores per active forest. This facilitates concurrent operations.”
[MarkLogic Rule of thumb] “Forests should not grow beyond 200GB or 64-million fragments. These thresholds do not guarantee a particular level of performance and may need to be lowered depending on the application.”
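The guidelines above can be turned into a quick planning check for a target host. The sketch below is illustrative only: the `Forest` class, the function name, and the parameters are invented for this example, every listed forest is assumed to be an active content forest, and the 3x disk rule is applied without the MarkLogic 7 reduction mentioned above.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the guidelines in this article.
DISK_MULTIPLE = 3            # forest data should not exceed 1/3 of disk space
IO_MB_S_PER_FOREST = 20      # sustained MB/sec per forest, each direction
CORES_PER_FOREST = 2         # CPU cores per active forest
MAX_FOREST_GB = 200          # per-forest size guideline
MAX_FRAGMENTS = 64_000_000   # per-forest fragment guideline


@dataclass
class Forest:
    """Hypothetical description of one forest on the host."""
    name: str
    size_gb: float
    fragments: int


def check_host_sizing(forests: List[Forest], disk_gb: float,
                      cpu_cores: int, sustained_io_mb_s: float) -> List[str]:
    """Return a list of guideline violations; an empty list means none found."""
    findings = []
    total_gb = sum(f.size_gb for f in forests)
    if total_gb * DISK_MULTIPLE > disk_gb:
        findings.append(f"forest data ({total_gb} GB) exceeds 1/3 of disk ({disk_gb} GB)")
    if cpu_cores < CORES_PER_FOREST * len(forests):
        findings.append(f"{cpu_cores} cores is below 2 per forest for {len(forests)} forests")
    if sustained_io_mb_s < IO_MB_S_PER_FOREST * len(forests):
        findings.append(f"{sustained_io_mb_s} MB/sec is below 20 MB/sec per forest")
    for f in forests:
        if f.size_gb > MAX_FOREST_GB:
            findings.append(f"{f.name}: {f.size_gb} GB exceeds 200 GB")
        if f.fragments > MAX_FRAGMENTS:
            findings.append(f"{f.name}: {f.fragments} fragments exceeds 64 million")
    return findings
```

Running the check against the destination host before a move, with the new forest included in the list, shows whether the move would push the host past any of these guidelines.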