Knowledgebase: MarkLogic Server
Moving Forests Across Storage Devices
27 April 2016 02:14 PM

Update:

Since this article was originally written, MarkLogic Server has added the Forest Rebalancing and Forest Retiring features. To move forests with zero downtime, please refer to the documentation for these features: http://docs.marklogic.com/guide/admin/database-rebalancing

The legacy article follows:

Summary

There are many reasons why you may need to move a forest from one storage device to another. For example:

  • Transition from shared storage to dedicated storage (or vice versa);
  • Replace a small storage device with a larger one;
  • Reorganize forest placement.

No matter what the reason, the action of moving forests should be well planned and deliberate, while the procedure should be well tested.  This article lists both the steps that should be followed as well as issues to be considered when planning a move.

We will present two different techniques for moving a forest. The first is appropriate for databases that can be restricted from updates for the duration of the forest move. The second is appropriate for production databases where database downtime needs to be minimized.

Simple Procedure to Move a Forest

The simple procedure to move a forest can be used on any forest whose database can be restricted from updates for the duration of the process.  This will typically be for test, development and staging systems, but may also include production environments that can be disabled for extended maintenance windows.

To retain data integrity, this procedure requires that the associated database is restricted from updates.    The update restriction can be enforced in a variety of ways:

  • By setting all forests in the database to “read-only”;
  • By disabling all application servers that reference the database (you will also need to verify that there are no tasks in the task queue that can update the database);
  • By restricting access at the application level;
  • By restricting access procedurally; this is a common approach in test, development and staging environments.

The following steps can be used to move a forest: 

Step 1: Begin enforcement of update restriction;

Step 2: Create a backup of the forest you would like to move;

Step 3: Create a new forest, specifying the new location for the forest;

Step 4: Restore the forest data from step 2 to the newly created forest;

Step 5: Verify that the forest data is restored successfully;

Step 6: Switch forests attached to the database;

a. Detach the original forest from the database;

b. Attach the new forest to the database;

WARNING: When moving a forest in the Security database, this step must occur in a single transaction (i.e. detach the original Security forest and attach the new Security forest in the same transaction). MarkLogic Server must have an operational Security database to function properly.

Step 7: Remove update restriction (from step 1);

Step 8: (Optional) Remove/delete the original forest.
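
One way to think about the switch in step 6 is as a single replacement of the database's forest list rather than two separate changes. The following is a minimal Python sketch of that idea only. It assumes the attached forests are tracked as a plain list of names; the function `swap_forest` and the forest names are hypothetical, and the sketch does not call MarkLogic. Applying the resulting list in a single transaction is still up to whichever administrative interface you use.

```python
from typing import List


def swap_forest(attached: List[str], original: str, replacement: str) -> List[str]:
    """Return a new forest list with `original` replaced by `replacement`.

    The input list is not modified, so the caller can submit the result
    as one configuration change instead of a detach followed by an attach.
    """
    if original not in attached:
        raise ValueError(f"{original!r} is not attached to the database")
    if replacement in attached:
        raise ValueError(f"{replacement!r} is already attached to the database")
    # Preserve the position of the forest in the list.
    return [replacement if name == original else name for name in attached]


# Hypothetical forest names, for illustration only.
print(swap_forest(["content-1", "content-2"], "content-1", "content-1-new"))
```

Computing the full target list up front also makes it easy to review the change before it is applied.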

Moving a Forest Minimizing Downtime

If the forest to be moved resides on a production system whose content databases are continually being updated, and if you cannot afford the database to be restricted from updates for the duration of a backup and a restore, then you can use the local disk failover feature to synchronize your forests before switching them.  This approach will minimize the required downtime of the database.

The following steps can be used to move a forest while minimizing downtime: 

Step 1: Create a new forest, specifying the new location for the forest.

Step 2: (Optional) Seed the new forest from backup, as described in steps 3 and 4. Although we will be using the local disk failover feature to synchronize forest content, seeding the new forest from a recent backup will result in faster synchronization and will use fewer resources (i.e. it is less disruptive to the production system).

Step 3: If you do not have a recent forest backup of the forest you would like to move, create one.

Step 4: Perform a forest level restore to the newly created forest.

Step 5: Configure the new forest as a forest replica of the original forest.

Step 6: Wait until the new forest is in the “sync replicating” state. You can use the Admin UI Forest status page to check for sync replicating.

Step 7: Switch forests. This step requires that the database is OFFLINE for a short period of time.

a. Detach the original forest from the database;

b. Remove the forest replica configuration created in step 5;

c. Attach the new forest to the database;

WARNING: When moving a forest in the Security database, this step must occur in a single transaction (i.e. detach the original Security forest and attach the new Security forest in the same transaction). MarkLogic Server must have an operational Security database to function properly.

Step 8: (Optional) Remove/delete the original forest.
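
If you prefer to script the wait for the “sync replicating” state instead of watching the Admin UI, a polling loop is one option. The Python sketch below is illustrative only: `get_state` is a hypothetical callable that you would implement to return the forest's current state as a string, and the sketch itself makes no MarkLogic calls. The state value "not ready" in the demonstration is a placeholder, not a real MarkLogic state name.

```python
import time
from typing import Callable


def wait_for_state(
    get_state: Callable[[], str],
    target: str = "sync replicating",
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_state() until it returns `target`.

    Returns True once the target state is seen, or False if the
    accumulated wait reaches `timeout_seconds` first.
    """
    waited = 0.0
    while True:
        if get_state() == target:
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated status source for illustration; replace with a real lookup.
_states = iter(["not ready", "not ready", "sync replicating"])
ready = wait_for_state(lambda: next(_states), poll_seconds=0.0)
print(ready)
```

Only proceed to switching forests once the function returns True; a False result means the timeout was reached and the forest status should be checked manually.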

Retaining Forest Name

Both forest move procedures presented require the new forest to have a different name than the original because forest names must be unique within a MarkLogic Server cluster and both procedures have the original and new forests existing in the system at the same time. Although rare, some applications have forest name dependencies (i.e. applications that perform in-forest query evaluations or in-forest placement of document inserts). If this is the case, you will either need to update your application, or change the method used to move the forest (since MarkLogic Server does not provide a mechanism to change the name of a forest).  

  • You can modify the “Simple Forest Move” procedure by performing the forest delete after step 2 (creating a successful forest backup) and before step 3 (creating a new forest). This way, in step 3, you can create the new forest with the same name as the forest that was deleted.
  • To retain the forest name while minimizing database downtime, you can perform the “Moving a Forest Minimizing Downtime” procedure twice – the first time to a temporary forest and the second time to the final destination. 

Forest Replicas and Failover Hosts

If the original forest has ‘forest replicas’ or ‘failover hosts’ configured, you will need to detach these configurations before you can delete the original forest.

If you would like the new forest to be configured with ‘forest replicas’ or ‘failover hosts’, you must first detach these configurations from the original forest before reattaching them to the new forest.

Estimate Time

The majority of the time will be spent transferring content from the original forest to the new forest. You can estimate the amount of time this will take from:

  • The size of the forest on disk (forest-size in MB);
  • The I/O read rate available for the device where the original forest resides (read-rate in MB/second); and
  • The I/O write rate available for the device where the new forest resides (write-rate in MB/second).

Estimated time (seconds) = (forest-size / read-rate) + (forest-size / write-rate)
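
As a worked example of the formula, the Python sketch below computes the estimate for a hypothetical 100 GB (102,400 MB) forest. The read and write rates are made-up figures chosen for illustration, not measurements or recommendations.

```python
def estimate_move_seconds(
    forest_size_mb: float, read_rate_mb_s: float, write_rate_mb_s: float
) -> float:
    """Estimated transfer time in seconds: size/read-rate + size/write-rate."""
    if read_rate_mb_s <= 0 or write_rate_mb_s <= 0:
        raise ValueError("I/O rates must be positive")
    return forest_size_mb / read_rate_mb_s + forest_size_mb / write_rate_mb_s


# Hypothetical inputs: 102,400 MB forest, 100 MB/s reads, 50 MB/s writes.
seconds = estimate_move_seconds(102400, 100, 50)
print(f"{seconds:.0f} seconds ({seconds / 3600:.2f} hours)")
```

With these inputs the read takes 1,024 seconds and the write takes 2,048 seconds, for a total of 3,072 seconds. Treat the result as a rough planning figure, since it ignores any other load on the devices.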

Sizing Rules and Recommendations

When determining the resources allocated to forest data, it is recommended that you stay within the following guidelines:

[MarkLogic Recommendation] The I/O subsystem should have capacity for sustained I/O at 20 MB/sec per content forest in each direction (i.e., 20 MB/sec reads and 20 MB/sec writes at the same time).

[MarkLogic Recommendation] The size of all forest data on a server should not exceed 1/3 of the available disk space. The other 2/3 should be available for forest merges and reindexing; otherwise you risk merge or reindex failures.

     (The 3x disk space requirement was always true for MarkLogic 6 and earlier releases. However, beginning in MarkLogic 7, the 3x disk space requirement can be reduced if configured and managed.)

[MarkLogic Rule of thumb] Provision at least 2 CPU cores per active forest. This facilitates concurrent operations.

[MarkLogic Rule of thumb] Forests should not grow beyond 200 GB or 64 million fragments. These thresholds do not guarantee a particular level of performance and may need to be lowered depending on the application.
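
The guidelines above can be turned into a quick pre-move sanity check. The Python sketch below is one possible way to do that. The `HostPlan` structure and its field names are hypothetical, all content forests are assumed to be active, and the numbers you feed in must come from your own measurements. It applies the 3x disk space guideline as written and does not account for the MarkLogic 7 reduction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostPlan:
    """Hypothetical description of one host after the planned move."""
    content_forests: int
    total_forest_data_gb: float
    disk_space_gb: float
    cpu_cores: int
    sustained_read_mb_s: float
    sustained_write_mb_s: float
    largest_forest_gb: float
    largest_forest_fragments: int


def sizing_warnings(plan: HostPlan) -> List[str]:
    """Return one message per guideline that the plan does not meet."""
    warnings = []
    needed_io = 20 * plan.content_forests  # 20 MB/sec per forest, each direction
    if plan.sustained_read_mb_s < needed_io or plan.sustained_write_mb_s < needed_io:
        warnings.append(f"I/O below {needed_io} MB/sec in each direction")
    if plan.total_forest_data_gb > plan.disk_space_gb / 3:
        warnings.append("forest data exceeds 1/3 of disk space")
    if plan.cpu_cores < 2 * plan.content_forests:
        warnings.append("fewer than 2 CPU cores per forest")
    if plan.largest_forest_gb > 200 or plan.largest_forest_fragments > 64_000_000:
        warnings.append("a forest exceeds 200 GB or 64 million fragments")
    return warnings


# Hypothetical plan: 4 forests holding 600 GB on a 1,500 GB device.
plan = HostPlan(4, 600, 1500, 8, 100, 100, 150, 40_000_000)
for message in sizing_warnings(plan):
    print(message)
```

In this example only the disk space guideline is flagged, because 600 GB is more than one third of 1,500 GB.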

Additional Related Knowledgebase articles

Knowledgebase Article: Understand the Logs during rebalancer and reindex activity

Knowledgebase Article: Data Balancing in MarkLogic 

Knowledgebase Article: Rebalancing, replication and forest reordering 

Knowledgebase Article: Diagnosing Rebalancer issues after adding or removing a forest 

 

 
