Transferring data between MarkLogic Server clusters
24 July 2017 12:45 PM
There are a number of options for transferring data between MarkLogic Server clusters. The best option for your particular circumstances will depend on your use case.
Database Backup and Restore
To transfer data between two independent clusters, you can use a database backup and restore procedure, taking advantage of MarkLogic Server's facility to make a consistent backup of a database.
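As a rough illustration, a backup on the source cluster and a restore on the target cluster can be requested through the Management REST API. The sketch below only builds the requests; it assumes the Manage App Server is on its default port 8002 and that the `backup-database` and `restore-database` operations with a `backup-dir` property are available in your release. The host names, database name, and directory are hypothetical.

```python
import json
from urllib.parse import quote


def build_database_request(host, database, operation, backup_dir, port=8002):
    """Build the URL and JSON body for a Management API database operation.

    Assumption: `operation` is "backup-database" or "restore-database",
    and the Manage App Server listens on `port` (8002 by default).
    """
    url = f"http://{host}:{port}/manage/v2/databases/{quote(database)}"
    body = json.dumps({"operation": operation, "backup-dir": backup_dir})
    return url, body


def build_backup_request(host, database, backup_dir, port=8002):
    """Request a backup of `database` into `backup_dir` on the source cluster."""
    return build_database_request(host, database, "backup-database", backup_dir, port)


def build_restore_request(host, database, backup_dir, port=8002):
    """Request a restore of `database` from `backup_dir` on the target cluster."""
    return build_database_request(host, database, "restore-database", backup_dir, port)
```

Each request would then be sent as an HTTP POST with a `Content-Type: application/json` header and admin credentials. The backup directory must be readable from the target cluster before the restore request is sent.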
Note: the backup directory path that you use must exist on all hosts that serve any forests in the database. The directory you specify can be an operating system mounted directory path, an HDFS path, or an S3 path. Further information on using HDFS and S3 storage with MarkLogic is available in our documentation.
Further information regarding backup and restore may be found in our documentation and Knowledgebase.
Database Replication
Database Replication is another method you can use to transfer content between environments. It allows you to maintain copies of forests on databases in multiple MarkLogic Server clusters. Once the replica database in the replica cluster is fully synchronized with its master, you may break replication between the two and then use the replica cluster and database as the master.
Note: to enable Database Replication, a license key that includes Database Replication is required. You also need to ensure that all hosts are running the same maintenance release of MarkLogic Server and the same type of operating system, and that Database Replication is correctly configured.
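The version and operating system prerequisites above can be checked before replication is configured. The following is a minimal sketch, assuming you have already collected each host's MarkLogic version and operating system by some means; the `HostInfo` type, the function name, and the sample values are hypothetical and not part of any MarkLogic API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostInfo:
    """Hypothetical record of what was collected from one host."""
    name: str
    version: str  # MarkLogic release string, for example "9.0-2"
    os_type: str  # operating system type, for example "linux"


def replication_mismatches(hosts):
    """Return a list of problems that would violate the prerequisites.

    An empty list means every host reports the same release and the
    same operating system type.
    """
    problems = []
    versions = sorted({h.version for h in hosts})
    os_types = sorted({h.os_type for h in hosts})
    if len(versions) > 1:
        problems.append(f"mixed MarkLogic releases: {versions}")
    if len(os_types) > 1:
        problems.append(f"mixed operating systems: {os_types}")
    return problems
```

Pass the hosts of both the master and the replica cluster in a single list, so that a mismatch between the two clusters is reported as well as a mismatch within one of them.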
Also note that for optimum efficiency, indexing information is not replicated over the network between the Master and Replica databases and is instead regenerated by the Replica database. Further information on this is available in our Knowledgebase.
Further details on Database Replication and how it can be configured may be found in our documentation.
MarkLogic Content Pump (mlcp)
Depending on your specific requirements, you may also use the MarkLogic Content Pump (mlcp), a command line tool for getting data out of and into a MarkLogic Server database. Using mlcp, you can export documents and metadata from a database, import documents and metadata to a database, or copy documents and metadata from one database to another.
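For a direct cluster-to-cluster transfer, the mlcp `copy` command reads from one database and writes to another. The sketch below assembles such a command line as an argument list, assuming the `mlcp.sh` launcher is on the path and both App Servers listen on port 8000. The host names, credentials, and function name are hypothetical.

```python
def mlcp_copy_args(src_host, dst_host, src_user, src_password,
                   dst_user, dst_password, port=8000, mlcp="mlcp.sh"):
    """Build an argument list for an mlcp copy between two clusters.

    Assumption: both clusters expose an XDBC-capable App Server on `port`.
    """
    return [
        mlcp, "copy",
        "-mode", "local",
        # Source cluster connection
        "-input_host", src_host,
        "-input_port", str(port),
        "-input_username", src_user,
        "-input_password", src_password,
        # Target cluster connection
        "-output_host", dst_host,
        "-output_port", str(port),
        "-output_username", dst_user,
        "-output_password", dst_password,
    ]
```

The list can be passed to `subprocess.run(args, check=True)` on a machine that has mlcp installed. Building a list rather than a single string avoids shell quoting problems with passwords.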
If required, you may use mlcp to extract a consistent database snapshot, forcing all documents to be read from the database at a consistent point in time.
Note: the version of mlcp you use should be the same as the most recent version of MarkLogic Server that will be used in the transfer.
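One way to request a point-in-time read is the `-snapshot true` option of the mlcp `export` command. The sketch below builds such a command line as an argument list; it assumes the `mlcp.sh` launcher is on the path and that an archive export is wanted so that metadata travels with the documents. The host, credentials, output path, and function name are hypothetical.

```python
def mlcp_snapshot_export_args(host, username, password, output_path,
                              port=8000, mlcp="mlcp.sh"):
    """Build an argument list for an mlcp export read at a single timestamp."""
    return [
        mlcp, "export",
        "-host", host,
        "-port", str(port),
        "-username", username,
        "-password", password,
        # Archive output keeps document metadata alongside the content
        "-output_type", "archive",
        "-output_file_path", output_path,
        # Read every document at the same point in time
        "-snapshot", "true",
    ]
```

The resulting archive can later be loaded into the target cluster with an mlcp `import` that specifies `-input_file_type archive`.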
Also note that mlcp should not be run on a host that is currently running MarkLogic Server, as the Server assumes it has the entire machine available to it, including the CPU and disk I/O capacity.
Further information regarding mlcp is available in our documentation.