Knowledgebase: Administration
Disaster Recovery
26 November 2019 12:01 PM

Introduction

MarkLogic Server provides a variety of  disaster recovery (DR) facilities including full backup, incremental backup, and journal archiving that when combined with other ML features can create a complete disaster recovery strategy. This paper shows some examples of how these features can be combined. It is not comprehensive nor does it reflect features offered only in the latest releases.

Details

This article will cover three perspectives. First, a quick overview of the metrics used by businesses to measure the quality of their Disaster Recovery strategies will be covered. Next, an overview of how to combine the features that MarkLogic offers in various categories will be given.

More?: High Availability and Disaster Recovery features ,  High Availability & Disaster Recovery datasheetScalability, Availability, and Failover Guide 

Disaster Recovery Criteria

In order to configure MarkLogic Server to perform well in Disaster Recovery situations, we should first define what parameters we will use to measure each possible approach. For most situations, these four measures are used: 

Long Term Retention Policy (LTR): Long Term Retention Policy can be driven by any number of business, regulatory and other criteria. It is included here because MarkLogic's backup files are often a key part of an LTR strategy. 

Recovery Point Objective (RPO)The requirement for how up-to-date the database has to be post-recovery with respect to its state immediately before the incident that required recover.

Recovery Time Objective (RTO)The requirement for the time elapsed between the incident and the recovery to the RPO.

CostThe storage cost, the computational resource cost and  the operations cost of the overall deployment strategy.

Flexible Replication Features

Flexible replication can be used to support LTR objectives but is generally not useful for Disaster Recovery

More? Flexible Replication Guide

Platform Support Features

Flash backup provides a way to leverage backup features of your deployment platform while maintaining transaction integrity. Platform specific solutions can often achieve RPO and RTO targets that would be impossible through other means.

More? Flash Backup

High Availability Features

Forest replication provides recovery from host failures.

More? Scalability, Availability, and Failover Guide

Disaster Recovery Features

Database Replication

Database Replication is the process of maintaining copies of forests on databases in multiple MarkLogic Server clusters.

More? Understanding Database Replication

Backups

Of all your backup options, full backups restore the quickest, but take the most time to backup and possibly the most storage space. Each full backup is a backup set in that it contains everything you need to restore to the point of the backup.

Full backups with journal archiving allow restores to a point after the backup, but the journal archive grows in an unbounded way with the number of transactions, and replaying the journals to get to your recovery point takes time proportional to the number of transactions in the journal archive, so over time, this becomes less efficient.

With full + incremental backups, a backup set is a full backup, plus the incremental backups taken after that full backup. Incremental backups are quick to backup, but take longer to restore, and over time the backup set gets larger and larger, so it may end up consuming more backup space than a full backup alone (depending on your backup retention policy).

Full + incremental backups with journal archiving have the same characteristics as incremental backups, except that you can roll forward from the most recent incremental. With this strategy, the journal archive doesn't grow in an unbounded way because the archive is purged when you take the next incremental backup. Note that if your RPO is between incremental backups, you must also enable a merge timestamp by setting the merge timestamp to a negative value (see below).

More?: Administrator’s Guide to Backing Up and Restoring a Database  How does "point-in-time" recovery work with Journal Archiving? 

Forest Merge Configurations

Forest merges recover the disk space occupied by deleted documents. A negative merge timestamp delays that permanent deletion. If we want incremental backups to contain all the fragments that were deleted since the last incremental backup then we want to set the delay to a period greater than the incremental backup period. This requires more disk space for the incremental backups and also requires additional space in the live database, but provides the most flexibility.

Setting retain-until-backup on a given database (thru the Admin UI or thru an API call) has a similar effect by telling the server to keep the deleted fragments until a full backup or an incremental backup completes. Many clients choose to use both the negative merge timestamp and retain until backup options together.

More?: admin:database-set-merge-timestamp  admin:database-set-retain-until-backup

Other Features

The need for a negative merge timestamp can be understood by remembering that forest merges recover the disk space occupied by deleted documents. A negative merge timestamp delays that permanent deletion. If we want incremental backups to contain all the fragments that were deleted since the last incremental backup then we want to set the delay to a period greater than the incremental backup period. This requires more disk space for the incremental backups and also requires additional space in the live database, but provides the most flexibility.

Setting retain-until-backup on a given database (thru the Admin UI or thru an API call) has a similar effect by telling the server to  keep the deleted fragments until a full backup or an incremental backup. Many clients choose to use both the negative merge timestamp and retain until backup options together.

More?: admin:database-set-merge-timestamp,  admin:database-set-retain-until-backup 

Conclusion

Planning to meet a Long Term Retention (LTR) policy, a Recovery Point Objective (RPO) and a Recovery Time Objective (RTO) and a Cost goal is a key part of developing an overall MarkLogic deployment plan. MarkLogic offers a wealth of tools that can complement each other when they are properly coordinated. As is clear from this article, the choices are many, broad, and interrelated.

(4 vote(s))
Helpful
Not helpful

Comments (0)