Knowledgebase:
MarkLogic Backup/Restore FAQ
17 August 2022 02:12 PM
What are Backup/Restore best practices?

Please refer to our MarkLogic Support FAQ for more details.

Should we back up default databases?

Please refer to our MarkLogic Support FAQ for more details.

Should I be backing up my local disk failover forests?

Please refer to our Local Disk Failover FAQ for more details.

In terms of disaster recovery (DR), how do I choose between backup/restore and replication?

Please refer to our Database Replication FAQ for more details.

How many copies of data do we have if we enable failover, Backup/Restore, and Database Replication?

Your primary cluster has its data forests (1st copy) and likely local disk failover forests (2nd) for high availability. Your replica cluster likely has its own data forests (3rd) and local disk failover forests (4th) for more up-to-date disaster recovery copies. You can also take backups from either environment (now 5 copies) for a less up-to-date DR copy.

Analyze these options and set up the ones you need. You do not have to set up all of them, or keep multiple replica forests or backup copies.

On which environment should I take a backup? Primary or Replica cluster? 

In general, it is best to take backups from whichever environment, primary or replica, can best accommodate the backup load. You are unlikely to need identical or near-identical backups from both.

 

What does a MarkLogic Database Backup contain?

By default, MarkLogic database backups are self-contained and include the following:

  • The configuration files.
  • The Security database, including all of its forests.
  • The Schemas database, including all of its forests.
  • The Triggers database, including all of its forests.
  • All of the forests of the database you are backing up.

Documentation:

White Paper:

What are the important points to note before performing Backups/Restore?

Refer to the "Notes about Backup and Restore Operations" section in our documentation.

Documentation:

Will there be any interruption in running queries/updates while backup runs?

Most of the time, when a backup is running, all queries and updates proceed as usual. MarkLogic simply copies stand data from the source directory to the backup target directory, file by file. Stands are read-only except for the small Timestamps file, so this bulk copy can proceed without needing to interrupt any requests. Only at the very end of the backup does MarkLogic have to halt incoming requests briefly to write out a fully consistent view for the backup, flushing everything from memory to disk.

Documentation:

White Paper:

What is Flash Backup?

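A flash backup is a file-level backup of the forest data. To take one, you quiesce all forests in a given database by placing them in flash-backup mode for long enough to make the file-level copy. While in this mode the forests can still serve queries but do not accept updates.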

White Paper:

KB Article:

What are the advantages of using MarkLogic backup over other options/methods?


  • Our Backup and Restore APIs use a timestamp to guarantee that a backup is consistent as of that timestamp. While the backup runs, the on-disk stands being backed up are kept until the backup has completed, and new updates can continue to take place (advancing the database forest timestamps). It is therefore generally recommended as the safest strategy if you want to be able to restore after a crash.
  • Our Backup and Restore APIs also force a checkpoint with the forest Journal files and any in-memory transactions just before the backup starts, meaning that all transactions up to the point at which the backup started are guaranteed to be in the backup set.
  • You can use backup methods other than those MarkLogic provides, but you must make sure that no updates happen while the backup is taken. Forests should be completely quiesced first. You do not need to stop MarkLogic Server to do this, but you do need to (at the very least) place the forests into flash-backup mode, which allows queries to run but does not allow any transactions to make changes while the backup task runs.
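
As a rough illustration of driving a MarkLogic-managed backup from a script, the sketch below only builds the URL and JSON body of a backup request to the Management REST API; nothing is sent over the network. The endpoint path, port 8002, the `backup-database` operation name, and the property names are assumptions written from memory of that API and should be checked against the documentation for your release. The host, database name, and directory are hypothetical.

```python
import json


def build_backup_request(host, database, backup_dir, journal_archiving=False):
    """Return (url, body) for a full database backup request.

    Assumption: the Management API accepts POST /manage/v2/databases/{name}
    with an "operation" of "backup-database". Verify before relying on it.
    """
    url = f"http://{host}:8002/manage/v2/databases/{database}"
    payload = {
        "operation": "backup-database",
        "backup-dir": backup_dir,
        "journal-archiving": journal_archiving,
    }
    return url, json.dumps(payload, sort_keys=True)


# Hypothetical host, database, and directory.
url, body = build_backup_request("localhost", "Documents", "/space/backup")
print(url)
print(body)
```

Sending the request would additionally need an HTTP client with the authentication scheme configured on your Manage app server.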

KB Article:

Can we restore backups across feature releases of MarkLogic? 

Yes, you can restore a backup taken on an older version onto a newer version, but not vice versa.
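
A script that automates restores could guard against the unsupported direction before starting. The helper below is a minimal sketch, not a MarkLogic API: the function names are hypothetical, and it assumes version strings of the form `10.0-9.5`. It compares the full version rather than only the feature release, which is a deliberately conservative simplification.

```python
def parse_version(text):
    """Turn a string such as '10.0-9.5' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.replace("-", ".").split("."))


def can_restore(backup_version, server_version):
    """True when the backup is not newer than the server restoring it."""
    return parse_version(backup_version) <= parse_version(server_version)


print(can_restore("9.0-13", "10.0-9.5"))   # older backup, newer server
print(can_restore("10.0-9.5", "9.0-13"))   # newer backup, older server
```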

KB Articles:

Can we restore backups across different OS platforms?

No, MarkLogic backup files are platform specific and should only be restored onto the same platform. This is true for both database and forest backups.

Documentation:

KB Article:

What is the role of Journals in relation to Backup and Restore?

Refer to the Knowledgebase article for details.

How does "point-in-time" recovery work with Journal Archiving?

Refer to the documentation and Knowledgebase article for details.

Do the journal archive files from a backup become invalid with the next backup?

New journal archives are started when the next full backup is done. During the period of time that the new full backup is running, we archive journals to both the old and new location until we're sure the new full backup will complete successfully.

Documentation:

Do the archive files normally get deleted with a subsequent backup?

They are typically deleted when the corresponding full backup is deleted.

Documentation:

How much free space is needed for the Journal Archive files in a Backup?

The journal archive can be larger (for example, 6x). Its size depends entirely on how much data you are ingesting and how much time passes between backups.

KB Article:

Can you explain resource consumption during Backup/Restore?

Full backup/restore operations are resource (I/O, CPU and memory) intensive and should be scheduled during off-hours, when possible.

Documentation:

Is it possible to restore to a target database with a different number of forests than the source database?

Yes, use the "Forest topology changed" option while restoring.

Documentation:

KB Article:

What is the recommended way to backup/restore multiple databases?

Refer to our Knowledgebase article for more details.

How do I configure database backup rotation?

You can configure the maximum number of full backups to keep by setting the "max backups" parameter (this does not apply to incremental backups). Once the specified maximum has been reached, the next backup deletes the oldest backup. Specify 0 to keep an unlimited number of backups. You can set this value in the Admin UI or through the APIs.
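
The rotation rule can be modelled in a few lines. This is only a sketch of the behaviour described above, not MarkLogic code: `rotate_backups` is a hypothetical helper, backups are represented by plain labels, and the exact moment at which the server deletes the oldest backup is not modelled.

```python
def rotate_backups(existing, new_backup, max_backups):
    """Return the full backups kept after adding new_backup, oldest first.

    max_backups == 0 means keep an unlimited number of backups.
    """
    kept = list(existing) + [new_backup]
    if max_backups > 0:
        kept = kept[-max_backups:]  # drop the oldest entries beyond the limit
    return kept


backups = []
for label in ["mon", "tue", "wed", "thu"]:
    backups = rotate_backups(backups, label, max_backups=3)
print(backups)  # ['tue', 'wed', 'thu']
```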

Documentation:

What are the best practices for spacing incremental backups?

Incremental backups are more resource-intensive than full backups because they need to query the data to find the changes between backups. Monitor your system closely to ensure that the overhead of running many incremental backups is not affecting system performance, and that a subsequent backup does not start before the previous one has completed. Frequent incremental backups are not recommended; the general recommendation is to space them at least 6 hours apart.
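
One way to keep an eye on both conditions is to check the recorded start and end times of recent incremental runs. The sketch below is a hypothetical monitoring helper, not a MarkLogic API; it assumes you already have the run times (for example, from your own job logs) as `datetime` pairs sorted by start time.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(hours=6)  # spacing recommended in the text above


def check_schedule(runs, min_gap=MIN_GAP):
    """runs: list of (start, end) datetimes sorted by start. Return warnings."""
    warnings = []
    for (prev_start, prev_end), (start, _end) in zip(runs, runs[1:]):
        if start < prev_end:
            warnings.append(
                f"{start:%H:%M} run started before the {prev_start:%H:%M} run finished"
            )
        elif start - prev_start < min_gap:
            warnings.append(
                f"{start:%H:%M} run started less than {min_gap} after {prev_start:%H:%M}"
            )
    return warnings


day = datetime(2022, 8, 17)
runs = [
    (day.replace(hour=0), day.replace(hour=1)),
    (day.replace(hour=6), day.replace(hour=13)),             # long-running
    (day.replace(hour=12), day.replace(hour=12, minute=30)),  # overlaps previous
    (day.replace(hour=15), day.replace(hour=16)),            # only 3h later
]
for warning in check_schedule(runs):
    print(warning)
```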

KB Article:

Can you explain the directory structure for Incremental backups?

If an incremental backup directory is specified, after the first incremental backup is done, the full backup can be archived to another location. The subsequent incremental backups do not need to examine the full backup.

Once you restore an incremental backup, you can no longer use the previous full backup location for ongoing incremental backups. After the restore, you need to make a fresh full backup and use the full backup location for ongoing incremental backups. This means that after restore of an incremental backup, scheduled backups need to be updated to use the fresh full backup location.

Documentation: 

Why do Incremental backups take more time than Full backups?

Incremental backups are expected to use more CPU and RAM because they run queries to determine what data has changed and needs to be backed up. Full backups simply back up all available forest data and are more likely to be I/O constrained. If the system is memory or CPU constrained while the incremental backup is running (for example, because other processes or queries are running), the incremental task takes lower priority and could take longer to run than a full backup. Please also note that incremental backups are designed to minimize storage, not time.

Note that incremental backups can be fast when not much data has changed since the last incremental backup, or when the system is otherwise idle. However, most of the time incremental backups are given lower priority so that they consume the least amount of resources, which ultimately results in longer run times.

Why use incremental backup when using journal archiving? Is this a recommended combination?

Incremental backups are more compact than archived journals and are faster to restore.

Incremental backup improves both restore time and also space requirements over journal archiving, but it's not an either/or decision - you can use both where appropriate.

Restoring from an incremental backup taken on a different cluster fails. What do I need to check?

Every incremental backup stores a reference to the location of the previous incremental backup, and the very first one stores a reference to the location of the full backup. These references are stored in a file named BackupTag.txt. The restore job reads the backup locations from this file, and if they still point to the old location, the incremental restore fails.

 

KB Article:

Why is MarkLogic Server backup slower than a file copy?

Refer to our Knowledgebase article for more details.

Can you explain how Backup/Restore with encryption works?
  • If any forest in the backup has encryption enabled, then the entire backup will be encrypted.
  • As long as the current database being restored is encrypted, the restored database will also be encrypted.
  • By default, the MarkLogic embedded KMS is automatically included in a backup. If you set the backup option to exclude it, turning off the automatic inclusion of the keystore, you are responsible for saving the keystore (the embedded KMS) to a secure location.

Documentation:

How can I monitor MarkLogic Backup?
  • Check the Database Status page in the Admin UI.
  • Use the MarkLogic APIs.
 

KB Article:
