Knowledgebase:
Database backup, restore and local disk failover
27 March 2014 04:27 PM

Summary

This article discusses what happens when you back up or restore a database after a local disk failover event on one of the database's forests.

Introduction

MarkLogic Server provides high availability in the event of a data node failure. Data node failures can include operating system crashes, MarkLogic Server restarts, power failures, or persistent system failures such as hardware failures. With forest-level failover enabled and configured, a machine that hosts a forest can go down and the MarkLogic Server cluster automatically recovers from the outage and continues to process queries without any immediate action needed by an administrator.

In MarkLogic Server, if a forest becomes unavailable, the entire database to which the forest is attached becomes unavailable for further query operations. Without failover, such a failure requires manual intervention by an administrator, either to reconfigure the forest on another host or to remove the forest from the cluster configuration. With failover, you can configure the forest to switch automatically to a replica forest on a different host. MarkLogic Server failover provides high availability and maintains data and transactional integrity in the event of a data node failure.

The failover scenarios are well documented on our developer web site.

Local Disk Failover

You can configure a forest on another host to serve as a replica forest, which takes over when the primary forest's host goes offline. Local-disk failover allows you to create one or more replica forests for each primary forest. Replica forests contain exactly the same data as the primary forest and are kept transactionally consistent.

It is helpful to use the following terms to refer to the forest configurations and states:

  • Configured Master is the forest that was originally configured as the primary forest.
  • Configured Replica is a forest on another host that is configured as a replica forest of the primary.
  • Acting Master is the forest that is serving as the master forest, regardless of the configuration.
  • Acting Replica is the forest that is serving as the replica forest, regardless of the configuration.
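
The distinction between configured and acting roles can be sketched as a small data model. This is only an illustration of the terminology above, not a MarkLogic API: the `Role` and `ForestState` names are hypothetical, and the forest names `test_P` and `test_R` are borrowed from the log excerpt later in this article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    MASTER = "master"
    REPLICA = "replica"


@dataclass(frozen=True)
class ForestState:
    """Hypothetical record of one forest's configured and acting roles."""
    name: str
    configured: Role
    acting: Optional[Role]  # None means the forest is currently offline

    @property
    def failed_over(self) -> bool:
        # A forest is failed over when it is online and its acting role
        # differs from the role it was configured with.
        return self.acting is not None and self.acting is not self.configured


# Normal operation: each forest acts in its configured role.
normal = [
    ForestState("test_P", Role.MASTER, Role.MASTER),
    ForestState("test_R", Role.REPLICA, Role.REPLICA),
]

# After the primary's host goes down: the configured replica acts as master.
after_failover = [
    ForestState("test_P", Role.MASTER, None),
    ForestState("test_R", Role.REPLICA, Role.MASTER),
]

print([f.name for f in after_failover if f.failed_over])
```

In this model a forest counts as failed over whenever it is online in a role other than its configured one, which covers both a Configured Replica acting as master and a Configured Master acting as replica.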

Database Backup when a forest is failed over

If you attempt to back up or restore a database while one of its forests is failed over to the replica (i.e. the Configured Replica is serving as Acting Master), the operation may fail with XDMP-FORESTNOTOPEN or XDMP-HOSTDOWN errors.

When a database backup takes place, by default everything associated with the database is backed up. You can also choose to back up individual forests, in which case only the forests selected when configuring the backup are backed up.

Replica forests are backed up only when the 'Include replica forests' option is enabled. If you have not configured the backup to include replica forests, a replica forest will not be backed up even if it is the acting master. If the Configured Master is also unavailable, then neither forest will be backed up. In this circumstance, you may see a message in the error logs similar to "Warning: Not backing up database test because first forest master is not available, and replica backups aren't enabled."
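
The rules in the preceding paragraph can be summarised for a single master/replica pair as follows. This is a simplified sketch of the behaviour described in this article, not the server's actual implementation; the function name and parameters are hypothetical.

```python
from typing import List


def forests_backed_up(master_available: bool,
                      replica_available: bool,
                      include_replicas: bool) -> List[str]:
    """Return which forests of one master/replica pair a backup would cover.

    Simplified model (assumption): the configured master is backed up
    whenever it is available, and the configured replica is backed up only
    when replica backups are enabled, even if it is the acting master.
    """
    backed_up = []
    if master_available:
        backed_up.append("configured master")
    if include_replicas and replica_available:
        backed_up.append("configured replica")
    return backed_up


# Master host down, replica acting as master, replica backups disabled:
# nothing is backed up, which corresponds to the warning quoted above.
print(forests_backed_up(master_available=False,
                        replica_available=True,
                        include_replicas=False))
```

With `include_replicas=True` in the same scenario, the model returns only the configured replica.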

Restore when a forest is failed over

Restores will fail if executed while a forest is failed over (i.e. the Configured Replica is serving as Acting Master). In this circumstance, you may see a message in the error logs similar to "Operation failed with error message. Check server logs." or "XDMP-HOSTDOWN".

How to detect if a forest is failed over

In the Admin UI:

  1. Click the Forests icon in the left tree menu.
  2. Click the Summary tab.
  3. Check the state of the configured replica. If it is in the open state, the Configured Replica is serving as Acting Master.

At the time of the failover event, you may see messages in the Error Log similar to:
2013-10-03 12:49:53.873 Info: Disconnecting from domestic host rh6v-intel64-9.marklogic.com in cluster 16599165797432706248 because it has not responded for 30 seconds.
2013-10-03 12:49:53.873 Info: Disconnected from host rh6v-intel64-9.marklogic.com
2013-10-03 12:49:53.873 Info: Unmounted forest test_P
2013-10-03 12:49:53.875 Info: Forest test_R assuming the role of master with new precise time 13808297938747190
2013-10-03 12:49:53.875 Debug: Recovering undo on forest test_R
2013-10-03 12:49:53.875 Debug: Recovered undo at endTimestamp 13807844927734200 minQueryTimestamp 0 on forest test_R
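
As a rough aid, log lines like those above can be scanned for the "assuming the role of master" message. The sketch below assumes the message wording matches the sample shown here, which may differ between server versions; the function name is hypothetical. A match shows only that a failover occurred at that time, not that the forest is still failed over, so confirm the current state in the Admin UI.

```python
import re

# Pattern based on the sample log line in this article (assumed wording).
FAILOVER_RE = re.compile(r"Forest (\S+) assuming the role of master")


def forests_that_took_over(log_lines):
    """Return names of forests that logged a takeover, in first-seen order."""
    names = []
    for line in log_lines:
        match = FAILOVER_RE.search(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


sample = [
    "2013-10-03 12:49:53.873 Info: Disconnected from host rh6v-intel64-9.marklogic.com",
    "2013-10-03 12:49:53.873 Info: Unmounted forest test_P",
    "2013-10-03 12:49:53.875 Info: Forest test_R assuming the role of master with new precise time 13808297938747190",
    "2013-10-03 12:49:53.875 Debug: Recovering undo on forest test_R",
]

print(forests_that_took_over(sample))
```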

Reverting from the failover state

When the Configured Master is the Acting Replica, the forest is considered to be in the "failover state". To revert, you must either restart the acting master forest or restart the host on which the acting master forest is locally mounted. After the restart, the Configured Master automatically resumes the master role if its host is online. To check the status of the forests, see the Forests Summary tab in the Admin Interface.


Conclusion 

For backup and restore to work correctly, clusters configured with local disk failover must have no forests in a failed-over state. If a cluster is configured with local disk failover and some of its forests are failed over to their local disk replicas, the conditions that caused the failover must be resolved and the cluster must be returned to its original forest configuration before backup and restore operations can resume.
