Removing hosts from a MarkLogic cluster while minimizing downtime
26 June 2017 11:13 AM
This article describes a procedure for taking hosts out of a MarkLogic cluster with minimal unavailability. It assumes that high availability is configured using local disk failover and that every master forest has at least one replica forest configured.
When a host in a MarkLogic cluster becomes unavailable, the host is not fully disconnected from the cluster until the configured host timeout (default is 30 seconds) expires. If a master forest resides on that host, the database and any application that references it will be unavailable from the time the host becomes unavailable until all replica forests assume the role of acting master.
If the host unavailability is planned, you can take steps to minimize the database and application unavailability. This article describes such a procedure.
When a host is removed from the MarkLogic cluster, all the remaining hosts must assume the workload previously performed by that host. For this reason, we recommend
If removing more than one host at a time:
Important Note: Schemas and Security databases must also be configured for high availability. Unavailability of Schemas and Security databases can impact availability of other databases, and also availability of administrative functions.
Step 0: Verify all replica forests are synchronized
Before initiating this procedure, verify that all replica forests are in sync with their master forests by checking that the forest status of each replica shows the “sync replicating” state.
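The check in Step 0 can be expressed as a simple gate. The following is a minimal sketch, not a MarkLogic API: it assumes the replica states have already been collected (for example from the Admin UI or a status query) into a plain mapping of forest name to state string. The forest names and the function name `replicas_out_of_sync` are hypothetical.

```python
# Minimal sketch: decide whether it is safe to start the procedure.
# Assumption: replica states were gathered elsewhere into a dict of
# {replica forest name: state string}. All names here are illustrative.

SYNC_STATE = "sync replicating"

def replicas_out_of_sync(replica_states):
    """Return the sorted names of replicas not in the 'sync replicating' state."""
    return sorted(
        name for name, state in replica_states.items() if state != SYNC_STATE
    )

# Hypothetical snapshot of replica forest states.
snapshot = {
    "Documents-1-replica": "sync replicating",
    "Documents-2-replica": "async replicating",
    "Security-replica": "sync replicating",
}

lagging = replicas_out_of_sync(snapshot)
if lagging:
    print("Do not start yet; replicas still catching up:", lagging)
else:
    print("All replicas are in sync; safe to proceed.")
```

Any replica reported by the function should be given time to catch up before Step 1 is attempted.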
Step 1: Force failover from master forests to replica forests
Disable all master forests on the hosts in the maintenance group together. This minimizes the database unavailability time, because the failovers from master forests to replica forests can happen in parallel.
Step 2: Verify failover succeeded
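Before disabling anything, it can help to compute which master forests live on the maintenance group. The sketch below also flags master forests whose replica sits on a host that is leaving as well, since such a replica cannot take over. It assumes one replica per master and a hand-built inventory; `ForestPair`, `plan_failover`, and the host and forest names are hypothetical, not part of any MarkLogic API.

```python
from typing import NamedTuple

class ForestPair(NamedTuple):
    """Hypothetical inventory record: one master forest and its one replica."""
    master: str
    master_host: str
    replica: str
    replica_host: str

def plan_failover(pairs, maintenance_hosts):
    """Return (to_disable, unsafe) lists of master forest names.

    to_disable: masters on maintenance hosts whose replica stays online.
    unsafe:     masters on maintenance hosts whose replica is also leaving.
    """
    maintenance = set(maintenance_hosts)
    to_disable, unsafe = [], []
    for pair in pairs:
        if pair.master_host not in maintenance:
            continue  # master is unaffected by this maintenance window
        if pair.replica_host in maintenance:
            unsafe.append(pair.master)
        else:
            to_disable.append(pair.master)
    return to_disable, unsafe

# Illustrative three-host layout with replicas placed on the next host.
pairs = [
    ForestPair("Docs-1", "host-a", "Docs-1-replica", "host-b"),
    ForestPair("Docs-2", "host-b", "Docs-2-replica", "host-c"),
    ForestPair("Docs-3", "host-c", "Docs-3-replica", "host-a"),
]

to_disable, unsafe = plan_failover(pairs, ["host-a", "host-b"])
print("Disable together:", to_disable)
print("No surviving replica:", unsafe)
```

In this layout, removing host-a and host-b at the same time leaves Docs-1 without a surviving replica, which suggests taking those hosts down in separate maintenance windows.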
Wait until all of the replica forests take over: the configured replica forests are now the acting master forests and in the “open” state, while the configured master forests are now disabled. You can manually monitor forest status in the Admin UI by refreshing the Forest status display. Once all forests have assumed their new roles, the database will be online.
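Instead of refreshing the status display by hand, the wait can be automated with a polling loop. This is a minimal sketch under stated assumptions: `get_states` is a hypothetical callable, supplied by the reader, that returns a mapping of forest name to state string. The timeout and interval values are arbitrary defaults, not MarkLogic recommendations.

```python
import time

def wait_for_failover(get_states, replicas, timeout_s=120.0, interval_s=2.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until every listed replica forest reports the 'open' state.

    get_states is a caller-supplied (hypothetical) function returning
    {forest name: state}. Returns True on success, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        states = get_states()
        pending = [name for name in replicas if states.get(name) != "open"]
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)

# Simulated run: the replica is still replicating on the first poll
# and has taken over as acting master on the second.
canned = iter([
    {"Docs-1-replica": "sync replicating"},
    {"Docs-1-replica": "open"},
])
ok = wait_for_failover(lambda: next(canned), ["Docs-1-replica"],
                       sleep=lambda seconds: None)
print("Failover complete" if ok else "Timed out waiting for failover")
```

The `sleep` and `clock` parameters are injectable only so the loop can be exercised without real delays, as the simulated run shows.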
Step 3: Shut down hosts and perform maintenance
Shut down the MarkLogic Server instance on all hosts in the maintenance group.
Verify the rest of the cluster is still responsive before taking down the hosts themselves.
When maintenance is complete, bring the hosts back online.
Step 4: Enable configured master forests
Once all hosts are back online, enable all forests disabled in Step 1. Once enabled, the configured master forests will assume the role of acting replica forests and will initiate a process to synchronize the master/replica pairs.
Step 5: Verify forests synchronized
Before forcing the configured master forests to assume the role of acting master, verify that all acting replica forests are in sync with their acting master forests by checking that the forest status of each acting replica forest shows the “sync replicating” state.
Step 6: Force configured master forests to resume acting master forest role
To force the configured master forests to assume the role of acting master forests, restart the configured replica / acting master forests together. Restarting all forests together will help minimize outage impact.
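Steps 5 and 6 combine naturally into one decision: restart an acting master only when its configured master has caught up. The sketch below illustrates that gate on plain data and does not perform any restart. The record layout, the function name `forests_to_restart`, and the forest names are hypothetical.

```python
def forests_to_restart(pairs):
    """Return (ready, blocked) lists of configured replica forest names.

    pairs: list of dicts with keys 'master', 'master_state',
           'replica', 'replica_state' (hypothetical record layout).
    ready:   replicas acting as master whose configured master is in sync.
    blocked: replicas acting as master whose configured master is not yet in sync.
    """
    ready, blocked = [], []
    for pair in pairs:
        if pair["replica_state"] != "open":
            continue  # replica is not the acting master; nothing to flip back
        if pair["master_state"] == "sync replicating":
            ready.append(pair["replica"])
        else:
            blocked.append(pair["replica"])
    return ready, blocked

# Illustrative state after Step 4: one pair is in sync, one is still catching up.
pairs_after_maintenance = [
    {"master": "Docs-1", "master_state": "sync replicating",
     "replica": "Docs-1-replica", "replica_state": "open"},
    {"master": "Docs-2", "master_state": "async replicating",
     "replica": "Docs-2-replica", "replica_state": "open"},
]

ready, blocked = forests_to_restart(pairs_after_maintenance)
if blocked:
    print("Wait before restarting; still synchronizing:", blocked)
else:
    print("Restart together:", ready)
```

Holding the restart until the blocked list is empty keeps the restart a single batch, in line with the advice to restart all forests together.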
Scripting Failover: "flipping" replica forests back to their masters using XQuery