Performing Maintenance on a MarkLogic Cluster While Minimizing Downtime
24 February 2021 07:29 AM

Summary

This procedure helps with maintenance activities that require the MarkLogic service to be shut down for a period of time, or the operating system to be rebooted, while minimizing unavailability. It assumes that High Availability (HA) is configured using local-disk failover and that every primary forest has a replica forest configured.

NOTE: The Security and App-Services databases must also be configured for HA.

When a host in a MarkLogic cluster becomes unavailable, it is not fully disconnected from the cluster until the configured host timeout (30 seconds by default) expires. If a primary forest resides on that host, the database, and any application that references it, will be unavailable from the moment the host becomes unavailable until the replica forests assume the role of acting primary.

If the host unavailability is planned, you can take steps to minimize database and application unavailability. This article describes such a procedure.

Planning

When a host from the MarkLogic cluster is taken offline, all the remaining hosts must assume the workload previously performed by that host. For this reason, we recommend:

  • Scheduling server maintenance during low usage periods.
  • Evenly distributing each host's replica forests across the other nodes in the cluster, so that the extra workload is shared when that host is unavailable.
  • Minimizing the number of hosts taken offline for maintenance at any one time.

If performing maintenance on more than one host at a time:

  • Define a maintenance group of hosts containing primary forests that have their local disk replica forests on hosts not in the maintenance group.
  • All required forests must have replica forests defined. This includes all content forests, Security database forests, and the forests of all linked schema databases.

Size maintenance groups so that the hosts that remain available provide enough compute, memory, and I/O capacity to absorb the extra workload during the maintenance period.

Step 0: Verify all replica forests are synchronized

Before starting this procedure, verify that every replica forest is in sync with its primary forest by checking that the forest status of each replica is “sync replicating”.

You can check this using the MarkLogic Server administrative function xdmp:forest-status or the Management API GET /manage/v2/forests/{id|name}?view=status endpoint.
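The check in Step 0 can be scripted against the Management API endpoint named above. The sketch below assumes the Management API listens on port 8002, uses the hypothetical variables ML_MANAGE_HOST, ML_USER and ML_PASS for connection details, and treats the literal string "sync replicating" anywhere in the JSON status view as a match. That is a plain text match rather than a JSON parse, so treat it as a heuristic and use a JSON tool if you need to inspect a specific field.

```shell
#!/bin/sh
# Sketch only. Assumptions: Management API on port 8002 of ML_MANAGE_HOST;
# ML_USER / ML_PASS are hypothetical variables holding credentials.
ML_MANAGE_HOST="${ML_MANAGE_HOST:-localhost}"
ML_USER="${ML_USER:-user}"
ML_PASS="${ML_PASS:-password}"

# Fetch the status view of one forest (by name or ID) as JSON.
fetch_forest_status() {
  curl --silent --show-error --fail --anyauth --user "$ML_USER:$ML_PASS" \
    "http://$ML_MANAGE_HOST:8002/manage/v2/forests/$1?view=status&format=json"
}

# Succeed if the status document on stdin contains "sync replicating".
# Heuristic text match, not a JSON parse.
status_is_sync_replicating() {
  grep -q '"sync replicating"'
}

# Check every replica forest named on the command line. Returns 1 if any
# forest is not "sync replicating" or its status could not be fetched.
check_replicas_synced() {
  rc=0
  for forest in "$@"; do
    if fetch_forest_status "$forest" | status_is_sync_replicating; then
      echo "$forest: sync replicating"
    else
      echo "$forest: NOT sync replicating" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage would look like `check_replicas_synced Documents-replica Security-replica || exit 1`, where the forest names are placeholders for your own configured replica forests. Stopping on a non-zero result keeps you from taking a host offline while one of its replicas is still catching up.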

Step 1: Shut down the host via the REST API, forcing an immediate failover

Send a POST request to the /manage/v2/hosts/{id|name} endpoint with state set to shutdown and failover set to true:

curl --anyauth --user user:password -X POST -i \
  --data "state=shutdown&failover=true" \
  -H "Content-type: application/x-www-form-urlencoded" \
  "http://localhost:8002/manage/v2/hosts/my-host?format=JSON"

Using this endpoint with the failover parameter tells the cluster to use fast failover, which immediately fails the primary forests managed by that host over to their replicas instead of waiting for the host timeout (30 seconds by default) to expire.
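If you shut down several hosts in a maintenance group, it can help to wrap the Step 1 request in a function that fails loudly. This is a sketch, assuming the Management API is reached on port 8002 of ML_MANAGE_HOST and that credentials come from the hypothetical ML_USER and ML_PASS variables; the request body is the same state=shutdown&failover=true used above.

```shell
#!/bin/sh
# Sketch only. Assumptions: Management API on port 8002 of ML_MANAGE_HOST;
# ML_USER / ML_PASS are hypothetical variables holding credentials.
ML_MANAGE_HOST="${ML_MANAGE_HOST:-localhost}"
ML_USER="${ML_USER:-user}"
ML_PASS="${ML_PASS:-password}"

# Build the Management API URL for the host being shut down.
shutdown_url() {
  printf 'http://%s:8002/manage/v2/hosts/%s?format=json\n' "$ML_MANAGE_HOST" "$1"
}

# Shut down one host with fast failover. --fail makes curl exit non-zero on
# an HTTP error status, so a calling script can stop before the next host.
shutdown_with_failover() {
  if [ -z "$1" ]; then
    echo "usage: shutdown_with_failover HOST" >&2
    return 2
  fi
  curl --silent --show-error --fail --anyauth --user "$ML_USER:$ML_PASS" \
    -X POST \
    -H "Content-type: application/x-www-form-urlencoded" \
    --data "state=shutdown&failover=true" \
    "$(shutdown_url "$1")"
}
```

For example, `shutdown_with_failover my-host || exit 1` shuts down my-host (a placeholder name) and aborts the script if the request is rejected.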

Step 2: Verify failover succeeded

Wait until all of the replica forests have taken over: the configured replica forests are now the acting primary forests and are in the “open” state, while the configured primary forests are disabled. You can monitor forest status manually in the Admin UI by refreshing the Forest Status page. Once all forests have assumed their new roles, the database is online again.

This step can also be achieved using the methods identified in Step 0.
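Because failover takes a short time to complete, Step 2 is easier to automate as a polling loop with a timeout than as a single check. The sketch below polls a configured replica forest until its status view contains the requested state string (for example "open"). Same assumptions as before: port 8002, the hypothetical ML_MANAGE_HOST, ML_USER and ML_PASS variables, and a plain text match on the JSON rather than a real parse. It polls only the configured replicas, since the forests on the shut-down host may not report status at all.

```shell
#!/bin/sh
# Sketch only. Assumptions: Management API on port 8002 of ML_MANAGE_HOST;
# ML_USER / ML_PASS are hypothetical variables holding credentials.
ML_MANAGE_HOST="${ML_MANAGE_HOST:-localhost}"
ML_USER="${ML_USER:-user}"
ML_PASS="${ML_PASS:-password}"

# Fetch the status view of one forest (by name or ID) as JSON.
fetch_forest_status() {
  curl --silent --show-error --fail --anyauth --user "$ML_USER:$ML_PASS" \
    "http://$ML_MANAGE_HOST:8002/manage/v2/forests/$1?view=status&format=json"
}

# wait_for_forest_state FOREST STATE [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls until the status JSON contains "STATE" (heuristic text match) or
# the timeout elapses. Returns 0 on success, 1 on timeout.
wait_for_forest_state() {
  forest="$1"; want="$2"; timeout="${3:-120}"; interval="${4:-5}"
  elapsed=0
  while :; do
    if fetch_forest_status "$forest" | grep -q "\"$want\""; then
      echo "$forest: $want"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "$forest: still not '$want' after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

A usage example, with placeholder forest names: `for f in Documents-replica Security-replica; do wait_for_forest_state "$f" open 120 5 || exit 1; done`.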

Step 3: Verify forests are synchronized

Once maintenance is complete and all hosts are back online, some of the configured replica forests may still be acting as primaries. Verify that all acting replicas are in sync with the acting primary forests by checking the forest status and confirming that the acting replicas are in the “sync replicating” state.

This step can also be achieved using the methods identified in Step 0.

Step 4: Force configured primary forests to resume acting primary forest role

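To make the configured primary forests resume the acting primary role, restart the configured replica forests that are currently acting as primaries. When an acting primary forest restarts, its synchronized replica (the configured primary) takes over the primary role. Restart all of these forests together to minimize the impact of the outage.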

This step can also be achieved using the MarkLogic Server administrative function xdmp:forest-restart or the Management API POST /manage/v2/forests/{id|name} endpoint.
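One way to restart the acting primary forests together through the Management API is to issue all the restart requests concurrently and then wait for each one. This is a sketch, assuming the POST /manage/v2/forests/{id|name} endpoint accepts a state=restart form parameter, that the API is on port 8002, and that credentials come from the hypothetical ML_USER and ML_PASS variables. Concurrent requests only approximate a simultaneous restart.

```shell
#!/bin/sh
# Sketch only. Assumptions: Management API on port 8002 of ML_MANAGE_HOST;
# ML_USER / ML_PASS are hypothetical variables; the endpoint accepts
# state=restart as a form parameter.
ML_MANAGE_HOST="${ML_MANAGE_HOST:-localhost}"
ML_USER="${ML_USER:-user}"
ML_PASS="${ML_PASS:-password}"

# Ask the Management API to restart one forest (by name or ID).
restart_forest() {
  curl --silent --show-error --fail --anyauth --user "$ML_USER:$ML_PASS" \
    -X POST \
    -H "Content-type: application/x-www-form-urlencoded" \
    --data "state=restart" \
    "http://$ML_MANAGE_HOST:8002/manage/v2/forests/$1"
}

# Start every restart request in the background, then wait for each job.
# Returns 1 if any individual request failed.
restart_forests_together() {
  pids=""
  for forest in "$@"; do
    restart_forest "$forest" &
    pids="$pids $!"
  done
  rc=0
  for pid in $pids; do
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

For example, `restart_forests_together Documents-replica Security-replica` would restart those two placeholder forests (configured replicas currently acting as primaries). Afterwards, the Step 0 check should show them back in the “sync replicating” state.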
