Performing Maintenance on a MarkLogic cluster minimizing downtime
24 February 2021 07:29 AM
Summary

This procedure assists with maintenance activities that require the MarkLogic service to be shut down for a period of time, or that require an OS reboot, while minimizing unavailability. It assumes that High Availability (HA) is configured using local disk failover and that every primary forest has a replica forest configured.

NOTE: The Security and App-Services databases must also be configured for HA.

When a host in a MarkLogic cluster becomes unavailable, the host is not fully disconnected from the cluster until the configured host timeout (default 30 seconds) expires. If a primary forest resides on that host, the database and any application that references it will be unavailable from the time the host becomes unavailable until all replica forests assume the role of acting primary.

If the host unavailability is planned, you can take steps to minimize the database and application unavailability. This article describes such a procedure.

Planning

When a host in the MarkLogic cluster is taken offline, the remaining hosts must assume the workload previously performed by that host. For this reason, we recommend:
If performing maintenance on more than one host at a time:
Maintenance groups should be sized so that the remaining available hosts represent a reasonable portion of the cluster's compute, memory and IO resources and can absorb the extra workload during the maintenance period.

Step 0: Verify all replica forests are synchronized

Before initiating this procedure, verify that all replica forests are in sync with their primary forests by checking that the forest status of each replica is "sync replicating". This can be done using the MarkLogic Server administrative function xdmp:forest-status or the Management API GET /manage/v2/forests/{id|name}?view=status endpoint.

Step 1: Shut down the host via REST API, forcing an immediate failover

Make a call to the /manage/v2/hosts/{id|name} (POST) endpoint, setting failover to true.

curl --anyauth --user user:password -X POST -i --data "state=shutdown&failover=true" http://{host}:{port}/manage/v2/hosts/{id|name}

Step 2: Verify failover succeeded

Wait until all of the replica forests take over: the configured replica forests are now the acting primary forests and in the "open" state, while the configured primary forests are now disabled. You can monitor forest status manually in the Admin UI by refreshing the Forest status display. Once all forests have assumed their new roles, the database will be online. This step can also be done using the methods identified in Step 0.

Step 3: Verify forests are synchronized

Once maintenance has been completed and all hosts are back online, some of the replica forests may still be acting as primaries. Verify that all acting replicas are in sync with the acting primary forests by checking the forest status and confirming that the acting replicas are in the "sync replicating" state. This step can also be done using the methods identified in Step 0.

Step 4: Force configured primary forests to resume acting primary forest role

To force the configured primary forests to resume the role of acting primary forests, restart the configured replica forests (the current acting primaries) together. Restarting all of these forests together helps minimize outage impact. This step can also be done using the MarkLogic Server administrative function xdmp:forest-restart or the Management API POST /manage/v2/forests/{id|name} endpoint.
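The REST calls named in Steps 0, 1 and 4 can be sketched as a few shell helpers. This is a minimal sketch under stated assumptions, not a definitive implementation. The host localhost, port 8002 (the default Manage App Server port), the user:password credentials, the format=json parameter, the state=restart payload for the forest endpoint, and all function and variable names are illustrative assumptions. Check them against the Management API documentation for your MarkLogic version. DRY_RUN defaults to 1, so the helpers only print the curl command they would run.

```shell
#!/bin/sh
# Sketch only. All values below are placeholders chosen for illustration.
ML_HOST="${ML_HOST:-localhost}"      # a host that stays online (assumed)
ML_PORT="${ML_PORT:-8002}"           # Manage App Server port (assumed default)
ML_AUTH="${ML_AUTH:-user:password}"  # credentials placeholder
DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands instead of running them

# Build a Management API URL from a path relative to /manage/v2/.
manage_url() {
  printf 'http://%s:%s/manage/v2/%s' "$ML_HOST" "$ML_PORT" "$1"
}

# Print the command (without shell quoting, for inspection only) or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Steps 0, 2 and 3: fetch the status view of one forest.
forest_status() {
  run curl --anyauth --user "$ML_AUTH" \
    "$(manage_url "forests/$1?view=status&format=json")"
}

# Step 1: shut down one host and force an immediate failover.
shutdown_host_with_failover() {
  run curl --anyauth --user "$ML_AUTH" -X POST -i \
    --data "state=shutdown&failover=true" "$(manage_url "hosts/$1")"
}

# Step 4: restart one forest (assumes the endpoint accepts state=restart).
restart_forest() {
  run curl --anyauth --user "$ML_AUTH" -X POST -i \
    --data "state=restart" "$(manage_url "forests/$1")"
}

# Generic polling helper: re-run a command until its output equals EXPECTED.
# Usage: wait_until EXPECTED INTERVAL_SECONDS MAX_TRIES COMMAND [ARGS...]
wait_until() {
  expected=$1; interval=$2; tries=$3; shift 3
  while [ "$tries" -gt 0 ]; do
    [ "$("$@")" = "$expected" ] && return 0
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1
}
```

With the defaults, running shutdown_host_with_failover followed by a host name prints the Step 1 command for review. Setting DRY_RUN=0 sends the request. The wait_until helper expects a command that prints only the forest state (for example "open" or "sync replicating"). Extracting that value from the status response is left to the reader, since the response layout is not described in this article.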