Removing hosts from a MarkLogic cluster while minimizing downtime
26 June 2017 11:13 AM

This is a procedure for taking hosts out of a MarkLogic cluster with minimal unavailability. It assumes that high availability is configured using local-disk failover and that all master forests have at least one replica forest configured.

When a host in a MarkLogic cluster becomes unavailable, the host is not fully disconnected from the cluster until the configured host timeout (default 30 seconds) expires. If a master forest resides on that host, the database and any application that references it will be unavailable from the time the host becomes unavailable until all of its replica forests assume the role of acting master.

If the host unavailability is planned, you can take steps to minimize the database and application unavailability. This article describes such a procedure.

Planning

When a host is removed from the MarkLogic cluster, all the remaining hosts must assume the workload previously performed by that host. For this reason, we recommend:

  • Scheduling server maintenance during low-usage periods.
  • Evenly distributing a host's replica forests across the other nodes in the cluster, so that the extra workload is spread evenly when that host is unavailable.
  • Minimizing the number of hosts removed for maintenance at any one time.

If removing more than one host at a time:

  • Define a maintenance group of hosts whose configured master forests have their local-disk replica forests on hosts outside the maintenance group (see the sketch after this list).
  • All required forests must have replica forests defined. This includes all content forests, the Security database forests, and the forests of any linked Schemas databases.

Important Note: The Schemas and Security databases must also be configured for high availability. Unavailability of the Schemas or Security database can impact the availability of other databases, as well as the availability of administrative functions.

  • Maintenance groups should be sized so that the remaining hosts retain enough compute, memory, and I/O capacity to absorb the extra workload during the maintenance period.
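
As a rough illustration, the XQuery sketch below lists any master forest in a maintenance group whose replicas are missing or also sit on maintenance-group hosts. The host names are hypothetical placeholders; substitute the hosts you plan to remove.

  xquery version "1.0-ml";
  import module namespace admin = "http://marklogic.com/xdmp/admin"
      at "/MarkLogic/admin.xqy";

  let $config := admin:get-configuration()
  (: Hypothetical maintenance group - replace with the hosts you plan to remove :)
  let $maint-hosts := for $h in ("host1.example.com", "host2.example.com")
                      return xdmp:host($h)
  for $forest in admin:get-forest-ids($config)
  let $replicas := admin:forest-get-replicas($config, $forest)
  where admin:forest-get-host($config, $forest) = $maint-hosts
  return
    if (fn:empty($replicas))
    then fn:concat("No replica configured: ", xdmp:forest-name($forest))
    else if (every $r in $replicas
             satisfies admin:forest-get-host($config, $r) = $maint-hosts)
    then fn:concat("Replicas all inside the maintenance group: ", xdmp:forest-name($forest))
    else ()

An empty result means every master forest in the group has at least one replica on a host outside the group.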

Step 0: Verify all replica forests are synchronized

Before initiating this procedure, verify that all replica forests are in sync with their master forests by checking that the forest status of each replica is “sync replicating”.

This step can also be scripted using the MarkLogic Server administrative function xdmp:forest-status or the Management API endpoint GET /manage/v2/forests/{id|name}?view=status.
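
A minimal XQuery sketch of this check, assuming a database named "Documents" (a placeholder; run it once for each database that matters, including Security and Schemas):

  xquery version "1.0-ml";
  import module namespace admin = "http://marklogic.com/xdmp/admin"
      at "/MarkLogic/admin.xqy";

  let $config := admin:get-configuration()
  (: "Documents" is a placeholder database name :)
  for $master in xdmp:database-forests(xdmp:database("Documents"))
  for $replica in admin:forest-get-replicas($config, $master)
  let $state := fn:string(xdmp:forest-status($replica)/*:state)
  where $state ne "sync replicating"
  return fn:concat(xdmp:forest-name($replica), " is ", $state)

An empty result means all replicas are synchronized.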

Step 1: Force failover from master forests to replica forests

Disable all master forests in the maintenance group together. This minimizes database unavailability because the failovers from the master forests to their replica forests happen in parallel.

This step can also be scripted using the MarkLogic Server administrative function admin:forest-set-enabled or the Management API endpoint POST /manage/v2/forests/{id|name}.
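
One way to sketch this in XQuery, assuming hypothetical forest names "forest-a" and "forest-b" for the master forests in the maintenance group:

  xquery version "1.0-ml";
  import module namespace admin = "http://marklogic.com/xdmp/admin"
      at "/MarkLogic/admin.xqy";

  (: Apply admin:forest-set-enabled to each forest, threading the configuration :)
  declare function local:set-enabled($config, $forests, $enabled as xs:boolean)
  {
    if (fn:empty($forests)) then $config
    else local:set-enabled(
           admin:forest-set-enabled($config, $forests[1], $enabled),
           fn:subsequence($forests, 2), $enabled)
  };

  (: Hypothetical forest names - the master forests in the maintenance group :)
  let $masters := for $name in ("forest-a", "forest-b") return xdmp:forest($name)
  return admin:save-configuration(
           local:set-enabled(admin:get-configuration(), $masters, fn:false()))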

Step 2: Verify failover succeeded

Wait until all of the replica forests take over: the configured replica forests are now the acting master forests and in the “open” state, while the configured master forests are disabled. You can monitor forest status manually in the Admin UI by refreshing the Forest status display. Once all forests have assumed their new roles, the database will be online.

This step can also be scripted using the MarkLogic Server administrative function xdmp:forest-status or the Management API endpoint GET /manage/v2/forests/{id|name}?view=status.
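
The same kind of XQuery check can confirm the new roles; again, "forest-a" and "forest-b" are hypothetical names for the configured master forests disabled in Step 1:

  xquery version "1.0-ml";
  import module namespace admin = "http://marklogic.com/xdmp/admin"
      at "/MarkLogic/admin.xqy";

  let $config := admin:get-configuration()
  (: Hypothetical forest names - the master forests disabled in Step 1 :)
  let $masters := for $name in ("forest-a", "forest-b") return xdmp:forest($name)
  for $master in $masters
  for $replica in admin:forest-get-replicas($config, $master)
  let $state := fn:string(xdmp:forest-status($replica)/*:state)
  where $state ne "open"
  return fn:concat(xdmp:forest-name($replica), " is ", $state)

An empty result means every configured replica has taken over as acting master.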

Step 3: Shutdown hosts and perform maintenance

Shut down the MarkLogic Server instance on all hosts in the maintenance group.

Verify that the rest of the cluster is still responsive before taking down the hosts themselves.

When maintenance is complete, bring the hosts back online.
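
If you prefer to stop MarkLogic from a script rather than from the operating system, one option is the built-in xdmp:shutdown function; the sketch below assumes it accepts a sequence of host IDs and a reason string, and the host names are placeholders:

  xquery version "1.0-ml";
  (: Hypothetical host names - the hosts in the maintenance group :)
  let $hosts := for $name in ("host1.example.com", "host2.example.com")
                return xdmp:host($name)
  return xdmp:shutdown($hosts, "Planned maintenance")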

Step 4: Enable configured master forests 

Once all hosts are back online, enable all forests that were disabled in Step 1. Once enabled, the configured master forests assume the role of acting replica forests and initiate a process to synchronize the master/replica pairs.

This step can also be scripted using the MarkLogic Server administrative function admin:forest-set-enabled or the Management API endpoint POST /manage/v2/forests/{id|name}.
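
The Step 1 sketch can be reused here with fn:true(); for a single forest (hypothetical name "forest-a") the call reduces to:

  xquery version "1.0-ml";
  import module namespace admin = "http://marklogic.com/xdmp/admin"
      at "/MarkLogic/admin.xqy";

  (: "forest-a" is a placeholder; repeat for each forest disabled in Step 1,
     or reuse the Step 1 helper with fn:true() :)
  let $config := admin:get-configuration()
  return admin:save-configuration(
           admin:forest-set-enabled($config, xdmp:forest("forest-a"), fn:true()))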

Step 5: Verify forests synchronized

Before forcing the configured master forests to assume the role of acting master, verify that all acting replica forests are in sync with their acting master forests by checking that the forest status of each acting replica is “sync replicating”.

This step can also be scripted using the MarkLogic Server administrative function xdmp:forest-status or the Management API endpoint GET /manage/v2/forests/{id|name}?view=status.
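
A minimal XQuery check, again using the hypothetical names "forest-a" and "forest-b" for the configured master forests re-enabled in Step 4 (they are currently the acting replicas):

  xquery version "1.0-ml";
  (: Hypothetical forest names - the configured master forests re-enabled in Step 4 :)
  let $masters := for $name in ("forest-a", "forest-b") return xdmp:forest($name)
  for $forest in $masters
  let $state := fn:string(xdmp:forest-status($forest)/*:state)
  where $state ne "sync replicating"
  return fn:concat(xdmp:forest-name($forest), " is ", $state)

An empty result means the pairs are synchronized and it is safe to proceed to Step 6.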

Step 6: Force configured master forests to resume acting master forest role

To force the configured master forests to resume the role of acting master, restart the configured replica forests (the current acting masters) together. Restarting all of the forests together helps minimize the outage impact.

This step can also be scripted using the MarkLogic Server administrative function xdmp:forest-restart or the Management API endpoint POST /manage/v2/forests/{id|name}.
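
A sketch of that restart in XQuery, looking up the replicas of the hypothetical configured master forests "forest-a" and "forest-b" and restarting them in one call:

  xquery version "1.0-ml";
  import module namespace admin = "http://marklogic.com/xdmp/admin"
      at "/MarkLogic/admin.xqy";

  let $config := admin:get-configuration()
  (: Hypothetical forest names - the configured master forests :)
  let $masters := for $name in ("forest-a", "forest-b") return xdmp:forest($name)
  let $replicas := for $m in $masters
                   return admin:forest-get-replicas($config, $m)
  return xdmp:forest-restart($replicas)

The Scripting Failover article linked below covers this flip in more detail.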

Further Reading

Scripting Failover: "flipping" replica forests back to their masters using XQuery

