Planning for a Rolling upgrade
12 July 2022 03:32 PM

Introduction

Rolling upgrades are used to upgrade a large cluster with many hosts to a newer version of MarkLogic Server without incurring any downtime in availability or interruption of transactions. 
This article acts as a supplement to our Rolling upgrade documentation. It discusses the preconditions and assumptions that our feature documentation makes for a successful no-downtime Rolling Upgrade. It also makes a few suggestions to plan the overall approach.

Assumptions:

There are a few basic assumptions that our Rolling upgrade documentation makes, and they include ALL of the following:

  1. As the feature documentation states, 'The security database and the schemas database must be on the same host, and that host should be the first host you upgrade when upgrading a cluster.'
  2. Fast failover works and completes in the order of seconds. Nodes are shut down using xdmp:shutdown with the failover flag set to true.
  3. The MarkLogic node taken down as part of the Rolling upgrade does not have any in-flight transactions.
  4. The load balancer has drained all requests for the node going down and does not send any more requests to that node.
  5. All failed transactions are retried automatically and should succeed as soon as fast failover is complete.
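
The retry assumption in item 5 usually has to be met by the client application. The following is a minimal sketch of one way to do that in Python, not a MarkLogic API: the helper name run_with_retry, the retry count, the delays, and the exception types treated as retryable are all assumptions made for illustration, and your client library may raise different errors or already provide its own retry support.

```python
import time


def run_with_retry(operation, attempts=5, base_delay=0.5,
                   retry_on=(ConnectionError, TimeoutError),
                   sleep=time.sleep):
    """Call operation() and retry it with exponential backoff.

    operation -- zero-argument callable that performs one transaction
    attempts  -- total number of tries before giving up (assumed value)
    retry_on  -- exception types treated as transient (assumed; adjust
                 to whatever your client library actually raises)
    sleep     -- injectable so the helper can be tested without waiting
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Wait 0.5s, 1s, 2s, ... so failover has time to complete.
            sleep(base_delay * 2 ** (attempt - 1))
```

The operation passed in should be safe to repeat, since a request that failed part-way through a node shutdown is run again from the start.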

Suggestions and approaches:

1. To avoid breaking open network connections when you take a node down for service during a Rolling upgrade, drain that node's requests at the load balancer before you shut the node down. Redirecting all new requests to the remaining available nodes keeps connections from being lost. Most modern load balancers (such as F5) can do this, but the steps may need to be triggered manually, so include them in your overall plan. That way, while the Rolling upgrade is underway, your load balancer keeps accepting incoming requests and routes them to healthy instances only.

2. When taking down a node, use xdmp:shutdown with the failover flag set to true for a faster failover. If you shut the node down through a REST Management API call instead, use the 'failover=true' URI parameter.

3. Take into account that remounting the forests (not just the security forests) takes time.

4. Distribute replica forests evenly across your cluster so that, when a node goes down, the forests that fail over add an even load to the remaining nodes. Make sure no single node ends up overloaded, as that slows the overall process. Keep a close watch on the logs for any sign of slowness.

5. Whenever possible, plan a maintenance window for the upgrades.
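
Suggestions 2 and 4 can be rehearsed before the upgrade with a small script. The Python sketch below is illustrative only. The function names, the (forest, primary host, replica host) tuple layout, and the host names are hypothetical. The Management API path /manage/v2/hosts/{name}, the port 8002, and the {"operation": "shutdown"} payload are assumptions that should be checked against the Management API documentation for your MarkLogic version; only the 'failover=true' URI parameter comes from this article. The request is built but not sent, and authentication is left out.

```python
import json
import urllib.request


def failover_load(hosts, forests, down_host):
    """Return forests per surviving host after down_host fails over.

    hosts   -- list of host names in the cluster
    forests -- list of (forest_name, primary_host, replica_host) tuples
    """
    load = {h: 0 for h in hosts if h != down_host}
    for name, primary, replica in forests:
        # A forest whose primary is going down is served by its replica;
        # every other forest stays on its current primary.
        owner = replica if primary == down_host else primary
        if owner not in load:
            raise ValueError(f"forest {name} has no surviving host")
        load[owner] += 1
    return load


def build_shutdown_request(base_url, host_name):
    """Build (but do not send) a host shutdown request with failover=true.

    The path and JSON payload are assumptions; verify them before use.
    """
    url = f"{base_url}/manage/v2/hosts/{host_name}?failover=true"
    body = json.dumps({"operation": "shutdown"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Running failover_load once for each host in turn shows whether any single shutdown would pile forests onto one node. A large gap between the highest and lowest counts suggests the replicas should be redistributed before the upgrade starts.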

Summary:

To plan a successful Rolling upgrade without downtime, keep in mind your whole stack, consider your whole approach, prepare your cluster in advance, and test the approach well.

References:

1. Rolling Upgrade Process
2. Important Points to Note Before Performing Rolling Upgrades
