MarkLogic Fundamentals - How should I scale out my cluster?
05 November 2020 02:50 PM


A MarkLogic cluster is a group of interconnected individual machines (often called “nodes” or “hosts”) that work together to perform computationally intensive tasks. Clustering offers scalability and high availability by avoiding single points of failure. This knowledgebase article contains tips and best practices for clustering, especially in the context of scaling out.

How many nodes should I have in a cluster?

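If you need high availability, there should be a minimum of three nodes in a cluster to satisfy quorum requirements. Quorum requires more than half of the nodes in the cluster to be online, so a three-node cluster still has quorum after losing one node, whereas a two-node cluster does not.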
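
The arithmetic behind the three-node minimum can be sketched as follows. This is an illustration only: it assumes quorum means strictly more than half of the configured hosts are online, and the function names `has_quorum` and `tolerated_failures` are hypothetical helpers, not part of any MarkLogic API.

```python
def has_quorum(total_hosts: int, online_hosts: int) -> bool:
    """Assumption: quorum means strictly more than half of the hosts are online."""
    return online_hosts * 2 > total_hosts


def tolerated_failures(total_hosts: int) -> int:
    """Largest number of hosts that can be offline while quorum is kept."""
    return (total_hosts - 1) // 2


if __name__ == "__main__":
    for size in (1, 2, 3, 4, 5):
        print(f"{size} host(s): can lose {tolerated_failures(size)} and keep quorum")
```

Under this assumption, one- and two-node clusters cannot lose any node, while three nodes is the smallest size that tolerates a single failure.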

Anything special about deploying on AWS?

Quorum requirements hold true even in a cloud environment where you have Availability Zones (AZs). In addition to defending against node failure, you can defend against AZ failure by splitting your d-nodes and e-nodes evenly across three Availability Zones.
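
To see why three zones matter, the sketch below spreads hosts round-robin across zones and checks whether the cluster would still have quorum after losing any single zone. It assumes quorum means strictly more than half of the hosts online. The host names, zone names, and both function names are hypothetical and used only for illustration.

```python
from collections import Counter


def spread_across_zones(hosts, zones):
    """Assign hosts to zones round-robin so the split is as even as possible."""
    return {host: zones[i % len(zones)] for i, host in enumerate(hosts)}


def survives_zone_loss(placement):
    """True if quorum (assumed: more than half online) holds after losing any one zone."""
    total = len(placement)
    hosts_per_zone = Counter(placement.values())
    return all((total - lost) * 2 > total for lost in hosts_per_zone.values())


if __name__ == "__main__":
    hosts = [f"host{i}" for i in range(1, 7)]  # hypothetical host names
    three_zones = spread_across_zones(hosts, ["az-a", "az-b", "az-c"])
    two_zones = spread_across_zones(hosts, ["az-a", "az-b"])
    print("three zones:", survives_zone_loss(three_zones))
    print("two zones:", survives_zone_loss(two_zones))
```

With six hosts in two zones, losing either zone leaves exactly half of the hosts, which is not a majority. The same six hosts in three zones keep four of six online.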

Load distribution after failover events

If a d-node experiences a failover event, the remaining d-nodes pick up its workload so that the data stored in its forests remains available.

Failover forest topology is an important factor in both high-availability and load-distribution within a cluster. Consider the example below of a 3-node cluster where each node has two data forests (dfs) and two local disk-failover forests (ldfs):

  • Case 1: If both replicas of node1's data forests (ldf1.1 and ldf1.2) are placed on node2, then when node1 fails, both df1.1 and df1.2 fail over to node2. The load on node2 doubles (from 100% to 200%), because node2 is now responsible for its own two forests (df2.1 and df2.2) as well as the two failover forests from node1 (ldf1.1 and ldf1.2).
  • Case 2: If the replica forests are instead set up so that when node1 goes down, df1.1 fails over to node2 and df1.2 fails over to node3, the load increase per node is smaller. Instead of one node going from 100% to 200% load, two nodes go from 100% to 150%. Node2 is now responsible for its two original forests (df2.1 and df2.2) plus one of node1's failover forests (ldf1.1), and node3 is responsible for its two original forests (df3.1 and df3.2) plus the other failover forest (ldf1.2).
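
The two cases can be checked with a short calculation. The sketch below assumes every forest carries an equal share of the load, which is a simplification. The dictionaries and the `load_after_failure` function are hypothetical names for this illustration, and the replica placements for node2's and node3's forests are assumed, since the example above only specifies where node1's forests fail over.

```python
# Master data forests and the node hosting each one (3 nodes, 2 forests each).
MASTERS = {
    "df1.1": "node1", "df1.2": "node1",
    "df2.1": "node2", "df2.2": "node2",
    "df3.1": "node3", "df3.2": "node3",
}

# Case 1: both replicas of a node's forests live on the same neighbour.
CASE_1 = {
    "df1.1": "node2", "df1.2": "node2",
    "df2.1": "node3", "df2.2": "node3",
    "df3.1": "node1", "df3.2": "node1",
}

# Case 2: the replicas of a node's forests are split across the other two nodes.
CASE_2 = {
    "df1.1": "node2", "df1.2": "node3",
    "df2.1": "node3", "df2.2": "node1",
    "df3.1": "node1", "df3.2": "node2",
}


def load_after_failure(masters, replicas, failed_node):
    """Return each surviving node's load as a percentage of its normal load.

    Assumption: every forest represents the same amount of work.
    """
    baseline = {}
    for node in masters.values():
        baseline[node] = baseline.get(node, 0) + 1

    active = {node: count for node, count in baseline.items() if node != failed_node}
    for forest, node in masters.items():
        if node == failed_node:
            active[replicas[forest]] += 1  # the replica's host takes over

    return {node: 100 * active[node] // baseline[node] for node in sorted(active)}


if __name__ == "__main__":
    print("Case 1:", load_after_failure(MASTERS, CASE_1, "node1"))
    print("Case 2:", load_after_failure(MASTERS, CASE_2, "node1"))
```

Under these assumptions, Case 1 leaves node2 at 200% and node3 at 100%, while Case 2 leaves both surviving nodes at 150%.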

Growing or scaling out your cluster

If you need to add capacity to your cluster, try to add nodes in "rings of three." Each ring of three can have its own independent failover topology, where nodes 1, 2, and 3 fail over to each other as described above, and nodes 4, 5, and 6 fail over to each other, separately from the original ring of three. This results in minimal configuration changes for any nodes already in your cluster when adding capacity.
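
A ring-of-three layout can be sketched as a simple planning function that maps each data forest to the node hosting its replica, keeping every replica inside the same ring. This is a planning aid under stated assumptions, not MarkLogic configuration code: the function name `ring_failover_plan`, the forest naming scheme, and the node names are all hypothetical.

```python
def ring_failover_plan(nodes, forests_per_node=2, ring_size=3):
    """Map each data forest to the node that hosts its replica.

    Nodes are grouped into consecutive rings; a node's forests are spread
    across the other members of its own ring only.
    """
    if len(nodes) % ring_size != 0:
        raise ValueError("node count must be a multiple of the ring size")

    plan = {}
    for start in range(0, len(nodes), ring_size):
        ring = nodes[start:start + ring_size]
        for position, node in enumerate(ring):
            # Other ring members, starting with the next node in the ring.
            others = ring[position + 1:] + ring[:position]
            for index in range(forests_per_node):
                forest = f"{node}-df{index + 1}"  # hypothetical forest name
                plan[forest] = others[index % len(others)]
    return plan


if __name__ == "__main__":
    before = ring_failover_plan(["node1", "node2", "node3"])
    after = ring_failover_plan([f"node{i}" for i in range(1, 7)])
    for forest, replica_host in after.items():
        print(f"{forest} -> replica on {replica_host}")
    # Existing assignments are untouched when the second ring is added.
    print(all(after[forest] == host for forest, host in before.items()))
```

Because each ring is planned independently, adding nodes 4, 5, and 6 leaves the replica assignments for nodes 1, 2, and 3 unchanged.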

Important related takeaways

  • In addition to the standard MarkLogic Server clustering requirements, you'll also want to pay special attention to the hardware specification of individual nodes
    • Although the hardware specification doesn’t have to be exactly the same across all nodes, it is highly recommended that all d-nodes be of the same specification because cluster performance will ultimately be limited by the slowest d-node in the system
    • You can read more about the effect of slow d-nodes in a cluster in the "Check the Slowest D-Node" section of our "Performance Testing With MarkLogic" whitepaper
  • Automatic fail-back after a failover event is not supported in MarkLogic due to the risk of unintentional overwrites, which could result in accidental data loss. Should a failover event occur, human intervention is typically required to fail back manually. You can read more about the considerations involved in failing a forest back in the following knowledgebase article: Should I flip failed over forests back to their respective masters? What are the risks if I leave them?


