MarkLogic Fundamentals - How should I scale out my cluster?
18 June 2020 06:00 PM
A MarkLogic cluster is a group of interconnected individual machines (often called “nodes” or “hosts”) that work together to perform computationally intensive tasks. Clustering offers scalability and high availability by avoiding single points of failure. This knowledgebase article contains tips and best practices around clustering, especially in the context of scaling out.
How many nodes should I have in a cluster?
If you need high availability, a cluster should have a minimum of three nodes to satisfy quorum requirements. With three nodes, any single node can fail and the remaining two still form a majority of the cluster.
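The arithmetic behind the three-node minimum can be sketched as follows. This is a minimal illustration, assuming quorum means a strict majority (more than half) of the hosts in the cluster; the function names are hypothetical and not part of any MarkLogic API.

```python
def quorum_size(total_hosts: int) -> int:
    """Smallest host count that is a strict majority of the cluster.

    Assumption: quorum is defined as more than half of all hosts.
    """
    if total_hosts < 1:
        raise ValueError("a cluster needs at least one host")
    return total_hosts // 2 + 1


def tolerated_failures(total_hosts: int) -> int:
    """How many hosts can go offline while a strict majority stays online."""
    return total_hosts - quorum_size(total_hosts)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} host(s): quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this assumption a two-node cluster tolerates no failures at all, which is why three nodes is the smallest size that can lose a host and keep quorum.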
Anything special about deploying on AWS?
Quorum requirements hold true even in a cloud environment where you have Availability Zones (AZs). In addition to possible node failure, you can also defend against possible AZ failure by splitting your d-nodes and e-nodes evenly across three Availability Zones.
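To see why three zones matter, the sketch below checks whether a given host placement keeps a majority of hosts online when any single zone is lost. It assumes quorum is a strict majority of hosts; the host names, zone names, and the function itself are hypothetical and do not call any AWS or MarkLogic API.

```python
from collections import Counter
from typing import Dict


def survives_any_zone_loss(zone_of_host: Dict[str, str]) -> bool:
    """Return True if losing any one zone still leaves a strict majority.

    zone_of_host maps a host name to the zone it runs in (illustrative data).
    """
    total = len(zone_of_host)
    quorum = total // 2 + 1  # assumption: quorum is more than half of all hosts
    hosts_per_zone = Counter(zone_of_host.values())
    return all(total - lost >= quorum for lost in hosts_per_zone.values())


if __name__ == "__main__":
    three_zones = {"host1": "az-a", "host2": "az-b", "host3": "az-c"}
    two_zones = {"host1": "az-a", "host2": "az-a", "host3": "az-b"}
    print("three zones:", survives_any_zone_loss(three_zones))
    print("two zones:", survives_any_zone_loss(two_zones))
```

With only two zones, one zone necessarily holds at least half of the hosts, so losing it breaks quorum regardless of how many hosts the cluster has.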
Load distribution after failover events
If a d-node experiences a failover event, the remaining d-nodes pick up its workload so that the data stored in its forests remains available.
Failover forest topology is an important factor in both high-availability and load-distribution within a cluster. Consider the example below of a 3-node cluster where each node has two data forests (dfs) and two local disk-failover forests (ldfs):
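One way to picture such a topology is the minimal sketch below. It assumes each node's two data forests keep their local disk-failover replicas on the two other nodes, one replica per node. The host names, forest names, and helper functions are hypothetical, and the arrangement is an illustrative assumption rather than a prescribed configuration.

```python
from collections import Counter
from typing import Dict, List, Tuple

HOSTS = ["node1", "node2", "node3"]  # hypothetical host names


def build_topology(hosts: List[str]) -> Dict[str, Tuple[str, str]]:
    """Map each data forest to (primary host, host holding its ldf replica).

    Each host's two data forests replicate to the two *other* hosts, so every
    host ends up with two data forests and two local disk-failover forests.
    """
    if len(hosts) != 3:
        raise ValueError("this sketch models a ring of exactly three hosts")
    topology = {}
    for i, host in enumerate(hosts):
        for k in (1, 2):
            forest = f"{host}-df{k}"        # hypothetical forest name
            replica_host = hosts[(i + k) % 3]
            topology[forest] = (host, replica_host)
    return topology


def load_after_failure(topology: Dict[str, Tuple[str, str]],
                       failed_host: str) -> Dict[str, int]:
    """Count the forests each surviving host serves once failed_host is down."""
    load = Counter()
    for primary, replica in topology.values():
        serving = replica if primary == failed_host else primary
        load[serving] += 1
    return dict(load)


if __name__ == "__main__":
    topo = build_topology(HOSTS)
    for forest, (primary, replica) in sorted(topo.items()):
        print(f"{forest}: primary on {primary}, ldf on {replica}")
    print(load_after_failure(topo, "node1"))
```

In this layout the failed node's two data forests land on different survivors, so each remaining node serves three forests. If both replicas lived on the same neighbor instead, that neighbor would serve four forests while the other served two.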
Growing or scaling out your cluster
If you need to fold in additional capacity to your cluster, try to add nodes in "rings of three." Each ring of three can have its own independent failover topology, where nodes 1, 2, and 3 will fail over to each other as described above, and nodes 4, 5, and 6 will fail over to each other, separate from the original ring of three. This results in minimal configuration changes for any nodes already in your cluster when adding capacity.
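The "rings of three" idea can be sketched as a simple grouping. This is a minimal illustration that assumes nodes are grouped in the order they were added; the host names and function names are hypothetical and not part of any MarkLogic API.

```python
from typing import Dict, List


def rings_of_three(hosts: List[str]) -> List[List[str]]:
    """Group hosts, in the order they were added, into rings of three."""
    if len(hosts) % 3 != 0:
        raise ValueError("add hosts in multiples of three")
    return [hosts[i:i + 3] for i in range(0, len(hosts), 3)]


def failover_partners(hosts: List[str]) -> Dict[str, List[str]]:
    """Map each host to the other members of its ring (its failover targets)."""
    partners = {}
    for ring in rings_of_three(hosts):
        for host in ring:
            partners[host] = [other for other in ring if other != host]
    return partners


if __name__ == "__main__":
    original = ["node1", "node2", "node3"]
    expanded = original + ["node4", "node5", "node6"]
    before = failover_partners(original)
    after = failover_partners(expanded)
    for host in expanded:
        print(host, "fails over to", after[host])
    # The original ring is untouched by the expansion.
    print(all(before[h] == after[h] for h in original))
```

Because the new ring only references its own members, the failover partners of nodes 1, 2, and 3 are identical before and after the expansion, which is the "minimal configuration changes" property described above.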
Important related takeaways