Considerations when scaling out your MarkLogic instance
16 June 2020 02:22 AM
MarkLogic Server is engineered to scale out horizontally by easily adding forests and nodes. Be aware, however, that when adding resources horizontally, you may also be introducing additional demand on the underlying resources.
On a single node, you will see some performance improvement from adding forests, due to increased parallelization. There is a point of diminishing returns, though, where the number of forests can overwhelm the available resources such as CPU, RAM, or I/O bandwidth. Internal MarkLogic research (as of April 2014) shows the sweet spot to be around six forests per host (assuming modern hardware). Note that there is a hard limit of 1024 primary forests per database, and it is a general recommendation that the total number of forests should not grow beyond 1024 per cluster.
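The forest guidance above can be turned into a quick sanity check when planning a cluster. The following is a minimal Python sketch, not a MarkLogic tool: the function name plan_forests and its return keys are hypothetical, and the two constants simply restate the figures quoted above (six forests per host, 1024 forests per cluster). It assumes every host carries the same number of forests.

```python
# Figures restated from the guidance above; adjust them for your own hardware.
FORESTS_PER_HOST_SWEET_SPOT = 6    # approximate sweet spot (April 2014 research)
MAX_FORESTS_PER_CLUSTER = 1024     # recommended ceiling on total forests


def plan_forests(hosts: int, forests_per_host: int = FORESTS_PER_HOST_SWEET_SPOT) -> dict:
    """Hypothetical helper: summarize a uniform forest layout against the guidelines."""
    if hosts < 1 or forests_per_host < 1:
        raise ValueError("hosts and forests_per_host must both be at least 1")
    total = hosts * forests_per_host
    return {
        "total_forests": total,
        # True while the cluster-wide total stays at or below the recommended ceiling.
        "within_cluster_guideline": total <= MAX_FORESTS_PER_CLUSTER,
        # True when each host carries more forests than the quoted sweet spot.
        "above_sweet_spot": forests_per_host > FORESTS_PER_HOST_SWEET_SPOT,
    }


if __name__ == "__main__":
    print(plan_forests(hosts=10))
    print(plan_forests(hosts=200))
```

A layout flagged as above the sweet spot is not necessarily wrong; it is only a prompt to check whether CPU, RAM, or I/O bandwidth on each host can keep up.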
At the cluster level, you should see performance improvements from adding hosts, but attention should be paid to any potentially shared resources. For example, overall performance is likely to decrease if additional nodes are provisioned as virtual machines on a single underlying server, since that server's CPU, RAM, and I/O bandwidth would then be split across multiple nodes. Similarly, when attaching additional nodes to the same underlying SAN storage, make sure there is enough I/O bandwidth to accommodate the number of nodes you want to connect.
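The shared-storage concern above comes down to simple arithmetic. The sketch below is a rough Python illustration under a strong assumption: that the shared bandwidth is divided evenly among nodes, which real SANs and hypervisors do not guarantee. The function names and the example figures (2000 MB/s shared, 300 MB/s needed per node) are hypothetical and chosen only for illustration.

```python
def per_node_bandwidth(shared_mb_per_s: float, nodes: int) -> float:
    """Bandwidth each node gets if the shared total is split evenly (an assumption)."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return shared_mb_per_s / nodes


def max_nodes(shared_mb_per_s: float, required_mb_per_s_per_node: float) -> int:
    """Largest node count for which every node still gets its required bandwidth."""
    if required_mb_per_s_per_node <= 0:
        raise ValueError("required bandwidth per node must be positive")
    return int(shared_mb_per_s // required_mb_per_s_per_node)


if __name__ == "__main__":
    # Hypothetical figures: a 2000 MB/s SAN link shared by 4 nodes.
    print(per_node_bandwidth(2000, 4))
    # How many nodes fit if each needs about 300 MB/s?
    print(max_nodes(2000, 300))
```

Measured per-node I/O demand under peak load is the number to plug in here; an average figure will overstate how many nodes the shared storage can carry.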
More generally, adding capacity above a bottleneck tends to exacerbate performance issues. If you find your performance has actually decreased after horizontally scaling out some part of your stack, it is likely that a layer of your infrastructure below the one you changed is being overwhelmed by the additional demand introduced by the added capacity.