MarkLogic Server I/O Requirements Guide
04 February 2020 04:12 PM
This article will help MarkLogic Administrators and System Architects who need to understand how to provision the I/O capacity of their MarkLogic installation.
MarkLogic Disk Usage
Databases in MarkLogic Server are made up of forests. Individual forests are made up of stands. In the interests of both read and write performance, MarkLogic Server doesn't update data already on disk. Instead, it simply writes to the current in-memory stand, which will then contain the latest version of any new or changed fragments, and old versions are marked as obsolete. The current in-memory stand will eventually become yet another on-disk stand in a particular forest.
Ultimately, however, the more stands or obsolete fragments there are in a forest, the more time it takes to resolve a query. Merges are a background process that reduce the number of stands and purge obsolete fragments in each forest in a database, thereby improving the time it takes to resolve queries. Because merges are so important to the optimal operation of MarkLogic Server, it's important to provision the appropriate amount of I/O bandwidth, where each forest will typically need 20MB/sec read and 20MB/sec write. For example, a machine hosting four forests will typically need sufficient I/O bandwidth for both 80MB/sec read and 80MB/sec write.
Determining I/O Bandwidth
One way to determine I/O bandwidth would be to use a synthetic benchmarking utility to return the available read and write bandwidth for the system as currently provisioned. While useful in terms of getting a ballpark sense of the I/O capacity, this approach unfortunately does not provide any information about the real world demand that will ultimately be placed on that capacity.