Understanding MarkLogic Minimum Disk Space Requirements
17 October 2019 05:16 PM
To simplify the calculation, the disk space requirements section of the MarkLogic Installation Guide states that, starting with MarkLogic 8 (and continuing in MarkLogic 9), the minimum disk space requirement is 1.5 times the total forest size for sufficiently large forests. Previously, the requirement was to maintain disk space equal to 3 times the forest data size.
In MarkLogic 8, we introduced (and continued in MarkLogic 9):
Stated most simply, the minimum disk space requirement for a forest is the greater of 192 GB or 1.5 times the forest data size.
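As a rough illustration, this rule of thumb can be sketched in Python. The function name and the choice of GB as the unit are assumptions made for this example and are not part of any MarkLogic API.

```python
def rule_of_thumb_min_disk_gb(forest_data_gb: float) -> float:
    """Return the greater of 192 GB or 1.5 times the forest data size (in GB)."""
    return max(192.0, 1.5 * forest_data_gb)


# Small forest: 1.5 * 100 = 150 GB, so the 192 GB floor applies.
print(rule_of_thumb_min_disk_gb(100))  # 192.0

# Larger forest: 1.5 * 400 = 600 GB exceeds the floor.
print(rule_of_thumb_min_disk_gb(400))  # 600.0
```

Under this rule the 192 GB floor applies to any forest smaller than 128 GB, since 1.5 * 128 = 192.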
This article explains the calculation of the minimum disk space requirement, but please keep in mind that sufficient disk space beyond the bare minimum requirement should be available in order to handle influx of data into your system for at least the amount of time it takes to provision more capacity.
Before we dive into how the minimum disk space requirement is calculated, let's briefly discuss some of the conditions that need to be met to make this calculation achievable:
Once all of the assumptions are met, let's look at how we can calculate the minimum disk space required, taking into account:
This can be expressed as:
minimum-disk-space-required = forest-max-size + merge-space + concurrent-update
The actual forest size varies with the number of deleted fragments the forest retains. The amount of variance depends on the merge-min-ratio setting (for example, a merge-min-ratio of 2 can result in a forest in which 1/3 of the fragments are deleted).
forest-max-size = (1 + 1/merge-min-ratio) * minimum-forest-size
Where minimum-forest-size is the size of a fully merged forest with no deleted fragments. Even the minimum-forest-size can vary, however, based on the raw document content size and index settings.
minimum-forest-size = document-size * expected-index-expansion
Index expansion varies by index settings and document content. We have seen index expansion from 0.75X to 5X. The only way to estimate this value is by experimentation – with a sufficiently large representative sample data set so that stand overhead is insignificant.
When calculating the disk space required, we need to account for the maximum possible forest size (i.e. the forest size including index expansion of documents and the maximum number of deleted fragments).
forest-max-size = (1 + 1/merge-min-ratio) * (document-size * expected-index-expansion)
(Note: although stated as a maximum forest size, this value may still be lower than the actual size if the merge assumptions are not met.)
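The forest-max-size formula above can be sketched as a short Python function. The function and parameter names are illustrative, and the sample values (100 GB of raw documents, an index expansion of 2) are assumptions chosen only to make the arithmetic easy to follow. The merge-min-ratio of 2 is taken from the example earlier in this article.

```python
def forest_max_size(document_size_gb: float,
                    expected_index_expansion: float,
                    merge_min_ratio: float = 2.0) -> float:
    """forest-max-size = (1 + 1/merge-min-ratio) * (document-size * expected-index-expansion)."""
    # Size of a fully merged forest with no deleted fragments.
    minimum_forest_size = document_size_gb * expected_index_expansion
    # Allow for the deleted fragments that the merge policy may retain.
    return (1 + 1 / merge_min_ratio) * minimum_forest_size


# Assumed sample values: 100 GB raw content, 2x index expansion, merge-min-ratio 2.
minimum_gb = 100 * 2
maximum_gb = forest_max_size(100, 2, merge_min_ratio=2)
print(maximum_gb)  # 300.0

# Share of the maximum-size forest made up of deleted fragments: 100 / 300 = 1/3.
print((maximum_gb - minimum_gb) / maximum_gb)
```

With a merge-min-ratio of 2, a third of the maximum-size forest is deleted fragments, which matches the 1/3 figure quoted above.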
During merging, there is a point in time where the old stands and the new stand coexist on disk - the write must succeed before the old stands can be removed. Multiple merges may occur at the same time, but with the merge-max-size configuration, the merges never require more than 1.33x the merge-max-size (the old stands are already taken into account in the forest size calculations):
1.33 * merge-max-size
There is a time lag between the start and the end of a merge, and you need space for the documents that can be updated or reindexed during that time (a reindex is equivalent to a delete and an insert):
(merge-max-size / merge-rate) * update-rate
If you make the simplifying assumption that concurrent updating (or reindexing) proceeds at half the rate of merging, then this is half of the 1.33 * merge-max-size merge space, which is approximately:
0.66 * merge-max-size
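The two simplified terms above can be worked through in Python as a quick sanity check. The function names are illustrative, and the 48 GB merge-max-size is an assumed example value, so substitute the value configured on your own database.

```python
def merge_space(merge_max_size_gb: float) -> float:
    """Upper bound on the extra space used while merges are in flight."""
    return 1.33 * merge_max_size_gb


def concurrent_update_space(merge_max_size_gb: float) -> float:
    """Space for updates or reindexing during a merge, using the simplified 0.66 factor."""
    return 0.66 * merge_max_size_gb


merge_max_size_gb = 48  # assumed example value, not read from any configuration

merge_gb = merge_space(merge_max_size_gb)
update_gb = concurrent_update_space(merge_max_size_gb)

print(round(merge_gb, 2))
print(round(update_gb, 2))
# The two terms together come to 1.99 * merge-max-size, which rounds to 2x.
print(round(merge_gb + update_gb, 2), "vs", 2 * merge_max_size_gb)
```

The combined overhead is close enough to twice the merge-max-size that the two terms can be treated as a single 2 * merge-max-size term.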
Putting it Together
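Putting it all together, the merge space (1.33 * merge-max-size) and the concurrent update space (0.66 * merge-max-size) add up to roughly 2 * merge-max-size, so: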
minimum-disk-space-required = forest-max-size + 2 * merge-max-size
Remember that forest-max-size is a function of document content size, index expansion and retained deleted fragments (merge-min-ratio) per the equation presented earlier.
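A minimal end-to-end sketch of the calculation in Python follows. All names and sample values here are assumptions for illustration (100 GB of raw documents, an index expansion of 2, a merge-min-ratio of 2, and a merge-max-size of 48 GB). Your own index expansion should be measured on representative data.

```python
def minimum_disk_space_required(document_size_gb: float,
                                expected_index_expansion: float,
                                merge_min_ratio: float,
                                merge_max_size_gb: float) -> float:
    """minimum-disk-space-required = forest-max-size + 2 * merge-max-size."""
    # Maximum forest size, including index expansion and retained deleted fragments.
    forest_max_size_gb = (1 + 1 / merge_min_ratio) * (
        document_size_gb * expected_index_expansion
    )
    # Merge space plus concurrent update space, simplified to 2 * merge-max-size.
    return forest_max_size_gb + 2 * merge_max_size_gb


# Assumed sample values: forest-max-size is 300 GB, merge overhead is 96 GB.
required_gb = minimum_disk_space_required(
    document_size_gb=100,
    expected_index_expansion=2,
    merge_min_ratio=2,
    merge_max_size_gb=48,
)
print(required_gb)  # 396.0
```

This figure is a bare minimum under the stated assumptions and leaves no room for data growth.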
Again, per our assumptions, there are conditions where the calculated minimum disk space requirement may not be sufficient.
Out of Space
What happens if MarkLogic Server does not have enough disk space?
The most likely outcome is that merges will begin to fail and you will see an XDMP-MERGESPACE error in the error log. It is also possible that forests will go offline. If a forest goes offline, the database will also be offline, halting all access to the database. When this happens, you will need to take manual corrective action to either free up some disk space or add more.
The minimum disk space requirement is forest-max-size + 2 * merge-max-size. However, many conditions, including the merge policy configuration, the one-hour merge window, and long-running operations, can cause deleted fragments and obsolete stands to be retained, resulting in larger than expected forest sizes and greater than expected disk space utilization. High availability, disaster recovery, and database backup/restore solutions will also require additional disk space to be available.
It is always a good idea to give your system enough headroom to avoid application or database outages, and to monitor your disk usage continuously so that you understand your trends and can predict when your disk space allocation will become insufficient.
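As one possible starting point for such monitoring, the sketch below compares the space available to a forest against a previously calculated requirement, using Python's standard `shutil.disk_usage`. The helper names and the sample figures are hypothetical. It also assumes that the forest is the only significant consumer of space on its volume, which may not hold in your environment.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def free_space_gb(path: str) -> float:
    """Free space, in GiB, on the volume that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def has_headroom(forest_size_gb: float, free_gb: float, required_gb: float) -> bool:
    """True if the current forest size plus free space covers the requirement."""
    return forest_size_gb + free_gb >= required_gb


# Hypothetical figures: a 300 GB forest and a calculated requirement of 396 GB.
print(has_headroom(forest_size_gb=300, free_gb=100, required_gb=396))  # True
print(has_headroom(forest_size_gb=300, free_gb=50, required_gb=396))   # False

# Free space on the volume holding the current directory; pass your forest data
# directory instead.
print(round(free_space_gb("."), 1))
```

Running a check like this on a schedule and recording the results over time gives you the trend data needed to plan capacity before space runs out.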
Recovering from low disk space
Migrating to MarkLogic and understanding the 1.5x requirement