Group caches and Linux huge pages
17 August 2022 05:00 PM
This article discusses the MarkLogic group-level caches and Linux Huge Page configurations.
MarkLogic utilizes caches to increase retrieval performance of frequently-accessed objects. In particular, MarkLogic caches:
1. Expanded trees (Expanded Tree Cache)
On groups that have app servers configured for your application (E-nodes), the Expanded Tree Cache holds frequently-accessed XML documents and serves as workspace for holding the result set of a particular query. MarkLogic recommends that most customers set the Expanded Tree Cache size to 1/8th of the physical memory on the server, up to the maximum allowed size.
For groups that only manage forest content and do not have app servers configured (D-nodes), the Expanded Tree Cache is used only while content is being reindexed. On D-nodes, the cache size should be set to 1024 MB.
2. Compressed trees (Compressed Tree Cache)
On groups that do not manage forest content (E-nodes), the Compressed Tree Cache is unused and should be set to 128 MB.
For groups that manage forest content (D-nodes), the Compressed Tree Cache is used to hold recently-accessed XML content in a compressed form. Its purpose is to minimize random disk reads for frequently-accessed content. MarkLogic recommends that most customers set the Compressed Tree Cache size to 1/16th of the physical memory on the server, up to the maximum allowed size.
3. Lists (List Cache)
On groups that do not manage forest content (E-nodes), the List Cache is unused and should be set to 128 MB.
For groups that manage forest content (D-nodes), the List Cache is used to hold recently-accessed index termlists. Its purpose is to minimize disk reads for frequently-accessed index terms, which are used for almost every MarkLogic XQuery. MarkLogic recommends that most customers set the List Cache size to 1/8th of the physical memory on the server, up to the maximum allowed size.
4. Triples (Triple Cache)
The triple cache holds blocks of compressed triples from disk.
If a cache page has not been accessed within the configured triple cache timeout, it is released from the cache. Pages are released in least-recently-used order, so the cache memory shrinks as pages in the cache time out.
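The timeout-plus-LRU behavior described above can be modelled with a small toy cache. This is a hypothetical illustration, not MarkLogic's implementation: the class name `TimedLRUCache`, the page identifiers, and the use of caller-supplied timestamps in seconds are all assumptions made for the sketch.

```python
from collections import OrderedDict


class TimedLRUCache:
    """Toy model of a page cache that releases pages after an idle timeout.

    Hypothetical illustration only. Pages are kept in least-recently-used
    order; touching a page moves it to the most-recently-used end.
    """

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.pages = OrderedDict()  # page_id -> time of last access

    def access(self, page_id, now):
        # Record the access and move the page to the most-recently-used end.
        self.pages[page_id] = now
        self.pages.move_to_end(page_id)

    def expire(self, now):
        # The least recently used page is always at the front, so scanning
        # can stop at the first page that is still within the timeout.
        released = []
        while self.pages:
            page_id, last_access = next(iter(self.pages.items()))
            if now - last_access < self.timeout:
                break
            self.pages.popitem(last=False)
            released.append(page_id)
        return released
```

With a 300-second timeout, accessing page `a` at t=0, `b` at t=100 and `a` again at t=200, then calling `expire(450)` releases only `b`; `a` stays cached until t=500. Because pages are ordered by last access, the memory held by the cache shrinks from the oldest end as pages time out.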
See Group Level Cache Settings based on RAM for more information on default settings and maximums.
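The per-role guidance in the list above can be summarized as a small sizing helper. This is a sketch under stated assumptions: sizes are in MB, the function and dictionary key names are hypothetical, `max_mb` stands in for the per-cache maximum (which you should take from the settings table referenced above), the triple cache is not covered, and hosts that act as both E-node and D-node are out of scope because this article gives no guidance for them.

```python
def recommended_cache_sizes_mb(physical_memory_mb, role, max_mb=None):
    """Suggested group-level cache sizes in MB (hypothetical helper).

    role: "E" for groups with app servers only (E-nodes),
          "D" for groups that only manage forests (D-nodes).
    max_mb: hypothetical per-cache maximum; None means no cap is applied.
    """

    def capped(size_mb):
        # Apply the "up to the maximum allowed size" rule when a cap is given.
        return size_mb if max_mb is None else min(size_mb, max_mb)

    if role == "E":
        return {
            "expanded_tree_cache": capped(physical_memory_mb // 8),
            "compressed_tree_cache": 128,  # unused on E-nodes
            "list_cache": 128,  # unused on E-nodes
        }
    if role == "D":
        return {
            "expanded_tree_cache": 1024,  # used only while reindexing
            "compressed_tree_cache": capped(physical_memory_mb // 16),
            "list_cache": capped(physical_memory_mb // 8),
        }
    raise ValueError("role must be 'E' or 'D'")
```

For a 64 GB (65536 MB) D-node with no cap, this gives 1024 MB, 4096 MB and 8192 MB for the expanded tree, compressed tree and list caches respectively; the same host as an E-node gets 8192 MB, 128 MB and 128 MB.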
If you have logging set at Debug level, the error log reports the cache sizes at startup.
Rule of Thirds
By default, MarkLogic Server allocates roughly one third of physical memory to the caches described above, but the server will try to use as much memory as possible. The "Rule of Thirds" provides a conceptual explanation of how MarkLogic uses memory on a server.
It is very common for Linux servers running MarkLogic to show high memory utilization; in fact, it is desirable for MarkLogic to use much of the memory on the server. However, the server should use very little swap, because swapping severely degrades performance. Adhering to the Rule of Thirds should generally ensure that a server is properly sized, and any case of memory-related performance degradation should be checked against this rule to identify improper sizing.
MarkLogic Server memory use falls into two major categories: large block and small block. Caches and in-memory stands need large blocks of contiguous memory, while range indexes, workspace memory, and the Linux filesystem buffer use smaller blocks. To allocate the large blocks for the group-level caches and in-memory stands efficiently, MarkLogic recommends using Linux Huge Pages. Whereas the kernel normally allocates memory in 4k pages, huge pages are 2048k in size and can be allocated quickly for larger blocks of memory. At a minimum, MarkLogic recommends allocating enough huge pages to cover the group-level caches (roughly one third of physical memory). At the upper end, the recommended huge page allocation covers both the caches and the in-memory stands.
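The huge page arithmetic above can be sketched as follows, assuming the 2048k huge page size mentioned in the text and approximating the group-level caches as one third of physical memory. The function names are hypothetical; if you know the actual configured cache sizes, pass their sum to `huge_pages_needed` instead of relying on the one-third approximation.

```python
import math

HUGE_PAGE_KB = 2048  # huge page size cited in the text
KB_PER_MB = 1024


def huge_pages_needed(memory_mb):
    """Number of 2048k huge pages required to cover memory_mb megabytes."""
    return math.ceil(memory_mb * KB_PER_MB / HUGE_PAGE_KB)


def minimum_huge_pages(physical_memory_mb):
    # Lower bound from the text: enough pages for the group-level caches,
    # approximated here as one third of physical memory.
    return huge_pages_needed(physical_memory_mb / 3)
```

For a 64 GB (65536 MB) server, `minimum_huge_pages(65536)` returns 10923 pages, roughly 21.3 GB. Rounding up ensures the pages fully cover the caches rather than falling one page short.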
The Installation Guide for All Platforms offers the following guidelines for setting up Linux Huge Pages:
On Linux systems, MarkLogic recommends setting Linux Huge Pages to 3/8 the size of your physical memory, configured to reserve the space at boot time. For details on setting up Huge Pages, see the following Red Hat Enterprise Linux (RHEL) KB:
How can I configure huge pages in Red Hat Enterprise Linux
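Converting the 3/8 guideline into a page count is simple arithmetic. The sketch below assumes the 2048k huge page size and reads total memory from the `MemTotal` line of `/proc/meminfo`, which Linux reports in kB; the result is the kind of value typically given to the `vm.nr_hugepages` sysctl (see the Red Hat article above for making it persistent). The helper names are hypothetical.

```python
HUGE_PAGE_KB = 2048  # huge page size cited in the text


def mem_total_kb(meminfo_text):
    """Extract MemTotal (in kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def recommended_nr_hugepages(total_kb):
    # 3/8 of physical memory, expressed as a whole number of 2048k pages.
    return (total_kb * 3 // 8) // HUGE_PAGE_KB
```

On a Linux host you could call `recommended_nr_hugepages(mem_total_kb(open("/proc/meminfo").read()))`. For 64 GB (67108864 kB) the result is 12288 pages, that is, 24 GB. On a real host `MemTotal` is slightly below installed RAM because the kernel reserves some memory, so the computed count will be a little lower.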
If you have Huge Pages set up on a Linux system, the swap space on that machine should be equal to the size of your physical memory minus the size of your Huge Pages allocation (because Linux Huge Pages are not swapped), or 32 GB, whichever is lower. For example, if you have 64 GB of physical memory and Huge Pages are set to 24 GB, the required swap space is the lower of (64 - 24) GB = 40 GB and 32 GB, which is 32 GB.
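The swap guideline reduces to a one-line calculation. This sketch works in GB; the function name and the `cap_gb` parameter are hypothetical, and the 32 GB default is the cap quoted above.

```python
def required_swap_gb(physical_memory_gb, huge_pages_gb, cap_gb=32):
    """Swap to configure on a Linux host that uses huge pages, in GB.

    Huge pages are never swapped, so they are subtracted from physical
    memory before applying the cap.
    """
    return min(physical_memory_gb - huge_pages_gb, cap_gb)
```

`required_swap_gb(64, 24)` reproduces the worked example (32 GB), while a smaller host with 32 GB of memory and 12 GB of huge pages falls under the cap and gets 20 GB.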
At system startup on Linux machines, MarkLogic Server logs a message to the
Linux Huge Pages and Transparent Huge Pages
Group Level Cache Based on RAM size
Knowledgebase: Memory Consumption Logging and Status
Knowledgebase: RAMblings - Opinions on Scaling Memory in MarkLogic Server