Community

MarkLogic 10 and Data Hub 5.0

Latest MarkLogic releases provide a smarter, simpler, and more secure way to integrate data.

Read Blog →

Company

Stay On Top Of Everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up →

 
Knowledgebase: Performance Tuning
Group caches and Linux huge pages
07 April 2020 12:09 PM

SUMMARY

This article discusses the MarkLogic group-level caches and Linux Huge Page configurations.

Group Caches

MarkLogic utilizes caches to increase retrieval performance of frequently-accessed objects. In particular, MarkLogic caches:

1. Expanded trees (Expanded Tree Cache)

On any groups that have app servers configured for your application (E-nodes), the Expanded Tree Cache is used to hold frequently-accessed XML documents. This cache is used as workspace for holding the result set of a particular query. MarkLogic recommends that most customers set the Expanded Tree Cache size to 1/8th of the physical memory on the server.

For groups that only manage forest content and do not have app servers configured (D-nodes), the Expanded Tree Cache is used only during the process of reindexing content. The cache size should be set to 1024 for D-nodes.

2. Compressed trees (Compressed Tree Cache)

On any groups that do not manage forest content (E-nodes), the Compressed Tree Cache is unused, and should be set to 128.

For groups that manage forest content (D-nodes), the Compressed Tree Cache is used to hold recently-accessed XML content in a compressed form. Its purpose is to minimize random disk reads for frequently-accessed content. MarkLogic recommends that most customers set the Compressed Tree Cache size to 1/16th of the physical memory on the server.

3. Lists (List Cache)

On any groups that do not manage forest content (E-nodes), the List Cache is unused, and should be set to 128.

For groups that manage forest content (D-nodes), the List Cache is used to hold recently-accessed index termlists. Its purpose is to minimize disk reads for frequently-accessed index terms, which are used for almost every MarkLogic XQuery. MarkLogic recommends that most customers set the List Cache size to 1/8th of the physical memory on the server.

Rule of Thirds

By default, MarkLogic Server will allocate roughly one third of physical memory to the aforementioned caches, but the server will try to utilize as much memory as possible. The "Rule of Thirds" provides a conceptual explanation of how MarkLogic uses memory on a server:

  • One third of physical memory for MarkLogic group-level caches
  • One third of physical memory for in-memory content (range indexes and in-memory stands)
  • One third of physical memory for workspace, app server overhead, and Linux filesystem buffer

It is very common for Linux servers running MarkLogic to show high memory utilization. In fact, it is desirable to have MarkLogic utilize much of the memory on the server. However, the server should use very little swap, as that will have a severe negative impact on performance. Adhering to the Rule of Thirds should generally ensure that a server is properly sized, and any cases of memory-related performance degradations should be compared against this rule to identify improper sizing.

Huge Pages

MarkLogic server memory use falls into two major categories: large block and small block. Caches and in-memory stands look for large blocks of contiguous memory space, while range indexes, workspace memory, and the Linux filesystem buffer utilize smaller blocks of memory. In order to efficiently allocate the large blocks of memory for the group-level caches and in-memory stands, MarkLogic recommends the usage of Linux Huge Pages. Instead of the kernel allocating 4k pages of memory, huge pages are 2048k in size and can be quickly allocated for larger blocks of memory. At a minimum, MarkLogic recommends allocating enough huge pages to cover the group-level caches (roughly one third of physical memory). The upper end of recommended huge pages includes both the caches and in-memory stands.

The Installation Guide for All Platforms offers the following guidelines for setting up Linux Huge pages:

On Linux systems, MarkLogic recommends setting Linux Huge Pages to 3/8 the size of your physical memory, and should be configured to reserve the space at boot time. For details on setting up Huge Pages, see the following Red Hat Enterprise Linux (RHEL) KB:

How can I configure huge pages in Red Hat Enterprise Linux

If you have Huge Pages set up on a Linux system, your swap space on that machine should be equal to the size of your physical memory minus the size of your Huge Page (because Linux Huge Pages are not swapped), or 32GB, whichever is lower. For example, if you have 64 GB of physical memory, and if you have Huge Pages set to 24 GB, then you need swap space of 40 GB (64 - 24).

At system startup on Linux machines, MarkLogic Server logs a message to the ErrorLog.txt file showing the Huge Page size, and the message indicates if the size is below the recommended level."

Further Reading

Linux Huge Pages and Transparent Huge Pages

(10 vote(s))
Helpful
Not helpful

Comments (0)