Knowledgebase: MarkLogic Server
Pitfalls Running MarkLogic Process as non-root user
30 September 2022 02:54 PM

Introduction

Some customers choose to run MarkLogic without the watchdog process running as root. Because this is an increasingly popular topic, there is an additional Knowledgebase article that discusses it in further detail:

Knowledgebase: Start and Stop MarkLogic Server as Non-Root User

This Knowledgebase article recommends modifications you should consider making for the user that takes over the responsibilities the root process would otherwise have handled.

MarkLogic Server's root process applies a number of OS-specific settings that allow the product to run optimally. If you choose to run without it, this article aims to give you enough information to match the settings that the server would otherwise change.

Points to consider

  • We do not recommend running the watchdog process as any user other than root.
  • Future upgrades to MarkLogic Server are likely to change what our root process sets up before starting the daemon process.
  • The Linux kernel Out of Memory (OOM) killer is less likely to terminate a process running as root. If you run the watchdog as a different user, consider putting additional monitoring in place so that you can react quickly if the watchdog process is killed.

The root MarkLogic process is simply a restarter process. It waits for the non-root (daemon) process to exit, and if the daemon process exits abnormally for any reason, the root process forks and execs a new daemon process. The root process runs no XQuery scripts, opens no sockets, and accesses no database files.
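To make the restarter's role concrete, here is a minimal Python sketch of a restart loop of this kind. It is not MarkLogic's implementation. It uses subprocess.call (which forks and execs the child) instead of raw fork/exec, and it does not switch to the daemon user. The names supervise, max_restarts and delay_s are hypothetical.

```python
import subprocess
import sys
import time


def supervise(cmd: list[str], max_restarts: int = 5, delay_s: float = 1.0) -> int:
    """Run cmd, restarting it after abnormal exits; return how many times it was started.

    Hypothetical sketch: unlike MarkLogic's root process, it does not switch users.
    """
    starts = 0
    while True:
        starts += 1
        returncode = subprocess.call(cmd)  # blocks until the child process exits
        if returncode == 0:
            return starts  # clean shutdown: do not restart
        if starts > max_restarts:
            # Give up instead of looping forever on a persistent failure.
            print(f"giving up after {starts} starts (last exit code {returncode})",
                  file=sys.stderr)
            return starts
        time.sleep(delay_s)  # brief back-off before the next start


if __name__ == "__main__":
    # A child that always fails is started 1 + max_restarts times.
    print(supervise([sys.executable, "-c", "import sys; sys.exit(1)"],
                    max_restarts=2, delay_s=0.1))
```

Capping restarts is a choice made for this sketch. It avoids the endless restart loop described below, but it leaves the service down once the cap is reached. That is one reason external monitoring still matters when the watchdog does not run as root.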

We strongly recommend starting MarkLogic as root and letting it switch to the non-root user on its own.

When the server initializes with the default root process, it performs some privileged kernel calls to configure sockets, memory, and threads. For example, it:

  • it allocates huge pages if any are available,
  • increases the number of file descriptors it can use,
  • binds any configured low-numbered socket ports, and
  • requests the capability to run some of its threads at high priority.

MarkLogic Server will function if it isn't started as root, but it may not perform as well.

Problems Seen by Customers running MarkLogic as a non-root user

1. If the non-root user account is unable to authenticate because of an underlying system issue, MarkLogic cannot start up properly. This can result in an endless restart loop of MarkLogic Server.

Getting started

You should check the following settings which are configured by the root process when MarkLogic first starts.

1. maxproc soft limit

The maxproc soft limit is set to 1024 by default. In /etc/init.d/MarkLogic, the following line raises the soft limit to match the hard limit for the current process hierarchy:

ulimit -u `ulimit -Hu`
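If your replacement start-up wrapper happens to be written in Python rather than shell, the same adjustment might look like the following minimal sketch. It uses the standard resource module. The function name raise_nproc_soft_limit is hypothetical. Like the shell builtin, it affects only the calling process and the children it starts.

```python
import resource


def raise_nproc_soft_limit() -> tuple[int, int]:
    """Raise the soft RLIMIT_NPROC to the hard limit, like `ulimit -u `ulimit -Hu``.

    Only the calling process and its future children are affected.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft != hard:
        # Raising a soft limit up to (not beyond) the hard limit needs no privileges.
        resource.setrlimit(resource.RLIMIT_NPROC, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    soft, hard = raise_nproc_soft_limit()
    print(f"max user processes: soft={soft} hard={hard}")
```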

2. Ensure Huge Pages are assigned correctly

If you see a message like this in /var/log/messages:

MarkLogic: Linux Huge Pages: shmget(1): Operation not permitted

check /etc/sysctl.conf. It should contain (or you should add) a line:

vm.hugetlb_shm_group = {gid}

Here {gid} is the group ID of the user that runs MarkLogic. Make sure that both users (whatever you use in place of root and daemon) are able to use huge pages, for example by belonging to this group.
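The following is a minimal sketch for deriving the {gid} value and checking group membership. It uses Python's standard pwd and grp modules. The helper names hugetlb_shm_group_line and in_group are hypothetical, and the sketch assumes the group to use is the primary group of the MarkLogic user.

```python
import grp
import pwd


def hugetlb_shm_group_line(username: str) -> str:
    """Build the sysctl.conf line using the primary group ID of username."""
    gid = pwd.getpwnam(username).pw_gid  # primary group ID from the passwd database
    return f"vm.hugetlb_shm_group = {gid}"


def in_group(username: str, gid: int) -> bool:
    """True if gid is username's primary group or username is a supplementary member."""
    if pwd.getpwnam(username).pw_gid == gid:
        return True
    return username in grp.getgrgid(gid).gr_mem
```

For example, hugetlb_shm_group_line("daemon") produces the line for the default MarkLogic user. You can then call in_group on the user that replaces root to confirm that it can use the same group.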

3. Server HugePages calculations

Lower end of the range:
A calculated total of the group-level caches (List Cache + Compressed Tree Cache + Expanded Tree Cache).

Upper end of the range:
Take the total from the lower end of the range and then, for each database, add the following:
  • in-memory list size
  • in-memory tree size
  • in-memory range index size * number of defined range indexes
  • in-memory reverse index size (if reverse queries are enabled)
  • in-memory triple index size (if triple positions are enabled)
Multiply each database's total by its number of assigned local forests [excluding AppServices, Fab, Extensions, Modules, Schemas, Security, Triggers, Last-Login, Meters], then add a small buffer.
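The range calculation above can be sketched in Python as follows. All sizes are in MB. The DatabaseMemory structure and function names are hypothetical. System databases are assumed to be left out of the input list rather than filtered by name. Converting MB to a huge page count assumes 2 MB huge pages, which is common on x86_64; check Hugepagesize in /proc/meminfo on your host.

```python
from dataclasses import dataclass


@dataclass
class DatabaseMemory:
    """Per-database in-memory settings in MB (hypothetical structure for this sketch)."""
    local_forests: int  # forests of this database assigned to this host
    in_memory_list_size: int
    in_memory_tree_size: int
    in_memory_range_index_size: int
    range_index_count: int
    in_memory_reverse_index_size: int = 0  # count only if reverse queries are enabled
    in_memory_triple_index_size: int = 0   # count only if triple positions are enabled


def lower_bound_mb(list_cache: int, compressed_tree_cache: int, expanded_tree_cache: int) -> int:
    return list_cache + compressed_tree_cache + expanded_tree_cache


def upper_bound_mb(lower_mb: int, databases: list[DatabaseMemory], buffer_mb: int) -> int:
    total = lower_mb
    for db in databases:
        per_forest = (
            db.in_memory_list_size
            + db.in_memory_tree_size
            + db.in_memory_range_index_size * db.range_index_count
            + db.in_memory_reverse_index_size
            + db.in_memory_triple_index_size
        )
        total += per_forest * db.local_forests
    return total + buffer_mb


def huge_pages_for(mb: int, huge_page_mb: int = 2) -> int:
    """Huge pages needed to cover mb, rounded up (assumes 2 MB pages by default)."""
    return -(-mb // huge_page_mb)


if __name__ == "__main__":
    # Illustrative numbers only.
    lower = lower_bound_mb(1024, 2048, 4096)
    docs = DatabaseMemory(local_forests=2, in_memory_list_size=512, in_memory_tree_size=64,
                          in_memory_range_index_size=16, range_index_count=4,
                          in_memory_triple_index_size=16)
    upper = upper_bound_mb(lower, [docs], buffer_mb=256)
    print(lower, upper, huge_pages_for(lower), huge_pages_for(upper))
```

With these illustrative inputs the script prints 7168 8736 3584 4368: a range of 7168 to 8736 MB, or 3584 to 4368 huge pages of 2 MB.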

4. Additional kernel parameters to be defined in /etc/sysctl.conf

  • shmall
  • shmmax
  • shmmni

The above values influence shared memory handling and these values are set automatically if MarkLogic runs with the default root/daemon settings.

On Red Hat Enterprise Linux (RHEL) these values are predefined, but on SUSE they are not. We recommend setting them in sysctl.conf regardless.

First, get the current PAGE_SIZE with the following command:

getconf PAGE_SIZE

With the PAGE_SIZE, you can calculate kernel.shmall as follows, with hugepagesize expressed in bytes (/proc/meminfo reports Hugepagesize in kB):

kernel.shmall = (HugePages_Total * hugepagesize) / PAGE_SIZE

You can then set kernel.shmmax and kernel.shmmni accordingly:

# 16 GB (MarkLogic default setting)
kernel.shmmax = 17179869184
# The default of 4096 is not enough on systems with large amounts of RAM; changing it does not affect the shmall calculation above
kernel.shmmni = 32768
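The following minimal sketch combines these steps. It reads HugePages_Total and Hugepagesize from /proc/meminfo, uses os.sysconf("SC_PAGE_SIZE") (the value getconf PAGE_SIZE reports), and prints the three sysctl.conf lines. The helper names are hypothetical. The shmmax and shmmni values are the ones quoted above.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: integer value}; kB suffixes are dropped."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            values[key.strip()] = int(rest.split()[0])
    return values


def shm_sysctl_lines(huge_pages_total: int, hugepagesize_kb: int, page_size: int) -> list[str]:
    # kernel.shmall is counted in pages of PAGE_SIZE bytes, so convert huge pages to bytes first.
    shmall = huge_pages_total * hugepagesize_kb * 1024 // page_size
    return [
        f"kernel.shmall = {shmall}",
        "kernel.shmmax = 17179869184",  # 16 GB, the MarkLogic default quoted above
        "kernel.shmmni = 32768",
    ]


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    if "Hugepagesize" in info:
        page_size = os.sysconf("SC_PAGE_SIZE")
        for line in shm_sysctl_lines(info.get("HugePages_Total", 0), info["Hugepagesize"], page_size):
            print(line)
    else:
        print("this kernel reports no huge page support")
```

For example, 4096 huge pages of 2048 kB with a 4096-byte PAGE_SIZE give kernel.shmall = 2097152.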

5. Configure vm.hugetlb_shm_group

If MarkLogic runs under a different user ID, add the following parameter to /etc/sysctl.conf:

vm.hugetlb_shm_group = {gid}

where {gid} is the ID of the group allowed to use huge pages (the group of the user MarkLogic runs as).

6. Configuring limits

You can also set memory limits in /etc/security/limits.conf for the user that runs MarkLogic. limits.conf expresses memlock in KB:

username soft memlock (1024*Huge Pages in MB)
username hard memlock (1024*Huge Pages in MB)
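Here is a minimal sketch that generates these entries. The names memlock_lines and huge_pages_mb and the example user "marklogic" are hypothetical. limits.conf(5) documents memlock in KB, so this sketch multiplies the huge pages total in MB by 1024.

```python
def huge_pages_mb(huge_pages_total: int, hugepagesize_kb: int) -> int:
    """Total huge page memory in MB, from the /proc/meminfo fields."""
    return huge_pages_total * hugepagesize_kb // 1024


def memlock_lines(username: str, total_mb: int) -> list[str]:
    """limits.conf entries letting username lock total_mb of memory (memlock is in KB)."""
    kb = total_mb * 1024
    return [f"{username} soft memlock {kb}", f"{username} hard memlock {kb}"]


if __name__ == "__main__":
    # Example: 4096 huge pages of 2048 kB each, for a hypothetical "marklogic" user.
    for line in memlock_lines("marklogic", huge_pages_mb(4096, 2048)):
        print(line)
```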

7. Configure / increase the vm.max_map_count

The vm.max_map_count parameter limits the number of individual VMAs (Virtual Memory Areas) that a single process can use. A Virtual Memory Area is a contiguous area of virtual address space.

The OS specifies how many VMAs a process is allowed to create. By default, around 65530 memory map entries are usually allowed per process.

From the kernel documentation for max_map_count:

This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries. While most applications need less than a thousand maps, certain programs, particularly malloc debuggers, may consume lots of them, e.g., up to one or two maps per allocation. The default value is 65536.

See: https://kernel.org/doc/Documentation/sysctl/vm.txt

On modern hardware, we recommend that this value can safely be doubled or even quadrupled:

sysctl -w vm.max_map_count=262120

This step matters most for hosts with large amounts of RAM. If you are setting up hosts with 256GB RAM or more, this change is well worth considering.
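As a minimal sketch of that guidance, the following Python check reads the host's physical RAM and the current /proc/sys/vm/max_map_count. It prints the sysctl command only on hosts with 256 GB of RAM or more whose value is still below 262120. The function names and thresholds-as-constants are hypothetical. The 256 GB cut-off is taken from the paragraph above rather than being a hard rule.

```python
import os

RECOMMENDED_MAX_MAP_COUNT = 262120  # the value suggested above (4 x 65530)
LARGE_RAM_BYTES = 256 * 1024**3     # "256GB RAM or greater"


def physical_ram_bytes() -> int:
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def current_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    with open(path) as f:
        return int(f.read().strip())


def should_raise_max_map_count(ram_bytes: int, current: int,
                               target: int = RECOMMENDED_MAX_MAP_COUNT) -> bool:
    """True when a large-RAM host still has max_map_count below the target."""
    return ram_bytes >= LARGE_RAM_BYTES and current < target


if __name__ == "__main__":
    if should_raise_max_map_count(physical_ram_bytes(), current_max_map_count()):
        print(f"sysctl -w vm.max_map_count={RECOMMENDED_MAX_MAP_COUNT}")
```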

8. Configure SOMAXCONN

The Linux SOMAXCONN parameter (net.core.somaxconn) defines the maximum backlog value that a process such as MarkLogic is allowed to pass to listen() on a socket. Different Linux platforms (RHEL/CentOS), or even different versions of Linux, may have different default SOMAXCONN values.
The default backlog value for MarkLogic application servers is 512. However, if the platform's SOMAXCONN value is lower than the backlog MarkLogic requests, Linux does not honour the requested backlog. When started as root, MarkLogic goes through each application server, finds the maximum backlog value, and sets SOMAXCONN to match it.
When running as non-root, you can manually set SOMAXCONN to the highest backlog value of any application server:
sysctl -w net.core.somaxconn=512
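This minimal sketch in Python mirrors that procedure. It takes the highest backlog across application servers and compares it with the current /proc/sys/net/core/somaxconn. The backlog list is a hypothetical input that you would collect from each application server's configuration, and the function names are also hypothetical. Unlike the description of MarkLogic's own behaviour, this sketch only ever suggests raising the value, never lowering it. Recent kernels already default to a higher value.

```python
from typing import Optional

DEFAULT_BACKLOG = 512  # default MarkLogic application server backlog, per the text above


def somaxconn_for(backlogs: list[int]) -> int:
    """Highest backlog configured on any application server (the default if none are listed)."""
    return max(backlogs, default=DEFAULT_BACKLOG)


def read_somaxconn(path: str = "/proc/sys/net/core/somaxconn") -> int:
    with open(path) as f:
        return int(f.read().strip())


def somaxconn_command(backlogs: list[int], current: int) -> Optional[str]:
    """sysctl command raising SOMAXCONN to the highest backlog, or None if already high enough."""
    target = somaxconn_for(backlogs)
    return f"sysctl -w net.core.somaxconn={target}" if current < target else None


if __name__ == "__main__":
    # Hypothetical backlog values; use the ones configured on your application servers.
    print(somaxconn_command([512, 1024], read_somaxconn()))
```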

9. Socket buffer rmem_max and wmem_max

The Linux parameters net.core.wmem_max and net.core.rmem_max define the maximum send (wmem) and receive (rmem) socket buffer sizes. In other words, they cap the amount of buffer memory that can be requested for each socket. For more efficient parallel job performance, MarkLogic, when started as root, sets these buffer values during startup based on the host's RAM size, as follows:

RAM > 32 GB: read/write buffer size 2048 KB (262144 bytes)
RAM > 8 GB: read/write buffer size 1024 KB
RAM > 4 GB: read/write buffer size 512 KB
RAM <= 4 GB: read/write buffer size 128 KB

On platforms with more than 32 GB of RAM, you can set rmem_max and wmem_max manually as follows:

sysctl -w net.core.rmem_max=262144
sysctl -w net.core.wmem_max=262144
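The tier selection in the table can be sketched in Python as follows. The function name buffer_size_kb is hypothetical. It returns the sizes exactly as the table lists them. For hosts with more than 32 GB, the byte value to pass to sysctl is the 262144 used in the commands above.

```python
import os


def buffer_size_kb(ram_bytes: int) -> int:
    """Read/write buffer size tier from the table above, in the units it lists."""
    gb = 1024**3
    if ram_bytes > 32 * gb:
        return 2048
    if ram_bytes > 8 * gb:
        return 1024
    if ram_bytes > 4 * gb:
        return 512
    return 128


if __name__ == "__main__":
    ram = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    print(f"RAM {ram / 1024**3:.1f} GB -> buffer tier {buffer_size_kb(ram)} KB")
```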

10. Linux swappiness and dirty background ratio

MarkLogic sets the Linux swappiness and dirty background ratio parameters during startup. When starting as non-root, set swappiness and dirty background ratio as described in the Knowledgebase article "Linux Swappiness".

Further reading

It is recommended that you also read this Knowledgebase article which covers running MarkLogic as a non-root user:

Knowledgebase - Start and Stop MarkLogic Server as Non-root User

Our documentation also covers running the main MarkLogic process (daemon by default) as a different user:

Documentation: Configuring MarkLogic Server on UNIX Systems to Run as a Non-daemon User
