Knowledgebase: MarkLogic Server
Pitfalls Running MarkLogic Process as non-root user
04 October 2017 10:38 AM

Introduction

Some customers choose to run MarkLogic without the watchdog process running as root. As this is increasingly becoming a popular topic, there is an additional Knowledgebase article that discusses this in further detail:

Knowledgebase: Start and Stop MarkLogic Server as Non-Root User

The aim of this Knowledgebase article is to recommend some of the modifications you should consider making for the user that takes over the responsibilities the root process would otherwise have had.

MarkLogic server's root process makes a number of OS-specific settings to allow the product to run optimally. If you choose to make these modifications, this article aims to provide you with enough information to ensure you can match the settings that the server changes.

Points to consider

  • We do not recommend changing the root user.
  • Future upgrades to MarkLogic Server are likely to change what our root process sets up before starting the daemon process.
  • The Linux kernel Out of Memory (OOM) killer is less likely to attempt to terminate a process running as root, so if you're doing this, you should consider having additional monitoring in place to ensure you can react quickly in the event that your watchdog process is killed.

The root MarkLogic process is simply a restarter process, waiting for the non-root (daemon) process to exit. If the daemon process exits abnormally, for any reason, the root process will fork and exec another process under the daemon user. The root process runs no XQuery scripts, opens no sockets, and accesses no database files.

We strongly recommend starting MarkLogic as root and letting it switch to the non-root user on its own.

When the server starts with the default root process, it performs some privileged kernel calls to configure sockets, memory, and threads. For example:

  • it allocates huge pages if any are available,
  • increases the number of file descriptors it can use,
  • binds any configured low-numbered socket ports, and
  • requests the capability to run some of its threads at high priority.

MarkLogic Server will function if it isn't started as root, but it may not perform as well.

Problems Seen by Customers running MarkLogic as a non-root user

1. If the non-root user account isn't able to authenticate due to any underlying system issue, MarkLogic can't start up properly. This can result in an endless restart loop of MarkLogic Server.

Getting started

You should check the following settings which are configured by the root process when MarkLogic first starts.

1. maxproc soft limit

The maxproc soft limit is set to 1024 by default. In /etc/init.d/MarkLogic the following line raises the soft limit to match the hard limit for the current process hierarchy:

ulimit -u `ulimit -Hu`

2. Ensure Huge Pages are assigned correctly

If you see something like this in /var/log/messages:

MarkLogic: Linux Huge Pages: shmget(1): Operation not permitted

then look in /etc/sysctl.conf, where you should see (or add) a line:

vm.hugetlb_shm_group = {gid}

Here the {gid} is the group id of the user that runs MarkLogic. Again, it would make sense to ensure that both users (whatever you're using in place of root and daemon) are able to do this.
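
As a minimal sketch of how the {gid} value could be looked up, the snippet below prints a candidate sysctl.conf line for a given account. The helper name hugetlb_group_line and the ML_USER variable are illustrative assumptions, not part of MarkLogic; the default of "daemon" simply mirrors the default account named in this article.

```shell
#!/bin/sh
# hugetlb_group_line: hypothetical helper (not a MarkLogic tool).
# Prints the sysctl.conf line for the primary group of the given user.
hugetlb_group_line() {
    # id -g prints the numeric primary group id of the named user
    gid=$(id -g "$1") || return 1
    echo "vm.hugetlb_shm_group = $gid"
}

# ML_USER is an assumed variable naming the account that runs MarkLogic.
ML_USER="${ML_USER:-daemon}"
hugetlb_group_line "$ML_USER" || echo "could not resolve user $ML_USER" >&2
```

The printed line would still need to be added to /etc/sysctl.conf by hand and loaded, for example with sysctl -p.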

3. Server HugePages calculations

lower value range
A calculated total of Group Level caches (List Cache + Compressed Cache + Expanded Cache)
upper value range
Take the total from the lower value range and then for each database, add the following:
  • in memory list size
  • in memory tree size
  • in memory range index size * number of defined range indexes
  • in memory reverse index size (if reverse query is enabled)
  • in memory triple index size (if triple positions are enabled)
  • Multiply the sum of these by the number of assigned local forests [exclude AppServices, Fab, Extensions, Modules, Schemas, Security, Triggers, Last-Login, Meters], then add a small buffer
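
The range calculation above can be sketched as simple shell arithmetic. This is only an illustration: the helper names lower_mb and per_db_mb are hypothetical, every size below is a made-up example in MB, the buffer size is an arbitrary choice, and the final conversion assumes 2 MB huge pages (check Hugepagesize in /proc/meminfo on your host).

```shell
#!/bin/sh
# lower_mb: sum of the group-level caches (hypothetical helper).
# args: list_cache compressed_cache expanded_cache (all in MB)
lower_mb() {
    echo $(( $1 + $2 + $3 ))
}

# per_db_mb: in-memory sizes for one database times its local forest count.
# args: list tree range_size range_index_count reverse triple forests
# Pass 0 for reverse or triple when that feature is not enabled.
per_db_mb() {
    echo $(( ($1 + $2 + $3 * $4 + $5 + $6) * $7 ))
}

# Example figures only; substitute the values configured on your cluster.
lower=$(lower_mb 8192 4096 8192)
db=$(per_db_mb 512 128 16 10 0 0 3)
buffer=1024   # assumed small buffer, in MB
upper=$(( lower + db + buffer ))

echo "lower=${lower} MB, upper=${upper} MB"
# Assumes 2 MB huge pages when converting MB to a page count.
echo "huge pages: $(( lower / 2 )) to $(( upper / 2 ))"
```

With more than one database, per_db_mb would be called once per database and the results summed before adding the buffer.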

4. Additional kernel parameters to be defined in /etc/sysctl.conf

  • shmall
  • shmmax
  • shmmni

The above values influence shared memory handling and these values are set automatically if MarkLogic runs with the default root/daemon settings.

On Red Hat (RHEL) these values are pre-defined, but not on SUSE. We recommend updating these values in /etc/sysctl.conf anyway.

First, get the current PAGE_SIZE with the following command:

getconf PAGE_SIZE

With the PAGE_SIZE you can calculate kernel.shmall as per the instructions below:

kernel.shmall = (HugePages_Total * hugepagesize) / PAGE_SIZE
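
A worked example of this formula, as a sketch: calc_shmall is a hypothetical helper, and the inputs are assumed example values (8192 huge pages of 2048 kB each, with a 4096-byte PAGE_SIZE). It also assumes the huge page size is taken in kB, as reported in /proc/meminfo, and converted to bytes so that the result is a number of pages.

```shell
#!/bin/sh
# calc_shmall: hypothetical helper implementing
#   kernel.shmall = (HugePages_Total * hugepagesize) / PAGE_SIZE
# args: HugePages_Total  Hugepagesize_in_kB  PAGE_SIZE_in_bytes
calc_shmall() {
    # Convert the huge page size from kB to bytes, then divide by the page size.
    echo $(( $1 * $2 * 1024 / $3 ))
}

# Example values only; on a real host take them from
# /proc/meminfo (HugePages_Total, Hugepagesize) and getconf PAGE_SIZE.
echo "kernel.shmall = $(calc_shmall 8192 2048 4096)"
```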

And you can set kernel.shmmax and kernel.shmmni accordingly:

# 16GB, the MarkLogic default setting
kernel.shmmax = 17179869184
# the default of 4096 is not enough on big RAM systems [this will change the page size returned above but doesn't change the calculation above]
kernel.shmmni = 32768

5. Configure vm.hugetlb_shm_group

If MarkLogic runs under a different user ID, one more parameter needs to be added to /etc/sysctl.conf:

vm.hugetlb_shm_group=gid of hugetlb group [group of user id]

6. Configuring limits

You can also set memory limits in /etc/security/limits.conf

username soft memlock (1024*1024*Huge Pages in MB)
username hard memlock (1024*1024*Huge Pages in MB)

7. Configure / increase the vm.max_map_count

The vm.max_map_count allows for the restriction of the number of individual VMAs (Virtual Memory Areas) that a particular process can use. A Virtual Memory Area is a contiguous area of virtual address space.

The number of VMAs a process is allowed to create is specified by the OS. By default, there are usually around 65530 memory map entries allowed per process.

From the kernel documentation for max_map_count:

This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries. While most applications need less than a thousand maps, certain programs, particularly malloc debuggers, may consume lots of them, e.g., up to one or two maps per allocation. The default value is 65536.

See: https://kernel.org/doc/Documentation/sysctl/vm.txt

Our recommendation is that this value can be safely doubled or even quadrupled where modern hardware is taken into consideration:

sysctl vm.max_map_count=262120

This step is more important for hosts that have a larger amount of RAM. If you are setting up hosts with 256GB RAM or greater, this change is really worth considering.
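
To show where a figure such as 262120 comes from, here is a small sketch that scales a current limit by a chosen factor and prints the matching sysctl.conf line. The helper name scaled_map_count is hypothetical, and 65530 is the usual default mentioned above rather than a value read from your host.

```shell
#!/bin/sh
# scaled_map_count: hypothetical helper.
# args: current_limit factor
scaled_map_count() {
    echo "vm.max_map_count = $(( $1 * $2 ))"
}

# Quadruple the usual default; use a factor of 2 to double it instead.
scaled_map_count 65530 4
```

On a live host the current value can be read from /proc/sys/vm/max_map_count. Setting the value with the sysctl command changes the running kernel only, so the printed line would also need to go into /etc/sysctl.conf to survive a reboot.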

Further reading

It is recommended that you also read this Knowledgebase article which covers running MarkLogic as a non-root user:

Knowledgebase - Start and Stop MarkLogic Server as Non-root User

Our documentation also covers running the main MarkLogic process (daemon by default) as a different user:

Documentation: Configuring MarkLogic Server on UNIX Systems to Run as a Non-daemon User
