
This article discusses the "Stand s has n fragments" messages that may appear in the error log or system log files. These messages can appear at different log levels (Notice, Warning, Error, Critical, Alert, and Emergency): the severity rises as the number of fragments in a single stand increases, indicating increasing risk. 

Fragment counts and their corresponding Log levels:

 In MarkLogic 8 and MarkLogic 9, the fragment count thresholds within a single stand for the log levels are:  

  • At around 84 million fragments, MarkLogic Server will report this with a Notice level log message
  • At around 109 million fragments, MarkLogic Server will report this with a Warning level log message
  • At around 134 million fragments, MarkLogic Server will report this with an Error level log message
  • At around 159 million fragments, MarkLogic Server will report this with a Critical level log message
  • At around 184 million fragments, MarkLogic Server will report this with an Alert level log message
  • At around 209 million fragments, MarkLogic Server will report this with an Emergency level log message

At 256 million fragments your data may be at risk of becoming corrupted due to integer overflow. The log level reflects the risk and is intended to get your attention at higher stand fragment counts.
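
The thresholds listed above can be sketched as a simple lookup. This is only an illustration: the function name is hypothetical, and the cut-offs are the approximate figures quoted in this article, not values read from the server.

```python
# Approximate per-stand fragment thresholds quoted in this article
# (MarkLogic 8 and MarkLogic 9). The server's exact cut-offs may differ.
THRESHOLDS = [
    (209_000_000, "Emergency"),
    (184_000_000, "Alert"),
    (159_000_000, "Critical"),
    (134_000_000, "Error"),
    (109_000_000, "Warning"),
    (84_000_000, "Notice"),
]


def expected_log_level(fragment_count):
    """Return the log level associated with a stand's fragment count,
    or None when the count is below the first (Notice) threshold."""
    for minimum, level in THRESHOLDS:
        if fragment_count >= minimum:
            return level
    return None


if __name__ == "__main__":
    print(expected_log_level(213_404_541))
```

Because the list is ordered from the highest threshold down, the first match is always the most severe level that applies.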

Emergency level log entries

Consider an example Error Log entry where the following information is observed:

2015-06-20 10:13:39.746 Emergency: Stand /space/Data/Forests/App-Services/00000fae has 213404541 fragments.

At all levels, the messages should be monitored and managed, but at the Emergency level, you will need to take corrective action soon.  
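
To monitor these messages, one option is to extract the stand path and fragment count from each log line. The sketch below assumes the log line format shown in the example entry above; the function name and the returned field names are hypothetical.

```python
import re

# Matches lines shaped like the example entry in this article:
# <date> <time> <Level>: Stand <path> has <count> fragments.
STAND_MESSAGE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>\w+): Stand (?P<stand>\S+) has (?P<fragments>\d+) fragments"
)


def parse_stand_message(line):
    """Return a dict describing a 'Stand ... has n fragments' line,
    or None when the line is some other kind of log entry."""
    match = STAND_MESSAGE.match(line)
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "level": match.group("level"),
        "stand": match.group("stand"),
        "fragments": int(match.group("fragments")),
    }
```

Note that the pattern assumes stand paths without spaces, as in the example entry.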

Corrective Actions

Note that it is the number of fragments in a stand that is important, not the number of fragments in a forest.  The actions that you take should act to decrease the size of stands in a forest. 

Some of the actions you can take:

  • If not already configured, MarkLogic databases should be configured with a merge-max-size value smaller than the current forest size (databases created in MarkLogic 7 or MarkLogic 8 have a default value of 32GB).
  • If merge-max-size is already configured for the database, decrease the value of this setting. 


Occasionally, you might see an "Invalid Database Online Event" error in your MarkLogic Server Error Log. This article will help explain what this error means, as well as provide some ways to resolve it.

What the Error Means

The XDMP-INVDATABASEONLINEEVENT error means that something went wrong during the database online trigger event. Many situations can trigger this event, such as a server restart or a change in the configuration of any database. In most cases, this error is harmless; it is purely informational.

Resolving the Error

We often see this error when the user id that is baked into the database online event created by CPF is no longer valid, and the net effect is that CPF's restart handling is not functioning. We believe reinstalling CPF should fix this issue.

If re-installing CPF does not resolve this error, you will want to further analyze and debug the code that is invoked by the restart trigger.





Upon boot of CentOS 6.3, MarkLogic users may encounter the following warning:

:WARNING: at fs/hugetlbfs/inode.c:951 hugetlb_file_setup+0x227/0x250() (Not tainted)

MarkLogic 6.0 and earlier have not been certified to run on CentOS 6.3. This message is due to MarkLogic using a resource that has been deprecated in CentOS 6.3. The message can be ignored, as it will not cause any issues with MarkLogic performance. Although this example specifically points out CentOS 6.3, this message could potentially occur in other MarkLogic/Linux combinations.


Some customers have reported seeing kernel level messages like this in their /var/log/messages file:

Jan 31 17:41:46 ml-c1-u3 kernel: [17467686.201893] TCP: Possible SYN flooding on port 7999. Sending cookie

This may also be seen as part of the output from a call to dmesg and could possibly follow a stack trace, for example:

[<ffffffff810d3d27>] ? audit_syscall_entry+0x1d7/0x200 
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
possible SYN flooding on port 7999. Sending cookies.
possible SYN flooding on port 7999. Sending cookies.
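
To get a feel for how often these messages occur and which ports are affected, the log can be scanned for the message text. The following is a minimal sketch that assumes the message wording shown in the examples above; the function name is hypothetical.

```python
import re
from collections import Counter

# The kernel message appears with either capitalisation in the examples above.
SYN_FLOOD = re.compile(r"possible SYN flooding on port (\d+)", re.IGNORECASE)


def count_syn_flood_warnings(lines):
    """Count 'possible SYN flooding' messages per port.

    A single line may contain the message more than once, so every
    occurrence on the line is counted."""
    counts = Counter()
    for line in lines:
        for port in SYN_FLOOD.findall(line):
            counts[int(port)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jan 31 17:41:46 ml-c1-u3 kernel: [17467686.201893] "
        "TCP: Possible SYN flooding on port 7999. Sending cookie",
    ]
    print(count_syn_flood_warnings(sample))
```

The same function can be fed the lines of /var/log/messages or the output of dmesg.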

What does it mean?

The tcp_syncookies configuration is likely enabled on your system.  You can check for this by viewing the contents of /proc/sys/net/ipv4/tcp_syncookies

$ cat /proc/sys/net/ipv4/tcp_syncookies

If the value returned is 1, then tcp_syncookies is enabled for this host.
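
The same check can be scripted. The sketch below reads the file named above and treats any non-zero value as enabled; the function name is hypothetical, and the code only applies to Linux hosts.

```python
from pathlib import Path

SYNCOOKIES_PATH = "/proc/sys/net/ipv4/tcp_syncookies"


def syncookies_enabled(path=SYNCOOKIES_PATH):
    """Return True when the kernel setting is non-zero.

    0 means SYN cookies are disabled; any other value means enabled."""
    value = Path(path).read_text().strip()
    return int(value) != 0
```

The path is a parameter so that the function can be exercised against a test file on systems without /proc.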

Possible SYN flooding

A SYN flood is a form of denial-of-service attack in which an attacker sends a succession of SYN requests to a target's system in an attempt to consume enough server resources to make the system unresponsive to legitimate traffic.

Source: Wikipedia

You would expect to see evidence of a SYN flood when a "flood" of TCP SYN messages is sent to the host. Under normal operation, the kernel acknowledges each incoming SYN with a SYN-ACK, and the client completes the exchange with an ACK; during a flood, the SYN-ACKs are not followed by ACK messages from the client. The SYN, SYN-ACK, ACK pattern is known as the three-way handshake. Its goal is to firmly establish communication on both the server and the client.

In the event of a real attack, a SYN flood will most likely originate from a fake IP address; during an attack, the client performing the "flood" is not waiting for the SYN-ACK response back from the server it is attacking.

Under normal operation (i.e. without SYN cookies), a TCP connection is kept half-open after the first SYN is received, because of the handshake mechanism used to establish TCP connections. The kernel can only maintain a limited number of half-open connections at any given time, and this is where the problem becomes characterized as an attack.

The term half-open refers to TCP connections whose state is out of synchronization between the two communicating hosts, possibly due to a crash of one side. A connection which is in the process of being established is also known as embryonic connection.

Source: Wikipedia

If SYN cookies are enabled, then the kernel doesn't track half-open connections. Instead, it relies on the sequence number in the subsequent ACK datagram to verify that the ACK follows a SYN and a SYN-ACK, which establishes full communication between client and server. By ignoring half-open connections, SYN floods are no longer a problem.

In the case of MarkLogic, this message can appear if the kernel perceives the rate of incoming messages as unusually high. This does not indicate a real SYN flooding attack, but to the TCP/IP stack the traffic exhibits the same characteristics, so the kernel responds by reporting a possible (fake) attack.

Notes from the kernel documentation

See the section of the kernel documentation for tcp_syncookies - BOOLEAN for some further information regarding this feature:

The syncookies feature attempts to protect a socket from a SYN flood attack. This should be used as a last resort, if at all. This is a violation of the TCP protocol, and conflicts with other areas of TCP such as TCP extensions. It can cause problems for clients and relays. It is not recommended as a tuning mechanism for heavily loaded servers to help with overloaded or misconfigured conditions. For recommended alternatives see tcp_max_syn_backlog, tcp_synack_retries, and tcp_abort_on_overflow.

Further down, they state:

Note, that syncookies is fallback facility. It MUST NOT be used to help highly loaded servers to stand against legal connection rate. If you see SYN flood warnings in your logs, but investigation shows that they occur because of overload with legal connections, you should tune another parameters until this warning disappear. See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.


Tuning on a MarkLogic Server

Any dmesg output indicating "possible SYN flooding on port 7999" may appear in tandem with very heavy XDQP (TCP) traffic within a MarkLogic cluster - this link provides further detail in relation to a similar scenario with Apache HTTP server. You can tune your TCP settings to try to avoid SYN Flooding error messages, but SYN flooding can also be a symptom of a system under resource pressure. 

If a MarkLogic Server instance sees SYN flooding messages on a system that is otherwise healthy, and the messages occur because of normal and expected MarkLogic Server communications, you may want to increase the backlog (tcp_max_syn_backlog) or adjust some of the other settings (such as tcp_synack_retries, tcp_abort_on_overflow). However, if SYN flooding messages only occur on a system that is under resource pressure, then solving the resource issue should be the focus.  

How to disable SYN cookies

You can disable syncookies by adding the following line to /etc/sysctl.conf:

# disable TCP SYN Flood Protection
net.ipv4.tcp_syncookies = 0

Note that editing /etc/sysctl.conf does not change the running kernel: the new setting only takes effect after a host reboot, or once the file is reloaded with sysctl -p.
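
To confirm that the entry is present in the file, the configuration text can be parsed for the key. This is a minimal sketch with a hypothetical function name; it inspects the file contents only and says nothing about the value currently active in the kernel.

```python
def sysctl_setting(conf_text, key):
    """Return the last value assigned to key in sysctl.conf-style text,
    or None when the key is not set."""
    value = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        name, separator, setting = line.partition("=")
        if separator and name.strip() == key:
            value = setting.strip()
    return value


if __name__ == "__main__":
    example = "# disable TCP SYN Flood Protection\nnet.ipv4.tcp_syncookies = 0\n"
    print(sysctl_setting(example, "net.ipv4.tcp_syncookies"))
```

The last assignment wins, so a later line in the file overrides an earlier one for the same key.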

Further reading


After upgrading to MarkLogic 10.x from any previous version of MarkLogic, Warning and Notice level messages such as the following may be observed in the ErrorLogs:

Warning: Lexicon '/var/opt/MarkLogic/Forests/Documents/00000006/c4ea1b602ee84a34+Lexicon' collation='' out of order

Notice: Repairing out of order lexicon /var/opt/MarkLogic/Forests/Documents/00000006/c4ea1b602ee84a34+Lexicon collation '' version 0 to 602

Warning: String range index /space/Forests/Documents/0006ef0e/c0dc932d1b4bcaae-37c6e3905909f64e+string collation '' out of order. 

Notice: Repairing out of order string range index /space/Forests/Documents/0006ef0e/c0dc932d1b4bcaae-37c6e3905909f64e+string collation '' version 0 to 602

Starting with MarkLogic 10.0, the server now automatically checks for any lexicons or string range indexes that may be in need of repair.  Lexicons and range indexes perform "self-healing" in non-read-only stands whenever a lexicon/range index is opened within the stand.


This is due to changes introduced to the behavior of MarkLogic's root collation.

Starting with MarkLogic 10.0, the root collation has been modified, along with all collations that derive from it, which means there may be some subtle differences in search ordering.

For more information on the specifics of these changes, please refer to

This helps the server to support newer collation features, such as reordering entire blocks of script characters (for example: Latin, Greek, and others) with respect to each other. 

Implementing these changes has, under some circumstances, improved the performance of wildcard matching by more effectively limiting the character ranges that search scans (and returns) for wildcard-based matching.

Based on our testing, we believe this new ordering yields better performance in a number of circumstances, although it does create the need to perform full reindexing of any lexicon or string range index using the root collation.

MarkLogic Server will now check lexicons and string range indexes and will try to repair them where necessary.  During the evaluation, MarkLogic Server will skip making further changes if any of the following conditions apply:

(a) They are already ordered according to the latest specification provided by ICU (1.8 at the time of writing)

(b) MarkLogic Server has already checked the stand and associated lexicons and indexes

(c) The indexes use codepoint collation (in which case, MarkLogic Server will be unable to change the ordering).

Whenever MarkLogic performs any repairs, it will always log a message at Notice level to inform users of the changes made.  If for any reason, MarkLogic Server is unable to make changes (e.g. a forest is mounted as read-only), MarkLogic will skip the repair process and nothing will be logged.

As these changes have been introduced from MarkLogic 10 onwards, you will most likely observe these messages in cases where recent upgrades (from prior releases of the product) have just taken place.

Repairs are performed on a stand by stand basis, so if a stand does not contain any values that require ordering changes, you will not see any messages logged for that stand.

Also, if any ordering issues are encountered during the process of a merge of multiple stands, there will only be one message logged for the merge, not one for each individual stand involved in that merge.


  • Repairs will take place for any stand that has been found to have a lexicon or string index that has an out-of-order and out-of-date (e.g. utilising a collation described by an earlier version of ICU) collation, unless that stand is mounted as read only.
  • Any repair will generate Notice messages when maintenance takes place.
  • This check/repair takes place whenever a lexicon or string range index is opened: for any string range index, lexicon call (e.g. cts:values), or range query (e.g. cts:element-range-query), and during merges.
  • The check looks for ICU version mismatches together with items that are out of order, so for any lexicon or string range index that has the older ordering but requires no changes, no further action will be taken for that stand.
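
To see how many repairs have taken place after an upgrade, the Notice level messages can be counted by type. The sketch below assumes the message wording shown in the examples at the top of this article; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the "Repairing out of order ..." Notice messages shown in this article.
REPAIR = re.compile(
    r"Notice: Repairing out of order (?P<kind>lexicon|string range index) "
    r"(?P<path>\S+) collation '(?P<collation>[^']*)' "
    r"version (?P<old>\d+) to (?P<new>\d+)"
)


def summarize_repairs(lines):
    """Count repair messages by kind ('lexicon' or 'string range index')."""
    counts = Counter()
    for line in lines:
        match = REPAIR.search(line)
        if match:
            counts[match.group("kind")] += 1
    return counts
```

Because search is used rather than match, the pattern also works on log lines that start with a timestamp.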

Known side effects

If the string range index or lexicon is very large, repairing can cause some performance overhead and may impact search performance during the repair process.


These messages can be avoided by issuing a full reindex of your databases immediately after performing your upgrade to MarkLogic 10.


Forests in MarkLogic Server may be in one of several mount states. On mounting, local disk failover forests or database replication forests should both eventually reach the sync replicating or async replicating state. There are occasions, however, where local disk failover or database replication forests will sometimes get stuck in the wait replication state. This knowledgebase article will itemize many of these wait replication scenarios, as well as the operational tactics to use in response. 

Wait replication scenarios

Wait replication as a result of lack of quorum

A quorum in MarkLogic Server represents more than 50% of the total number of nodes in the cluster. It's very important to note that this is the total number of nodes, regardless of group membership, forest assignment, or whether nodes are running: if a machine exists in the hosts.xml configuration file and in the list of hosts in the Admin UI, it contributes to the total count.

While it's possible to run a MarkLogic cluster with only a subset of the configured nodes up, it's not a recommended configuration. In addition, if the number of active nodes in your cluster falls below the greater than 50% quorum threshold, you might run into forests in the wait replication state due to the lack of quorum.
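
The quorum rule described above is simple arithmetic, sketched here with hypothetical function names. The total must count every configured host, whether or not it is running.

```python
def has_quorum(total_configured_hosts, active_hosts):
    """True when strictly more than 50% of all configured hosts are active."""
    return 2 * active_hosts > total_configured_hosts


def min_hosts_for_quorum(total_configured_hosts):
    """Smallest number of active hosts that is more than half of the total."""
    return total_configured_hosts // 2 + 1
```

For example, a four-host cluster with two hosts down does not have quorum, because two out of four is exactly 50% rather than more than 50%.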

What to do about it? You'll need to alter your cluster's configuration to meet the quorum requirement. That can mean either removing missing nodes from the cluster's configuration (essentially telling the cluster to stop looking for those missing nodes), or alternatively bringing up nodes that are currently part of the configuration, but not actively returning heartbeats (effectively letting the cluster see nodes it expects to be there). 

You can read more about quorum at the following knowledgebase articles:

Wait replication as a result of mixed file permissions

The root MarkLogic process is simply a restarter process which waits for the non-root (daemon) process to exit. If the daemon process exits abnormally, for any reason, the root process will fork and exec another process under the daemon process. The root process runs no XQuery scripts, opens no sockets, and accesses no database files. While it's possible to run the MarkLogic process as a non-root user, be very careful about forest file permissions - if your configured MarkLogic user doesn't have the necessary permissions, you might see wait replication and an inability to correctly failover to local disk failover forests when necessary - in which case you'll need to set your forest file permissions correctly to move forward. You can read more about running the MarkLogic process as a non-root user at:

Wait replication due to upgrading in the wrong order

Per our documentation, when upgrading you must first upgrade your replica environment, then subsequently upgrade your master environment.

If your cluster upgrades aren't done in the correct order, you will need to:

  1. Decouple your master and replica clusters, then stop the replica cluster

  2. Edit your replica cluster's databases.xml to remove entries with Security database replication

  3. Start the replica cluster, beginning with the node that hosts the Security forest

  4. Manually run http://(hostname of node hosting the Security forest):8001/security-upgrade-go.xqy?force=true

  5. Re-couple your master and replica clusters

You can read more about upgrading environments using database replication at:

Wait replication because you downgraded

MarkLogic Server does not support downgrades. If you do attempt to downgrade your installation, your replica forests will be stuck in wait replication.

What to do about it? As in the case of upgrading in the wrong order, you'll need to manually run http://(hostname of node hosting the Security forest):8001/security-upgrade-go.xqy?force=true. You can read more about MarkLogic Server and downgrades at:

Wait replication because your master and replica forest names don't match

By default, the "Connect Forests by Name" option is set to true. This means the server has certain expectations around how master and replica forests should be named.

What to do about it? Set "Connect Forests by Name" to false, then manually connect master and replica forests. You can read more about wait replication due to forest name mismatch at:

Wait replication as a result of merge blackouts (completely disabled merges)

What is merging and why do we need merge blackouts?

MarkLogic Server does lazy deletes, which marks documents obsolete (but doesn't actually delete them). Merges are when obsolete documents are actually deleted - in bulk, while also optimizing your data. Merge blackouts prevent this deferred deletion and optimization from happening. Merge blackouts can also sometimes result in wait replication. Consider a database that has both master and local disk failover forests where you have configured a merge blackout with the “disable merges completely” option (instead of “limit merges to” option). If a node failure on any of the nodes holding some of these forests were to occur during the merge blackout period, as soon as the failed node comes back online, all the forests associated with that specific node go into a “wait replication” state until the merge blackout period ends or is manually removed.


  • Avoid completely disabling merges
  • If you do need to control merges, it's much better to set the maximum merge size in your blackout to a smaller number (“limit merges to” option)


When configuring database replication, it is important to note that the Connect Forests by Name field is true by default. This works great because, when new forests of the same name are later added to the Master and Replica databases, they will be automatically configured for Database Replication.

The issue

The problem arises when you use replica forest names that do not match the original Master forest names. In that case, you may find that failover events cause forests to get stuck in the Wait Replication state. The usual methods of failing back to the designated masters will not work - restarting the replicas will not work, and neither will shutting down cluster/removing labels/restarting cluster.


In this case, the way to fix the issue is to set Connect Forests by Name to false, and then you must manually connect the Master forests on the local cluster to the Replica forests on the foreign cluster, as described in the documentation: Connecting Master and Replica Forests with Different Names.

It is worth noting that, starting with MarkLogic 7, you are also allowed to rename the replica forests. Once you rename the replica forests to the same name as the forest name of the designated master database (e.g., the Security database should have a Security forest in both the master and replica), then they will be automatically configured for Database Replication, as expected.


This article will show you how to add a Fast Data Directory (FDD) to an existing forest.


The fast data directory stores transaction journals and stands. When the directory becomes full, larger stands will be merged into the data directory. Once the size of the fast data directory approaches its limit, then stands are created in the data directory.

Although it is not possible to add an FDD path to a currently-existing forest, it is possible to do the following:

1. Destroy an existing forest configuration (while preserving the data)

2. Recreate a forest with the same name and data, with an FDD added


The queries below illustrate steps one and two of the process. Note that you can also do this with Admin UI.

The query below will delete the forest configurations but not data.


1. Schedule a downtime window for this procedure (DO NOT DO THIS ON A LIVE PRODUCTION SYSTEM)

2. Ensure that all ingestion and merging has stopped

3. To be on the safe side, take a backup of the forest before applying this in production

4. Detach the forest before running these queries

1) Use the following API to delete an existing forest configuration

NOTE: make sure to set the $delete-data parameter to false().

admin:forest-delete(
$config as element(configuration),
$forest-ids as xs:unsignedLong*,
$delete-data as xs:boolean
) as element(configuration)

2) Use the following API to create a new forest pointing to the old data directory, with the FDD configured:

admin:forest-create(
$config as element(configuration),
$forest-name as xs:string,
$host-id as xs:unsignedLong,
$data-directory as xs:string?,
[$large-data-directory as xs:string?],
[$fast-data-directory as xs:string?]
) as element(configuration)

Here's an example query that uses these APIs:

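xquery version "1.0-ml";

declare namespace html = "http://www.w3.org/1999/xhtml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()

(: preserve some path values from the old forest :)
let $forest-name := "YOUR_FOREST_NAME"
let $new-fast-data := "YOUR_NEW_FAST_DATA_DIR"
let $forest-id := admin:forest-get-id($config, $forest-name)
let $old-data := admin:forest-get-data-directory($config, $forest-id)
let $old-large-data := admin:forest-get-large-data-directory($config, $forest-id)

(: delete the forest configuration only; fn:false() keeps the data on disk :)
let $config := admin:forest-delete($config, $forest-id, fn:false())
let $_ := admin:save-configuration($config)

(: recreate the forest on this host with the same name and directories, plus the FDD :)
let $config1 := admin:get-configuration()
let $config1 := admin:forest-create($config1, $forest-name, xdmp:host(),
  $old-data, $old-large-data, $new-fast-data)
return admin:save-configuration($config1)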

You can create and attach the forest in a single transaction. This is also possible using the admin UI (as two separate transactions); i.e., deleting only configuration of forest without data.

After attaching the forest, please reindex and data will then migrate to FDD. Note that the sample query needs to be executed on the host where the forest resides.




MarkLogic has shipped with a REST API since MarkLogic 7.

In MarkLogic 8 the REST API was vastly expanded, allowing ways for MarkLogic Database administrators to manage almost all common MarkLogic administration tasks over an HTTP connection to MarkLogic's REST endpoints.

This Knowledgebase article will cover some examples of common administration tasks and will show some working examples to give you a taste of what can be done if you're using the latest version of MarkLogic Server.

While there are a significant number of examples throughout our extensive documentation in this area, many of these make use of CURL. In this Knowledgebase article, we're going to use XQuery calls to demonstrate how the payloads are structured.

Creating a backup using a call to the REST API (XQuery)

In the example code below, we demonstrate a call that will perform a backup of the Documents forest which places the backup in the /tmp directory.

Running the query in the above code example will return a response (in JSON format) containing a job ID for the requested task:

{
"job-id": "4903378997555340415", 
"host-name": "yourhostnamehere"
}

The next example will demonstrate a status check for a given job ID

Query the status of an active or recent job

The above query will return a response that would look like this:

{
"job-id": "4903378997555340415", 
"host-name": "yourhostnamehere", 
"status": "completed"
}

Further reading on the MarkLogic REST API:

Alternatives to Configuration Manager


The MarkLogic Server Configuration Manager provided a read-only user interface to the MarkLogic Admin UI and could be used for saving and restoring configuration settings. The Configuration Manager tool was deprecated starting with MarkLogic 9.0-5, and is no longer available in MarkLogic 10.


There are a number of alternatives to the Configuration Manager. Most of the options take advantage of the MarkLogic Admin API, either directly or behind the scenes. The following is a list of the most commonly used options:

  • Manual Configuration
  • ml-gradle
  • Configuration Management API

Manual Configuration

For a single environment, the following Knowledgebase article covers the process of Transporting Resources to a New Cluster.


For a repeatable process, the most widely used approach is ml-gradle.

A project would be created in Gradle, with the desired configurations. The project can then be used to deploy to any environment - test, prod, qa etc - creating a known configuration that can be maintained under source control, which is a best practice.

Similar to Configuration Manager, ml-gradle also allows for exporting the configuration of an existing cluster. You can refer to transporting configuration using ml-gradle for more details.

While ml-gradle is an open source community project that is not directly supported, it enjoys very good community and developer support.  The underlying APIs that ml-gradle uses are fully supported by MarkLogic.

Configuration Management API

An additional option is to use the Configuration Management API directly to export and import resources.


Both ml-gradle and the Configuration Management API use the MarkLogic Admin API behind the scenes but, for most use cases, our recommendation is to use ml-gradle rather than writing the same functionality from scratch.

Alternatives to Ops Director


The MarkLogic Ops Director provided a basic dashboard for monitoring the health of one or more MarkLogic Server clusters, and for sending out basic alerts based on predefined conditions. It has been deprecated starting with MarkLogic 10.0-5, and will no longer be supported as of November 14, 2021. Our experience has shown that our customers are most effective monitoring MarkLogic Servers by integrating commercial off the shelf monitoring tools with our Management APIs.

Monitoring DHS

Note: Our Data Hub Service customers are not impacted by this announcement. One of the benefits of using our Data Hub Service is that the MarkLogic Corporation will manage and monitor the underlying MarkLogic Server instances for you.


There are a number of different alternatives to Ops Director, depending on your requirements, and existing monitoring infrastructure. Ops Director used the Management API to obtain the required information, specifically the /manage/v2/logs endpoint to read the logs and look for any "Critical" or "Error" messages using a Regular Expression (Regex). These endpoints are still available, and could be leveraged by administrators with shell or python scripts, which could also include alerting.
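
As a rough illustration of the scripted approach, the sketch below filters log text for "Critical" and "Error" entries using a regular expression. It assumes the log text has already been retrieved (for example from the /manage/v2/logs endpoint mentioned above; the request itself is not shown) and that each line starts with a date, a time, and a level, as in MarkLogic error log entries. The function name is hypothetical.

```python
import re

# Assumed line shape: "<YYYY-MM-DD> <HH:MM:SS.mmm> <Level>: <message>"
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ "
    r"(?P<level>[A-Za-z]+): (?P<message>.*)$"
)


def severe_entries(log_text, levels=("Critical", "Error")):
    """Return (level, message) pairs for log lines at the given levels."""
    entries = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            entries.append((match.group("level"), match.group("message")))
    return entries
```

The returned list could then be passed to whatever alerting mechanism is already in place, such as email or a chat webhook.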

Detecting and Reporting Failover Events

If there is also a requirement to monitor at the Host or Database level there are REST API endpoints available for any scripted solution. Performance related data stored in the Meters database can also be accessed via REST.

The MarkLogic Monitoring History can also be extended to provide some basic visualizations.

Hacking Monitoring History

Commercial Alternatives

If your requirements are more complex than can be easily met by the options above, there are many commercial monitoring solutions that can be used to monitor MarkLogic. These include Elastic/Kibana, Splunk, DataDog and NewRelic, among others. Many organizations are already using enterprise monitoring applications provided by a commercial vendor. Leveraging the existing solutions will typically be the simplest option. If a monitoring solution already exists within your organization, you can check to see if there is an existing plugin, extension or library for monitoring MarkLogic.

If a plugin, extension or library does not already exist, most monitoring solutions also allow for retrieving data from REST endpoints, allowing them to pull metrics directly from MarkLogic even if there is not a pre-existing solution.

Available Plugins - Extensions - Libraries

Here is a sample of some of the available options that are being used by other customers to monitor their MarkLogic infrastructure. These options are being listed here for reference only. MarkLogic Support does not provide support for any issues encountered using the products mentioned here. Please refer to the solution vendor, or the github project page for any issues encountered.


MarkLogic Monitoring for Splunk provides configurations and pre-built dashboards that deliver real-time visibility into Error, Access, and Audit log events to monitor and analyze MarkLogic logs with Splunk.


Monitor MarkLogic with Datadog




New Relic

MarkLogic New Relic Plugin

Note: Currently there is a published New Relic Plugin that works with the latest versions of MarkLogic. However, New Relic has decided to deprecate plugins in favor of New Relic Integrations. Currently New Relic has limited plugin access to accounts that have accessed plugins in the last 30 days, but they plan to discontinue this access in June, 2021.

Other Resources


On Internet Explorer 9 and Internet Explorer 10, the Application Services UI should be run in Compatibility Mode.


When using the Application Services UI in Internet Explorer 9 or Internet Explorer 10, you may notice some minor UI bugs.  These minor UI bugs occur just within MarkLogic Application Services, NOT within applications built with it.  These UI bugs can be avoided if you run IE 9 or IE 10 in compatibility view.

Instructions on how to configure compatibility modes in IE 9 or IE 10: 

1. Press ALT-T to bring up the Tools menu
2. On the Tools menu, click 'Compatibility View Settings' 
3. Add the domain to the list of domains to render in compatibility view.


A question that customers frequently ask is for advice on managing backups outside the standard XQuery APIs or the web interface provided by MarkLogic.

This Knowledgebase article demonstrates two approaches to allow you to integrate the backup of a MarkLogic database into your dev-ops workflow by allowing such processes to be scripted or managed outside the product.

Creating a backup using the ReST API

You can use the ReST API to perform a database backup and to check on the status at any given time.

The examples listed below use XQuery to make the calls to the ReST API over http but you could similarly adapt the below examples to work with cURL - examples will also be given for this approach.

The process

Here is an example that demonstrates a backup of the Documents database:

Running this should give you a job id as part of the response (in this example, we're using JSON to format the response but this can easily be changed by modifying the headers elements in the above sample to return application/xml instead):

{"job-id":"8774639830166037592", "host-name":"yourhostnamehere"}

Below is an example that demonstrates checking for the status of a given backup with the job-id given in the first step:

Example: using cURL (instead of XQuery)

Adapting the above examples so they work from cURL instead, you can generate a call that looks like this:

curl -s -X POST --anyauth -u username:password --header "Content-Type: application/json" -d '{"operation": "backup-database", "backup-dir": "/tmp/backup", "journal-archiving": true, "include-replicas": true}' "http://localhost:8002/manage/v2/databases/Documents?format=json"

And to check on the status, the cURL payload could be modified to look like this:

{"operation": "backup-status", "job-id" : "8774639830166037592","host-name": "yourhostnamehere"}
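The same two requests can also be sketched in Python using only the standard library. This is a minimal sketch rather than an official client: it assumes the Manage endpoint on localhost:8002 shown in the cURL example and digest authentication (cURL's --anyauth would negotiate whichever scheme the server offers). The function names are hypothetical.

```python
import json
import urllib.request

# Endpoint taken from the cURL example above; adjust host and port as needed.
MANAGE_URL = "http://localhost:8002/manage/v2/databases/{db}?format=json"


def backup_payload(backup_dir, journal_archiving=True, include_replicas=True):
    """Build the JSON body that starts a database backup."""
    return {
        "operation": "backup-database",
        "backup-dir": backup_dir,
        "journal-archiving": journal_archiving,
        "include-replicas": include_replicas,
    }


def status_payload(job_id, host_name):
    """Build the JSON body that asks for the status of a backup job."""
    return {"operation": "backup-status", "job-id": job_id, "host-name": host_name}


def build_request(database, payload):
    """Create (but do not send) the POST request for the given database."""
    return urllib.request.Request(
        MANAGE_URL.format(db=database),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, username, password):
    """Send the request with digest authentication (an assumption) and parse the JSON reply."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, request.full_url, username, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    with opener.open(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical flow would call send(build_request("Documents", backup_payload("/tmp/backup")), user, password), read "job-id" and "host-name" from the reply, and then poll with status_payload until the job completes.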

Further reading


Customers using the MarkLogic AWS CloudFormation Templates may encounter a situation where someone has deleted an EBS volume that stored MarkLogic data (mounted at /var/opt/MarkLogic). Because the volume and its data are no longer available, the host is unable to rejoin the cluster.

Getting the host to rejoin the cluster can be complicated, but it will typically be worth the effort if you are running an HA configuration with Primary and Replica forests.

This article details the procedures to get the host to rejoin the cluster.

Preparing the New Volume and New Host

The easiest way to create the new volume is using a snapshot of an existing host's MarkLogic data volume.  This saves the work of manually copying configuration files between hosts, which is necessary to get the host to rejoin the cluster.

In the AWS EC2 Dashboard:Elastic Block Store:Volumes section, create a snapshot of the data volume from one of the operational hosts.

Next, in the AWS EC2 Dashboard:Elastic Block Store:Snapshots section, create a new volume from the snapshot in the correct zone and note the new volume id for use later.

(optional) Update the name of the new volume to match the format of the other data volumes

(optional) Delete the snapshot

Edit the Auto Scaling Group with the missing host to bring up a new instance, by increasing the Desired Capacity by 1

This will trigger the Auto Scaling Group to bring up a new instance. 

Attaching the New Volume to the New Instance

Once the instance is online and startup is complete, connect to the new instance via ssh.

Ensure MarkLogic is not running, by stopping the service and checking for any remaining processes.

  • sudo service MarkLogic stop
  • pgrep -la MarkLogic

Remove /var/opt/MarkLogic if it exists and is located on the root partition.

  • sudo rm -rf /var/opt/MarkLogic

Edit /var/local/mlcmd and update the volume id listed in the MARKLOGIC_EBS_VOLUME variable to the volume created above.

  • MARKLOGIC_EBS_VOLUME="[new volume id],:25::gp2::,*"

Run mlcmd to attach and mount the new volume to /var/opt/MarkLogic on the instance

  • sudo /opt/MarkLogic/mlcmd/bin/mlcmd init-volumes-from-system
  • Check that the volume has been correctly attached and mounted

Remove contents of /var/opt/MarkLogic/Forests (if they exist)

  • sudo rm -rf /var/opt/MarkLogic/Forests/*

Run mlcmd to sync the new volume information to the DynamoDB table

  • sudo /opt/MarkLogic/mlcmd/bin/mlcmd sync-volumes-to-mdb

Configuring MarkLogic With Empty /var/opt/MarkLogic

If you did not create your volume from a snapshot as detailed above, complete the following steps. If you created your volume from a snapshot, skip these steps and continue with "Configuring MarkLogic and Rejoining Existing Cluster".

  • Start the MarkLogic service, wait for it to complete its initialization, then stop the MarkLogic service:
    • sudo service MarkLogic start
    • sudo service MarkLogic stop
  • Move the configuration files out of /var/opt/MarkLogic/
    • sudo mv /var/opt/MarkLogic/*.xml /secure/place (using default settings; destination can be adjusted)
  • Copy the configuration files from one of the working instances to the new instance
    • Configuration files are stored here: /var/opt/MarkLogic/*.xml
    • Place a copy of the xml files on the new instance under /var/opt/MarkLogic

Configuring MarkLogic and Rejoining Existing Cluster

Note the host-id of the missing host found in /var/opt/MarkLogic/hosts.xml

  • For example, if the missing host is ip-10-0-64-14.ec2.internal
    • sudo grep "ip-10-0-64-14.ec2.internal" -B1 /var/opt/MarkLogic/hosts.xml

  • Edit /var/opt/MarkLogic/server.xml and update the value for host-id to match the value retrieved above
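The host-id lookup can also be scripted instead of using grep. The following is a minimal sketch that assumes hosts.xml contains host elements with host-id and host-name children, which is consistent with the grep -B1 command above but is not verified against every MarkLogic version. The sample document and its id value are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def find_host_id(hosts_xml_text, host_name):
    """Return the host-id recorded for host_name, or None if it is not listed."""
    root = ET.fromstring(hosts_xml_text)
    for element in root.iter():
        if local_name(element.tag) != "host":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in element}
        if fields.get("host-name") == host_name:
            return fields.get("host-id")
    return None


# Hypothetical sample; the real file lives at /var/opt/MarkLogic/hosts.xml.
SAMPLE_HOSTS_XML = """<hosts xmlns="http://marklogic.com/xdmp/hosts">
  <host>
    <host-id>1111111111111111111</host-id>
    <host-name>ip-10-0-64-14.ec2.internal</host-name>
  </host>
  <host>
    <host-id>2222222222222222222</host-id>
    <host-name>ip-10-0-64-15.ec2.internal</host-name>
  </host>
</hosts>"""
```

The returned value is the one to place in the host-id element of server.xml on the new instance.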

Start MarkLogic and view the ErrorLog for any issues

  • sudo service MarkLogic start; sudo tail -f /var/opt/MarkLogic/Logs/ErrorLog.txt

You should see messages about forests synchronizing (if you have local disk failover enabled, with replicas) and changing states from wait or async replication to sync replication.  Once all the forests are either 'open' or 'sync replicating', then your cluster is fully operational with the correct number of hosts.

At this point you can fail back to the primary forests on the new instances to rebalance the workload for the cluster.

If you disabled the 'xdqp ssl enabled' setting as part of these procedures, you can now re-enable it by setting the value to true on the Group Configuration page.

Update the Userdata In the Auto Scaling Group

To ensure that the correct volume will be attached if the instance is terminated, the Userdata needs to be updated in a Launch Configuration.

Copy the Launch Configuration associated with the missing host.

Edit the details

  • (optional) Update the name of the Launch Configuration
  • Update the User data variable MARKLOGIC_EBS_VOLUME and replace the old volume id with the id for the volume created above.
    • MARKLOGIC_EBS_VOLUME="[new volume id],:25::gp2::,*"
  • Save the new Launch Configuration

Edit the Auto Scaling Group associated with the new node

Change the Launch Configuration to the one that was just created and save the Auto Scaling Group.

Next Steps

Now that normal operations have been restored, it's a good opportunity to ensure you have all the necessary database backups, and that your backup schedule has been reviewed to ensure it meets your requirements.

Backup/Restore settings for Local Disk Failover

When configuring backups for a database, the 'include replica forests' setting is important for handling forest failover events. When 'include replica forests' is set to 'true', both the master and the replica forests will be included in the database backup.

This KB article will go over an example failover scenario, and will show how a scheduled backup/restore works with different 'include replica forests' and 'journal archiving' settings.


Consider a 3 node cluster with hosts Host-A, Host-B and Host-C; and a database 'backup-test' with the following forest assignments: (forests ending with 'p' are primary and those ending with 'r' are replica).  Under normal conditions, the primary forests will be in 'open' state, and the replica forests will be in the 'sync replicating' state.

Host A                        | Host B                        | Host C
forest-1p (open)              | forest-2p (open)              | forest-3p (open)
forest-3r (sync replicating)  | forest-1r (sync replicating)  | forest-2r (sync replicating)

Failover and Forest states

Now consider what happens when Host-A goes offline. When failover completes for Host-A's primary forest, its replica forest takes over. The forest states will then be as follows:

Host A                | Host B            | Host C
forest-1p (disabled)  | forest-2p (open)  | forest-3p (open)
forest-3r (disabled)  | forest-1r (open)  | forest-2r (sync replicating)

Backup Examples: 

When 'Include replica forests' is false and 'Journal Archiving' is true

forest-1p is disabled, and the corresponding replica forest-1r is now open because of the failover. In this case a backup task will not succeed during this time because replica forests have not been configured for backups. The following 'Warning' level message will be logged:

Warning: Not backing up database backup-test because first forest master forest-1p is not available, and replica backups aren't enabled

When Host-A is brought up again, the forest states will be

forest-1p - sync replicating
forest-1r - open

At this time, backups will succeed and because journal archiving is enabled, journals will be written to the backup data.

However, you will not be able to do a "point in time restore" using journal archiving. When the configured master is not the acting master and backup is not enabled for replicas, the following error occurs when a restore to a point in time is attempted:

Operation failed with error message: xdmp:database-restore((xs:unsignedLong("5138652658926200166"), "/space/20160927-1125008228810", xs:dateTime("2016-09-27T11:06:21-07:00"), fn:true(), ()) -- Unable to restore replica forest forest-1r because the master forest forest-1p is not also restored, or is not acting master. Check server logs.

To get past this, the forests need to be failed back so that the 'configured master' is the same as the 'acting master'.

When 'Include replica forests' is true and 'Journal Archiving' is true

In this case, backups will succeed when forests are failed over to their replica forests because replica forests are configured for backups. And, because journal archiving is enabled, journals will also be written to the backup data.

As in the previous case, however, a point in time restore will not work until the forests are failed back.
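The outcomes described in these two scenarios can be summarized as a small decision model. This is only an illustrative sketch of the rules stated in this article, not MarkLogic's actual logic; the class and function names are hypothetical, and forest states are reduced to the three values used in the tables above.

```python
from dataclasses import dataclass


@dataclass
class ForestPair:
    configured_master: str
    replica: str
    master_state: str   # "open", "sync replicating", or "disabled"
    replica_state: str  # "open", "sync replicating", or "disabled"

    @property
    def acting_master(self):
        """The forest currently serving as master, or None if neither is open."""
        if self.master_state == "open":
            return self.configured_master
        if self.replica_state == "open":
            return self.replica
        return None


def backup_can_run(pairs, include_replicas):
    """A backup is skipped when a configured master is unavailable and replica backups are off."""
    return all(p.master_state != "disabled" or include_replicas for p in pairs)


def point_in_time_restore_can_run(pairs):
    """A point in time restore needs every configured master to be the acting master."""
    return all(p.acting_master == p.configured_master for p in pairs)


# The three situations discussed above, for the forest-1 pair only.
failed_over = [ForestPair("forest-1p", "forest-1r", "disabled", "open")]
host_a_back = [ForestPair("forest-1p", "forest-1r", "sync replicating", "open")]
failed_back = [ForestPair("forest-1p", "forest-1r", "open", "sync replicating")]
```

With these definitions, backup_can_run(failed_over, include_replicas=False) is False, while point_in_time_restore_can_run only becomes True for failed_back.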

Related documentation

MarkLogic Administrator's Guide: Backing up and Restoring a Database Following Local Disk Failover 

MarkLogic Administrator's Guide: Restoring Databases with Journal Archiving

MarkLogic Knowledgebase Article: Understanding the role of journals in relation to backup and restore journal archiving

MarkLogic Knowledgebase Article: Database backup / restore and local disk failover

Before executing significant operational procedures on production systems, such as

  • Production Go Live events;
  • Major version Upgrades;
  • Adding/removing nodes to a cluster;
  • Deploying a new application or an application upgrade;
  • ...

MarkLogic recommends:

  • Thorough testing of any operational procedures on non-production systems.
  • Opening a ticket with MarkLogic Technical Support to give them a heads up, along with any useful collateral that would help expedite diagnostics of issues if any occur, such as
    • The finalized plan & timeline or schedule of the operational procedure
    • A support dump, taken before the operational procedure, to record the configuration of the system ahead of time. This may come in handy if an incident occurs, as we may want to know the actual changes that were made. You can create a MarkLogic Server support dump from the Admin UI by selecting the 'Support' tab; select scope=cluster, detail=status only, destination=browser, then save the output to disk. Attach the support dump to the ticket as a file, either as an email attachment or by uploading it through our support portal.
    • A few days of error logs from before the operational event so that we can determine whether artifacts in the error logs are new or whether they existed prior to the event.
    • You can alternatively turn Telemetry on before the event and force an upload of the support dump & error logs.
    • Any architecture or design details of the system that you are able to share.
  • Please make sure that all individuals who are responsible for the event and who may need to contact the MarkLogic Technical Support team are registered MarkLogic Support contacts. They should register for an account before the event, as ONLY registered support contacts can create a ticket with MarkLogic Technical Support. We do not want registration and entitlement verification to get in the way of the ability to work on an urgent production issue.
  • Review the MarkLogic Support Handbook - The following sections in the "HOW TO RECEIVE SUPPORT SERVICES" chapter of the handbook are useful to be acquainted with before an incident occurs
    • Section: What to do Prior to Logging a Service Request 
    • Section: Working with Support
    • Section: Escalation Process
    • Section: Understanding Case Priority and Response Time Targets
  • For urgent issues (production outages), remember that you can raise an urgent incident per the instructions in the support handbook; MarkLogic takes urgent incidents seriously, as every urgent issue results in a text message being sent to every support engineer, engineering management and the senior executive at MarkLogic. 
  • Enable Debug level logging so that any issues that arise can be more easily diagnosed.  Debug level logging does not have any noticeable impact on system performance.


In some cases it is necessary to change the default environment variables of a MarkLogic Server installation or configuration.

Making Changes to Defaults

When changes to the default configurations need to be made, we recommend using /etc/marklogic.conf to make those changes. The file will not exist in a default installation, and should be manually created. We recommend the file only contain the variables that are being changed or added. This file will also be unaffected by MarkLogic upgrades.

Note: We do not recommend making changes to /etc/sysconfig/MarkLogic, as this file is part of the MarkLogic installation package, and it may be replaced or changed during a MarkLogic upgrade with no notification. Any direct file customizations will be overwritten and lost, which can result in various problems when the MarkLogic service is restarted.

During startup, MarkLogic will first source its own environment variable file, and then it will source /etc/marklogic.conf, which ensures the locally defined variables take precedence.

Changing the Default Data Directory

A common use of the /etc/marklogic.conf file is to change the default data directory (/var/opt/MarkLogic).

export MARKLOGIC_DATA_DIR="/my/custom/path/MarkLogic"

If that file exists when the server is first initialized, then MarkLogic will run from the custom location. If MarkLogic has already been initialized, then you may need to stop the service and manually move /var/opt/MarkLogic to your custom location.
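If /etc/marklogic.conf is generated by provisioning tooling rather than written by hand, a small helper can keep it limited to the overridden variables and avoid shell syntax slips such as spaces around the equals sign. The following is a minimal sketch; the helper name and its validation rules are assumptions, and MARKLOGIC_DATA_DIR is the variable listed under Common Configurable Variables.

```python
import re

# Shell-style variable names: letters, digits and underscores, not starting with a digit.
VALID_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def render_marklogic_conf(overrides):
    """Render only the overridden variables as shell 'export NAME="value"' lines."""
    lines = []
    for name, value in overrides.items():
        if not VALID_NAME.match(name):
            raise ValueError(f"not a valid shell variable name: {name!r}")
        # Reject characters that would need escaping inside double quotes.
        if any(ch in value for ch in '"$`\\'):
            raise ValueError(f"value for {name} needs manual quoting: {value!r}")
        lines.append(f'export {name}="{value}"')
    return "\n".join(lines) + "\n"
```

The returned text can then be written to /etc/marklogic.conf by whatever mechanism your deployment already uses.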

Using the MarkLogic AMI

When using the MarkLogic AMI without the MarkLogic CloudFormation template, it is necessary to create /etc/marklogic.conf to disable the Managed Cluster feature.


If this is done after the instance is launched, then you may encounter the issue mentioned in the KB SVC_SOCHN Warning During Start Up on AWS.

Common Configurable Variables

  • MARKLOGIC_INSTALL_DIR - Where the MarkLogic binaries are installed
  • MARKLOGIC_DATA_DIR - Where MarkLogic stores configurations and forest data
  • MARKLOGIC_EC2_HOST - Whether MarkLogic will utilize EC2 specific features and settings
  • MARKLOGIC_AZURE_HOST - Whether MarkLogic will utilize Azure specific features and settings
  • MARKLOGIC_MANAGED_NODE - Whether MarkLogic will utilize the Managed Cluster feature
  • MARKLOGIC_USER - User that MarkLogic runs as
  • MARKLOGIC_HOSTNAME - Manually set the MarkLogic host name. Must be set prior to initialization or the hostname from the OS will be used
  • TZ - Allows for MarkLogic to operate with a different time zone setting than the OS

Further reading

Best Practice for Adding an Index in Production


It is sometimes necessary to remove or add an index on your production cluster. For a large database with more than a few GB of content, the resulting reindexing can be a time and resource intensive process that affects query performance while the server is reindexing. This article points out some strategies for avoiding some of the pain points associated with changing your database configuration on a production cluster.

Preparing your Server for Production

In general, high performance production search implementations run with tight controls on the automatic features of MarkLogic Server. 

  • Re-indexer disabled by default
  • Format-compatibility set to the latest format
  • Index-detection set to none.
  • On a very large cluster (several dozen or more hosts), consider running with expunge-locks set to none
  • On large clusters with insufficient resources, consider bumping up the default group settings
    • xdqp-timeout: from 10 to 30
    • host-timeout: from 30 to 90

The xdqp and host timeouts will prevent the server from disconnecting prematurely when a data-node is busy, possibly triggering a false failover event. However, these changes will affect the legitimate time to failover in an HA configuration. 

Preparing to Re-index

When an index configuration must be changed in production, you should:

  • First, index-detection should be set back to automatic
  • Then, the index configuration change should be made

When you have Database Replication Configured:

If you have to add or modify indexes on a database which has database replication configured, make sure the same changes are made on the Replica cluster as well. Starting with MarkLogic Server version 9.0-7, index data is also replicated from the Master to the Replica, but the server does not automatically check whether both sides have the same index settings. Reindexing is disabled by default on a replica cluster. However, when the database replication configuration is removed (such as after a disaster), the replica database will reindex as necessary. It is therefore important that the Replica database index configuration matches the Master's to avoid unnecessary reindexing.

Note: If you are on a version prior to 9.0-7 - When adding/updating index settings, it is recommended that you update the settings on the Replica database before updating those on the Master database; this is because changes to the index settings on the Replica database only affect newly replicated documents and will not trigger reindexing on existing documents.

Further reading -

Master and Replica Database Index Settings

Database Replication - Indexing on Replica Explained

  • Finally, the reindexer should be enabled during off-hours to reindex the content.

Reindexing works by reloading all the URIs that are affected by the index change. This process tends to create lots of new and deleted fragments, which then need to be merged. Given that reindexing is very CPU and disk I/O intensive, the re-indexer-throttle can be set to 3 or 2 to minimize the impact of the reindex.

After the Re-index

After the re-index has completed, it is important to return to the old settings by disabling the reindexer and setting index-detection back to none.

If you're reindexing over several nights or weekends, be sure to allow some time for the merging to complete. So for example, if your regular busy time starts at 5AM, you may want to disable the reindexer at around midnight to make sure all your merging is completed before business hours.

By following the above recommendations, you should be able to complete a large re-index without any disruption to your production environment.


MarkLogic Server can ingest and query all sorts of data such as XML, text, JSON, binary, generic, etc. There are some things to consider when choosing to simply load data "as-is" vs. doing some degree of data modeling or data transformation prior to ingestion.


Loading data "as-is" can minimize time and complexity during ingest or document creation. That can, however, sometimes mean more complex, slower performing queries. It may also mean more storage space intensive indexing settings.

In contrast, doing some degree of data transformation prior to ingestion can sometimes result in dramatic improvements in query performance and storage space utilization due to reduced indexing requirements.

An Example

A simple example will demonstrate how a data model can affect performance. Consider the data model used by Apple's iTunes:

<plist version="1.0">
  <key>Major Version</key><integer>10</integer>
  <key>Minor Version</key><integer>1</integer>
  <key>Application Version</key><string>10.1.1</string>
  <key>Show Content Ratings</key><true/>
    <key>Track ID</key><integer>290</integer>
    <key>Name</key><string>01-03 Good News</string>

Note the multiple <key> sibling elements, at multiple levels - where both levels are named the same thing (in this case, <dict>). Let's say you wanted to query a document like this for "Application Version." In this case, time will be spent performing index resolution for the encompassing element (here, <key>). Unfortunately, because there are multiple sibling elements all sharing the same element name, all of those sibling elements will need to be retrieved and then evaluated to see which of them actually match the given query criteria. Consider a slightly revised data model, instead:


<iTunesLibrary version="1.0">
    <name>01-03 Good News</name>

Here, we only need to query and therefore retrieve and evaluate the single <app-version> element, instead of multiple retrievals/evaluations as in the previous example data model.  
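A transformation of this kind could be applied before ingestion. The following is a minimal sketch that turns alternating key/value siblings into uniquely named elements. Only the app-version, name, and iTunesLibrary names come from the example above; the other element names are derived mechanically from the key text, and the wrapping dict element in the sample is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Explicit renames; keys not listed here are converted to lower-case, hyphenated names.
RENAMES = {"Application Version": "app-version"}


def element_name(key_text):
    """Map a plist key such as 'Major Version' to an element name such as 'major-version'."""
    if key_text in RENAMES:
        return RENAMES[key_text]
    return re.sub(r"[^a-z0-9]+", "-", key_text.lower()).strip("-")


def remodel(plist_fragment, root_name="iTunesLibrary"):
    """Convert alternating <key>/<value> children into one named element per key."""
    source = ET.fromstring(plist_fragment)
    target = ET.Element(root_name)
    children = list(source)
    for key, value in zip(children[0::2], children[1::2]):
        if key.tag != "key":
            raise ValueError("expected alternating <key> and value elements")
        named = ET.SubElement(target, element_name(key.text))
        # Empty values such as <true/> carry their meaning in the tag name.
        named.text = value.text if value.text is not None else value.tag
    return target


SAMPLE = """<dict>
  <key>Major Version</key><integer>10</integer>
  <key>Application Version</key><string>10.1.1</string>
  <key>Show Content Ratings</key><true/>
</dict>"""
```

Calling ET.tostring(remodel(SAMPLE), encoding="unicode") produces a document in which each value can be addressed by its own element name.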

At Scale

Although this is a simple example, when processing millions or even billions of records, eliminating small processing steps could have significant performance impact.


Handling large amounts of data can be expensive in terms of both computing resources and runtime. It can also sometimes result in application errors or partial execution. In general, if you're dealing with large amounts of data as either output or input, the most scalable and robust approach is to break up that workload into a series of smaller and more manageable batches.

Of course there are other available tactics. It should be noted, however, that most of those other tactics will have serious disadvantages compared to batching. For example:

  • Configuring time limit settings through Admin UI to allow for longer request timeouts - since you can only increase timeouts so much, this is best considered a short term tactic for only slightly larger workloads.
  • Eliminating resource bottlenecks by adding more resources – often easier to implement compared to modifying application code, though with the downside of additional hardware and software license expense. Like increased timeouts, there can be a point of diminishing returns when throwing hardware at a problem.
  • Tuning queries to improve your query efficiency – this is actually a very good tactic to pursue, in general. However, if workloads are sufficiently large, even the most efficient implementation of your request will eventually need to work over subset batches of your inputs or outputs.

For more detail on the above non-batching options, please refer to XDMP-CANCELED vs. XDMP-EXTIME.


1. If you can't break up the data into a series of smaller batches, use xdmp:save to write out the full results from Query Console to the desired folder, specified by a path on your file system. For details, see xdmp:save.

2. If you can break up the data into a series of smaller batches:

    a. Use batch tools like MLCP, which can export bulk output from MarkLogic Server to flat files, a compressed ZIP file, or an MLCP database archive. For details, see Exporting Content from MarkLogic Server.

    b. Reduce the size of the desired result set until it saves successfully, then save the full output in a series of batches.

    c. Page through the result set:

        i. If dealing with documents, cts:uris is excellent for paging through a list of URIs. Take a look at cts:uris for more details.

        ii. If using Semantics:

            1. Consider exporting the triples from the database using the Semantics REST endpoints.

            2. Take a look at the URL parameters start? and pageLength?; these parameters can be set on your SPARQL request to return the results in batches. See GET /v1/graphs/sparql for further details.
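Paging with start and pageLength amounts to walking a sequence of windows over the result set. The following is a minimal sketch of that arithmetic; it assumes a 1-based start index and a known (or estimated) total result count, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def page_windows(total_results, page_length):
    """Yield start/pageLength pairs that together cover total_results items."""
    if page_length <= 0:
        raise ValueError("page_length must be positive")
    start = 1  # assumed 1-based, as is common for MarkLogic REST paging
    while start <= total_results:
        remaining = total_results - start + 1
        yield {"start": start, "pageLength": min(page_length, remaining)}
        start += page_length


def as_query_string(window):
    """Render one window as URL parameters, e.g. to append to a request URL."""
    return urlencode(window)
```

Each window can be appended to the request URL as query parameters, and the responses saved one page at a time instead of as a single large result.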


1. If you're looking to update more than a few thousand fragments at a time, you'll definitely want to use some sort of batching.

    a. For example, you could run a script in batches of, say, 2000 fragments, by doing something like [1 to 2000], and filtering out fragments that already have your newly added element. You could also look into using batch tools like MLCP.

    b. Alternatively, you could split your input into smaller batches, then spawn each of those batches to jobs on the Task Server, which has a configurable queue. See:

        i. xdmp:spawn

        ii. xdmp:spawn-function

2. Alternatively, you could use an external/community developed tool like CoRB to batch process your content. See Using Corb to Batch Process Your Content - A Getting Started Guide.

3. If using Semantics and querying triples with SPARQL:

    a. You can make use of the LIMIT keyword to further restrict the result set size of your SPARQL query. See The LIMIT Keyword.

    b. You can also use the OFFSET keyword for pagination. This keyword can be used with the LIMIT and ORDER BY keywords to retrieve different slices of data from a dataset. For example, you can create pages of results with different offsets. See The OFFSET Keyword.
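When the batching is driven from a client, splitting the input is straightforward. The following is a minimal sketch of a generic batch splitter; the function name is hypothetical, and the batch size of 2000 mentioned above is only a starting point to tune for your workload.

```python
from itertools import islice


def batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items from any iterable."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded batch would then be handed to one request or one spawned task, so that no single transaction has to touch the whole input.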


This article outlines various factors influencing the performance of xdmp:collection-delete function and furthermore provides general best practices for improving the performance of large collection deletes.

What are collections?

Collections in MarkLogic Server are used to organize documents in a database. Collections are a powerful and high-performance mechanism to define and manage subsets of documents.

How are collections different from directories?

Although both collections and directories can be used for organizing documents in a database, there are some key differences. For example:

  • Directories are hierarchical, whereas collections are not. Consequently, collections do not require member documents to conform to any URI patterns. Additionally, any document can belong to any collection, and any document can also belong to multiple collections
  • You can delete all documents in a collection with the xdmp:collection-delete function. Similarly, you can delete all documents in a directory (as well as all recursive subdirectories and any documents in those directories) with a different function call - xdmp:directory-delete
  • You can set properties on a directory. You cannot set properties on a collection

For further details, see Collections versus Directories.

What is the use of the xdmp:collection-delete function?

xdmp:collection-delete is used to delete all documents in a database that belong to a given collection - regardless of their membership in other collections.

  • Use of this function always results in the specified unprotected collection disappearing. For details, see Implicitly Defining Unprotected Collections
  • Removing a document from a collection and using xdmp:collection-delete are similarly contingent on users having appropriate permissions to update the document(s) in question. For details, see Collections and Security
  • If there are no documents in the specified collection, then nothing is deleted, and the function still returns the empty sequence

What factors affect performance of xdmp:collection-delete?

The speed of xdmp:collection-delete depends on several factors:

Is there a fast operation mode available within the call xdmp:collection-delete?

Yes. The call xdmp:collection-delete("collection-uri") can potentially be fast in that it won't retrieve fragments. Be aware, however, that xdmp:collection-delete will retrieve fragments (and therefore perform much more slowly) when your database is configured with any of the following:

What are the general best practices in order to improve the performance of large collection deletes?

  • Batch your deletes
    • You could use an external/community developed tool like CoRB to batch process your content
    • Tools like CoRB allow you to create a "query module" (this could be a call to cts:uris to identify documents from a number of collections) and a "transform module" that works on each URI returned. CoRB will run the URI query and will use the results to feed a thread pool of worker threads. This can be very useful when dealing with large bulk processing. See: Using Corb to Batch Process Your Content - A Getting Started Guide
  • Alternatively, you could split your input (for example, URIs of documents inside a collection that you want to delete) into smaller batches
    • Spawn each of those batches to jobs on the Task Server instead of trying to delete an entire collection in a single transaction
    • Use xdmp:spawn-function to kick off the deletions, but be careful not to overflow the Task Server queue
      • Don't spawn single document deletes
      • Instead, make batches of a size that works most efficiently for your specific use case
    • One of the restrictions on the Task Server is that there is a set queue size - you should be able to increase the queue size as necessary
  • Scope deletes more narrowly with the use of cts:collection-query

Related knowledgebase articles:



MarkLogic Server delivers performance at scale, whether we're talking about large amounts of data, users, or parallel requests. However, people do run into performance issues from time to time. Most of those performance issues can be found ahead of time via well-constructed and well-executed load testing and resource provisioning.

There are three main aspects to load testing against and resource provisioning for MarkLogic:

  1. Building your load testing suite
  2. Examining your load testing results
  3. Addressing hot spots

Building your load testing suite

The biggest issue we see with problematic load testing suites is unrepresentative load. The inaccuracy can be in the form of missing requests, missing query inputs, unanticipated query inputs, unanticipated or underestimated data growth rates, or even a population of requests that skews towards different load profiles compared to production traffic. For example - a given load test might heavily exercise query performance, only to find in production that ingest requests represent the majority of traffic. Alternatively, perhaps one kind of query represents the bulk of a given load test when in reality that kind of query is dwarfed by the number of invocations of a different kind of query.

Ultimately, to be useful, a given load test needs to be representative of production traffic. Unfortunately, the less representative a load test is, the less useful it will be.

Examining your load testing results

Beginning with version 7.0, MarkLogic Server ships a Monitoring History dashboard, visible from any host in your cluster at port 8002/history. The Monitoring History dashboard illustrates the usage of resources such as CPU, RAM, and disk I/O at both the cluster and individual host levels. It also illustrates the occurrence of read and write locks over time. It's important to get a handle on both resource and lock usage in the course of your load test, as both will limit the performance of your application, but the way to address those performance issues depends on which class of usage is most prevalent.

Addressing hot spots

By having a representative load test and closely examining your load testing results, you'll likely find hot spots or slow performing parts of your application. MarkLogic Server's Monitoring History allows you to correlate resource and lock usage over time against the workload being submitted by your load tests. Once you find a hot spot, it's worthwhile examining it more closely by either running those requests in isolation or at larger scales. For example, you could run 4x and 16x the number of parallel requests, or 4x and 16x the number of inputs to an individual request - both of which will give you an idea of how the suspect requests scale in response to increased load.
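Scaling a suspect request up in steps can be scripted with a small harness. The following is a minimal sketch that runs a caller-supplied function at 1x, 4x, and 16x a base level of parallelism and records the wall-clock time for each step; the function names are hypothetical, and request_fn stands in for whatever issues the real request against your cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_at_scale(request_fn, base_parallelism, multipliers=(1, 4, 16)):
    """Run request_fn in parallel at each multiplier and return elapsed seconds per step."""
    timings = {}
    for multiplier in multipliers:
        workers = base_parallelism * multiplier
        started = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # One call per worker; list() waits for all calls and surfaces any exception.
            list(pool.map(lambda _: request_fn(), range(workers)))
        timings[multiplier] = time.perf_counter() - started
    return timings
```

If the elapsed time grows much faster than the multiplier, the request is a good candidate for closer inspection alongside the Monitoring History for the same time window.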

Once you've found a hot spot - what should you do about it? Well, that ultimately depends on the kind of usage you're seeing in your cluster's Monitoring History. If it's clear that your suspect requests are running into a resource bound (for example, 100% utilization of CPU/RAM/disk I/O/etc.), then you'll either need to provision more of that limiting resource (either through more machines, or more powerful machines, or both), or reduce the amount of load on the system provisioned as-is. It may also be possible to re-architect the suspect request to be more efficient with regard to its resource usage.

Alternatively, you may find that your system is not, in fact, seeing a resource bound - where it appears there are plenty of spare CPU cycles/free RAM/low amounts of disk I/O/etc. If you're seeing poor performance in that situation, it's almost always the case that you'll instead see large spikes in the number of read/write locks taken as your suspect requests work through the system. Provisioning more hardware resources may help to some small degree in the presence of read/write locks, but what really needs to happen is the requests need to be re-architected to use as few locks as possible, and preferably to run completely lock free.





While there are many different ways to define schemas in MarkLogic Server, one should be aware of both the location strategy the server will use and the different locations in which your particular schema may reside.

Schema Location

Schemas can reside in either the Schemas database defined for your content database, or within the server's Config directory.  If there is no explicit schema map defined, the server will use the following schema location strategy:

1) If the XQuery program explicitly references a schema for the namespace in question, MarkLogic Server uses this reference.
2) Otherwise, MarkLogic Server searches the schema database for an XML schema document whose target namespace is the same as the namespace of the element that MarkLogic Server is trying to type.
3) If no matching schema document is found in the database, MarkLogic Server looks in its Config directory for a matching schema document.
4) If no matching schema document is found in the Config directory, no schema is found.
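The search order above can be modeled as a simple first-match lookup. This is a minimal Python sketch of the described strategy, not MarkLogic's implementation; the function name, the dictionary arguments, and the example namespaces are all hypothetical stand-ins for the three places the server searches.

```python
def locate_schema(namespace, explicit_refs, schemas_db, config_dir):
    """Return (source, location) for a namespace, or None if nothing matches.

    Each argument after `namespace` is a dict mapping a target namespace
    to a schema location; they stand in for the three places searched.
    """
    search_order = [
        ("explicit reference", explicit_refs),  # step 1
        ("schemas database", schemas_db),       # step 2
        ("Config directory", config_dir),       # step 3
    ]
    for source, mapping in search_order:
        if namespace in mapping:
            return source, mapping[namespace]
    return None  # step 4: no schema is found


# Hypothetical namespaces and locations, for illustration only
explicit = {}
schemas_db = {"http://example.org/orders": "/schemas/orders.xsd"}
config_dir = {"http://example.org/orders": "Config/orders.xsd",
              "http://example.org/audit": "Config/audit.xsd"}

print(locate_schema("http://example.org/orders", explicit, schemas_db, config_dir))
print(locate_schema("http://example.org/audit", explicit, schemas_db, config_dir))
print(locate_schema("http://example.org/none", explicit, schemas_db, config_dir))
```

Note that the model assumes at most one schema per namespace in each location, which is exactly the assumption that breaks down in the multiple-match situation discussed next.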

There can sometimes be issues with step #2 when there are multiple schema documents in the schema database whose target namespace matches the namespace of the element that MarkLogic Server is trying to type. In that situation, it is best to explicitly define a default schema mapping. Schema maps can be defined through the Admin API or the Admin User Interface. Be aware that you can define schema mappings either at the group level (in which case the mapping applies to all application servers in the group) or at the individual application server level.

Best Practices

Now that we know how the server locates schemas and where schemas can potentially reside, what are the best practices?

In general, it's best to localize your schema impacts as narrowly as possible. For example, instead of using a single Schemas database or the server's one and only Config directory, it would instead be better to define a specific Schemas database that would be used for the relevant content database. Similarly, unless you know you need a defined schema mapping to apply to every application server in a group, it would instead be better to define your schema mappings at the application server level as opposed to the group level.


Although not exhaustive, this article lists some best practices for the use of MarkLogic Server and Amazon's VPC.


  1. Nodes within a MarkLogic cluster need to communicate with one another directly, without the presence of a load balancer in-between them.
  2. Whether in the context of a VPC or not, before attempting to join a node to a cluster, one should verify that each node is able to ping or ssh to the other (and vice versa). If you're not able to ping or ssh from one machine to another, then issues seen during a MarkLogic cluster join are very likely to be localized to the network configuration and should be diagnosed at the network layer.
  3. The following items should be double-checked when using VPCs:
    1. If a private subnet is used for any MarkLogic instance, that subnet needs access to the public internet for the following situations:
      1. If Managed Cluster support is used, MarkLogic requires access to AWS services which require outbound connectivity to the internet (at minimum to the AWS service web sites).
      2. If foreign clusters are used then MarkLogic needs to connect to all hosts in the foreign cluster
      3. If Amazon S3 is used then MarkLogic needs to communicate with the S3 public web services.
    2. It is assumed that the creator of the VPC has properly configured all subnets on which MarkLogic is to be installed to have outbound internet access. There are many ways that private subnets can be configured to communicate outbound to the public internet. NAT instances are one example [AWS VPC NAT]. Another option is using DirectConnect to route outbound traffic through the organization's internet connection.
    3. All subnets which host instances running MarkLogic in the same cluster need to be able to communicate via port 7999.
    4. Inbound ssh connectivity is required for command line administration of each server, which requires port 22 to be accessible from either a VPN or a public subnet.
    5. With regard to application traffic (as opposed to intra-cluster traffic as seen during cluster joining) connectivity to the MarkLogic server(s) needs to be open to whatever applications for which it is required. Application traffic can be sent through an internal or external load balancer, a VPN, direct access from applications in the same subnet or routing through another subnet.


This knowledgebase article contains critical tips and best practices you'll need to know to best use MarkLogic Server with your favorite BI Tools.

BI Tool Q&A

Q: What's a TDE? Is that a Tableau Data Extract?

A: In MarkLogic terms, TDE stands for Template Driven Extraction. A template is a document (XML or JSON) that declares how a view is to be populated. It defines a context -- the root path of all the documents that are involved in this view -- then, for each column in the view, it defines a column name, type, and a path to the data inside the document. You can define the value of a column using several pieces of data in the document, plus some functions, even some programming operations such as IF. For example, if your documents have the "last-updated" year and month and day in different parts of the document, your Template can pull in those three pieces, concatenate them, then cast the result as a date.
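As a rough illustration of the "last-updated" example, the Python sketch below builds a JSON template with a context, a view, and a column whose value concatenates three parts of the document. The document shape (/order/updated/year, month, day), the schema and view names, and the column names are hypothetical. It also assumes the three parts are zero-padded so that the concatenated string is a valid date, and that declaring the column's scalarType as date is what casts the result.

```python
import json

# Hypothetical document shape: /order/id and /order/updated/{year,month,day}
template = {
    "template": {
        "context": "/order",  # root path of the documents in this view
        "rows": [
            {
                "schemaName": "sales",
                "viewName": "orders",
                "columns": [
                    {"name": "id", "scalarType": "long", "val": "id"},
                    {
                        # Pull in three pieces and concatenate them; the
                        # declared scalarType is assumed to perform the cast.
                        "name": "last_updated",
                        "scalarType": "date",
                        "val": "fn:concat(updated/year, '-', "
                               "updated/month, '-', updated/day)",
                    },
                ],
            }
        ],
    }
}

print(json.dumps(template, indent=2))
```

The printed JSON is the kind of document that would be stored as a template; how it is validated and inserted is outside the scope of this sketch.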

Q: When modifying TDEs, do I need to reindex?

A: TDEs map an SQL-like view on top of MarkLogic. If you change an existing view, you do need to reindex the database. Before kicking off a resource- and time-intensive reindex, however, be aware that there are some TDE configurations that cannot be updated. You can read more about exactly which kinds of TDEs may or may not be updated in the following knowledgebase article: Updating a TDE View.

Q: Can MarkLogic handle queries that require a large number of columns?

A: Yes, but you'll want to pay attention to potential performance impacts. In general, it's much better to spread a large number of columns across multiple TDEs, instead of having a single TDE containing all those same columns. Data modeling is also important here - TDEs should be meaningful with regard to their intended use. Definitely check out MLU's Data Modeling Series, in particular Progressive Transformation using the Envelope Pattern and Impact of Normalization: Lessons Learned.

Q: What are some common patterns and antipatterns for good performance with BI tools?

A: First, avoid using nullable columns in filters and drilldowns. There are optimizations in MarkLogic Server's SQL engine to detect patterns with "null", but different BI tools generate their code in different ways, which can sometimes result in code that circumvents those optimizations. In general, if performance is a priority, it's usually better to use an actual value such as "N/A" or "0".

Second, enable Query Reduction or similar options in your BI tool of choice. Without this option, if you choose to filter on a year - say "2018" - and then also select "2019", multiple SQL queries will be sent to MarkLogic in quick succession unnecessarily.

Q: What do I need to watch out for when connecting my BI tool to MarkLogic?

A: If performance is a priority, exercise caution when using joins. In general, the best practice is to create collections of data in MarkLogic that represent the subsets of data needed externally as closely as possible. You can learn more about what tools are available to see how many and what kind of joins are being used by your query in the What debugging tools are available for Optic, SQL, or SPARQL code in MarkLogic Server? knowledgebase article, and you can learn more about how to create more meaningful data models and subsets of your data models in the aforementioned MLU's Data Modeling Series, as well as in the MarkLogic World presentation Getting the Most from MarkLogic Semantics (also available in video form).



If you're looking to use any of the interfaces built on top of MarkLogic's semantics engine (Optic API, SQL, or SPARQL) - you'll want to make sure you're using the best practices itemized in this knowledgebase article. It's not unusual to see one or even two orders of magnitude performance improvements, as a result. Note that this article is really just a distillation of the MarkLogic World presentation "Getting the Most from MarkLogic Semantics" - available in both pdf and YouTube formats.

Best Practices for Using Semantics at Scale

1) Scope your query - more constrained queries will do less work, and will therefore take less time

  • Trim resultsets early
  • Partition
    • Query partitions or subsets of your data, instead of your entire database
    • Define partitions with Collections
    • Make use of your partitions with collection queries
    • Use cts:query to partition even further
  • Keep like-triples in the same document
  • Use MarkLogic indexes to scope a query
    • Collection query (or SPARQL FROM) to partition the RDF space
    • Put ontologies and other lookup/mapping triples into their own graphs/collections
    • Consider pushing-down some SPARQL FILTERs to the document

2) Pay attention to your data model

3) Resultset size specific tips

  • For small resultsets – from SPARQL, get the docs with a search
  • For large resultsets
    • Get docs in a single read, no joins
    • Large result sets may incur connection churning overhead – paginate large resultsets to ensure connection reuse

4) Hardware tips

  • Add more memory - allows the optimizer to choose faster plans
  • Add more hardware - allows for increased parallelization

5) Avoid unnecessary work

  • Re-use queries with bind variables - the query plan is cached for 5 minutes
  • Dedup processing
    • De-duplication has no effect on results if you have no duplicate triples and/or you use DISTINCT
    • Skipping dedup processing can result in substantial performance improvements
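The pagination tip under "Resultset size specific tips" can be sketched as follows. This Python example only generates the page windows and the query text; the function names, the example query, and the page size are assumptions, and actually submitting the queries is left out.

```python
def page_windows(total_rows, page_size):
    """Yield (offset, limit) pairs that cover total_rows without overlap."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total_rows, page_size):
        yield offset, min(page_size, total_rows - offset)


def paged_sparql(base_query, offset, limit):
    """Append standard SPARQL LIMIT/OFFSET clauses to a SELECT query."""
    return f"{base_query}\nLIMIT {limit}\nOFFSET {offset}"


# Hypothetical query; ORDER BY keeps page boundaries stable between requests.
query = "SELECT ?s ?o WHERE { ?s <http://example.org/knows> ?o } ORDER BY ?s"
for offset, limit in page_windows(total_rows=25, page_size=10):
    print(paged_sparql(query, offset, limit))
```

Because the query text is identical apart from the LIMIT and OFFSET values, this style also combines naturally with the bind-variable tip above.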


Backing up multiple databases simultaneously may make some of the backups fail with error XDMP-FORESTOPIN.



While configuring a scheduled backup, one can also choose to back up the associated auxiliary databases, such as Security, Schemas, and Triggers. Content databases generally share these auxiliary databases, so an issue may arise when more than one scheduled backup tries to back up the same auxiliary database. When two backups try to back up the same auxiliary database at the same time, the backup will fail with an XDMP-FORESTOPIN error. In general, this error occurs when the system attempts to start one forest operation (backup, restore, remove, clear, etc.) while another, exclusive operation is already in progress; for example, starting a new backup while a previous backup is still in progress.



One should be extra cautious when configuring scheduled backups and selecting auxiliary databases with them. If one really wants to back up the auxiliary databases together with the content database, then one needs to pay special attention to the timing and ensure that no two backups of the same auxiliary database can overlap.

Since most applications don't make frequent changes to their auxiliary databases, MarkLogic recommends scheduling backups for them separately instead of selecting them together with the content databases.
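The timing risk described above can be checked on paper before the schedules are configured. The following Python sketch flags pairs of scheduled backups that overlap in time and include the same database. The schedule structure, the names, and the duration estimates are hypothetical; this is not a MarkLogic API.

```python
from datetime import datetime, timedelta
from itertools import combinations


def conflicting_backups(schedules):
    """Return (name_a, name_b, shared_databases) for risky schedule pairs.

    Each schedule is a dict with 'name', 'start' (datetime), 'duration'
    (timedelta, an estimate), and 'databases' (a set of database names,
    including any auxiliary databases selected with the backup).
    """
    conflicts = []
    for a, b in combinations(schedules, 2):
        shared = a["databases"] & b["databases"]
        overlaps = (a["start"] < b["start"] + b["duration"]
                    and b["start"] < a["start"] + a["duration"])
        if shared and overlaps:
            conflicts.append((a["name"], b["name"], sorted(shared)))
    return conflicts


# Hypothetical schedules: three content databases sharing Security
night = datetime(2024, 1, 1, 1, 0)
schedules = [
    {"name": "content-a", "start": night, "duration": timedelta(hours=1),
     "databases": {"ContentA", "Security", "Schemas"}},
    {"name": "content-b", "start": night + timedelta(minutes=30),
     "duration": timedelta(hours=1), "databases": {"ContentB", "Security"}},
    {"name": "content-c", "start": night + timedelta(hours=3),
     "duration": timedelta(hours=1), "databases": {"ContentC", "Security"}},
]
print(conflicting_backups(schedules))
```

Here only the first two schedules are flagged, because both include Security and their estimated run times overlap. Backing up Security on its own schedule removes the shared database from every pair.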


Problems can occur when trying to explicitly search (or not search) parts of documents when using a global configuration approach to include and exclude elements.

Global Approach

Including and excluding elements in a document using a global configuration approach can lead to unexpected results that are complex to diagnose.  The global approach will require positions to be enabled in your index settings, expanding the disk space requirements of your indexes and may result in greater processing time of your position dependent queries.  It may also require adjustments to your data model to avoid unintended includes or excludes; and may require changes to your queries in order to limit the number of positions used.

If circumstances dictate that you must instead use the less preferred global configuration approach, you can read more about including/excluding elements in word queries here:

Recommended Approach

In general, it's better to define specific fields, which are a mechanism designed to restrict your query to portions of documents based on elements. You can read more about fields here:




In MarkLogic 8, support for native JSON and server side JavaScript was introduced.  We discuss how this affects the support for XML and XQuery in MarkLogic 8.


In MarkLogic 8, you can absolutely use XML and XQuery. XML and XQuery remain central to MarkLogic Server now and into the future. JavaScript and JSON are complementary to XQuery and XML. In fact, you can even work with XML from JavaScript or JSON from XQuery.  This allows you to mix and match within an application—or even within an individual query—in order to use the best tool for the job.

See also:

Server-side JavaScript and JSON vs XQuery and XML in MarkLogic Server

XQuery and JavaScript interoperability


Sometimes you may find that there are one or more tasks that are taking too long to complete or are hogging too many server resources, and you would like to remove them from the Task Server.  This article presents a way to cancel active tasks in the Task Server.


To cancel active tasks in the Task Server, you can browse to the Admin UI, navigate to the Status tab of the Group's Task Server, and cancel the tasks. However, this may get tedious if there are many tasks to be terminated.

As an alternative, you can use the server monitoring built-ins to programmatically find and cancel the tasks. The MarkLogic Server API documentation includes information on all the built-in functions you will need.

Sample Script

Here is a sample script that removes the task based on the path to the module that is being executed:

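(: Locate the Task Server on the current host :)
let $host-id := xdmp:host()
let $host-task-server-id := xdmp:host-status($host-id)//*:task-server/*:task-server-id/text()
(: Find the active requests that are running the target module :)
let $task-server-status := xdmp:server-status($host-id,$host-task-server-id)
let $task-server-requests := $task-server-status/*:request-statuses
let $scheduled-task-request := $task-server-requests/*:request-status[*:request-text = "/PATH/TO/SCHEDULED/TASK/MODULE.XQY"]/*:request-id/text()
(: Cancel each matching request :)
for $request-id in $scheduled-task-request
return xdmp:request-cancel($host-id, $host-task-server-id, $request-id)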


MarkLogic stores all signed Certificates, private keys, and Certificate Authority Certificates inside the Security Database. The Security Database also stores Users, Passwords, Roles, Privileges, and many other authentication-related configurations. While setting up a DR (Disaster Recovery) cluster, many administrators prefer to replicate the Security Database to the DR cluster to avoid re-configuring the DR cluster with the same Users, Roles, Privileges, etc.

Security Database replication presents design challenges and issues when accessing Application Servers on the DR cluster.

  • Certificates installed in the Master Cluster Security Database will get replicated to the DR cluster Security Database. However, those replicated Certificates are not useful to the DR Cluster, since Signed Certificates are typically tied to a single host (though exceptions include SAN and Wildcard Certificates).
  • At the same time, we are not able to install new Signed Certificates on the DR Cluster, because the replicated Security Database is read-only.

This article discusses the different aspects of the above problem and provides a solution.

Configuration: Security Database replicated to DR Cluster

For the purposes of this discussion, we will consider a 3 node Master cluster coupled to a 3 node DR cluster, where the Security DB is replicated from the Master to the DR Cluster. We will also have an Application Server attached to the "DemoTemp1" Template in the Master cluster.

       Master_Cluster_Hosts.png         DR_Cluster_Hosts.png

Issues in DR Cluster.

Certificate Authentication based on CN field 

When client browsers connect to the application server using HTTPS, they check to make sure your SSL Certificate matches the host name in the address bar. There are three ways for browsers to find a match:

  1.    The host name (in the address bar) exactly matches the Common Name (CN) in the certificate's Subject.
  2.    The host name matches a Wildcard Common Name, that is, a Common Name that begins with * in place of the leftmost label.
  3.    The host name is listed in the Subject Alternative Name field.
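The three matching rules above can be sketched in Python as follows. This is a simplified model for illustration: the function names and the example.com host names are hypothetical, the wildcard is only allowed to cover a single leftmost label, and real TLS clients apply additional rules (see RFC 6125).

```python
def wildcard_matches(pattern, host):
    """Match a name such as '*.example.com' against exactly one extra label."""
    pattern, host = pattern.lower(), host.lower()
    if not pattern.startswith("*."):
        return pattern == host          # no wildcard: exact match only
    suffix = pattern[1:]                # e.g. '.example.com'
    label = host[: -len(suffix)]        # the text the '*' has to cover
    return host.endswith(suffix) and label != "" and "." not in label


def certificate_matches(host, common_name, subject_alt_names=()):
    """Apply the rules in order: exact CN, wildcard CN, then SAN entries."""
    return any(wildcard_matches(name, host)
               for name in (common_name, *subject_alt_names))


# Hypothetical host names, for illustration only
print(certificate_matches("node1.example.com", "node1.example.com"))
print(certificate_matches("www.example.com", "*.example.com"))
print(certificate_matches("dr1.example.net", "node1.example.com"))
print(certificate_matches("dr1.example.net", "node1.example.com",
                          ["dr1.example.net"]))
```

The third call models the DR problem discussed in this article: a certificate whose Common Name refers to a Master Cluster node does not match a DR Cluster host unless the DR host name is also covered by a wildcard or a Subject Alternative Name.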

The most common form of SSL name matching is the first option: the SSL client compares the server name to the Common Name in the server's certificate.

Since the Temporary Signed Certificates carry the CN field values of the Master Cluster nodes, connections to the Application Server on the DR Cluster will fail name matching when it is used with the MarkLogic-generated Temporary Signed Certificate.

Certificate Requests

When we attach a Template to any application server and generate a certificate request, MarkLogic Server generates a Temporary Signed Certificate for all the nodes in the Application Server's Group.

Master_Cert_Template_Status.png    DR_Cert_Template_Status_1.png

To install a Certificate signed by a 3rd party, replacing the Temporary Signed Certificate, we will need to generate certificate requests. You can generate certificate requests in MarkLogic for all nodes using the Request button under "Needed Certificate Request" on the Certificate Template "Status" tab.

  • On the Master cluster, MarkLogic will generate 3 Certificate Requests, with the CN field matching each of the 3 nodes. All 3 new Certificate Requests are internally stored in the Security Database.
  • On the DR Cluster, clicking Request will result in an error, since the DR Cluster has a replicated Security Database that is in a read-only ("open replica") state, i.e. Security Database updates are not allowed.

Pending Certificate Requests

Each Certificate Request is intended for a specific individual node, as the originator of the request incorporates the node's FQDN into the Certificate CN field when the request is generated. MarkLogic Server will use the hostname (which in most cases matches your FQDN) as the CN field value in the Certificate Request.

Certificate Requests generated on the Master Cluster are stored in the Security Database, which will get replicated to the DR Cluster Security Database (when Security DB replication is configured). However, Certificate Requests generated on the Master Cluster are not relevant to the DR Cluster, as they have the FQDNs of the Master Cluster nodes in their CN fields.

Master_Cert_Template_Status_Post_Request.png    DR_Cert_Template_Status_Post_Request.png


To install Signed Certificates intended for the DR Cluster, where Certificate CN field matches the FQDN of DR Cluster, we will need to install the DR cluster's Signed Certificates on the Master Cluster.  That certificate will then be replicated to the DR Cluster through the normal database replication of the Security database. 

Step 1. Generate Certificate Request (intended for DR nodes).

You would generate the Certificate Requests using XQuery in Query Console against the Security database on the Master cluster itself, but the values used in your XQuery will be the FQDNs of the DR/Replica Cluster nodes. For example, for the first node in the DR Cluster, you would run the query below from Query Console on any node of the Master Cluster against the Security Database. Change the FQDN value for each node and run the query a total of 3 times.

xquery version "1.0-ml"; 
import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";

Step 2. Download Certificate Request and Get them Signed.

We should be able to see the Certificate Requests pertaining to each node (for Master as well as DR nodes) on the Certificate Template Status tab in both the Master Cluster GUI and the DR Cluster GUI. Download them and get them signed by your preferred Certificate Authority.

Master_Cert_Template_Status_QC_Request.png    DR_Cert_Template_Status_QC_Request.png

Step 3. Install All Signed Certificates (for Master + DR Nodes) on Master Cluster 

Install all Signed Certificates (including the certificates intended for the Replica Cluster) on the Master Cluster, using the Import tab of the Certificate Template in the Admin GUI. If we try to install certificates on the DR/Replica cluster from the Admin GUI, we will get an XDMP-FORESTNOT error (Forest Security not available: open replica). The Application Server on the DR Cluster will find the appropriate certificate for each node from the list of all certificates. The screenshots below show the status of the Certificate Template on the Master cluster and on the DR cluster (both should be identical).

Master_Cert_Template_Status_Final.png    DR_Cert_Template_Status_Final.png

Step 4. Importing Pre-Signed Cert where Keys are generated outside of MarkLogic.

Please read "Import pre-signed Certificate and Key for MarkLogic HTTPS App Server" to import Certificate Req/Key generated outside of MarkLogic; For our purpose, we will need to import Certificates (and their respective Keys) for both Clusters (Master as well as DR/Replica) from the QConsole on Master Cluster itself.

Further Reading


Each node in MarkLogic Server Cluster has a hostname, a human-readable nickname corresponding to the network address of the device. MarkLogic retrieves the hostname from underlying operating system during installation. On Linux, we can retrieve platform hostname value by running "$ hostname" from a shell prompt. 

$ hostname

In most environments, the hostname is the same as the platform's Fully-Qualified-Domain-Name (FQDN). However, there are scenarios where the hostname could be different from the FQDN. In such environments you would use the FQDN to connect to the platform instead of the hostname.

$ ping

PING ( 56(84) bytes of data.

64 bytes from ( icmp_seq=1 ttl=64 time=0.011 ms

When a Certificate is installed to a Certificate Template in an environment where the hostname and FQDN do not match, MarkLogic looks at the CN field in the installed Certificate to find a matching hostname in the cluster. However, since the CN field (reflecting the FQDN) does not match the hostname known to MarkLogic, MarkLogic does not assign the installed Certificate to any specific host in the cluster.

Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic, OU=Eng,

Installing Certificates in this scenario results in the installed Certificate not replacing the Temporary Certificate, and the Temporary Certificate will still be used with HTTPS App Server instead of the installed Certificates.

This article details different solutions to address this issue. 


1) Hostname change

By default, MarkLogic picks up the hostname value presented by the underlying operating system. However, we can always change the hostname string stored in MarkLogic Server after installation using the Admin API function admin:host-set-name.

Changing the hostname in MarkLogic (to reflect the FQDN) will not affect the underlying Platform/OS hostname values, but will result in MarkLogic being able to find the correct host for the installed Certificate (CN field = hostname), and thus being able to link the installed Certificate to a specific host in the cluster.

2) XQuery code linking Installed Cert to specific Host

You can also use below XQuery code from QConsole against Security DB (as content source) to update Certificate xml files in Security DB, linking Installed Certificate to Specific host.

Please change the Certificate Template-Name, and Host-Name in below XQuery to reflect values from your environment.

xquery version "1.0-ml";

import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

(: Change to your hostname string :)
(: if Qconsole is launched from the same host, then below can be used as well :)
(: let $hostname := xdmp:host-name()    :)
let $hostname := ""
let $hostid := admin:host-get-id(admin:get-configuration(), $hostname)

(: FQDN name matching Certificate CN field value :)
let $fqdn := ""

(: Change to your Template Name string :)
let $templateid := pki:template-get-id(pki:get-template-by-name("YourTemplateName"))

for $i in cts:uris()
where
(   (: locate Cert file with Public Key :)
    fn:doc($i)//pki:certificate/pki:template-id=$templateid
    and fn:doc($i)//pki:certificate/pki:authority=fn:false()
    and fn:doc($i)//pki:certificate/pki:host-name=$fqdn
)
return <h1> Cert File - {$i}
{xdmp:node-insert-child(doc($i)/pki:certificate, <pki:host-id>{$hostid}</pki:host-id>)}
{
    (: extract cert-id :)
    let $certid := fn:doc($i)//pki:certificate/pki:certificate-id
    for $j in cts:uris()
    where
    (   (: locate Cert file with Private key :)
        fn:doc($j)//pki:certificate-private-key/pki:template-id=$templateid
        and fn:doc($j)//pki:certificate-private-key/pki:certificate-id=$certid
    )
    return <h2> Cert Key File - {$j}
    {xdmp:node-insert-child(doc($j)/pki:certificate-private-key, <pki:host-id>{$hostid}</pki:host-id>)}
    </h2>
} </h1>

Also, note that the above will not replace or overwrite the Temporary Certificate; however, the App Server will start using the installed Certificate from this point on instead of the Temporary Certificate. One can also delete the now unused Temporary Certificate file from Query Console without any negative effect.

3) Certificate with Subject Alternative Name (SAN Cert)

You can also ask your IT department (or Certificate issuer) to provide a Certificate with a Subject Alternative Name (subjectAltName) that matches MarkLogic's understanding of the host. During the installation of the Certificate, MarkLogic will look for alternative names and link the Certificate to the correct host based on the Subject Alternative Name field.


Further Reading


Introduction: When you may need to change the state of forests

In most cases, all forests in your MarkLogic cluster will be configured to allow all (any) updates to be made.

If we consider running the following example in Query Console:

In the majority of cases, calling the above function should return "all", indicating that the forest is in a state to allow incoming queries to read data from the forest and to allow queries to update content (and to add new content) into that forest.

At any given time, a forest can be configured to be in one of four different states:

  • all
  • read-only
  • delete-only
  • flash-backup

You may want to change the state of the forests in a given database for several reasons:

  • To run your application in maintenance mode, where data can be read but no data on disk can be changed
  • In a situation where you are migrating data from a legacy database or removing data from a given forest
  • In a situation where you need to quiesce all forests in a given database for long enough to allow you to make a file-level backup of the forest data

Forest states explained

Sample state management module

Below is an example template for modifying the state of all forests in a given database:

Further reading

Forest States
Setting Forests to "read only"
Setting Forests to "delete only"

Link to Example Code



This article discusses some of the issues you should think about when preparing to change the IP addresses of a MarkLogic Server.


If the hostnames stay the same, then changing IP addresses should not have any adverse side effects since none of the default MarkLogic Server settings require an IP address.

Here are some caveats:

  1. Make sure there are no application servers that have an 'address' setting to an IP address that will no longer be accessible/exist after the change.
  2. Similarly, make sure there are no external (to MarkLogic Server) dependencies on the original IP addresses.
  3. Make sure you allow some time (on the order of minutes) for the DNS changes to propagate across the DNS servers before bringing up MarkLogic Server.
  4. Make sure the hosts themselves are reachable via the standard Unix channels (ping, ssh, etc.) before starting MarkLogic Server.
  5. Make sure you test this in a non-production environment before you implement it in production.


If you have an existing MarkLogic Server instance running on EC2, there may be circumstances where you need to change the size of available storage.

This article discusses approaches to ensure a safe increase in the amount of available storage for your EC2 instances without compromising MarkLogic data integrity.

This article assumes that you have started your cluster using the CloudFormation templates provided by MarkLogic.

The recommended method (I.) is to shut down the cluster, do the resize using snapshots and start again. If you wish to avoid downtime an alternative procedure (II.) using multiple volumes and rebalancing is described below.

In both procedures we are recommending a single, large EBS volume as opposed to multiple smaller ones because:

1. Larger EBS volumes have faster IO, as described in the Amazon EBS Volume Types documentation.

2. You have to keep enough spare capacity on every single volume to allow for merges.  MarkLogic disk space requirements are described in our Installation Guide.

I. Resizing using AWS snapshots

This is the recommended method. This procedure follows the same steps as the official Amazon AWS documentation, but highlights MarkLogic specific steps. Please review the AWS documentation in detail before proceeding.

1. Make sure that you have an up to date backup of your data and a working restore plan.

2. Stop the MarkLogic cluster by going to AWS Console -> CloudFormation -> Actions -> Update Stack


3. Click through the pages and leave all other settings intact, but change Nodes to 0, then review and confirm updating the stack. This will stop the cluster.

This is also covered in the MarkLogic EC2 documentation.

4. Create a snapshot of the volume to resize.

5. Create a new volume from the snapshot.

Ensure that the new volume is sufficiently large to cover MarkLogic disk space requirements (generally at least 1.5x of the planned total forest size).

6. Detach the old volume.

7. Attach the newly expanded volume.

Steps 4-7 are exactly as covered in the AWS documentation and have no MarkLogic specific parts.

8. Restart MarkLogic cluster, by going to AWS Console -> CloudFormation -> Actions -> Update Stack and changing Nodes to the original setting.

9. Connect to the machine using SSH and resize the logical partition to match the new size. This is covered in the AWS documentation; the commands are:

- resize2fs for ext3 and ext4
- xfs_growfs for xfs

10. The new volume will have a different id. You need to update the CloudFormation template so that the data volumes are retained and remounted when the cluster or nodes are restarted. The easiest way is to use the mlcmd shell script provided by MarkLogic. Also using SSH, run the following:

/opt/MarkLogic/bin/mlcmd sync-volumes-to-mdb

This will synchronise the EBS volume id with the CloudFormation template.

At this point the procedure is complete and you can delete the old EBS volume. Once you have verified that everything is working, you can also delete the snapshot created in step 4.
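The sizing rule used when creating the new volume (generally at least 1.5x the planned total forest size) is simple arithmetic. The Python sketch below is only a convenience for that calculation; the function name and the 400 GiB figure are hypothetical, and the factor should be adjusted to the disk space requirements in the Installation Guide.

```python
import math


def minimum_volume_gib(planned_forest_gib, headroom_factor=1.5):
    """Return the smallest whole-GiB volume size that leaves merge headroom."""
    if planned_forest_gib < 0:
        raise ValueError("planned_forest_gib must not be negative")
    return math.ceil(planned_forest_gib * headroom_factor)


# Hypothetical example: 400 GiB of planned forest data
print(minimum_volume_gib(400))
```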

II. Resizing with no downtime, using MarkLogic Rebalancing

This method avoids cluster downtime, but it is slightly more complicated than procedure I, and rebalancing takes additional time and adds load to the cluster while it runs. In most cases procedure I takes far less time to complete; however, the cluster is down for the duration. With this procedure the cluster can serve requests at all times.

This procedure follows the same steps as the official AWS documentation where possible, but highlights the MarkLogic-specific steps. Please review the AWS documentation in detail before proceeding.

The procedure is described in more detail in the MarkLogic Server on Amazon EC2 Guide.

1. Create a new volume.

Ensure that the new volume is sufficiently large to cover MarkLogic disk space requirements (generally at least 1.5x of the planned total forest size).

2. Attach the volume to the EC2 instance. Take note of the EC2 device mount point, for example /dev/sdg, and check where it maps to in Linux and in Red Hat.

3. SSH into the instance and execute the /opt/MarkLogic/bin/mlcmd init-volumes-from-system command to create a filesystem for the volume and update the Metadata Database with the new volume configuration. The init-volumes-from-system command will output a detailed report of what it is doing. Note the mount directory of the volume from this report.

4. Once the volume is attached and mounted to the instance, log into the Administrator Interface on that host and create a forest or forests, specifying host name of the instance and the mount directory of the volume as the forest Data Directory. For details on how to create a forest, see Creating a Forest in the Administrator's Guide.

5. Once the status of the new forest is set to "open", attach the new forest(s) to the database and retire all the forest(s) on the old volume. If you only have 1 data volume then this includes forests for Schemas, Security, Triggers, Modules etc. It is possible to script this part using XQuery, JS or REST:

This will trigger rebalancing: database fragments will start to move to the new forests. This process can take several hours or days, depending on the size of the data; the Admin UI will show you an estimate.

The Admin UI for this is covered here:

and here is more information on rebalancing:

6. Once the old forest(s) have 0 fragments in them you can detach them and delete the old forest(s). The migration to a new volume is complete.

7. (Optional) Remove the old volume. If your original volume was data only, it should be empty after this procedure and you can:

a) unmount the volume in Linux

b) delete the volume in AWS EC2 console

c) run /opt/MarkLogic/bin/mlcmd sync-volumes-to-mdb. This will preserve the new volume mappings in the CloudFormation template, so the volumes will be preserved and remounted when nodes are restarted or even terminated.


A common use case in many business applications is to find out whether an element exists in a document. This article provides ways to find such documents and explains the points to consider when designing a solution.



In general, the existence of an element in a document can be checked using the XQuery below.


Note the empty cts:and-query construct here. An empty cts:and-query is used to fetch all fragments.

Hence, running the search query below will bring back all the documents that contain the element "myElement".


Wrapping the query in cts:not-query will bring back all the documents that do *not* contain the element "myElement".


A search using cts:not-query is only guaranteed to be accurate if the underlying query being negated can be resolved accurately from the indexes. Hence, to check for the existence of a specific XPath, that XPath needs to be indexed.
For example, if you want to find documents having /path/1/A (and not /path/2/A), you can create a field index for the path /path/1/A and then use it in your query instead.


Things to remember

1.) Use unique element names within a single document, i.e. try not to use the same element name in multiple places within a document if the occurrences have different meanings for your use case. Either give them different element names or put them in different namespaces to remove any ambiguity. For example, if you have an element "table" in two places in a single document, you can put them in different namespaces, such as html:table and furniture:table, or you can name them differently, such as html_table and furniture_table.

2.) If element names are unique within a document, you don't need to create additional indexes. If element names are not unique within a document and you are interested in only a specific XPath, create path (field) indexes on those XPaths and use them in your not-query.



MarkLogic Server has shipped with full support for the W3C XML Schema specification and schema validation capabilities since version 4.1 (released in 2009).

These features allow for the validation of complete XML documents or elements within documents against an existing XML Schema (or group of Schemas), whose purpose is to define the structure, content, and typing of elements within XML documents.

You can read more about the concepts behind XML Schemas and MarkLogic's support for schema based validation in our documentation:

Caching XML Schema data

In order to ensure the best possible performance at scale, all user created XML Schemas are cached in memory on each individual node within the cluster using a portion of that node's Expanded Tree Cache.

Best practices when making changes to pre-existing XML Schemas: clearing the Expanded Tree Cache

When you redeploy a revised XML Schema to an existing schema database, MarkLogic can sometimes refer to an older, cached version of the schema data associated with a given document.

Therefore, whenever you deploy a new or revised version of a Schema that you maintain, it is best practice to clear the cache so that all cached data for older versions of your schemas is evicted.

If you don't clear the cache, you may sometimes get references to the old, cached schema and, as a result, errors like:

XDMP-LEXVAL (...) Invalid lexical value

You can clear all data stored in the Expanded Tree Cache in two ways:

  1. By restarting the MarkLogic service on every host in the cluster. This will automatically clear the cache, but it may not be practical on production clusters.
  2. By calling xdmp:expanded-tree-cache-clear() on each host in the cluster. You can run the function in Query Console or via a REST endpoint, and you will need a user with admin rights to actually clear the cache.

An example script has been provided that demonstrates the use of XQuery to execute the call to clear the Expanded Tree Cache against each host in the cluster:

Please contact MarkLogic Support if you encounter any issues with this process.

Related KB articles and links:


XDMP-ODBCRCVMSGTOOBIG can occur when a non-ODBC process attempts to connect to an ODBC application server.  Two common causes are an HTTP application that has been accidentally configured to point to the ODBC port, and a load balancer that is sending HTTP health checks to an ODBC port. There are a number of common error messages that can indicate whether this is the case.

Identifying Errors and Causes

One method of determining the cause of an XDMP-ODBCRCVMSGTOOBIG error is to take the size value and convert it to characters.  For example, given the following error message:

2019-01-01 01:01:25.014 Error: ODBCConnectionTask::run: XDMP-ODBCRCVMSGTOOBIG size=1195725856, conn=

The size, 1195725856, can be converted to the hexadecimal value 47 45 54 20, which can be converted to the ASCII value "GET ".  So what we see is a GET request being run against the ODBC application server.

Common Errors and Values

Error                                   Hexadecimal   Characters
XDMP-ODBCRCVMSGTOOBIG size=1195725856   47 45 54 20   "GET "
XDMP-ODBCRCVMSGTOOBIG size=1347769376   50 55 54 20   "PUT "
XDMP-ODBCRCVMSGTOOBIG size=1347375956   50 4F 53 54   "POST"
XDMP-ODBCRCVMSGTOOBIG size=1212501072   48 45 4C 50   "HELP"
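
The conversion described above can be sketched in a few lines of Python. This is a minimal illustration rather than a MarkLogic tool: the helper name decode_odbc_size is hypothetical, and it assumes the reported size is simply the first four bytes of the incoming request read as a big-endian 32-bit integer, which is consistent with the values in the table.

```python
import struct

def decode_odbc_size(size: int) -> str:
    """Interpret a 32-bit size value as four big-endian ASCII bytes."""
    raw = struct.pack(">I", size)  # 4 bytes, most significant byte first
    # Non-ASCII bytes are replaced rather than raising, since a genuine
    # (non-HTTP) size value need not be printable text.
    return raw.decode("ascii", errors="replace")

# The sizes listed in the table above.
for size in (1195725856, 1347769376, 1347375956, 1212501072):
    raw = struct.pack(">I", size)
    hex_bytes = " ".join(f"{b:02X}" for b in raw)
    print(size, hex_bytes, repr(decode_odbc_size(size)))
```

If the decoded text looks like the start of an HTTP request line, the connection most likely came from an HTTP client rather than an ODBC client.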


XDMP-ODBCRCVMSGTOOBIG errors do not affect the operation of MarkLogic Server, but they can fill the error logs with clutter.  Determining that the errors are caused by an HTTP request to an ODBC port helps to identify the root cause so that the issue can be resolved.


Meters data can be a good resource for approximating the number of requests being managed by the server at a given time. It's also important to understand how Meters data is generated, should there be a discrepancy between the Meters samples and the entries in the access log.

Meters Request Data

The Meters data is designed to record a sampling of activity every few seconds. It is not designed to accurately record request rates much lower than one request every few seconds. Request rates are 15-second moving averages, recalculated every second and available in real time through the xdmp:host-status, xdmp:server-status and xdmp:forest-status built-in functions.

Meters Samples

The metering subsystem samples these real-time rates on the minute and saves the samples in the Meters database. Meters sampled data of events that occur less frequently than the moving average period will be lower than the number of access log entries. The difference between the two will depend on when the last event happened and when the sample was taken.

This means that if an event happens once a minute, the request rate will rise when the event happens, but then decay away within a few seconds. If the sample is taken after the event has decayed, the saved Meters data will be lower than the actual number of requests.


As a result of the Meters sampling method, it is not unusual for Meters to under-report the number of requests in certain circumstances.
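
The effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: it models the rate as a plain count of requests in the last 15 seconds divided by 15, and takes one sample every 60 seconds. The exact averaging formula used by the server is not described in this article, and the function names are hypothetical.

```python
WINDOW = 15        # seconds in the moving-average window (from the article)
SAMPLE_EVERY = 60  # Meters saves a sample "on the minute"

def rate_at(t, event_times, window=WINDOW):
    """Requests per second averaged over the `window` seconds ending at t."""
    recent = [e for e in event_times if t - window < e <= t]
    return len(recent) / window

def sampled_rates(event_times, duration, sample_every=SAMPLE_EVERY):
    """Rates seen by a sampler that only looks once per `sample_every` seconds."""
    return [rate_at(t, event_times)
            for t in range(sample_every, duration + 1, sample_every)]

# One request per minute, arriving 10 seconds after each minute boundary.
events = [10, 70, 130, 190, 250]
samples = sampled_rates(events, duration=300)

print("requests in access log:", len(events))
print("rate shortly after a request:", rate_at(12, events))
print("rates saved on the minute:", samples)
```

In this model every request has already left the 15-second window by the time the sample is taken, so every saved sample is zero even though the access log would show five requests.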


In MarkLogic Server v7.0-2, the tokenizer keys, for languages where MarkLogic provides generic language support, were removed so that they now all use the same key. For example, Greek falls into this class of languages. This change was made as part of an optimization for languages in which MarkLogic Server has advanced stemming and tokenization support.  

Stemmed searches that include characters from languages that do not have advanced language support, performed on MarkLogic Server v7.0-2 or later releases, against content loaded on a version previous to v7.0-2, may not return the expected results.


In order to successfully run these stemmed searches, you can either:

  • Reindex the database; or
  • Reinsert the affected documents (i.e. the documents that contain characters in languages for which MarkLogic Server only has generic language support).

If these are not possible in your environment, you can always run the query unstemmed.

An Example

The following example demonstrates the issue:

  1. On MarkLogic Server version 7.0-1, insert a document (test.xml) that contains the Greek character 'ε'.
  2. Run this query 
    xdmp:estimate( cts:search( doc('test.xml'), 'ε')),
    cts:contains( doc('test.xml'), 'ε')
  3. The query will return the correct results: 1, true
  4. Upgrade MarkLogic Server to version 7.0-3 or later and run the query again
  5. The query will return incorrect results: 0, false 
  6. Reindex the database and re-run the query
  7. The query will return the correct result once again.


Because the Configuration Manager has been deprecated starting with MarkLogic version 9.0-5, a common question is how to migrate the configuration of a database or an application server from an older MarkLogic instance to a newer one, or between any two versions of MarkLogic Server after 9.0-4.

This article outlines the steps to migrate resource configuration information from one server to another using Gradle and the ml-gradle plugin.


As a prerequisite, have a compatible version of Gradle (6.x) and the latest ml-gradle plugin (4.1.1 at the time of writing) installed and configured on the client machine (the local machine or the machine from which the Gradle project will run).


The entire process is divided into two major parts: exporting the resource configuration from the source cluster, and importing the resource configuration onto the destination cluster.

1. Exporting resource configuration from the source cluster/host:

On the machine where Gradle is installed and the plugin is configured, create a project.

In the example steps below the source project is  /Migration

1.1 Creating the new project with the source details:

While creating this new project, provide the source MarkLogic Server host, username, password, REST port, and multiple-environment details at the command-line prompts. Once the project has been created successfully, you can verify the source server details in the generated properties file.

macpro-user1:Migration user1$ gradle mlNewProject
Starting a Gradle Daemon (subsequent builds will be faster)
> Configure project :For Jackson Kotlin classes support please add "com.fasterxml.jackson.module:jackson-module-kotlin" to the classpath 
> Task :mlNewProject
Welcome to the new project wizard. Please answer the following questions to start a new project. Note that this will overwrite your current build.gradle and files, and backup copies of each will be made.

[ant:input] Application name: [myApp]
<-------------> 0% EXECUTING [20s]
[ant:input] Host to deploy to: [SOURCEHOST]
<-------------> 0% EXECUTING [30s]
[ant:input] MarkLogic admin username: [admin]
<-------------> 0% EXECUTING [34s]
[ant:input] MarkLogic admin password: [admin]
<-------------> 0% EXECUTING [39s]
[ant:input] REST API port (leave blank for no REST API server):
<-------------> 0% EXECUTING [50s]
[ant:input] Test REST API port (intended for running automated tests; leave blank for no server):
<-------------> 0% EXECUTING [1m 1s]
[ant:input] Do you want support for multiple environments?  ([y], n)
<-------------> 0% EXECUTING [1m 6s]
[ant:input] Do you want resource files for a content database and set of users/roles created? ([y], n)
<-------------> 0% EXECUTING [1m 22s]
Making directory: ~/Migration/src/main/ml-config
Making directory: ~/Migration/src/main/ml-modules
Use '--warning-mode all' to show the individual deprecation warnings.


1 actionable task: 1 executed

Once this build is successful, you can see the directory structure below created under the project directory:

1.2 Exporting the configuration of required resources:

Once the new project is created, export the required resources from the source host/cluster by creating a properties file (not in the project directory, but in some other directory), as suggested in the documentation, listing all the resources that need to be exported to the destination cluster. In that properties file, specify the names of the resources (databases, forests, app servers, etc.) using the keys mentioned below with comma-delimited values:

For example, a sample properties file looks like below: 


Once the file is created, run the below: 

macpro-user1:Migration user1$ gradle -PpropertiesFile=~/ mlExportResources

> Task :mlExportResources
Exporting resources to: ~/Migration/src/main/ml-config

Exported files:
Export messages:
The 'forest' key was removed from each exported database so that databases can be deployed before forests.
The 'range' key was removed from each exported forest, as the forest cannot be deployed when its value is null.
The exported user files each have a default password in them, as the real password cannot be exported for security reasons.
Use '--warning-mode all' to show the individual deprecation warnings.


1 actionable task: 1 executed

Once this build is successful, the directory structure below is created under the project directory; it includes the list of resources that have been exported and their config files (example screenshot below):

With this step finished, the required resources have been exported from the source cluster. The exported configuration is now ready to be imported into the new/destination cluster.

2. Importing Resources and the configuration on new/Destination host/Cluster:

To import the resource configuration onto the destination host/cluster, again create a new project and use the export that was created in step 1.2 Exporting the configuration of required resources. Once these configuration files are copied to the new project, make the necessary modifications to reflect the new cluster (such as hosts and other dependencies) and then deploy the configuration from the new project.

2.1 Creating a new project for the import with the Destination Host/cluster details:

While creating this new project, provide the destination MarkLogic Server host, username, password, REST port, and multiple-environment details at the command-line prompts. Once the project has been created successfully, verify the destination server details in the generated properties file. In the example steps below the destination project is /ml10pro

macpro-user1:ml10pro user1$ gradle mlNewProject
> Task :mlNewProject
Welcome to the new project wizard. Please answer the following questions to start a new project.

Note that this will overwrite your current build.gradle and files, and backup copies of each will be made.
[ant:input] Application name: [myApp]
<-------------> 0% EXECUTING [11s]
[ant:input] Host to deploy to: [destination host]

<-------------> 0% EXECUTING [25s]
[ant:input] MarkLogic admin username: [admin]

<-------------> 0% EXECUTING [28s]
[ant:input] MarkLogic admin password: [admin]

<-------------> 0% EXECUTING [36s]
[ant:input] REST API port (leave blank for no REST API server):

<-------------> 0% EXECUTING [41s]
[ant:input] Do you want support for multiple environments?  ([y], n)

<-------------> 0% EXECUTING [44s]
[ant:input] Do you want resource files for a content database and set of users/roles created? ([y], n)

<-------------> 0% EXECUTING [59s]

Making directory: /Users/rgunupur/Downloads/ml10pro/src/main/ml-config
Making directory: /Users/rgunupur/Downloads/ml10pro/src/main/ml-modules
Use '--warning-mode all' to show the individual deprecation warnings.


1 actionable task: 1 executed

Once the project is created, you can observe the below directory structure created:


2.2 Copying the required configuration files from Source project to destination project:

In this step, copy the configuration files that were created by exporting the resource configuration from the source server in step "1.2 Exporting the configuration of required resources".

For example, 

macpro-user1:ml10pro user1$ cp -R ~/Migration/src/main/ml-config/. ~/ml10pro/src/main/ml-config/

After copying, the directory structure in this project looks like below:


After copying the configuration files from source to destination, review every configuration file and make the necessary changes. For example, the host details should be updated to the destination server host details. Similarly, make any other changes that your requirements call for.

For example, in the ~/ml10pro/src/main/ml-config/forests/<database>/<forestname>.xml file you see the entry:

"host" : "Sourceserver_IP_Adress",

Change the host details to reflect the destination host details. After the change, it should look like:

"host" : "Destination_IP_Adress",

Similarly, for each forest, define the host details of the specific node that is required. For example, if forest 1 has to be on node 1, define forest1.xml with:

"host" : "node1_host",

Similarly, any other configuration parameters that have to be updated must be updated in the specific resource.xml file under the destination ml-config directory.

Best Practice:

Because this involves modifying the configuration files, it is advisable to keep a backup and to use version control (such as GitHub or svn) so that modifications can be tracked.

If the same configuration needs to be deployed to multiple environments (such as PROD, QA, TEST), all that is needed is a properties file for each environment where the configuration needs to be deployed. As explained in step 2.1 Creating a new project for the import with the Destination Host/cluster details, the property values for the different environments need to be provided while creating the project so that the files for the different environments are created.
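
When there are many forests, the host substitution described in this step can be scripted instead of edited by hand. The sketch below is illustrative only and is not part of ml-gradle: the function name retarget_forests and the host names are hypothetical, and it assumes that each exported forest file is a JSON document with a top-level "host" key (as the snippets above suggest) stored with a .json extension under ml-config/forests. Adjust the glob pattern if your export uses a different extension or format.

```python
import json
from pathlib import Path

def retarget_forests(config_dir, host_map):
    """Rewrite the "host" key of each exported forest file using host_map.

    host_map maps a source host name to a destination host name. Files whose
    host is not in the map are left untouched. Returns the changed paths.
    """
    changed = []
    # Assumed layout: <config_dir>/forests/<database>/<forestname>.json
    for path in sorted(Path(config_dir).glob("forests/*/*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        new_host = host_map.get(data.get("host"))
        if new_host is None:
            continue
        data["host"] = new_host
        path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
        changed.append(path)
    return changed

# Example call (hypothetical paths and host names):
# retarget_forests("src/main/ml-config", {"Sourceserver_IP_Adress": "node1_host"})
```

Run it against a copy that is under version control, so that the resulting changes can be reviewed before running mlDeploy.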

2.3 Importing the configuration (Running mlDeploy):

In this step, import the configuration that was exported from the source and copied into this project. After making sure that the configuration files have all been copied from the source and modified with the correct host details and other required changes, run the following:

macpro-user1:ml10pro user1$ gradle mlDeploy
> Task :mlDeleteModuleTimestampsFile

Module timestamps file /Users/rgunupur/Downloads/ml10pro/build/ml-javaclient-util/ does not exist, so not deleting
Use '--warning-mode all' to show the individual deprecation warnings.See


3 actionable tasks: 3 executed

Once the build is successful, go to the admin console of the destination server and verify that all the required configurations have been imported from the source server.


Further reading:

For more information, refer to our documentation and knowledge base articles:



This Knowledgebase article outlines the procedure to enable HTTPS on an AWS Elastic Load Balancer (ELB) using Route 53 or an external supplier as the DNS provider and with an AWS generated certificate.

The AWS Certificate Manager (ACM) automatically manages and renews the certificate and this certificate will be accepted by all current browsers without any security exceptions.

The downside is that you do need control over your Hosted DNS name entry - either through Route 53 or through another provider.


  1. MarkLogic AWS Cluster
  2. An AWS Route 53 hosted Domain or similar externally hosted Domain; the procedure described in this article assumes that Route 53 is being used, however where possible we have tried to detail the changes needed and these should also be applicable for another external DNS provider.


  1. Click on your hostname in Route 53 to edit it.

  2. Create a new Alias Record Set to point to your Elastic Load Balancer.

  3. In the Record Set entry on the right-hand side, enter an Alias name for your ELB host, select Alias and, from the Alias Target, select the ELB load balancer to use. Then click the Create button to update the Route 53 entry.

  4. It can take a little while for AWS to propagate the DNS update throughout the network, but once it is available it is worth checking that you are able to reach your MarkLogic cluster using the new address.

  5. Once the Route 53 entry is updated and available, you will need to request a new certificate through ACM. If you already have other certificates in ACM, you can select Request a certificate.

Otherwise select Get Started with Provision Certificates and select Request a public certificate.

  6. Enter your required Certificate domain name and click Next.

Note: This should match your DNS Alias name entry created in Step 3.

You can also add additional records such as a "Wildcard" entry. This is particularly useful if you want to use the same certificate for multiple hostnames, e.g. if you have clusters identified by versions such as ml9.[yourdomain].com & ml10.[yourdomain].com

  7. Select DNS as the Validation Method and click "Review".

  8. Before confirming and proceeding, check that the hostnames are correct, as certificates with invalid hostnames will not be usable.

  9. To complete validation, AWS will require you to add random CNAME entries to the DNS record to confirm that you are the owner. If you are using Route 53, this is as simple as selecting each entry in turn (the number of entries will vary depending on the number of domain name entries you specified in Step 6) and clicking "Create record in Route 53". Once all entries have been created, click Continue.

  10. If the update is successful, a Success message is displayed.

  11. If your DNS hostname is provided by an external provider, you will need to download the entries using the "Export DNS configuration to a file" link and provide this information to your DNS provider to make the necessary updates.

The file is a simple CSV file that specifies one or more CNAME entries that need to be created, with the required names and values. Once the AWS DNS validation process detects that these changes have been made, the certificate creation process will be completed automatically.

Domain Name,Record Name,Record Type,Record Value

  12. Once the Certificate has been validated by either of the methods in Steps 9 or 11, the certificate will be marked as Issued and be available for the Load Balancer to use.

  13. Configure the ELB for HTTPS and the new AWS generated Certificate.
  14. Edit the ELB Listeners and change the Cipher.

  15. (Optional) For production environments it is recommended to allow TLSv1.2 only.

  16. Next select the Certificate and repeat Steps 15 and 16 for each listener that you want to secure.

  17. From the ACM available certificates, select the newly generated certificate for this domain and click Save.

  18. Save the Listeners updates and ensure the update was successful.

  19. You should now be able to access your MarkLogic cluster securely over HTTPS using the AWS generated certificate.


HAProxy is a free, fast and reliable solution offering high availability, load balancing and proxying for TCP and HTTP-based applications.

MarkLogic 8 (8.0-8 and above) and MarkLogic 9 (9.0-4 and above) include improvements to allow you to use HAProxy to connect to MarkLogic Server.

MarkLogic Server supports balancing application requests using both the HAProxy TCP and HTTP balancing modes depending on the transaction mode being used by the MarkLogic application as detailed below:

  1. For single-statement auto-commit transactions running on MarkLogic version 8.0-7 and earlier or MarkLogic version 9.0-3 and earlier, only TCP mode balancing is supported. This is because the SessionID cookie and transaction id (txid) are only generated as part of a multi-statement transaction.
  2. For multi-statement transactions, or for single-statement auto-commit transactions running on MarkLogic version 8.0-8 and later or MarkLogic version 9.0-4 and later, both TCP and HTTP balancing modes can be configured.

Refer to "Understanding Transactions in MarkLogic Server" and "Single vs. Multi-statement Transactions" in the MarkLogic documentation to determine whether your application is using single-statement or multi-statement transactions.

Note: Attempting to use HAProxy in HTTP mode with single-statement transactions on MarkLogic versions prior to 8.0-8 or 9.0-4 can lead to unpredictable results.
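
The version rules above can be summarised as a small decision function. This is a sketch for illustration, not an official compatibility check: the function names are hypothetical, and treating releases after 9.0 (for example 10.0-x) as supporting both modes is an assumption based on the improvement having shipped in 9.0-4.

```python
import re

def parse_version(text):
    """Turn a version such as '9.0-4' (or '9.0.4') into the tuple (9, 0, 4)."""
    m = re.match(r"(\d+)\.(\d+)[-.](\d+)", text)
    if not m:
        raise ValueError(f"unrecognised MarkLogic version: {text!r}")
    return tuple(int(g) for g in m.groups())

def supported_modes(version, multi_statement):
    """Return the HAProxy balancing modes that apply per the list above."""
    v = parse_version(version)
    # Multi-statement transactions always carry the SessionID cookie and txid.
    # Single-statement transactions need 8.0-8+ on the 8.0 line, or 9.0-4+.
    http_ok = multi_statement or (8, 0, 8) <= v < (9, 0, 0) or v >= (9, 0, 4)
    return ["tcp", "http"] if http_ok else ["tcp"]

print(supported_modes("8.0-7", multi_statement=False))
print(supported_modes("9.0-4", multi_statement=False))
print(supported_modes("9.0-3", multi_statement=True))
```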

Example configurations

The following example configurations detail only the parameters relevant to enabling load balancing of a MarkLogic application. For details of all parameters that can be used, please refer to the HAProxy documentation.

TCP mode balancing

The following configuration is an example of how to balance requests to a 3-node MarkLogic application using the "roundrobin" balance algorithm, with stickiness based on the source IP address. The health of each node is checked by a TCP probe to the application server every 1 second.

backend app
    mode tcp
    balance roundrobin
    stick-table type ip size 200k expire 30m
    stick on src
    default-server inter 1s
    server app1 ml-node-1:8012 check id 1
    server app2 ml-node-2:8012 check id 2
    server app3 ml-node-3:8012 check id 3

HTTP mode balancing

The following configuration is an example of how to balance requests to a 3-node MarkLogic application using the "roundrobin" balance algorithm based on the "SessionID" cookie inserted by the MarkLogic server.

The health of each node is checked by issuing an HTTP GET request to the MarkLogic health check port and checking for the "Healthy" response.

backend app
    mode http
    balance roundrobin
    cookie SessionID prefix nocache
    option httpchk GET / HTTP/1.1\r\nHost:\ monitoring\r\nConnection:\ close
    http-check expect string Healthy
    server app1 ml-node-1:8012 check port 7997 cookie app1
    server app2 ml-node-2:8012 check port 7997 cookie app2
    server app3 ml-node-3:8012 check port 7997 cookie app3


MarkLogic Server organizes Trusted Certificate Authorities (CA) by Organization Name.  Trusted Certificate Authorities are the issuers of digital certificates, which in turn are used to certify the public key on behalf of the named subject as given in the certificate.  These certificates are used in the authentication process by:

  1. A MarkLogic Application Server configured to use SSL (HTTPS).
  2. Any Web Client which is making a connection to a MarkLogic Application Server over HTTPS (in the case of SSL Client Authentication).

Example Scenarios

Consider the following example:

$ openssl x509 -in CA.pem -text -noout
        Version: 3 (0x2)
        Serial Number: 18345409437988140316 (0xfe97fcaf8a61b51c)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, ST=CA, L=San Carlos, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA
            Not Before: Nov 30 04:08:31 2015 GMT
            Not After : Nov 29 04:08:31 2020 GMT
        Subject: C=US, ST=CA, L=San Carlos, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA

In this example, based on the Trusted CA Subject field, the CA Certificate will be listed under the organisation name "MarkLogic Corporation" (O=MarkLogic Corporation) in MarkLogic's list of Certificate Authorities.

You can view the full list of currently configured Trusted Certificate Authorities by logging into the MarkLogic administration Application Server (on port 8001) and viewing the status page: Configure -> Security -> Certificate Authorities

Trusted CA Certificate without Organization name (O=)

In some cases, there are legitimate Trusted CA Certificates which do not contain any further information about the Organization responsible for the certificate.

The example below shows a sample self-signed root CA (DemoLab CA) which highlights this scenario:

$ openssl x509 -in DemoLabCA.pem -text -noout
        Version: 3 (0x2)
        Serial Number: 12836463831212471403 (0xb22447d80f91b46b)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: CN=DemoLab CA
            Not Before: Nov 30 05:23:13 2015 GMT
            Not After : Nov 29 05:23:13 2020 GMT
        Subject: CN=DemoLab CA

If this Certificate were to be loaded into MarkLogic, no name would appear for it in the list of Certificate Authorities provided through the administration Application Server at Configure -> Security -> Certificate Authorities

In the case of the above example, it would be difficult to use the certificate validated by DemoLab CA (and to use DemoLab CA as our Trusted Certificate Authority) as MarkLogic will only list certificates that are associated with an Organization.
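
Before loading a CA certificate, it can be useful to check whether its Subject carries an Organization at all. The helper below is a minimal sketch that works on the Subject line printed by openssl, as shown in the examples above. The function name is hypothetical, and the parsing is deliberately naive: it assumes the comma-separated "KEY=value" layout shown above and does not handle values that themselves contain commas.

```python
def organization_of(subject_line):
    """Return the O= value from an openssl 'Subject:' line, or None if absent."""
    # Accept either the full "Subject: ..." line or just the distinguished name.
    dn = subject_line.split("Subject:", 1)[-1]
    for part in dn.split(","):
        key, sep, value = part.strip().partition("=")
        if sep and key == "O":
            return value.strip()
    return None

with_org = ("        Subject: C=US, ST=CA, L=San Carlos, "
            "O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA")
without_org = "        Subject: CN=DemoLab CA"

print(organization_of(with_org))
print(organization_of(without_org))
```

A result of None indicates a certificate like DemoLab CA, which would not appear by name in the Certificate Authorities list.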


To work around this issue, we can configure MarkLogic to use the certificate through some scripting with Query Console.

1) Loading the CA using Query Console

Start by using a call to pki:insert-trusted-certificates to load the Trusted CA into MarkLogic.  The sample Query Console code below demonstrates this process (please ensure this query is executed against the Security database).

Make a note of the id value returned by MarkLogic. It is an unsigned long (xs:unsignedLong) that can be used later to retrieve the certificate.

2) Attach Trusted CA with "SSL Client Certificate Authorities" using Query Console

The next step is to associate the certificate that we just inserted from our filesystem (DemoLabCA.pem) with a given MarkLogic Application Server. Once this is done, any client connecting to that application server over SSL will be presented with the certificate and DemoLab CA will be used to match the certificate using the Common Name value (Common Name eq "DemoLab CA").

3) Verify attached Trusted CA for Client Certificate Authorities

Executing the above code should return the same identifier (for the Trusted CA) as returned as result of the code executed in step 1. Additionally, we can see that our Application Server (DemoAppServer) is now configured to expect an SSL Client Certificate Authority signed by DemoLab CA.

Further Reading


MarkLogic Server is engineered to scale out horizontally by easily adding forests and nodes. Be aware, however, that when adding resources horizontally, you may also be introducing additional demand on the underlying resources.


On a single node, you will see some performance improvement from adding forests, due to increased parallelization. There is a point of diminishing returns, though, where the number of forests can overwhelm the available resources such as CPU, RAM, or I/O bandwidth. Internal MarkLogic research (as of April 2014) shows the sweet spot to be around six forests per host (assuming modern hardware). Note that there is a hard limit of 1024 primary forests per database, and it is a general recommendation that the total number of forests should not grow beyond 1024 per cluster.

At cluster level, you should see performance improvements in adding additional hosts, but attention should be paid to any potentially shared resources. For example, since resources such as CPU, RAM, and I/O bandwidth would now be split across multiple nodes, overall performance is likely to decrease if additional nodes are provisioned virtually on a single underlying server. Similarly, when adding additional nodes to the same underlying SAN storage, you'll want to pay careful attention to making sure there's enough I/O bandwidth to accommodate the number of nodes you want to connect.

More generally, additional capacity above a bottleneck tends to exacerbate performance issues. If you find your performance has actually decreased after horizontally scaling out some part of your stack, it is likely that a part of your infrastructure below the part at which you made changes is being overwhelmed by the additional demand introduced by the added capacity.


MarkLogic Application Servers keep a connection open after completing and responding to a request, waiting for another request until the Keep Alive timeout expires. However, there is an exception: the connection is closed regardless of the timeout settings when the content is larger than 1 MB. This article is intended to provide further insight into connection close with respect to payload size.

HTTP Header


In general, Application Servers communicating over HTTP send the Content-Length header as part of their response HTTP headers to indicate how many bytes of data the client application should expect to receive. For example:

HTTP/1.1 200 OK
Content-type: application/sparql-results+json; charset=UTF-8
Server: MarkLogic
Content-Length: 1264
Connection: Keep-Alive
Keep-Alive: timeout=5

This requires Application Servers to know the length of the entire response before the very first bytes (the response HTTP headers) are put on the wire. For small amounts of data, calculating the content length is fast. For large amounts of content, the calculation may be time consuming; in the extreme case, the client finds the server unresponsive because of the delay in calculating the entire response length. Additionally, the server may need to hold the entire content in a memory buffer, putting a further burden on server resources.


To allow servers to begin transmitting dynamically-generated content before knowing the total size of that content, HTTP 1.1 supports chunked encoding. This technique is widely used in music and video streaming and in other industries. Chunked encoding eliminates the need to know the entire content length before sending a portion of the data, thus making the server look more responsive.
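To make the idea concrete, here is a minimal Python sketch of how a body is framed with HTTP/1.1 chunked transfer coding: each chunk is preceded by its size in hexadecimal, and a zero-length chunk marks the end of the body. The function name encode_chunked is an illustrative assumption and has nothing to do with MarkLogic itself; optional chunk extensions and trailers are ignored.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings using HTTP/1.1 chunked transfer coding.

    Hypothetical helper for illustration only.
    """
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            # A zero-length chunk would signal the end of the body, so skip empties.
            continue
        # Chunk size in hexadecimal, CRLF, then the chunk data and another CRLF.
        out += f"{len(chunk):x}".encode("ascii") + b"\r\n"
        out += chunk + b"\r\n"
    # The terminating zero-length chunk, followed by an empty trailer section.
    out += b"0\r\n\r\n"
    return bytes(out)


print(encode_chunked([b"Hello, ", b"world"]))
```

Note that the sender never needs the total length: each chunk only has to know its own size.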

At the time of this writing, MarkLogic Server (v8.0-6 and earlier releases) does not support chunked encoding. However, do look for this feature in future releases of MarkLogic Server.

Connection Close

In MarkLogic Server v7 and v8, the server closes the connection after transmitting content greater than 1 MB, which allows MarkLogic to avoid calculating the content length in advance. The client will not see a Content-Length header for larger (>1 MB) content in the HTTP response from MarkLogic. Instead, it will receive a Connection: close header in the HTTP response. After sending the entire content, MarkLogic Server terminates the connection to indicate to the client that the end of the content has been reached.

Closing the existing connection for content larger than 1 MB is an exception to the Keep-Alive configuration. This may result in unexpected behavior in clients that rely on MarkLogic Server respecting the Keep-Alive configuration, so this behavior should be accounted for when designing a client application's connection pool.

Client applications may have to send a TCP SYN again to establish a new connection for the subsequent request, which adds the overhead of a TCP three-way handshake before the next request can be sent. However, in the context of transferring a larger payload (>1 MB), where many more round trips are involved in the overall communication, the overhead of the handshake is nominal.
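As a rough illustration of how a client-side connection pool might account for this, the Python sketch below decides whether a connection can be returned to the pool based on the response headers. The function name and the simple rule used (reuse only when the server did not send Connection: close and did send Content-Length) are assumptions made for this example, not a description of any particular HTTP client library.

```python
def can_reuse_connection(headers):
    """Return True if the response headers suggest the connection stays open.

    Hypothetical helper: `headers` is a plain dict of response header names
    to values, as most HTTP client libraries can provide.
    """
    normalized = {name.lower(): value.strip().lower() for name, value in headers.items()}
    if normalized.get("connection") == "close":
        # The server will close the socket after the body (the >1 MB case above).
        return False
    # Without Content-Length the end of the body is only marked by the close.
    return "content-length" in normalized


small = {"Content-Length": "1264", "Connection": "Keep-Alive", "Keep-Alive": "timeout=5"}
large = {"Connection": "close"}
print(can_reuse_connection(small), can_reuse_connection(large))
```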

Further Reading


CSV files are a very common data exchange format. They are often used as an export format for spreadsheets, databases or other applications. Depending on the application, you might be able to change the delimiter character to a hash (#), an asterisk (*), and so on. A tab character is also commonly used as the delimiter. Content Pump supports reading and loading such CSV files.


The Content Pump -delimiter option defines which delimiter will be used to split the columns. Defining a tab as the value for the delimiter option on the command line isn't straightforward.

Loading tab delimited data files with Content Pump can result in an error message like the following:

mlcp>bin/ IMPORT -host localhost -port 9000 -username admin -password secret -input_file_path sample.csv -input_file_type delimited_text -delimiter '    ' -mode local
13/08/21 15:10:20 ERROR contentpump.ContentPump: Error parsing command arguments: 
13/08/21 15:10:20 ERROR contentpump.ContentPump: Missing argument for option: delimiter
usage: IMPORT [-aggregate_record_element <QName>]

Depending on the command line shell, a tab needs to be escaped so that the shell passes it through correctly:

In bash, this should work: -delimiter $'\t'
In the Bourne shell, this should work: -delimiter 'Ctrl+V followed by tab'
An alternative is to use: -delimiter \x09

If none of these work, another approach you can try is to use the -options_file /path/to/options-file parameter. The options file can contain all of the same parameters as the command line. The benefit of using an options file is that the command line is simpler and characters are interpreted as intended. The options file contains multiple lines: the first line is always the action (IMPORT, EXPORT, etc.), followed by pairs of lines. In each pair, the first line is the option parameter and the second line is the value for that option.

A sample could look like the following:

' '

Make sure the file is saved in UTF-8 format to avoid any parsing problems. To define a tab as the delimiter, place a real tab between single quotes (i.e. '<tab>').
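If you prefer to generate the options file rather than type it, a small script avoids editors silently converting the tab into spaces. The Python sketch below builds the file content in the layout described above (action on the first line, then option and value on alternating lines). The function name is hypothetical, the option values are taken from the failing command shown earlier, and the tab is written between single quotes as this article suggests.

```python
def build_options_file(action, options):
    """Return options-file text: the action first, then one option and one value per line.

    Hypothetical helper for illustration only.
    """
    lines = [action]
    for name, value in options:
        lines.append(name)
        lines.append(value)
    return "\n".join(lines) + "\n"


options_text = build_options_file("IMPORT", [
    ("-host", "localhost"),
    ("-port", "9000"),
    ("-input_file_path", "sample.csv"),
    ("-input_file_type", "delimited_text"),
    ("-delimiter", "'\t'"),  # a real tab character between single quotes
])

# To save it as UTF-8 (the path is an example):
# with open("sample.options", "w", encoding="utf-8") as handle:
#     handle.write(options_text)
print(repr(options_text))
```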

To use this option file with mlcp execute the following command:

Linux, Mac, Solaris:

mlcp>bin/ -options_file /path/to/sample.options

Windows:


mlcp>bin/mlcp.bat -options_file /path/to/sample.options

The options file can take any parameter which mlcp understands. It is important that the action command is defined on the first line. It is also possible to use both command line parameters and the options file. Command line parameters take precedence over those defined in the options file.


There are sometimes circumstances where the MarkLogic data directory owner can be changed.  This can create problems where MarkLogic Server is unable to read and/or write its own files but is easily corrected.

MarkLogic Server user


The default location for the data directory on Linux is /var/opt/MarkLogic and the default owner is daemon.

If you are using a non-default (non-daemon) user to run MarkLogic, for example mlogic, you would usually have the following setting:

    export MARKLOGIC_USER=mlogic



Correct the data directory ownership

If the file ownership is incorrect, the way forward is to change the ownership back to the correct user.  For example, if using the default user daemon:

1.  Stop MarkLogic Server.

2.  Make sure that the user you are using is correct and available on this machine.

3.  Change the ownership of all the MarkLogic files (by default /var/opt/MarkLogic and any/all forests for this node) to daemon.  The change needs to be made recursively below the directory to include all files.  Assuming all nodes in the cluster run as daemon, you can use another unaffected node as a check.  You may need to use root/sudo permissions to change owner.  For example:

chown -R daemon:daemon /var/opt/MarkLogic

4.  Start MarkLogic Server.  It should now come up as the correct user and be able to manage its files.
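Before or after running chown, it can be useful to list which files actually have the wrong owner. The following Python sketch walks a directory tree and reports every directory and file whose owner differs from the expected numeric user id. The function name is a hypothetical helper written for this example, and the path and user shown in the usage comment are simply the defaults mentioned above.

```python
import os


def find_wrong_owner(root, expected_uid):
    """Return paths under `root` (including `root`) not owned by `expected_uid`."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then every file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in candidates:
            # lstat does not follow symbolic links, so links are checked themselves.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return mismatched


# Example usage on a MarkLogic host (run with sufficient privileges):
#   import pwd
#   uid = pwd.getpwnam("daemon").pw_uid
#   for path in find_wrong_owner("/var/opt/MarkLogic", uid):
#       print(path)
```

An empty result means every file below the directory already belongs to the expected user.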



MarkLogic Server allows you to set up an alerting application to notify users when new content is available that matches a predefined query. This can be achieved through the Alerting API with the Content Processing Framework (CPF). CPF is designed to keep state for documents, so it is easy to use CPF to keep track of when a document in a particular scope is created or updated, and then perform some action on that document. However, although alerting works for document inserts and updates, it does not occur for document deletes. You will have to create a custom CPF pipeline to catch the delete through an appropriate status transition.


To achieve alerting for document deletes, you will have to write your own custom pipeline with a status transition to handle deletes. For example:

   <annotation>custom delete action</annotation>

The higher 'priority' value and 'always' = true indicates that the custom pipeline has precedence over the default status change handling pipeline to handle document deletes.  Similarly, in the action module, you can write your custom code for alerting.

Note: By default, when a document is deleted, the on-delete pre-commit trigger is fired and it calls the action in the Status Change Handling pipeline (if enabled) for ‘delete’ status transition. It is recommended that you do not modify this pipeline as it can cause compatibility problems in future upgrades and releases of MarkLogic server.


Packer from HashiCorp is a provisioning tool, allowing for the automated creation of machine images, extending the ability to manage infrastructure to machine images. Packer supports a number of different image types including AWS, Azure, Docker, VirtualBox and VMWare.

Packer can be used to create a customized MarkLogic Amazon Machine Image (AMI) which can then be deployed to AWS and used in a Cluster. We recommend using the official MarkLogic AMIs whenever possible, and making the necessary customizations to the official images. This ensures that MarkLogic Support is able to quickly diagnose any issues that may occur, as well as reducing the risk of running MarkLogic in a way that is not fully supported.

The KB article, Customizing MarkLogic with Packer and Terraform, covers the process of customizing the official MarkLogic AMI using Packer.

Setting Up Packer

For the purpose of this example, I will assume that you have already installed the AWS CLI, with the correct credentials, and you have installed Packer.

Packer Templates

A Packer template is a JSON configuration file that is used to define the image that we want to build. Templates have a number of keys available for defining the machine image, but the most commonly used ones are builders, provisioners and post-processors.

  • builders are responsible for creating the images for various platforms.
  • provisioners is the section used to install and configure software running on machines before turning them into images.
  • post-processors are actions applied to the images after they are created.

Creating a Template

For our example, we are going to build from the official Amazon Linux 2 AMI, where we will install the required prerequisite packages, install MarkLogic, and apply some customizations before creating a new image.

Defining Variables

Variables help make the build more flexible, so we will utilize a separate variables file, marklogic_vars.json, to define parts of our build.

{
  "vpc_region": "us-east-1",
  "vpc_id": "vpc-06d3506111cea30d0",
  "vpc_public_sn_id": "subnet-03343e69ae5bed127",
  "vpc_public_sg_id": "sg-07693eb077acb8635",
  "instance_type": "t3.large",
  "ssh_username": "ec2-user",
  "ami_filter": "amzn2-ami-hvm-2.*-ebs",
  "ami_owner": "amazon",
  "binary_source": "./",
  "binary_dest": "/tmp/",
  "marklogic_binary": "MarkLogic-10.0-4.2.x86_64.rpm"
}

Here we've identified the instance details so our image can be launched, as well as the filter values, ami_filter and ami_owner, that will help us retrieve the correct base image for our AMI. We are also identifying the name of the MarkLogic binary, along with some path details on where to find it locally, and where to place it on the remote host.

Creating Our Template

Now that we have some of the specific build details defined, we can create our template, marklogic_ami.json. In this case we are going to use the builders and provisioners keys in our build.

{
    "builders": [
      {
        "type": "amazon-ebs",
        "region": "{{user `vpc_region`}}",
        "vpc_id": "{{user `vpc_id`}}",
        "subnet_id": "{{user `vpc_public_sn_id`}}",
        "associate_public_ip_address": true,
        "security_group_id": "{{user `vpc_public_sg_id`}}",
        "source_ami_filter": {
          "filters": {
            "virtualization-type": "hvm",
            "name": "{{user `ami_filter`}}",
            "root-device-type": "ebs"
          },
          "owners": ["{{user `ami_owner`}}"],
          "most_recent": true
        },
        "instance_type": "{{user `instance_type`}}",
        "ssh_username": "{{user `ssh_username`}}",
        "ami_name": "ml-{{isotime \"2006-01-02-1504\"}}",
        "tags": {
          "Name": "ml-packer"
        }
      }
    ],
    "provisioners": [
      {
        "type": "shell",
        "script": "./"
      },
      {
        "destination": "{{user `binary_dest`}}",
        "source": "{{user `binary_source`}}{{user `marklogic_binary`}}",
        "type": "file"
      },
      {
        "type": "shell",
        "inline": [ "sudo yum -y install /tmp/{{user `marklogic_binary`}}" ]
      }
    ]
}

In the builders section we have defined the network and security group configurations and the source AMI details. We have also defined the naming convention (ml-YYYY-MM-DD-TTTT) for our new AMI with ami_name and added a tag, ml-packer. Both of those will make it easier to find our AMI when it comes time to deploy it.
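The isotime layout string "2006-01-02-1504" uses Go's reference-time notation (2006 is the year, 01 the month, 02 the day, 15 the 24-hour hour and 04 the minute). As a sketch of what name this produces, the Python function below builds the equivalent string with strftime. The function name is hypothetical, and the use of UTC is an assumption about how Packer evaluates isotime.

```python
from datetime import datetime, timezone


def ami_name(now=None):
    """Return a name in the ml-YYYY-MM-DD-HHMM form used by the template.

    Hypothetical helper; `now` defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    # %Y-%m-%d-%H%M corresponds to the Go layout 2006-01-02-1504.
    return "ml-" + now.strftime("%Y-%m-%d-%H%M")


print(ami_name(datetime(2020, 7, 4, 9, 5, tzinfo=timezone.utc)))
```

Because the name sorts chronologically, the most recent image is easy to spot in a listing.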


In our example, we are using the shell provisioner to execute a script against the machine, the file provisioner to copy the MarkLogic binary file to the machine, and the shell provisioner to install the MarkLogic binary, all of which will be run prior to creating the image. There are also provisioners available for Ansible, Salt, Puppet, Chef, and PowerShell, among others.

Provisioning Script

For our custom image, we've determined that we need to install Git, to create a symbolic link MarkLogic needs on Amazon Linux 2, and to set up /etc/marklogic.conf to disable the MarkLogic Managed Cluster feature, all of which we will do inside a script. We've named the script, and it is stored in the same directory as our Packer template.

#!/bin/bash -x
echo "**** Starting ****"
echo "**** Creating LSB symbolic link ****"
sudo ln -s /etc/system-lsb /etc/redhat-lsb
echo "**** Installing Git ****"
sudo yum install -y git
echo "**** Setting Up /etc/marklogic.conf ****"
echo "export MARKLOGIC_MANAGED_NODE=0" >> /tmp/marklogic.conf
sudo cp /tmp/marklogic.conf /etc/
echo "**** Finishing ****"

Executing Our Build

Now that we've completed setting up our build, it's time to use Packer to create the image.

packer build -debug -var-file=marklogic_vars.json marklogic_ami.json

Here you can see that we are telling Packer to do a build using marklogic_ami.json and referencing our variables file with the -var-file flag. We've also added the -debug flag which will disable parallelism and enable debug mode. In debug mode, Packer will stop after each step and prompt you to hit Enter to go to the next step.

The last part of the build output will print out the details of our new image:

Wrapping Up

We have now created a customized MarkLogic AMI using Packer, which can be used to deploy a self managed cluster.


If you're looking at the MarkLogic Admin UI on port 8001, you may have noticed that the status page for a given database displays the dateTime of its last backup.

We have been asked in the past how this gets computed so the same check can be performed using your own code.

This Knowledgebase article shows examples that utilise XQuery to get this information and explores the possibility of retrieving it using the MarkLogic ReST API.

XQuery: How does the code work?

The simple answer is in the forest status for each of the forests in the database (note these values only appear if you have created a backup already).  For the sake of these examples, let's say we have a database (called "test") which contains 12 forests (test-1 to test-12).  We can get the backup status for these using a call to our ReST API:


In the results returned, you should see something like this:

last-backup : 2016-02-12T12:30:39.916Z datetime
last-incr-backup : 2016-02-12T12:37:29.085Z datetime

In generating that status page, what the MarkLogic code does is to create an aggregate: a database doesn't contain documents in MarkLogic; it contains forests and those forests contain documents.

Continuing the example above (with a database called "test" containing 12 forests) if I run the following:

This will return the forest status(es) for all forests in the database "test" and return the forest names using XPath, so in this case, we would see:

<forest-name xmlns="">test-1</forest-name>
<forest-name xmlns="">test-12</forest-name>

Our admin UI is interrogating each forest in turn for that database and finding out the metrics for the last backup.  So to put that into context, if we ran the following:

This gives us:

<last-backup xmlns="">2016-02-12T12:30:39.946Z</last-backup>
<last-backup xmlns="">2016-02-12T12:30:39.925Z</last-backup>

The code (or the status report) doesn't want values for all 12 forests; it just wants the time the last forest completed the backup (because that's the real time the backup completed), so our code is running a call to fn:max:

Which gives us the max value (as these are all xs:dateTimes, it's finding the most recent date), which in the case of this example is:


The same is true for the last incremental backup (note that all we're changing here is the XPath to get to the correct element):

So we can get the max value for this by getting the most recent time across all forests:

This would give us 2016-02-12T12:37:29.161Z
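The aggregation itself is simple enough to reproduce in any client that has collected the per-forest values, for example from the forest status documents. The Python sketch below takes the most recent of a list of xs:dateTime strings, which mirrors the fn:max step described above. The function names are hypothetical, and the two sample values are the last-backup values shown earlier in this article.

```python
from datetime import datetime, timezone


def parse_datetime(value):
    """Parse a UTC xs:dateTime string such as 2016-02-12T12:30:39.946Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def last_backup(forest_values):
    """Return the most recent value across all forests, or None if there are none."""
    return max(forest_values, key=parse_datetime, default=None)


values = ["2016-02-12T12:30:39.946Z", "2016-02-12T12:30:39.925Z"]
print(last_backup(values))
```

The same function works unchanged for the last-incr-backup values.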

Using the ReST API

The ReST API also allows you to get this information but you'd need to jump through a few hoops to get to it; the ReST API status for a given database would give you the names of all the forests attached to that database:


And from there you could GET the information for all of those forests:


Once you'd got all those values, you could do what MarkLogic's admin code does and get the max values for them - although at this stage, it might make more sense to write a custom endpoint that returns this information, something like:

Where you could make a call to that module to get the aggregates (e.g.):


This would return the database status for any given parameter-name that is passed in.



When searching for matches using OR'ed word-queries where the matches overlap (i.e. one query contains the text of another query), the results of cts:highlight may not be what you expect.


For example:


let $p := <p>From the memoirs of an accomplished artist</p>
let $query :=
  cts:or-query((
    cts:word-query("accomplished artist"),
    cts:word-query("memoirs of an accomplished artist")))
return cts:highlight($p, $query, <m>{$cts:text}</m>)


The desired outcome of this would be:

<p>From the <m>memoirs of an accomplished artist</m> </p>

Whereas the actual results are:

<p>From the <m>memoirs of an </m> <m>accomplished artist</m></p>


This behavior is by design and the results are expected, because cts:highlight breaks up overlapping areas into separate matches.

The cts:highlight built-in variables $cts:queries and $cts:action help in understanding how this works, and in working around this problem.

  $cts:queries --> returns the matching queries for each of the matched texts.

  $cts:action --> can be used with xdmp:set to specify what should happen next

  • "continue" - (default) Walk the next match. If there are no more matches, return all evaluation results.
  • "skip" - Skip walking any more matches and return all evaluation results
  • "break" - Stop walking matches and return all evaluation results

For example, replacing the return statement in the original query with the following:

 cts:highlight($p, $query,

<p>From the
  <m>memoirs of an
    <cts:word-query xmlns:cts="">
      <cts:text xml:lang="en">memoirs of an accomplished artist</cts:text>
    </cts:word-query>
  </m>
  <m>accomplished artist
    <cts:word-query xmlns:cts="">
      <cts:text xml:lang="en">memoirs of an accomplished artist</cts:text>
    </cts:word-query>
    <cts:word-query xmlns:cts="">
      <cts:text xml:lang="en">accomplished artist</cts:text>
    </cts:word-query>
  </m>
</p>

These results give us a better understanding of how the text is being matched. We can see that "accomplished artist" is matched by both word-queries, 'accomplished artist' and 'memoirs of an accomplished artist', which is why the results of cts:highlight are split into two separate matches.
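The splitting behavior can be modeled outside MarkLogic with a few lines of code. The Python sketch below is a simplified model only, not MarkLogic's implementation: it finds the first occurrence of each phrase, cuts the text at every match boundary, and records which phrases cover each resulting segment. The function name and the case-sensitive plain string matching are assumptions made for illustration.

```python
def split_matches(text, phrases):
    """Split overlapping phrase matches into non-overlapping segments.

    Returns a list of (segment_text, [phrases covering that segment]).
    Only the first occurrence of each phrase is considered.
    """
    spans = []
    for phrase in phrases:
        start = text.find(phrase)
        if start != -1:
            spans.append((start, start + len(phrase), phrase))
    # Every start or end of a match is a point where a new segment begins.
    cuts = sorted({point for start, end, _ in spans for point in (start, end)})
    segments = []
    for lo, hi in zip(cuts, cuts[1:]):
        covering = [phrase for start, end, phrase in spans if start <= lo and hi <= end]
        if covering:
            segments.append((text[lo:hi], covering))
    return segments


text = "From the memoirs of an accomplished artist"
phrases = ["accomplished artist", "memoirs of an accomplished artist"]
for segment, matched_by in split_matches(text, phrases):
    print(repr(segment), matched_by)
```

The first segment is covered by one phrase and the second by two, which corresponds to the one and two cts:word-query elements seen in $cts:queries above.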

To work around this problem, we can insert a small piece of code: 


let $p := <p>From the memoirs of an accomplished artist</p>
let $query :=
  cts:or-query((
    cts:word-query("accomplished artist"),
    cts:word-query("memoirs of an accomplished artist")))
return cts:highlight($p, $query,
  ( if (count($cts:queries) gt 1) then xdmp:set($cts:action, "continue")
    else
    ( let $matched-text := <x>{$cts:queries}</x>/cts:word-query/cts:text/data(.)
      return <m>{$matched-text}</m> )
  ))

This returns:

<p>From the <m>memoirs of an accomplished artist</m></p>



Please note that this solution relies on assumptions about what's inside the or-query, but this example could be modified to handle other overlapping situations.






      Packer from HashiCorp is an open source provisioning tool, allowing for the automated creation of machine images, extending the ability to manage infrastructure to machine images. Packer supports a number of different image types including AWS, Azure, Docker, VirtualBox and VMWare.

      These powerful tools can be used together to deploy a MarkLogic Cluster to AWS using the MarkLogic CloudFormation Template, using a customized Amazon Machine Image (AMI). The MarkLogic CloudFormation Template is the preferred method recommended by MarkLogic for building out MarkLogic clusters within AWS. By default the MarkLogic CloudFormation Template uses the official MarkLogic AMIs.

      While this guide will cover some portions of Terraform, the primary focus will be using Packer to customize an official MarkLogic AMI. For more detailed information on Terraform, we recommend reading Deploying MarkLogic to AWS with Terraform, which also includes the example files referenced later in this article.

      Setting Up Packer

      For the purpose of this example, I will assume that you have already installed the AWS CLI, with the correct credentials, and you have installed Packer.

      Packer Templates

      A Packer template is a JSON configuration file that is used to define the image that we want to build. Templates have a number of keys available for defining the machine image, but the most commonly used ones are builders, provisioners and post-processors.

      • builders are responsible for creating the images for various platforms.
      • provisioners is the section used to install and configure software running on machines before turning them into images.
      • post-processors are actions applied to the images after they are created.

      Creating a Template

      For our example, we are going to take the official MarkLogic AMI and apply some customizations before creating a new image.

      Defining Variables

      Variables help make the build more flexible, so we will utilize a separate variables file, vars.json, to define parts of our build.

      {
        "vpc_region": "us-east-1",
        "vpc_id": "vpc-06d3506111cea30d0",
        "vpc_public_sn_id": "subnet-03343e69ae5bed127",
        "vpc_public_sg_id": "sg-07693eb077acb8635",
        "ami_filter": "release-MarkLogic-10*",
        "ami_owner": "679593333241",
        "instance_type": "t3.large",
        "ssh_username": "ec2-user"
      }

      Creating Our Template

      Now that we have some of the specific build details defined, we can create our template, base_ami.json. In this case we are going to use the builders and provisioners keys in our build.

      {
        "builders": [
          {
            "type": "amazon-ebs",
            "region": "{{user `vpc_region`}}",
            "vpc_id": "{{user `vpc_id`}}",
            "subnet_id": "{{user `vpc_public_sn_id`}}",
            "associate_public_ip_address": true,
            "security_group_id": "{{user `vpc_public_sg_id`}}",
            "source_ami_filter": {
              "filters": {
                "virtualization-type": "hvm",
                "name": "{{user `ami_filter`}}",
                "root-device-type": "ebs"
              },
              "owners": ["{{user `ami_owner`}}"],
              "most_recent": true
            },
            "instance_type": "{{user `instance_type`}}",
            "ssh_username": "{{user `ssh_username`}}",
            "ami_name": "ml-{{isotime \"2006-01-02-1504\"}}",
            "tags": {
              "Name": "ml-packer"
            }
          }
        ],
        "provisioners": [
          {
            "type": "shell",
            "script": "./"
          },
          {
            "destination": "/tmp/",
            "source": "./marklogic.conf",
            "type": "file"
          },
          {
            "type": "shell",
            "inline": [ "sudo mv /tmp/marklogic.conf /etc/marklogic.conf" ]
          }
        ]
      }

      In the builders section we have defined the network and security group configurations and the source AMI details. We have also defined the naming convention (ml-YYYY-MM-DD-TTTT) for our new AMI with ami_name and added a tag, ml-packer. Both of those will make it easier to find our AMI when it is time to use it with Terraform.


      In our example, we are using the shell provisioner to execute a script against the machine, the file provisioner to copy the marklogic.conf file to the machine, and the shell provisioner to move the file to /etc/, all of which will be run prior to creating the image. There are also provisioners available for Ansible, Salt, Puppet, Chef, and PowerShell, among others.

      Provisioning Script

      For our custom image, we've determined that we need an additional piece of software installed, which we will do inside a script. We've named the script, and it is stored in the same directory as our packer template.

      echo "**** Starting ****"
      echo "Installing Git"
      sudo yum install -y git
      echo "**** Finishing ****"

      Executing Our Build

      Now that we've completed setting up our build, it's time to use packer to create the image.

      packer build -debug -var-file=vars.json base_ami.json

      Here you can see that we are telling packer to do a build using base_ami.json and referencing our variables file with the -var-file flag. We've also added the -debug flag which will disable parallelism and enable debug mode. In debug mode, packer will stop after each step and prompt you to hit Enter to go to the next step.

      The last part of the build output will print out the details of our new image:

      ==> Builds finished. The artifacts of successful builds are:
      --> amazon-ebs: AMIs were created:
      us-east-1: ami-0100....

      Terraform and the MarkLogic CloudFormation Template

      At this point we have our image and want to use it when deploying the MarkLogic CloudFormation Template. Unfortunately there is no simple way to do this, as the MarkLogic CloudFormation Template does not have the option to specify a custom AMI. Fortunately Terraform has some functions available that we can use to make the changes to the Template.


      First we want to add a couple entries to our existing Terraform variables file.

      variable "ami_tag" {
        type = string
        default = "ml-packer"
      }

      variable "search_string" {
        type = string
        default = "ImageId: "
      }

      The first variable, ami_tag, is the tag we added to the AMI when it was built. The second variable, search_string, will be described in the Updates to Terraform Root Module section below.

      Data Source

      To retrieve the AMI, we need to define a data source. In this case it will be an aws_ami data source. We are going to call the file

      data "aws_ami" "ml_ami" {
        filter {
          name = "state"
          values = ["available"]
        }

        filter {
          name = "tag:Name"
          values = ["${var.ami_tag}"]
        }
        owners = ["self"]
        most_recent = true
      }

      So we are filtering the available AMIs, only looking at ones that are owned by our own account (self), tagged with the value that we defined in our variables file, and then if more than one AMI is returned, using the most recent.
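The selection logic of the data source can be sketched in plain code. The Python example below applies the same three criteria (state, Name tag, owner) to a list of image records and then picks the most recent one. The record layout, field names, function name and sample image ids are all hypothetical and chosen only for this illustration; they are not the AWS API response format.

```python
def select_ami(images, tag_name, owner="self"):
    """Return the newest available image with the given Name tag and owner, or None."""
    candidates = [
        image for image in images
        if image["state"] == "available"
        and image["owner"] == owner
        and image["tags"].get("Name") == tag_name
    ]
    if not candidates:
        return None
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    return max(candidates, key=lambda image: image["creation_date"])


sample_images = [
    {"id": "ami-aaa", "state": "available", "owner": "self",
     "tags": {"Name": "ml-packer"}, "creation_date": "2020-07-01T10:00:00.000Z"},
    {"id": "ami-bbb", "state": "available", "owner": "self",
     "tags": {"Name": "ml-packer"}, "creation_date": "2020-07-04T09:05:00.000Z"},
    {"id": "ami-ccc", "state": "pending", "owner": "self",
     "tags": {"Name": "ml-packer"}, "creation_date": "2020-07-05T09:05:00.000Z"},
    {"id": "ami-ddd", "state": "available", "owner": "self",
     "tags": {"Name": "other"}, "creation_date": "2020-07-06T09:05:00.000Z"},
]
print(select_ami(sample_images, "ml-packer")["id"])
```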

      Updates to Terraform Root Module

      Now we are ready to make a couple of updates to our Terraform root module file to integrate the new AMI into our deployment. In our last example, we used the MarkLogic CloudFormation template from its S3 bucket. For this deployment, we are going to use a local copy of the template, mlcluster-template.yaml.

      Replace the template_url line with the following line:

      template_body = replace(file("./mlcluster-template.yaml"), "/${var.search_string}.*/","${var.search_string} ${}")

      When we updated the variables in our Terraform variables file, we created the variable search_string. In the MarkLogic CloudFormation Template, the value for the Image ID is identified by the region and whether you are running the Essential Enterprise or Bring Your Own License version of MarkLogic Server. Here we are using a regular expression with the replace function to update that line so that it references the AMI we just created with Packer, which we have already retrieved.
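To see what this kind of regular-expression replacement does to the template text, here is an equivalent sketch in Python. It replaces everything from the search string to the end of each matching line with the search string followed by an AMI id. The function name, the two-property template fragment and the AMI id are hypothetical values used only for illustration.

```python
import re


def set_image_id(template_text, search_string, ami_id):
    """Replace the rest of every line containing `search_string` with `ami_id`."""
    # "." does not match a newline, so ".*" stops at the end of the line.
    pattern = re.escape(search_string) + r".*"
    return re.sub(pattern, lambda match: search_string + ami_id, template_text)


template = (
    "Properties:\n"
    "  ImageId: !FindInMap [Example, Lookup, Value]\n"
    "  InstanceType: t3.large\n"
)
print(set_image_id(template, "ImageId: ", "ami-0123456789abcdef0"))
```

Only the ImageId line changes; every other line of the template is left untouched, which is what you should also see when reviewing the plan output.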

      Deploying with Terraform

      Now we are ready to run Terraform to deploy our cluster. First we want to double check that the template looks correct before we attempt to create the CloudFormation stack. The output of terraform plan will show the CloudFormation template that will be deployed. Check the output to make sure that the value for ImageId shows our desired AMI.

      Once we have confirmed our new AMI is being referenced, we can then run terraform apply to create a new stack using the template. This can be validated by opening a command line on one of the new hosts, and checking to see if Git is installed, and if /etc/marklogic.conf exists:

      Wrapping Up

      At this point, we have now customized the official MarkLogic AMI to create our own AMI using Packer. We have then used Terraform to update the MarkLogic CloudFormation Template and to deploy a CloudFormation stack based on the updated template.


      In the Scalability, Availability & Failover Guide, the node communication section describes a quorum as >50% of the nodes in a cluster.

      Is it possible for a database to be available for reads and writes, even if a quorum of nodes is not available in the cluster?

      The answer is yes: there are configurations and sequences of events that can lead to forests remaining online when fewer than 50% of the hosts are online.
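
      The quorum definition quoted above (more than 50% of the nodes in a cluster) can be written as a one-line check. This is a minimal sketch of the arithmetic only; the function name is hypothetical and it is not how MarkLogic Server itself evaluates quorum.

```python
def has_quorum(online_hosts: int, total_hosts: int) -> bool:
    """Return True when strictly more than 50% of the cluster's hosts are online."""
    if total_hosts <= 0:
        raise ValueError("total_hosts must be positive")
    # Integer arithmetic avoids floating point: online / total > 1 / 2.
    return 2 * online_hosts > total_hosts


# A 3-node cluster keeps quorum with 2 hosts online but not with 1.
for online in range(4):
    print(online, "of 3 online ->", has_quorum(online, 3))
```

      Note that exactly half is not enough: in a 4-node cluster, 2 hosts online does not satisfy the check.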


      If a single forest in a database is not available, the database will not be accessible. It is also true that as long as all of a database's forests are available in the cluster, the database will be available for reads and writes regardless of any quorum issues.

      Of course, the Security database must also be available in the cluster for the cluster to function.

      Forest Availability: Simple Case

      In the simplest case, a forest that is not configured with either local disk failover or shared disk failover will be available, regardless of any quorum issues, as long as the forest's host is online and exists in the cluster.

      To explain this case in more detail, consider a 3-node MarkLogic cluster (let's call the hosts host-a, host-b and host-c). We initialize host-a as the primary host (so this is the first host set up in the cluster and is the host containing the master Security database), and we then join host-b and host-c to host-a to complete the cluster.

      Shortly after that, if we shut both the joiner hosts (host-b and host-c) down so that only host-a remained online, we would see a chain of messages in the primary host's ErrorLog indicating that there was no longer quorum within the cluster:

      2020-05-21 01:19:14.632 Info: Detected quorum (3 online, 1 suspect, 0 offline)
      2020-05-21 01:19:18.570 Warning: Detected suspect quorum (3 online, 2 suspect, 0 offline)
      2020-05-21 01:19:29.715 Info: Disconnecting from domestic host because it has not responded for 30 seconds.
      2020-05-21 01:19:29.715 Info: Disconnected from domestic host
      2020-05-21 01:19:29.715 Info: Detected suspect quorum (2 online, 1 suspect, 1 offline)
      2020-05-21 01:19:33.668 Info: Disconnecting from domestic host because it has not responded for 30 seconds.
      2020-05-21 01:19:33.668 Info: Disconnected from domestic host
      2020-05-21 01:19:33.668 Warning: Detected no quorum (1 online, 0 suspect, 2 offline)
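
      When reviewing a long ErrorLog, it can help to pull out just the quorum transitions. The following Python sketch parses the "Detected ... quorum" lines shown above with a regular expression. The pattern is based only on the sample lines in this article, so treat it as an assumption about the message format rather than a complete specification; the function name is hypothetical.

```python
import re

# Pattern derived from the sample log lines in this article (an assumption,
# not a documented log format).
QUORUM_LINE = re.compile(
    r"Detected (?P<state>quorum|suspect quorum|no quorum) "
    r"\((?P<online>\d+) online, (?P<suspect>\d+) suspect, (?P<offline>\d+) offline\)"
)


def parse_quorum_events(log_text):
    """Return a list of (state, online, suspect, offline) tuples, in log order."""
    events = []
    for line in log_text.splitlines():
        match = QUORUM_LINE.search(line)
        if match:
            events.append(
                (
                    match["state"],
                    int(match["online"]),
                    int(match["suspect"]),
                    int(match["offline"]),
                )
            )
    return events


SAMPLE_LOG = """\
2020-05-21 01:19:14.632 Info: Detected quorum (3 online, 1 suspect, 0 offline)
2020-05-21 01:19:18.570 Warning: Detected suspect quorum (3 online, 2 suspect, 0 offline)
2020-05-21 01:19:29.715 Info: Disconnected from domestic host
2020-05-21 01:19:29.715 Info: Detected suspect quorum (2 online, 1 suspect, 1 offline)
2020-05-21 01:19:33.668 Info: Disconnected from domestic host
2020-05-21 01:19:33.668 Warning: Detected no quorum (1 online, 0 suspect, 2 offline)
"""

events = parse_quorum_events(SAMPLE_LOG)
for event in events:
    print(event)
```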

      Under these circumstances, we would be able to access the host's admin GUI on port 8001 and it would respond without issue.  We would be able to access Query Console on that host on port 8000 and would be able to inspect the primary host's databases.  We would also be able to access the Monitoring History on port 8002 - all directly from the primary host.

      In this scenario, because the primary host remains online, the joining hosts are offline, and we have not yet set up failover anywhere, there is no requirement for quorum, so host-a remains accessible.

      If host-a also happened to have a database with forests that only resided on that host, these would be available for queries at this time.  However, this is a fairly limited use case because in general, if you have a 3-node cluster, you would have a database whose forests reside on all three hosts in the cluster with failover forests configured on alternating hosts. 

      As soon as you do this, losing one host has two possible outcomes: if you don't have failover configured, the database becomes unavailable (because a crucial forest is offline); if you do have failover forests configured, you can still access the database on the remaining two hosts.

      However, if you then shut down another host, you would lose quorum (which is a requirement for failover).

      Forest Availability: Local Disk Failover

      For forests configured for local disk failover, the sequence of events is important:

      In response to a host failure that makes an "open" forest inaccessible, the forest will failover to the configured forest replica as long as a quorum exists and the configured replica forest was in the "sync replicating" state. In this case, the configured replica forest will transition to the "open" state; the configured replica forest becomes the acting master forest and is available to the database for both reads and writes.

      Additionally, an "open" forest will not go offline in response to another host being evicted from the cluster.

      However, once cluster quorum is lost, forest failovers will no longer occur.


      Depending on how your forests are distributed in the cluster and depending on the order of host failures, it is possible that a database can remain online even when there is no longer a quorum of hosts in the cluster.
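
      To make the importance of ordering concrete, here is a small Python sketch of a deliberately simplified model: a 3-host cluster and a database with a single forest whose master is on host-b and whose replica is on host-a. The model encodes only the rules stated above (failover needs quorum, an open forest is not affected by other hosts leaving, and no failover happens once quorum is lost). It assumes the replica is in the "sync replicating" state whenever its host is online, and it ignores the Security database. It is not MarkLogic's implementation, and all names are hypothetical.

```python
def database_stays_online(failure_order, hosts, master_host, replica_host):
    """Simulate host failures for a database with one forest and one replica."""
    online = set(hosts)
    acting_master = master_host  # host currently serving the "open" forest
    for failed in failure_order:
        online.discard(failed)
        quorum = 2 * len(online) > len(hosts)
        if failed != acting_master:
            # An open forest does not go offline when another host leaves.
            continue
        # Failover requires quorum and a surviving replica (assumed in sync).
        can_fail_over = (
            quorum and acting_master == master_host and replica_host in online
        )
        if not can_fail_over:
            return False
        acting_master = replica_host
    return True


HOSTS = ["host-a", "host-b", "host-c"]

# host-b fails first (quorum still exists, so the forest fails over to host-a),
# then host-c fails: quorum is lost, but the open forest on host-a stays open.
print(database_stays_online(["host-b", "host-c"], HOSTS, "host-b", "host-a"))

# host-c fails first, then host-b: quorum is already lost when the master's
# host goes down, so no failover occurs and the database goes offline.
print(database_stays_online(["host-c", "host-b"], HOSTS, "host-b", "host-a"))
```

      The same two hosts fail in both runs; only the order differs, and only the first order leaves the database online.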

      Of course, databases with many forests spread across many hosts typically can't stay online if you lose quorum because some forest(s) will become unavailable.


      Even though it is possible to have a functioning cluster with less than a quorum of hosts online, you should not architect your high availability solution to depend on it.


      This article discusses what happens when you backup or restore your database after a local disk failover event on one of the database forests.


      MarkLogic Server provides high availability in the event of a data node failure. Data node failures can include operating system crashes, MarkLogic Server restarts, power failures, or persistent system failures (for example, hardware failures). With forest-level failover enabled and configured, a machine that hosts a forest can go down and the MarkLogic Server cluster automatically recovers from the outage and continues to process queries without any immediate action needed by an administrator. In MarkLogic Server, if a forest becomes unavailable, then the entire database to which this forest is attached becomes unavailable for further query operations. Without failover, such a failure requires manual intervention (by an administrator, for example) to either reconfigure the forest to another host or to remove this forest from the configuration (cluster). With failover, you can configure the forest to automatically switch to a replica forest on a different host. MarkLogic Server failover provides for high availability and maintains data and transactional integrity in the event of a data node failure.

      The failover scenarios are well documented on our developer web site.

      Local Disk Failover

      You can configure a forest on another host to serve as a replica forest, which will take over when a primary master forest's host goes offline. Local-disk failover allows you to create one or more replica forests for each primary forest. Replica forests contain the exact same data as the primary forest and are kept consistent transactionally.

      It is helpful to use the following terms to refer to the forest configurations and states:

      • Configured Master is the forest which is originally configured as the primary forest.
      • Configured Replica is a forest on another host that is configured as a replica forest of the primary. 
      • Acting Master is the forest that is serving as the master forest, regardless of the configuration.
      • Acting Replica is the forest that is serving as the replica forest, regardless of the configuration.

      Database Backup when a forest is failed over

      If you attempt to take a database backup or perform a database restore when one of the forests of the database has failed over to the replica (i.e. Configured Replica is serving as Acting Master), it may result in XDMP-FORESTNOTOPEN or XDMP-HOSTDOWN errors.

      When a database backup takes place, by default, everything associated with the database gets backed up. You can also choose to back up individual forests (only the forests selected while configuring the backup are backed up).

      Replica forests will only be backed up when the 'Include replica forests' option is enabled. If you have not configured the backup to include replica forests, then a replica forest will not be backed up even if it is the acting master. If the Configured Master is also not available, then neither forest will be backed up. In this circumstance, you may see a message in the error logs similar to "Warning: Not backing up database test because first forest master is not available, and replica backups aren't enabled."
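
      The rules in the previous paragraph can be summarized as a small decision function. This Python sketch is only one reading of the text above, for a single master/replica pair; the function and argument names are hypothetical and it does not call any MarkLogic API.

```python
def forests_backed_up(master_available, replica_available, include_replicas):
    """Return which forests of a master/replica pair a backup would cover."""
    backed_up = []
    if master_available:
        backed_up.append("configured master")
    # Replicas are only considered when 'Include replica forests' is enabled,
    # even if the replica is currently the acting master.
    if include_replicas and replica_available:
        backed_up.append("configured replica")
    return backed_up


# Failed-over forest, master unavailable, replica backups not enabled:
# nothing is backed up, which matches the warning message quoted above.
print(forests_backed_up(master_available=False, replica_available=True,
                        include_replicas=False))
```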

      Restore when a forest is failed over

      Restores will fail if executed when a forest is failed over (i.e. Configured Replica is serving as Acting Master). In this circumstance, you may see a message in the error logs similar to "Operation failed with error message. Check server logs." or "XDMP-HOSTDOWN".

      How to detect if a forest is failed over

      In the Admin UI:

      1. Click the Forests icon in the left tree menu;
      2. Click the Summary tab;
      3. Check whether the configured replica is in the open state. (This indicates that the Configured Replica is serving as Acting Master.)

      At the time of the failover event, you may see messages in the Error Log similar to:
      2013-10-03 12:49:53.873 Info: Disconnecting from domestic host in cluster 16599165797432706248 because it has not responded for 30 seconds.
      2013-10-03 12:49:53.873 Info: Disconnected from host
      2013-10-03 12:49:53.873 Info: Unmounted forest test_P
      2013-10-03 12:49:53.875 Info: Forest test_R assuming the role of master with new precise time 13808297938747190
      2013-10-03 12:49:53.875 Debug: Recovering undo on forest test_R
      2013-10-03 12:49:53.875 Debug: Recovered undo at endTimestamp 13807844927734200 minQueryTimestamp 0 on forest test_R

      Revert back from the failover state:

      When the configured master is the acting replica, this is considered the "failover state". In order to revert back, you must either restart the acting master forest or restart the host on which the acting master forest is locally mounted. After restarting, the forest will automatically revert to Configured Master if its host is online. To check the status of the forests, see the Forests Summary tab in the Admin Interface.


      For backup and restore to work correctly, clusters configured with local disk failover must have no forests in a failed over state. If a cluster is configured with local disk failover, and if some of its forests are failed over to their local disk replicas, the conditions causing the fail over must be resolved, and the cluster must be returned to the original forest configuration before backup and restore operations may resume.


      From the documentation:

      Queries on a Replica database must run at a timestamp that lags the current cluster commit timestamp due to replication lag. Each forest in a Replica database maintains a special timestamp, called a Non-blocking Timestamp, that indicates the most current time at which it has complete state to answer a query. As the Replica forest receives journal frames from its Master, it acknowledges receipt of each frame and advances its nonblocking timestamp to ensure that queries on the local Replica run at an appropriate timestamp. Replication lag is the difference between the current time on the Master and the time at which the oldest unacknowledged journal frame was queued to be sent to the Replica.
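
      The definition of replication lag in the passage above is simple arithmetic: the current time on the Master minus the queue time of the oldest journal frame that has not yet been acknowledged. The Python sketch below shows that calculation with made-up frame IDs and timestamps; the data structures are hypothetical and are not how MarkLogic Server tracks journal frames internally.

```python
from datetime import datetime, timedelta


def replication_lag(now_on_master, queued_times, acknowledged):
    """Lag = now minus the queue time of the oldest unacknowledged frame.

    queued_times maps a frame id to the time it was queued for sending;
    acknowledged is the set of frame ids the Replica has acknowledged.
    """
    pending = [t for frame, t in queued_times.items() if frame not in acknowledged]
    if not pending:
        return timedelta(0)  # everything acknowledged: no lag
    return now_on_master - min(pending)


# Illustrative values only.
now = datetime(2020, 1, 1, 12, 0, 10)
queued = {
    1: datetime(2020, 1, 1, 12, 0, 0),
    2: datetime(2020, 1, 1, 12, 0, 4),
    3: datetime(2020, 1, 1, 12, 0, 7),
}
lag = replication_lag(now, queued, acknowledged={1})
print(lag.total_seconds())
```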



      Consider the following customer scenario:

      • The storage the database resides on at one site fails.
      • This requires the customer to run for a period of time on a single site.
      • The storage / MarkLogic server are recovered at the site where the failure occurred.
      • The customer needs to re-establish replication between the two sites


      Q: Should we tune the lag limit to suit our application?

      A: We have found in our own performance testing that increasing the lag limit beyond the default is typically not helpful.

      When the master has a sustained rate of updates, a large lag limit causes it to run quickly ahead of the replica, then stall for an extended period of time until the replica catches up. This pattern repeats over and over and gives inconsistent performance on the master.

      A smaller lag limit causes the master to suspend updates more frequently but for shorter periods of time, resulting in more consistent perceived performance.

      Q: Is there any option to restore the replica database to a point in time from a backup of the master database & re-initiate replication from that point onwards?

      A: It's fine to restore a backup to the failed system when it comes back online and before configuring replication in the reverse direction.

      Q: Is there a limit to how old a backup of the replica database can be (e.g. can a replica be restored from months back in comparison to the master) and will it still sync back to the master without issue? And does this depend on what journal data is available?

      A: There is no limit to how old a backup can be; the system will calculate all the deltas and apply them.

      Q: Are there any documented API built-ins for any of these things?

      A: Yes; all the replication information is available through a call to xdmp:forest-status().



      Q: Can you also advise if the replication lag limit mentioned in section 1.2.5 and the related possibility of transactions stalling on the master database applies during the bulk replication phase?

      A: As long as the replica's forests are in "open replica" state, the replica will respond to queries at any commit timestamp it is able to support irrespective of whether replication is lagged.

      A new feature in MarkLogic 5 is an application server setting for multi-version concurrency control. By default this is set to contemporaneous, meaning that a query will run at the latest timestamp at which any transaction is known to have committed, irrespective of whether there are still transactions in flight.

      Conversely, if nonblocking is chosen (i.e. if you create an application server to query a replica database and you set multi-version concurrency control to nonblocking), the server will choose the last timestamp where all pending transactions are known to have successfully committed.

      If you wish to evaluate a query against a replica database you can use xdmp:database-nonblocking-timestamp() to determine the most current query timestamp that will not block.


      Database Replication replicates fragments/documents from a source database to a target database. You may see different database sizes (even when active fragment counts are the same) between Master and Replica databases. This article provides an overview of the variables and reasons behind this observation.

      Database Replication:

      Database Replication operates at the forest level by copying journal frames from a forest in the Master database and replaying them on a corresponding forest in the foreign Replica database. This means that when journal frames are replayed in the Replica database, the group of documents that share a single stand in the Master database does not necessarily reside in a single stand in the Replica database; i.e. the distribution of fragments within stands differs between the Master and its Replicas.

      Also note that Master and Replica forests can be distributed differently across hosts in each cluster. Even when they are distributed identically (Master DB forest name to Replica DB forest name), you could still see a different number of stands between them.

      Database Size, Deleted Fragment and Merge:

      The current database size depends on a number of factors, such as the number of documents, the indexes, and the deleted fragments in each stand. The number of deleted fragments in any stand itself depends on the merge policy, the background merge process, the processing cycles available, the Linux memory configuration, memory usage at any given time, and the application usage pattern.


      The Master cluster and Replica cluster are separate entities. Although connected, they operate independently. The Replica database on the target cluster provides data consistency. However, how the data is spread across stands, including the retention of deleted fragments, will differ between the Master and Replica clusters. Hence you may see different sizes between Master and Replica databases, even where the active fragments are the same.



      If your MarkLogic Server has its logging level set to "Debug", it's common to see a chain of 'Detecting' and 'Detected' messages that look like this in your ErrorLogs:

      2015-01-27 11:11:04.407 Debug: Detected indexes for database Documents: ss, fp, fcs, fds, few, fep, sln
      2015-01-27 11:11:04.407 Debug: Detecting compatibility for database Documents
      2015-01-27 11:11:04.407 Debug: Detected compatibility for database Documents

      These messages will appear immediately after forests are unmounted and subsequently remounted by MarkLogic Server.

      What would cause the forests to be unmounted and remounted?

      • Heavy network activity leading to a cluster (XDQP) "Heartbeat" timeout
      • Changes made to forest configuration or indexes
      • Any incident that may cause a "Hung" message

      What are "Hung" messages?

      Whenever you see a "Hung" message, it is very often indicative of a loss of connection to the IO subsystem (especially when forests are mounted on network attached storage rather than local disk). Hung messages are explained in a little more detail in a separate Knowledgebase article.

      What do the "Detected" messages mean and what can I do about them?

      Whenever you see a group of "Detecting" messages:

      2015-01-14 13:06:26.016 Debug: Detecting indexes for database XYZ

      There was an event where MarkLogic chose to (or was required to) attempt to unmount and remount forests (and the event may also be evident in your ErrorLogs).

      The detecting index message will occur soon after a remount, indicating that MarkLogic Server is examining forest data to check whether any reindexing work is required for all databases available to the node which have Forests attached:

      2015-01-14 13:06:26.687 Debug: Detected indexes for database XYZ: ss, wp, fp, fcs, fds, ewp, evp, few, fep

      The "Detected indexes" line indicates that the scan has been completed and that the database has been identified as being configured with a number of indexes. For the line above, these are:

      stemmed searches
      word positions
      fast phrase searches
      fast case sensitive searches
      fast diacritic sensitive searches
      element word positions
      element value positions
      fast element word searches
      fast element phrase searches

      From this list, we are able to determine which indexes were detected.  These messages will occur after every remount if you have index detection set to automatic in the database configuration.
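
      If you need to decode many of these lines, the mapping above can be captured in a small lookup table. The Python sketch below uses only the nine abbreviations explained in this article; any other code (for example "sln", which appears in the first sample but is not explained here) is reported as unknown rather than guessed. The function name is hypothetical.

```python
# Abbreviations and meanings exactly as listed in this article.
INDEX_NAMES = {
    "ss": "stemmed searches",
    "wp": "word positions",
    "fp": "fast phrase searches",
    "fcs": "fast case sensitive searches",
    "fds": "fast diacritic sensitive searches",
    "ewp": "element word positions",
    "evp": "element value positions",
    "few": "fast element word searches",
    "fep": "fast element phrase searches",
}


def decode_detected_indexes(line):
    """Expand the index codes at the end of a 'Detected indexes' log line."""
    if "Detected indexes for database" not in line:
        raise ValueError("not a 'Detected indexes' line")
    # The codes follow the last ": " on the line, separated by commas.
    _, _, codes = line.rpartition(": ")
    return [
        INDEX_NAMES.get(code.strip(), f"unknown ({code.strip()})")
        for code in codes.split(",")
    ]


line = ("2015-01-14 13:06:26.687 Debug: Detected indexes for database XYZ: "
        "ss, wp, fp, fcs, fds, ewp, evp, few, fep")
for name in decode_detected_indexes(line):
    print(name)
```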

      Every time the forest is remounted, in addition to a recovery process (where the Journals are scanned to ensure that all transactions logged were safely committed to on-disk stands), there are a number of other tests the server will do. These are configured with three options at database level:

      • format compatibility
      • index detection
      • expunge locks

      By default, these three settings are configured with the "automatic" setting (in MarkLogic 7), so if you have logging set to "Debug" level, you'll know that these options are being worked through on remount:

      2015-01-14 13:06:26.016 Debug: Detecting indexes for database XYZ (represents the task for "automatic" index detection where the reindexer checks for configuration changes)
      2015-01-14 13:06:26.687 Debug: Detecting compatibility for database XYZ (represents the task for "automatic" format compatibility where the on-disk stand format is detected)

      These default values may change across releases of MarkLogic Server. In MarkLogic 8, expunge locks is set to none but the other two are still set to automatic.

      Can these values be changed safely and what happens if I change these?

      Unmount and remount times can be made much shorter by changing these settings from automatic, but there are some caveats. The on-disk stand format may change in a future release of the product (it is still 5.0 as of MarkLogic 8), so a fixed setting would need to be revisited when you upgrade. Setting format compatibility to 5.0 should cause the "Detecting compatibility" messages to disappear and speed up remount times.

      The same is true for disabling index detection, but it's important to note that changing index settings on the database will then no longer cause the reindexer to perform any checks on remount; you would need to re-enable index detection for changes to database index settings to be picked up and reindexed.


      This article will provide steps to debug applications using the Alerting API that are not triggering an alert.


      1) Check that all required components are present in the database where alerting is setup: config, actions, rules.   Run the attached script 'getalertconfigs.xqy' through the Query Console and review the output.  

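      2) As documented in our Search Developer's Guide, test the alert manually with alert:invoke-matching-actions(). In the call below, "my-alert-config-uri" is a placeholder for the URI of your own alert configuration:

      alert:invoke-matching-actions("my-alert-config-uri",
            <doc>hello world</doc>, <options/>)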

      3) Use the rule's query to test against the database to check that the expected documents are returned by the query.

      Take the query text from the rule and run it through Query Console using a cts:search() on the database.  This will confirm whether the expected documents are a positive match.  If the documents are returned and no alert is triggered, then further debugging will be needed on the configuration or the query may need to be modified.


      Division operations involving integer or long datatypes may generate XDMP-DECOVRFLW in MarkLogic 7. This is the expected behavior but it may not be obvious upon initial inspection.  

      For example, two queries that differ only in their input values, executed in Query Console on a Linux/Mac machine running MarkLogic 7, give the following results:

      1. This query returns correct results

      let $estimate := xs:unsignedLong("220")

      let $total := xs:unsignedLong("1600")

      return $estimate div $total * 100

      ==> 13.75

      2. This query returns the XDMP-DECOVRFLW error


      let $estimate := xs:unsignedLong("227")

      let $total := xs:unsignedLong("1661")

      return $estimate div $total * 100

      ==> ERROR : XDMP-DECOVRFLW: (err:FOAR0002)


      The following defines relevant behaviors in MarkLogic 7 and previous releases.

      • In MarkLogic 7, if all the operands involved in a div operation are integer, long or integer sub-types in XML, then the resulting value of the div operation is stored as xs:decimal.
      • In versions previous to MarkLogic 7, if an xs:decimal value was large and occupied all digits, then it was implicitly cast into an xs:double for further operations; beginning with MarkLogic 7, implicit casting no longer occurs in this situation.
      • xs:decimal can accommodate 18 digits as a datatype.
      • In MarkLogic 7 on Linux and Mac, an xs:decimal can occupy all digits depending upon the actual value (227 div 1661 = 0.1366646598434677905), in which case all 18 digits are occupied.
      • MarkLogic 7 on Windows does not perform division with full decimal precision (227 div 1661 produces 0.136664659843468); as a result, not all 18 digits are occupied in the xs:decimal.
      • MarkLogic 7 will generate an overflow exception (FOAR0002) when an operation is performed on an xs:decimal that is already at full decimal precision.

      In the example above, multiplying the result by 100 gives an error on Linux/Mac, while it is OK on Windows.
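
      The difference between the two queries comes down to whether the quotient terminates. The Python sketch below uses Python's decimal module as a stand-in to show this: 220 div 1600 terminates after a few digits, while 227 div 1661 never terminates and therefore fills every digit the type allows, leaving no room for a further multiplication. The 18-digit precision is taken from the figure quoted above. Python's decimal is not MarkLogic's xs:decimal implementation, and Python itself would round rather than raise an overflow error, so this only illustrates the digit usage.

```python
from decimal import Context, Decimal

PRECISION = 18  # digit limit quoted in this article, used here as a stand-in
ctx = Context(prec=PRECISION)


def quotient_digits(numerator, denominator):
    """Return the quotient and how many significant digits it occupies."""
    quotient = ctx.divide(Decimal(numerator), Decimal(denominator))
    return quotient, len(quotient.as_tuple().digits)


terminating, used_a = quotient_digits(220, 1600)
repeating, used_b = quotient_digits(227, 1661)

# The first quotient leaves plenty of headroom; the second uses every digit.
print(terminating, "uses", used_a, "of", PRECISION, "digits")
print(repeating, "uses", used_b, "of", PRECISION, "digits")
```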


      We recommend xs:double be used for all division related operations in order to explicitly cast resulting value to larger data-type.

      For example: These will return results

      xs:double($estimate) div $total * 100

      $estimate div $total * xs:double(100)






      There are options 'maintain last modified' and 'maintain directory last modified' on the Admin UI for a database, which when turned on add properties to every document inserted in the database.  There may be a need to remove all the property fragments of all the documents in the database when the properties no longer need to be retained.


      Turning these options off for a database ensures that properties will not be created for new documents. However, existing document properties will not be removed by turning these settings off.


      To delete existing document properties, the following query can be used:




      Please make sure that 'maintain last modified' and 'maintain directory last modified' options are turned off for the database, so that the property fragment does not get recreated for the document.




      Terraform from HashiCorp is a deployment tool that many organizations use to manage their infrastructure as code. It is platform agnostic, allowing for the deployment and configuration of on-site physical infrastructure, as well as cloud infrastructure such as AWS, Azure, vSphere and more.

      Terraform uses the HashiCorp Configuration Language (HCL) to allow for concise descriptions of infrastructure. HCL is a JSON-compatible language, and was designed to be both human and machine friendly.

      This powerful tool can be used to deploy a MarkLogic Cluster to AWS using the MarkLogic CloudFormation Template. The MarkLogic CloudFormation Template is the preferred method recommended by MarkLogic for building out MarkLogic clusters within AWS.

      Setting Up Terraform

      For the purpose of this example, I will assume that you have already installed Terraform, the AWS CLI and you have configured the credentials. You will also need to have a working directory that has been initialized using terraform init.

      Terraform Providers

      Terraform uses Providers to provide access to different resources. The Provider is responsible for understanding API interactions and exposing resources. The AWS Provider is used to provide access to AWS resources.

      Terraform Resources

      Resources are the most important part of the Terraform language. Resource blocks describe one or more infrastructure objects, like compute instances and virtual networks.

      The aws_cloudformation_stack resource allows Terraform to create a stack from a CloudFormation template.

      Choosing a Template

      MarkLogic provides two templates for creating a managed cluster in AWS.

      • MarkLogic cluster in new VPC
      • MarkLogic cluster in an existing VPC
      I've chosen to deploy my cluster to an existing VPC. When deploying to an existing VPC, you will need to gather the VPC ID, as well as the Subnet IDs for the public and private subnets.

      Defining Variables

      The MarkLogic CF Template takes a number of input variables, including the region, availability zones, instance types, EC2 keys, encryption keys, licenses and more. We have to define our variables so they can be used as part of the resource.

      Variables in HCL can be declared in a separate file, which allows for deployment flexibility. For instance, you can create a Development resource and a Production resource from the same configuration by using different variable files.

      Here is a snippet from our variables file:

      variable "cloudform_resource_name" {
        type    = string
        default = "Dev-Cluster-CF"
      }
      variable "stack_name" {
        type    = string
        default = "Dev-Cluster"
      }
      variable "ml_version" {
        type    = string
        default = "10.0-4"
      }
      variable "availability_zone_names" {
        type    = list(string)
        default = ["us-east-1a","us-east-1b","us-east-1c"]
      }

      In the snippet above, you'll notice that we've defined the availability_zone_names as a list. The MarkLogic CloudFormation template won't take a list as an input, so later we will join the list items into a string for the template to use.

      This also applies to any of the other lists defined in the variable files.
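
      The conversion itself is just a comma join. As a quick illustration outside Terraform, the Python sketch below turns the list from the snippet above into the single comma-separated string that a CloudFormation parameter would receive; the function name is hypothetical.

```python
def to_cloudformation_list(items):
    """Join list items into one comma-separated string, with no spaces."""
    return ",".join(items)


availability_zone_names = ["us-east-1a", "us-east-1b", "us-east-1c"]
az_parameter = to_cloudformation_list(availability_zone_names)
print(az_parameter)  # us-east-1a,us-east-1b,us-east-1c
```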

      Using the CloudFormation Resource

      Now we need to define the resource in HCL that will allow us to deploy a CloudFormation template to create a new stack.

      The first thing we need to do is tell Terraform which provider we will be using, defining some default options:

          provider "aws" {
            profile = "default"
            #access_key = var.access_key
            secret_key = var.secret_key
            region = var.aws_region
          }

      Next, we need to define the `aws_cloudformation_stack` configuration options, setting the variables that will be passed in when the stack is created:

          resource "aws_cloudformation_stack" "marklogic" {
            name = var.cloudform_resource_name
            capabilities = ["CAPABILITY_IAM"]
            parameters = {
              IAMRole = var.iam_role
              AdminUser = var.ml_admin_user
              AdminPass = var.ml_admin_password
              Licensee = "My User - Development"
              LicenseKey = "B581-REST-OF-LICENSE-KEY"
              VolumeSize = var.volume_size
              VolumeType = var.volume_type
              VolumeEncryption = var.volume_encryption
              VolumeEncryptionKey = var.volume_encryption_key
              InstanceType = var.instance_type
              SpotPrice = var.spot_price
              KeyName = var.secret_key
              # Lists are joined into comma-separated strings for the template
              AZ = join(",", var.avail_zone)
              LogSNS = var.log_sns
              NumberOfZones = var.number_of_zones
              NodesPerZone = var.nodes_per_zone
              VPC = var.vpc_id
              PublicSubnets = join(",", var.public_subnets)
              PrivateSubnets = join(",", var.private_subnets)
            }
            template_url = "${var.template_base_url}${var.ml_version}/${var.template_file_name}"
          }

      Deploying the Cluster

      Now that we have defined our variables and our resources, it's time for the actual deployment.

      $> terraform apply

      This will show us the work that Terraform is going to attempt to perform, along with the settings that have been defined so far.

      Once we confirm that things look correct, we can go ahead and apply the resource.

      Now we can check the AWS Console to see our stack.

      We can also use the ELB to log in to the Admin UI.

      Wrapping Up

      We have now deployed a 3 node cluster to an existing VPC using Terraform. The cluster is now ready to have our Data Hub, or other application installed.

      Deploying MarkLogic in AWS with Ansible


      Ansible, owned by Red Hat, is an open source provisioning, configuration and application deployment tool that many organizations use to manage their infrastructure as code. Unlike options such as Chef and Puppet, it is agentless, utilizing SSH to communicate between servers. Ansible also does not need a central host for orchestration; it can run from nearly any server, desktop or laptop. It supports many different platforms and services, allowing for the deployment and configuration of on-site physical infrastructure, as well as cloud and virtual infrastructure such as AWS, Azure, vSphere, and more.

      Ansible uses YAML as its configuration management language, making it easier to read than other formats. Ansible also uses Jinja2 for templating to enable dynamic expressions and access to variables.

      Ansible is a flexible tool that can be used to deploy a MarkLogic Cluster to AWS using the MarkLogic CloudFormation Template. The MarkLogic CloudFormation Template is the preferred method recommended by MarkLogic for building out MarkLogic clusters within AWS.

      Setting Up Ansible

      For the purpose of this example, I will assume that you have already installed Ansible, the AWS CLI, and the necessary python packages needed for Ansible to talk to AWS. If you need some help getting started, Free Code Camp has a good tutorial on setting up Ansible with AWS.

      Inventory Files

      Ansible uses Inventory files to help determine which servers to perform work on. They can also be used to customize settings for individual servers or groups of servers. For our example, we have set up our local system with all the prerequisites, so we need to tell Ansible how to treat the local connections. For this demonstration, here is my inventory, which I've named hosts:

      localhost              ansible_connection=local

      Ansible Modules

      Ansible modules are discrete units of code that are executed on a target. The target can be the local system or a remote node. The modules can be executed from the command line, as an ad-hoc command, or as part of a playbook.

      Ansible Playbooks

      Playbooks are Ansible's configuration, deployment and orchestration language. Playbooks are how the power of Ansible and its modules is extended from basic configuration or management all the way to complex, multi-tier infrastructure deployments.

      Choosing a Template

      MarkLogic provides two templates for creating a managed cluster in AWS.

      1. MarkLogic cluster in new VPC
      2. MarkLogic cluster in an existing VPC

      I've chosen to deploy my cluster to an existing VPC. When deploying to an existing VPC, you will need to gather the VPC ID, as well as the Subnet IDs for the public and private subnets.

      Defining Variables

      The MarkLogic CF Template takes a number of input variables, including the region, availability zones, instance types, EC2 keys, encryption keys, licenses and more. We have to define our variables so they can be used as part of the resource.

      Variables in Ansible can be declared in a separate file, which allows for deployment flexibility.

      Here is a snippet from our variables file:

      # vars file for marklogic template and version
      ml_version: '10.0-latest'
      template_file_name: 'mlcluster.template'
      template_base_url: ''


      # CF Template Deployment Variables
      aws_region: 'us-east-1'
      stack_name: 'Dev-Cluster-An3'
      IAMRole: 'MarkLogic'
      AdminUser: 'admin'

      Using the CloudFormation Module

      Now we need to create our playbook and choose a module that can deploy a CloudFormation template to create a new stack. The cloudformation module does exactly that.

      Next, we need to define the cloudformation configuration options, setting the variables that will be passed in when the stack is created.

      # Use a template from a URL
      - name: Ansible Test
        # Matches the "localhost" entry in the hosts inventory file
        hosts: localhost

        # Load the deployment variables from the variables file
        vars_files:
          - ml-cluster-vars.yml

        tasks:
          - cloudformation:
              stack_name: "{{ stack_name }}"
              state: "present"
              region: "{{ aws_region }}"
              capabilities: "CAPABILITY_IAM"
              disable_rollback: true
              template_url: "{{ template_base_url+ml_version+'/'+ template_file_name }}"
              # Input parameters passed to the MarkLogic CloudFormation template
              template_parameters:
                IAMRole: "{{ IAMRole }}"
                AdminUser: "{{ AdminUser }}"
                AdminPass: "{{ AdminPass }}"
                Licensee: "{{ Licensee }}"
                LicenseKey: "{{ LicenseKey }}"
                KeyName: "{{ KeyName }}"
                VolumeSize: "{{ VolumeSize }}"
                VolumeType: "{{ VolumeType }}"
                VolumeEncryption: "{{ VolumeEncryption }}"
                VolumeEncryptionKey: "{{ VolumeEncryptionKey }}"
                InstanceType: "{{ InstanceType }}"
                SpotPrice: "{{ SpotPrice }}"
                AZ: "{{ AZ | join(', ') }}"
                LogSNS: "{{ LogSNS }}"
                NumberOfZones: "{{ NumberOfZones }}"
                NodesPerZone: "{{ NodesPerZone }}"
                VPC: "{{ VPC }}"
                PrivateSubnets: "{{ PrivateSubnets | join(', ') }}"
                PublicSubnets: "{{ PublicSubnets | join(', ') }}"
              # Tags applied to the stack itself
              tags:
                Stack: "ansible-test"
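      To make the templating expressions above concrete, here is a minimal Python sketch of what the template_url expression and the join(', ') filter evaluate to. The base URL and subnet IDs are hypothetical placeholders, not values from this article.

```python
def template_url(base_url: str, ml_version: str, file_name: str) -> str:
    """Mirror the playbook expression: base URL + version + '/' + file name."""
    return base_url + ml_version + "/" + file_name


def join_ids(ids: list) -> str:
    """Mirror the Jinja2 filter join(', ') used for the AZ and subnet lists."""
    return ", ".join(ids)


# Hypothetical values: the real base URL and subnet IDs come from your vars file.
url = template_url("https://example.com/templates/", "10.0-latest", "mlcluster.template")
subnets = join_ids(["subnet-aaa", "subnet-bbb", "subnet-ccc"])

print(url)
print(subnets)
```

      Note that the base URL must end with a slash, because the expression only inserts a slash between the version and the file name.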

      Deploying the cluster

      Now that we have defined our variables and created our playbook, it's time for the actual deployment.

      ansible-playbook -i hosts ml-cluster-playbook.yml -vvv

      The -i option allows us to reference the inventory file we created. As the playbook runs, it will output as it starts and finishes tasks in the playbook.

      PLAY [Ansible Test] ************************************************************************************************************


      TASK [Gathering Facts] *********************************************************************************************************
      ok: [localhost]


      TASK [cloudformation] **********************************************************************************************************
      changed: [localhost]

      When the playbook finishes running, it will print out a recap which shows the overall results of the play.

      PLAY RECAP *********************************************************************************************************************
      localhost                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

      This recap tells us that 2 tasks ran successfully, 1 of them made a change, and none failed, which is our sign that things worked.

      If we want to see more information as the playbook runs, we can add one of the verbose flags (-v or -vvv) to provide more information about the parameters the script is running with, and the results.
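      If the playbook is run from a script, the recap line can be checked automatically. The following is a minimal sketch, assuming the recap format shown above; the function names are illustrative and not part of Ansible.

```python
import re


def parse_recap(line: str):
    """Split a PLAY RECAP line into the host name and its integer counters."""
    host, _, counters = line.partition(":")
    stats = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", counters)}
    return host.strip(), stats


def play_succeeded(stats: dict) -> bool:
    """Treat the play as successful when nothing failed or was unreachable."""
    return stats.get("failed", 0) == 0 and stats.get("unreachable", 0) == 0


recap = ("localhost                  : ok=2    changed=1    unreachable=0    "
         "failed=0    skipped=0    rescued=0    ignored=0")
host, stats = parse_recap(recap)
print(host, stats["ok"], stats["changed"], play_succeeded(stats))
```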

      Now we can check the AWS Console to see our stack:

      And we can also use the ELB to log in to the Admin UI.

      Wrapping Up

      We have now deployed a 3 node cluster to an existing VPC using Ansible. The cluster is now ready to have our Data Hub, or other application installed.  We can now use the git module to get our application code, and deploy our code using ml-gradle.

      Deploying REST API Search/Query Options in DHS

      REST API Query Options Overview

      You can use persistent or dynamic query options to customize your queries. MarkLogic Server comes configured with default query options. You can extend and modify the default options using /config/query/default.

      REST API Search options are defined per Group and App Server. When using ml-gradle, they are typically deployed by putting the files defining the options in the src/main/ml-modules/options directory of your gradle project. By default the options will be deployed to the Group/App Server that gradle is pointing at in the data-hub-MODULES database under /[GroupName]/[App Server]/rest-api/options/[name of file].

      REST API Query Options in DHS

      In DHS, query options are created under the Evaluator Group for the data-hub-FINAL app server. One side effect of the permissions for DHS is that users will not be able to see the files after they are deployed. The default permissions for the options file are rest-reader-internal and rest-admin-internal, which are not granted to the data-hub roles.

      To check that the search options have been deployed you can use the following curl command:

      • curl --anyauth --user username:password -k -X GET -H "Content-type: application/xml" https://myService.[a or z][myOptions]

      If the options exist, you will get results. If the options do not exist, then you will get a 400 return, with a REST-INVALIDPARAM error.

      Deploying Options to Other App Servers and Groups

      Deploying Options to the Staging App Server

      Using src/main/ml-modules/options will only deploy the options to the Final app server. If you want to deploy the options to the Staging app server, then you will need to define the options under src/main/ml-modules/root/Evaluator/data-hub-STAGING/rest-api/options

      Deploying Options to Other Groups

      If the cluster is configured for auto-scaling, the dynamic e-nodes will belong to either the Analyzer, Curator or Operator group, so the search options will not be available for the dynamic e-nodes.

      To set the options for the app servers in other groups, you will also use src/main/ml-modules/root/[Group Name]/[App Server Name]/rest-api/options

      • src/main/ml-modules/root/Analyzer/data-hub-FINAL/rest-api/options
      • src/main/ml-modules/root/Operator/data-hub-FINAL/rest-api/options
      • ...etc
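      The directory pattern above can be generated rather than typed by hand. This is a small sketch that only builds the path strings; the group list is taken from the groups named in this article, and the helper name is illustrative.

```python
def options_dir(group: str, app_server: str) -> str:
    """Build the ml-modules/root path for a group and app server's REST options."""
    return f"src/main/ml-modules/root/{group}/{app_server}/rest-api/options"


# Groups used by dynamic e-nodes, as listed in this article.
GROUPS = ["Analyzer", "Curator", "Operator"]

paths = [options_dir(group, "data-hub-FINAL") for group in GROUPS]
for path in paths:
    print(path)
```

      The Evaluator group is left out for data-hub-FINAL on purpose, since that combination is already covered by src/main/ml-modules/options.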

      When deploying the options files in this way, they get different permissions than when they are deployed via ml-modules/options. The permissions are rest-extension-user, data-hub-module-reader, data-hub-module-writer, tde-admin, and tde-view, but the permission differences do not appear to make a difference in functionality.

      Deployment Failures

      When options are deployed with the rest of the non-REST modules in ml-modules/root/..., it uses the /v1/documents endpoint, which allows you to set the file permissions.

      When options are deployed from ml-modules/options, it uses the /v1/config/query endpoint, which does not allow you to set the file permissions.

      One effect of this difference is if you attempt to deploy the search options using both ml-modules/options and src/main/ml-modules/root/Evaluator/data-hub-FINAL/rest-api/options you will encounter a SEC-PERMDENIED error and the deployment will fail. If you encounter this error, ensure you aren't attempting to deploy the options in both locations.
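      A pre-deployment check can catch this situation before ml-gradle runs. The sketch below assumes that a conflict means the same file name exists in both directories; the function name is illustrative, and the demo builds a throwaway project layout in a temporary directory.

```python
import tempfile
from pathlib import Path

OPTIONS_DIR = Path("src/main/ml-modules/options")
ROOT_FINAL_DIR = Path("src/main/ml-modules/root/Evaluator/data-hub-FINAL/rest-api/options")


def conflicting_options(project_root: Path) -> list:
    """Return option file names that are present in both deployment locations."""
    def names(directory: Path) -> set:
        full = project_root / directory
        if not full.is_dir():
            return set()
        return {p.name for p in full.iterdir() if p.is_file()}

    return sorted(names(OPTIONS_DIR) & names(ROOT_FINAL_DIR))


# Demo: create the same options file in both locations and detect the overlap.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for directory in (OPTIONS_DIR, ROOT_FINAL_DIR):
        (root / directory).mkdir(parents=True)
        (root / directory / "myOptions.xml").write_text("<options/>")
    conflicts = conflicting_options(root)

print(conflicts)
```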


      This KB article lists some available tools for continuous integration and for automating MarkLogic Server deployments.

      Deployment Options

      ml-gradle is a Gradle plugin that can be used for configuration and application deployments. Application deployments are maintained as projects, which can be deployed to any environment - Development, QA, Production, etc.

      The MarkLogic Configuration Management API is a RESTful API that allows retrieving, generating, and applying configurations for MarkLogic clusters, databases, and application servers.

      The MarkLogic Management API is a REST-based API that allows you to administer MarkLogic Server and access MarkLogic Server instrumentation with no provisioning or set-up. You can use the API to perform administrative tasks such as initializing or extending a cluster; creating databases, forests, and App Servers; and managing tiered storage partitions. The API also provides the ability to easily capture detailed information on MarkLogic Server objects and processes, such as hosts, databases, forests, App Servers, groups, transactions, and requests from a wide variety of tools.

      The MarkLogic Admin APIs provide a flexible toolkit for creating new and managing existing configurations of MarkLogic Server.

      Integration Testing

      MarkLogic Unit Test is a testing component that was originally part of the Roxy project. This component enables you to build unit tests that are written in and can test against both XQuery and Server-side JavaScript.

      Implementation Specific Tools

      CloudFormation Templates

      MarkLogic CloudFormation templates enable you to launch clusters with an Elastic Load Balancer, Elastic Block Storage, Auto Scaling Group, and so on. Your cluster can be in either one Availability Zone or three Availability Zones. Multiple nodes can be placed within each Availability Zone. You can choose whether to deploy to an existing VPC or a new VPC. The templates can also be used with tools like Terraform and Ansible.


      The MarkLogic Python API aims to provide complete coverage of the capabilities in the MarkLogic REST API in idiomatic Python.


      Jenkins is often used with MarkLogic Server for building deployable artifacts, staging build artifacts, running automated tests, and deploying said artifacts. Jenkins has great REST endpoints that make it easy to get / put job configurations, and enable / disable jobs from scripts.

      Jenkins provides a driver for the continuous integration / continuous delivery process that can integrate with other tools. In combination with ml-gradle, it can be used to deploy modules and run unit tests on code check-in.

      One pipeline example used with Jenkins is to:

      1. Pull the code from Git
      2. Deploy to DEV with ml-gradle
      3. Run MarkLogic Unit Test
      4. Email a report of the success/failure
      5. Kick off job to deploy to another environment

      Note that the most important best practice here is to make sure Jenkins runs primarily on a host other than a MarkLogic host.


      This article will help MarkLogic Administrators to monitor the health of their MarkLogic cluster. By studying the attached scripts, you will learn how to find out which hosts are down and which forests have failed over, enabling you to take the necessary recovery actions.

      Initial Setup

      On a separate Linux host (not a member of the cluster), download the file attachments from this article, making sure that they all reside within the same directory.

      Here is a general description of each file:

      cluster-name.conf - Example configuration file used by the scripts. Configures information for monitoring one ML cluster.

      - A very simple, low-load check that all the nodes of a cluster are up and running.

      - A more detailed check for essential cluster functionality, with alerting (paging and/or emails to DBAs) if warranted. This script relies on at least one external XQuery file (mon-report-failed-over-forests.xqy) and makes use of the REST MGMT API as well as REST XQuery requests.

      mon-report-failed-over-forests.xqy - External XQuery file used by


      Preparing the CONF File for Use on Your Cluster

      Before running the scripts, the cluster-name.conf needs to be customized for your specific cluster. Start by changing the file name to match the name of your cluster, e.g.,

      $ mv cluster-name.conf some-other-name.conf

      Where "some-other-name" is the actual name of the cluster, or of the application that is hosted on that cluster.

      Next, you will need to customize some of the internal variables inside the CONF file itself. Here are the contents of the cluster-name.conf file, as downloaded:

      # MarkLogic Credentials for the REST Management port - 8002
      # MarkLogic Credentials for the XQuery eval port - 8000

      ---------  end of listing ---------

      For CLUSTER_NAME, provide the cluster-name listed in the cluster's /var/opt/MarkLogic/clusters.xml file.

      For CLUSTER_NODES, write in the host-names for each node in your cluster.

      For USER_PW_MGMT, provide the user-name and password for the REST MANAGEMENT user, the format is name:password.

      For USER_PW_XQ, provide the user-name and password for the user who will execute the XQuery scripts, the format is name:password.

      The UNIX_USER is a local Unix username with the correct rwx access rights for this directory.

      PAGE_ADDRESSES and MAIL_ADDRESSES are the alert addresses that will be notified whenever there is a failover event.


      The script was created with the idea it would be run repeatedly at a certain interval to keep tabs on system health. For example, it can be configured to be invoked with a cron job. A frequency of 5 to 120 minutes is a good candidate range. Ten minutes is a good time if you would like to be woken up (on average) within 5 minutes of a failover event.

      Setting up SSH Passwordless Login

      Section (6) of the monitoring script, FOREST STATUS CHANGE, requires SSH access to the cluster hosts, because this section greps through the MarkLogic Server ErrorLogs. To enable this part of the script to run without prompting the user, "ssh passwordless login" should be set up between the monitoring host and all the cluster hosts. There are many examples of how to do this on the internet. Alternatively, this monitoring section can be commented out.

      Also regarding section (6), the "grep" command is set up to grep the latest 10 minutes from the ErrorLog. If this script is configured to run less often than every 10 minutes, the "grep" command line should be adapted to cover the desired period between script runs.
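      The time-window idea behind that grep can be sketched in Python as follows. This is not the code from the attached script; it assumes the ErrorLog timestamp format shown elsewhere in this knowledge base (YYYY-MM-DD HH:MM:SS.mmm at the start of each line), and the second sample line is a hypothetical entry added for the demo.

```python
from datetime import datetime, timedelta


def recent_lines(lines, now, minutes=10):
    """Keep log lines whose leading timestamp falls within the last `minutes`."""
    cutoff = now - timedelta(minutes=minutes)
    kept = []
    for line in lines:
        try:
            # The first 23 characters hold "YYYY-MM-DD HH:MM:SS.mmm".
            stamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # continuation lines without a timestamp are skipped
        if cutoff <= stamp <= now:
            kept.append(line)
    return kept


log = [
    "2015-06-20 10:13:39.746 Emergency: Stand /space/Data/Forests/App-Services/00000fae has 213404541 fragments.",
    "2015-06-20 09:40:00.000 Info: hypothetical older entry",
]
recent = recent_lines(log, now=datetime(2015, 6, 20, 10, 20, 0), minutes=10)
print(len(recent))
```

      Passing the script's run interval as the minutes argument keeps the window aligned with how often the check runs.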

      Example Usage

      You are now ready to execute the failover monitoring scripts! Here is how you would execute them:

      $ ./ some-other-name.conf MY-CLUSTER-NAME

      $ ./ some-other-name.conf

      [where "some-other-name" and MY-CLUSTER-NAME are your actual CONF and cluster-name, as described above]

      Monitoring Multiple Clusters

      So, given a monitoring machine with a directory of cluster configuration files in the style of cluster-name.conf, those configuration files could be iterated through to monitor a suite of clusters from a single monitoring machine. It should be fairly easy to build a custom shell script to iterate through various cluster CONF files.
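      A minimal sketch of such an iteration is shown below in Python rather than shell. The script name ./monitor.sh is a hypothetical placeholder for whichever monitoring script you run, and the sketch assumes the script takes the CONF file as its first argument, as in the usage examples above. By default it only builds the commands (dry run) instead of executing them.

```python
import subprocess
import tempfile
from pathlib import Path


def cluster_configs(conf_dir):
    """Return every *.conf file in the directory, in a stable order."""
    return sorted(Path(conf_dir).glob("*.conf"))


def monitor_all(conf_dir, script, dry_run=True):
    """Build (and optionally run) one monitoring command per cluster CONF file."""
    commands = [[str(script), str(conf)] for conf in cluster_configs(conf_dir)]
    if not dry_run:
        for command in commands:
            # check=False: one failing cluster should not stop the others.
            subprocess.run(command, check=False)
    return commands


# Demo with a throwaway directory holding two CONF files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cluster-a.conf", "cluster-b.conf", "notes.txt"):
        (Path(tmp) / name).write_text("")
    commands = monitor_all(tmp, "./monitor.sh")

print(len(commands))
```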

      Final Thoughts and Limitations

      Please be aware that the script is only partially implemented. In particular, the Replication Lag and Replication Failure sections are left as exercises for the user.

      This script is being presented as a backup, lowest common denominator monitoring solution. For a more complete solution, you should explore other options, such as Splunk or Nagios.





      According to Wikipedia, DevOps is a set of practices that combines software development (Dev) and IT operations (Ops) with the goal of shortening the Systems Development Lifecycle, and providing continuous delivery with high software quality. This KB will provide some guidance for system deployment and configuration, which can be integrated into an organization's DevOps processes.

      For more information on using MarkLogic as part of a Continuous Integration/Continuous Delivery process, see the KB  Deployment and Continuous Integration Tools.

      Deploying a Cluster

      Deploying a MarkLogic cluster that will act as the target environment for the application code being developed is one piece of the DevOps puzzle. The approach that is chosen will depend on many things, including the tooling already in use by an organization, as well as the infrastructure that will be used for the deployment.  We will cover two of the most common environments, On-Premise and Cloud.

      On-Premise Deployments

      On-Premise deployments, which can include using bare metal servers, or Virtual Machine infrastructure (such as VMware), are one common environment. You can deploy a cluster to an on-premise environment using tools such as shell scripts, or Ansible. In the Scripting Administrative Tasks Guide, there is a section on Scripting Cluster Management, which provides some examples of how a cluster build can be automated.

      Once the cluster is deployed, some of the specific configuration tasks that may need to be performed on the cluster can be done using the Management API.

      Cloud Deployments

      Cloud deployments utilize flexible compute resources provided by vendors such as Amazon Web Services (AWS), or Microsoft Azure.

      For AWS, MarkLogic provides an example CloudFormation template that can be used to deploy a cluster to Amazon's AWS EC2 environment. Tools like the AWS Command Line Interface (CLI), Terraform, or Ansible can be used to extend the MarkLogic CloudFormation template and automate the process of creating a cluster in the AWS EC2 environment. The template can be used to deploy a cluster using the AWS CLI. The template can also be used to Deploy a Cluster Using Terraform, or it can be used to Deploy a Cluster Using Ansible.

      For Azure, MarkLogic has provided Solution Templates for Azure which can be extended for automated deployments using the Azure CLI, Terraform or Ansible.

      As with the on-premise deployments, configuration tasks can be performed on the cluster using the Management API.


      This is just a brief introduction into some aspects of DevOps processes for deploying and configuring a MarkLogic Cluster.


      After adding or removing a forest and its corresponding replica forest in a database, we have seen instances where the Rebalancer does not properly distribute the documents amongst existing and newly added forests.

      In this particular situation, XDMP-HASHLOCKINGRETRY debug-level messages are reported repeatedly in the error logs. The messages look something like:

      2016-02-11 18:22:54.044 Debug: Retrying HTTPRequestTask::handleXDBCRequest 83 because XDMP-HASHLOCKINGRETRY: Retry hash locking. Forests config hash does not match.

      2016-02-11 18:22:54.198 Debug: Retrying ForestRebalancerTask::run P_initial_p2_01 50 because XDMP-HASHLOCKINGRETRY: Retry hash locking. Forests config hash does not match.
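      To gauge how often this is happening, the retry messages can be counted per task. The sketch below is one way to do that, assuming the message layout shown in the two sample lines above; the regular expression and function name are illustrative.

```python
import re
from collections import Counter

# Captures the task name that follows "Retrying" in an XDMP-HASHLOCKINGRETRY line.
RETRY = re.compile(r"Debug: Retrying (\S+) .* because XDMP-HASHLOCKINGRETRY")


def count_retries(lines):
    """Count XDMP-HASHLOCKINGRETRY retries, keyed by the task being retried."""
    counts = Counter()
    for line in lines:
        match = RETRY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


log = [
    "2016-02-11 18:22:54.044 Debug: Retrying HTTPRequestTask::handleXDBCRequest 83 because "
    "XDMP-HASHLOCKINGRETRY: Retry hash locking. Forests config hash does not match.",
    "2016-02-11 18:22:54.198 Debug: Retrying ForestRebalancerTask::run P_initial_p2_01 50 because "
    "XDMP-HASHLOCKINGRETRY: Retry hash locking. Forests config hash does not match.",
]
counts = count_retries(log)
print(dict(counts))
```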


      Gather statistics about the rebalancer to see the number of documents being scheduled. If you run attached script “rebalancer-preview.xqy” in the query console of your MarkLogic Server cluster, it will produce rebalancer statistics in tabular format.

      • Note that you must first change the database name (YourDatabaseNameOnWhichNewForestsHaveBeenAdded) on the 3rd line of the XQuery script “rebalancer-preview.xqy”:

      declare variable $DATABASE as xs:string := xdmp:get-request-field("db", "YourDatabaseNameOnWhichNewForestsHaveBeenAdded");

      If experiencing this issue, the newly added forests will show zero in the “Total to be moved” column in the generated html page.


      Perform a cluster-wide restart in order to get past this issue. The restart is required to reload all of the configuration files across the cluster. The rebalancer will also check to see if additional rebalancing work needs to occur. The rebalancer should work as expected now and the XDMP-HASHLOCKINGRETRY messages should no longer appear in the logs. If you run the rebalancer-preview.xqy script again, the statistics should now show the number of documents being scheduled to be moved.

      You can also validate the rebalancer status from the Database Status page in the Admin UI.

      The XDMP-HASHLOCKINGRETRY rebalancer issue has been fixed in the latest MarkLogic Server releases. However, the rebalancer-preview.xqy script can be used to help diagnose other perceived issues with the Rebalancer.


      Search fundamentals


      Difference between cts:contains and fn:contains

       1) fn:contains is a substring match, whereas cts:contains performs query matching.

       2) cts:contains can therefore utilize general queries and stemming, whereas fn:contains cannot.


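      For example, consider a document, Example.xml, with the following content:

      <test>daily running makes you fit</test>

      •         fn:contains(fn:doc("Example.xml"),"ning") returns true, because "ning" is a substring of "running".

      •         cts:contains(fn:doc("Example.xml"),"ning") returns false, because the document contains no word "ning".

      •         fn:contains(fn:doc("Example.xml"),"ran") returns false, because "ran" is not a substring of the text.

      •         cts:contains(fn:doc("Example.xml"),"ran") returns true when stemmed searches are enabled, because "ran" and "running" stem to the same root, "run".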





      The cts:contains examples check the document against a cts:word-query. Stemming reduces words down to their root, allowing for smaller term lists.


      1) Words from different languages are treated differently, and will not stem to the same root word entry from another language.

      2) Note: Nouns will not stem to verbs and vice versa. For example, the word “runner” will not stem to “run”.
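      The difference between the two kinds of matching can be illustrated with a toy Python model. This is only an analogy: the three-entry stem table below is invented for the example and is not MarkLogic's stemmer, and the functions merely imitate substring matching versus stemmed word matching.

```python
# Toy stem table, invented for this illustration only.
TOY_STEMS = {"running": "run", "ran": "run", "runs": "run"}


def stem(word: str) -> str:
    """Reduce a word to its root using the toy table (unknown words are unchanged)."""
    lowered = word.lower()
    return TOY_STEMS.get(lowered, lowered)


def substring_contains(text: str, term: str) -> bool:
    """Substring match, in the spirit of fn:contains."""
    return term in text


def word_contains(text: str, term: str) -> bool:
    """Stemmed whole-word match, in the spirit of a cts:word-query."""
    return stem(term) in {stem(word) for word in text.split()}


text = "daily running makes you fit"
for term in ("ning", "ran"):
    print(term, substring_contains(text, term), word_contains(text, term))
```

      Because "runner" is absent from the toy table, word_contains(text, "runner") is False here, which mirrors the note that nouns do not stem to verbs.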



      MarkLogic Server provides a variety of  disaster recovery (DR) facilities including full backup, incremental backup, and journal archiving that when combined with other ML features can create a complete disaster recovery strategy. This paper shows some examples of how these features can be combined. It is not comprehensive nor does it reflect features offered only in the latest releases.


      This article will cover three perspectives. First, a quick overview of the metrics used by businesses to measure the quality of their Disaster Recovery strategies will be covered. Next, an overview of how to combine the features that MarkLogic offers in various categories will be given.

      More?: High Availability and Disaster Recovery features, High Availability & Disaster Recovery datasheet, Scalability, Availability, and Failover Guide

      Disaster Recovery Criteria

      In order to configure MarkLogic Server to perform well in Disaster Recovery situations, we should first define what parameters we will use to measure each possible approach. For most situations, these four measures are used: 

      Long Term Retention Policy (LTR): Long Term Retention Policy can be driven by any number of business, regulatory and other criteria. It is included here because MarkLogic's backup files are often a key part of an LTR strategy. 

      Recovery Point Objective (RPO): The requirement for how up-to-date the database has to be post-recovery with respect to its state immediately before the incident that required recovery.

      Recovery Time Objective (RTO): The requirement for the time elapsed between the incident and the recovery to the RPO.

      Cost: The storage cost, the computational resource cost, and the operations cost of the overall deployment strategy.

      Flexible Replication Features

      Flexible replication can be used to support LTR objectives but is generally not useful for Disaster Recovery.

      More? Flexible Replication Guide

      Platform Support Features

      Flash backup provides a way to leverage backup features of your deployment platform while maintaining transaction integrity. Platform specific solutions can often achieve RPO and RTO targets that would be impossible through other means.

      More? Flash Backup

      High Availability Features

      Forest replication provides recovery from host failures.

      More? Scalability, Availability, and Failover Guide

      Disaster Recovery Features

      Database Replication

      Database Replication is the process of maintaining copies of forests on databases in multiple MarkLogic Server clusters.

      More? Understanding Database Replication


      Of all your backup options, full backups restore the quickest, but take the most time to backup and possibly the most storage space. Each full backup is a backup set in that it contains everything you need to restore to the point of the backup.

      Full backups with journal archiving allow restores to a point after the backup, but the journal archive grows in an unbounded way with the number of transactions, and replaying the journals to get to your recovery point takes time proportional to the number of transactions in the journal archive, so over time, this becomes less efficient.

      With full + incremental backups, a backup set is a full backup, plus the incremental backups taken after that full backup. Incremental backups are quick to backup, but take longer to restore, and over time the backup set gets larger and larger, so it may end up consuming more backup space than a full backup alone (depending on your backup retention policy).

      Full + incremental backups with journal archiving have the same characteristics as incremental backups, except that you can roll forward from the most recent incremental. With this strategy, the journal archive doesn't grow in an unbounded way because the archive is purged when you take the next incremental backup. Note that if your RPO is between incremental backups, you must also enable a merge timestamp by setting the merge timestamp to a negative value (see below).
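      The notion of a backup set can be made concrete with a short sketch. Given the times of full and incremental backups and a target recovery time, the function below picks the full backup to start from and the incrementals to apply on top of it. The function name and the sample schedule are hypothetical; any gap between the last incremental and the target time would have to be covered by journal archiving.

```python
from datetime import datetime


def restore_plan(fulls, incrementals, target):
    """Return the latest full backup at or before `target`, plus the
    incrementals taken after that full backup and at or before `target`."""
    eligible = [full for full in fulls if full <= target]
    if not eligible:
        raise ValueError("no full backup at or before the target time")
    base = max(eligible)
    chain = sorted(inc for inc in incrementals if base < inc <= target)
    return base, chain


# Hypothetical schedule: weekly fulls, daily incrementals.
fulls = [datetime(2024, 1, 1), datetime(2024, 1, 8)]
incrementals = [datetime(2024, 1, day) for day in (2, 3, 4, 5, 6, 7, 9)]

base, chain = restore_plan(fulls, incrementals, target=datetime(2024, 1, 4, 12, 0))
print(base.date(), [inc.day for inc in chain])
```

      The longer the chain, the longer the restore takes, which is the trade-off between incremental and full backups described above.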

      More?: Administrator’s Guide to Backing Up and Restoring a Database  How does "point-in-time" recovery work with Journal Archiving? 

      Forest Merge Configurations

      Forest merges recover the disk space occupied by deleted documents. A negative merge timestamp delays that permanent deletion. If we want incremental backups to contain all the fragments that were deleted since the last incremental backup then we want to set the delay to a period greater than the incremental backup period. This requires more disk space for the incremental backups and also requires additional space in the live database, but provides the most flexibility.

      Setting retain-until-backup on a given database (through the Admin UI or through an API call) has a similar effect by telling the server to keep the deleted fragments until a full backup or an incremental backup completes. Many clients choose to use both the negative merge timestamp and retain-until-backup options together.

      More?: admin:database-set-merge-timestamp  admin:database-set-retain-until-backup
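      As a worked example of choosing a negative merge timestamp, the sketch below converts a retention period into a timestamp value. It assumes that merge timestamps are expressed in ticks of 10,000,000 per second; treat that unit as an assumption and confirm it against the admin:database-set-merge-timestamp documentation for your release before using the value.

```python
from datetime import timedelta

# Assumption: 10,000,000 timestamp ticks per second.
TICKS_PER_SECOND = 10_000_000


def negative_merge_timestamp(retention: timedelta) -> int:
    """Convert a retention period into a negative merge timestamp value."""
    return -int(retention.total_seconds()) * TICKS_PER_SECOND


# Incremental backups run every 24 hours, so retain deleted fragments for 36 hours.
value = negative_merge_timestamp(timedelta(hours=36))
print(value)
```

      Choosing a retention period comfortably longer than the incremental backup interval leaves room for a backup that starts late or runs long.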


      Planning to meet a Long Term Retention (LTR) policy, a Recovery Point Objective (RPO) and a Recovery Time Objective (RTO) and a Cost goal is a key part of developing an overall MarkLogic deployment plan. MarkLogic offers a wealth of tools that can complement each other when they are properly coordinated. As is clear from this article, the choices are many, broad, and interrelated.

      Regardless of the server version, MLCP does not support concurrent jobs if they are importing from/exporting to the same file.

      In general, MLCP jobs will perform best by maximizing the number of threads in a single MLCP job. Before 10.0-4.2, each MLCP job used 4 threads by default. Starting in 10.0-4.2, each MLCP job now uses the maximum number of threads available on the server as the default thread count (you can read more about this change in the 10.0-4.2 release notes).


      In the more recent versions of MarkLogic Server, there are checks in place to prevent the loading of invalid documents (such as documents with multiple root nodes).  However, documents loaded in earlier versions of MarkLogic Server can now result in duplicate URI or duplicate document errors being reported.

      Additionally, under normal operating conditions, a document/URI is saved in a single forest. If the load process is somehow compromised, users may see issues like duplicate URIs (i.e., the same URI in different forests) and duplicate documents (i.e., the same document/URI in the same forest).


      If the XDMP-DBDUPURI (duplicate URI) error is encountered, refer to our KB article "Handling XDMP-DBDUPURI errors" for procedures to resolve.

      If one doesn't see XDMP-DBDUPURI errors but running fn:doc() on a document returns multiple nodes then it could be a case of duplicate document in same forest.

      To check that the problem is actually duplicate documents, one can run either xdmp:describe(fn:doc(...)) or fn:count(fn:doc(...)). If these commands return more than one item (e.g., xdmp:describe(fn:doc("/testdoc.xml")) returns (fn:doc("/testdoc.xml"), fn:doc("/testdoc.xml")), or fn:count(fn:doc("/testdoc.xml")) returns 2), then the problem is duplicate documents in the same forest (and not duplicate URIs).

      To fix duplicate documents, the document will need to be reloaded.


      This article discusses how the case sensitivity of a search term affects the search score, and thus the final order of search results, for a secondary query that uses cts:boost-query and a weight. A case-insensitive word term is treated as the lowercase word term, so there is no difference in the frequencies and scores of results between an any-case search term with the "case-insensitive" option and a lowercase search term, whether the lowercase term is used with the "case-sensitive" option or with neither option. If neither "case-sensitive" nor "case-insensitive" is present, the text of the search term is used to determine case sensitivity.

      Understanding relevance score

      In MarkLogic, search results are returned in relevance order. The most relevant results are first in the result sequence and the least relevant are last.
      More details on the relevance score and its calculation are available in the MarkLogic documentation.

      Of the many ways to control the relevance score, one is to use a secondary query to boost it. This article uses examples of such secondary queries to show how the text case (upper, lower, or unspecified) of search terms affects relevance scores and the order of the results returned.

      A few examples to understand this scenario

      Consider a few scenarios in which the queries below use cts:boost-query and a weight on the word "washington" to boost certain search results.

      Example 1: Search with lowercase search term and option for case not specified

      xquery version "1.0-ml";

      for $hit in
        cts:search(
          fn:doc(),  (: assumed scope: all documents in the database :)
          cts:boost-query(
            cts:element-word-query(xs:QName("test"), "George"),
            (: boosting query: lowercase term, no case option, weight 10.0 :)
            cts:element-word-query(xs:QName("test"), "washington", (), 10.0)
          )
        )
      return element hit {
        attribute score { cts:score($hit) },
        attribute fit { cts:fitness($hit) },
        attribute conf { cts:confidence($hit) },
        $hit
      }

      Results for Query1:
      <hit score="28276" fit="0.9393904" conf="0.2769644">
        <test>Washington, George... </test>
      </hit>
      <hit score="16268" fit="0.7125317" conf="0.2100787">
        <test>George washington was the first President of the United States of America...</test>
      </hit>


      Example 2: Search with lowercase search term and case-sensitive option

      xquery version "1.0-ml";

      for $hit in
        cts:search(
          fn:doc(),  (: assumed scope: all documents in the database :)
          cts:boost-query(
            cts:element-word-query(xs:QName("test"), "George"),
            (: boosting query: lowercase term, "case-sensitive" option, weight 10.0 :)
            cts:element-word-query(xs:QName("test"), "washington", ("case-sensitive"), 10.0)
          )
        )
      return element hit {
        attribute score { cts:score($hit) },
        attribute fit { cts:fitness($hit) },
        attribute conf { cts:confidence($hit) },
        $hit
      }

      Results for Query2:
      <hit score="28276" fit="0.9393904" conf="0.2769644">
        <test>Washington, George... </test>
      </hit>
      <hit score="16268" fit="0.7125317" conf="0.2100787">
        <test>George washington was the first President of the United States of America...</test>
      </hit>


      Example 3: Search with an uppercase search term and the case-insensitive option in cts:boost-query, as shown below; the rest of the query is the same as in the queries above


      cts:boost-query(cts:element-word-query(xs:QName("test"),"George" ),
      cts:element-word-query(xs:QName("test"),"Washington",("case-insensitive"), 10.0) )

      Results for Query3:
      <hit score="28276" fit="0.9393904" conf="0.2769644">
        <test>Washington, George... </test>
      </hit>
      <hit score="16268" fit="0.7125317" conf="0.2100787">
        <test>George washington was the first President of the United States of America...</test>
      </hit>

      Clearly, the above queries produce the same scores, with the same fitness and confidence values. This is because the case-insensitive word term is treated as the lowercase word term, so there can be no difference in the frequencies of those two terms (any-case/case-insensitive and lowercase/case-sensitive), and therefore no difference in scoring. Thus there is no difference between the scores of the results for Query3 and Query2.
      When case sensitivity is not specified, the text of the search term is used to determine case sensitivity. For Query1, the text of the search term contains no uppercase characters, so it is treated as "case-insensitive".


      Now let us take a look at a query with an uppercase search term and the case-sensitive option.

      Example 4: Search with an uppercase search term and the case-sensitive option in cts:boost-query, as shown below; the rest of the query is the same as in the queries above


      cts:boost-query(cts:element-word-query(xs:QName("test"),"George" ),
      cts:element-word-query(xs:QName("test"),"Washington",("case-sensitive"), 10.0) )

      Results for Query4:
      <hit score="44893" fit="0.9172696" conf="0.3489831">
        <test>Washington, George was the first... </test>
      </hit>
      <hit score="256" fit="0.0692672" conf="0.0263533">
        <test>George washington was the first President of the United States of America...</test>
      </hit>


      As we can clearly see, the scores have changed for the results of Query4, and thus the final order of results is also affected.


      When using a secondary query with cts:boost-query and a weight to boost certain search results, it is important to understand the impact of the case of the search text on the result sequence. A case-insensitive word term is treated as the lowercase word term, so there can be no difference in the frequencies of any-case/case-insensitive and lowercase/case-sensitive search terms, and therefore no difference in scoring. For a search term that contains uppercase letters and uses the "case-sensitive" option, scores are boosted as expected in comparison with a case-insensitive search. If neither "case-sensitive" nor "case-insensitive" is present, the text of the search term is used to determine case sensitivity: if the text contains no uppercase characters, the term is treated as "case-insensitive"; if it contains uppercase characters, it is treated as "case-sensitive".
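The rule for choosing the effective case option can be written down as a small function. This is only an illustrative model of the behavior described in this article: local:effective-case-option is a hypothetical helper, not a MarkLogic built-in, and it only reports which option applies for a given term and option list.

```xquery
xquery version "1.0-ml";

(: Illustrative helper, not a MarkLogic built-in. :)
declare function local:effective-case-option(
  $text as xs:string,
  $options as xs:string*
) as xs:string
{
  (: An explicit option always wins. :)
  if ($options = "case-sensitive") then "case-sensitive"
  else if ($options = "case-insensitive") then "case-insensitive"
  (: No explicit option: any uppercase letter makes the term case-sensitive. :)
  else if (fn:matches($text, "\p{Lu}")) then "case-sensitive"
  else "case-insensitive"
};

(
  local:effective-case-option("washington", ()),
  local:effective-case-option("Washington", ()),
  local:effective-case-option("Washington", "case-insensitive")
)
```

Evaluating the three calls shows the lowercase term without options treated as case-insensitive, the capitalized term without options treated as case-sensitive, and the explicit option overriding the text of the term.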



      MarkLogic Server includes element level security (ELS), an addition to the security model that allows you to specify security rules on specific elements within documents. Using ELS, parts of a document may be concealed from users who do not have the appropriate roles to view them. ELS can conceal the XML element (along with properties and attributes) or JSON property so that it does not appear in any searches, query plans, or indexes - unless accessed by a user with appropriate permissions.

      ELS protects XML elements or JSON properties in a document using a protected path, where the path to an element or property within the document is protected so that only roles belonging to a specific query roleset can view the contents of that element or property. You specify that an element is part of a protected path by adding the path to the Security database. You also then add the appropriate role to a query roleset, which is also added to the Security database.

      ELS uses query rolesets to determine which elements will appear in query results. If a query roleset does not exist with the associated role that has permissions on the path, the role cannot view the contents of that path.
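As a minimal sketch of adding a protected path, the following could be run against the Security database. The path /employee/salary and the role name els-reader are hypothetical, and the role is assumed to exist already. A matching query roleset still has to be added separately for the role to see the element in query results.

```xquery
xquery version "1.0-ml";
import module namespace sec = "http://marklogic.com/xdmp/security"
  at "/MarkLogic/security.xqy";

(: Run against the Security database.
   The path and the role name are hypothetical examples. :)
sec:protect-path(
  "/employee/salary",
  (),  (: no namespace bindings are needed for this path :)
  (xdmp:permission("els-reader", "read", "element"))
)
```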


      1. A user with admin privileges can access documents with protected elements by using fn:doc to retrieve documents (instead of using a query). However, to see protected elements as part of query results, even a user with admin privileges will need to have the appropriate role(s).
      2. ELS applies to both XML elements and JSON properties; so unless spelled out explicitly, 'element' refers to both XML elements and JSON properties throughout this article.

      You can read more about how to configure Element Level Security here, and can see how this all works at this Element Level Security Example.


      One of the commonly used document-level capabilities is 'update'. Be aware, however, that document-level update is too powerful to be used with ELS permissions, as someone with document-level update privileges could not only update a node but also delete the whole document. Consequently, a new document-level capability, 'node-update', has been introduced. 'node-update' offers finer control when combined with ELS through the xdmp:node-replace and xdmp:node-delete functions, as they can be used to update or delete only the specified nodes of a document (and not the document in its entirety).

      Document-level vs Element-level security

      Unlike at the document-level:

      • 'update' and 'node-update' capabilities are equivalent at the element-level. However, at the document-level, a user who has only the 'node-update' capability on a document cannot delete the document. In contrast, the 'update' capability allows that user to delete the document
      • 'read', 'insert' and 'update' are checked separately at the element level i.e.:
        • read operations - only permissions with 'read' capability are checked
        • node update operations - only permissions with 'node-update' (update) capability are checked
        • node insert operations - only permissions with  'insert' capability are checked

      Note: read, insert, update and node-update can all be used at the element-level i.e., they can be part of the protected path definition.



      1. update: A node can be updated by any user that has an 'update' capability at the document-level
      2. node-update:  A node can be updated by any user with a 'node-update' capability as long as they have sufficient privileges at the element-level


      1. If a node is protected but no 'update/node-update' capabilities are explicitly granted to any user, that node can be updated by any user as long as they have 'update/node-update' capabilities at the document-level
      2. If any user is explicitly granted 'update/node-update' capabilities to that node at the element level, only that specific user is allowed to update/delete that node. Other users who are expected to have that capability must be explicitly granted that permission at the element level

      How does node-replace/node-delete work?

      When a node-replace/node-delete is called on a specific node:

      1. The user trying to update that node must have at least a 'node-update' (or 'update') capability to all the nodes up until (and including) the root node
      2. None of the descendant nodes of the node being replaced/deleted can be protected by different roles. If they are protected:
        1. 'node-delete' isn’t allowed as deleting this node would also delete the descendant node which is supposed to be protected
        2. 'node-replace' can be used to update the value (text node) of the node but replacing the node itself isn’t allowed

      Note: If a caller has the 'update' capability at the document level, there is no need to do element-level permission checks since such a caller can delete or overwrite the whole document anyway.
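A minimal sketch of a node-level update follows. The document URI /employee.xml and the path employee/salary are hypothetical, and the caller is assumed to hold 'node-update' (or 'update') on the document and to pass the element-level checks described above. It replaces only the text value of the element, which is the form of node-replace that remains possible when descendants are protected.

```xquery
xquery version "1.0-ml";

(: Hypothetical document and path. :)
let $salary := fn:doc("/employee.xml")/employee/salary
return
  (: Replace only the text node, leaving the element itself in place. :)
  xdmp:node-replace($salary/text(), text { "120000" })
```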


      1. 'node-update' was introduced to offer finer control with ELS, in contrast to the document level 'update'
      2. 'update' and 'node-update' permissions behave the same at element-level, but differently at the document-level
        1. At document-level, 'update' is more powerful as it gives the user the permission to delete the entire document
        2. All permissions talk to each other at document-level. In contrast, permissions are checked independently at the element-level
          1. At the document level, an update permission allows you to read, insert to and update the document
          2. At the element level, however, read, insert and update (node-update) are checked separately
            1. For read operations, only permissions with the read capability are checked
            2. For node update operations, only permissions with the node-update capability are checked
            3. For node insert operations, only permissions with the insert capability are checked (this is true even when compartments are used).
      3. Can I use ELS without document level security (DLS)?
        1. ELS cannot be used without DLS
        2. Consider DLS the outer layer of defense, whereas ELS is the inner layer - you cannot get to the inner layer without passing through the outer layer
      4. When to use DLS vs ELS?
        1. ELS offers finer control on the nodes of a document and whether to use it or not depends on your use-case. We recommend not using ELS unless it is absolutely necessary as its usage comes with serious performance implications
        2. In contrast, DLS offers better performance and works better at scale - but is not an ideal choice when you need finer control as it doesn’t allow node-level operations 
      5. How does ELS performance scale with respect to different operations?
        1. Ingestion - depends on the number of protected paths
          1. During ingestion, the server inspects every node for ELS to do a hash lookup against the names of the last steps from all protected paths
          2. For every protected path that matches the hash, the server does a full test of the node against the path - the higher the number of protected paths, the higher the performance penalty
          3. While the hash lookup is very fast, the full test is comparatively much slower - and the corresponding performance penalty increases when there are a large number of nodes that match the last steps of the protected paths
            1. Consequently, we strongly recommend avoiding the use of wildcards at the leaf-level in protected paths
            2. For example: /foo/bar/* has a huge performance penalty compared to /foo/*/bar
        2. Updates - as with ingestion, ELS performance depends on the number of protected paths
        3. Query/Search - in contrast to ELS ingestion or update, ELS query performance depends on the number of query rolesets
          1. Because ELS query performance depends on the number of query rolesets, the concept of Protected PathSet was introduced in 9.0-4
          2. A Protected PathSet allows OR relationships between permissions on multiple protected paths that cover the same element
          3. Because query performance depends on the number of relevant query rolesets, it is highly recommended to use helper functions to obtain the query rolesets of nodes configured with element-level security

      Further Reading


      Some customers have reported problems when attempting to access the Configuration Manager application. In the past, this has been attributed to part of the upgrade process failing for some reason (for example: a port required by MarkLogic already being in use), or in some cases to one of the default databases having been removed by the customer at some previous stage.

      XDMP-ARGTYPE Error

      If you see this error when you attempt to access the Configuration Manager:

      500 Internal Server Error XDMP-ARGTYPE XDMP-ARGTYPE: (err:XPTY0004) fn:concat( "could not initialize management plugins with scope: ", $reut:PLUGIN-SCOPE, ": ", xdmp:quote($e)) -- arg1 is not of type xs:anyAtomicType?

      Resolving the error

      Ensure you have an Extensions database configured by doing the following:

      • Log into the MarkLogic Admin interface on port 8001 - http://[your-host]:8001/
      • In the "Databases" box, ensure a database called Extensions is listed

      If it does not exist, download and run the script attached to this article (create-extensions-db.xqy).


      Does MarkLogic provide encryption at rest?

      MarkLogic 9

      MarkLogic 9 introduces the ability to encrypt 'data at rest' - data that is on media (on disk or in the cloud), as opposed to data that is being used in a process. Encryption can be applied to newly created files, configuration files, or log files. Existing data files can be encrypted by triggering a merge or re-index of the data.

      For more information about using Encryption at Rest, see Encryption at Rest in the MarkLogic Security Guide.

      MarkLogic 8 and Earlier releases

      MarkLogic 8 does not provide support for encryption at rest for its own forests.

      Memory consumption

      Memory consumption patterns will be different when encryption is used:

      • To access unencrypted forest data MarkLogic normally uses memory-mapped files. When files are encrypted, MarkLogic instead decrypts the entire index to anonymous memory.
      • As a result, encrypted MarkLogic forests use more anonymous memory and less file-mapped memory than unencrypted forests.  
      • Without encryption at rest, when available memory is low, the operating system can throw out file pages from the working set and later page them in directly from files.  But with encryption at rest, when memory is low, the operating system must write them to swap.

      Using Amazon S3 Encryption For Backups

      If you are hosting your data locally, would like to back up to S3 remotely, and your goal is that there cannot possibly exist unencrypted copies of your data outside your local environment, then you could backup locally and store the backups to S3 with AWS Client-Side encryption. MarkLogic does not support AWS Client-Side encryption, so this would need to be a solution outside MarkLogic.

      See also: MarkLogic documentation: S3 Storage.

      See also: AWS: Protecting Data Using Encryption.


      Here we compare XDBC servers and the Enhanced HTTP server in MarkLogic 8.


      XDBC servers are still fully supported in MarkLogic Server version 8. You can upgrade existing XDBC servers without making any changes and you can create new XDBC servers as you did in previous releases.

      The Enhanced HTTP Server is an additional feature on HTTP servers which is protocol and binary transport compatible with XCC clients, as long as you use the xcc.httpcompliant=true system property.

      The XCC protocol is actually just HTTP, but the details of how to handle body, headers, responses, etc., are "built in" to the XCC client libraries and the XDBC server. The HTTP server in MarkLogic 8 now shares the same low-level code and can dispatch XCC-like requests.


      This article talks about best practices for use of external proxies vs using rewriter rules in the Enhanced HTTP server.


      Whether to use external proxies or rewriter rules in the Enhanced HTTP application server is an application design tradeoff. It is not dissimilar to using a single HTTP application server with an XQuery rewriter or endpoint that can dynamically dispatch to different databases and modules (using eval-in). The Enhanced HTTP server does this type of dispatching much more efficiently, but the concept is similar, with the same pros and cons.

      It is mostly an application and business management issue—by sharing the same port you share the same server configuration (authentication, server settings) and the "outside world" only sees one port, so configuring port-based security on firewalls, routers, or load balancers is more difficult.


      A forest reindex timeout error may occur when there are transactions holding update locks on documents for an extended period of time. A reindexer process is started as a result of a database index change or a major MarkLogic Server upgrade.  The reindexer process will not complete until after update locks are released.

      Example error text seen in the MarkLogic Server ErrorLog.txt file:

      XDMP-FORESTERR: Error in reindex of forest Documents: SVC-EXTIME: Time limit exceeded


      Long running transactions can occur if MarkLogic Server is participating in a distributed transaction environment. In this case transactions are managed through a Resource Manager. Each transaction is executed in a two phase commit. In the first phase, the transaction will be prepared for a commit or a rollback. The actual commit or rollback will occur in the second phase. More details about XA transactions can be found in the Application Developer's Guide - Understanding Transactions in MarkLogic Server

      In a situation where the Resource Manager gets disconnected between the two phases, all transactions may be left in a "prepare" state within MarkLogic Server. The Resource Manager maintains transaction information and will clean up transactions left in "prepare" state after a successful reconnect. In the rare case where this doesn't happen, all transactions left in "prepare" state will stay in the system until they are cleaned up manually. The method to manually intervene is described in the XCC Developer's Guide - Heuristically Completing a Stalled Transaction.

      In order for an XA transaction to take place, it needs to prepare the execution for the commit. If updates are being made to pre-existing documents, update locks are held against the URIs for those documents. When reindexing occurs during this process, the reindexer will wait for these locks to be released before it can successfully reindex the new documents. Because the reindexer is unable to complete due to these pending XA transactions, the hosts in the cluster are unable to finish the reindexing task and will eventually throw a timeout error.


      To avoid this kind of reindexer timeout, it is recommended that the database is checked for outstanding XA transactions in "prepare" state before starting a reindexing process. There are two ways to verify if the database has outstanding transactions in "prepare" state:

      • In the Admin UI, navigate  to each forest of the database and review the status page; or
      • Run the following XQuery code (in Query Console):

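        xquery version "1.0-ml";

        (: List coordinators of XA transactions still in the "prepare" state.
           The *: wildcard matches the forest-status elements in any namespace. :)
        for $f in xdmp:database-forests(xdmp:database())
        return
          xdmp:forest-status($f)//*:transaction-coordinator[*:decision-state = 'prepare']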

      In the case where there are transactions in the "prepare" state, a roll-back can be executed:

      • In the Admin UI, click on the "rollback" link for each transaction; or
      • Run the following XQuery code (in Query Console):

        xquery version "1.0-ml";

        (: Roll back every XA transaction still in the "prepare" state.
           The third argument (commit) is false, so each transaction is rolled back. :)
        for $f in xdmp:database-forests(xdmp:database())
        for $id in xdmp:forest-status($f)//*:transaction-coordinator[*:decision-state = 'prepare']/*:transaction-id/fn:string()
        return xdmp:xa-complete($f, $id, fn:false(), fn:false())


      Query Console is an interactive web-based query development tool for writing and executing ad-hoc queries in XQuery, Server-Side JavaScript, SQL and SPARQL. Query Console enables you to quickly test code snippets, debug problems, profile queries, and run administrative XQuery scripts.  Query Console uses workspaces to assist users with organizing queries.  A user can have multiple workspaces, and each workspace can have multiple queries.


      In MarkLogic Server v9.0-11, v10.0-3 and earlier releases, users may experience delays, lag or latency between when a key is pressed on the keyboard and when it appears in the Query Console query window.  This typically happens when there are a large number of queries in one of the user's workspaces.


      A workaround to improve performance is to reduce the number of queries in each workspace.  The same number of queries can be managed by increasing the number of workspaces and reducing the number of queries in each workspace.  We suggest keeping no more than 30 queries in a workspace to avoid these latency issues.  

      The MarkLogic Development team is looking to improve the performance of Query Console, but at the time of this writing, this performance issue has not yet been resolved. 

      Further Reading

      Query Console User Guide


      Users of Java-based batch processing applications, such as CoRB, XQSync, mlcp and the Hadoop connector, may have seen an error message containing "Premature EOF, partial header line read". Depending on how exceptions are managed, this may cause the Java application to exit with a stacktrace or to simply output the exception (and trace) into a log and continue.

      What does it mean?

      The premature EOF exception generally occurs when the connection to a particular application server is lost while the XCC driver is in the process of reading a result set. This can happen in a few possible scenarios:

      • The host became unavailable due to a hardware issue, segfault or similar issue;
      • The query timeout expired (although this is much more likely to yield an XDMP-EXTIME exception with a "Time limit exceeded" message);
      • Network interruption - a possible indicator of a network reliability problem such as a misconfigured load balancer or a fault in some other network hardware.

      What does the full error message look like?

      An example:

      INFO: completed 5063408/14048060, 103 tps, 32 active threads
       Feb 14, 2013 7:04:19 AM com.marklogic.developer.SimpleLogger logException
       SEVERE: fatal error
       com.marklogic.xcc.exceptions.ServerConnectionException: Error parsing HTTP
       headers: Premature EOF, partial header line read: ''
       [Session: user=admin, cb={default} [ContentSource: user=admin,
       cb={none} [provider: address=localhost/, pool=0/64]]]
       [Client: XCC/4.2-8]
       at com.marklogic.xcc.impl.SessionImpl.submitRequest(
       at Source)
       at Source)
       at java.util.concurrent.FutureTask$Sync.innerRun(
       at java.util.concurrent.Executors$
       at java.util.concurrent.FutureTask$Sync.innerRun(
       Caused by: Error parsing HTTP headers: Premature EOF,
       partial header line read: ''
       at com.marklogic.http.HttpHeaders.nextHeaderLine(
       at com.marklogic.http.HttpHeaders.parseResponseHeaders(
       at com.marklogic.http.HttpChannel.parseHeaders(
       at com.marklogic.http.HttpChannel.receiveMode(
       at com.marklogic.http.HttpChannel.getResponseCode(
       ... 11 more
       2013-02-14 07:04:19.271 WARNING [12] (AbstractRequestController.runRequest):
       Cannot obtain connection: Connection refused

      Configuration / Code: things to try when you first see this message

      One possible cause of errors like this is the JVM starting garbage collection and that process taking long enough to exceed the server timeout setting. If this is the case, try adding the -XX:+UseConcMarkSweepGC Java option.

      Setting the "keep-alive" value to zero for the affected XDBC application server will disable socket pooling and may help to prevent this condition from arising; with keep-alive set to zero, sockets will not be re-used. With this approach, it is understood that disabling keep-alive should not be expected to have a significant negative impact on performance, although thorough testing is nevertheless advised.
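The keep-alive change can also be scripted with the Admin API rather than made through the Admin UI. The following is a minimal sketch: the group name "Default" and the app server name "my-xdbc-server" are placeholders to be replaced with your own values.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Default" and "my-xdbc-server" are placeholder names. :)
let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")
let $server := admin:appserver-get-id($config, $group, "my-xdbc-server")
(: A keep-alive timeout of 0 means sockets are not re-used. :)
let $config := admin:appserver-set-keep-alive-timeout(
  $config, $server, xs:unsignedInt(0)
)
return admin:save-configuration($config)
```

As noted above, test the effect on throughput before applying this in production.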


      Here we discuss various methods for sharing metering data with Support:  telemetry in MarkLogic 9 and exporting monitoring data.



      In MarkLogic 9, enabling telemetry collects, encrypts, packages, and sends diagnostic and system-level usage information about MarkLogic clusters, including metering, with minimal impact to performance. Telemetry sends information about your MarkLogic Servers to a protected and secure location where it can be accessed by the MarkLogic Technical Support Team to facilitate troubleshooting and monitor performance.  For more information see Telemetry.

      Meters database

      If telemetry is not enabled, make sure that monitoring history is enabled and data has been collected covering the time of the incident.  See Enabling Monitoring History on a Group for more details.  

      Backup of Meters database

      A backup of the full Meters database will provide all the available raw data and is very useful, but is often very large and difficult to transfer, so an export of a defined time range is often requested.

      Exporting data

      Either of the attached scripts can be used in lieu of a Meters database backup. Each provides the raw metering XML files from a defined period of time, which can be reloaded into MarkLogic and used with the standard tools.


      This XQuery export script needs to be executed in Query Console against the Meters database and will generate zip files stored in the defined folder for the defined period of time.

      Variables for start and end times, batch size, and output directory are set at the top of the script.

      This bash version will use MLCP to perform a similar export but requires an XDBC server and MLCP installed. By default the script creates output in a subdirectory called meters-export. See the attached script for details. An example command line is

      ./ localhost admin admin "2018-04-12T00:00:00" "2018-04-14T00:00:00"


      Within a MarkLogic deployment, there can be multiple primary and replica objects. Those objects can be forests in a database, databases in a cluster, nodes in a cluster, and even clusters in a deployment. This article walks through several examples to clarify how all these objects hang together.

      Shared-disk vs. Local-disk failover

      Shared-disk failover requires a shared filesystem visible to all hosts in a cluster, and involves one copy of a data forest, managed by either its primary host, or its failover host (so forest1, assigned to host1, failing over to host2).

      Local-disk failover involves two copies of data in a primary and local disk failover replica forest (sometimes referred to as an "LDF"). Primary hosts manage primary forests, and failover hosts manage the corresponding synchronized LDF (forest1 on host1, failing over to replicaForest1 on host2).

      Database Replication

      In the same way that you can have multiple copies of data within a cluster (as seen in local-disk failover), you can also have multiple copies of data across clusters as seen in either database replication or flexible replication. Within a replicated environment you'll often see reference to primary/replica databases or primary/replica clusters. So this will often look like forest1 on host1 in cluster1, replicating to forest1 on host1 in cluster2. We can shorten forest names here to c1.h1.f1 and c2.h1.f1. Note that you can have both local disk failover and database replication going at the same time - so on your primary cluster, you'll have c1.h1.f1, as well as failover forest c1.h2.rf1, and your replica cluster will have c2.h1.f1, as well as its own failover forest c2.h2.rf1. All of these forest copies should be synchronized both within a cluster (c1.h1.f1 synced to c1.h2.rf1) and across clusters (c1.h1.f1 synced to c2.h1.f1).

      Configured/Intended vs. Acting

      At this point we've got two clusters, each with at least two nodes, where each node has at least one forest - so four forest copies, total (bear in mind that databases can have dozens or even hundreds of forests - each with their own failover and replication copies). The "configured" or "intended" arrangement is what your deployment looks like by design, when no failover or any other kind of events have occurred that would require one of the other forest copies to serve as the primary forest. Should a failover event occur, c1.h2.rf1 will transition from the intended LDF to the acting primary, and its host c1.h2 will transition from the intended failover host to the acting primary host. At this point, the intended primary forest c1.h1.f1 and its intended primary host c1.h1 will likely both be offline. Failing back is the process of reverting hosts and forests from their acting arrangement (in this case, acting primary forest c1.h2.rf1 and acting primary host c1.h2), back to their intended arrangement (c1.h1.f1 is both intended and acting primary forest, c1.h1 is both intended and acting primary host).

      This distinction between intended vs. acting can even occur at the cluster level, where c1 is the intended/acting primary cluster, and c2 is the intended/acting replica cluster. Should something happen to your primary cluster c1, the intended replica cluster c2 will transition to the acting primary cluster while c1 is offline.
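The configured (intended) arrangement within one cluster can be listed from the cluster configuration. The sketch below assumes a database named "Documents" (a placeholder) and reports each primary forest with its configured host and any configured local-disk failover replicas. It reads configuration only, so it does not show which copy is currently acting as primary.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder database name. :)
let $config := admin:get-configuration()
for $forest in xdmp:database-forests(xdmp:database("Documents"))
return
  <forest name="{xdmp:forest-name($forest)}"
          host="{xdmp:host-name(admin:forest-get-host($config, $forest))}">{
    (: Configured local-disk failover replicas of this forest, if any. :)
    for $replica in admin:forest-get-replicas($config, $forest)
    return
      <replica name="{xdmp:forest-name($replica)}"
               host="{xdmp:host-name(admin:forest-get-host($config, $replica))}"/>
  }</forest>
```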


      • It's possible to have multiple copies of your data in a MarkLogic Server deployment
      • Under normal operations, these copies are synchronized with one another
      • Should failover events occur in a cluster, or catastrophic events occur to an entire cluster, you can shift traffic to the available previously synchronized copies
      • Failing back to your intended configuration is a manual operation
        • Make sure to re-synchronize copies that were offline with online copies
        • Shifting previously offline copies to acting primary before re-synchronization may result in data loss, as offline forests can overwrite updates previously committed to forests serving as acting primaries while the intended primary forests were offline


      To avoid index bloat, MarkLogic records word positions in its indexes only once, in the word-query field. When word positions are needed to accurately match element-word queries, they are normally taken from the word-query field. When elements are excluded from the word-query field, words under those elements are not indexed there, so their positions are not recorded. In MarkLogic 7.0-5 and 8.0-1, a code change was included to avoid false negatives resulting from an element-word query expecting positions from words in elements descended from excluded elements. The change was to stop using positions from the word-query field for element-word searches if the word-query field has exclusions.


      Unfortunately, this solution can sometimes result in false positives, which is captured in 7.0-5 bug #33207 and 8.0-1 bug #32686 (you can read more about both of these bugs in our Fixed Bugs Report). Consequently, a follow-up refinement was shipped in 7.0-5.1 and 8.0-2 to allow the affected queries to be fully resolved from the indexes. To take advantage of this update, three changes are required:

      1) Upgrade to 7.0-5.1 or later, or 8.0-2 or later

      2) Database index settings must be updated to tell MarkLogic Server to use positions in this scenario and therefore avoid the previously seen false positives. There are two changes that could be made. Either:

      2a. The element in the element-word query must be explicitly included in the word-query field


      2b. All the word-query excluded elements must be configured as phrase-around elements.

      3) After the relevant database index settings are updated and the upgrade has been applied, a reindex must be performed

      If these changes are made, positions in the word-query field should then be used, which should then ultimately result in the elimination of false positives.


      A "fast data directory" is configurable for each forest, and can be set to a directory built on a fast file system, such as one using SSDs. Refer to Using a mix of SSD and spinning drives. If configured, MarkLogic Server will try to direct as many writes and seeks as it can to the Fast Data Directory (FDD). As such, it will try to put as many on-disk stands as possible onto the FDD. Frequently updated documents tend to reside in the smaller stands and thus are more likely to reside on the FDD.

      This article attempts to explain how you should account for the FDD when sizing disk space for your MarkLogic Server.


      Forest journals will be placed on the fast data directory. 

      Each time an automatic merge is performed, MarkLogic Server will attempt to save the results onto the forest's fast data directory. If there is not sufficient space on the FDD, MarkLogic Server will use the forest's primary data directory. To preserve space for future small stands, MarkLogic Server is conservative in deciding whether to put the merge destination stands on the FDD, which means that even if there is enough available space, it may store the result in the forest's regular data directory. For more details, refer to the Fundamentals of Resource Consumption white paper.

      It is also important to know when the Fast Data Directory is not used. Stands created by manually triggered merges are not stored on the fast data directory, but in the forest's primary data directory. Manual merges can be executed by calling the xdmp:merge function or from within the Admin UI. Forest migration and restoring from backups also do not put stands in the fast data directory.


      MarkLogic Server maintains some disk space in the FDD for checkpoints and journaling. However, since the Fast Data Directory is not used in some procedures, we should not count the size of the FDD when sizing the disk space needed for forest data.


      The Performance Considerations section of the Loading Content Into MarkLogic Server documentation states 

      "When you load content, MarkLogic Server performs updates transactionally, locking documents as needed and saving the content to disk in the journal before the transaction commits. By default, all documents are locked during an update and the journal is set to preserve committed transactions, even if the MarkLogic Server process ends unexpectedly."

      There are two types of locking which are specified at the database level:

      • Fast locking employs a hashed locking scheme (based on the URI) where each fragment URI has a designated forest, so the lock created during the insert is restricted only to that forest.
      • Setting up a database with "strict" locking will force the coordination of an update lock across all forests in the database (and across the cluster) until the insert has taken place.

      Fast locking has been the default setting for newly created MarkLogic databases since MarkLogic 5 (released October 2011)

      When should I use strict locking?

      If at any point in your code you specify the forest into which a document or fragment is inserted (a technique commonly referred to as in-forest evaluation), setting that database to "strict" locking is definitely the safest choice. If your code always allows the server to determine the target forest for the document/fragment, you're perfectly safe using fast locking.

      In the situation where two different people create the same document (with the same URI) and where fast locking was taking place, this would result in:

      • A transaction culminating in an insert into a given forest (as assigned by the ML node servicing the request) for the first fragment
      • An "update" transaction (in the same forest) where the first fragment is then marked as deleted
      • A new fragment takes place of the first fragment to complete the second transaction

      Subsequent merges would then remove the stand entry for the first fragment (now deleted/replaced by the subsequent transaction)

      The fast option would not create a dangerous race condition unless your application would allow two different people to insert a document with the same URI into two different forests as two separate transactions and where URI assignment is handled by your XQuery/application layer; if the code responsible for making those transactions were to inadvertently assign the same URI to two different forests in a cluster, this could cause a problem that strict locking would guard against. If your application always allows MarkLogic to assign the forest for the document, there is no danger whatsoever in keeping to the server default of "fast" locking.
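The reasoning above can be illustrated with a minimal sketch. It assumes a generic hash of the URI chooses the forest; the hash function, the forest names, and the `assign_forest` helper are all hypothetical and do not reflect MarkLogic's internal assignment algorithm. The point is only that a deterministic URI-to-forest mapping sends every insert of the same URI to the same forest, so a lock scoped to that forest is sufficient.

```python
import hashlib
from typing import List


def assign_forest(uri: str, forests: List[str]) -> str:
    """Pick a forest deterministically from the URI (illustrative hash only)."""
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(forests)
    return forests[index]


# Hypothetical forest names for one database.
forests = ["forest-1", "forest-2", "forest-3"]
uri = "/orders/1001.xml"

# Two independent inserts of the same URI resolve to the same forest,
# so a lock held in that single forest serializes them.
first = assign_forest(uri, forests)
second = assign_forest(uri, forests)
print(first == second)
```

If application code bypasses this mapping and names the target forest itself, two inserts of the same URI can land in different forests, which is the case strict locking guards against.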

      Additionally - consider what kind of failover your system is using. When using fast journaling with local disk replication, the journal disk write needs to fail on both master and replica nodes in order for data loss to occur - so there's no need for strict in this scenario. In contrast, strict journaling should be used with shared-disk failover, as data loss is possible if using fast journaling and a single node fails before the OS flushes the buffer to disk.

      Is there a performance implication in switching to strict locking?

      Fast locking will be faster than strict locking, but the performance penalty is largely going to be dependent on a number of factors; the number of forests in a given database, the number of nodes across which the database forests are spread and the speed at which all nodes in the cluster can coordinate a transaction across the cluster (Network/IO) will all have some (potentially minimal) impact.

      If the conditions of your application suit, we recommend staying with the default of fast locking on all your databases.

      There may be reasons for using 'strict' locking - especially if you are considering loading documents using in-forest-evaluation in your code.

      Further reading


      There are situations where the SVC-DIRREM, SVC-DIROPEN and SVC-FILRD errors occur on backups to an NFS mounted drive. This article explains how this condition can occur and describes a number of recommendations to avoid such errors.

      Under normal operating conditions, with proper mounting options for a remote drive, MarkLogic Server should not report SVC-xxxx errors. Most likely, these errors are a result of improper NFS disk mounting or other IO issues.

      We will begin by exploring methods to narrow down the server which has the disk issue and then list some things to look into in order to identify the cause.

      Error Log and Sys Log Observation

      The following errors are typical MarkLogic Error Log entries seen during an NFS Backup that indicate an IO subsystem error.   The System Log files may include similar messages.

              Error: SVC-DIRREM: Directory removal error: rmdir '/Backup/directory/path': {OS level error message}

              Error: SVC-DIROPEN: Directory open error: opendir '/Backup/directory/path': {OS level error message}

              Error: Backup of forest 'forest-name' to 'Backup path' SVC-FILRD: File read error: open '/Backup/directory/path': {OS level error message}

      These SVC- error messages include the {OS level error message} retrieved from the underlying OS platform using the generic C runtime strerror() function. These messages are typically something like "Stale NFS file handle" or "No such file or directory".
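When scanning many log files, it can help to split these entries into their parts. The following is a minimal sketch that assumes the log lines follow the three shapes shown above; the regular expression and the `parse_svc_error` helper are illustrative and are not part of MarkLogic.

```python
import re

# Matches "SVC-XXXX: <summary>: <call> '<path>': <OS message>" anywhere in a line.
SVC_PATTERN = re.compile(
    r"(?P<code>SVC-[A-Z]+): (?P<summary>[^:]+): "
    r"(?P<call>\w+) '(?P<path>[^']*)': (?P<os_message>.*)$"
)


def parse_svc_error(line):
    """Return the parts of an SVC- error line as a dict, or None if no match."""
    match = SVC_PATTERN.search(line)
    return match.groupdict() if match else None


sample = ("Error: SVC-DIRREM: Directory removal error: "
          "rmdir '/Backup/directory/path': Stale NFS file handle")
parsed = parse_svc_error(sample)
print(parsed["code"], "|", parsed["call"], "|", parsed["os_message"])
```

Grouping the parsed entries by host and by OS message is one way to see whether the errors are confined to a subset of hosts.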

      If only a subset of hosts in the cluster are generating these types of errors ...

      You should compare the problem host's NFS configuration with rest of the hosts in the cluster to make sure all of the configurations are consistent.

      • Compare nfs versions (rpm -qa | grep -i nfs)
      • Compare nfs configurations (mount -l -t nfs, cat /etc/mtab, nfsstat)
      • Compare platform version (uname -mrs, lsb_release -a) 

      NFS mount options 

      MarkLogic recommends the NFS Mount settings - 'rw,bg,hard,nointr,noac,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0'

      • Vers=3 :  Must have NFS client version v3 or above
      • TCP : NFS must be configured to use TCP instead of default UDP
      • NOAC : To improve performance, NFS clients cache file attributes. Every few seconds, an NFS client checks the server's version of each file's attributes for updates. Changes that occur on the server in those small intervals remain undetected until the client checks the server again. The noac option prevents clients from caching file attributes so that applications can more quickly detect file changes on the server.
        • In addition to preventing the client from caching file attributes, the noac option forces application writes to become synchronous so that local changes to a file become visible on the server immediately. That way, other clients can quickly detect recent writes when they check the file's attributes.
        • Using the noac option provides greater cache coherence among NFS clients accessing the same files, but it exacts a significant performance penalty. As such, judicious use of file locking is encouraged instead. The DATA AND METADATA COHERENCE section of the nfs(5) man page contains a detailed discussion of these trade-offs.
        • NOTE: The noac option is a combination of the generic option sync, and the NFS-specific option actimeo=0.
      • ACTIMEO=0 : Using actimeo sets all of acregmin, acregmax, acdirmin, and acdirmax to the same "0" value. If this option is not specified, the NFS client uses the defaults for each of these options listed above.
      • NOINTR : Selects whether to allow signals to interrupt file operations on this mount point. If neither option is specified (or if nointr is specified), signals do not interrupt NFS file operations. If intr is specified, system calls return EINTR if an in-progress NFS operation is interrupted by a signal.
        • Using the intr option is preferred to using the soft option because it is significantly less likely to result in data corruption.
        • The intr / nointr mount option is deprecated after kernel 2.6.25. Only SIGKILL can interrupt a pending NFS operation on these kernels, and if specified, this mount option is ignored to provide backwards compatibility with older kernels.
      • BG : If the bg option is specified, a timeout or failure causes the mount command to fork a child which continues to attempt to mount the export. The parent immediately returns with a zero exit code. This is known as a "background" mount.
      • HARD (vs soft) : Determines the recovery behavior of the NFS client after an NFS request times out. If neither option is specified (or if the hard option is specified), NFS requests are retried indefinitely. If the soft option is specified, then the NFS client fails an NFS request after retrans retransmissions have been sent, causing the NFS client to return an error to the calling application.
        • Note: A so-called "soft" timeout can cause silent data corruption in certain cases. As such, use the soft option only when client responsiveness is more important than data integrity. Using NFS over TCP or increasing the value of the retrans option may mitigate some of the risks of using the soft option. 
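To compare a host's mount options against the recommended string, a small helper such as the one below can be used. This is a sketch under stated assumptions: it compares option strings as they are written in /etc/fstab or on the mount command line, and it does not account for the way the kernel rewrites options in /proc/mounts (where some options are renamed or omitted). The function names are hypothetical.

```python
RECOMMENDED = ("rw,bg,hard,nointr,noac,tcp,vers=3,timeo=300,"
               "rsize=32768,wsize=32768,actimeo=0")


def parse_options(option_string):
    """Turn 'a,b=1' into {'a': True, 'b': '1'}."""
    options = {}
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        options[key] = value or True
    return options


def diff_options(actual, recommended=RECOMMENDED):
    """Map each recommended option that differs to (actual value, recommended value)."""
    want = parse_options(recommended)
    have = parse_options(actual)
    return {key: (have.get(key), value)
            for key, value in want.items()
            if have.get(key) != value}


# Hypothetical options taken from a problem host.
problem_host = "rw,bg,soft,tcp,vers=3,timeo=600,rsize=32768,wsize=32768"
for option, (found, expected) in diff_options(problem_host).items():
    print(option, "found:", found, "expected:", expected)
```

Note that the sketch reports only missing or differing recommended options; an extra option such as soft is not flagged by name, although the missing hard option is.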

      Issue persists => Further debugging 

      If the issue persists after checking the NFS configuration and implementing the MarkLogic recommended NFS mount settings, you will need to debug the NFS connection during an issue period. You should enable rpcdebug for NFS on the hosts showing the NFS errors, and then analyze the resulting syslogs during a period that is experiencing the issues.

              rpcdebug -m nfs -s all

      The resulting logs may give you additional information to help understand what the source of the failures is.



      It has long been possible to store binary files in MarkLogic. In the MarkLogic 5 release in 2011, binary support was enhanced to allow for even more control over binary files.

      The purpose of this Knowledgebase article is not to cover MarkLogic's binary support in depth but to demonstrate a technique for retrieving a list of URIs for binary files which are managed in a MarkLogic Database.

      Retrieving a list of binary document URIs from MarkLogic Server

      The following code will use a call to cts:uris to get back a list of all URIs pointing to binary documents for a given MarkLogic database; note that this example assumes that you have the uri lexicon enabled in your database:

      Further reading

      People often want fine-grained entitlement control in the applications they build on top of MarkLogic Server. This article discusses two options and their performance implications.

      Best Practice

      Often, we'll see people attempt an implementation using MarkLogic users and roles. While MarkLogic Server can easily handle a large number of roles in total, you'll run into scalability and performance issues if you have a large number of roles per user. Additionally, you'll want to minimize the number of updates to documents in your Security database as every update requires Security caches to be re-validated, thus incurring a performance penalty.

      Instead, for a more scalable and performant solution, you will want to build your entitlements into your documents at the application level, then query those entitlement values with element range indexes on the elements containing those entitlement values.


      When attempting to start MarkLogic Server on older versions of Linux (unsupported platforms), a "Floating Point Exception" may prevent the server from starting.

      Example of the error text from system messages:

      kernel: MarkLogic[29472] trap divide error rip:2ae0d9eaa80f rsp:7fffd8ae7690 error:0


      Older Linux distributions will, by default, use older libraries. When a software product such as MarkLogic Server is built using a newer version of gcc, it is possible that it will fail to execute correctly on older systems. We have seen this in cases where the glibc library is out of date and does not contain certain symbols that were added in newer versions. Refer to the Red Hat bug that explains this issue:

      The recommended solution is to upgrade to a newer version of your Linux distribution.  While you may be able to resolve the immediate issue by only upgrading the glibc library, it is not recommended.


      Attached to this article is an XQuery module: "appserver-status.xqy", which will generate a report on all requests currently "in-flight" across all application servers in your cluster.


      Run this in Query Console (be sure to display results as HTML output). It will generate an HTML table showing all requests currently "in-flight" across all application servers in your cluster. For any transaction taking over 60 seconds, it provides extra detail to help understand and identify bottlenecks where specific modules (or tasks) may be having an adverse effect on the overall performance of the cluster.

      The information generated by this module can be used in conjunction with any ticket opened with the support team where assistance is required to better understand and resolve performance issues relating to specific modules. This module could also be used in a situation where DBAs want to perform routine health checks on their cluster to find and identify slow running queries.


      At the time of this writing (MarkLogic 9), MarkLogic Server cannot perform spherical queries, as the geospatial indexes do not support a true 3D coordinate system.  In situations where cylindrical queries are sufficient, you can create a 2D geospatial index and a separate range index on an altitude value. An "and-query" with these indexes would result in a cylindrical query.


      Consider the following sample document structure:

      Configure these 2 indexes for your content database:

      1. A geospatial element pair index with the latitude localname set to 'lat', the longitude localname set to 'long', and the parent localname set to 'location'
      2. An element range index with the localname 'alt' and scalar type int

      Assuming you have data in your content database matching above document structure, this query:

      will return all documents whose location points fall within the cylinder centered at 37.655983, -122.425525, with a radius of 1000 miles and an altitude of less than 4 miles.
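The semantics of that combined query can be sketched outside the server. The Python below is not MarkLogic code: it assumes a haversine great-circle distance with a mean Earth radius of 3958.8 miles (the server's own distance calculation may differ slightly) and an altitude expressed in the same unit as the ceiling. The function names are hypothetical. A point is "in the cylinder" when it passes both the 2D circle test and the altitude test, which is what the and-query over the two indexes expresses.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance between two lat/long points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def in_cylinder(lat, lon, alt,
                center=(37.655983, -122.425525),
                radius_miles=1000.0, max_alt=4.0):
    """True when the point is inside the circle AND below the altitude ceiling."""
    inside_circle = great_circle_miles(lat, lon, center[0], center[1]) <= radius_miles
    below_ceiling = alt < max_alt
    return inside_circle and below_ceiling


print(in_cylinder(34.05, -118.24, 1))    # near Los Angeles, low altitude
print(in_cylinder(34.05, -118.24, 10))   # same place, above the ceiling
print(in_cylinder(40.7128, -74.0060, 1)) # New York, outside the circle
```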

      Note that in MarkLogic Server 9 geospatial region match was introduced, so the above technique can be extended beyond cylinders.


      The MarkLogic Monitoring History dashboard (http://localhost:8002/history/) is probably the easiest way to gather monitoring history data, but almost all of the information available within the monitoring dashboard is also available over our REST APIs:

      Application Server Status details

      Here's an example for getting detailed metrics on Application Servers - http://localhost:8002/manage/v2/servers?group-id=Default&view=metrics&format=xml

      Here's an example for getting detailed Application Server status information - http://localhost:8002/manage/v2/servers?view=status&group-id=Default&format=xml&fullrefs=true

      To access status information for a specific Application Server (for example, the TaskServer), you can get the current status by adding the name to the URI - http://localhost:8002/manage/v2/servers/TaskServer?group-id=Default&view=status&format=xml

      You can also get the configuration information for a given application server (for example: "Admin") over the ReST API - http://localhost:8002/manage/v2/servers/Admin/properties?group-id=Default&format=xml
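The example URLs above all follow the same pattern, which can be captured in a small helper. This is a sketch only: `manage_url` is a hypothetical function, the host and port default to the values used in the examples, and it only builds the URL. Sending the request would additionally need credentials for a user allowed to call the Management API, which is left out here.

```python
from urllib.parse import quote, urlencode


def manage_url(resource, name=None, suffix=None, params=None,
               host="localhost", port=8002):
    """Build a /manage/v2 URL such as the ones shown above."""
    path = "/manage/v2/" + resource
    if name:
        path += "/" + quote(name, safe="")
    if suffix:
        path += "/" + suffix
    query = urlencode(params or {})
    return "http://%s:%d%s%s" % (host, port, path, "?" + query if query else "")


# Status of one application server.
print(manage_url("servers", name="TaskServer",
                 params={"group-id": "Default", "view": "status", "format": "xml"}))

# Configuration properties of one application server.
print(manage_url("servers", name="Admin", suffix="properties",
                 params={"group-id": "Default", "format": "xml"}))
```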

      Database and Forest status details

      For databases and forests, you can similarly use the endpoints for /databases or /forests:

      Database level examples include:

      Forest level examples include:

      MarkLogic default Group Level Cache and Huge Pages settings

      The table below shows the default (and recommended) group level cache settings based on a few common RAM configurations for the 9.0-9.1 release of MarkLogic Server:

      Total RAM | List Cache | Compressed Tree Cache | Expanded Tree Cache | Triple Cache | Triple Value Cache | Default Huge Page Ranges
      8192 (8GB) | 1024 (1 partition) | 512 (1 partition) | 1024 (1 partition) | 512 (1 partition) | 1024 (2 partitions) | 1280 to 1994
      16384 (16GB) | 2048 (1 partition) | 1024 (2 partitions) | 2048 (1 partition) | 1024 (2 partitions) | 2048 (2 partitions) | 2560 to 3616
      24576 (24GB) | 3072 (1 partition) | 1536 (2 partitions) | 3072 (1 partition) | 1536 (2 partitions) | 3072 (4 partitions) | 3840 to 4896
      32768 (32GB) | 4096 (2 partitions) | 2048 (3 partitions) | 4096 (2 partitions) | 2048 (3 partitions) | 4096 (6 partitions) | 5120 to 6176
      49152 (48GB) | 6144 (2 partitions) | 3072 (4 partitions) | 6144 (2 partitions) | 3072 (4 partitions) | 6144 (8 partitions) | 7680 to 8736
      65536 (64GB) | 8064 (3 partitions) | 4032 (6 partitions) | 8064 (3 partitions) | 4096 (6 partitions) | 8192 (11 partitions) | 10080 to 11136
      98304 (96GB) | 12160 (4 partitions) | 6080 (8 partitions) | 12160 (4 partitions) | 6144 (8 partitions) | 12160 (16 partitions) | 15200 to 16256
      131072 (128GB) | 16384 (6 partitions) | 8192 (11 partitions) | 16384 (6 partitions) | 8192 (11 partitions) | 16384 (22 partitions) | 20480 to 21020
      147456 (144GB) | 18432 (6 partitions) | 9216 (12 partitions) | 18432 (6 partitions) | 9216 (12 partitions) | 18432 (24 partitions) | 23040 to 24096
      262144 (256GB) | 32768 (9 partitions) | 16384 (11 partitions) | 32768 (9 partitions) | 16128 (22 partitions) | 32256 (32 partitions) | 40320 to 42432

      Note that these values are safe to use for MarkLogic 7 and above.

      For all the databases that ship with MarkLogic Server, the Huge Pages ranges on this table will cover the out-of-the box configuration. Note that adding more forests will cause the second value in the range to increase.

      From MarkLogic Server 9.0-7 and above

      In the 9.0-7 release and above (and all versions of MarkLogic 10), automatic cache sizing was introduced; this setting is usually recommended.

      Note: For RAM size greater than 256GB, group cache settings are configured the same as for 256GB with automatic cache sizing. These can be changed using manual cache sizing.

      Maximum group level cache settings

      Assuming a Server configured with 256GB RAM (and above), these are the maximum sizes for the three main group level caches and will utilise 180GB (184320MB) per host for the Group Level Caches:

      • Expanded Tree Cache - 73728 (72GB) (with 9 8GB partitions)
      • List Cache - 73728 (72GB) (with 9 8GB partitions)
      • Compressed Tree Cache - 36864 (36GB) (with 11 3 GB partitions)

      We have found that configuring 4GB partitions for the Expanded Tree Cache and the List Cache generally works well in most cases; for this you would set the number of partitions to 18.

      For the Compressed Tree Cache the number of partitions can be set to 22.

      Important note

      The maximum number of configurable partitions is 32

      Each cache partition should be no more than 8192 MB
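The partition figures above follow from simple division, and the two limits can be checked the same way. The sketch below uses only the numbers stated in this article (sizes in MB, at most 32 partitions, at most 8192 MB per partition); the helper name is hypothetical.

```python
MAX_PARTITIONS = 32        # maximum number of configurable partitions
MAX_PARTITION_MB = 8192    # each partition should be no more than 8192 MB


def partition_size_mb(cache_mb, partitions):
    """Return the size of each partition, or raise if a limit is exceeded."""
    if not 1 <= partitions <= MAX_PARTITIONS:
        raise ValueError("partitions must be between 1 and %d" % MAX_PARTITIONS)
    size = cache_mb / partitions
    if size > MAX_PARTITION_MB:
        raise ValueError("partition of %.0f MB exceeds %d MB" % (size, MAX_PARTITION_MB))
    return size


print(partition_size_mb(73728, 9))             # 8192.0 (9 partitions of 8GB)
print(partition_size_mb(73728, 18))            # 4096.0 (18 partitions of 4GB)
print(round(partition_size_mb(36864, 22), 1))  # 1675.6 (Compressed Tree Cache, 22 partitions)
print(73728 + 73728 + 36864)                   # 184320 MB in total, i.e. 180GB
```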


      MarkLogic Server has a notion of groups, which are sets of similarly configured hosts within a cluster.

      Application servers (and their respective ports) are scoped to their parent group.

      Therefore, you need to make sure that the host and its exposed port to which you're trying to connect both exist in the group where the appropriate application server is defined. For example, if you attempt to connect to a host defined in a group made up of d-nodes, you'll only see application servers and ports defined in the d-nodes group. If the application server you actually want is in a different group (say, e-nodes), you'll get a connection error, instead.


      Can I use any xdmp builtins to show which application servers are linked to particular groups?

      The code example below should help with this:


      The errors 'XDMP-MODNOTFOUND - Module not found' and 'XDMP-NOPROGRAM - Server unable to build program from request' may occur when the requested module does not exist or the user does not have the right permissions on the module.


      When either of these errors is encountered, the first step is to check whether the requested XQuery/JS module is actually present in the modules database. Make sure the document URI matches the 'root' of the relevant app-server.

      The 'Modules' field of the app-server configuration specifies the name of the database in which this app-server locates the application code (if it is not set to 'File-system'). When it is set to a specific database, only documents in that database whose URIs begin with the specified root directory are executable. For example, if the 'root' of the app-server is set to "/codebase/xquery/", then only documents in the database whose URIs start with "/codebase/xquery/" are executable.

      If set to 'File-system' make sure the requested module exists in the location specified in the 'root' directory of the app-server. 

      Defining a 'File-system' location is often used on single node DEV systems but is not recommended in a clustered environment. To keep the deployment of code simple, it is recommended to use a Modules database in clustered production systems.

      Once you have made sure that the module does exist, the next step is to check whether the user has the right permissions to execute the module. More often than not, the error is caused by a permissions issue.

      (i) Check app-server privileges

      The 'privilege' field in the app-server configuration, when set, specifies the execute privilege required to access the server. Only users who are assigned this privilege can access the server and the application code. Absence of this privilege may cause the XDMP-NOPROGRAM error.

      Make sure the user accessing the app-server has the specified privileges. This can be checked by using sec:user-privileges() (Should be run against the Security database).

      The documentation contains more detailed information about privileges.

      (ii) Check permission on the requested module

      The user trying to access the application code/modules is required to have the 'execute' permission on the module. Make sure all the xquery documents have 'read' and 'execute' permissions for the user trying to access them. This can be verified by executing the following query against your 'modules' database:


      This returns a list of permission on the document - with the capability that each role has, in the below format:

                    <sec:permission xmlns:sec="">
                    <sec:permission xmlns:sec="">

      You can then map the role-ids to their role names as below: (this should be done against the Security database)

                    import module namespace sec="" at "/MarkLogic/security.xqy";

      If you see that the module does not have execute permission for the user, the required permissions can be added as below:


                   xdmp:permission("role-name", "execute")))








      Recent exploits in the TLS protocol such as POODLE, FREAK, LogJam, and SLOTH have rendered TLSv1.0 and SSLv3 largely obsolete.  Additionally, standards councils such as PCI (Payment Card Industry) and NIST (National Institute of Standards & Technology) are moving to disallow the use of these protocols.

      This article will describe the MarkLogic configuration changes needed to harden a MarkLogic HTTP Application Server so that only secure versions of TLS are used and where clients attempting to connect with TLSv1.0 or earlier protocols are rejected.

      Note: Since this article was first written MarkLogic server has added an administrator function to disable individual SSL and TLS protocol versions. If you are still running MarkLogic version 8.0-5 or earlier you can continue to use the solution outlined below, otherwise, users of MarkLogic 9 or later should use the new AppServer Set SSL Disabled Protocols function to control which SSL and TLS protocol versions are available.


      The TLS protocol versions accepted and the Cipher suites selected are controlled by the specification list set in the "SSL Ciphers" field on the HTTP App Server Configuration panel:

      The format of the specification list follows the OpenSSL format as described in the OpenSSL Cipher suite documentation and comprises one or more colon ":" separated cipher strings which control which cipher suites are enabled or disabled.

      The default specification used by MarkLogic enables ALL ciphers except those that are considered of LOW encryption and places them in order of @STRENGTH 
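To make the colon-separated format easier to read, the sketch below splits a specification list into its parts and labels each one according to the prefix rules in the OpenSSL cipher list documentation ("!" permanently excludes, "-" removes, "+" moves to the end of the list, "@" introduces a directive such as @STRENGTH). The helper name is hypothetical, and the example string is an assumption assembled from the description above rather than copied from a MarkLogic configuration.

```python
def describe_cipher_spec(spec):
    """Label each colon-separated cipher string by what its prefix means."""
    described = []
    for token in spec.split(":"):
        if token.startswith("!"):
            described.append(("exclude permanently", token[1:]))
        elif token.startswith("-"):
            described.append(("remove", token[1:]))
        elif token.startswith("+"):
            described.append(("move to end", token[1:]))
        elif token.startswith("@"):
            described.append(("directive", token[1:]))
        else:
            described.append(("add", token))
    return described


# Assumed example: all ciphers, minus the LOW group, sorted by strength.
for action, name in describe_cipher_spec("ALL:!LOW:@STRENGTH"):
    print(action, name)
```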


      While sufficient for many needs, the default settings still allow negotiation of ciphers that are no longer considered secure, as well as weak signature algorithms such as MD2 and MD5. The following cipher specification string enhances security by only permitting AES and Triple DES (3DES) ciphers while at the same time disabling MD2 and MD5 signature algorithms.


      PCI DSS 3.2 & NIST SP 800-52 compliance

      At this stage, while the MarkLogic HTTP Application Server is now using stronger security, it will still permit a client to connect using TLSv1.0. In order to comply with PCI DSS 3.2, sites must stop using TLSv1.0 by 30th June 2018, while NIST SP 800-52 requires that sites use TLSv1.1 at a minimum, with a recommendation to use TLSv1.2 where possible.

      TLSv1.2 and browser support

      For TLSv1.2, older browsers should be upgraded to current versions.

      Making these changes may require users accessing your application to upgrade older browsers such as Firefox < 27.0 or Internet Explorer < 11.0 as these versions do not support TLSv1.2 by default.

      The MarkLogic App Server utilizes OpenSSL, which does not explicitly support enabling or disabling a specific TLS protocol version; however, by disabling all the cipher suites associated with a particular version you effectively get the same outcome.

      SSLv3, TLSv1.0 & TLSv1.1 share the same common ciphers, so adding "!SSLv3" to the cipher specification will cause all client connection attempts using any of these protocols to fail.


      Testing using the OpenSSL s_client utility shows that attempts to connect using TLSv1.0 fail with SSL alert 40 indicating no common cipher was available.

      openssl s_client -connect -debug -tls1
      140735283961936:error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:s3_pkt.c:1472:SSL alert number 40
      140735283961936:error:1409E0E5:SSL routines:ssl3_write_bytes:ssl handshake failure:s3_pkt.c:656:

      While connecting using TLSv1.2 is successful.

      openssl s_client -connect -debug -tls1_2
      New, TLSv1/SSLv3, Cipher is AES256-GCM-SHA384
      Server public key is 2048 bit
      Secure Renegotiation IS supported
      Compression: NONE
      Expansion: NONE
      No ALPN negotiated
      Protocol : TLSv1.2
      Cipher : AES256-GCM-SHA384

      Further reading

      On MarkLogic Security Certification

      How does MarkLogic Server's high-availability work in AWS?

      AWS provides fault tolerance within a geographic region through the use of Availability Zones (AZs) while MarkLogic gives that ability through Local Disk Failover (LDF). If you're using AWS, the best practice is to place each MarkLogic node/EC2 instance in a different Availability Zone within a single region, where a given data forest is in one AZ (AZ A), while its LDF forest is in a different AZ (AZ B). This way, in the event that access to Availability Zone A is lost, the forests on the host in Availability Zone A will fail over to their LDF forests on the host in Availability Zone B, thereby ensuring high-availability within your MarkLogic cluster.

      Should failover be configured for the Security forest?

      A cluster is not functional without its Security database. Consequently, it’s important to ensure high-availability of the Security database’s forest by configuring failover for that forest.

      Should my forests have more than one Local Disk Failover forest?

      High availability through Local Disk Failover with one LDF forest is designed to allow the cluster to survive the failure of a single host. If you're using AWS, careful forest placement across AWS availability zones can provide high availability even if an entire availability zone goes down. With rare exceptions, additional LDF forests are not worth the added complexity and cost.

      If you configure Local Disk Failover with one LDF forest, coupled with Database Replication and backups, you will have enough copies of your data to survive failures ranging from a single host to an entire availability zone.

      Do I still have high-availability post failover? What happens to the data forest? How can I fail back my forests to the way they were?

      When a failover event occurs, the LDF forest takes over as the acting data forest, and the configured data forest assumes the role of acting LDF forest as soon as it is successfully restarted. At this point, as long as both forests are still available, the cluster remains highly available, but with the forests in the reverse of their originally intended roles. To fail the forests back to their intended roles, wait until the acting data forest (the intended LDF) and the acting LDF (the intended data forest) are synchronized, then manually restart the acting data forest/intended LDF. At that point, the acting LDF/intended data forest "fails back" to its original role of acting data forest, and the acting data forest/intended LDF once again assumes its original role of acting LDF. In short, failover is automatic, but failing back requires a manual restart of the acting data forest/intended LDF. When failing back, it is very important to wait until the forests are synchronized: if you fail back before then, you will lose any data in the acting data forest that has yet to be propagated back to the acting LDF/intended data forest.
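The role swap described above can be pictured as a small state machine. This is only an illustrative sketch: the ForestPair class and its field names are hypothetical and do not correspond to any MarkLogic API. It encodes the two rules from the text, namely that failover happens automatically and that failing back is only safe once the forests are synchronized.

```python
from dataclasses import dataclass


@dataclass
class ForestPair:
    """A configured data forest and its configured LDF forest (toy model)."""
    acting_data: str = "data"   # configured forest currently acting as data forest
    acting_ldf: str = "ldf"     # configured forest currently acting as LDF
    synchronized: bool = True   # True once the acting LDF has caught up

    def fail_over(self) -> None:
        """Automatic: the acting LDF takes over and the roles swap."""
        self.acting_data, self.acting_ldf = self.acting_ldf, self.acting_data
        # The restarted forest has to catch up before it is in sync again.
        self.synchronized = False

    def fail_back(self) -> None:
        """Manual: restart the acting data forest, but only once in sync."""
        if self.acting_data == "data":
            raise RuntimeError("forests are already in their configured roles")
        if not self.synchronized:
            raise RuntimeError("not synchronized; failing back now would lose updates")
        self.acting_data, self.acting_ldf = self.acting_ldf, self.acting_data
```

In this model, calling fail_back() straight after fail_over() raises an error until synchronized is set back to True, which mirrors the advice to wait before restarting the acting data forest.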

      Where does the hostname come from?

      • If there is a MARKLOGIC_HOSTNAME environment variable, it is used as the hostname
      • If no such environment variable is configured, the gethostname() library function is called (rather than the gethostname() system call, since we use the GNU C Library - see notes here for more info), which internally calls the uname() function
        • The uname() function returns the nodename, which may or may not contain a '.' (you can see the same value by running the uname --nodename command in a terminal)
          • If the nodename contains a '.', we consider it a complete name and use it as the hostname
          • Otherwise, we look in resolv.conf for a domain or search entry, take the first entry, and append it to the nodename after a '.' to form the complete hostname
            • E.g.: <uname_output>.<resolv.conf_entry>
            • Note: the resolv.conf file can contain both a domain and a search entry; the domain entry usually takes priority over search
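The lookup order above can be summarized in a few lines of code. The sketch below is a simplified model for reasoning about mismatches, not MarkLogic's actual implementation: the function name resolve_hostname is hypothetical, the environment, nodename, and resolv.conf contents are passed in as plain values, and returning the bare nodename when resolv.conf has no usable entry is an assumption.

```python
def resolve_hostname(env: dict, nodename: str, resolv_conf: str) -> str:
    """Mirror the lookup order described above (illustrative only)."""
    # 1. An explicit MARKLOGIC_HOSTNAME always wins.
    if env.get("MARKLOGIC_HOSTNAME"):
        return env["MARKLOGIC_HOSTNAME"]
    # 2. A nodename containing '.' is treated as a complete name.
    if "." in nodename:
        return nodename
    # 3. Otherwise append the first domain (preferred) or search entry.
    domain = search = None
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "domain" and domain is None:
            domain = fields[1]
        elif fields[0] == "search" and search is None:
            search = fields[1]
    suffix = domain or search
    # Assumption: with no usable entry, fall back to the bare nodename.
    return f"{nodename}.{suffix}" if suffix else nodename
```

For example, a nodename of node1 with a resolv.conf containing "search a.example b.example" would yield node1.a.example in this model.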


      If you experience a hostname mismatch or any hostname issue in general, you can check the following:

      • The following commands/functions are different ways to return the hostname (and you can verify if there is a mismatch)
        • Functions:
        • Commands:
          • hostname
          • hostname -f  (returns the FQDN, including the '.')
          • hostname -d  (returns the DNS domain name)
      • Check the resolv.conf file (under /etc) to see if it contains the right hostname
        • If it does and the issue still persists, restarting MarkLogic Server may help: if MarkLogic is getting the hostname from this file, it does so at startup

      Note: If you open a support ticket about a hostname issue, attaching the above information (the outputs of the above commands/functions and the contents of the resolv.conf file) will help speed up the investigation

      Possible issues with hostname mismatch:

      Introduction: getting more information about the bugs fixed between releases

      As a general recommendation, we encourage customers to keep the server up to date with patch releases.

      If you would like a list of some of the published bugs that were addressed between two releases of the server (for example: 5.0-3 and 5.0-4.1), you can perform the following steps:

      - Log into the support portal at
      - Click on the "Fixed bugs" icon to take you to the bugtrack list
      - Select 5.0-3 in the From: dropdown box
      - Select 5.0-4.1 in the To: dropdown box
      - Click 'Show' to generate an HTML table or View PDF to export the results in a PDF document

      Step one: login

      Provide your credentials and use the form on the left-hand side to log in to access the support portal

      Log into the support portal

      Step two: select the "Fixed bugs" link from the icons on the page

      Select 'Fixed Bugs' to go to the bugtrack list

      Step three: select the release 'range' from the two dropdown lists on the Fixed Bugs page

      Use the Show button to update the page or download the list in PDF format as required

      Select the versions from the 'From' and 'To' lists to generate the report


      In Amazon Web Services, AMIs have unique ids based on their region. There will be many cases when you want to use multiple regions (for example: maintenance of two clusters in separate geographical regions). Below is an example of how to find the list of current AMIs.

      Log in to Amazon Web Services

      Example image showing the AWS Login Page

      Find your MarkLogic instance on Amazon AWS Marketplace

      Example image showing the MarkLogic 8 HVM in Amazon's Marketplace

      For example:

      Click continue

      Example Continue button

      View the table

      Choose the version of MarkLogic Server that you're planning to use from the version dropdown.

      Image of a table showing all AMI IDs available for this item in the AWS Marketplace

      You will see a table containing a list of all current regions and the corresponding AMI ID for our instances for each available region.


      MarkLogic Server has several different features that can help manage data across multiple database instances. Those features differ from each other in several important ways - this article will focus on high-level distinctions and will provide pointers to other materials to help you decide which of these features could work best for your particular use case.


      Backup/Restore - database backup and restore operations in MarkLogic Server provide consistent database-level views of your data. Propagating data from one instance to another via backup/restore involves a MarkLogic administrator using a completed backup from the source instance as the restore archive on the destination instance. You can read more about Backup/Restore here:

      Flexible Replication - can be used to maintain copies of data on multiple MarkLogic Servers. Unlike backup/restore (which relies on taking a consistent, database-level view of the data at a particular timestamp), Flexible Replication creates a copy of a document in another database and keeps that copy in sync (possibly with some time-lag/latency) with the original in the course of normal operations. You can read more about Flexible Replication here: Note that:

      • Flexible Replication is asynchronous. Asynchronous Replication refers to a configuration in which the Master does not wait for confirmation that the update has been received by the Replica before sending further updates.
      • Flexible Replication does not use the same transaction boundaries on the replica as on the master. For example, 10 documents might be inserted in a single transaction on a Flexible Replication master. Those 10 documents will eventually be inserted on a Flexible Replication replica, but there is no guarantee that the replica instance will also use a single transaction to do so.

      Database Replication - is used to maintain copies of data on multiple MarkLogic Servers. Database Replication creates a copy of a document in another database and keeps that copy in sync (possibly with some time-lag/latency) with the original in the course of normal operations. You can read more about Database Replication here: Note that:

      a. Database Replication is, like Flexible Replication, asynchronous.

      b. In contrast to Flexible Replication, Database Replication operates by copying journal frames from the Master database and replaying the transactions described by those journal frames on the foreign Replica database.

      XA Transactions - MarkLogic Server can participate in distributed transactions by acting as a Resource Manager in an XA/JTA transaction. If there are multiple MarkLogic Server instances participating as XA resources in a given XA transaction, then it's possible to use that XA transaction as a synchronized means of replicating data across those multiple MarkLogic instances. You can read more about XA Transactions in MarkLogic Server here:


      Upgrading individual MarkLogic instances and clusters is generally very easy to do and in most cases requires very little downtime. In most cases, shutting down the MarkLogic instance on each host in turn, uninstalling the current release, installing the updated release and restarting each MarkLogic instance should be all you need to be concerned about...

      However, unanticipated problems do sometimes come to light and the purpose of this Knowledgebase article is to offer some practical advice as to the steps you can take to ensure the process goes as easily as possible - this is particularly important if you're planning an upgrade between major releases of the product.


      While the steps outlined under the process heading below offer practical advice as to what to do to ensure your data is safeguarded (by recommending that backups are taken prior to upgrading), another very useful step would be to ensure you have your current configuration files backed up.

      Each host in a MarkLogic cluster is configured using parameters which are stored in XML Documents that are available on each host. These are usually relatively small files and will zip up to a manageable size.

      If you cd to your "Data" directory (on Linux this is /var/opt/MarkLogic; on Windows this is C:\Program Files\MarkLogic\Data and on OS X this is /Users/{username}/Library/Application Support/MarkLogic), you should see several xml files (assignments, clusters, databases, groups, hosts, server).

      Whenever MarkLogic updates any of these files, it creates a backup using the same naming convention used for older ErrorLog files (_1, _2 etc). We recommend backing up all configuration files before following the steps under the next heading.


      1) Take a backup for each database in your cluster

      2) Turn reindexing off for each database in your cluster

      3) Starting with the node hosting your Security and Schemas forests, uninstall the current MarkLogic maintenance release from each host in your cluster, then install the latest maintenance release of that feature release (for example, if you're currently running version 10.0-2, you'll want to update to the latest available MarkLogic 10 maintenance release - at the time of this writing, it is 10.0-4).

      4) Start up the host in your cluster hosting your Security and Schemas forests, then the remaining hosts in the cluster.

      5) Access the Admin UI on the node hosting your Security and Schemas forests and accept the license agreement, either for just that host (Accept button) or for all of the hosts in the cluster (Accept for Cluster button). If you choose the Accept for Cluster button, a summary screen appears showing all of the hosts in the cluster. Click the Accept for Cluster button to confirm acceptance (all of the hosts must be started in order to accept for the cluster). If you accepted the license for just the one host, you must go to the Admin Interface of each of the other hosts and accept the license there before that host can operate.

      6) If you're upgrading across feature releases, you may now repeat steps #3-5 until you reach the desired feature and maintenance release on your cluster (for example, if trying to upgrade from MarkLogic 8 to MarkLogic 10,  after installing 8.0-latest, you'll repeat steps 3-5 for version 9.0-latest).

      7) After you've finished upgrading across all the relevant feature releases, re-enable reindexing for each database in your cluster.
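The stepping pattern in steps 3 through 6 (latest maintenance release of the current feature release first, then one feature release at a time) can be written down as a small planning helper. This is a sketch only: upgrade_path is a hypothetical name, and the "N.0-latest" labels are placeholders for whatever the latest maintenance release of each feature release actually is when you upgrade.

```python
from typing import List


def upgrade_path(current_major: int, target_major: int) -> List[str]:
    """List the releases to install, in order, one feature release at a time."""
    if target_major < current_major:
        raise ValueError("downgrades are not covered by this procedure")
    # Step 3: first move to the latest maintenance release of the current
    # feature release; step 6: then repeat for each later feature release.
    return [f"{major}.0-latest" for major in range(current_major, target_major + 1)]


# Upgrading from MarkLogic 8 to MarkLogic 10, as in the example in step 6.
print(upgrade_path(8, 10))
```

Each entry in the returned list corresponds to one pass through steps 3 to 5; reindexing is re-enabled only after the last entry has been installed.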

      For more details, see the section "Upgrading a Cluster to a New Maintenance Release of MarkLogic Server" in the "Scalability, Availability, and Failover" guide.

      If you've got database replication in place across both a master and replica cluster, then be aware that:

      1) You do not need to break replication between the clusters

      2) You should plan to upgrade both the master cluster and replica cluster. If you upgrade just the master, connectivity between the two clusters will stop due to different XDQP versions. 

      3) If the Security database isn't replicated, then there shouldn't be anything special you need to do other than upgrade the two clusters.

      4) If the Security database is replicated, do the following:

      • Upgrade the Replica cluster and run the upgrade scripts. This will update the Replica's Security database to indicate that it is current. It will also do any necessary configuration upgrades.
      • Upgrade the Master cluster and run the upgrade scripts. This will update the Master's Security database to indicate that it is current. It will also do any necessary configuration upgrades.

      For more information, see "Updating Clusters Configured with Database Replication".

      Back-out Plan

      MarkLogic does not support restoring a backup made on a newer version of MarkLogic Server onto an older version of MarkLogic Server. Your Back-out plan will need to take this into consideration.

      See the section below for recommendations on how this should be handled.

      Further reading

      Backing out of your upgrade: steps to ensure you can downgrade in an emergency

      Product release notes

      The "Upgrade Support" section of the release notes.

      All known incompatibilities between releases

      The "Upgrading from previous releases" section of the documentation

      MarkLogic Support Fixed Bug List


      spell:suggest() and spell:suggest-detailed() aren't simply looking for character differences between the provided strings and the strings in your dictionaries - they also factor in differences in the phonetics represented by these strings.


      There is an undocumented option that can be passed along to increase the phonetic-distance threshold (which is 1, by default). For example, consider the following:

      xquery version "1.0-ml";

      spell:suggest-detailed(
        ('customDictionary.xml'),
        'acknowledgment',
        <options xmlns="">
          <phonetic-distance>2</phonetic-distance>
        </options>
      )


      <spell:suggestion original="acknowledgment" xmlns:spell="">
        <spell:word distance="9" key-distance="2" word-distance="45" levenshtein-distance="1">acknowledgement</spell:word>
      </spell:suggestion>

      Note that the option "distance-threshold" corresponds to "distance" in the result, and "phonetic-distance" corresponds to "key-distance."

      Also note that increasing the phonetic-distance may cause spell:suggest() and spell:suggest-detailed() to use significantly more CPU. Metaphones are short keys, so a larger distance may match a very large fraction of the dictionary, which would then mean each of those matches would need to be checked in the distance algorithms.
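To make the levenshtein-distance attribute in the result above more concrete, here is the textbook dynamic-programming edit distance. It is a generic sketch, not MarkLogic's implementation, and it does not model the phonetic (metaphone key) distance at all; the function name levenshtein is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance counting inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


# One inserted "e" separates the two spellings.
print(levenshtein("acknowledgment", "acknowledgement"))
```

The two spellings differ by a single inserted letter, which agrees with levenshtein-distance="1" in the sample result.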


      A database consists of one or more forests. A forest is a collection of documents (mostly XML trees, thus the name), implemented as a physical directory on disk. Each forest holds a set of documents and all their indexes. 

      When a new document is loaded into MarkLogic Server, the server puts this document in an in-memory stand and writes the action to an on-disk journal to maintain transactional integrity in case of system failure. After enough documents are loaded, the in-memory stand will fill up and be flushed to disk, written out as an on-disk stand. As more documents are loaded, they go into a new in-memory stand. At some point this in-memory stand fills up as well, and the in-memory stand gets written as yet another new on-disk stand.

      To read a single term list, MarkLogic must read the term list data from each individual stand and unify the results. To keep the number of stands to a manageable level where that unification isn't a performance concern, MarkLogic runs merges in the background. A merge takes some of the stands on disk and creates a new singular stand out of them, coalescing and optimizing the indexes and data, as well as removing any previously deleted fragments.

      Each forest has its own in-memory stand and set of on-disk stands. Loading and indexing content is a largely parallelizable activity so splitting the loading effort across forests and potentially across machines in a cluster can help scale the ingestion work.
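The stand lifecycle described above (fill the in-memory stand, flush it to disk, merge on-disk stands) can be modeled with a toy class. This is purely illustrative: the Forest class, its in_memory_limit parameter, and the use of URIs in place of fragments are assumptions made for the sketch, and the journal and indexes are left out entirely.

```python
class Forest:
    """Toy forest: one in-memory stand plus a list of on-disk stands."""

    def __init__(self, in_memory_limit: int = 3):
        self.in_memory_limit = in_memory_limit  # hypothetical capacity, in documents
        self.in_memory = []                     # current in-memory stand
        self.on_disk = []                       # each entry is one on-disk stand
        self.deleted = set()                    # URIs marked as deleted

    def insert(self, uri: str) -> None:
        self.in_memory.append(uri)
        if len(self.in_memory) >= self.in_memory_limit:
            # The in-memory stand is full: flush it as a new on-disk stand.
            self.on_disk.append(self.in_memory)
            self.in_memory = []

    def merge(self) -> None:
        """Coalesce all on-disk stands into one, dropping deleted fragments."""
        merged = [uri for stand in self.on_disk for uri in stand
                  if uri not in self.deleted]
        self.on_disk = [merged] if merged else []
```

A real merge picks only some of the on-disk stands based on merge policy; merging all of them here keeps the sketch short.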

      Deletions and Multi-Version Concurrency Control (MVCC)

      What happens if you delete or change a document? If you delete a document, MarkLogic marks the document as deleted but does not immediately remove it from disk. The deleted document will be removed from query results based on its deletion markings, and the next merge of the stand holding the document will bypass the deleted document when writing the new stand. MarkLogic treats any changed document like a new document, and treats the old version like a deleted document.

      This approach is known in database circles as MVCC, which stands for Multi-Version Concurrency Control.

      In an MVCC system changes are tracked with a timestamp number which increments for each transaction as the database changes. Each fragment gets its own creation-time (the timestamp at which it was created) and deletion-time (the timestamp at which it was marked as deleted, starting at infinity for fragments not yet deleted).

      For a request that doesn't modify data the system gets a performance boost by skipping the need for any URI locking. The query is viewed as running at a certain timestamp, and throughout its life it sees a consistent view of the database at that timestamp, even as other (update) requests continue forward and change the data.
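The timestamp rule can be expressed as a one-line visibility test. The sketch below is a conceptual model rather than MarkLogic code: the Fragment dataclass and the visible function are hypothetical names, and infinity is used for "not yet set", as described above.

```python
import math
from dataclasses import dataclass


@dataclass
class Fragment:
    content: str
    created: float = math.inf  # stays at infinity until the insert commits
    deleted: float = math.inf  # infinity means "not deleted yet"


def visible(fragment: Fragment, query_timestamp: int) -> bool:
    """A fragment is visible if it was created at or before the query's
    timestamp and had not yet been deleted as of that timestamp."""
    return fragment.created <= query_timestamp < fragment.deleted


# An update at timestamp 20 replaces v1 with v2.
v1 = Fragment("old version", created=10, deleted=20)
v2 = Fragment("new version", created=20)
print(visible(v1, 15), visible(v1, 20), visible(v2, 20))
```

A query running at timestamp 15 keeps seeing the old version for its whole lifetime, even after the update at timestamp 20 commits, which is why read-only queries need no locks.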

      Updates and Deadlocks

      An update request, because it isn't read-only, has to use read/write locks to maintain system integrity while making changes. Read-locks block for write-locks; write-locks block for both read and write-locks. An update has to obtain a read-lock before reading a document and a write-lock before changing (adding, deleting, modifying) a document. Lock acquisition is ordered, first-come first-served, and locks are released automatically at the end of a request.
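The blocking rules in the paragraph above amount to a small compatibility check. The following sketch uses a hypothetical helper, must_wait, and plain strings for lock modes; it ignores lock ordering and promotion and only captures which requests have to block.

```python
def must_wait(requested: str, held_by_others: list) -> bool:
    """Return True if a lock request has to block.

    `requested` is "read" or "write"; `held_by_others` lists the lock modes
    that other requests already hold on the same URI.
    """
    if requested == "read":
        # Readers only wait for writers.
        return "write" in held_by_others
    if requested == "write":
        # Writers wait for any other holder, reader or writer.
        return len(held_by_others) > 0
    raise ValueError(f"unknown lock mode: {requested!r}")
```

So two updates can read the same document concurrently, but as soon as one of them wants to change it, that write-lock request waits for the other reader to finish.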

      In any lock-based system you have to worry about deadlocks, where two or more updates are stalled waiting on locks held by the other. In MarkLogic deadlocks are automatically detected with a background thread. When the deadlock happens on the same host in a cluster, the update farthest along (with the most locks) wins and the other update gets restarted. When it happens on different hosts, because lock count information isn't in the wire protocol, both updates start over.

      MarkLogic differentiates queries from updates using static analysis. Before running a request, it looks at the code to determine if it includes any calls to update functions. If so, it's an update. If not, it's a query. Even if at execution time the update doesn't actually invoke the updating function, it still runs as an update.

      For the most part, lock acquisition is not under the control of the user. The one exception is the xdmp:lock-for-update($uri) call, which requests a write-lock on a document URI without actually having to issue a write, and in fact without the URI even having to exist.

      When a request potentially touches millions of documents (such as sorting a large data set to find the most recent items), a query request that runs lock-free will outperform an update request that needs to acquire read-locks and write-locks. In some cases you can speed up the query work by isolating the update work to its own transactional context. This technique only works if the update doesn't have a dependency on the outer query, but that turns out to be a common case. For example, let's say you want to execute a content search and record the user's search string to the database for tracking purposes. The database update doesn't need to be in the same transactional context as the search itself, and would slow things down if it were. In this case it's better to run the search in one context (read-only and lock-free) and the update in a different context. See the xdmp:eval() and xdmp:invoke() functions for documentation on how to invoke a request from within another request and manage the transactional contexts between the two.

      Document Lifecycle

      Let's track the lifecycle of a document from first load to deletion until the eventual removal from disk. A document load request acquires a write-lock for the target URI as part of the xdmp:document-load() function call. If any other request is already doing a write to the same URI, our load will block for it, and vice versa. At some point, when the full update request completes successfully (without any errors that would implicitly cause a rollback), the actual insertion work begins, processing the queue of update work orders.

      MarkLogic starts by parsing and indexing the document contents, converting the document from XML to a compressed binary fragment representation. The fragment gets added to the in-memory stand. At this point the fragment is considered a nascent fragment, a term you'll see sometimes on the administration console status pages. Being nascent means it exists in a stand but hasn't been fully committed. (On a technical level, nascent fragments have creation and deletion timestamps both set to infinity, so they can be managed by the system while not appearing in queries prematurely.) If you're doing a large transactional insert you'll accumulate a lot of nascent fragments while the documents are being processed. They stay nascent until they've been committed.

      Once the fragment is placed into the in-memory stand, the request is ready to commit. It obtains the next timestamp value, journals its intent to commit the transaction, and then makes the fragment available by setting the creation timestamp for the new fragment to the transaction's timestamp. At this point it's a durable transaction, replayable in event of server failure, and it's available to any new queries that run at this timestamp or later, as well as any updates from this point forward (even those in progress). As the request terminates, the write-lock gets released.

      Our document lives for a time in the in-memory stand, fully queryable and durable, until at some point the in-memory stand fills up and gets written to disk. Our document is now in an on-disk stand. Sometime later, based on merge algorithms, the on-disk stand will get merged with some other on-disk stands to produce a new on-disk stand. The fragment will be carried over, its tree data and indexes incorporated into the larger stand. This might happen several times.

      At some point a new request makes a change to the document, such as with an xdmp:node-replace() call. The request making the change first obtains a read-lock on the URI when it first accesses the document, then promotes the read-lock to a write-lock when executing the xdmp:node-replace() call. If another write-lock were already present on the URI from another executing update, the read-lock would have blocked until the other write-lock released. If another read-lock were already present, the lock promotion to a write-lock would have blocked. Assuming the update request finishes successfully, the work runs similar to before: parsing and indexing the document, writing it to the in-memory stand as a nascent fragment, acquiring a timestamp, journaling the work, and setting the creation timestamp to make the fragment live. Because it's an update, it has to mark the old fragment as deleted also, and does that by setting the deletion timestamp of the original fragment to the transaction timestamp. This combination effectively replaces the old fragment with the new. When the request concludes, it releases its locks. Our document is now deleted, replaced by the new version.

      The old fragment still exists on disk, of course. In fact, any query that was already in progress before the update incremented the timestamp, or any query doing time travel with an old timestamp, can still see it. Eventually the on-disk stand holding the fragment will be merged again, at which point the old fragment will be completely removed from the system. It won't be written into the new on-disk stand. That is, unless the administration "merge timestamp" was set to allow deep time travel. In that case it will live on, sticking around in case any new queries want to time travel to see old fragments.


      The following article explains how in-memory caches are used by MarkLogic Server and how they can be used to improve query execution.



      MarkLogic Server provides several caches that are used to improve the performance during query execution. When a query executes for the first time, the Server will populate these caches to store termlist and data fragments in memory.

      MarkLogic Server keeps a lot of its configuration information in databases, and has a lot of caches to make it run faster, but those caches get populated the first time things are accessed. The server also uses book-keeping terms in the indexes to keep track of whether all documents have been indexed with the current settings. MarkLogic caches this information, but has to query the indexes on the first request to warm the cache.

      The in-memory cache in MarkLogic Server holds data that was recently added to the system and is still in an in-memory stand; that is, it holds data that has not yet been written to disk.

      For updates, if there is no in-memory stand on a forest when a new document is inserted, the server will create it. This stand is big enough for thousands of documents, but the cost of creating it will be seen in the time taken for the first document added to it.


      How will the in-memory cache help improve query execution

      When a query is executed, in-memory data structures such as range indexes and lexicons get pinned into RAM the first time they are used. The easiest way to speed things up is to "warm the caches" by running a small sample program that exercises the type-ahead prior to starting production. You can also keep the server warm by doing a non-time-critical stub update at regular intervals (every 30 seconds to 1 minute). If the server is idle, this keeps the caches and the in-memory stand warm. If the server is really busy, it adds only a small amount of extra work. Once this is done, the functionality will be fast for all users in all future sessions.


      MarkLogic does not recommend having more than one forest for the Security database.

      The Security database is typically fairly small and there is no reason to have more than one forest for the Security database. Having more than one Security forest causes additional complexity during failover events, server upgrades, and restarts. A functioning Security database is critical to the stability of a MarkLogic Cluster and it is easier to recover from a host failure if the Security database is configured with only a single forest and a single replica forest. 

      In terms of high availability and forest failover, one local disk failover forest should be configured. In terms of database replication, a replica forest in the replica cluster should be configured.

      If you have more than one Security forest:

      We have seen incidents where customers attached more than one Security forest, either intentionally or inadvertently (scripting bug or user error), and ran into issues while detaching them.

      When the database rebalancer is enabled for the database (default setting) and when a new forest is attached, the database will automatically redistribute the content across all attached forests. Problems can then arise when security forests are detached without preserving their content. This is true for any database, but is problematic when dealing with the Security database. 

      When a Security database forest is detached without first retiring it (and verifying documents are moved out of it), some Security documents will be removed from the database. This may lead to users being locked out of the cluster or render the cluster unusable.  If this occurs on your MarkLogic cluster, please contact MarkLogic Support to help with the repair.

      Best Practice

      • Do not configure more than one forest for any system database, including the Security database.
      • If you have multiple forests in your Security database and need to come back in line with our one-forest recommendation:
        • Retire the extra Security database forests;
        • Verify all extra forests are drained of content (zero documents / zero fragments);
        • Detach the extra forests.
      • Once your cluster is in line with our one forest recommendation, disable the rebalancer for the Security database.
      • Configure a single replica forest to achieve high availability.

      Further reading

      Administering Security in MarkLogic

      Database Rebalancing in MarkLogic

      Restoring Security Database

      Security Database restore leading to lingering Certificate Template id in Config files

      The target for range indexes in a MarkLogic database should be about 100. This is because:

      • In the interests of performance, MarkLogic Server indexes your content on ingest, then memory maps those indexes to serialized data structures on disk. Each of those memory maps requires some amount of RAM.
        • If you've got many thousands of indexes you may run into a situation where system monitoring is reporting you've got RAM to spare, but MarkLogic Server is reporting "SVC-MAPINI: Mapped file initialization error." In which case you're likely running up against Linux's default vm.max_map_count value.
        • Independent of SVC-MAPINI errors, the more range indexes you've configured, the longer it will take to perform forest operations.
      • If you find yourself configuring many hundreds or even thousands of range indexes, you should migrate your data modeling scheme to take advantage of Template Driven Extraction (TDE), which was specifically engineered to address this scenario.


      This Knowledgebase article provides general guidelines for backups that use the journal archiving feature. It covers both the free space requirements and the file sizes you can expect to see written to the journal archive repository when journal archiving is enabled and active.

      The MarkLogic environment used here was an out-of-the box version 9.x with one change of adding a new directory specific to storing the archive journal backup files.

      It is assumed that the reader of this article already has a basic understanding of the role of Journal Archiving in the Backup and Restore feature of MarkLogic Server. See references below for further details(below).

      How much free space is needed for the Archive Journal files in a backup?

      MarkLogic Server uses the size of the active forest to confirm whether the journal archive repository has enough free space to accommodate that forest. However, if additional forests already exist on the same volume, the Server's "free-space" calculation may be inaccurate, because the other forests are never used in the algorithm that calculates the free space available for the backup and/or archive journal repositories. Only one forest is used in the free-space calculation.

      In other words, if multiple forests exist on the same volume, there may not be enough free space available on that volume because of the additional forests, especially during a high rate of ingestion. In that case, provide enough free space on that volume to accommodate the sizes of all the forests: Required Free Space (approximately) = (Number of Forests) x (Size of Largest Forest).
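As a rough illustration of the sizing rule above, the following minimal Python sketch multiplies the number of forests on a volume by the size of the largest one. The function name and the forest sizes are hypothetical and are not part of any MarkLogic API.

```python
GIB = 1024 ** 3


def required_free_space_bytes(forest_sizes_bytes):
    """Approximate free space needed on a volume:
    (number of forests) x (size of the largest forest)."""
    if not forest_sizes_bytes:
        return 0
    return len(forest_sizes_bytes) * max(forest_sizes_bytes)


# Hypothetical example: three forests sharing one volume.
sizes = [40 * GIB, 55 * GIB, 32 * GIB]
print(required_free_space_bytes(sizes) // GIB, "GiB")  # 3 x 55 GiB = 165 GiB
```

The result is deliberately conservative: it assumes every forest could grow to the size of the largest forest on the volume.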

      The other side of the question is what file sizes to expect in the journal archiving repository for specific ingestion types and sizes.

      How is the Journal Archive repository filling up?

      1 MByte of raw XML data loaded into the server (as either a new document ingestion or a document update) will result in approximately 5 to 6 MBytes of data being written to the corresponding Journal Archive files.  Additionally, adding Range Indexes will contribute to a relatively small increase in consumed space.

      Ingesting/updating RDF data results in slightly less data being written to the journal archive files.

      In conclusion, for both new document ingestion and document updates, the typical expansion ratio of Journal Archive size to input file size is between 5 and 6, but it can be higher depending on the document structure and any added range indexes.
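To make the expansion ratio concrete, here is a small Python sketch that turns a planned ingestion volume into an approximate range of Journal Archive growth. The function name is hypothetical, and the default ratios are simply the 5 to 6 range quoted above; actual ratios can be higher.

```python
def estimate_journal_archive_mb(raw_input_mb, low_ratio=5.0, high_ratio=6.0):
    """Return a (low, high) estimate in MB of Journal Archive data
    written for a given amount of raw XML loaded or updated."""
    return raw_input_mb * low_ratio, raw_input_mb * high_ratio


# Hypothetical example: a 100 MB ingestion batch.
low, high = estimate_journal_archive_mb(100)
print(f"Expect roughly {low:.0f} to {high:.0f} MB of journal archive data")
```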



      Content processing applications often require multi-step processing. Each step in the process performs a particular task or set of tasks. The Content Processing Framework (CPF) in MarkLogic Server supports these types of multi-step conversion processes. During a document delete operation, a CPF action can sometimes fail with an 'XDMP-CONFLICTINGUPDATES' error, which is recorded in the document's properties like this:

      Sample message:

      <error:format-string>XDMP-CONFLICTINGUPDATES: xdmp:document-set-property("FILE-NAME", <cpf:state xmlns:cpf=""></cpf:state>) -- Conflicting updates xdmp:document-set-property("FILE-NAME", /cpf:state) and xdmp:document-delete("FILE-NAME")</error:format-string>

      This error message indicates that an update statement (e.g. xdmp:document-set-property) is trying to update a document in a way that conflicts with another update (e.g. xdmp:document-delete) occurring in the same transaction.



      Actions that want to delete the target URI need special handling, because CPF also wants to keep track of progress in the document's properties, and a plain document delete [ xdmp:document-delete($cpf:document-uri) ] does not allow for that.

      Following are ways to achieve the expected behavior and get past the XDMP-CONFLICTINGUPDATES error:

      1) Perform a "soft delete" on the document and then let CPF take care of deleting the document. This can be done by setting the document status to "deleted" via the cpf:document-set-processing-status API function. Setting the document's processing status to "deleted" tells CPF to clean up the document and not update its properties at the same time.

      cpf:document-set-processing-status( $uri-to-delete, "deleted" )

      2) If you want to keep a record of the URI that is being deleted, you can delete its root node instead of the document. The CPF state can then still be recorded in the document's properties, even though the document content is gone.




      Sometimes, when a host is removed from a cluster in an improper manner (e.g., by some means other than the Admin UI or Admin API), the removed host can still try to communicate with its old cluster. The cluster will recognize it as a "foreign IP" and will log a message like the one below:

      2014-12-16 00:00:20.228 Warning: XDQPServerConnection::init( SVC-SOCRECV: Socket receive error: wait Timeout


      XDQP is the protocol that MarkLogic uses for internal communication among the hosts in a cluster, and it uses port 7999 by default. In this message, the local host is receiving socket connections from a foreign host.


      Debugging Procedure, Step 1

      To find out if this message indicates a socket connection from an IP address that is not part of the cluster, the first place to look is in the hosts.xml files. If the IP address is not found in hosts.xml, then it is a foreign IP. In that case, the following steps will help to identify the processes that are listening on port 7999.


      Debugging Procedure, Step 2

      To find out who is listening on XDQP ports, try running the following command in a shell window on each host:

            $ sudo netstat -tulpn | grep 7999

      You should only see MarkLogic as a listener:

           tcp 0 0* LISTEN 1605/MarkLogic

      If you see any other process listening on 7999, you have found your culprit. Shut down those processes and the messages will go away.
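When this check has to be repeated across many hosts, the netstat output can be filtered programmatically. The Python sketch below assumes the usual seven-column layout of `netstat -tulpn` TCP lines (protocol, Recv-Q, Send-Q, local address, foreign address, state, PID/program); the function name and the sample output, including its addresses and PIDs, are hypothetical.

```python
def unexpected_listeners(netstat_output, port=7999, expected="MarkLogic"):
    """Return (pid, program) pairs listening on `port` that are not `expected`."""
    offenders = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only TCP lines in LISTEN state carry all seven columns.
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        local_address, pid_program = fields[3], fields[6]
        if not local_address.endswith(":%d" % port):
            continue
        pid, _, program = pid_program.partition("/")
        if program != expected:
            offenders.append((pid, program))
    return offenders


# Hypothetical netstat output for illustration only.
SAMPLE = """\
tcp        0      0 0.0.0.0:7999            0.0.0.0:*               LISTEN      1605/MarkLogic
tcp        0      0 0.0.0.0:8001            0.0.0.0:*               LISTEN      1605/MarkLogic
tcp6       0      0 :::7999                 :::*                    LISTEN      2210/java
"""

for pid, program in unexpected_listeners(SAMPLE):
    print(f"Unexpected listener on port 7999: {program} (pid {pid})")
```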


      Debugging Procedure, Step 3

      If the issue persists, run tcpdump to trace packets to/from "foreign" hosts using the following command:

           tcpdump -n host {unrecognized IP}

      Shut down MarkLogic on those hosts. Also, shut down any other applications that are using port 7999.


      Debugging Procedure, Step 4

      If the cluster's hosts are on AWS, you may also want to check your Elastic Load Balancer ports. This may be tricky, because instances will change IP addresses if they are rebooted, so work with AWS Support to help you find the AMI or load balancer instance that is pinging your cluster.

      In the case that the "foreign host" is an elastic load balancer, be sure to remove port 7999 from its rotation/scheduler. In addition, you should set the load balancer to use port 7997 for the heartbeat functionality.


      Sometimes, when a cluster is under heavy load, it may show a lot of XDQP-TIMEOUT messages in the error log. Often, a subset of hosts in the cluster may become so busy that the forests they host get unmounted and remounted repeatedly. Depending on your database and group settings, the act of remounting a forest may be very time-consuming, because all hosts in the cluster are forced to do the extra work of index detection.

      Forest Remounts

      Every time a forest remounts, the error log will show a lot of messages like these:

      2012-08-27 06:50:33.146 Debug: Detecting indexes for database my-schemas
      2012-08-27 06:50:33.146 Debug: Detecting indexes for database Triggers
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Last-Login: sln
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Triggers: sln
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Schemas: sln
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Modules: sln
      2012-08-27 06:50:35.373 Debug: Detected indexes for database Security: sln
      2012-08-27 06:50:35.485 Debug: Detected indexes for database my-modules: sln
      2012-08-27 06:50:35.773 Debug: Detected indexes for database App-Services: sln
      2012-08-27 06:50:35.773 Debug: Detected indexes for database Fab: sln
      2012-08-27 06:50:35.805 Debug: Detected indexes for database Documents: ss, fp

      ... and so on ...

      This can go on for several minutes and will cost you more downtime than necessary, since you already know the indexes for each database.

      Improving the situation

      Here are some suggestions for improving this situation:

      1. Browse to Admin UI -> Databases -> my-database-name
      2. Set ‘index detection’ to ‘none’
      3. Set ‘expunge locks’ to ‘none’

      Repeat steps 1-3 for all active databases.

      Now tweak the group settings to make the cluster less sensitive to an occasional busy host:

      1. Browse to Admin UI -> Groups -> E-Nodes
      2. Set ‘xdqp timeout’ to 30
      3. Set ‘host timeout’ to 90
      4. Click OK to make this change effective.

      The database-level changes tell the server to speed up cluster startup time when a server node is perceived to be offline. The group changes will cause the hosts on that group to be a little more forgiving before declaring a host to be offline, thus preventing forest unmounting when it's not really needed.

      If, after performing these changes, you find that you are still experiencing XDQP-TIMEOUT messages, the next step is to contact MarkLogic Support for assistance. You should also alert your Development team, in case there is a stray query that is causing the data nodes to gather too many results.

      Related Reading

      XML Data Query Protocol (XDQP)


      Under normal operations, only a single user object is created for a user-name. However, when users are migrated from another security database and the recommended checks are not performed, duplicate user-names might be created.


      When there are duplicate user-names in the database, you may see the following message on the Admin UI or in the error logs:

      500: Internal Server Error
      XDMP-AS: (err:XPTY0004) get-element($col, "sec:user", "sec:user-name", $user-name, "SEC-USERDNE") -- Invalid coercion: (fn:doc("*******")/sec:user, fn:doc("*******")/sec:user) as element()?


      To fix duplicate user-names, the extra security object that was created needs to be removed. You can delete one of the extra security objects, which should have a URI similar to: ******* where "*******" represents the user ID.


      To resolve the issue, follow the below steps:

      1. Perform a backup of your Security database in case manual recovery is required.

      2. Log in to Query Console with admin credentials.

      3. Select the "Security" database as the content source.

      4. Delete the security object by executing xdmp:document-delete($uri) with $uri set to the URI of the duplicate user.


      When configuring a server to add a foreign cluster you may encounter the following error:

      Host does not match origin or inferred origin, or is otherwise untrusted.

      This error will typically occur when using MarkLogic Server versions prior to 10.0-6, in combination with Chrome versions newer than 84.x.

      Our recommendation to resolve this issue is to upgrade to MarkLogic Server 10.0-6 or newer. If that is not an option, then using a different browser, such as Mozilla Firefox, or downgrading to Chrome version 84.x may also resolve the error.

      Changes to Chrome

      Starting in version 85.x of Chrome, there was a change made to the default Referrer-Policy, which is what causes the error. The old default was no-referrer-when-downgrade, and the new value is strict-origin-when-cross-origin. When no policy is set, the browser's default setting is used. Websites are able to set their own policy, but it is common practice for websites to defer to the browser's default setting.
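To illustrate the difference between the two defaults, the following Python sketch computes what a browser would send as the referrer under each policy. It is a simplified model of the Referrer-Policy rules (it ignores details such as fragment stripping and treats only https-to-non-https navigation as a downgrade), and the function names, host names, and URLs are hypothetical.

```python
from urllib.parse import urlsplit


def _origin(url):
    """Scheme plus host (and port), e.g. 'http://host-a.example:8001'."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def referrer_sent(policy, source_url, target_url):
    """Return the referrer value sent when navigating from source to target,
    or None when no referrer is sent."""
    downgrade = (urlsplit(source_url).scheme == "https"
                 and urlsplit(target_url).scheme != "https")
    if downgrade:
        return None  # both policies suppress the referrer on a downgrade
    if policy == "no-referrer-when-downgrade":
        return source_url  # full URL, even across origins
    if policy == "strict-origin-when-cross-origin":
        if _origin(source_url) == _origin(target_url):
            return source_url  # same origin: full URL
        return _origin(source_url) + "/"  # cross origin: origin only
    raise ValueError(f"unsupported policy: {policy}")


# Hypothetical Admin UI page on one host submitting to a host in another cluster.
src = "http://host-a.example:8001/cluster-admin?section=foreign"
dst = "http://host-b.example:8001/"
print(referrer_sent("no-referrer-when-downgrade", src, dst))
print(referrer_sent("strict-origin-when-cross-origin", src, dst))
```

Under the newer default, a cross-origin request carries only the origin of the referring page rather than its full URL.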



      For hosts that don't use a standard US locale (en_US), there are instances where some lower-level calls return data that cannot be parsed by MarkLogic Server. One example occurs on a host configured with a different locale when making a call to the Cluster Status page (cluster-status.xqy).


      The problem

      The problem you have encountered is a known issue: MarkLogic Server uses a call to strtof() to parse the values as floats.

      Unfortunately, strtof() uses a locale-specific decimal point. The issue in this environment is likely due to the Operating System using a numeric locale where the decimal point is a comma rather than a period.
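The effect can be sketched in Python with a small stand-in for a locale-sensitive parser. This is not MarkLogic's code and not a full strtof() implementation (signs and exponents are ignored); the function name is hypothetical. It only shows how a parser that expects a comma stops reading at the period.

```python
def parse_like_strtof(text, decimal_point="."):
    """Parse the longest leading run of digits with at most one decimal
    point, the way a locale-sensitive parser stops at the first
    character it does not recognise."""
    accepted = []
    seen_point = False
    for ch in text.strip():
        if ch in "0123456789":
            accepted.append(ch)
        elif ch == decimal_point and not seen_point:
            accepted.append(".")
            seen_point = True
        else:
            break  # unrecognised character: stop parsing here
    prefix = "".join(accepted)
    return float(prefix) if prefix not in ("", ".") else 0.0


print(parse_like_strtof("0.75", decimal_point="."))  # period locale: full value
print(parse_like_strtof("0.75", decimal_point=","))  # comma locale: fraction is lost
```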

      Resolving the issue

      The workaround for this is as follows:

      1. Create a file called /etc/marklogic.conf (unless one already exists)

      2. Add the following line to /etc/marklogic.conf:

      export LC_NUMERIC=en_US.UTF-8

      After this is done, you can restart the MarkLogic process so the change is detected and try to access the cluster status again.


      This Knowledgebase article outlines the steps required to import an existing (pre-signed) Certificate into MarkLogic Server and to configure a MarkLogic Application Server to use that certificate.

      Existing (Pre-signed) Certificate vs. Certificate Request Generated by MarkLogic

      MarkLogic allows you either to use an existing certificate or to generate a Certificate Request. The key difference between the two lies in who generates the public and private keys and the other fields in the certificate.

      For a Pre-Signed Certificate: In this instance, the keys already exist outside of MarkLogic Server, and a third-party tool would have populated the CN (Common Name) and other subject fields to generate a Certificate Request file (.csr) containing a public key.

      For a Certificate Request Generated by MarkLogic: In this instance, new keys are generated by MarkLogic Server (it does this while creating the new template), while the CN and other fields are added by the MarkLogic Server Administrator (or user) through the web-based MarkLogic Admin GUI during New Certificate Template creation.

      The section in MarkLogic's online documentation on Creating a Certificate Template covers the steps required to generate a certificate template from within MarkLogic Server.


      Steps to Import Pre-Signed Certificate and Key into MarkLogic

      1) Create a Certificate Template 

      Create a new Certificate Template with fields similar to those of your existing Pre-Signed Certificate.

      For example, given the following current Certificate file:

      [amistry@engrlab18-128-026 PreSignedCert]$ openssl x509 -in ML.pem -text 
              Version: 1 (0x0)
              Serial Number: 7 (0x7)
          Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=CA, L=San Carlos, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA
                  Not Before: Nov 30 04:12:33 2015 GMT
                  Not After : Nov 29 04:12:33 2017 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=DemoLab Corporation, OU=Engineering,
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (1024 bit)

      For the above Certificate, we will create a custom Template in Admin GUI -> Configure -> Security -> Certificate Template, using the Create tab.
      We will save our new template as "DemoLab Corporation Template".

      Note: The template fields are only placeholders for a signed Certificate; MarkLogic mainly uses them to generate a Certificate Signing Request (.csr). For a Certificate Request generated by a third-party tool, it does NOT matter whether the template fields match the final signed Certificate exactly.

      Once the Signed Certificate has been imported, the App Server will use the Signed Certificate, and the SSL client will only see field values from the Signed Certificate (even if they are different from those on the Template configuration page).

      2) Create an HTTPS App Server

      Please follow Procedures for Enabling SSL on App Servers except for the "Creating Certificate Template" part as we have created the Template to match our existing pre-signed Certificate. 

      3) Verify Pre-signed Certificate and Private Key file 

      Prior to installing a pre-signed certificate and private key the following verification should be performed to ensure that both certificate and key are valid and are in the correct format. 

      * Generate and display the certificate checksum using the OpenSSL utility

      [admin@sitea ~]# openssl x509 -noout -modulus -in cert.pem | openssl md5

      (stdin)= 2ddd2ca48ad2eb4eba082f5da3fd33ab

      * Generate and display the private key checksum

      [admin@sitea ~]# openssl rsa -noout -modulus -in key.key | openssl md5

      (stdin)= 2ddd2ca48ad2eb4eba082f5da3fd33ab

      Both commands should return identical checksum values. If the values do not match, or if you are prompted for additional information such as the private key password, then the certificate and private key are not valid and should be corrected before proceeding.
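If this verification is scripted, the two output lines can be compared automatically rather than by eye. The Python sketch below assumes the `(stdin)= <digest>` output format shown above; the function names are hypothetical.

```python
def digest_value(openssl_md5_line):
    """Extract the digest from a line such as '(stdin)= 2ddd...33ab'."""
    _, sep, value = openssl_md5_line.partition("=")
    if not sep or not value.strip():
        raise ValueError(f"unexpected openssl output: {openssl_md5_line!r}")
    return value.strip().lower()


def cert_matches_key(cert_md5_line, key_md5_line):
    """True when the certificate and private key modulus digests agree."""
    return digest_value(cert_md5_line) == digest_value(key_md5_line)


# The two output lines from the example above.
cert_line = "(stdin)= 2ddd2ca48ad2eb4eba082f5da3fd33ab"
key_line = "(stdin)= 2ddd2ca48ad2eb4eba082f5da3fd33ab"
print("match" if cert_matches_key(cert_line, key_line) else "MISMATCH: do not install")
```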

      Note: Proceeding to the next step without verifying the certificate and the private key could lead to the MarkLogic server being made inaccessible. 

      4) Install Pre-signed Certificate and Key file to Certificate Template using Query Console

      Because the Certificate was pre-signed, MarkLogic does not have the key that goes along with it. We will install the Pre-signed Certificate and Key into MarkLogic using the XQuery below in Query Console.

      Note: The query must be run against the Security database.

      Please change the Certificate Template name and the Certificate/Key file locations in the XQuery below to reflect values from your environment.

      xquery version "1.0-ml";
      import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";
      (: Update Template name for your environment :)
      let $templateid := pki:template-get-id(pki:get-template-by-name("TemplateName"))
      (: Path on the MarkLogic host that is readable by the MarkLogic server process (default daemon) :)
      (:   File suffix could also be .txt or other format :)
      let $path-to-cert := "/cert.pem"
      let $path-to-key := "/key.key"
      (: Read both files as text and install them against the template :)
      return pki:insert-host-certificate($templateid,
        xdmp:document-get($path-to-cert,
          <options xmlns="xdmp:document-get"><format>text</format></options>),
        xdmp:document-get($path-to-key,
          <options xmlns="xdmp:document-get"><format>text</format></options>)
      )

      The above will associate our pre-signed Certificate and Key with the Template created earlier, which is linked to the HTTPS App Server.

      Important note: pki:insert-trusted-certificates can also be used in place of pki:insert-host-certificate in the above example.


      This article discusses the effects of the incremental backup implementation on Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).


      With MarkLogic 8 you can have multiple daily incremental backups with minimal impact on database performance.

      Incrementals complete more quickly than full backups reducing the backup window. A smaller backup window enables more frequent backups, reducing the RPO of the database in case of disaster.

      However, RTO can be longer when using incremental backups compared to just full backups, because multiple backups must be restored to recover.

      There are two modes of operation when using incremental backups:

      Incremental since last full. Here, each incremental has to store all the data that has changed since the last full backup. Since a restore only has to go through a single incremental data set, the server is able to perform a faster restore.  However, each incremental data set is bigger and takes longer to complete than the previous data set because it stores all changes that were included in the previous incremental.

      Please note when doing “Incremental since last full”:

      - Create a new incremental backup directory for each incremental backup

      - Call database-incremental-backup with incremental-dir set to the new incremental backup directory


      Incremental since last incremental.  In this case, a new incremental stores only changes since the last incremental, also known as delta backups. By storing only the changes since the last incremental, the incremental backup sets are smaller in size and are faster to complete.  However, a restore operation would have to go through multiple data sets.

      Please note when doing “Incremental since last incremental”:

      - Create an incremental backup directory ONCE

      - Call database-incremental-backup with the same incremental backup directory.
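The RTO trade-off between the two modes can be sketched by counting how many backup sets a restore has to process. In the Python sketch below, the mode labels and the function name are hypothetical and are not MarkLogic API values.

```python
def backup_sets_to_restore(mode, incrementals_taken):
    """Number of backup sets read during a restore to the latest state.

    mode: "since-last-full" or "since-last-incremental" (illustrative labels).
    """
    if incrementals_taken < 0:
        raise ValueError("incrementals_taken must be non-negative")
    if incrementals_taken == 0:
        return 1  # only the full backup exists
    if mode == "since-last-full":
        return 2  # the full backup plus the single latest incremental
    if mode == "since-last-incremental":
        return 1 + incrementals_taken  # the full backup plus every delta
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical schedule: one full backup followed by six incrementals.
for mode in ("since-last-full", "since-last-incremental"):
    print(mode, "->", backup_sets_to_restore(mode, 6), "backup sets")
```

The count is only a proxy for restore time: in the first mode the single incremental is larger, while in the second mode each delta is small but every one of them must be applied.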

      See also the documentation on Incremental Backup.



      Indexing Best Practices

      MarkLogic Server indexes records (or documents/fragments) on ingest. When a database's index configuration is changed, the server will consequently reindex all matching records.

      Indexing and reindexing can be a CPU and I/O intensive operation. Reindexing creates a lot of new fragments, with the original fragments being marked for deletion. These deleted fragments will then need to be merged out. All of this activity can potentially affect query performance, especially in systems with under-provisioned hardware.

      Reindexing in Production

      If you need to add or modify an index on a production cluster, consider scheduling the reindex during a time when your cluster is less busy. If your database is too large to completely reindex during a single period of low usage, consider running the reindex over several periods of time. For example, if your low usage period is during a weekend, the process may look like:

      • Change your index configuration on a Friday night
      • Let the reindex run for most of the weekend
      • To pause the reindex, set the reindexer-enable field to 'false' for the database being reindexed. Be sure to allow sufficient time for the associated merging to complete before system load comes back.
      • If needed, reindexing can continue over the next weekend - the reindexer process will pick up where it left off before it was disabled.


      When You Have Database Replication Configured

      If you have to add or modify indexes on a database that has database replication configured, make sure the same changes are made on the Replica cluster as well. Starting with MarkLogic Server version 9.0-7, index data is also replicated from the Master to the Replica, but the server does not automatically check whether both sides have the same index settings. Reindexing is disabled by default on a replica cluster. However, when the database replication configuration is removed (such as after a disaster), the replica database will reindex as necessary. It is therefore important that the Replica database index configuration matches the Master's to avoid unnecessary reindexing.

      Further reading -

      Master and Replica Database Index Settings

      Database Replication - Indexing on Replica Explained

      Avoid Unused Range Indexes, Fields, and Path Indexes

      In addition to taking up extra disk space, Range, Field, and Path Indexes require extra work when it's time to reindex. Field and Path indexes may also require extra indexing passes.

      Avoid Using Namespaces to Implement Multi-Tenancy

      It's a common use case to want to create some kind of partition (or multiple partitions) between documents in a particular database. In such a scenario it's far better to 1) constrain the partitioning information to a particular element in a document (then include a clause over that element in your searches), than it is to 2) attempt to manage partitions via unique element namespaces corresponding to each partition. For example, given two documents in two different partitions, you'll want them to look like this:

      1a. <doc><partition>partition1</partition><name>Joe Smith</name></doc>

      1b. <doc><partition>partition2</partition><name>John Smith</name></doc>

      ...vs. something like this:

      2a. <doc xmlns:p="http://partition1"><p:name>Joe Smith</p:name></doc>

      2b. <doc xmlns:p="http://partition2"><p:name>John Smith</p:name></doc>

      Why is #1 better? In terms of searching the data once it's indexed, there's actually not much of a difference: one could easily create searches to accommodate both approaches. The issue is how the indexing works in practice. MarkLogic Server indexes all content on ingest.

      In scenario #2, every time a new partition is created, a new range element index needs to be defined in the Admin UI, which means your index settings have changed, which means the server now needs to reindex all of your content, not just the documents corresponding to the newly introduced partition. Searches would also need to change in scenario #2.

      In contrast, for scenario #1, all that would need to be done is to ingest the documents corresponding to the new partition, which would then be indexed just like all the other existing content. The searches would need to change, as they would not yet include a clause to accommodate the new partition (for example: cts:element-value-query(xs:QName("partition"), "partition2")), but changing searches is far less intrusive than reindexing your entire database, as would be required in scenario #2.

      Keep an Eye on I/O Throughput

      Reindexing can lead to heavy merge activity and may lead to disk I/O bottlenecks if not managed carefully. If you have a system that is available 24-7 with no downtime window, then you may need to throttle the reindexer in order to keep the disk I/O to a minimum. We suggest the following database settings for reindexing a system that must always remain in use:

      • reindexer-throttle = 3
      • large-size-threshold = 1048576

      You can also adjust the following group settings to help limit background I/O:

      • background-io-limit = 100

      This will limit the background I/O for that group to 100 MB/sec per host across all hosts in that group. This should only be configured if merges are causing problems; it is a way of throttling back the I/O used by the merging process. This is a good starting point, and it may be increased in increments of 50 if you find that your merges are progressing too slowly. Proceed with caution, as too low a background I/O limit can have negative performance or even catastrophic consequences.

      General Recommendations

      In general, your indexing/reindexing and subsequent search experience will be better if you


      The MarkLogic Admin GUI is a convenient place to deploy a normal Certificate infrastructure or to use the Temporary Certificate generated by MarkLogic. However, certain advanced solutions and deployments need XQuery-based admin operations to configure MarkLogic.

      This Knowledgebase article discusses how to deploy a SAN or Wildcard Certificate in a cluster of 3 or more nodes.


      Certificate Types and MarkLogic Default Config

      Certificate Types

      In general, when browsers connect to a server using HTTPS, they check that the SSL Certificate matches the host name in the address bar. There are three ways for browsers to find a match:

      a) The host name (in the address bar) exactly matches the Common Name in the certificate's Subject.

      b) The host name matches a Wildcard Common Name. Please find an example at the end of this article.

      c) The host name is listed in the Subject Alternative Name (SAN) field as part of the X509v3 extensions. Please find an example at the end of this article.

      The most common form of SSL name matching is for the SSL client to compare the server name it connected to with the Common Name (CN field) in the server's Certificate. It's a safe bet that all SSL clients will support exact common name matching.

      MarkLogic allows the common scenario (a) to be configured from the Admin GUI. The rest of this article discusses deploying Certificates that rely on (b) and (c).
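The three matching rules can be sketched in a few lines of Python. This is a simplified illustration, not how any particular SSL client is implemented (for instance, many clients ignore the Common Name entirely when a SAN extension is present); the function names and host names are hypothetical.

```python
def hostname_matches(pattern, hostname):
    """Exact match, or a '*.domain' wildcard covering exactly one
    left-most label of the host name."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern == hostname:
        return True
    if pattern.startswith("*."):
        first_label, _, rest = hostname.partition(".")
        return bool(first_label) and rest == pattern[2:]
    return False


def certificate_covers(hostname, common_name, subject_alt_names=()):
    """True if the host name matches the CN (exact or wildcard) or any SAN entry."""
    candidates = (common_name, *subject_alt_names)
    return any(hostname_matches(p, hostname) for p in candidates)


# (a) exact CN, (b) wildcard CN, (c) SAN list, using hypothetical host names.
print(certificate_covers("node1.example.com", "node1.example.com"))
print(certificate_covers("node2.example.com", "*.example.com"))
print(certificate_covers("node3.example.com", "node1.example.com",
                         ["node2.example.com", "node3.example.com"]))
```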

      Default Admin GUI based Configuration 

      By default, MarkLogic generates a Temporary Certificate for every node in the group for the current cluster when a Template is assigned to MarkLogic Server (the exception is when the Template assignment is done through XQuery).

      The Temporary Certificate generated for each node has that node's hostname as its CN field, which is designed for the common Scenario (a).

      There are two paths for installing a CA-signed Certificate in MarkLogic:

      1) Generate Certificate request, get it signed by CA, import through Admin GUI

      or 2) Generate Certificate request + Private Key outside of MarkLogic, get Certificate request signed by CA, import Signed Cert + Private Key using Admin script

      Problem Scenario

      In both of the above cases, while installing or importing the Signed Certificate, MarkLogic will look to replace the Temporary Certificate by comparing the CN field of the Installed Certificate with the CN field of the Temporary Certificate.

      If we have a Wildcard Certificate (b) or a SAN Certificate (c), the CN field of our Signed Certificate will never match the CN field of the Temporary Certificate. As a result, MarkLogic will NOT remove the Temporary Certificates and will continue using them.



      After installing a SAN or wildcard Certificate, we may find an App Server that still uses the Temporary Certificate (which was not replaced while installing the SAN/wildcard Certificate).

      Use the XQuery below against the Security database to remove all Temporary Certificates. The XQuery needs the URI lexicon to be enabled (it is enabled by default). Please change the Certificate Template name in the XQuery below to reflect values from your environment.

      xquery version "1.0-ml";
      import module namespace pki = "http://marklogic.com/xdmp/pki"  at "/MarkLogic/pki.xqy";
      import module namespace admin = "http://marklogic.com/xdmp/admin"  at "/MarkLogic/admin.xqy";
      let $hostIdList := let $config := admin:get-configuration()
                         return admin:get-host-ids($config)
      for $hostid in $hostIdList
        (: FDQN name matching Certificate CN field value :)
        let $fdqn := ""
        (: Change to your Template Name string :)
        let $templateid := pki:template-get-id(pki:get-template-by-name("YourTemplateName"))
        for $i in cts:uris()
        (   (: locate Cert file with Public Key :)
            and fn:doc($i)//pki:authority=fn:false()
            and fn:doc($i)//pki:host-name=$fdqn
        return <h1> Cert File - {$i} .. inserting host-id {$hostid}
        {xdmp:node-insert-child(doc($i)/pki:certificate, <pki:host-id>{$hostid}</pki:host-id>)}
            (: extract cert-id :)
            let $certid := fn:doc($i)//pki:certificate/pki:certificate-id
            for $j in cts:uris()
                (: locate Cert file with Private key :)
                and fn:doc($j)//pki:certificate-private-key/pki:certificate-id=$certid
            return <h2> Cert Key File - {$j}
        } </h1>

      The above will remove all Temporary Certificates (including the Template CA) and their private keys, leaving only the Installed Certificate associated with the Template and forcing all nodes to use the Installed Certificate.


      Example: SAN (Subject Alternative Name) Certificate

      For a 3-node cluster:

      $ openssl x509 -in ML.pem -text -noout
              Version: 3 (0x2)
              Serial Number: 9 (0x9)
              Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=NY, L=NewYork, O=MarkLogic, OU=Engineering, CN=Support CA
                  Not Before: Apr 20 19:50:51 2016 GMT
                  Not After : Jun  6 19:50:51 2018 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic, OU=Eng,
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                  RSA Public Key: (1024 bit)
                      Modulus (1024 bit):
                      Exponent: 65537 (0x10001)
              X509v3 extensions:
                  X509v3 Key Usage: 
                      Key Encipherment, Data Encipherment
                  X509v3 Extended Key Usage: 
                      TLS Web Server Authentication
                  X509v3 Subject Alternative Name: 
          Signature Algorithm: sha1WithRSAEncryption

      Example: Wild-Card Certificate

      For 3 node cluster (,, 

      $ openssl x509 -in ML-wildcard.pem -text -noout
              Version: 1 (0x0)
              Serial Number: 7 (0x7)
              Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=NY, L=NewYork, O=MarkLogic, OU=Engineering, CN=Support CA
                  Not Before: Apr 24 17:36:09 2016 GMT
                  Not After : Jun 10 17:36:09 2018 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic Corporation, OU=Engineering Support, CN=*


      Okta provides secure identity management and single sign-on to any application, whether in the cloud, on-premises or on a mobile device.

      The following procedure describes how to integrate MarkLogic with Okta identity management and Microsoft Windows Active Directory using the Okta AD Agent.

      This document assumes that the users accessing MarkLogic are defined in the Windows Active Directory only and do not currently have Okta User Profiles defined.

      Authentication Flow

       The authentication flow in this scenario will be as follows:

      1. The user opens a Browser connection to the Site Single Sign-On Portal page.
      2. The user enters their Active Directory credentials.
      3. Okta verifies the user credentials using the Okta AD Agent.
      4. If successful, the user is presented with a selection of applications they can sign-on to.
      5. The user selects the required application and Okta completes the sign-on using the stored user credentials.


      • MarkLogic Server version 8 or 9
      • Okta Admin account access
      • Okta AD Agent
      • Active Directory Server

      For the purpose of this document the following Active Directory user entry will be used as an example:

      # LDAPv3
      # base <dc=MarkLogic,dc=Local> with scope subtree
      # filter: (sAMAccountName=martin.warnes)
      # requesting: *
      # Martin Warnes, Users, marklogic.local
      dn: CN=Martin Warnes,CN=Users,DC=marklogic,DC=local
      objectClass: top
      objectClass: person
      objectClass: organizationalPerson
      objectClass: user
      cn: Martin Warnes
      sn: Warnes
      givenName: Martin
      distinguishedName: CN=Martin Warnes,CN=Users,DC=marklogic,DC=local
      sAMAccountName: martin.warnes
      memberOf: CN=mladmins,CN=Users,DC=marklogic,DC=local
      sAMAccountType: 805306368
      userPrincipalName: martin.warnes@marklogic.local


      1. By default, Okta uses the email address as the username. However, MarkLogic usernames cannot contain certain special characters such as the @ symbol, so the sAMAccountName will be used to sign on to MarkLogic. This will be configured later during the Okta Application definition.
      2. One or more memberOf attributes should be assigned to the Active Directory user entry. These will be used to assign MarkLogic roles without the need to configure duplicate user entries in the MarkLogic security database.

      Step 1. Create a MarkLogic External Security definition

       An External Security definition is required to authenticate and authorize Okta users against a Microsoft Windows Active Directory server.

       Full details on configuring an external security definition can be found at:

       You should ensure that both “authentication” and “authorization” are set to “ldap”. For details on the remaining settings, consult your Active Directory administrator.

      Step 2. Assign Active Directory group membership to MarkLogic Roles

      In order to assign the correct roles and permissions to Okta users, you will need to map Active Directory memberOf attributes to MarkLogic roles.

      In this example, the Active Directory user entry martin.warnes belongs to the following group:

       memberOf: CN=mladmins,CN=Users,DC=marklogic,DC=local

      To ensure that all members of this group are assigned the MarkLogic admin role, you simply need to add the memberOf attribute value as an external name in the admin role.
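Conceptually, an external name works like a lookup from a directory group to a role. The following is only a minimal sketch of that idea, not MarkLogic's implementation: the `EXTERNAL_NAMES` table, the `roles_for` helper, and the case-insensitive comparison are all assumptions made for illustration.

```python
# Hypothetical illustration of external-name mapping; this is not MarkLogic code.
# Keys are role names, values are the external names configured on each role.
EXTERNAL_NAMES = {
    "admin": {"CN=mladmins,CN=Users,DC=marklogic,DC=local"},
}


def roles_for(member_of):
    """Return the sorted role names whose external names match any memberOf value."""
    # Assumption: distinguished names are compared case-insensitively.
    groups = {dn.lower() for dn in member_of}
    return sorted(
        role
        for role, names in EXTERNAL_NAMES.items()
        if groups & {name.lower() for name in names}
    )


print(roles_for(["CN=mladmins,CN=Users,DC=marklogic,DC=local"]))
```

A user whose memberOf values match no external name ends up with an empty role list in this sketch.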

      Step 3. Configure the MarkLogic AppServer

      For each App Server that you wish to integrate with Okta, you will need to set the “authentication” to “basic” and select the “external security” definition.

      As HTTP Basic Authentication is considered insecure, it is highly recommended that you secure the AppServer connection using HTTPS by configuring and selecting an “SSL certificate template”.

       Further details on configuring SSL for AppServers can be found at:

      Step 4. Install and Configure Okta AD Integration

      In order for Okta to authenticate your Active Directory users, you will first need to download and install the Okta AD Agent using the following instructions supplied by Okta

       Once installed your Okta Administrator will be able to complete the AD Agent configuration to select which AD users to import into Okta.

      Step 5. Create Okta MarkLogic application

      From the Okta Administrator select “Add Application”, search for the Basic Authentication template and click “Add”.

      On the “General Settings” tab, enter the MarkLogic AppServer URL, making sure to use HTTP or HTTPS depending on whether you have chosen to secure the listening port using TLS.

       Check the “Browser plugin auto-submit” option.

      On the Sign-On options panel select “Administrator sets username, password is the same as user’s Okta password”.

       For “Application username format” select “AD SAM Account name” from the drop-down selection.

      Once the Okta application is created, you should assign the users permitted to access the application.

      When assigning a user, you will be prompted to check the AD Credentials. At this point you should just check that Okta has selected the correct "sAMAccountName" value; the password will not be modifiable.

      Repeat Step 5. for each AppServer you wish to access via the Okta SSO portal.

      Step 6. Sign-on to Okta SSO Portal

      All assigned MarkLogic applications should be shown:

      Selecting one of the MarkLogic applications should automatically log you in using your AD Credentials stored within Okta.

      Additional Reading


      MarkLogic Server provides pre-commit and post-commit triggers. These triggers listen for certain events and then invoke a configured XQuery module after the event occurs. A common use case is to create a function in a library module that is shared among the different trigger modules called by various triggers. This article shows an example of how to create and use such a shared library module in a post-commit trigger.


      This example shows a simple post commit trigger that fires when a new document is created.

      1. For this example, create a database 'minidb' and set its triggers database to itself (minidb). Also, create another database 'minimodules' to store all modules.

      2. Using Query Console, create a trigger by evaluating the trigger definition XQuery below against the triggers database (minidb):


      3. Create a module by running the XQuery below against the modules database:


      4. Insert a library module into the modules database (minimodules):


      5. Now insert the sample document into the content database (minidb):


      6. Check the output in logs:

      After a new document with a URI prefixed with "/mini" is inserted into the content database, the TaskServer log file records the following message:

      2018-04-25 11:40:50.224 Info: *****Document with /mini root /mini/test-25-1-1.xml was created.*****2018-04-25T11:40:50+05:30

      NOTE: Module imports are relative to root.


      1. Creating and Managing Triggers With triggers.xqy -


      We are always looking for ways to understand and address performance issues within the product. To help with this, the following new diagnostic features have been added.

      New Trace Events in MarkLogic Server

      Some new diagnostic trace events have been added to MarkLogic Server:

      • Background Time Statistics - Background thread period and further processing timings are added to xdmp:host-status() output if this trace event is set.
      • Journal Lag 30 - A forest will now log a warning message if a frame takes more than 30 seconds to journal.
        • Please note that this limit can be adjusted down by setting the Journal Lag # trace event (where # is {1, 2, 5 or 10} seconds).
      • Canary Thread 10 - A new "canary thread" that does nothing but sleep for a second and check how long it was since it went to sleep.
        • It will log messages if the interval between sleeping has exceeded 10 seconds.
        • This can be adjusted down by setting the Canary Thread # trace event (where # is {1, 2, 5 or 10} seconds).
      • Canary Thread Histogram - Adding this trace event will cause MarkLogic to write to the ErrorLog a histogram of timings once every 10 minutes.
      • Forest Fast Query Lag 10 - By default, a forest will now warn if the fast query timestamp is lagging by more than 30 seconds.
        • This can be adjusted down by setting the Forest Fast Query Lag # (where # is {1, 2, 5, or 10} seconds).
        • Note that Warning level messages will be repeatedly logged at intervals while the lag limit is exceeded, with the time between logged messages doubling until it reaches 60 seconds.
        • There will be a final warning when the lag drops below the limit again as a way to bracket the period of lag.
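As a rough illustration of the repeat pattern described for the Forest Fast Query Lag warnings, the sketch below generates intervals that double until they reach 60 seconds. The starting interval of 1 second and the function name `warning_intervals` are assumptions; the article does not state what the first interval is.

```python
def warning_intervals(first_interval, cap=60, count=8):
    """Seconds between successive warnings: doubles each time, capped at `cap`."""
    intervals = []
    interval = first_interval
    for _ in range(count):
        intervals.append(min(interval, cap))
        interval = min(interval * 2, cap)
    return intervals


# Assumed starting interval of 1 second, purely for illustration.
print(warning_intervals(1))
```

With these assumed inputs, the gaps grow quickly at first and then settle at the 60 second cap for as long as the lag persists.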

      Examples of some of new statistics can be viewed in the Admin UI by going to the following URL in a browser (replacing hostname with the name of a node in your cluster and replacing TheDatabase with the name of the database that you would like to monitor):

      You can clear the forest insert and journal statistics by adding clear=true to your request; executing the following in a browser:

      These changes now feature in the current releases of both MarkLogic 7 and MarkLogic 8 and are available for download from our developer website:

      Hints for interpreting new diagnostic pages

      Here's some further detail on what the numbers mean.

      First, a note about how bucketing is performed on these diagnostic pages:

      For each operation category (e.g. Timestamp Wait, Semaphore, Disk), the wait time will fall into a range of values, which need to be bucketed.

      The bucketing algorithm starts with 1000 buckets to cover the whole range, but then collapses them into a small set of buckets that cover the whole span of values. The algorithm aims to

      1. End up with a small number of buckets

      2. Include extreme (outlier) values

      3. Spread out multiple values so that they are not too "bunched-up" and are therefore easier to interpret.
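To make the columns on these pages easier to read, here is a minimal sketch of how Count, %, Cumulative, and Cumulative % relate to one another once the buckets have been chosen. It does not reproduce the server's bucket-collapsing algorithm; the `summarize` helper is hypothetical, and the sample counts are taken from the Timestamp Wait example shown later in this article.

```python
def summarize(buckets):
    """Build table rows from (label, count) pairs ordered by wait time.

    Each row is (label, count, percent, cumulative, cumulative_percent),
    with percentages formatted to two decimal places.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        return []
    rows = []
    running = 0
    for label, count in buckets:
        running += count
        rows.append((
            label,
            count,
            f"{100 * count / total:.2f}",
            running,
            f"{100 * running / total:.2f}",
        ))
    return rows


for row in summarize([("0..9", 280), ("50..59", 1)]):
    print(*row)
```

A single slow operation shows up as its own bucket with a small percentage, which is why the outlier buckets are worth a close look.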

      Forest Journal Statistics (http://hostname:8001/forest-journal-statistics.xqy?database=TheDatabase)

      When we journal a frame, there is a sequence of operations.

      1. Wait on a semaphore to get access to the journal.
      2. Write to the journal buffer (possibly waiting for I/O if exceeding the 512k buffer)
      3. Send the frame to replica forests
      4. Send the frame to journal archive/database replica forests
      5. Release the semaphore so other threads can access the journal
      6. Wait for everything above to complete, if needed.
        1. If it's a synchronous op (e.g. prepare, commit, fast query timestamp), we wait for disk I/O
        2. If there are replica forests, we wait for them to acknowledge that they have journaled and replayed.
        3. If the journal archive or database replica is lagged, wait for it to no longer be lagged.

      We note the wall clock time before/after these various operations, so we can track how long they're taking.
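The idea of noting the time before and after each operation can be sketched as follows. This is an illustrative pattern only, not the server's code: the `PhaseTimer` class and the phase names are hypothetical, and `time.perf_counter` is used here as a convenient elapsed-time clock.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates elapsed milliseconds per named phase."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.elapsed_ms = {}

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            delta_ms = (self._clock() - start) * 1000.0
            self.elapsed_ms[name] = self.elapsed_ms.get(name, 0.0) + delta_ms


timer = PhaseTimer()
with timer.phase("semaphore"):
    pass  # stand-in for waiting on access to the journal
with timer.phase("disk"):
    time.sleep(0.01)  # stand-in for writing the frame
print(sorted(timer.elapsed_ms))
```

Each recorded duration would then be counted in the bucket that covers it, which is what the tables on the statistics pages summarize.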

      On the replica side, we also measure the "Journal Replay" time which would be inserting into the in-memory stand, committing, etc.

      Here's an example for a master and its replica.

      Forest F-1-1

      Timestamp Wait
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          280     99.64    280          99.64
      50..59        1       0.36     281          100.00

      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          816     100.00   816          100.00

      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          204     99.51    204          99.51
      10..19        1       0.49     205          100.00
      Local-Disk Replication
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          804     99.26    804          99.26
      10..119       6       0.74     810          100.00
      Journal Archive

      No Information

      Database Replication

      No Information

      Journal Total
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          810     99.26    810          99.26
      10..119       6       0.74     816          100.00
      Journal Replay

      No Information

      Forest F-1-1-R

      Timestamp Wait

      No Information

      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          811     100.00   811          100.00

      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          203     99.02    203          99.02
      10..59        2       0.98     205          100.00
      Local-Disk Replication

      No Information

      Journal Archive

      No Information

      Database Replication

      No Information

      Journal Total
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          809     99.75    809          99.75
      10..59        2       0.25     811          100.00
      Journal Replay
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          807     99.63    807          99.63
      10..119       3       0.37     810          100.00

      Forest Insert Statistics (http://hostname:8001/forest-insert-statistics.xqy?database=TheDatabase)

      When we're inserting a fragment into an in-memory stand, we also have a sequence of operations.

      1. Wait on a semaphore to get access to the in-memory stand.
      2. Wait on the insert throttle (e.g. if there are too many stands)
      3. Wait for the stand's journal semaphore, to serialize with the previous insert if needed.
      4. Release the stand insert semaphore.
      5. Journal the insert.
      6. Release the stand journal semaphore.
      7. Start the checkpoint task if the stand is full.

      As with the journal statistics, we note the wall clock time between these operations so we can track how long they're taking.

      On the replica side, the behavior is similar, although the journal and insert are in reverse order (we journal before inserting into the in-memory stand). If it's a database replica forest, we also have to regenerate the index information (Filled IPD).

      Here is an example for a master and its replica.

      Forest F-1-1

      Journal Throttle
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          606     100.00   606          100.00
      Insert Sem
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          604     99.67    604          99.67
      80..199       2       0.33     606          100.00
      Filled IPD

      No Information

      Stand Throttle
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          606     100.00   606          100.00
      Stand Insert
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          605     99.84    605          99.84
      100..109      1       0.17     606          100.00
      Journal Sem
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          604     99.67    604          99.67
      10..119       2       0.33     606          100.00

      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          603     99.50    603          99.50
      10..119       3       0.50     606          100.00

      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          597     98.51    597          98.51
      10..19        6       0.99     603          99.50
      200..229      3       0.50     606          100.00

      Forest F-1-1-R

      Journal Throttle

      No Information

      Insert Sem
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          606     100.00   606          100.00
      Filled IPD

      No Information

      Stand Throttle
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          606     100.00   606          100.00
      Stand Insert
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          605     99.84    605          99.84
      110..119      1       0.17     606          100.00
      Journal Sem
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          606     100.00   606          100.00

      No Information

      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          605     99.84    605          99.84
      110..119      1       0.17     606          100.00

      Further reading

      To learn more about diagnostic trace events, please refer to our documentation and Knowledgebase articles and note that some trace events may only log information if logging is set to debug:

      Data Hub Framework (DHF) allows you to model your data according to business entities, and Template Driven Extraction (TDE) allows you to view these entities through a relational or a semantic lens. With DHF, TDE templates are now created automatically so you can access data as rows using SQL or the Optic API (see this video for more information). The Template Driven Extraction feature has been available in MarkLogic for a while, whereas the DHF-generated TDE feature came out in DHF 4.

      Recently, we have been receiving reports of a couple of issues with the DHF-generated TDE feature, and we are currently investigating and resolving those issues. This feature is fully functional for the most part. However, while our investigation is in progress, if you are seeing issues with DHF-generated TDE, our recommendation is to treat the DHF-generated TDE as an example only and, based on it, create your own TDE in the meantime to handle the queries that you would like to run.

      Helpful resources:


      The jemalloc library is included with the MarkLogic Server install and is configured to be used by default. Its use is recommended because it has shown a performance boost over the default Linux malloc library.

      There have been cases where the library is not used even though it is configured. This article gives some possible steps to debug that.


      ErrorLog message on startup if jemalloc is not in use:

      Warning: Memory allocator is not jemalloc; check /etc/sysconfig/MarkLogic


      1) Make sure to use a superuser shell or sudo when running 'service MarkLogic restart'.

      2) Verify that the jemalloc library is present in the install directory (ie /opt/MarkLogic/lib/

      3) Has the /etc/sysconfig/MarkLogic configuration file been modified from the default?  Try setting the configuration file back to the default and restarting the server.

      4) Confirm that /etc/sysconfig/MarkLogic contains the following lines:
      # preload jemalloc
      if [ -e $MARKLOGIC_INSTALL_DIR/lib/ ]; then


      For more information on the jemalloc library, please review the article provided by Facebook Engineering


      This article compares JSON support in MarkLogic Server versions 6, 7, and 8, and the upgrade path for JSON in the database.

      How is native JSON different than the previous JSON support?

      Previous versions of MarkLogic Server provided XQuery APIs that converted between JSON and XML. This translation is lossy in the general case, meaning developers were forced to make compromises on either or both ends of the transformation. Even though the transformation was implemented in C++, it still added significant overhead to ingestion. All of these issues go away with JSON as a native document format. 

      How do I upgrade my JSON façade data to native JSON?

      For applications that use the previous JSON translation façade (for example: through the Java or REST Client APIs), MarkLogic 8 comes with sample migration scripts to convert JSON stored as XML into native JSON.

      The migration script will upgrade a database’s content and configuration from the XML format that was used in MarkLogic 6 and 7 to represent data to native JSON, specifically converting documents in the namespace.
      If you are using the MarkLogic 7 JSON support, you will also need to migrate your code to use the native JSON support. The resulting application code is expected to be more efficient, but application developers will need to make minor code changes.
      See also:
      Version 8 JSON incompatibilities


      MarkLogic Server provides a couple of useful techniques for keeping values in memory or resolving values without having to scan for documents on-disk.


      There are a few options available:

      1. cts:element-values performs a lexicon lookup so it's directly getting those values from the range indexes; you can add an options node and use the "map" parameter to get the call to return a map directly as per the documentation, which may give you what you need without having to do any further work.


      2. Storing a map as a server field is a popular approach and is widely used for storing data that needs to be accessed routinely by queries.

      Bear in mind that there is a catch to this approach as the map is not available to all nodes in a cluster - it is only available to the node responsible for evaluating the original request, so if you're using this technique in a clustered environment, the results may not be what is expected.

      Also note that if you're planning on storing a large number of maps in server fields on nodes in the cluster, it's important to make sure the hosts are provisioned with enough memory to accommodate these maps on top of group level caches and memory for query allocation, stands, range indexes, document retrieval and the like.



      3. xdmp:set only allows you to set a value for the life of a single query but this technique can be useful in some circumstances - especially in situations where you're interested in keeping track of certain values throughout the processing of a module or a function within a module.


      4. If you have a situation where you have a large number of complex queries - particularly ones where lexicon lookups or calls to range indexes won't resolve the data you need and where lots of documents will need to be retrieved from disk, you should consider using registered queries.


      Note that registered queries utilise the List Cache so, if you plan to adopt this method, we recommend careful testing to ensure your caches are sized sufficiently to suit the needs of your application.
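The clustering caveat noted for server fields in option 2 above can be pictured with a toy model in which every host keeps its own private copy of a cached map. This is only an illustrative sketch: the `Host` class, the `lookup_table` helper, and the field name "lookup" are all hypothetical and do not use any MarkLogic API.

```python
class Host:
    """Toy stand-in for one cluster node with its own in-memory fields."""

    def __init__(self, name):
        self.name = name
        self._fields = {}

    def get_field(self, key, default=None):
        return self._fields.get(key, default)

    def set_field(self, key, value):
        self._fields[key] = value
        return value


def lookup_table(host, build):
    """Return the map cached on this host, building it on first use."""
    cached = host.get_field("lookup")
    if cached is None:
        cached = host.set_field("lookup", build())
    return cached


builds = []


def build():
    builds.append(1)  # record each (expensive) rebuild
    return {"a": 1}


host_a, host_b = Host("a"), Host("b")
lookup_table(host_a, build)
lookup_table(host_a, build)  # served from host_a's memory
lookup_table(host_b, build)  # host_b has no copy, so it builds its own
print(len(builds))
```

In this model the map is built once per host rather than once per cluster, so both the memory cost and any staleness have to be considered per host.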


      This article explains how to kill a long-running query and describes the related timeout configurations.

      Problem Scenario

      At some point, we've all run into an inefficient long running query. What should we do if we don't want to wait for the query to complete? If we cancel the browser request, that would end the connection, but it wouldn't end the program invocation (called a "request") on the MarkLogic Server side. On the server side, that program invocation would continue to run until the execution is complete.

      Most of the time, this isn't really an issue. The server, of course, is multi-threaded, handling many concurrent transactions. We can just cancel the browser request, move on, and let the query finish when it finishes. However, sometimes it becomes necessary to free up server resources by killing the query and starting over. To do this, we need access to the Admin interface. 

      Sample Long running Query 

      Example only, please don't try this on any production machines!

      for $x in 1 to 1000000
      return collection()[1 + xdmp:random(1000)]
      This query is asking for 1,000,000 random documents, and will take a long time to execute. How can we cancel this query?

      How to Cancel/Kill the Query

      Go to the Administrative interface (at http://localhost:8001/ if you're running MarkLogic locally). At the top of the screen, you'll see a tab labeled "Status." Click that:


      This will take you to the "System Status" screen. This page reveals status information about hosts, databases, forests, and app servers. The App Server section is what we're concerned with. Scanning down the "Queries" column, we see that the "Admin" server is processing a query (namely, the one that generated the page we see). Everything looks okay so far. But just below that, we see that the "App-Services" server is just over 3 minutes into processing a query. That's our slow one. Query Console runs on the "App-Services" app server, which explains why we see it there. Go ahead and click the "App-Services" link:


      This takes us to the "App-Services" status page. So far, there's still no "cancel" button. One more click will reveal it ("show more"):


      We can now see an individual entry for the currently running query. Here we see it's called "eval.xqy"; that's the query module that Query Console invokes when you submit a query. If you were running your own query module (instead of using Query Console), then you would see its name here instead. To cancel the query, click the "[cancel]" link:


      One more click (on the confirmation page).


      This takes us back to the status page, where we see MarkLogic Server is in the process of canceling our query:


      The page above will continue to say "cancelling..." until it is refreshed, even though the query has already been killed and no longer exists.

      A quick refresh of the above page shows that the query is no longer present.



      What happens if you forget to cancel a query?

      MarkLogic will continue to execute the query until a time limit is reached, at which point the server will cancel the query for you. For example, here's what Query Console eventually returns if we don't bother to cancel the query:


      How long is this time limit?

      This depends on your server configuration. We can actually set the timeout in the query itself, using the xdmp:set-request-time-limit() function, but even that will be limited by your server's "max time limit."

      For example, on the "Configure" tab of my "App-Services" app server, you can see that the "default time limit" is set to 10 minutes (600 seconds), and the longest any query can allow itself to run (by setting its own request time limit) is one hour (3600 seconds):
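The relationship between the two settings can be sketched as below, using the 600 and 3600 second values from the example configuration. Note the assumption: this sketch simply caps an over-limit request at the maximum, and the article does not say whether the server caps or rejects such a value. The `effective_time_limit` helper is hypothetical.

```python
def effective_time_limit(requested=None, default_limit=600, max_limit=3600):
    """Seconds a request may run, under this sketch's capping assumption."""
    if requested is None:
        # The request did not set its own limit, so the default applies.
        return default_limit
    if requested <= 0:
        raise ValueError("time limit must be positive")
    # Assumption: a request cannot grant itself more than the max time limit.
    return min(requested, max_limit)


print(effective_time_limit())
print(effective_time_limit(120))
print(effective_time_limit(7200))
```

Whatever the exact server behaviour, the practical point is the same: a query cannot extend its own run time beyond the app server's "max time limit".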



      Update and delete operations can be performance intensive and have negative effects on search performance when done in a conventional way, where data is updated or deleted in-place. To avoid these performance impacts during update and delete operations, MarkLogic Server updates and deletes "lazily."

      In MarkLogic Server, when you delete a document, it is not removed from disk immediately as that document's fragments are instead marked as "obsolete." Marking a document as obsolete tags its fragments for later removal, and also hides its fragments from subsequent query results. Updates happen in a similar way, where instead of updating in-place, MarkLogic Server marks the old versions of the fragments in an old stand as "obsolete" for later deletion, while also creating new versions of those fragments in a new stand (initially an in-memory stand, which is eventually written down as a new on-disk stand).

      Eventually, merges occur to move any unchanged fragments from an old stand into a new stand. Old fragments marked obsolete are ultimately deleted after the merge creating the new stand finishes, where the old stands that were used as input into that merge are finally removed from disk. Merging is very important - this is the mechanism by which MarkLogic Server both frees up disk space and optimizes its on-disk data structures, as well as reduces the number of fragments evaluated during its queries and searches.

      While lazy deletion results in faster updates and deletes, be aware that residual impacts can be seen in terms of both disk space and query performance if merges are not done in a timely manner.
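The lifecycle described above can be illustrated with a small toy model. It is a sketch under simplifying assumptions, not a description of MarkLogic's storage engine: the `Forest` class is hypothetical, all writes land in a single stand, and a merge is triggered by hand.

```python
class Forest:
    """Toy model: each stand is a list of fragment records."""

    def __init__(self):
        self.stands = [[]]

    def _active(self):
        for stand in self.stands:
            for frag in stand:
                if not frag["obsolete"]:
                    yield frag

    def delete(self, uri):
        for frag in self._active():
            if frag["uri"] == uri:
                frag["obsolete"] = True  # marked for later removal, not erased

    def insert(self, uri, content):
        self.delete(uri)  # an update first obsoletes the old version
        self.stands[-1].append({"uri": uri, "content": content, "obsolete": False})

    def query(self):
        # Obsolete fragments are hidden from query results.
        return {frag["uri"]: frag["content"] for frag in self._active()}

    def fragment_count(self):
        return sum(len(stand) for stand in self.stands)

    def merge(self):
        # Copy surviving fragments into one new stand, then drop the old stands.
        self.stands = [[dict(frag) for frag in self._active()]]


forest = Forest()
forest.insert("/a.xml", "v1")
forest.insert("/a.xml", "v2")  # update: v1 becomes obsolete
forest.insert("/b.xml", "v1")
forest.delete("/b.xml")
print(forest.query(), forest.fragment_count())
forest.merge()
print(forest.query(), forest.fragment_count())
```

Query results are identical before and after the merge; only the number of fragments kept on disk changes, which is the residual cost the paragraph above warns about.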

      Further reading:

      Multi-Version Concurrency Control
      How do updates work in MarkLogic Server?
      ML Performance: Understanding System Resources


      MarkLogic Server allows you to configure MarkLogic Server so that users are authenticated using an external authentication protocol, such as Lightweight Directory Access Protocol (LDAP) or Kerberos. These external agents serve as centralized points of authentication or repositories for user information from which authorization decisions can be made. If, after following the configuration instructions in our documentation, the authentication does not work as expected, this article gives some additional debugging ideas.


      The following areas should be checked when your LDAP authentication is not working as expected:

      1. Verify that cyrus-sasl-md5 library is installed on MarkLogic Server node.

      2. Run the following LDAP search command to check if LDAP server is properly setup.

      ldapsearch -H ldap://{Your LDAP Server URI}:389 -x -s base

      a. Once you run the ldap search command, make sure digest-md5 is supported. 

      supportedSASLMechanisms: DIGEST-MD5

      b. Identify the correct LDAP Service name:

      e.g ldapServiceName: MLTEST1.LOCAL:dc1$@MLTEST1.LOCAL

      3. On Windows platforms, the services.keytab file is created using Active Directory Domain Services (AD DS) on a Windows server. If you are using Active Directory Domain Services (AD DS) on a computer that is running Windows Server 2008 or Windows Server 2008 R2, be sure that you have installed the hot fix described in

      Introduction: the issue

      MarkLogic performs Nested lookups on the LDAP Groups assigned to a user to determine which roles the user will be assigned. If the groups belong to multiple Active Directory Domains within a federated Active Directory Forest then MarkLogic user authorization could fail with a subordinate Referral error, as seen below:

      2019-07-30 13:27:23.002 Notice: XDMP-LDAP: ldap_search_s failed on ldap server ldap:// Referral (10)


      MarkLogic has been configured to connect to the Local Domain Controller LDAP ports 389 (LDAP) or 636 (LDAPs), however, a Local Domain Controller can only search domains to which it has access.


      A user is a member of the following groups which belong to two separate Active Directory domains, subA, and subC.

      Using a Local Domain Controller for subA for external authorization would result in a login failure when attempting to perform the nested group lookup for the domain subC

      member=CN=Group One,OU=OrgUnitAGroups,OU=OrgUnitA,DC=subA,DC=domain
      member=CN=Group Two,OU=OrgUnitAGroups,OU=OrgUnitA,DC=subA,DC=domain
      member=CN=Group Three,OU=OrgUnitCGroups,OU=OrgUnitC,DC=subC,DC=domain


      If you have multiple Active Directory Domains federated into an Active Directory forest, you should use the Global Catalog port 3268 (LDAP) or 3269 (LDAPS) to prevent failures when searching for group memberships that are defined in other domains.
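The port choice can be summarised in a small helper. This is just a sketch with a hypothetical function name: 389 and 636 are the domain controller ports mentioned above, and 3268 and 3269 are the standard Active Directory Global Catalog ports for LDAP and LDAPS.

```python
def ldap_port(multi_domain_forest, use_tls):
    """Pick the Active Directory port for external security lookups."""
    if multi_domain_forest:
        # Global Catalog: can resolve groups defined in other domains.
        return 3269 if use_tls else 3268
    # Local domain controller: only searches domains it has access to.
    return 636 if use_tls else 389


print(ldap_port(multi_domain_forest=True, use_tls=True))
```

The resulting port is what would go into the LDAP server URI of the external security configuration.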

      Optional workaround

      A large number of nested groups can slow down logins. If you do not need to rely on nested lookups to determine group membership for MarkLogic roles (i.e. all required groups are returned by the initial user search request), you should consider setting the "ldap nested lookup" parameter to false in the External Security configuration.

      Doing this would also prevent subordinate domain searches and allow you to continue to use an Active Directory Domain Controller instead of switching to the Global Catalog.

      Further reading


      A leap second, as defined by wikipedia is "a one-second adjustment that is occasionally applied to Coordinated Universal Time (UTC) in order to keep its time of day close to the mean solar time. Without such a correction, time reckoned by Earth's rotation drifts away from atomic time because of irregularities in the Earth's rate of rotation."  At the time of this writing, the next leap second to be inserted is on June 30, 2015 at 23:59:60 UTC.

      For systems that use the Network Time Protocol (NTP) to synchronize the network time across all the hosts in their MarkLogic cluster, the MarkLogic Server software is not impacted by the leap second (i.e. we expect everything to work fine at the MarkLogic layer).

      For systems where synchronizing the system clocks requires UTC time to be set backwards, this must be accounted for anywhere time-dependent data is stored. In this case, we recommend that our customers implement NTP in their environment.  Otherwise, the application layer will need to handle discontinuous time.

      Transactional Consistency

      The algorithm that MarkLogic Server uses to maintain transactional consistency of data is not wall clock dependent and, as such, is not affected by the leap second.

      Network Time Protocol (NTP)

      NTP generally works hard to avoid making time go backwards: clock readings are constrained to always increase, so every reading advances the NTP clock. NTP adjusts things gradually by slowing down or speeding up the clock, and makes discrete changes only when the time is off by a lot. A second is not a lot; an hour is a lot. Regardless of the leap second, adjustments for computer clock drift can easily be more than a second and happen frequently.

      When Time Goes Backwards

      Without NTP and left on their own, computer clocks are really not that accurate. If synchronization of the system clocks on the hosts of a MarkLogic cluster requires the clocks to be set backwards, then the application layer will need to account for and handle discontinuous date-time values in its data.

      The temporal feature was introduced in MarkLogic Server version 8.  If the system clock is adjusted backwards, there are conditions where temporal document inserts and updates will fail with an appropriate error code.  This is by design and expected.

      Our recommendation is to implement NTP on all hosts of a MarkLogic cluster to eliminate the need to handle discontinuous time at the application layer. 
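      Where NTP is not available and the application layer has to cope with discontinuous time itself, a first step is usually to detect where the clock went backwards. The following is a minimal sketch under that assumption; the function name and the sample readings are illustrative only and are not part of MarkLogic.

```python
from datetime import datetime


def find_backward_steps(timestamps):
    """Return (index, previous, current) for every reading earlier than its predecessor."""
    steps = []
    for i in range(1, len(timestamps)):
        previous, current = timestamps[i - 1], timestamps[i]
        if current < previous:
            steps.append((i, previous, current))
    return steps


# Illustrative readings around a leap second on a host whose clock was set back.
readings = [
    datetime(2015, 6, 30, 23, 59, 58),
    datetime(2015, 6, 30, 23, 59, 59),
    datetime(2015, 6, 30, 23, 59, 58),  # clock was set backwards here
    datetime(2015, 7, 1, 0, 0, 0),
]

for index, previous, current in find_backward_steps(readings):
    print(f"reading {index}: {current.isoformat()} is earlier than {previous.isoformat()}")
```

      How the application reacts to such a step (rejecting the write, clamping the value, or flagging the record) is a design decision that depends on the data.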

      Further Reading

      Red Hat article on the Leap Second

      Microsoft Support article on the Leap Second



      The internal mechanisms MarkLogic Server uses to implement security are query constraints. Lexicon search performance may be impacted by security query constraints.  If performed with admin credentials, lexicon searches will not be impacted by the security query constraints.


      Query time grows proportionately with the number of matches from a given search across a set of documents (not the actual number of documents in your database). The presence of security constraints will contribute a significantly larger number of matches than if the same lexicon search were performed with admin credentials.  In order to minimize the number of matches (and therefore query time) for a given lexicon search, you'll want to amp your lexicon searches to an admin user.

      For MarkLogic Server v6.0, the absolute maximum number of MarkLogic Servers in a Cluster is 256, but the optimum is around 64.


      MarkLogic recommends the default "ordered" option for Linux ext3 and ext4 file-systems.

      File system administrators in Linux are tempted to use the data=writeback option to achieve higher throughput from their file system, but this comes with the side effects of potential data corruption and data-security breaches. This article explains both file system options with respect to MarkLogic Server.


      Linux ext3 and ext4 file systems have a default data option of "ordered", which writes data to the main file system before committing the metadata to the journal.

      Both of these file systems go the extra mile to protect your files: with data=ordered, the data associated with the metadata is written by default, assuring file-system integrity to the application layer, which is essential for MarkLogic Server data integrity.


      Other journaled file systems like XFS and JFS journal only metadata. To make ext3 and ext4 behave like XFS and other journaled file systems, an administrator could set 'data=writeback' in their mount options.

      The 'data=writeback' mode does not preserve data ordering when writing to the disk, so commits to the journal may happen before the data is written to the file system. This method is faster because only the meta data is journaled, but is not good at protecting data integrity in the face of a system failure.

      If there is a crash between the time when metadata is committed to the journal and when data is written to disk, the post-recovery metadata can point to incomplete, partially written or incorrect data on disk, which can lead to corrupt data files. Additionally, data which was supposed to be overwritten in the file system could be exposed to users, resulting in a security risk.
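      To check a host for this setting, one option is to scan the mount table for ext3 or ext4 file systems mounted with data=writeback. The sketch below parses text in the /proc/mounts format (device, mount point, file system type, options); the function name and the sample mount table are illustrative assumptions.

```python
def writeback_mounts(mounts_text):
    """Return mount points of ext3/ext4 file systems mounted with data=writeback."""
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("ext3", "ext4") and "data=writeback" in options.split(","):
            flagged.append(mount_point)
    return flagged


# Illustrative mount table; on a real host the text could be read from /proc/mounts.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime,data=ordered 0 0
/dev/sdb1 /var/opt/MarkLogic ext4 rw,noatime,data=writeback 0 0
/dev/sdc1 /backup xfs rw,relatime 0 0
"""

print(writeback_mounts(SAMPLE_MOUNTS))
```

      Any mount point reported by this check that holds MarkLogic forest data would be a candidate for remounting with the default data=ordered option.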

      Linus Torvalds comments on 'data=writeback'

      "it makes things much smoother, since now the actual data is no longer in the critical path for any journal writes, but anybody who thinks that's a solution is just incompetent.  We might as well go back to ext2 then. If your data gets written out long after the metadata hit the disk, you are going to hit all kinds of bad issues if the machine ever goes down."



      Here we discuss management of temporal documents.


      In MarkLogic, a temporal document is managed as a series of versioned documents in a protected collection. The ‘original’ document inserted into the database is kept and never changes. Updates to the document are inserted as new documents with different valid and system times. A delete of the document is also inserted as a new document.

      In this way, a temporal document always retains knowledge of when the information was known in the real world and when it was recorded in the database.


      By default the normal xdmp:* document functions (e.g., xdmp:document-insert) are not permitted on temporal documents.

      The temporal module (temporal:* functions; see Temporal API) contains the functions used to insert, delete, and manage temporal documents.

      All temporal updates and deletes create new documents and in normal operations this is exactly what will be desired.

      See also the documentation: Managing Temporal Documents.

      Updates and deletes outside the temporal functions

      Note: normal use of the temporal feature will not require this sort of operation.

      The function temporal:collection-set-options can be used with the updates-admin-override option to specify that users with the admin role can change or delete temporal documents using non-temporal functions, such as xdmp:document-insert and xdmp:document-delete.

      For example, you may need to run a CoRB job or other administrative transform without updating the system dates on the documents; say, to change the values M/F to Male/Female.



      This article outlines different manual procedures for failing back after a failover event.

      What is failover?

      Failover in MarkLogic Server provides high availability for data nodes in the event of a d-node or forest-level failure. With failover enabled and configured, a host can go offline or become unresponsive and a MarkLogic Server cluster automatically and gracefully recovers from the outage, continuing to process queries without any immediate action needed by an administrator.

      MarkLogic offers support for two varieties of failover at the forest level, both of which provide a high-availability solution for data nodes.

      • Local-disk failover: Allows you to specify a forest on another host to serve as a replica forest which will take over in the event of the forest's host going offline. Multiple copies of the forest are kept on different nodes/filesystems in local-disk failover
      • Shared-disk failover: Allows you to specify alternate nodes within a cluster to host forests in the event of a forest's primary host going offline. A single copy of the forest is kept in shared-disk failover

      More information can be found at:

      How does failover work?

      The mechanism for how MarkLogic Server automatically fails over is described in our documentation at: How Failover Works

      When does failover occur?

      Scenarios that trigger a forest to failover are discussed in detail at:

      High level overview of failing back after a failover event

      If failover is configured, other hosts in the cluster automatically assume control of the forests (or replicas of the forests) of the failed host. However, when the failed host comes back up, control of those forests is not automatically transferred back to the original host. Manual intervention is required to fail back. If you have a failed over forest and want to fail back, you'll need to:

      • Restart either the forest or the current host of that forest, if using shared-disk failover
      • Restart the acting data forest or restart the host of that forest, if using local-disk failover. You should only do this if the original primary forest is in the sync replicating state, which indicates that it is up-to-date and ready to take over. Updates written to an acting primary forest must be synchronized to acting replicas, else those updates will be lost after failing back. After restarting the acting data forest, the intended primary data forest will automatically open on the intended primary host.

      Make sure the primary host is safely back online before attempting to fail back the forest.
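      The pre-check for local-disk failover described above can be expressed as a small guard. This is only a sketch: the function name, the forest names, and the way the states are obtained are hypothetical; the only fact taken from the text is that the intended primary forest should be in the "sync replicating" state before failing back.

```python
READY_STATE = "sync replicating"


def forests_not_ready(forest_states):
    """Return the names of intended primary forests that are not yet safe to fail back to.

    forest_states maps a forest name to its reported state (hypothetical input).
    """
    return sorted(name for name, state in forest_states.items() if state != READY_STATE)


# Illustrative states for three intended primary forests.
states = {
    "Documents-1": "sync replicating",
    "Documents-2": "async replicating",
    "Documents-3": "error",
}

blocked = forests_not_ready(states)
if blocked:
    print("Do not fail back yet; not synchronized:", ", ".join(blocked))
else:
    print("All intended primary forests are synchronized.")
```

      Running such a check before restarting the acting primary forest helps avoid failing back to a copy that would lose recent updates.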

      You can read more about this procedure at: Reverting a Failed Over Forest Back to the Primary Host

      Local disk failover procedure for attaching replicas directly to the database and clearing the intended primary forests error states

      If your primary data forests are in an error state, you'll need to clear those errors before failing back. This will usually require unmounting the primary forest copy, then directly mounting the local disk failover forest copy (or "LDF") to the relevant database. That procedure looks like:

      1. Make sure to turn OFF the rebalancer/reindexer at the database level - you don't want to unintentionally move data across forests when manually altering your database's forest topology.
      2. Break forest level replication between forests (i.e. - between the intended LDF replica (aka "acting primary") and intended primary forest currently in an error state)
      3. Detach the intended primary forest from database
      4. Attach the intended LDF replica (aka acting primary) forest directly to the database
      5. Make sure the database is online
      6. Delete the intended primary forest in error state
      7. Create a new forest with the same name as the now deleted intended primary forest
      8. Re-establish forest-level replication between the intended LDF replica (aka acting primary) forest and the newly created intended primary forest
      9. Let bulk replication repopulate the intended primary forest
      10. After bulk replication is finished, fail back as described above, so the intended primary forest is once again the acting primary forest, and the intended LDF replica is once again the acting LDF replica forest

      What is the procedure for failing forests back to the primary host in cases where the replicas are directly attached to the database?

      If intended LDF replicas are instead directly attached to the relevant database, forest or host restarts will not fail back correctly. Instead, you must rename the relevant forests:

      1. Forests that are currently attached to the database can be renamed - from their LDF replica naming scheme, to the desired primary forest naming scheme.
      2. Conversely, unattached primary forests can be renamed as LDF replicas, then configured as LDF replicas for the relevant database
      3. At this point, the server should detect that the current primary (which was previously the LDF replica) will have more recent data than the current LDF replica (which was previously the primary), which should then cause the server to populate the current LDF replica from the current primary

      What should be done in case of a disk failure?

      In the unlikely event a logical volume is lost, you'll want to restore from a copy of your data. That copy can take the form of:

      1. Local disk failover (LDF) replicas within the same cluster (assuming those copies are fully synchronized)
      2. Database Replication copies in your replication cluster (again, assuming those copies are fully synchronized)
      3. Backups, which might be missing updates made since the backup was taken

      You can restore from backups if you can afford to lose updates subsequent to that backup's timestamp and/or can re-apply whatever updates happened after the backup was taken.

      If you would instead prefer not to lose updates, then use LDF replicas to sync back to replacement primary forests created on new volumes, failing back manually when done. In the event that data was moved across forests in some way after the backup was taken, it would be best to use LDF replicas instead, which avoids the possibility of database corruption in the form of duplicate URIs.

      Database Replication will allow you to maintain copies of forests on databases in multiple MarkLogic Server clusters. Once the replica database in the replica cluster is fully synchronized with its primary database, you may break replication between the two and then go on to use the replica cluster/database as the primary. Note: To enable Database Replication, a license key that includes Database Replication is required. You'll also need to ensure that all hosts are:

      1. Running the same maintenance release of MarkLogic Server
      2. Using the same Operating System
      3. Correctly configured for Database Replication


      • It's possible to have multiple copies of your data in a MarkLogic Server deployment
      • Under normal operations, these copies are synchronized with one another
      • Should failover events occur in a cluster, or catastrophic events occur to an entire cluster, you can shift traffic to the available previously synchronized copies
      • Failing back is a manual operation
        • Make sure to re-synchronize copies that were offline with online copies
        • Shifting previously offline copies to acting primary before re-synchronization may result in data loss, as offline forests can overwrite updates previously committed to LDF forests serving as acting primaries while the intended primary forests were offline

      Related materials:


      When CPF is installed on a database, a number of new documents are created in the Triggers database associated with that database.

      This Knowledgebase article is designed to show you what CPF creates on install, in the event that you want to safely disable and remove it from your system.

      Getting started

      Below is a layout of all databases and their associated document counts with a clean install of MarkLogic 9.0-2:

      Database ID            Database Name   Document Count
      8723423541597683063    App-Services    14
      12316032390759111212   Modules         0
      1695527226691932315    Fab             0
      11723073009075196192   Security        1526
      15818912922008798974   Triggers        0
      5212638700134402198    Documents       0
      4320540002505594119    Extensions      0
      9023394855382775954    Last-Login      0
      11598847197347642387   Schemas         0
      12603105430027950215   Meters          48

      Adding CPF

      After installing CPF on the Documents database (with conversion enabled), we now see:

      Database ID            Database Name   Document Count
      8723423541597683063    App-Services    15
      12316032390759111212   Modules         0
      1695527226691932315    Fab             0
      11723073009075196192   Security        1526
      15818912922008798974   Triggers        39
      5212638700134402198    Documents       0
      4320540002505594119    Extensions      0
      9023394855382775954    Last-Login      0
      11598847197347642387   Schemas         0
      12603105430027950215   Meters          498
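
      The difference between the two listings can be computed mechanically. The sketch below simply subtracts the document counts shown above; the counts come from the two tables, while the function name is an illustrative choice.

```python
# Document counts per database, copied from the two listings above.
before = {
    "App-Services": 14, "Modules": 0, "Fab": 0, "Security": 1526, "Triggers": 0,
    "Documents": 0, "Extensions": 0, "Last-Login": 0, "Schemas": 0, "Meters": 48,
}
after = {
    "App-Services": 15, "Modules": 0, "Fab": 0, "Security": 1526, "Triggers": 39,
    "Documents": 0, "Extensions": 0, "Last-Login": 0, "Schemas": 0, "Meters": 498,
}


def count_changes(before_counts, after_counts):
    """Return {database: delta} for every database whose document count changed."""
    return {
        name: after_counts[name] - before_counts[name]
        for name in before_counts
        if after_counts[name] != before_counts[name]
    }


changes = count_changes(before, after)
print(changes)
```

      Only App-Services, Triggers, and Meters change; the Triggers delta is the set of documents attributable to the CPF install.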

      If we ignore Meters and App-Services, we can see that, by default, a CPF install creates 39 documents in the Triggers database:


      Files created by CPF

      One of these files is the CPF configuration.xml file

      One of these documents describes the default domain which is created when CPF is installed:

      Default Documents

      Of the 39 files created, the majority (28) share a common URI prefix. These files describe each of the standard conversion pipelines that ship with the server. These are:

      Alerting (spawn)
      Calais Entity Enrichment Sample
      Conversion Processing
      Conversion Processing (Basic)
      Data Harmony Enrichment Sample
      DocBook Conversion
      Document Filtering (Properties)
      Document Filtering (XHTML)
      Entity Enrichment
      Flexible Replication
      HTML Conversion
      Janya Entity Enrichment Sample
      MS Office Conversion
      Office OpenXML Extract
      PDF Conversion
      PDF Conversion (Image Batching)
      PDF Conversion (Page Layout with Reblocking)
      PDF Conversion (Page Layout, Image Batching)
      PDF Conversion (Page Layout)
      PDF Conversion (Paged Text, No Rendering)
      Schema Validation
      SRA NetOwl Entity Enrichment Sample
      Status Change Handling
      Temis Entity Enrichment Sample
      WordprocessingML Process
      XHTML Conversion Processing
      XInclude Processing

      Seven of the files are triggers - all of which are namespaced with the cpf prefix:

      cpf:any-property Default Documents
      cpf:create Default Documents
      cpf:delete Default Documents
      cpf:state Default Documents
      cpf:status Default Documents
      cpf:update Default Documents

      Removing the core files created when CPF was initially installed will disable it from further functioning in your environment.

      Scripting the removal of default CPF components

      This GitHub gist demonstrates a method for removing CPF configuration from a given database - in the example below, the "Triggers" database is specified:


      If you have an existing MarkLogic Server cluster running on EC2, there may be circumstances where you need to upgrade the existing AMI with the latest MarkLogic rpm available. You can also add a custom OS configuration.

      This article assumes that you have started your cluster using the CloudFormation templates with Managed Cluster feature provided by MarkLogic.

      To manually upgrade the MarkLogic AMI, follow these steps:

      1. Launch a new small MarkLogic instance from the AWS MarketPlace, based on the latest available image. For example, t2.small based on MarkLogic Developer 9 (BYOL). The instance should be launched only with the root OS EBS volume.
      Note: If you are planning to leverage the Pay-As-You-Go (PAYG) model, you must choose MarkLogic Essential Enterprise.
      a. Launch a MarkLogic instance from AWS MarketPlace, click Select and then click Continue:

      b. Choose instance type. For example, one of the smallest available, t2.small
      c. Configure instance details. For example, default VPC with a public IP for easy access
      d. Remove the second EBS data volume (/dev/sdf)
      e. Optional - Add Tags
      f. Configure Security Group - only SSH access is needed for the upgrade procedure
      g. Review and Launch
      Review step - AWS view:

      2. SSH into your new instance and switch the user to root in order to execute the commands in the following steps.

      $ sudo su -

      Note: As an option, you can also use "sudo ..." for each individual command.

      3. Stop MarkLogic and uninstall MarkLogic rpm:

      $ service MarkLogic stop
      $ rpm -e MarkLogic

      4. Update and patch the OS:

      $ yum -y update

      Note: If needed, restart the instance (for example, after an upgrade of the kernel or core libraries).
      Note: If you would like to add more custom options/configuration/..., they should be done between steps 4 and 5.

      5. Install the new MarkLogic rpm
      a. Upload the MarkLogic rpm to the instance (for example, via "scp" or S3).
      b. Install the rpm:

      $ yum install [<path_to_MarkLogic_RPM>]/[MarkLogic_RPM]

      Note: Do not start MarkLogic at any point during the AMI's preparation.

      6. Double check to be sure that the following files and log traces do not exist. If they do, they must be deleted.

      $ rm -f /var/local/mlcmd.conf
      $ rm -f /var/tmp/mlcmd.trace
      $ rm -f /tmp/

      7. Remove artifacts
      Note: Performing the following actions will remove the ability to ssh back into the baseline image. New credentials are applied to the AMI when launched as an instance. If you need to add/change something, mount the root drive to another instance to make changes.

      $ rm -f /root/.ssh/authorized_keys
      $ rm -f /home/ec2-user/.ssh/authorized_keys
      $ rm -f /home/ec2-user/.bash_history
      $ rm -rf /var/spool/mail/*
      $ rm -rf /tmp/userdata*
      $ rm -f [<path_to_MarkLogic_RPM>]/[MarkLogic_RPM]
      $ rm -f /root/.bash_history
      $ rm -rf /var/log/*
      $ sync

      8. Optional - Create an AMI from the stopped instance.[1] The AMI can be created at the end of step 7.

      $ init 0

      [1] For more information:

      At this point, your custom AMI should be ready and it can be used for your deployments. If you are using multiple AWS regions, you will have to copy the AMI as needed.

      Additional references:
      [2] Upgrading the MarkLogic AMI -


      A powerful new feature was added to MarkLogic 8 - the ability to build applications around a declarative HTTP rewriter. You can read more about MarkLogic Server's HTTP rewriter and some of the new features it provides in our documentation.

      This article will cover some basic tips for debugging applications that make use of this feature.

      Validating your rewriter rules (Using XML Schema)

      The rewriter adheres to an XML Schema. At runtime the rewriter is not validated against this schema; this is by design so that potentially minor errors don't risk taking your application offline. As a best practice, we recommend validating your rewriters manually every time you make a change. In order to do this, you can use MarkLogic Server or any other tool that supports XML validation (the schema is standard XSD 1.0).  If you want to view the schema, it's copied to Config/rewriter.xsd when you install the product.

      In order to validate from within MarkLogic using XQuery you can simply execute:

      validate { fn:doc("/path/to/your/rewriter.xml") }

      The above will validate the XML if your rewriter rules are stored in a database. If you're using the filesystem, you can use xdmp:document-get instead.

      Alternatively, you can copy / paste the XML body into Query Console and wrap it with a call to validate as below:

      validate { * Paste your rewriter rules here * }

      The above approach should work without any issue as long as there is no content in your rewriter XML that contains any XQuery reserved syntax.

      General rewriter debugging and tracing

      For simple "print" style debugging, you can manually add trace statements at any point where an eval rule is allowed, like this:

      <trace event="customevent">data</trace>

      Then enable diagnostics (in your group settings) and add "customevent"; your custom trace will now show up in ErrorLog.txt whenever that endpoint is accessed. To read more on the use of trace events in your applications, refer to this Knowledgebase article.

      Error code handling is also available:

      <error code="MYAPP-EXCEPTION" data1="value1" data2="... 

      You can also add ids to rules; these will be traced out, which may aid debugging:

      <match id="match-id-for-myregex" regex=".* ...

      Useful diagnostic trace events

      Note that additional trace events can generate a lot of data and may slow your application down, so make sure these are not left on in a production-critical environment.

      Below are some trace events you can use and a brief description of what each trace event does:

      Trace event                  Description
      Rewriter Parser              Details of the parsing of the rewriter XML file
      Rewriter Evaluator           Execution traces of rules as evaluated
      Rewriter Evaluator Verbose   Additional (more verbose) tracing
      Declarative Rewriter         Entry points into and out of the rewriter from the app server request handler
      Rewriter Print Rules         After parsing and validation of the rewriter, a full dump of the internal data structures that resulted

      Additional points to note

      Use of the "Evaluator" traces will write to the ErrorLog.txt on every request.

      The "Parser" trace event will only occur once or upon updating your rewriter.


      Prior to the 9.0-9 release, MarkLogic provided support for the Oracle JDK 8.  However, Oracle has recently announced the End of Public Updates for Java SE 8.

      What can we expect from MarkLogic?

      MarkLogic will support OpenJDK 9, OpenJDK 10 and OpenJDK 11 starting with MarkLogic Server 9.0-9 and associated products.

      These products include:

      From the 9.0-9 release onwards, we will no longer QA test our products with Oracle JDK.

      We will support Amazon Corretto JDK as part of our Amazon offerings.  Corretto meets the Java SE standard and is certified as compliant by AWS using the Java Technical Compatibility Kit.

      The latest version of MarkLogic Server is available to download from:

      JDK Requirements for Data Hub Framework (DHF) Users

      Requirements are discussed in further detail in the DHF documentation; however, it's important to note that versions of DHF prior to the 5.2 release require Java 8.


      The default configuration of MarkLogic Application Servers is not vulnerable to the FREAK SSL attack.

      What is the FREAK SSL attack?

      On Tuesday, 2015/03/03, researchers from the miTLS team (a joint project between Inria and Microsoft Research) disclosed a new SSL/TLS vulnerability, the FREAK SSL attack (CVE-2015-0204). The vulnerability allows attackers to intercept HTTPS connections between vulnerable clients and servers and force them to use ‘export-grade’ cryptography, which can then be decrypted or altered.

      Read more about the FREAK SSL attack.

      Testing a webserver

      You can verify whether a webserver is attackable by the FREAK attack with this free SSL vulnerability checker.


      MarkLogic Server uses FIPS-capable OpenSSL to implement the Secure Sockets Layer (SSL v3) and Transport Layer Security (TLS v1) protocols. When you install MarkLogic Server, FIPS mode is enabled by default and SSL RSA keys are generated using secure FIPS 140-2 cryptography. This implementation disallows weak ciphers and uses only FIPS 140-2 approved cryptographic functions. Read more about OpenSSL FIPS mode in MarkLogic Server, and how to configure it.

      As long as FIPS mode was not explicitly disabled, MarkLogic Application Servers are not vulnerable to the FREAK SSL attack. 


      Eliminating the vulnerability for all configurations requires an update to the OpenSSL library. MarkLogic Server continually updates its bundled version of the OpenSSL library, so every MarkLogic Server maintenance release published after the discovery of this vulnerability will include an OpenSSL version that is not vulnerable to the FREAK attack.


      As long as FIPS mode is enabled, which is the default configuration, MarkLogic Application Servers are not vulnerable to the FREAK SSL attack.



      MarkLogic 9 introduces Certificate based User Authentication, which allows users to log in to MarkLogic Server without being required to enter a user name and password. In previous versions, certificates were only used to restrict client access to MarkLogic Server in conjunction with the Digest/Basic user authentication schemes. In addition to Certificate based User Authentication using internal user and external name verification, MarkLogic 9 also permits authenticating and authorizing user certificates against an LDAP or Active Directory database to permit access based on MarkLogic roles and LDAP group membership. By using this method of authentication and authorization, a site is able to manage all user access externally without the need to maintain a separate set of users within the MarkLogic security database.

      This document expands on the concepts and configuration examples described in the associated "MarkLogic Certificate based User Authentication" knowledge base article and shows the additional steps required to configure MarkLogic to authorize a user certificate against an LDAP or Active Directory server. It is highly recommended that you familiarize yourself with the previous article, as it covers in more detail the steps required to set up the MarkLogic App Server so that TLS Client Authentication is configured correctly to request and verify the certificates that may be presented by the user.

      Creating the External Security definition

      To authorize users presenting a certificate you should first create a new External Security definition selecting “Certificate” for authentication and LDAP for authorization.


      Next, configure the LDAP server entry.



      • Unlike standard user authorization, when MarkLogic searches for the user certificate it performs a base Object search using the full certificate distinguished name rather than a sub-tree search off the “ldap base”. The MarkLogic UI currently requires an entry for the “ldap base” even though it is not used, so you will need to enter a dummy value to satisfy UI verification.
      • When performing the LDAP search, MarkLogic will request the “ldap attribute” value to use when creating the temporary userid. Care should be taken when selecting this value to ensure that the value is unique for all possible Certificate DN’s that may be presented.
      • Ensure that the “ldap default user” has the required permissions to search for the Certificate within the LDAP or Active Directory server and return the required attributes.
      • MarkLogic uses the “memberOf” and “member” attributes to return Group and Group of Group membership. If your LDAP or Active Directory server uses different attributes, such as “isMemberOf”, you can override them in the “memberOf” and “member” attribute fields.

      Configuring the App Server

      Configure the App Server to use “certificate” authentication, set “Internal Security” to false and select the external security definition created above.


      Enable TLS Client Authentication and configure the SSL Client Certificate Authorities that you will accept as signers of the user certificates. Any certificate presented that is not signed by one of the specified CAs will be rejected.



      For more details on configuring the CA certificates required for certificate based authentication, please refer to the knowledge base article "MarkLogic Certificate based User Authentication".

      Configure MarkLogic Security Roles

      For each role specify one or more external names that match the “memberOf” attribute returned for the Certificate DN.

      To confirm that users are being authorized to the MarkLogic App Server correctly, connect using your browser or a command line tool such as “curl”.

      MacPro-4505:~ $ curl -k --cert ./mluser1.p12:password https://localhost:8013
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "">
      <html xml:lang="en" xmlns="">
      <title>Welcome to the MarkLogic Test page.</title>
      <body><p>This application is running on MarkLogic Server version 9.0-1.1</p></body>

      Within the AppServer AccessLog, you should see a mapping for a new temporary userid to the expected role.

      External User(mluser1) is Mapped to Temp User(mluser1) with Role(s): mladmin
      ::1 - mluser1 [18/Jul/2017:16:07:05 +0100] "GET / HTTP/1.1" 200 347 - "curl/7.51.0"


      If a user is not able to connect using their certificate, the first thing to check is if the Certificate Distinguished Name (DN) can be found in the LDAP or Active Directory database and if it contains the required userid and memberOf attributes.

      Using a tool such as OpenSSL, determine the correct Certificate Subject DN, e.g.

      MacPro-4505:~ $ openssl x509 -in mluser1.pem -text
      Version: 3 (0x2)
      Serial Number: 1497030421 (0x593adf15)
      Signature Algorithm: sha256WithRSAEncryption
      Issuer: CN=User Signing Authority, O=MarkLogic, OU=Support
      Not Before: Jun 9 17:47:13 2017 GMT
      Not After : Jun 9 17:47:13 2018 GMT
      Subject: CN=mluser1, OU=Users, DC=MarkLogic, DC=Local

      Next, using an LDAP lookup tool such as “ldapsearch” (or "ldp.exe" on Microsoft Windows), perform a base object search for the Certificate DN, requesting the LDAP user and memberOf attributes (with the entries matching your LDAP External Security settings).

      If either the userid or the memberOf attribute is missing, access will be denied.

      MacPro-4505:~ $ ldapsearch -H ldap:// -x -D "cn=manager,dc=marklogic,dc=local" -W -s base -b "cn=mluser1,ou=Users,dc=MarkLogic,dc=Local" "memberOf" "cn"
      # extended LDIF
      # LDAPv3
      # base <cn=mluser1,ou=Users,dc=MarkLogic,dc=Local> with scope baseObject
      # filter: (objectclass=*)
      # requesting: memberOf uid
      # mluser1, Users, MarkLogic.Local
      dn: cn=mluser1,ou=Users,dc=MarkLogic,dc=Local
      uid: mluser1
      memberOf: cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local
      # search result
      search: 2
      result: 0 Success

      If MarkLogic is able to locate the certificate successfully and return the required attributes, check whether the external names in the security role match (case-sensitive) the “memberOf” attribute returned by the LDAP search.
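
The case-sensitive comparison between role external names and the returned “memberOf” values can be sketched in Python as follows. This is only an illustration of the matching rule, not MarkLogic's implementation: the function name `roles_for_member_of` and the role-to-external-name mapping are hypothetical, and the sample values are borrowed from the ldapsearch output and access log shown in this article.

```python
def roles_for_member_of(member_of_values, role_external_names):
    """Return the roles whose external names exactly match a memberOf value.

    member_of_values: memberOf strings returned by the LDAP search.
    role_external_names: hypothetical mapping of role name -> external names.
    The comparison is deliberately case-sensitive, as described in the text.
    """
    returned = set(member_of_values)
    return sorted(
        role
        for role, external_names in role_external_names.items()
        if returned.intersection(external_names)
    )


# Value taken from the example LDAP search; the mapping itself is an assumption.
member_of = ["cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local"]
roles = {"mladmin": ["cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local"]}
print(roles_for_member_of(member_of, roles))

# A difference in case alone is enough to prevent the match.
roles_wrong_case = {"mladmin": ["cn=appadmin,ou=groups,dc=marklogic,dc=local"]}
print(roles_for_member_of(member_of, roles_wrong_case))
```

If the second call mirrors your situation (an empty result), correct the external name on the role so that it matches the LDAP value character for character.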

      The following XQuery can be used to show all the external names assigned to a specific role. 

      (: execute this against the security database :)
      xquery version "1.0-ml";
      import module namespace sec = ""
          at "/MarkLogic/security.xqy";



      If MarkLogic is still not able to authenticate users, a packet capture tool such as Wireshark is very useful for checking whether MarkLogic is able to contact the LDAP or Active Directory server and is receiving the expected successful Admin bind and search for the Certificate DN.

      The following example trace shows a successful BIND using the LDAP Default user followed by a successful search for the Certificate DN.




      MarkLogic 9 introduces Certificate based User Authentication, which allows users to log into MarkLogic Server without being required to enter a user name and password. In previous versions, certificates were only used to restrict client access to MarkLogic Server in combination with the Digest/Basic user authentication scheme. Certificate based User Authentication can be configured using either Internal User or External Name based user configurations.

      Certificate Authentication: Internal User vs External Name based Authentication:

      The difference between Internal User and External Name based authentication lies in how the certificate is matched to a user: either the user named in the Certificate CN field (demoUser1 in our example) exists in the MarkLogic Security Database (Internal User), or the whole Certificate Subject field, taken as a DN, is mapped as an External Name value on an existing user.

      User Certificate Example:

      A few common steps and examples are listed below for clarity. For our example setup, the certificate presented by the App Server user (demoUser1) is as follows.

      $ openssl x509 -in UserCert.pem -text -noout
              Version: 1 (0x0)
              Serial Number: 7 (0x7)
          Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=NY, L=New York, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic DemoCA
                  Not Before: Jul 11 02:58:24 2017 GMT
                  Not After : Aug 27 02:58:24 2019 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic Corporation, OU=Engineering, CN=demoUser1
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (1024 bit)
                      Exponent: 65537 (0x10001)
          Signature Algorithm: sha1WithRSAEncryption

      CA Certificate (User Cert Signer) Import from Admin GUI

      In order for MarkLogic Server to accept the certificate presented by a user, the Certificate Authority (CA) certificate used to sign the User Certificate must be installed into MarkLogic. We can install the CA Certificate (below) used to sign the demoUser1 certificate using the Admin GUI->Configure->Security->Certificate Authority Import tab.

      $ openssl x509 -in CACert.pem -text -noout
              Version: 3 (0x2)
              Serial Number: 9774683164744115905 (0x87a6a68cc29066c1)
          Signature Algorithm: sha256WithRSAEncryption
              Issuer: C=US, ST=NY, L=New York, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic DemoCA
                  Not Before: Jul 11 02:53:18 2017 GMT
                  Not After : Jul  6 02:53:18 2037 GMT
              Subject: C=US, ST=NY, L=New York, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic DemoCA
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (4096 bit)
                      Exponent: 65537 (0x10001)
              X509v3 extensions:
                  X509v3 Subject Key Identifier:
                  X509v3 Authority Key Identifier:
                  X509v3 Basic Constraints: critical
                  X509v3 Key Usage: critical
                      Digital Signature, Certificate Sign, CRL Sign
          Signature Algorithm: sha256WithRSAEncryption

      CA Certificate Import into MarkLogic from Query Console

      We can also import the above Certificate Authority with the XQuery function pki:insert-trusted-certificates to load the Trusted CA into MarkLogic. The sample Query Console code below demonstrates this process.

      (Please ensure this query is executed against the Security database)

      Certificate Template & Template CA import into Client (Browser/SSL Client)

      To enable an SSL App Server, we will either

      1) Create a Certificate Template to utilize a Self Signed Certificate.

      or, 2) Import a pre-signed Certificate into MarkLogic

      In both of the above cases, we will need to import the CA used to sign the Certificate used by the MarkLogic SSL App Server into the Client Browser/SSL Client.

      Importing a Self Signed Certificate Authority into Windows

      Once the template is created, we will link it with our App Server to enable SSL on that App Server.

      Certificate Authentication: CN as Internal User vs External Name based Internal User

      The difference between the two lies in whether the Certificate CN field user (demoUser1 in our example) exists in the MarkLogic Security Database as an Internal User, or whether the user retrieved from the Certificate Subject field is mapped as an External Name to an existing user.
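
To make the two lookup keys concrete, here is a minimal Python sketch that takes the Subject line from the example user certificate and derives the CN value (the name matched against an Internal User) while keeping the whole Subject (the value mapped as an External Name). The helper names `parse_subject` and `common_name` are hypothetical, the parsing assumes no attribute value contains ", ", and the exact string format MarkLogic expects for an External Name is not covered here.

```python
def parse_subject(subject):
    """Split an OpenSSL-style subject line into (attribute, value) pairs.

    Simplification (assumption): no attribute value contains ", ".
    """
    pairs = []
    for part in subject.split(", "):
        key, _, value = part.partition("=")
        pairs.append((key.strip(), value.strip()))
    return pairs


def common_name(subject):
    """Return the CN value, or None when the subject has no CN attribute."""
    for key, value in parse_subject(subject):
        if key == "CN":
            return value
    return None


# Subject copied from the example user certificate in this article.
subject = ("C=US, ST=NJ, L=Princeton, O=MarkLogic Corporation, "
           "OU=Engineering, CN=demoUser1")

print("Internal User lookup key:", common_name(subject))
print("External Name lookup key:", subject)
```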

      1.) Certificate Authentication: Certificate CN field value as MarkLogic Security Database Internal User

      Steps to configure Certificate based User Authentication for our user demoUser1 as a MarkLogic Internal User:

      a.) Create the user "demoUser1" with the necessary roles in MarkLogic Security (Internal User).


      b.) On the App Server page, set the Authentication scheme to "Certificate" with Internal Security set to "true". Unless you also want some users authenticated as External Users, leave the External Security object set to "none".


      c.) On the App Server, also select the CA that will be used to sign Client/User Certificates as an accepted Certificate Authority (please see the earlier CA Certificate section for our example).


      Once configured, accessing the above App Server from a browser with the User Certificate (demoUser1) installed will log into MarkLogic as the internal user demoUser1 (Note: the Internal User will also need the necessary roles assigned to access resources as needed).

      2.) Certificate Authentication: User Certificate Subject field value as External Name for Internal User

      Steps to configure Certificate based User Authentication for our user demoUser1 as a MarkLogic External Name for the Internal User "newUser1":

      a.) Create the user "newUser1" with the necessary roles in MarkLogic Security (Internal User), and configure the User Certificate Subject field as an External Name on that user.


      b.) Create an External Security object with Certificate based Authentication.


      c.) On the External Security object configuration itself, select the CA that will be used to sign Client/User Certificates as an accepted Certificate Authority (please see the earlier CA Certificate section for our example).

      Please note: this configuration is different from configuring the Client CA on the App Server (required for Internal User).


      d.) For External Name (Certificate Subject field) based linkage to the Internal User, the App Server needs to point to our External Security object.



      MarkLogic may fail to start, with an XDMP-ENCODING error, Initialization: XDMP-ENCODING: (err:XQST0087) Unsupported character encoding: ascii.  This is caused by a mismatch in the Linux Locale character set, and the UTF-8 character set required by MarkLogic.


      There are two primary causes of this error. The first is using service instead of systemctl to start MarkLogic on some Linux distros.  The second is related to the Linux language settings.

      Starting MarkLogic Service

      On an Azure MarkLogic VM, as well as some more recent Linux distros, you must use systemctl, and not service to start MarkLogic. To start the service, use the following command:

      • sudo systemctl start MarkLogic

      Linux Language Settings

      This issue occurs when the Linux locale LANG setting does not use the UTF-8 character set.  It can be resolved by setting the locale to "en_US.UTF-8".  This should be done for the root user for default installations of MarkLogic.  To change the system-wide locale settings, /etc/locale.conf needs to be modified, which can be done using the localectl command.

      • sudo localectl set-locale LANG=en_US.UTF-8

      If MarkLogic is configured to run as a non-root user, then the locale can be set in that user's environment using the $HOME/.i18n file.  If the file does not exist, please create it and ensure it has the following:

      • export LANG="en_US.UTF-8"

      If that does not resolve the issue in the user environment, then you may need to look at setting LC_CTYPE, or LC_ALL for the locale.

      • LC_CTYPE will override the character set part of the LANG setting, but will not change other locale settings.
      • LC_ALL will override both LC_CTYPE and all locale configurations of the LANG setting.
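
The override order described above (LC_ALL first, then LC_CTYPE, then LANG) can be checked with a short Python sketch. It is only a diagnostic aid written for this article: the function names `effective_ctype` and `is_utf8_locale` are hypothetical, and the code inspects environment-style values rather than asking MarkLogic anything.

```python
import os


def effective_ctype(env):
    """Return the locale value that governs the character set.

    Follows the usual POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    Unset or empty variables are skipped; "C" is the fallback.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def is_utf8_locale(value):
    """True when the codeset part of a locale such as en_US.UTF-8 is UTF-8."""
    _, _, codeset = value.partition(".")
    codeset = codeset.split("@")[0]  # drop a modifier such as @euro
    return codeset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    current = effective_ctype(os.environ)
    print(current, "->", "UTF-8" if is_utf8_locale(current) else "not UTF-8")
```

Run it as the same user, and in the same way, that starts MarkLogic, since a different login environment may report different values.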


      Overlarge workloads, underprovisioned environments, or a combination of the two often result in false failovers - where MarkLogic Server will perceive an overloaded node as unavailable. Failover events redistribute the affected node’s traffic to the remaining nodes in the cluster. False failover events, unfortunately, redistribute an overloaded node’s workload to the remaining nodes in the cluster, which are likely similarly overloaded and are now fewer in number. While it’s possible to mitigate this scenario in the short term by allowing more time for nodes to talk to one another, long term correction requires throttling of workloads, increasing the environment’s hardware provisioning, or a combination of the two.

      What does failover look like in MarkLogic Server?
      High availability systems require continuity within a cluster. MarkLogic Server delivers high availability by providing fault tolerance - if a node in a MarkLogic cluster fails, other nodes automatically pick up the workload so that the data stored in forests is always available. 

      More specifically, failover in MarkLogic Server is designed to address data node (“d-node”) or forest-level failure. D-node failures can include operating system crashes, MarkLogic Server restarts, power failures, or persistent system failures (hardware failures, for example). A forest-level failure is any disk I/O or other failure that results in an error state on the forest. 

      Failover in MarkLogic Server is "hot" in the sense that switchover occurs immediately to failover hosts already running in the same cluster, with no node restarts required. Failing back from a failover host to the primary host, however, needs to be done manually and requires a node restart.

      When a node is perceived as no longer communicating with the rest of the cluster, and a quorum of greater than 50% of the nodes in the cluster vote to remove the affected node, then a failover event will occur automatically. A node is defined to no longer be communicating with the rest of the cluster when that node fails to respond to cluster wide heartbeats within the defined host timeout.
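
The "greater than 50%" voting rule is simple arithmetic, sketched below in Python. The function names are hypothetical and the sketch only models the majority threshold described above; it does not model heartbeats or timeouts.

```python
def has_quorum(total_nodes, votes):
    """True when strictly more than 50% of the cluster's nodes agree."""
    return 2 * votes > total_nodes


def min_votes_for_quorum(total_nodes):
    """Smallest number of votes that is strictly more than half the cluster."""
    return total_nodes // 2 + 1


for size in (2, 3, 4, 5):
    print(size, "nodes need", min_votes_for_quorum(size), "votes")
```

Note that in a two-node cluster a single surviving node holds exactly 50% of the votes, which is not greater than 50%, so it cannot form a quorum on its own.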

      What does false failover look like in MarkLogic Server?
      False failover events in MarkLogic Server occur when a node is present and working, but so overloaded that it can no longer respond to cluster wide heartbeats within the specified host timeout. In other words, during false failover events the affected node is so busy that it is unable to communicate its status to the other nodes in the cluster, and consequently unable to prevent the other nodes from voting to remove it from the cluster.

      A node or cluster can be busy for many reasons. One that is often overlooked is the infrastructure, especially when virtualization is involved. Virtualization lets you get more out of your resources by allowing VMs to share them, under the assumption that not all systems will need their assigned resources at the same time. However, if multiple VMs are under load simultaneously, they can outstrip the available physical resources because more than 100% of those resources have been assigned to the VMs, causing what is called "resource starvation".

      What should I do about false failover events in MarkLogic Server?
      Recall that a node is voted out when it can no longer respond to the rest of the cluster within the specified host timeout. It might be possible to mitigate false failovers in the short term by temporarily increasing the environment’s XDQP and host timeouts. Larger timeouts would give all the nodes in the cluster more time to respond to clusterwide heartbeats, which under heavy load should decrease the frequency of false failover events. That said - DO NOT get in the habit of simply increasing your timeouts to larger and larger values. Increasing timeout to avoid false failovers is, at best, a temporary/short term tactic.

      Long term correction of false failover events requires better alignment between your system's workloads and its hardware provisioning. You could, for example, reduce the workload, or spread the same workload over more time, or increase your system’s hardware provisioning. All of these tactics would free up the affected nodes to respond to the clusterwide heartbeat in a more timely manner, thereby avoiding false failover events. You can read more about aligning your workloads and hardware footprint at:

      1. MarkLogic Performance: Understanding System Resources
      2. Performance Issues in MarkLogic Server: what they look like - and what you should do about them


      MarkLogic Server is optimized for query performance - if you're coming from a relational database background, you might be surprised by how much storage and storage bandwidth might be used. To better understand this behavior, it's important to recall the following:

      Speed over storage savings - While it makes sense to minimize storage footprint from a storage utilization perspective, MarkLogic Server trades space for time to take advantage of rapidly falling storage prices.

      Lazy Deletes - To better prioritize query performance, in MarkLogic Server record deletions happen in the form of "lazy deletes" where the record (or "document") is first marked as "obsolete" and consequently hidden from query results. The work of actually deleting any one record is deferred for a later time, when multiple obsolete documents can be removed and your remaining data optimized all at the same time and in bulk during a merge operation.

      Index on ingest - MarkLogic Server indexes documents as they're ingested. If your data model and index configuration are where they need to be, that means your data is ready to be queried as soon as it's in a MarkLogic Server database. If your index configuration isn't quite where you want it, however, revising it means reindexing your entire database, creating lots of obsolete documents and potentially resulting in multiple large merge operations. This is why it's always better in MarkLogic Server to optimize your index settings in smaller environments before propagating those index settings to your bigger environments, and why you'll want to do fewer, bigger index configuration changes instead of many small index configuration changes. Each index configuration change - regardless of size - will trigger a reindex, so you'll want to minimize the number of reindexes you need to perform instead of minimizing the number of changes in any one reindex.

        In addition to reindexing, other aspects of MarkLogic Server that take up significant storage bandwidth include:

        • Rebalancing - which redistributes your data across your database
        • Local disk failover/database replication - both make copies of your data, and those copies need their own resources
        • Backup/restore - backup is making a copy of your data, and restore is effectively a mass update of your data
        • Mass updates of existing documents - Because of the way updates are performed in MarkLogic Server, updating a large number of existing records will create a large number of obsolete documents, and consequently result in lots of large merge operations. To help reduce performance overhead, and if you have no need to preserve attributes of your existing data, you might want to consider simply reloading data into an empty database instead (which would avoid the creation of obsolete documents and the consequent merges)


        Understanding System Resources
        Understanding MarkLogic Minimum Disk Space Requirements
        MarkLogic - Lazy Deletes
        Mass Updates - "node-replace" vs "document-insert"


        A MarkLogic cluster is a group of inter-connected individual machines (often called “nodes” or “hosts”) that work together to perform computationally intensive tasks. Clustering offers scalability and high-availability by avoiding single-points of failure. This knowledgebase article contains tips and best practices around clustering, especially in the context of scaling out.

        How many nodes should I have in a cluster?

        If you need high-availability, there should be a minimum of three nodes in a cluster to satisfy quorum requirements.

        Anything special about deploying on AWS?

        Quorum requirements hold true even in a cloud environment where you have Availability Zones (or AZs). In addition to possible node failure, you can also defend against possible AZ failure by splitting your d-Nodes and e-Nodes evenly across three availability zones.

        Load distribution after failover events

        If a d-node experiences a failover event, the remaining d-nodes pick up its workload so that the data stored in its forests remains available.

        Failover forest topology is an important factor in both high-availability and load-distribution within a cluster. Consider the example below of a 3-node cluster where each node has two data forests (dfs) and two local disk-failover forests (ldfs):

        • Case 1: In the event of a fail over, if both dfs (df1.1 and df1.2) from node1 fail over to node2, the load on node2 would double (100% to 200%, where node2 would now be responsible for its own two forests - df2.1 and df2.2 - as well as the additional two forests from node1 - ldf1.1 and ldf1.2)
        • Case 2: In the event of a fail over, if we instead set up the replica forests in such a way that when node1 goes down, df1.1 would fail over to node2 and df1.2 would fail over to node3, then the load increase would be reduced per node. Instead of one node going from 100% to 200% load, two nodes would instead go from 100% to 150%, where node2 is now responsible for its two original forests - df2.1 and df2.2, plus one of node1's failover forests (ldf1.1), and node3 would also now be responsible for its two original forests - df3.1 and df3.2, plus one of node1's failover forests (ldf1.2)
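
The two cases can be reproduced with a small Python calculation. This is a rough model written for this article, assuming that every forest carries an equal share of the load; the function name `load_after_failure` and the dictionary layout are hypothetical.

```python
def load_after_failure(data_forests, ldf_host, failed_node):
    """Estimate per-node load (percent of normal) after one node fails.

    data_forests: node -> list of data forests it normally serves.
    ldf_host: data forest -> node hosting its local disk failover copy.
    Assumption: every forest carries an equal share of the load.
    """
    loads = {}
    for node, own_forests in data_forests.items():
        if node == failed_node:
            continue
        inherited = [f for f in data_forests[failed_node] if ldf_host[f] == node]
        loads[node] = 100 * (len(own_forests) + len(inherited)) // len(own_forests)
    return loads


forests = {
    "node1": ["df1.1", "df1.2"],
    "node2": ["df2.1", "df2.2"],
    "node3": ["df3.1", "df3.2"],
}

# Case 1: both of node1's failover copies live on node2.
case1 = {"df1.1": "node2", "df1.2": "node2"}
# Case 2: node1's failover copies are split across node2 and node3.
case2 = {"df1.1": "node2", "df1.2": "node3"}

print(load_after_failure(forests, case1, "node1"))  # {'node2': 200, 'node3': 100}
print(load_after_failure(forests, case2, "node1"))  # {'node2': 150, 'node3': 150}
```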

        Growing or scaling out your cluster

        If you need to fold in additional capacity to your cluster, try to add nodes in "rings of three." Each ring of three can have its own independent failover topology, where nodes 1, 2, and 3 will fail over to each other as described above, and nodes 4, 5, and 6 will fail over to each other separate from the original ring of three. This results in minimal configuration changes for any nodes already in your cluster when adding capacity.

        Important related takeaways

        • In addition to the standard MarkLogic Server clustering requirements, you'll also want to pay special attention to the hardware specification of individual nodes
          • Although the hardware specification doesn’t have to be exactly the same across all nodes, it is highly recommended that all d-nodes be of the same specification because cluster performance will ultimately be limited by the slowest d-node in the system
          • You can read more about the effect of slow d-nodes in a cluster in the "Check the Slowest D-Node" section of our "Performance Testing With MarkLogic" whitepaper
        • Automatic fail-back after a failover event is not supported in MarkLogic due to the risks of unintentional overwrites, which could potentially result in accidental data loss. Should a failover event occur, human intervention is typically required to manually fail-back. You can read more about the considerations involved in failing a forest back in the following knowledgebase article: Should I flip failed over forests back to their respective masters? What are the risks if I leave them?




        What does it mean?



        This error may sometimes be encountered when:

        • A restore is attempted while a backup task is running
        • Another process has the backup directory locked

        Seen when:

        • The disk containing the backup directory runs out of space
        • There's a bad disk configuration
        • The backup destination disk is unmounted


        Indicates that an operation such as a merge, backup or query was explicitly canceled. This can occur:

        • Through the Admin Interface
        • By calling an explicit cancellation function, such as xdmp:request-cancel()
        • When a client breaks the network socket connection to the server while a query is running 


        MarkLogic Server expects the system clocks to be synchronized across all the nodes in a cluster, as well as between Primary and Replica clusters. The acceptable level of clock skew (or drift) between hosts is less than 0.5 seconds. Values greater than 30 seconds will trigger XDMP-CLOCKSKEW errors and could impact cluster availability.
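
As a rough illustration of these two thresholds, the Python sketch below classifies a measured skew between two hosts. The function name, constant names, and the three labels are hypothetical and chosen for this example only; how you measure the skew (for example from your time synchronization tooling) is outside the scope of the sketch.

```python
ACCEPTABLE_SKEW_SECONDS = 0.5   # below this, skew is within the acceptable level
ERROR_SKEW_SECONDS = 30.0       # above this, XDMP-CLOCKSKEW errors are expected


def classify_clock_skew(skew_seconds):
    """Classify the absolute clock difference between two hosts."""
    skew = abs(skew_seconds)
    if skew < ACCEPTABLE_SKEW_SECONDS:
        return "acceptable"
    if skew > ERROR_SKEW_SECONDS:
        return "clockskew-error"
    # Not yet at the error threshold, but outside the acceptable level.
    return "needs-attention"


for measured in (0.1, -2.0, 45.0):
    print(measured, classify_clock_skew(measured))
```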


        Indicates that an update statement attempted to perform an update to a document that will conflict with other updates occurring in the same statement. For example:

        • A single update transaction that attempts to update a node, then attempts to add a child element to that node in the same transaction
        • A single update transaction that attempts to insert a document and then attempts to insert a node to that document
        • A single update transaction that attempts to insert a document at the same URI twice


        Indicates that the same URI occurred in multiple forests of the same database. Under normal operating conditions, duplicate URIs are not allowed to occur, but there are ways that programmers and administrators can bypass the server safeguards


        Indicates that MarkLogic Server detected a deadlock. Depending on whether the error is frequent or infrequent or whether it occurs as a ‘debug’ level or ‘notice’ level message, you need to take appropriate corrective action to avoid the deadlock


        Indicates that MarkLogic has run out of room in the expanded tree cache during query evaluation, and that consequently it cannot continue evaluating the complete query


        Indicates that a query or other operation exceeded its processing time limit. This can be caused by:

        • Inefficient queries
        • Inadequate processing limit
        • Resource bottlenecks


        Indicates that in-memory storage is full, resulting in the forest stands being written out to disk. These are informational only and are not errors as MarkLogic Server is working as expected. However, if these messages consistently appear more frequently than once per minute, increasing the corresponding 'in-memory' settings in the affected database may be appropriate.

        • MarkLogic Server uses its list cache to hold search term lists in memory
        • If you're attempting to execute a particularly non-selective or inefficient query, your query will fail due to the size of the search term lists exceeding the allocated list cache


         Both errors indicate that the requested module does not exist or the user does not have the right permissions on the module

        Question Answer Further reading
        How many replicas (Database Replication) should each of my primary databases have?
        • One replica per each primary database
        • Multiple replicas are not typically worth the additional administrative complexity or resource provisioning

        KB Articles:

        Do my primary and replica clusters have to be the same spec?
        • Slow replicas will throttle the performance of your primary cluster.
        • Therefore, your replica should be provisioned to avoid primary throttling - which typically means a similar hardware specification

         KB Articles:


        What do I do about a lagging primary?

        You can either:

        • Speed up the replica by reducing traffic to the replica, or adding hardware resources to the replica - or both
        • Pause replication - be aware you'll no longer have a synchronized DR copy until replication is re-enabled

        KB Articles:


        In terms of disaster recovery (DR) - how do I choose between backup/recovery or replication?
        • Database Replication
          • Best if you need a more synchronized copy of your data
          • Needs a bigger hardware footprint
          • Can result in primary throttling if under-provisioned or under heavy load
        • Backup/Restore
          • Best if you are not sufficiently provisioned for a more synchronized DR copy, as seen with database replication
          • Results in a more unsynchronized snapshot copy of your data

        KB Articles:


        Can I do multi-primary replication i.e., have primary databases on both the clusters on a pair of coupled clusters?
        • Database replication is intended for disaster recovery (DR) & redundancy
        • For DR purposes, the recommended configuration is a dedicated primary cluster for all primary databases and a dedicated DR cluster for all replicas of those primary databases
        • Because of administrative complexity and compromised DR functionality, multi-primary DR configurations are not recommended


        Should I replicate the auxiliary databases?
        • Always replicate your Security database when setting up Database replication

        • Separate security databases on both primary and replica clusters are not recommended due to administrative complexity

        • Avoid replicating the App-Services database


        Can my primary and replica both write to the same shared storage?

        Avoid writing both primary and replica to the same shared storage since it results in a single-point failure architecture, thereby defeating the purpose of DR

        KB Articles:

        How many bootstrap hosts should my cluster have?

        Only mark the hosts that hold your security forests (and their local disk failover copies) as bootstrap hosts to avoid too many unnecessary connections between primary & replica clusters.


        How do I upgrade replicated environments?

        Upgrade the replica first. If the Security database is replicated, then the replica cluster must be upgraded before the master cluster.


        How do I divert traffic away from my primary to replica cluster?
        • Disable database replication for the database on the replica cluster.
        • Make the replica cluster/database the master.
        • Roll back to the non-blocking timestamp on the new master.


        Question Answer Further reading
        When does failover occur?
        • Failover occurs when a quorum of nodes votes a node out of a cluster
        • Voting depends on timely cluster heartbeats between its nodes. If a node isn't communicating with other nodes in the cluster, it gets voted out and its forests are failed over

        KB Articles:


        What nodes participate in quorum?

        All nodes in the cluster configuration count towards quorum, irrespective of the:

        • group they belong to
        • type of node it might be (E-node, D-node, E/D-node, etc.)
        • state of the node (online/offline)
        • forest, database or group configurations

        KB Articles:


        My cluster saw a failover event - does fail back happen automatically?
        • Failing back is a manual operation
        • Automatic fail-back is not supported due to the risks of unintentional overwrites and accidental data loss


         KB Articles:


        How should I distribute my forests across the nodes in my cluster?
        • Here is an example forest topology on a typical 3-node cluster:
          • Node1: df1.1, df1.2, ldf2.1, ldf3.1
          • Node2: df2.1, df2.2, ldf1.1, ldf3.2
          • Node3: df3.1, df3.2, ldf1.2, ldf2.2
        • Distributing local disk failover forests (LDFs) evenly on the other two nodes splits the load on each surviving node to just 150% of normal should a failover event occur
        • When scaling out the cluster, try adding nodes in "rings of three" to keep the above load distribution intact with minimal config changes within each ring (for example, nodes 4, 5, and 6 would mirror the data and LDF forest distribution seen in the ring made up of nodes 1, 2, and 3)
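
The "rings of three" layout can be generated mechanically. The Python sketch below is one way to do it, assuming two data forests per node and the df/ldf naming used in the example above; the function name `ring_topology` is hypothetical, and the output is a planning aid rather than anything MarkLogic consumes directly.

```python
def ring_topology(node_count):
    """Build a forest layout for clusters grown in rings of three nodes.

    Each node hosts two data forests (dfN.1, dfN.2). Its two local disk
    failover forests (ldfN.1, ldfN.2) are placed on the other two nodes
    of the same ring, one on each.
    """
    if node_count <= 0 or node_count % 3 != 0:
        raise ValueError("node_count must be a positive multiple of three")

    layout = {}
    for ring_start in range(1, node_count + 1, 3):
        ring = [ring_start, ring_start + 1, ring_start + 2]
        for n in ring:
            layout[f"Node{n}"] = [f"df{n}.1", f"df{n}.2"]
        for n in ring:
            others = [m for m in ring if m != n]
            for k, host in enumerate(others, start=1):
                layout[f"Node{host}"].append(f"ldf{n}.{k}")
    return layout


for node, forests in ring_topology(6).items():
    print(node + ":", ", ".join(forests))
```

For the first three nodes this produces the same distribution as the example topology listed above, and nodes 4 to 6 form a second, independent ring with the same shape.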

        KB Articles:

        How many Local Disk Failover forests (LDFs) should each of my primary forests have?
        • One LDF for each primary forest, hosted on a different machine than the one which hosts the primary forest
        • More than one LDF per primary is not recommended due to unnecessary increases in administrative complexity and hardware resource requirements

        KB Articles:

        Should I be backing up my local disk failover forests?
        • Local disk failover forests (LDFs) should be included in your backup if you expect backups to be taken in a failed over state. Note that this will typically double the size of your backup since you're including both data and LDF forests
        • If you're not in a failed over state, or you manually fail back before a backup starts, then you can reduce the size of your backups by only backing up your data forests

        MarkLogic Linux Tuned Profile


        The tuned tuning service can change operating system settings to improve performance for certain workloads. Different tuned profiles are available and choosing the profile that best fits your use case simplifies configuration management and system administration. You can also write your own profiles, or extend the existing profiles if further customization is needed. The tuned-adm command allows users to switch between different profiles.

        RedHat Performance and Tuning Guide: tuned and tuned-adm

        • tuned-adm list will list the available profiles
        • tuned-adm active will list the active profile

        Creating a MarkLogic Tuned Profile

        Using the throughput-performance profile, we can create a custom tuned profile for MarkLogic Server. First create the directory for the MarkLogic profile:

        sudo mkdir /usr/lib/tuned/MarkLogic/

        Next, create the tuned.conf file in that directory. It includes the throughput-performance profile, along with our recommended configuration:

        # tuned configuration
        [main]
        summary=Optimize for MarkLogic Server on Bare Metal
        include=throughput-performance

        [sysctl]
        vm.swappiness = 1
        vm.dirty_ratio = 40

        Activating the MarkLogic Tuned Profile

        Now when we run tuned-adm list, it should show us the default profiles as well as our new MarkLogic profile:

        $ tuned-adm list
        Available profiles:
        - MarkLogic                   - Optimize for MarkLogic Server
        - balanced                    - General non-specialized tuned profile
        - desktop                     - Optimize for the desktop use-case
        - hpc-compute                 - Optimize for HPC compute workloads
        - latency-performance         - Optimize for deterministic performance at the cost of increased power consumption
        - network-latency             - Optimize for deterministic performance at the cost of increased power consumption, focused on low latency network performance
        - network-throughput          - Optimize for streaming network throughput, generally only necessary on older CPUs or 40G+ networks
        - powersave                   - Optimize for low power consumption
        - throughput-performance      - Broadly applicable tuning that provides excellent performance across a variety of common server workloads
        - virtual-guest               - Optimize for running inside a virtual guest
        - virtual-host                - Optimize for running KVM guests
        Current active profile: virtual-guest

        Now we can make MarkLogic the active profile:

        $ sudo tuned-adm profile MarkLogic

        And then check the active profile:

        $ tuned-adm active
        Current active profile: MarkLogic

        Disabling the Tuned Daemon

        The tuned daemon has some overhead, so MarkLogic recommends disabling it. When the daemon is disabled, tuned will only apply the profile settings and then exit. Edit /etc/tuned/tuned-main.conf and set the following value:

        daemon = 0



        There is a lot of useful information in MarkLogic Server's documentation surrounding many of the new features of MarkLogic 9 - including the new SQL implementation, improvements made to the ODBC driver and the new system for generating SQL "view" templates for your data. This article attempts to pull it all together by showing all the measures needed to create a successful connection and to verify that everything is set up correctly and works as expected.

        This guide presents a step-by-step walkthrough covering the installation of all the necessary components, the configuration of the ODBC driver and the loading of data into MarkLogic in order to create a Template View that will allow a SQL query to be rendered.


        We're starting with a clean install of Red Hat Enterprise Linux 7:

        $ uname -a
        Linux 3.10.0-327.4.5.el7.x86_64 #1 SMP Thu Jan 21 04:10:29 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

        In this example, I'm using yum to manage the additional dependencies (openssl-libs and unixODBC) required for the MarkLogic ODBC driver:

        $ sudo yum install openssl-libs
        Package 1:openssl-libs-1.0.2k-8.el7.x86_64 already installed and latest version
        Nothing to do
        $ sudo yum install unixODBC
        Package unixODBC-2.3.1-11.el7.x86_64 already installed and latest version
        Nothing to do

        If you want to use the latest version of unixODBC (2.3.4 at the time of writing), you can get it using cURL by running curl -O

        $ curl -O
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed
        100 1787k  100 1787k    0     0   235k      0  0:00:07  0:00:07 --:--:--  371k

        Please note - as per the documentation, this method will require unixODBC to be compiled so additional dependencies may need to be met for this.

        This article assumes that you have downloaded the ODBC driver for MarkLogic Server and the MarkLogic 9 install binary and have those available on your machine:

        $ ll
        total 310112
        -r--r--r-- 1 support support 316795526 Nov 16 04:19 MarkLogic-9.0-3.x86_64.rpm
        -r--r--r-- 1 support support    754596 Nov 16 04:18 mlsqlodbc-1.3-3.x86_64.rpm

        Getting started: installing and configuring MarkLogic 9 with an ODBC Server

        We will start by installing and starting MarkLogic 9:

        $ sudo rpm -i MarkLogic-9.0-3.x86_64.rpm
        $ sudo service MarkLogic start
        Starting MarkLogic:                                        [  OK  ]

        From there, we can point our browser at http://host:8001 and walk through the initial MarkLogic install process:

        As soon as the install process has been completed and you have created an Administrator user for MarkLogic Server, we're ready to create an ODBC Application Server.

        To do this, go to Configure > Groups > Default > App Servers and select the Create ODBC tab:

        Next we're going to make the minimal configuration necessary by entering the required fields - the odbc server name, the Application Server module directory root and the port.

        In this example we will configure the Application Server using the following values:

        odbc server name

        After this is done, confirm that the Application Server has been created by going to Configure > Groups > Default > App Servers and ensure that you can see the ODBC Server listed and configured on port 5432 as per the image below:

        Getting started: Setting up the MarkLogic ODBC Driver

        Use RPM to install the ODBC driver:

        $ sudo rpm -i mlsqlodbc-1.3-3.x86_64.rpm
        odbcinst: Driver installed. Usage count increased to 1.
            Target directory is /etc

        Configure the base template as instructed in the installation guide:

        $ odbcinst -i -s -f /opt/MarkLogic/templates/mlsql.template

        Getting started: ensure unixODBC is configured

        To ensure the unixODBC commandline client is configured, you can run isql -h to bring up the help options:

        $ isql -h
        * unixODBC - isql                            *
        * Syntax                                     *
        *                                            *
        *      isql DSN [UID [PWD]] [options]        *
        *                                            *
        * Options                                    *
        *                                            *
        * -b         batch.(no prompting etc)        *
        * -dx        delimit columns with x          *
        * -x0xXX     delimit columns with XX, where  *
        *            x is in hex, ie 0x09 is tab     *
        * -w         wrap results in an HTML table   *
        * -c         column names on first row.      *
        *            (only used when -d)             *
        * -mn        limit column display width to n *
        * -v         verbose.                        *
        * -lx        set locale to x                 *
        * -q         wrap char fields in dquotes     *
        * -3         Use ODBC 3 calls                *
        * -n         Use new line processing         *
        * -e         Use SQLExecDirect not Prepare   *
        * -k         Use SQLDriverConnect            *
        * --version  version                         *
        *                                            *
        * Commands                                   *
        *                                            *
        * help - list tables                         *
        * help table - list columns in table         *
        * help help - list all help options          *
        *                                            *
        * Examples                                   *
        *                                            *
        *      isql WebDB MyID MyPWD -w < My.sql     *
        *                                            *
        *      Each line in My.sql must contain      *
        *      exactly 1 SQL command except for the  *
        *      last line which must be blank (unless *
        *      -n option specified).                 *
        *                                            *
        * Please visit;                              *
        *                                            *
        *               *
        *                      *
        *              *

        If you're not seeing the above message, another application on your system may be overriding isql. In this configuration, the isql command is found at /usr/bin/isql:

        $ which isql
        /usr/bin/isql

        Getting started: initial connection test

        If you're happy that isql is correctly installed, we're ready to test the connection using isql -v:

        $ isql -v MarkLogicSQL admin admin
        | Connected!                            |
        |                                       |
        | sql-statement                         |
        | help [tablename]                      |
        | quit                                  |
        |                                       |

        Let's confirm that it's really working by loading some data into MarkLogic and creating an SQL view around that data.

        Loading sample data into MarkLogic

        To load data, we're going to use Query Console to insert the same sample data that is created in the Quick Start Documentation:

        To access Query Console, point your browser at http://host:8000 and make note of the following:

        Ensure the database is set to Documents (or at least, matches the database specified by your ODBC Application Server) and ensure that the Query Type is set to JavaScript

        When these are both set correctly, run the code to generate sample data (note that this data is taken from the quick start guide and reproduced here for convenience):

        After that has run, you should see a null response back from the query:

        To confirm that the data was loaded successfully, you can use the Explore button. You should see that 22 employee documents (rows) are now in the database:

        Create the template view

        Now the documents are loaded, a tabular view for that data needs to be created.

        Ensure the database is (still) set to Documents (or at least, matches the database specified by your ODBC Application Server) and ensure that the Query Type is now set to XQuery

        As soon as this is set, you can run the code below to generate the template view (note that this code is taken from the quick start guide and reproduced here for convenience):

        And to confirm this was loaded, Query Console should report an empty sequence was returned.

        Test the template using a SQL Query

        The database should remain set to Documents and ensure that the Query Type is now set to SQL:

        Then you can run the following SQL Query:

        SELECT * FROM employees

        If everything has worked correctly, Query Console should render a view of the table in response to your query:

        Test the SQL Query via the ODBC Driver

        All that remains now is to go back to the shell and test the same connection over ODBC.

        To do this, we're going to use the isql command again and run the same request there:

        $ isql -v MarkLogicSQL admin admin
        | Connected!                            |
        |                                       |
        | sql-statement                         |
        | help [tablename]                      |
        | quit                                  |
        |                                       |
        SQL> select * from employees
        <<< RESPONSE CUT >>>
        SQLRowCount returns 7
        7 rows fetched


        This article details changes to the upgrade procedures for MarkLogic 9 AMIs.

        MarkLogic 9 now supports 1-click deployment in AWS Marketplace. This is in addition to the existing options of manually launching an AMI and launching MarkLogic clusters via CloudFormation templates. In order to make 1-click launch possible, our AMIs have a pre-configured data volume (device on /dev/sdf). The updated CloudFormation templates account for the pre-configured data volume. This change also requires a different approach to our documented upgrade process.


        As per the MarkLogic EC2 Guide, the main goal of the upgrade is to update the AMI IDs in CloudFormation in order to upgrade all instances in the stack. There is now an additional step to handle the blank data volume that is pre-configured on MarkLogic AMIs.

        Always back up your data before attempting any upgrade procedures!

        Scenario 1:  You are using unmodified CF templates that were published by MarkLogic on starting from version 8.0-3.

        1. Update your CloudFormation stack with the latest template, as there have been no breaking changes since 8.0-3. The current templates for MarkLogic 9 include new AWS regions, new AMI IDs, and code to remove the blank data volume that is bundled with current AMIs.
        2. In the EC2 Dashboard, stop one instance at a time and wait for it to be replaced with a new one.
        3. For a rolling upgrade (and as a good practice), terminate the other nodes one by one, starting with the node that has the Security database. They will come up and reconnect without any UI interaction.
        4. Go to port 8001 on any new instance, where an upgrade prompt should be displayed.
        5. Click OK and wait for the upgrade to complete on the instance.

        Scenario 2: You made some changes to MarkLogic templates or you are using custom templates.

        1. Download current templates from
        2. Locate the AMI IDs by searching for the "AWSRegionArch2AMI" block in the template.
          "AWSRegionArch2AMI": {
                "us-east-1": {
                  "HVM": "ami-54a8652e"
                },
                "us-east-2": {
                  "HVM": "ami-2ab29f4f"
                }, ...
        3. Locate the AMI IDs in the old template and replace them with the ones from the new template.
        4. Locate the "BlockDeviceMappings" section in the new template that was downloaded in step 1. This block of code was added to remove the blank volume that is part of the new 1-click AMIs.
        5. Update the old template to include "BlockDeviceMappings" as a property of LaunchConfig. There will be one or three LaunchConfig blocks depending on the template used. They can be located by searching for "AWS::AutoScaling::LaunchConfiguration". Here is an example of the new property under LaunchConfig.
          "Ebs": {}
        6. Once all the changes are saved, update your stack with the updated CloudFormation template. Make sure the stack update is complete.
        7. In the EC2 Dashboard, terminate nodes one by one, starting with the node that has the Security database. New nodes will come up after a couple of minutes and reconnect without any UI interaction.
        8. Wait for all nodes to be up and in a green state.
        9. Go to port 8001 on any new instance, where an upgrade prompt should be displayed.
        10. Click OK and wait for the upgrade to complete on the instance.
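Step 3 of Scenario 2 can also be scripted instead of edited by hand. The sketch below is one possible approach, not a MarkLogic tool: it assumes the "AWSRegionArch2AMI" block sits under the top-level "Mappings" key of both templates, the `replace_ami_mapping` helper is hypothetical, and "ami-00000000" is a placeholder standing in for an old AMI ID.

```python
import copy


def replace_ami_mapping(old_template, new_template, name="AWSRegionArch2AMI"):
    """Return a copy of old_template whose AMI mapping comes from new_template.

    Neither input is modified; only the named mapping is replaced, so
    any custom resources in the old template are left untouched.
    """
    updated = copy.deepcopy(old_template)
    updated["Mappings"][name] = copy.deepcopy(new_template["Mappings"][name])
    return updated


# Placeholder old template; the new AMI IDs are the ones quoted in step 2.
old = {"Mappings": {"AWSRegionArch2AMI": {"us-east-1": {"HVM": "ami-00000000"}}}}
new = {
    "Mappings": {
        "AWSRegionArch2AMI": {
            "us-east-1": {"HVM": "ami-54a8652e"},
            "us-east-2": {"HVM": "ami-2ab29f4f"},
        }
    }
}

updated = replace_ami_mapping(old, new)
print(updated["Mappings"]["AWSRegionArch2AMI"]["us-east-1"]["HVM"])
```

In practice the two dictionaries would come from `json.load` on the template files, and the result would be written back with `json.dump` before updating the stack.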

        Scenario 3: You have instances that were brought up directly from MarkLogic AMI. For each MarkLogic instance in your cluster, do the following:

        1. Terminate the instance.
        2. Launch a new instance from the upgraded AMI.
        3. Detach the blank volume that is mounted on /dev/sdf (should be 10GB in size).
        4. Attach the EBS data volume associated with the original instance.

        More details on how to update CloudFormation stack can be found at

        Introduction: the decimal type

        In order to be compliant with the XQuery specification and to satisfy the needs of customers working with financial data, MarkLogic Server implements a decimal type, available in XQuery and server-side JavaScript.

        The decimal type has been implemented for very specific requirements. Decimals have about a dozen more bits of precision than doubles, but they take up more memory and arithmetic operations on them are much slower.

        Use the double where possible

        Unless you have a specific requirement to use the decimal data type, in most cases it's better and faster to use the double data type to represent large numbers.

        Specific details about the decimal data type

        If you still want or need to use the decimal data type, its limitations and the details of how it is implemented in MarkLogic Server are described below:

        o   Precision

        • How many decimal digits of precision does it have?

        The MarkLogic implementation of xs:decimal representation is designed to meet the XQuery specification requirements to provide at least 18 decimal digits of precision. In practice, up to 19 decimal digits can be represented with full fidelity.

        • If it is a binary number, how many binary digits of precision does it have?

         A decimal number is represented inside MarkLogic with 64 binary bits of digits and an additional 64 bits of sign and a scale (specifies where the decimal point is).

        • What are the exact upper and lower bounds of its precision?

        -18446744073709551615 to 18446744073709551615 

        Any operation producing a number outside this range will result in an XDMP-DECOVRFLW error (decimal overflow).

        o   Scale

        • Does it have a fixed scale or floating scale?

        It has a floating scale.

        • What are the limitations on the scale?

        -20 to 0

        So you can only represent magnitudes between 1 * (10^-20) and 18446744073709551615

        • Is the scale binary or decimal?

        Decimal.

        • How many decimal digits can it scale?

        20.

        • How many binary digits can it scale?


        • What is the smallest number it can represent and the largest?

        smallest: -1*((2^64)-1)
        closest to zero: 1*(10^-20)
        largest: (2^64)-1

        • Are all integers safe or does it have a limited safe range for integers?

        It can represent 64 bit unsigned integers with full fidelity.


        o   Limitations

        • Does it have binary rounding errors?

        The division algorithm on Linux in particular does convert to an 80-bit binary floating point representation to calculate reciprocals - which can result in binary rounding errors. Other arithmetic algorithms work solely in base 10.

        • What numeric errors can it throw and when?

        Overflow: Number is too big or small to represent
        Underflow: Number is too close to zero to represent
        Loss of precision: The result has too many digits of precision (essentially the 64bit digits value has overflowed)

        • Can it represent floating point values, such as NaN, -Infinity, +Infinity, etc.?


        o   Implementation

        • How is the DECIMAL data type implemented?

        It has a representation with 64 bits of digits, a sign, and a base 10 negative exponent (fixed to range from -20 to 0). So the value is calculated like this:

        sign * digits * (10 ^ -exponent)

        • How many bytes does it consume?

        On disk, for example in triple indexes, it's not a fixed size as it uses integer compression. At maximum, the decimal scalar type consumes 16 bytes per value: eight bytes of digits, four bytes of sign, and four bytes of scale. It is not space efficient but it keeps the digits aligned on eight-byte boundaries.
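The representation described above can be modelled in a few lines to make the limits concrete. This is a sketch of the arithmetic only, not of MarkLogic's code: the `decimal_value` helper is hypothetical, and the scale is expressed here as a count of decimal places from 0 to 20 (the "-20 to 0" exponent range seen from the other side).

```python
from fractions import Fraction

MAX_DIGITS = 2**64 - 1  # 18446744073709551615, the bound quoted above
MAX_SCALE = 20          # at most 20 decimal places


def decimal_value(sign, digits, scale):
    """Exact value of sign * digits * 10^(-scale) under the stated limits."""
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    if not 0 <= digits <= MAX_DIGITS:
        raise OverflowError("digits do not fit in 64 bits")
    if not 0 <= scale <= MAX_SCALE:
        raise OverflowError("scale must be between 0 and 20")
    return sign * Fraction(digits, 10**scale)


closest_to_zero = decimal_value(1, 1, 20)
largest = decimal_value(1, MAX_DIGITS, 0)
smallest = decimal_value(-1, MAX_DIGITS, 0)
print(closest_to_zero, largest, smallest)

# Every 19-digit integer fits in the 64-bit digits field, which is why
# 19 decimal digits can be represented with full fidelity.
print(10**19 - 1 <= MAX_DIGITS)
```

Using `Fraction` keeps the model exact, so the bounds are not blurred by binary floating point rounding.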


        A database or forest backup in MarkLogic Server may be significantly slower than just performing a file copy (cp in Linux).  Why is this so?


        Using cp on very large files on a large-memory Linux system can produce huge amounts of dirty pages that can saturate I/O channels for minutes in order to flush data to the disk. In addition, cp doesn’t wait for the data to be written before returning. As a result, cp is very unfriendly to other applications running on the same system.

        When MarkLogic Server performs a backup, it works hard not to saturate any subsystem or resource. MarkLogic takes care that the number of dirty pages at any one time is never very large, and it keeps the I/O queues short so that any concurrent database queries and updates are not significantly impacted by the backup. Finishing the backup in the fastest possible time is not the priority.

        Can I make it go faster?

        Yes, there is a diagnostic trace event “Unthrottle Backup” that turns off throttling in MarkLogic. However, even with throttling turned off, MarkLogic will still work to keep the number of dirty pages low.

        The diagnostic trace event can be enabled from the MarkLogic Server Admin UI by navigating to Configure -> Groups -> {group-name} -> Diagnostic, setting "trace events activated" to true, adding “Unthrottle Backup” (without quotes), and pressing "ok".


        MarkLogic automatically provides 

        • ANSI REPEATABLE READ level of isolation for update transactions, and 
        • Serializable isolation for read-only (query) transactions.

        MarkLogic can be made to provide ANSI SERIALIZABLE isolation for update transactions, but doing so requires developers to manage their own predicate locks.

        Isolation Levels - Background

        There are many possible levels of isolation, and many different taxonomies of isolation levels. The most common taxonomy (familiar to those with an RDBMS background) is the one defined by ANSI SQL, which defines four levels of isolation based on read phenomena that are possible at each level. ANSI has a definition for each phenomenon, but these definitions are open to interpretation. Broad interpretation results in more rigorous criteria for each isolation level (and therefore better isolation at each level), whereas strict interpretation results in less rigorous isolation at each level. Here I’ll use a shorthand notation to describe these phenomena, and will use the broad rather than the strict interpretation. The notation specifies the operation, the transaction performing the operation, and the item or domain on which the operation is performed. Operations in my notation are:

        • Write (w)
        • Read (r)
        • Commit (c)
        • Abort/rollback (a)

        An example of this shorthand: w1[x] means transaction1 writes to item x.

        Now the phenomena:

        • A dirty read happens when a transaction T2 reads an item that is being written by concurrently running transaction T1. In other words: w1[x]…r2[x]…((c1 or a1) and (c2 or a2) in any order). This phenomenon could lead to an anomaly in the case where T1 later aborts, and T2 has then read a value that never existed in the database.
        •  A non-repeatable read happens when a transaction T2 writes an item that was read by a transaction T1 prior to T1 completing. In other words: r1[x]…w2[x]…((c1 or a1) and (c2 or a2) in any order). Non-repeatable reads don’t produce the same anomalies as dirty reads, but can produce errors in cases where T1 relies on the value of x not changing between statements in a multi-statement transaction (e.g. reading and then updating a bank account balance).
        • A phantom read happens when a transaction T1 retrieves a set of data items matching some search condition and concurrently running transaction T2 makes a change that modifies the set of items that match that condition. In other words: (r1[P] and w2[x in P] in any order)…((c1 or a1) and (c2 or a2) in any order), where P is a set of results. Phantom reads are usually less serious than dirty or non-repeatable reads because it generally doesn’t matter if item x in P is written before or after T1 finishes unless T1 is itself explicitly reading x. And in this case the phenomenon would no longer be a phantom, but would instead be a dirty or non-repeatable read per the definitions above. That said, there are some cases where phantom reads are important.
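The first two patterns can be checked mechanically against a history written in the shorthand above. The following is a minimal sketch using the broad interpretation; the tuple encoding of operations and the `find_phenomena` helper are illustrative choices, and phantom reads are left out because they need a model of search predicates.

```python
def find_phenomena(history):
    """Scan a history of (op, txn, item) tuples for read phenomena.

    op is 'w', 'r', 'c' or 'a'; item is None for commits and aborts.
    Returns tuples of (phenomenon, first txn, second txn, item).
    """
    active_writes = {}  # item -> transactions with an unfinished write on it
    active_reads = {}   # item -> transactions with an unfinished read on it
    found = []
    for op, txn, item in history:
        if op in ("c", "a"):
            # The transaction is finished: forget its reads and writes.
            for table in (active_writes, active_reads):
                for txns in table.values():
                    txns.discard(txn)
        elif op == "r":
            # w1[x]...r2[x] while T1 is still running is a dirty read.
            for writer in sorted(active_writes.get(item, set()) - {txn}):
                found.append(("dirty read", writer, txn, item))
            active_reads.setdefault(item, set()).add(txn)
        elif op == "w":
            # r1[x]...w2[x] while T1 is still running is a non-repeatable read.
            for reader in sorted(active_reads.get(item, set()) - {txn}):
                found.append(("non-repeatable read", reader, txn, item))
            active_writes.setdefault(item, set()).add(txn)
    return found


dirty = [("w", 1, "x"), ("r", 2, "x"), ("c", 1, None), ("c", 2, None)]
non_repeatable = [("r", 1, "x"), ("w", 2, "x"), ("c", 1, None), ("c", 2, None)]
clean = [("w", 1, "x"), ("c", 1, None), ("r", 2, "x"), ("c", 2, None)]

print(find_phenomena(dirty))
print(find_phenomena(non_repeatable))
print(find_phenomena(clean))
```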

         The isolation levels ANSI defines are based on which of these three phenomena are possible at that isolation level. They are:

        • READ UNCOMMITTED – all three phenomena are possible at this isolation level.
        • READ COMMITTED – Dirty reads are not possible, but non-repeatable and phantom reads are.
        • REPEATABLE READ – Dirty and non-repeatable reads are not possible, but phantom reads are.
        • SERIALIZABLE – None of the three phenomena are possible at this isolation level.
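The four levels can be written down as a small lookup table, which makes it easy to ask which level is the weakest one that rules out a given set of phenomena. The table mirrors the list above; the `weakest_level_preventing` helper is a hypothetical name used for illustration.

```python
# Phenomena that remain possible at each ANSI isolation level,
# ordered from weakest to strongest isolation.
ALLOWED = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED": {"non-repeatable read", "phantom read"},
    "REPEATABLE READ": {"phantom read"},
    "SERIALIZABLE": set(),
}


def weakest_level_preventing(*phenomena):
    """Return the weakest level at which none of `phenomena` can occur."""
    unwanted = set(phenomena)
    for level, allowed in ALLOWED.items():  # dicts keep insertion order
        if not allowed & unwanted:
            return level
    return None  # only reached for phenomena that are not in the table


print(weakest_level_preventing("dirty read"))
print(weakest_level_preventing("dirty read", "non-repeatable read"))
print(weakest_level_preventing("phantom read"))
```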

        Note that as defined above, ANSI SERIALIZABLE is not sufficient for transactions to be truly serializable (in the sense that running them concurrently and running them in series would in all cases produce the same result), so SERIALIZABLE is an unfortunate choice of names for this isolation level, but that’s what ANSI called it.

        Update Transaction Locks

        Typically, a DBMS will avoid dirty and non-repeatable reads by taking locks on records (called item locks). Locks are either shared locks (which can be held by more than one transaction) or exclusive locks (which can be held by only one transaction at a time). In most DBMSes (including MarkLogic), locks taken when reading an item are shared and locks taken when writing an item are exclusive.

        MarkLogic prevents dirty and non-repeatable reads in update transactions by taking item locks on items that are being read or written during a transaction and releasing those locks only on completion of the transaction (post-commit or post-abort). When a transaction needs to lock an item on which another transaction has an exclusive lock, that transaction waits until either the lock is released or the transaction times out. Deadlock detection prevents cases where two transactions are waiting on each other for exclusive locks. In this case one of the transactions will abort and restart.

        In addition, MarkLogic prevents some types of phantom reads by taking item locks on the set of items in a search result. This prevents phantom reads involving T2 removing an item in a set that T1 previously searched, but does not prevent phantom reads involving T2 inserting an item in a set that T1 previously searched, or those involving T2 searching for items and seeing a deletion caused by T1.

        Avoiding All Phantom Reads

        To avoid all phantom reads via locking, it is necessary to take locks not just on items that currently match the search criteria, but also on all items that could match the search criteria, whether they currently exist in the database or not. Such locks are called predicate locks. Because you can search for pretty much anything in MarkLogic, guaranteeing a predicate lock for arbitrary searches would require locking the entire database. From a concurrency and throughput perspective, this is obviously not desirable. MarkLogic therefore leaves the decision to take predicate locks and the scope of those locks in the hands of application developers. Because the predicate domain can frequently be narrowed down with some application-specific knowledge, this provides the best balance between isolation and concurrency. To take a predicate lock, you lock a synthetic URI representing the predicate domain in every transaction that reads from or writes to that domain. You can take shared locks on a synthetic URI via fn:doc(URI). Exclusive locks are taken via xdmp:lock-for-update(URI).

        Note that predicate locks should only be taken in situations where phantom reads are intolerable. If your application can get by with REPEATABLE READ isolation, you should not take predicate locks, because any additional locking results in additional serialization and will impact performance.
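To see why a lock on a synthetic URI closes the remaining phantom-read gap, here is a toy lock table written in Python. It only models shared and exclusive lock compatibility; it is not MarkLogic's lock manager, and the `LockTable` class and the "/locks/employees" URI are hypothetical. A shared request plays the role of fn:doc(URI) and an exclusive request the role of xdmp:lock-for-update(URI).

```python
class LockTable:
    """Minimal shared/exclusive lock table keyed by URI."""

    def __init__(self):
        self._locks = {}  # uri -> (mode, set of transaction ids)

    def try_lock(self, txn, uri, mode):
        """Return True if the lock is granted, False if txn would wait."""
        held = self._locks.get(uri)
        if held is None:
            self._locks[uri] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # The sole holder may re-acquire or upgrade its own lock.
            upgraded = "exclusive" if "exclusive" in (mode, held_mode) else "shared"
            self._locks[uri] = (upgraded, holders)
            return True
        if mode == "shared" and held_mode == "shared":
            holders.add(txn)  # shared locks are compatible with each other
            return True
        return False

    def release_all(self, txn):
        """Release every lock held by txn (on commit or abort)."""
        for uri in list(self._locks):
            _, holders = self._locks[uri]
            holders.discard(txn)
            if not holders:
                del self._locks[uri]


locks = LockTable()
domain = "/locks/employees"  # synthetic URI standing for the predicate domain

# T1 searches the domain and takes a shared lock on the synthetic URI.
t1_granted = locks.try_lock("T1", domain, "shared")
# T2 wants to insert into the domain, so it asks for an exclusive lock.
t2_first_try = locks.try_lock("T2", domain, "exclusive")
locks.release_all("T1")  # T1 completes
t2_second_try = locks.try_lock("T2", domain, "exclusive")
print(t1_granted, t2_first_try, t2_second_try)
```

T2's insert is held back until T1 completes, so T1 can never observe an item appearing in a set it has already searched. The cost is the extra serialization described above.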


        To summarize, MarkLogic automatically provides ANSI REPEATABLE READ level of isolation for update transactions and true serializable isolation for read-only (query) transactions. MarkLogic can be made to provide ANSI SERIALIZABLE isolation for update transactions, but doing so requires developers to manage their own predicate locks.


        Text is stored in MarkLogic Server in Unicode NFC normalized form.


        In MarkLogic Server, all text is converted into Unicode NFC normalized form before tokenization and storage. 

        Unicode considers NFC-compatible characters to be essentially equivalent. See the Unicode normalization FAQ and Conformance Requirements in the Unicode Standard.


        For example, consider the NFC equivalence of the codepoints x2126 (&#x2126;) and x03A9 (&#x03A9;). This is shown for the x2126 entry in the Unicode code chart for the U+2100 block.

        You can see the effects of normalization alone, and during tokenization, by running the following in MarkLogic Server's Query Console:

        xquery version "1.0-ml";
        (: equivalence of Ω forms :)
        let $s := fn:codepoints-to-string (xdmp:hex-to-integer ('2126'))
        let $token := cts:tokenize ($s)
        return (
            'original: '||xdmp:integer-to-hex (fn:string-to-codepoints ($s)),
            'normalized: '||xdmp:integer-to-hex (fn:string-to-codepoints (fn:normalize-unicode ($s, 'NFC'))),
            'tokenized: '||xdmp:describe ($token, (), ())
        )

        The results show the original value, the normalized value, and the resulting token:

        original: 2126
        normalized: 3a9
        tokenized: cts:word("&#x03a9;")
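The normalization step is not specific to MarkLogic, so the same equivalence can be cross-checked outside the server. The sketch below uses Python's standard unicodedata module; it covers only the normalization, not MarkLogic's tokenization.

```python
import unicodedata

# U+2126 OHM SIGN normalizes to U+03A9 GREEK CAPITAL LETTER OMEGA under NFC.
s = "\u2126"
normalized = unicodedata.normalize("NFC", s)

print("original:", format(ord(s), "x"))
print("normalized:", format(ord(normalized), "x"))
```

This prints 2126 and 3a9, matching the first two lines of the Query Console output above.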


        In MarkLogic Server version 9, the default tokenization and stemming code has been changed for all languages (except English tokenization). Some tokenization and stemming behavior will change between MarkLogic 8 and MarkLogic 9. We expect that, in most cases, results will be better in MarkLogic 9.

        Information is given for managing this change in the Release Notes at Default Stemming and Tokenization Libraries Changed for Most Languages, and for further related features at New Stemming and Tokenization.

        In-depth discussion is provided below for those interested in details.

        General Comments on Incompatibilities

        General implications of tokenization incompatibilities

        If you do not reindex, old content may no longer match the same searches, even for unstemmed searches.

        General tokenization incompatibilities

        There are some edge-case changes in the handling of apostrophes in some languages; in general this is not a problem, but some specific words may include/break at apostrophes.

        Tokenization is generally faster for all languages except English and Norwegian (which use the same tokenization as before).

        General implications of stemming incompatibilities

        Where there is only one stem, and it is now different:  Old data will not match stemmed searches without reindexing, even for the same word.

        Where the new stems are more precise:  Content that used to match a query may not match any more, even with

        Where there are new stems, but the primary stem is unchanged:  Content that used to not match a query may now match it with advanced stemming or above. With basic stemming there should be no change.

        Where the decompounding is different, but the concatenation of the components is the same:  Under decompounding, content may match a query when it used to not match, or may not match a query when it used to match, when the query or content involves something with one of the old/new components. Matching under advanced or basic stemming would be generally the same.

        General stemming incompatibilities

        • MarkLogic now has general algorithms backing up explicit stemming dictionaries.  Words not found in the default dictionaries will sometimes be stemmed when they previously were not.
        • Diminutives/augmentatives are not usually stemmed to base form.
        • Comparatives/superlatives are not usually stemmed to base form.
        • There are differences in the exact stems for pronoun case variants.
        • Stemming is more precise and restricted by common usage. For example, if the past participle of a verb is not usually used as an adjective, then the past participle will not be included as an alternative stem. Similarly, plural forms that only have technical or obscure usages might not stem to the singular form.
        • Past participles will typically include the past participle as an alternative stem.
        • The preferred order of stems is not always the same: this will affect search under basic stemming.


        It is advisable to reindex to be sure there are no incompatibilities. Where the data in the forests (tokens or stems) does not match the current behavior, reindexing is recommended. This will have to be a forced reindex or a reload of specific documents containing the offending data. For many languages this can be avoided if queries do not touch on specific cases. For certain languages (see below) the incompatibility is great enough that it is essential to reindex.

        Language Notes

        Below we give some specific information and recommendations for various languages.



        Arabic


        The Arabic dictionaries are much larger than before. Implications:  (1) better precision, but (2) slower stemming.

        Chinese (Simplified)


        Tokenization is broadly incompatible.

        The new tokenizer uses a corpus-based language model.  Better precision can be expected.


        Reindex all Chinese (simplified).

        Chinese (Traditional)


        Tokenization is broadly incompatible.

        The new tokenizer uses a corpus-based language model.  Better precision can be expected.


        Reindex all Chinese (traditional).



        Danish


        This language now has algorithmic stemming, and may have slight tokenization differences around certain edge cases.


        Reindex all Danish content if you are using stemming.



        Dutch


        There will be much more decompounding in general, but MarkLogic will not decompound certain known lexical items (e.g., "bastaardwoorden").


        Reindex Dutch if you want to query with decompounding.



        English


        British variants may include the British variant as an additional stem, although the first stem will still be the US variant.

        Stemming produces more alternative stems. Implications are (1) stemming is slightly slower and (2) index sizes are slightly larger (with advanced stemming).



        This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.


        Reindex all content in this language if you are using stemming.


        See general comments above.



        German


        Decompounding now applies to more than just pure noun combinations. For example, it applies to "noun plus adjectives" compound terms. Decompounding is more aggressive, which can result in identification of more false compounds. Implications: (1) stemming is slower, (2) decompounding takes more space, and (3) for compound terms, search gives better recall, with some loss of precision.


        Reindex all German.



        This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.


        Reindex all content in this language if you are using stemming.


        See general comments above.



        Japanese


        Tokenization is broadly incompatible.

        The tokenizer provides internal flags that the stemmer requires.  This means that (1) tokenization is incompatible for all words at the storage level due to the extra information and (2) if you install a custom tokenizer for Japanese, you must also install a custom stemmer.


        Stemming is broadly incompatible.


        Reindex all Japanese content.



        Korean


        Particles (e.g., 이다) are dropped from stems; they used to be treated as components for decompounding.

        There is different stemming of various honorific verb forms.

        North Korean variants are not in the dictionary, though they may be handled by the algorithmic stemmer.


        Reindex Korean unless you use decompounding.

        Norwegian (Bokmal)


        Previously, hardly any decompounding was in evidence; now it is pervasive.

        Implications: (1) stemming is slower, (2) decompounding takes more space, and (3) search gives better recall, with some loss of precision, at least when it comes to compounds.


        Reindex Bokmal if you want to query with decompounding.

        Norwegian (Nynorsk)


        Previously, hardly any decompounding was in evidence; now it is pervasive.

        Implications: (1) stemming is slower, (2) decompounding takes more space, and (3) search gives better recall, with some loss of precision, at least when it comes to compounds.


        Reindex Nynorsk if you want to query with decompounding.

        Norwegian (generic 'no')


        Previously 'no' was treated as an unsupported language; now it is treated as both Bokmal and Nynorsk: for a word present in both dialects, all stem variants from both will be present.


        Do not use 'no' unless you really must; reindex if you want to query it.


        See general comments above.



        Portuguese


        More precision with respect to feminine variants (e.g., ator vs atriz).



        This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.


        Reindex all content in this language if you are using stemming.



        Russian


        Inflectional variants of cardinal or ordinal numbers are no longer stemmed to a base form.

        Inflectional variants of proper nouns may stem together due to the backing algorithm, but it will be via affix-stripping, not to the nominal form.

        Stems for many verb forms used to be the perfective form; they are now the simple infinitive.

        Stems used to drop ё but now preserve it.


        Reindex all Russian.


        See general comments above.



        Swedish


        Previously, hardly any decompounding was in evidence; now it is pervasive.

        Implications: (1) stemming is slower, (2) decompounding takes more space, and (3) search gives better recall, with some loss of precision, at least when it comes to compounds.


        Reindex Swedish if you want to query with decompounding.



        This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.


        Reindex all content in this language if you are using stemming.



        This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.


        Reindex all content in this language if you are using stemming.

        What is MarkLogic Data Hub?

        MarkLogic’s Data Hub increases data integration agility, in contrast to time-consuming upfront data modeling and ETL. Grouping all of an entity’s data into one consolidated record with that data’s context and history, a MarkLogic Data Hub provides a 360° view of data across silos. You can ingest your data from various sources into the Data Hub, standardize it, and then more easily consume it in downstream applications. For more details, please see our Data Hub documentation.

        Note: Prior to version 5.x, Data Hub was known as Data Hub Framework (DHF).


        • In contrast to previous versions, Data Hub 5 is largely configuration-based. Upgrading to Data Hub 5 will require either:
          • Conversion of legacy flows from the code-based approach of previous versions to the configuration-based format of Data Hub 5
          • Executing your legacy flows with the “hubRunLegacyFlow” Gradle task
        • It’s very important to verify the “Version Support” information on the Data Hub GitHub before installing or upgrading to any major Data Hub release


        One of the pre-requisites for installing Data Hub is to check for the supported/compatible MarkLogic Server version. For details, see our version compatibility matrix. Other pre-requisites can be seen here.

        New installations of Data Hub

        We always recommend installing the latest Data Hub version compatible with your current MarkLogic Server version. For example:

        • If you are running MarkLogic Server 9.0-7, install the most recent compatible Data Hub version (5.0.2), even though earlier Data Hub versions (such as 5.0.1, 5.0.0, 4.x and 3.x) also work with server version 9.0-7.

        • Similarly, if you are running 9.0-6, the recommended Data Hub version is 4.3.1 rather than the earlier versions 4.0.0, 4.1.x, 4.2.x and 3.x.

        Note: A specific MarkLogic server version can be compatible with multiple Data Hub versions and vice versa, which allows independent upgrades of either Data Hub or MarkLogic Server.


        Upgrading from a previous version

        1. To determine your upgrade path, first find your current Data Hub version in the “Can upgrade from” column in the version compatibility matrix.
        2. While Data Hub should generally work with future server versions, it’s always best to run the latest Data Hub version that's also explicitly listed as compatible with your installed MarkLogic Server version.
        3. If required, make sure to upgrade your MarkLogic Server version to be compatible with your desired Data Hub version. You can upgrade MarkLogic Server and Data Hub independently of each other as long as you are running a version of MarkLogic Server that is compatible with the Data Hub version you plan to install. If you are running an older version of MarkLogic Server, then you must upgrade MarkLogic Server first, before upgrading Data Hub.

        Note: Data Hub is not designed to be 'backwards' compatible with any version before the MarkLogic Server version listed with the release. For example, you can’t use Data Hub 3.0.0 on 9.0-4 – you’ll need to either downgrade to Data Hub 2.0.6 while staying on MarkLogic Server 9.0-4, or alternatively upgrade MarkLogic Server to version 9.0-5 while staying on Data Hub 3.0.0.

        • Example 1 - Scenario where you DO NOT NEED to upgrade MarkLogic Server:


        • Current Data Hub version: 4.0.0
        • Target Data Hub version: 4.1.x
        • ML server version: 9.0-9
        • The “Can upgrade from” value for the target version shows 2.x, which means you need to be on at least Data Hub 2.x. Since the current Data Hub version is 4.0.0, this requirement has been met.
        • Unless there is a strong reason for choosing 4.1.x, we highly recommend upgrading to the latest 4.x version compatible with MarkLogic Server 9.0-9, which in this example is 4.3.2. Consequently, the recommended upgrade path here becomes 4.0.0-->4.3.2 instead of 4.0.0-->4.1.x.
        • Since 9.0-9 is supported by the recommended Data Hub version 4.3.2, there is no need to upgrade MarkLogic Server.
        • Hence, the recommended path is Data Hub 4.0.0-->4.3.2


        • Example 2 - Scenario where you NEED to upgrade MarkLogic Server:


        • Current Data Hub version: 3.0.0
        • Target Data Hub version: 5.0.2
        • ML server version: 9.0-6
        • The “Can upgrade from” value for the target version shows Data Hub version 4.3.1, which means you need to be on at least 4.3.x (4.3.1 or 4.3.2, depending on your MarkLogic Server version). Since the current Data Hub version 3.0.0 doesn’t satisfy this requirement, the upgrade path after this step becomes Data Hub 3.0.0-->4.3.x
        • As per the matrix, the latest compatible Data Hub version for 9.0-6 is 4.3.1, so the path becomes 3.0.0-->4.3.1
        • From the matrix, the minimum supported MarkLogic Server version for 5.0.2 is 9.0-7, so you will have to upgrade your MarkLogic Server version before upgrading your Data Hub version to 5.0.2.
        • Because 9.0-7 is supported by all 3 versions under consideration (3.0.0, 4.3.1 and 5.0.2), the recommended path can be either
          1. Upgrade Data Hub from 3.0.0 to 4.3.1-->upgrade MarkLogic Server to at least 9.0-7-->upgrade Data Hub to 5.0.2
          2. Upgrade MarkLogic Server to at least 9.0-7-->upgrade Data Hub from 3.0.0 to 4.3.1-->upgrade Data Hub to 5.0.2
        • Recall that Data Hub 5 moved to a configuration-based approach from previous versions’ code-based approach. Upgrading to Data Hub 5 from a previous major version will require either:
          • Conversion of legacy flows from the code-based approach of previous versions to the configuration-based format of Data Hub 5
          • Executing your legacy flows with the “hubRunLegacyFlow” Gradle task
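
The reasoning in Example 2 can be sketched as a small planning function. This is an illustrative sketch only, not a MarkLogic tool: the function names are hypothetical, the "Can upgrade from" and minimum server values must be read from the version compatibility matrix, and the sketch assumes the intermediate Data Hub version is compatible with the server version you are currently running (as 4.3.1 is with 9.0-6 in the example).

```python
import re

def version_tuple(version):
    """Turn '9.0-6' or '4.3.1' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", version))

def plan_upgrade(current_hub, target_hub, can_upgrade_from, server, min_server):
    """Return ordered upgrade steps.

    can_upgrade_from and min_server are the matrix values for target_hub.
    """
    steps = []
    # Step through the bridge version if the current Data Hub is too old.
    if version_tuple(current_hub) < version_tuple(can_upgrade_from):
        steps.append(f"upgrade Data Hub {current_hub} -> {can_upgrade_from}")
        current_hub = can_upgrade_from
    # The server must meet the target's minimum before the final Data Hub upgrade.
    if version_tuple(server) < version_tuple(min_server):
        steps.append(f"upgrade MarkLogic Server {server} -> at least {min_server}")
    steps.append(f"upgrade Data Hub {current_hub} -> {target_hub}")
    return steps

# Values taken from Example 2 above.
steps = plan_upgrade("3.0.0", "5.0.2", can_upgrade_from="4.3.1",
                     server="9.0-6", min_server="9.0-7")
for step in steps:
    print(step)
```

The sketch produces the first of the two orderings listed in Example 2; upgrading MarkLogic Server first is equally valid when the newer server version supports every Data Hub version on the path.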


        What are the maximum and minimum number of nodes a MarkLogic Cluster can have?

        Minimum: 1 node (3 nodes if you want high availability)

        Optimum: ~64 nodes

        Maximum: 256 nodes



        Are all nodes created equal in MarkLogic?

        In MarkLogic, how a node is configured, provisioned, and scaled depends on the type of that node and what roles it might serve:

        • A single node can act as an e-node, d-node, or both ("e/d-node")
        • With respect to high availability/failover, any one node serves as both primary host (for its assigned data forests) and failover host (for its assigned failover forests)
        • With respect to disaster recovery/replication, nodes can serve as either hosts for primary data forests in the primary cluster, or as hosts for replica forests in the replica cluster
        • Bootstrap hosts  are used to establish an initial connection to foreign clusters during database replication. Only the nodes hosting your security forests (both primary security forests as well as their local disk failover copies) need to be bootstrap hosts



        Can I have nodes with mixed specifications within a cluster?

        • Queries in MarkLogic Server use every node in the cluster
        • Fast nodes will wait for slow nodes - especially slow d-nodes
        • Therefore, all nodes - especially all d-nodes - should be of the same hardware specification



        Does MarkLogic support Horizontal Scaling or Vertical Scaling?

        • Both horizontal (more nodes) and vertical scaling (bigger nodes) are possible with MarkLogic Server
        • Do note that high availability (HA) in MarkLogic Server requires at least some degree of horizontal scaling with a minimum of three nodes in a cluster
        • Given the choice between one big node and three smaller nodes, most deployments would be better off with three smaller nodes to take advantage of HA



        I'm confused about high availability (HA) vs. disaster recovery (DR) - How does MarkLogic do HA?  - How does MarkLogic do DR?

        • High Availability (HA) in MarkLogic Server involves automatic forest failover, which maintains database availability in the face of host failure. Failing back is a manual operation
        • Disaster Recovery (DR) in MarkLogic Server involves a separate copy - with smaller data deltas (database replication) or larger (backup/restore). Switching to and back from DR copies are both manual operations


        How many forests can a MarkLogic cluster have?

        Maximum: 1024 (including Local Disk Failover forests)


        What is the maximum size for a forest in MarkLogic?

        • The rule-of-thumb maximum size for a forest is 512GB
        • It's almost always better to have more small forests instead of one very large forest
        • It's important to keep in mind that forests have hard maximums for:
          • Number of stands
          • Number of fragments
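
As a rough planning aid, the rule of thumb above can be turned into a quick estimate. This is a back-of-the-envelope sketch, not an official sizing tool: it assumes data is spread evenly across forests and that each primary forest has one local disk failover copy, and the 10,000 GB figure is a hypothetical input.

```python
import math

MAX_FOREST_GB = 512   # rule-of-thumb maximum forest size from this FAQ
MAX_FORESTS = 1024    # cluster-wide maximum from this FAQ, including failover forests

def min_forests(total_data_gb, with_failover=True):
    """Return (primary forests, total forests) needed to stay at or under 512 GB each."""
    primary = max(1, math.ceil(total_data_gb / MAX_FOREST_GB))
    total = primary * 2 if with_failover else primary
    return primary, total

primary, total = min_forests(10_000)  # hypothetical database of about 10 TB
print(primary, total, total <= MAX_FORESTS)
```

In practice you would usually provision more forests than this minimum, since smaller forests are cheaper to merge and maintain.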



        How many documents per forest/database?

        While MarkLogic Server does not have a practical or effective limit on the number of documents in a forest or database, you'll want to watch out for:

        • Size of forests - as bigger forests require more time and computational resources to maintain
        • Maximum number of stands per forest (64) is a hard stop and difficult to unwind - so it's important that your database is merging often enough to stay well under that limit. Most deployments don't come close to this maximum unless they're underprovisioned and therefore merging too slowly or too infrequently
        • Maximum number of fragments per stand (on the order of tens or hundreds of millions). Most deployments typically scale horizontally to more forests (and therefore more stands) well before needing to worry about the number of fragments in a specific stand



        How should I configure my default databases (like security)?

        • The recommended number of local disk failover (LDF) forests for default databases is one for each primary forest
        • For example - each default database (including security) should have one data forest and one LDF forest
        • More LDF copies are not recommended as they're almost never worth the additional administrative complexity and dedicated hardware resources


        What is the recommended record or document size?

        100 KB +/- two orders of magnitude (1 KB - 10 MB)


        What is the recommended number of range indexes for a database?

        • On the order of 100 or so
        • If you need many more, revise your data model to take advantage of Template Driven Extraction (TDE)



        Does it help to do concurrent MLCP jobs in terms of performance?

        • Each MLCP job, starting in version 10.0-4.2, uses the maximum number of threads available on the server as the default thread count
        • Since a single job already uses all the available threads, concurrent MLCP jobs won't be helpful in terms of performance



        Should we back up default databases?

        • We recommend regular backups for the Security database
        • If actively used, regular backups are recommended for Schemas, Modules, Triggers and other default databases


        Backup/restore best practices?

        • Backups can be CPU/RAM intensive
        • Incremental backups minimize storage, not necessarily time
        • Unless your cluster is over-provisioned compared to most, concurrent backup jobs are not recommended
        • The "Include Replica" setting allows for backup if failed over - but also doubles your backup footprint in terms of storage
        • The "Max Backups" setting is applicable only for full backups



        Do we need to mirror configuration between primary and replica databases? If so, how do we do it?
        • Yes - primary and replica databases should have mirrored configurations. If the replica database's configuration is different, query results from the replica database will also be different

        • Configurations can be mirrored with Configuration Manager (deprecated in 10.0-3), or ml-gradle/Configuration Management API (CMA)


        Sub FAQs

        Local Disk Failover

        MarkLogic Fundamentals FAQ - Local Disk Failover

        Database Replication

        MarkLogic Fundamentals FAQ - Database Replication

        Common Error Messages

        MarkLogic Fundamentals FAQ - Common Error Messages


        In addition to the multiple language support in MarkLogic Server, MarkLogic Server also supports ISO codes listed below for representation of names for these languages.


        MarkLogic supported ISO codes

        MarkLogic supports the following ISO codes for the representation of language names:
        1. ISO 639-1
        2. ISO 639-2/T , and
        3. ISO 639-2/B

        Further, NOTE:
        a. MarkLogic uses the 2-letter ISO 639-1 codes, including zh's zh_Hant variant, and
        b. MarkLogic uses the 3-letter ISO 639-2 codes. To get a more specific list of ISO 639-2 codes go to

        Again, MarkLogic only supports below listed languages,
        Chinese (Simplified and Traditional)
        Persian (Farsi)
        Norwegian (Nynorsk and Bokmål)



        The function cdict:get-languages() can be used to get ISO Codes for all supported languages. Here is an example of the usage:

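          xquery version "1.0-ml";
          import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary"
                  at "/MarkLogic/custom-dictionary.xqy";

          (: return the ISO codes of all supported languages :)
          cdict:get-languages()
          (: ==> ("en", "ja", "zh", "zh_Hant") :)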



        There are many different kinds of locks present in MarkLogic Server.

        Transaction locks are obtained when MarkLogic Server detects the potential of a transaction to change the database, at which point the server considers it to be an update transaction. Once a lock is acquired, it is held until the transaction ends. Transaction locks are set by MarkLogic Server either explicitly or implicitly depending on the configured commit mode. Because it's very common to see poorly performing application code written against MarkLogic Server due to unintentional locking, the two concepts of transaction type and commit mode have been combined into a single, simpler control: transaction mode.

        MarkLogic Server also has the notion of document and directory locks. Unlike transaction locks, document and directory locks must be set explicitly and are persistent in the database - they are not tied to a transaction. Document locks also apply to temporal documents. Any version of a temporal document can be locked in the same way as a regular document.

        Cache partition locks are used by threads that can make changes to a cache. A thread needs to acquire a write lock for both the relevant cache and cache partition before it makes the change.

        Transaction Locks and Commit Mode vs. Transaction Mode

        Transaction lock types are associated with transaction types. Query type transactions do not use locks to obtain a consistent view of data, but rather the state of the data at a particular timestamp. Update type transactions have the potential to change the database and therefore require locks on documents to ensure transactional integrity. 

        So - if an update transaction type is run in explicit commit mode, then locks are acquired for all statements in an update transaction -  whether or not those statements perform updates. Once a lock is acquired, it is held until the transaction ends. If an update transaction type is run in auto commit mode, by default MarkLogic Server detects the transaction type through static analysis of the first statement in that transaction. If the server detects the potential for updates during static analysis, then the transaction is considered an update transaction - which results in a write lock being acquired.

        In multi-statement transactions, if an update transaction type is run in explicit commit mode, then the transaction is an update transaction and locks are acquired for all statements in an update transaction - even if no update occurs. In auto commit mode MarkLogic Server determines the transaction type through static analysis of the first statement. If in auto commit mode, and the first statement is a query, and an update occurs later in that transaction, MarkLogic Server will throw an exception. In multi-statement transactions, the transaction ends only when it is explicitly committed or rolled back. Failure to explicitly commit or roll back a multi-statement transaction might retain locks until the transaction times out or reaches the end of the session - at which point the transaction rolls back.

        Best practices:

        1) Avoid unnecessary transaction locks or holding on to transaction locks for too long. For single-statement transactions, do not explicitly set the transaction type to update if running a query. For multi-statement transactions, always explicitly commit or rollback the relevant transaction to free transaction locks as soon as possible.

        2) It's very common for users to write code that unintentionally takes write locks. One of the best ways to avoid unintentional locks is to use transaction modes instead of transaction types/commit modes. Transaction mode combines transaction type and commit mode into a single configurable value. You can read more about transaction mode in our documentation at: Transaction Mode Overview.

        3) Be aware that when setting transaction mode, the xdmp:commit and xdmp:update XQuery prolog options affect only the next transaction created after their declaration; they do not affect an entire session. Use xdmp:set-transaction-mode or xdmp.setTransactionMode if you need to change the transaction mode settings at the session level.

        Document and Directory Locks

        Document and directory locks are not tied to a transaction. The locks must be explicitly set and stored as a lock document in a MarkLogic Server database. So the locks can last a specified time period or be persistent until explicitly unlocked.

        Each document and directory can have a lock. The lock can be used as part of an application's update strategy. MarkLogic Server gives clients the flexibility to set up a locking policy that suits their environment. For example, if only one user is allowed to update specific database objects, you can set the lock to be "exclusive." In contrast, if multiple users update the same database object, you can set the lock to be "shared."

        Unlike transaction locks, document and directory locks are persistent in the database and are consequently searchable.   

        Temporal Document Locks

        A temporal collection contains bi-temporal or uni-temporal documents. Each version of a temporal document can be locked in the same way as a regular, non-temporal document.

        Cache and Cache Partition Locks

        If a thread attempts to make a change to a database cache, it needs to acquire a write lock for the relevant cache and cache partition. This write lock serializes write access, which keeps the data in the relevant cache or cache partition thread-safe. While cache and cache partition locks are short-lived, be aware that with a single cache partition, all of the threads needing to update that cache must serialize through a single cache partition write lock. With multiple cache partitions, multiple write locks can be acquired, one per partition, which allows multiple threads to make concurrent cache partition updates.
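
The effect of partitioning on lock contention can be illustrated with a toy model. The sketch below is a conceptual illustration in Python, not MarkLogic's implementation: the class `PartitionedCache` is hypothetical, and it uses a plain mutex per partition where a real cache would distinguish read and write locks.

```python
import threading

class PartitionedCache:
    """Toy cache whose entries are spread across independently locked partitions."""

    def __init__(self, partitions):
        self._locks = [threading.Lock() for _ in range(partitions)]
        self._data = [{} for _ in range(partitions)]

    def _index(self, key):
        # Pick the partition that owns this key.
        return hash(key) % len(self._locks)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:  # writers to the same partition serialize here
            self._data[i][key] = value

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._data[i].get(key)

cache = PartitionedCache(partitions=4)
threads = [threading.Thread(target=cache.put, args=(n, n * n)) for n in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cache.get(7))  # 49
```

With `partitions=1` every writer contends for the same lock; with four partitions, writers whose keys land in different partitions never block each other.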

        References and Additional Reading:

        1) Understanding Transactions in MarkLogic Server

        2) Cache Partitions

        3) Document and Directory Locks

        4) Understanding Locking in MarkLogic Server Using Examples

        5) Understanding XDMP-DEADLOCK

        6) Understanding the Lock Trace Diagnostic Trace Event

        7) How MarkLogic Server Supports ACID Transactions

        Updates are a key aspect of data manipulation in MarkLogic Server, and can sometimes be performance intensive, especially if performed in bulk. You should therefore take time to consider exactly how your application will perform updates. Moreover, a given document is often associated with data other than its content, such as permissions, collections, quality, and metadata, and all of these attributes can be affected by the chosen update method.

        MarkLogic Server offers various methods to update a document, but there are two major ways to do it, in general:

        • node-replace - Replaces a node in an existing document
        • document-insert - Inserts an entirely new document into the database or replaces the content of an existing document based on whether or not a document with a specified URI already exists.

        Although there is no material difference between node-replace and document-insert, using node-replace for updates is better because it preserves document attributes like permissions, collections, quality and metadata as opposed to document-insert which replaces all the aforementioned attributes along with the content of the document unless these attributes are explicitly found and attached to the insert query.

        Note: Using ‘node-replace’ is the authoritative way of updating documents among all the node-level update functions


        For updating a small set of documents where it is important to preserve all attributes of a document, ‘node-replace’ would be a better choice as it saves the overhead of finding the existing attributes by yourself. On the other hand, if query performance holds a higher priority over preserving the existing attributes of a document, ‘document-insert’ would likely be a better choice as it is faster when used without querying for the attributes. There is, however, no significant difference between the two if used in a similar fashion.

        With the release of MarkLogic Server versions 8.0-8 and 9.0-4, detailed memory use, broken out by major area, is periodically recorded to the error log. These diagnostic messages can be useful for quickly identifying memory resource consumption at a glance and aid in determining where to investigate memory-related issues.

        Error Log Message and Description of Details

        At one hour intervals, an Info level log message will be written to the server error log in the following format:

        Info: Memory 18% phys=147456 virt=246146(166%) rss=27330(18%) anon=53794(36%) file=250(0%) forest=1021(0%) cache=40960(27%) registry=1(0%)

        The error log entry contains memory-related figures for non-zero statistics only, so fields whose value is zero do not appear. Raw figures are in megabytes; percentages are relative to the amount of physical memory reported by the operating system. The figures include:

        Memory: Percentage of physical memory consumed by the MarkLogic Server process;
        phys: Size of physical memory in the machine;
        virt: Size of virtual address space reported by the operating system. This figure is often greater than 100%;
        swap: The amount of swap consumed by the MarkLogic Server process;
        rss: Resident Set Size reported by the operating system;
        anon: Anonymous mapped memory used by the MarkLogic Server;
        file: Total amount of memory-mapped data files used by MarkLogic Server. (The MarkLogic Server executable itself, for example, is memory-mapped by the operating system, but is not included in this figure.);
        forest: Forest-related memory allocated by the MarkLogic Server process;
        cache: User configured cache memory (list cache, expanded tree cache, etc) consumed by the MarkLogic Server process;
        registry: Amount of memory consumed by registered queries;
        huge: Huge page memory reserved by the operating system, and percentage comparing this to total physical memory;
        join: Memory consumed by joins for active running queries within the MarkLogic Server process, and percentage comparing this to total physical memory;
        unclosed: Unclosed memory, signifying memory consumed by unclosed or obsolete stands still held by the MarkLogic Server process, and percentage comparing this figure to total physical memory.
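
        As an illustration of the format described above, the following Python sketch splits an Info-level Memory entry into its named figures. It is not part of MarkLogic Server; the function name parse_memory_line and the returned dictionary layout are assumptions made for this example. Statistics that are zero are absent from the log line, so they are absent from the result as well.

```python
import re

# Matches "name=raw" optionally followed by "(pct%)", e.g. "rss=27330(18%)".
FIELD = re.compile(r"(\w+)=(\d+)(?:\((\d+)%\))?")

def parse_memory_line(line):
    """Return {name: (megabytes, percent_or_None)} for an 'Info: Memory' entry."""
    # The leading "Memory NN%" figure has no "=", so it is matched separately.
    total = re.search(r"Memory (\d+)%", line)
    if total is None:
        raise ValueError("not a Memory log entry")
    fields = {"Memory": (None, int(total.group(1)))}
    for name, raw, pct in FIELD.findall(line):
        fields[name] = (int(raw), int(pct) if pct else None)
    return fields

# Sample entry copied from the article text above.
sample = ("Info: Memory 18% phys=147456 virt=246146(166%) rss=27330(18%) "
          "anon=53794(36%) file=250(0%) forest=1021(0%) cache=40960(27%) registry=1(0%)")
stats = parse_memory_line(sample)
print(stats["rss"])  # (27330, 18)
```

        Because swap, huge, join, and unclosed are zero in the sample entry, they do not appear as keys in the parsed result.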

        In addition to reporting once an hour, the Info level error log entry is written whenever the amount of main memory used by MarkLogic Server changes by more than five percent from one check to the next. MarkLogic Server will check the raw metering data obtained from the operating system once per minute. If metering is disabled, the check will not occur and no log entries will be made.

        With the release of MarkLogic Server versions 8.0-8 and 9.0-5, this same information will be available in the output from the function xdmp:host-status().

        <host-status xmlns="">
        . . .
        . . .

        Additionally, with the release of MarkLogic Server 8.0-9.3 and 9.0-7, Warning-level log messages may be reported when the host is low on memory — the messages will indicate the areas involved, for example:

        Warning: Memory low: forest+cache=97%phys

        Warning: Memory low: huge+anon+swap+file=128%phys

        The messages are reported if the total memory used by the mentioned areas is greater than 90% of physical memory (phys). As best practice, the total of the areas should never be more than around 80% of physical memory, and should be even less if you are using the host for query processing.
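
        The 90% threshold can be sketched as a simple calculation in Python. This is only an approximation of the check described above: the function name memory_low, the rounding, and the way areas are grouped are assumptions, and the forest figure of 102000 MB is a made-up value chosen to trigger the warning.

```python
def memory_low(areas_mb, phys_mb, threshold_pct=90):
    """Return a warning string if the combined areas exceed threshold_pct of phys."""
    pct = round(100 * sum(areas_mb.values()) / phys_mb)
    if pct > threshold_pct:
        # Mimic the "area+area=NN%phys" shape of the log message.
        return "Memory low: %s=%d%%phys" % ("+".join(areas_mb), pct)
    return None

# Hypothetical host with 147456 MB of physical memory.
print(memory_low({"forest": 102000, "cache": 40960}, 147456))  # Memory low: forest+cache=97%phys
print(memory_low({"forest": 1021, "cache": 40960}, 147456))    # None
```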

        If the hosts are regularly encountering these warnings, remedial action to support the memory requirements might include:

        • Adding more physical memory to each of the hosts;
        • Adding additional hosts to the cluster to spread the data across;
        • Adding additional forests to any under-utilized hosts.

        Other action might include:

        • Archiving/dropping any older forest data that is no longer used;
        • Reviewing the group level cache settings to ensure they are not set too high, as they make up the cache part of the total. For reference, default (and recommended) group level cache settings based on common RAM configurations may be found in our Group Level Cache Settings based on RAM Knowledgebase article.


        This enhancement to MarkLogic Server allows for easy periodic monitoring of memory consumption over time, and records it in a summary fashion in the same place as other data pertaining to the operation of a running node in a cluster. Since all these figures have at their source raw Meters data, more in-depth investigation should start with the Meters history. However, having this information available at a glance can aid in identifying whether memory-related resources need to be explored when investigating performance, scale, or other like issues during testing or operation.


        The MarkLogic Monitoring History feature allows you to capture and view critical performance data from your cluster. By default, this performance data is stored in the Meters database. This article explains how you can plan for the additional disk space required for the Meters database.

        Meters Database Disk Usage

        Like any other database, the Meters database is made up of forests, which in turn are made up of stands that reside physically on disk. Because Monitoring History uses the Meters database to store critical performance data for your cluster, the amount of information can grow significantly as the number of hosts, forests, databases, and so on increases. Hence the need to plan and manage the disk space required by the Meters database.


        The Meters database stores critical performance data for your cluster. The size of the data is proportional to the number of hosts, app servers, forests, databases, etc. Typically, the raw retention settings have the largest impact on size.

        MarkLogic's recommendation for a new install is to start with the default settings and monitor usage over the first two weeks of an install. The performance history charts, constrained to just show the Meters database, will show an increasing storage utilization over the first week, then leveling off for the second week. This would give you a decent idea of space utilization going forward.

        You can then adjust the number of days of raw measurements that are retained.

        You can also add additional forests to spread the Meters database over more hosts if needed.

        Monitoring History

        The Monitoring History feature allows you to capture and view critical performance data from your cluster. Monitoring History capture is enabled at the group level. Once the performance data has been collected, you can view the data in the Monitoring History page.

        By default, the performance data is stored in the Meters database. A consolidated Meters database that captures performance metrics from multiple groups can be configured, if there is more than one group in the cluster.

        Monitoring History Data Retention Policy

        The data retention policy configures how long performance data is kept in the Meters database before it is deleted.

        If it is observed that meters data is not being cleared according to the retention policy, the first place to check would be the range indexes configured for the Meters database.

        Range indexes and the Meters Database

        The Meters database is configured with a set of range indexes. If these are missing or not configured correctly, they can prevent the Meters database from being cleaned up according to the retention policy.

        Range indexes can be missing or misconfigured in either of the following scenarios:

        •  if the cluster was upgraded from a version of MarkLogic before 7.0 and the upgrade had some issues
        •  if the indexes were manually created (when using another database for meters data instead of the default Meters database)

        The size of the meters database can grow significantly as the cluster grows, so it is important that the meters database is cleared per the retention policy.

        The required indexes (as of 8.0-5 and 7.0-6) are attached as a MarkLogic Configuration Manager package. Once these are added, the Meters database will reindex and the older data should be deleted.

        Note that data is deleted no sooner than the retention policy specifies. Data older than the retention period may still be kept for an unspecified amount of time.

        Prior to MarkLogic 4.1-5, role-ids were randomly generated. MarkLogic now uses a hash algorithm that ensures that roles created with the same name are assigned the same role-id. Attempting to migrate data from a forest created prior to MarkLogic 4.1-5 to a newer installation can result in a "role not defined" error. To work around this issue, create a new role with the role-id defined in the legacy system.


        This process creates a new role with the same role-id from your legacy installation and assigns this old role to your new role with the correct name.

        Step 1: You will need to find the role-id of the legacy role. This will need to be run against the security DB on the legacy server. 


        xquery version "1.0-ml";
        import module namespace sec="" at

        let $role-name := "Enter Role Name Here"



        Step 2: In the new environment, store the attached module to the following location on the host containing the security DB.


        Step 3: Ensure that you have created the role on the new cluster.

        Step 4: Run the following code against the new cluster's Security database. This will create a new role with the legacy role-id. Be sure to enter the role name, description, and role-id from Step 1.

        xquery version "1.0-ml";
        import module namespace cmr="" at

        let $role-name := "ENTER ROLE NAME"
        let $role-description := "ENTER ROLE DESCRIPTION"
        let $legacy-role-id := 11658627418524087702 (: Replace this with the Role ID from Step 1:)

        let $legacy-role := fn:concat($role-name,"-legacy")
        let $legacy-role-create := cmr:create-role-with-id($legacy-role, $role-description, (), (), (), $legacy-role-id)

        return fn:concat("Inserted role named ", $legacy-role, " with id of ", $legacy-role-id)


        Step 5: Run the following code against the new cluster's Security database to assign the legacy role to the new role.

        xquery version "1.0-ml";
        import module namespace sec="" at

        let $role-name := "ENTER ROLE NAME"
        let $legacy-role := fn:concat($role-name,"-legacy")

        return (
            sec:role-set-roles($role-name, ($legacy-role)),
            fn:concat("Assigned ", $legacy-role, " role to ", $role-name, " role")
        )



        You should now have a new role named [your-role]-legacy.  This legacy role will contain the role-id from your legacy installation and will be assigned to [your-role] on the new installation.  Legacy documents in your DB will now have the same rights they had in the legacy system.


        Those familiar with versions of MarkLogic Server prior to MarkLogic 7 may have heard the 3X disk space rule being mentioned. At the time of writing, references to it can be found in the MarkLogic 5 documentation and the MarkLogic 6 documentation.

        The Monitoring Metrics of Interest section in the Monitoring MarkLogic Guide refers to the 3X rule in a preparatory question on disk allocation for a database:

        • Is there enough disk space for forest data and merges? Merges require at least twice as much free disk space as used by the forest data (3X rule). If a merge runs out of disk space, it will fail.

        If you have read the requirements guidelines for MarkLogic 7 (and above), you may have noticed a section suggesting that you plan for the following amount of available disk space:

        • 1.5 times the disk space of the total forest size. Specifically, each forest on a filesystem requires its filesystem to have at least 1.5 times the forest size in disk space (or, for each forest less than 32GB, 3 times the forest size). This translates to 1.5 times the disk space of the source content after it is loaded.

          For example, if you plan on loading content that will result in a 100 GB database, reserve at least 150GB of disk space. The disk space reserve is required for merges.
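
        The arithmetic in this guideline can be sketched in Python as follows. The function name reserve_gb is hypothetical, and summing the per-forest requirements for forests that share a filesystem is an assumption made for illustration; treat the result as a planning estimate, not a guarantee.

```python
SMALL_FOREST_GB = 32  # forests below this size use the 3X multiplier

def reserve_gb(forest_sizes_gb):
    """Minimum disk space (GB) to reserve for forests sharing one filesystem."""
    total = 0.0
    for size in forest_sizes_gb:
        # 1.5X for forests of 32GB or more, 3X for smaller forests.
        multiplier = 3.0 if size < SMALL_FOREST_GB else 1.5
        total += size * multiplier
    return total

print(reserve_gb([100]))      # 150.0
print(reserve_gb([100, 10]))  # 180.0
```

        The first call reproduces the 100 GB to 150 GB example from the documentation; the second adds a 10 GB forest, which falls under the 3X multiplier.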

        This Knowledgebase article will cover both requirements and offer some further guidance as to how to plan and size your databases and - crucially - how you can take advantage of the newer 1.5X rule.


        The original logic behind the allocation of 3X disk space was to provide ample space to allow for a situation where a database is fully reindexed. The allocation would be in thirds according to the following measures:

        1. Your Data
        2. Space for reindexing
        3. Space for merges

        The 3X disk provision rule was offered as a very general (and very safe for production) rule to cover the most extreme example where your data gets reindexed in its entirety and then merges have to take place on top of that.

        ... but why 3X?

        To understand this, we need to briefly explore what happens when a document is updated in MarkLogic Server.

        As an update is made to a document - and the same rule applies to an update to a document when index changes are concerned - the transaction takes place at a given timestamp (a given point in time). At that point, the original fragment is marked as deleted and a new fragment is written to an in-memory-stand. Eventually, the in-memory stand is written to disk.

        For a period of time - especially at times where a MarkLogic instance/cluster is busy performing a large number of updates - it's likely that there will be occasions where two versions of the same fragment exist in different stands on disk; one stand will contain the fragment now marked as deleted and the other stand will contain the newly written fragment - which will be used by any subsequent queries running at later timestamps.

        ... so that covers 2X - what about the other third?

        When a merge takes place, merge candidate stands are identified and a new stand is created. As the candidate stands are read through, the active fragments are copied over to the new stand.

        At the point where the merge takes place, the new stand coexists with the older stands because, as with updates and reindexing, queries will still need to run against the candidate stands; the timestamp will only be moved on to accommodate the data in the new stand once the process has completed in its entirety.

        While all of this is taking place, other updates could be taking place to documents in other stands and the same rules apply to those fragments too.

        So the 3X rule provides a true safeguard; allowing for a situation where forest sizes are likely to swell way above and beyond the size of the data they contain, to accommodate the fragments marked deleted for queries at earlier timestamps and to accommodate the additional headroom required by a merge of some very large stands.


        Some changes were made in MarkLogic 7 which effectively reduce the footprint of your data on-disk. With some careful planning, you can take advantage of the lower sizing rule.

        While the documentation still acknowledges the 3X rule (which is still true if you're performing an upgrade directly from MarkLogic 6 or earlier without making any other configuration changes), a new default configuration has been introduced for databases created under MarkLogic 7: the merge max size.

        What does the merge max size do?

        This setting enforces an upper limit of 32GB on the size of an individual stand.

        With previous versions of the product, the expectation would be for the contents of a forest to merge down to one large stand. That is: given a quiesced database, on full completion of a merge, all content (all active fragments) should be in a single stand.

        For databases on MarkLogic 7 (and later), you can now expect to see more stands - each with a maximum size of 32GB.

        This means you should expect to see your data in more stands than you would have done on prior versions of the product, but it also means that you can lower the amount of disk space you need due to this size restriction.

        From MarkLogic 7 onwards, with the merge max size correctly set, the largest amount of space a single merge operation should require is 64GB.

        ... but why 1.5X?

        If we return to this line in the documentation:

        • For example, if you plan on loading content that will result in a 100 GB database, reserve at least 150GB of disk space. The disk space reserve is required for merges.

        Given that we now have an upper limit on the size of a stand (32GB), as two smaller stands are being merged to create the new, larger stand and given the space required by other concurrent operations that may be taking place in other stands, a space limit of 1.5X should now cover any merges (and subsequent updates to documents).

        For further understanding of the 1.5X rule, read our knowledgebase article 'Explanation of the 1.5X Disk Space Requirement'.

        How do I find out whether my database is configured for this new merge max size?

        If you're on the admin interface at http://[yourhostname]:8001

        Go to: Configure > Databases > [Your Database Name] > Merge Policy

        On the right-hand panel, you should see the merge max size; the default should now be 32768

        Important caveats

        MarkLogic 7 is designed to allow you to work with more stands. While you should still be concerned when you see a system with a very large number of small stands, the new behavior requires a shift in thinking, and this has implications in particular when you start to think about applying the 1.5x disk space rule in your environment.

        In releases prior to MarkLogic 7, the expectation (over time) was that all data in a forest would ultimately be merged into a single stand.

        In MarkLogic 7, at least with the default merge-max-size setting (32768, or 32GB), a reasonably large forest is expected to be divided into a number of stands of up to 32GB each.

        If you are strictly following this rule for all reasonably large forests on your system - then the 1.5x rule can safely be used operationally in a production environment, but reliance on the rule should require careful management when migrating an existing system as running out of disk space can have catastrophic consequences for a live system.

        For very small forests, the 1.5X rule does not apply. Due to the 32GB stand size overhead, your forests need to be sufficiently large in order to use the 1.5X rule.

        You should treat the 1.5x rule as an absolute minimum requirement for disk space for a given database. If you are going to use it, we would recommend having a strategy in place for allocating more space until you are confident that the cluster can run safely within the lower (1.5x) boundaries.

        I'm upgrading from an earlier version of MarkLogic to MarkLogic 7 - I have changed the merge max size to 32768. Can I reclaim the disk space?

        It's important to note that the 1.5x guidelines will only work if your forests all contain stands that have the new maximum size of 32GB. If your forests still contain larger stands, you'll need to break these down before you can consider reclaiming disk space. 

        ... Breaking Large Stands Down

        If your forests contain stands larger than 32 GB, you will want to break these stands down in order to take advantage of the lower disk space requirements.

        Different techniques can be followed to break the stands and reclaim disk space:

        1. Re-ingesting the content of the forests with large stands - When documents are re-ingested in a forest, the old fragments will be marked as deleted and the new fragment will be written to a new stand. Once there are sufficient deleted fragments, the large stands will be merged down into smaller stands.
        2. Perform re-indexing – A Forced re-index will update every fragment in the database, effectively re-loading the content - the original fragments will be marked as deleted and the new fragments will be written to a new stand. Once there are sufficient deleted fragments, the large stands will be merged down into smaller stands.  
        3. Forest rebalancing - Rebalance active fragments out of the existing forests and retire the old forests, with the merge max size configured. This merges out the deleted fragments in the old stands and keeps the active fragments in smaller stands in the other, rebalanced forests.


        The major points for the 1.5X rule:

        • The estimated 1.5X disk space utilization is only true for databases where merge-max-size is correctly set and for forests that are sufficiently large. For databases created in MarkLogic Server v7 or later, the default merge-max-size is 32768 (32GB).
        • If you're upgrading from earlier releases, you would need to make sure you set this value as part of your upgrade process.
          • After upgrading from a version previous to MarkLogic 7, you will have to take explicit steps to decrease the size of any pre-existing large stands. 



        New and updated mimetypes were added for MarkLogic 8.  If your MarkLogic Server instance has customized mimetypes, the upgrade to MarkLogic Server v8.0-1 will not update the mimetypes table. 


        MarkLogic 8 includes the following new mimetype values:

        Name                                   Extension         Format
        application/json                       json              json
        application/rdf+json                   rj                json
        application/sparql-results+json        srj               json
        application/xml                        xml xsd xvs sch   xml
        text/json                                                json
        text/xml                                                 xml
        application/vnd.marklogic-javascript   sjs               text
        application/vnd.marklogic-ruleset      rules             text

        If you upgraded to 8.0 from a previous version of MarkLogic Server and if you have ever customized your mimetypes (for example, using the MIME Types Configuration page of the Admin Interface), the upgrade will not automatically add the new mimetypes to your configuration. If you have not added any mimetypes, then the new mimetypes will be automatically added during the upgrade. You can check if you have these mimetypes configured by going to the Mimetype page of the Admin Interface and checking if the above mimetypes exist. If they exist, then there is nothing you need to do.


        Not having these mimetypes may lead to application-level failures. For example, running JavaScript code via Query Console will fail.

        Resolving Manually

        If you do not have the above mimetypes after upgrading to 8.0, you can manually add the mimetypes to your configuration using the Admin Interface. To manually add the configuration, perform the following steps:

        1. Open the Admin Interface in a browser (for example, open http://localhost:8001).
        2. Navigate to the Mimetypes page, near the bottom of the tree menu.
        3. Click the Create tab.
        4. Enter the name, the extension, and the format for the mimetype (see the table above).
        5. Click OK.
        6. Repeat the preceding steps for each mimetype in the above table.

        Please be aware that updating the mimetype table results in a MarkLogic Server restart.  You will want to execute this procedure when MarkLogic Server is idle or during a maintenance window.

        Resolve by Script

        Alternatively, if you do not have the above mimetypes after upgrading to 8.0, you can add the mimetypes to your configuration by executing the following script in Query Console:

        xquery version "1.0-ml";

        import module namespace admin = "" at "/MarkLogic/admin.xqy";
        declare namespace mt = "";

        let $config := admin:get-configuration()
        let $all-mimetypes := admin:mimetypes-get($config) (: existing mimetypes defined :)
        let $new-mimetypes := (admin:mimetype("application/json", "json", "json"),
            admin:mimetype("application/xml", "xml xsd xvs sch", "xml"),
            admin:mimetype("application/vnd.marklogic-javascript", "sjs", "text"),
            admin:mimetype("application/vnd.marklogic-ruleset", "rules", "text"))
        (: remove intersection to avoid conflicts :)
        let $delete-mimetypes :=
            for $mimetype in $all-mimetypes
            return if ($mimetype//mt:name/data() = $new-mimetypes//mt:name/data()) then $mimetype else ()
        let $config := admin:mimetypes-delete($config, $delete-mimetypes)
        (: save new mimetype definitions :)
        return admin:save-configuration( admin:mimetypes-add( $config, $new-mimetypes))
        (: executing this query will result in a restart of MarkLogic Server :)

        Please be aware that updating the mimetype table results in a MarkLogic Server restart.    You will want to execute this script when MarkLogic Server is idle or during a maintenance window.


        At the time of this writing, it is expected that the upgrade scripts will be improved in a maintenance release of MarkLogic Server so that these updates occur automatically.


        In this article, we discuss use of xdmp:cache-status in monitoring cache status, and explain the values returned.


        Note that this is a relatively expensive operation, so it’s not something to run every minute, but it may be valuable to run it occasionally for information on current cache usage.

        Output format

        The values returned by xdmp:cache-status are per host, defaulting to the current host. It takes an optional host-id to allow you to gather values from a specific host in the cluster.

        The output of xdmp:cache-status will look something like this:

        <cache-status xmlns="">


        cache-status contains information for each partition of the caches:

        • The list cache holds search term lists in memory and helps optimize XPath expressions and text searches.
        • The compressed tree cache holds compressed XML tree data in memory. The data is cached in memory in the same compressed format that is stored on disk.
        • The expanded tree cache holds the uncompressed XML data in memory (in its expanded format).
        • The triple cache holds triple data.
        • The triple value cache holds triple values.

        The following are descriptions of the values returned:

        • partition-size: The size of a cache partition, in MB.
        • partition-table: The percentage of the table for a cache partition that is currently used. The table is a data structure with a fixed overhead per cache entry, used for cache administration. This fixes the number of entries that can be resident in the cache. If the partition table is full, something will need to be removed before another entry can be added to the cache.
        • partition-busy: The percentage of the space in a cache partition that is currently used and cannot be freed.
        • partition-used: The percentage of the space in a cache partition that is currently used.
        • partition-free: The percentage of the space in a cache partition that is currently free.
        • partition-overhead: The percentage of the space in a cache partition that is currently overhead.

        When do I get errors?

        You will get a cache-full error when nothing can be removed from the cache to make room for a new entry.

        The "partition-busy" value is the most useful indicator of getting a cache-full error. It tells you what percent of the cache partition is locked down and cannot be freed to make room for a new entry. 


        MarkLogic DHS

        MarkLogic Data Hub Service (DHS) provides the fastest and most cost-effective way for enterprises to integrate, store, harmonize, analyze, and secure mission-critical data in the cloud. Because it is a managed service, not all of the monitoring options available with MarkLogic Server and the Data Hub Framework are available in DHS.

        Differences running on AWS and Azure

        The management ports differ depending on which cloud provider you are using to host DHS. DHS on AWS uses port 8002 for the management endpoint. DHS on Azure uses port 8003 for the management endpoint.

        Dashboard and History

        The Monitoring Dashboard and Monitoring History can also be accessed on the management port.

        The Monitoring Dashboard provides task-based views of MarkLogic Server performance metrics in real time.

        The Monitoring History feature allows you to view critical performance data collected from your cluster.

        Database Status

        Return status information for the named database:

        Database Metrics

        Retrieve historical monitoring data about the databases in the cluster

        Retrieve historical monitoring data about the named databases:

        Server Logs

        Available Log Files

        List the logs available on the server

        Retrieving, Filtering, and Formatting Logs

        The Log files can be retrieved with text, json, or xml formatting. The files can be retrieved in whole, or they can be filtered using any combination of:

        • Start time (start)
        • End time (end)
        • Regular expression/s (regex)

        Retrieve data-hub-STAGING app server error log with text, json, or xml formatting

        Retrieve Server ErrorLog entries for a specific time range with xml formatting

        Retrieve Server ErrorLog entries matching the patterns SVC or XDMP. The regex OR operator, |, is URL encoded (%7C) between the two patterns.
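
        The URL encoding of the regex can be sketched in Python using the standard library. The regex parameter name comes from the filter list above; the format parameter name is an assumption for illustration, and no endpoint URL is shown because it depends on your DHS environment.

```python
from urllib.parse import quote, urlencode

# The two patterns joined with the regex OR operator.
pattern = "SVC|XDMP"

# Percent-encode the pattern on its own: "|" becomes "%7C".
print(quote(pattern, safe=""))  # SVC%7CXDMP

# Build a query string; "format" is an assumed parameter name.
params = {"format": "xml", "regex": pattern}
print(urlencode(params))  # format=xml&regex=SVC%7CXDMP
```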

        MarkLogic recommends that the Security database only have 1 primary forest.  Having more than one primary forest for the Security database can cause failover issues when doing upgrades and restarts.  The Security database should have a single primary forest, and one replica forest to support High Availability.

        More details available in the knowledge base article How many forests should my Security database have?

        Refer to our documentation for Configuring the Security and Auxiliary Databases to Use Failover Forests


        When restarting very large forests, some customers have noted that it may take a while for them to mount. While the forests are mounting, the database is unable to come online, thus impacting the availability of your main site. This article shows you how to change a few database settings to improve forest-mounting time.



        When encountering delays with forest mounting time after restarts, we usually recommend the following settings:

        format-compatibility set to the latest format
        expunge-locks set to none
        index-detection set to none

        Additionally, some customers might be able to spread out the work of memory mapping forest indexes by setting preload-mapped-data to false - though it should be noted that instead of the necessary time being taken during the mounting of the forest, memory-mapped file data will be loaded on demand through page faults as the server accesses it.

        While the above settings should help with forest mounting time, their effects depend on the situation. You can read more about each of these settings in our documentation. In particular:

        1) Regarding format compatibility: "The automatic detection occurs during database startup and after any database configuration changes, and can take some time and system resources for very large forests and for very large clusters. The default value of automatic is recommended for most installations." So while automatic is recommended in most cases, you should try changing the setting if you're seeing long forest mount times.

        2) Regarding expunge-locks: "Setting this to none is only recommended to speed cluster startup time for extremely large clusters. The default setting of automatic, which cleans up the locks as they expire, is recommended for most installations."

        3) Regarding index-detection: "This detection occurs during database startup and after any database configuration changes, and can take some time and system resources for very large forests and for very large clusters. Setting this to none also causes queries to use the current database index settings, even if some settings have not completed reindexing. The default value of automatic is recommended for most installations"

        It may also be worth considering why forests are taking a long time to mount. If your data size has grown significantly over the lifetime of the affected database, it might be the case that your forests are now overly large, in which case a better approach might be to instead distribute the data across more forests.

        MarkLogic Server's 'DatabaseClient' instance represents a database connection sharable across threads. The connection is stateless, except that authentication is done the first time a client interacts with the database via a Document Manager, Query Manager, or other manager. For instance: you may instantiate a DatabaseClient as follows:
        // Create the database client

        DatabaseClient client = DatabaseClientFactory.newClient(host, port,
                                                  user, password, authType);

        And release it as follows: