

This article discusses the "Stand s has n fragments" messages that may appear in the error log or system log files. These messages can appear at different log levels (Notice, Warning, Error, Critical, Alert, and Emergency) because their severity increases as the number of fragments in a single stand increases, reflecting increasing risk. 

Fragment counts and their corresponding log levels:

In MarkLogic 8 and MarkLogic 9, the per-stand fragment count thresholds for each log level are:  

  • At around 84 million fragments, MarkLogic Server will report this with a Notice level log message
  • At around 109 million fragments, MarkLogic Server will report this with a Warning level log message
  • At around 134 million fragments, MarkLogic Server will report this with an Error level log message
  • At around 159 million fragments, MarkLogic Server will report this with a Critical level log message
  • At around 184 million fragments, MarkLogic Server will report this with an Alert level log message
  • At around 209 million fragments, MarkLogic Server will report this with an Emergency level log message

At 256 million fragments, your data may be at risk of becoming corrupted due to integer overflow. The log level reflects this risk and is intended to get your attention at higher stand fragment counts.
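The threshold list above can be turned into a small helper for log monitoring. This is a minimal Python sketch, assuming the approximate MarkLogic 8/9 thresholds quoted in this article (the server's exact cut-offs may differ slightly, since the article says "at around"); the function names are hypothetical and not part of any MarkLogic API.

```python
import re

# Approximate per-stand thresholds quoted in this article (MarkLogic 8 and 9),
# checked from the highest level down.
THRESHOLDS = [
    (209_000_000, "Emergency"),
    (184_000_000, "Alert"),
    (159_000_000, "Critical"),
    (134_000_000, "Error"),
    (109_000_000, "Warning"),
    (84_000_000, "Notice"),
]

STAND_MSG = re.compile(r"Stand (?P<stand>\S+) has (?P<count>\d+) fragments")


def expected_level(fragments):
    """Return the log level expected for a stand with this many fragments, or None."""
    for limit, level in THRESHOLDS:
        if fragments >= limit:
            return level
    return None


def parse_stand_message(line):
    """Extract (stand path, fragment count) from a 'Stand ... has n fragments' line."""
    m = STAND_MSG.search(line)
    if m is None:
        return None
    return m.group("stand"), int(m.group("count"))
```

Applied to the Emergency entry shown below, parse_stand_message returns the stand path and 213404541, and expected_level(213404541) returns "Emergency".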

Emergency level log entries

Consider an example Error Log entry where the following information is observed:

2015-06-20 10:13:39.746 Emergency: Stand /space/Data/Forests/App-Services/00000fae has 213404541 fragments.

At all levels, the messages should be monitored and managed, but at the Emergency level, you will need to take corrective action soon.  

Corrective Actions

Note that it is the number of fragments in a stand that is important, not the number of fragments in a forest.  The actions that you take should act to decrease the size of stands in a forest. 

Some of the actions you can take:

  • If not already configured, MarkLogic databases should be configured with a merge-max-size value smaller than the current forest size (databases created in MarkLogic 7 or MarkLogic 8 have a default value of 32GB).
  • If merge-max-size is already configured for the database, decrease the value of this setting. 
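If you manage database settings over HTTP, merge-max-size can also be changed through the Management REST API. The sketch below assumes the Manage app server on its default port 8002 with digest authentication, and a PUT of a JSON body to /manage/v2/databases/{name}/properties; the helper names and the example size are hypothetical, so check the property name and its megabyte unit against the Management API documentation for your release.

```python
import json
import urllib.parse
import urllib.request

MANAGE_PORT = 8002  # default Manage app server port (assumption: not changed)


def gb_to_mb(gb):
    """merge-max-size is set in megabytes; the 32GB default corresponds to 32768."""
    return int(gb * 1024)


def merge_max_size_body(size_mb):
    """JSON body for a database properties update that sets merge-max-size."""
    if size_mb < 0:
        raise ValueError("merge-max-size must not be negative")
    return {"merge-max-size": int(size_mb)}


def set_merge_max_size(host, database, size_mb, user, password, port=MANAGE_PORT):
    """PUT the new value to /manage/v2/databases/{name}/properties using digest auth."""
    url = (f"http://{host}:{port}/manage/v2/databases/"
           f"{urllib.parse.quote(database, safe='')}/properties")
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    request = urllib.request.Request(
        url,
        data=json.dumps(merge_max_size_body(size_mb)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with opener.open(request) as response:
        return response.status  # HTTP status code of the update
```

For example, set_merge_max_size("localhost", "Documents", gb_to_mb(16), "admin", "password") would request a 16GB limit; choose a value smaller than your current forest size rather than copying this number.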


Occasionally, you might see an "Invalid Database Online Event" error in your MarkLogic Server Error Log. This article will help explain what this error means, as well as provide some ways to resolve it.

What the Error Means

The XDMP-INVDATABASEONLINEEVENT error means that something went wrong during the database online trigger event. Many situations can trigger this event, such as a server restart or a configuration change to any of the databases. In most cases, this error is harmless; it is just giving you information.

Resolving the Error

We often see this error when the user ID that is baked into the database online event created by CPF is no longer valid; the net effect is that CPF's restart handling is not functioning. We believe reinstalling CPF should fix this issue.

If re-installing CPF does not resolve this error, you will want to further analyze and debug the code that is invoked by the restart trigger.





Upon boot of CentOS 6.3, MarkLogic users may encounter the following warning:

:WARNING: at fs/hugetlbfs/inode.c:951 hugetlb_file_setup+0x227/0x250() (Not tainted)

MarkLogic 6.0 and earlier have not been certified to run on CentOS 6.3. This message appears because MarkLogic uses a resource that has been deprecated in CentOS 6.3. The message can be ignored, as it does not cause any issues with MarkLogic performance. Although this example specifically points out CentOS 6.3, this message could potentially occur in other MarkLogic/Linux combinations.


Some customers have reported seeing kernel level messages like this in their /var/log/messages file:

Jan 31 17:41:46 ml-c1-u3 kernel: [17467686.201893] TCP: Possible SYN flooding on port 7999. Sending cookie

This may also be seen as part of the output from a call to dmesg and could possibly follow a stack trace, for example:

[<ffffffff810d3d27>] ? audit_syscall_entry+0x1d7/0x200 
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
possible SYN flooding on port 7999. Sending cookies.
possible SYN flooding on port 7999. Sending cookies.
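To see how often, and on which ports, these warnings occur, a short Python sketch can count them in /var/log/messages or in saved dmesg output. The regular expression is an assumption based only on the message text shown above; port 7999 is MarkLogic's default XDQP port.

```python
import re
from collections import Counter

# Matches both "Possible SYN flooding" (kernel log) and "possible SYN flooding" (dmesg).
SYN_FLOOD = re.compile(r"possible SYN flooding on port (\d+)", re.IGNORECASE)


def syn_flood_ports(lines):
    """Count 'possible SYN flooding' reports per port across log or dmesg lines."""
    counts = Counter()
    for line in lines:
        # A single line may carry several merged reports, so use findall.
        for port in SYN_FLOOD.findall(line):
            counts[int(port)] += 1
    return counts
```

For example, with open("/var/log/messages") as f: print(syn_flood_ports(f)) prints a count per port; a large count for 7999 points at XDQP traffic between cluster hosts.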

What does it mean?

The tcp_syncookies configuration is likely enabled on your system.  You can check for this by viewing the contents of /proc/sys/net/ipv4/tcp_syncookies

$ cat /proc/sys/net/ipv4/tcp_syncookies

If the value returned is 1, tcp_syncookies is enabled for this host.

Possible SYN flooding

A SYN flood is a form of denial-of-service attack in which an attacker sends a succession of SYN requests to a target's system in an attempt to consume enough server resources to make the system unresponsive to legitimate traffic.

Source: Wikipedia

You would expect to see evidence of a SYN flood when a "flood" of TCP SYN messages is sent to the host and the SYN-ACKs your kernel sends in response are not followed by ACK messages from the client. Under normal operation, the client sends a SYN, your kernel acknowledges it with a SYN-ACK, and the client replies with an ACK. This process (or pattern) is known as the three-way handshake; its goal is to firmly establish communication on both the server and the client.

In the event of a real attack, a SYN flood will most likely originate from a spoofed IP address; during an attack, the client performing the "flood" does not wait for the SYN-ACK response from the server it is attacking.

Under normal operation (i.e. without SYN cookies), the kernel keeps a TCP connection half-open after receiving the first SYN, because of the handshake mechanism used to establish TCP connections. Because there is a limit to how many half-open connections the kernel can maintain at any given time, a flood of SYNs that are never completed can exhaust that limit; this is why the problem is characterized as an attack.

The term half-open refers to TCP connections whose state is out of synchronization between the two communicating hosts, possibly due to a crash of one side. A connection which is in the process of being established is also known as embryonic connection.

Source: Wikipedia

If SYN cookies are enabled, the kernel doesn't track half-open connections. Instead, it encodes the connection details into the sequence number of its SYN-ACK and relies on the acknowledgement number in the client's subsequent ACK to confirm that the ACK follows a genuine SYN and SYN-ACK, which establishes full communication between client and server. Because half-open connections are no longer tracked, a SYN flood can no longer exhaust them.
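The mechanism described above can be illustrated with a deliberately simplified Python sketch: the server packs a time slot, an MSS index, and a keyed hash of the connection into the 32-bit SYN-ACK sequence number, then validates the client's ACK against it without having stored anything. The secret, MSS table, slot width, and bit layout are illustrative assumptions; this is not the Linux kernel's actual cookie algorithm, which, for example, also mixes in the client's initial sequence number.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"  # illustrative; real kernels use random secrets
MSS_TABLE = (536, 1300, 1440, 1460)     # illustrative MSS values the cookie can encode
SLOT_SECONDS = 64                       # width of one time slot in seconds


def _mac(conn, slot, mss_idx):
    """24-bit keyed hash over the connection 4-tuple, time slot and MSS index."""
    msg = "|".join(map(str, (*conn, slot, mss_idx))).encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(conn, client_mss, now):
    """Build the SYN-ACK sequence number; nothing is stored for the half-open connection."""
    slot = (int(now) // SLOT_SECONDS) & 0x1F   # 5-bit time slot in the top bits
    mss_idx = 0
    for i, mss in enumerate(MSS_TABLE):
        if mss <= client_mss:
            mss_idx = i                        # largest table entry not above the client's MSS
    # Layout: [5 bits slot][3 bits MSS index][24 bits MAC] = 32-bit sequence number.
    return (slot << 27) | (mss_idx << 24) | _mac(conn, slot, mss_idx)


def check_ack(conn, ack_number, now, max_age_slots=2):
    """Validate the client's ACK from its acknowledgement number alone.

    Returns the MSS recovered from the cookie, or None if the ACK is forged or stale.
    """
    cookie = (ack_number - 1) & 0xFFFFFFFF     # the ACK acknowledges cookie + 1
    slot = cookie >> 27
    mss_idx = (cookie >> 24) & 0x7
    age = ((int(now) // SLOT_SECONDS) - slot) & 0x1F
    if age > max_age_slots or mss_idx >= len(MSS_TABLE):
        return None
    if _mac(conn, slot, mss_idx) != cookie & 0xFFFFFF:
        return None
    return MSS_TABLE[mss_idx]
```

The sketch also hints at why the kernel documentation quoted below calls syncookies a last resort: only the small amount of state that fits in the encoding survives (here, one of four MSS values).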

In the case of MarkLogic, this message can appear if the rate of incoming connection requests is perceived by the kernel as unusually high. In this case, it would not be indicative of a real SYN flooding attack, but to the TCP/IP stack the traffic exhibits the same characteristics, and the kernel responds by reporting a possible (false) attack.

Notes from the kernel documentation

See the section of the kernel documentation for tcp_syncookies - BOOLEAN for some further information regarding this feature:

The syncookies feature attempts to protect a socket from a SYN flood attack. This should be used as a last resort, if at all. This is a violation of the TCP protocol, and conflicts with other areas of TCP such as TCP extensions. It can cause problems for clients and relays. It is not recommended as a tuning mechanism for heavily loaded servers to help with overloaded or misconfigured conditions. For recommended alternatives see tcp_max_syn_backlog, tcp_synack_retries, and tcp_abort_on_overflow.

Further down, they state:

Note, that syncookies is fallback facility. It MUST NOT be used to help highly loaded servers to stand against legal connection rate. If you see SYN flood warnings in your logs, but investigation shows that they occur because of overload with legal connections, you should tune another parameters until this warning disappear. See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.


Tuning on a MarkLogic Server

Any dmesg output indicating "possible SYN flooding on port 7999" may appear in tandem with very heavy XDQP (TCP) traffic within a MarkLogic cluster - this link provides further detail in relation to a similar scenario with Apache HTTP server. You can tune your TCP settings to try to avoid SYN Flooding error messages, but SYN flooding can also be a symptom of a system under resource pressure. 

If a MarkLogic Server instance sees SYN flooding messages on a system that is otherwise healthy, and the messages occur because of normal and expected MarkLogic Server communications, you may want to increase the backlog (tcp_max_syn_backlog) or adjust some of the other settings (such as tcp_synack_retries and tcp_abort_on_overflow). However, if the SYN flooding messages only occur on a system that is under resource pressure, then solving the resource issue should be the focus.  
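Before tuning, it helps to record the current values of the settings named in the kernel documentation above. This Python sketch reads them from /proc/sys/net/ipv4, the standard location on Linux; it does not suggest values, and a None result simply means the file was missing or unreadable.

```python
from pathlib import Path

# The SYN-related settings named in the kernel documentation quoted above.
SETTINGS = ("tcp_syncookies", "tcp_max_syn_backlog", "tcp_synack_retries", "tcp_abort_on_overflow")


def parse_setting(text):
    """Parse the first integer in a /proc/sys value file."""
    return int(text.split()[0])


def read_tcp_settings(root="/proc/sys/net/ipv4"):
    """Return {setting: value or None} for the SYN-related settings on this host."""
    values = {}
    for name in SETTINGS:
        try:
            values[name] = parse_setting((Path(root) / name).read_text())
        except (OSError, ValueError, IndexError):
            values[name] = None  # missing, unreadable, or unexpected content
    return values
```

Running print(read_tcp_settings()) on each cluster host before and after a change gives a simple record of what was tuned.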

How to disable SYN cookies

You can disable syncookies by adding the following line to /etc/sysctl.conf:

# disable TCP SYN Flood Protection
net.ipv4.tcp_syncookies = 0

Also note that a setting added to /etc/sysctl.conf takes effect at the next host reboot; running sysctl -p as root applies it immediately.

Further reading


After upgrading to MarkLogic 10.x from any of the previous versions of MarkLogic, examples of the following Warning and Notice level messages may be observed in the ErrorLogs:

Warning: Lexicon '/var/opt/MarkLogic/Forests/Documents/00000006/c4ea1b602ee84a34+Lexicon' collation='' out of order

Notice: Repairing out of order lexicon /var/opt/MarkLogic/Forests/Documents/00000006/c4ea1b602ee84a34+Lexicon collation '' version 0 to 602

Warning: String range index /space/Forests/Documents/0006ef0e/c0dc932d1b4bcaae-37c6e3905909f64e+string collation '' out of order. 

Notice: Repairing out of order string range index /space/Forests/Documents/0006ef0e/c0dc932d1b4bcaae-37c6e3905909f64e+string collation '' version 0 to 602
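After an upgrade, it can be useful to see which forests and stands produced these messages. The following Python sketch classifies lines like the examples above into (action, index kind, forest, stand); the regular expressions are assumptions derived only from these four sample messages and may need adjusting for other message variants.

```python
import re

# Level, free text, then the stand file path (optionally quoted) followed by "collation".
ENTRY = re.compile(r"(?P<level>Warning|Notice): (?P<text>.*?)\s'?(?P<path>/[^'\s]+)'? collation")
# Forest and stand directories are the two path segments after ".../Forests/".
STAND = re.compile(r"/Forests/(?P<forest>[^/]+)/(?P<stand>[^/]+)/")


def parse_index_message(line):
    """Classify a lexicon / string range index ordering message, or return None."""
    m = ENTRY.search(line)
    if m is None:
        return None
    text = m.group("text").lower()
    if "lexicon" in text:
        kind = "lexicon"
    elif "string range index" in text:
        kind = "string range index"
    else:
        return None
    s = STAND.search(m.group("path"))
    forest, stand = (s.group("forest"), s.group("stand")) if s else (None, None)
    # Warning lines report the problem; Notice lines report the repair.
    action = "repaired" if m.group("level") == "Notice" else "out of order"
    return action, kind, forest, stand
```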

Starting with MarkLogic 10.0, the server now automatically checks for any lexicons or string range indexes that may be in need of repair.  Lexicons and range indexes perform "self-healing" in non-read-only stands whenever a lexicon/range index is opened within the stand.


This is due to changes introduced to the behavior of MarkLogic's root collation.

Starting with MarkLogic 10.0, the root collation has been modified, along with all collations that derive from it, which means there may be some subtle differences in search ordering.

For more information on the specifics of these changes, please refer to

This helps the server to support newer collation features, such as reordering entire blocks of script characters (for example: Latin, Greek, and others) with respect to each other. 

Implementing these changes has, under some circumstances, improved the performance of wildcard matching by more effectively limiting the character ranges that search scans (and returns) for wildcard-based matching.

Based on our testing, we believe this new ordering yields better performance in a number of circumstances, although it does create the need to perform full reindexing of any lexicon or string range index using the root collation.

MarkLogic Server will now check lexicons and string range indexes and will try to repair them where necessary.  During the evaluation, MarkLogic Server will skip making further changes if any of the following conditions apply:

(a) They are already ordered according to the latest specification provided by ICU (1.8 at the time of writing)

(b) MarkLogic Server has already checked the stand and associated lexicons and indexes

(c) The indexes use codepoint collation (in which case, MarkLogic Server will be unable to change the ordering).

Whenever MarkLogic performs any repairs, it will always log a message at Notice level to inform users of the changes made. If, for any reason, MarkLogic Server is unable to make changes (e.g. a forest is mounted as read-only), MarkLogic will skip the repair process and nothing will be logged.

As these changes have been introduced from MarkLogic 10 onwards, you will most likely observe these messages in cases where recent upgrades (from prior releases of the product) have just taken place.

Repairs are performed on a stand by stand basis, so if a stand does not contain any values that require ordering changes, you will not see any messages logged for that stand.

Also, if any ordering issues are encountered during the process of a merge of multiple stands, there will only be one message logged for the merge, not one for each individual stand involved in that merge.


  • Repairs will take place for any stand found to have a lexicon or string range index whose collation ordering is out of order and out of date (e.g. utilising a collation described by an earlier version of ICU), unless that stand is mounted as read-only.
  • Any repair will generate Notice messages when maintenance takes place.
  • Whenever a lexicon or string range index is opened, this check/repair takes place: for any string range index, lexicon call (e.g. cts:values), or range query (e.g. cts:element-range-query), and during merges.
  • The check looks for ICU version mismatches as well as out-of-order items, so if a lexicon or string range index uses the older ordering but requires no further changes, no further action will be taken for that stand.

Known side effects

If the string range index or lexicon is very large, repairing can cause some performance overhead and may impact search performance during the repair process.


These messages can be avoided by issuing a full reindex of your databases immediately after performing your upgrade to MarkLogic 10.


Forests in MarkLogic Server may be in one of several mount states. On mounting, local disk failover forests or database replication forests should both eventually reach the sync replicating or async replicating state. There are occasions, however, where local disk failover or database replication forests will sometimes get stuck in the wait replication state. This knowledgebase article will itemize many of these wait replication scenarios, as well as the operational tactics to use in response. 

Wait replication scenarios

Wait replication as a result of lack of quorum

A quorum in MarkLogic Server represents more than 50% of the total number of nodes in the cluster. It's very important to note what counts toward the total number of nodes: regardless of group membership, forest assignment, or whether nodes are running, if a machine exists in the hosts.xml configuration file and in the list of hosts in the Admin UI, it contributes to the total count.

While it's possible to run a MarkLogic cluster with only a subset of the configured nodes up, it's not a recommended configuration. In addition, if the number of active nodes in your cluster falls below the greater than 50% quorum threshold, you might run into forests in the wait replication state due to the lack of quorum.
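The quorum rule is simple arithmetic. In this Python sketch (hypothetical helper names), total_hosts must be every host listed in hosts.xml, regardless of group, forest assignment, or whether it is running; responding_hosts is the number of hosts currently returning heartbeats.

```python
def quorum_size(total_hosts):
    """Smallest number of hosts that is more than 50% of the configured total."""
    if total_hosts < 1:
        raise ValueError("a cluster has at least one host")
    return total_hosts // 2 + 1


def has_quorum(responding_hosts, total_hosts):
    """True if the hosts returning heartbeats form a quorum of all configured hosts."""
    return responding_hosts >= quorum_size(total_hosts)
```

For example, a four-host cluster with two hosts down has 2 responding hosts against a quorum of 3, so it has lost quorum even though half of its hosts are still running.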

What to do about it? You'll need to alter your cluster's configuration to meet the quorum requirement. That can mean either removing missing nodes from the cluster's configuration (essentially telling the cluster to stop looking for those missing nodes), or alternatively bringing up nodes that are currently part of the configuration, but not actively returning heartbeats (effectively letting the cluster see nodes it expects to be there). 

You can read more about quorum at the following knowledgebase articles:

Wait replication as a result of mixed file permissions

The root MarkLogic process is simply a restarter process which waits for the non-root (daemon) process to exit. If the daemon process exits abnormally, for any reason, the root process will fork and exec another process under the daemon process. The root process runs no XQuery scripts, opens no sockets, and accesses no database files. While it's possible to run the MarkLogic process as a non-root user, be very careful about forest file permissions - if your configured MarkLogic user doesn't have the necessary permissions, you might see wait replication and an inability to correctly failover to local disk failover forests when necessary - in which case you'll need to set your forest file permissions correctly to move forward. You can read more about running the MarkLogic process as a non-root user at:

Wait replication due to upgrading in the wrong order

Per our documentation, when upgrading you must first upgrade your replica environment, then subsequently upgrade your master environment.

If your cluster upgrades aren’t done in the correct order, you’re going to need to:

  1. Decouple your master and replica clusters, then stop the replica cluster

  2. Edit your replica cluster's databases.xml to remove entries with Security database replication

  3. Start the replica cluster, beginning with the node that hosts the Security forest

  4. Manually run http://(hostname of node hosting the Security forest):8001/security-upgrade-go.xqy?force=true

  5. Re-couple your master and replica clusters
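Step 4 can also be scripted rather than run from a browser. This Python sketch assumes the Admin app server is on its default port 8001 with digest authentication, and that a plain GET of the URL from step 4 is sufficient; the function names are hypothetical.

```python
import urllib.request


def security_upgrade_url(security_host, port=8001):
    """URL from step 4; security_host is the node hosting the Security forest."""
    return f"http://{security_host}:{port}/security-upgrade-go.xqy?force=true"


def run_security_upgrade(security_host, user, password, port=8001):
    """Request the upgrade page with digest auth; return (HTTP status, response body)."""
    url = security_upgrade_url(security_host, port)
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    with opener.open(url) as response:
        return response.status, response.read().decode("utf-8", errors="replace")
```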

You can read more about upgrading environments using database replication at:

Wait replication because you downgraded

MarkLogic Server does not support downgrades. If you do attempt to downgrade your installation, your replica forests will be stuck in wait replication.

What to do about it? As in the case of upgrading in the wrong order, you'll need to manually run http://(hostname of node hosting the Security forest):8001/security-upgrade-go.xqy?force=true. You can read more about MarkLogic Server and downgrades at:

Wait replication because your master and replica forest names don't match

By default, the "Connect Forests by Name" option is set to true. This means the server has certain expectations around how master and replica forests should be named.

What to do about it? Set "Connect Forests by Name" to false, then manually connect master and replica forests. You can read more about wait replication due to forest name mismatch at:

Wait replication as a result of merge blackouts (completely disabled merges)

What is merging and why do we need merge blackouts?

MarkLogic Server does lazy deletes, which marks documents obsolete (but doesn't actually delete them). Merges are when obsolete documents are actually deleted - in bulk, while also optimizing your data. Merge blackouts prevent this deferred deletion and optimization from happening. Merge blackouts can also sometimes result in wait replication. Consider a database that has both master and local disk failover forests where you have configured a merge blackout with the “disable merges completely” option (instead of “limit merges to” option). If a node failure on any of the nodes holding some of these forests were to occur during the merge blackout period, as soon as the failed node comes back online, all the forests associated with that specific node go into a “wait replication” state until the merge blackout period ends or is manually removed.


What to do about it?

  • Avoid completely disabling merges
  • If you do need to control merges, it's much better to set the maximum merge size in your blackout to a smaller number (the “limit merges to” option)


When configuring database replication, it is important to note that the Connect Forests by Name field is true by default. This works great because, when new forests of the same name are later added to the Master and Replica databases, they will be automatically configured for Database Replication.

The issue

The problem arises when you use replica forest names that do not match the original Master forest names. In that case, you may find that failover events cause forests to get stuck in the Wait Replication state. The usual methods of failing back to the designated masters will not work - restarting the replicas will not work, and neither will shutting down cluster/removing labels/restarting cluster.


In this case, the way to fix the issue is to set Connect Forests by Name to false, and then you must manually connect the Master forests on the local cluster to the Replica forests on the foreign cluster, as described in the documentation: Connecting Master and Replica Forests with Different Names.

It is worth noting that, starting with MarkLogic 7, you are also allowed to rename the replica forests. Once you rename the replica forests to the same names as the forests of the designated master database (e.g., the Security database should have a Security forest in both the master and replica), they will be automatically configured for Database Replication, as expected.
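The naming expectation behind Connect Forests by Name can be checked before a failover happens. This Python sketch (a hypothetical helper that takes forest name lists you collect yourself, for example from the Admin UI) reports which master forests have a replica forest of the same name and which do not.

```python
def match_forests_by_name(master_forests, replica_forests):
    """Return (paired, unmatched_master, unmatched_replica) as sorted name lists."""
    replica = set(replica_forests)
    master = set(master_forests)
    paired = sorted(master & replica)            # names present on both sides
    unmatched_master = sorted(master - replica)  # masters with no same-named replica
    unmatched_replica = sorted(replica - master) # replicas that will not be paired by name
    return paired, unmatched_master, unmatched_replica
```

Any name in the second or third list is a candidate for renaming, or for a manual connection with Connect Forests by Name set to false.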


Wednesday, April 13, 2022: This article was updated with new releases for Data Hub Framework (DHF 5.7.1) and Data Hub Central (Data Hub Central 5.7.1).

Monday, April 04, 2022: This article was updated to account for the new guidance and remediation steps in CVE-2022-22965.

Thursday, March 31, 2022: Original article published.

Subject :

(Spring4Shell) CVE-2022-22965: Spring Framework RCE via Data Binding on JDK 9+

Summary :

On Wednesday, March 30, 2022, reports emerged of a new remote code execution flaw that affects Spring Framework. This vulnerability, popularly known as "Spring4Shell", is a new, previously unknown security vulnerability.

The CVE designation is CVE-2022-22965, with a CVSS score of 9.8. Spring has acknowledged the vulnerability and released 5.3.18 and 5.2.20 to patch the issue, as well as version 2.6.6 of spring-boot.

MarkLogic is aware of this vulnerability and is in the process of assessing the impact on our products and Client APIs.

Update on Analysis as of 4/22/2022 - 

1.1. MarkLogic Server

MarkLogic Server, whether on-premises or on AWS/Azure, is not vulnerable to CVE-2022-22965. 

There is no known impact on the Admin GUI, Query Console, or Monitoring History/Dashboard. 

1.2. MarkLogic Java Client

No direct impact: in the Java Client API, we only used spring-jdbc 5.2.7.

This does not meet the prerequisites for exploiting CVE-2022-22965, which are:

  • JDK 9 or higher
  • Apache Tomcat as the Servlet container
  • Packaged as WAR
  • spring-webmvc or spring-webflux dependency
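The four prerequisites above, together with the patched Spring Framework versions mentioned earlier (5.3.18 and 5.2.20), can be expressed as a quick triage check. This Python sketch is only an aid for reviewing your own applications, not a vulnerability scanner; the inputs are values you collect yourself, the function names are hypothetical, and treating 6.x and later as patched is an assumption based on those lines being released after the fix.

```python
import re


def meets_exploit_prerequisites(jdk_major, servlet_container, packaging, dependencies):
    """True only if all four prerequisites listed above hold."""
    deps = {d.lower() for d in dependencies}
    return (jdk_major >= 9
            and servlet_container.lower() == "tomcat"
            and packaging.lower() == "war"
            and bool(deps & {"spring-webmvc", "spring-webflux"}))


def spring_framework_patched(version):
    """True if a Spring Framework version includes the CVE-2022-22965 fix."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)  # tolerates suffixes such as ".RELEASE"
    if m is None:
        raise ValueError(f"unrecognised version: {version!r}")
    v = tuple(int(x) for x in m.groups())
    if v[:2] == (5, 3):
        return v >= (5, 3, 18)
    if v[:2] == (5, 2):
        return v >= (5, 2, 20)
    return v[:2] > (5, 3)  # older lines were not patched; later lines postdate the fix
```

For instance, a library that only pulls in spring-jdbc (as the Java Client API did) fails the prerequisite check, while spring_framework_patched("5.2.7") still flags the bundled version as worth upgrading.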

spring-jdbc has a transitive dependency on spring-core and spring-beans (identified as vulnerable). Hence, we have upgraded spring-jdbc to version 5.3.18, which is included in the latest Java Client API 5.5.3 release, available on DMC and GitHub.

1.3. MarkLogic Data Hub & Hub Central

MarkLogic Data Hub and Data Hub Central are impacted. Data Hub Framework (DHF) 5.7.1 is now available.

1.4. MarkLogic Data Hub Service

  • Hub Central is impacted. The Hub Central component exists only on DHS versions >= 3.0. For customers using Hub Central in DHS wishing to update dependencies or versions once the new version is available, please contact MarkLogic Support assigned to the attention of the Cloud Services team.
  • mlcmd is not affected.
  • Sumo Logic is not affected. Sumo Logic Support validated that it is not vulnerable to known exploitable CVE-2022-22965 methods. The Sumo Logic collector also is not vulnerable to known Spring Cloud framework exploitation methods. Out of an abundance of caution, Sumo Logic will be updating its Sumo Logic Service; no action is required on your part, however. 

1.5. MarkLogic-supported client libraries and tools

1.5.1. Un-impacted versions

1. XCC: No action is needed at this time. All systems have been thoroughly scanned and patched with the recommended fixes wherever needed. 
2. MLCP: No action is needed at this time. All systems have been thoroughly scanned and patched with the recommended fixes wherever needed. 
3. mlcmd: mlcmd uses XMLSH and is not affected by this vulnerability. 

1.5.2. Impacted versions

1. Java Client Util: ml-javaclient-util-4.3.1 is now available on GitHub and Maven Central. Download link is here.
2. ml-gradle/ml-app-deployer:

ml-gradle-4.3.4 is now available on GitHub. Download link is here.

ml-app-deployer-4.3.3 is now available on GitHub and Maven Central. Download link is here.

3. Data Hub Framework: DHF 5.7.1 is now available. Download links for GitHub and Maven Central are available.
4. Data Hub Client Jar: Download links for GitHub are available.
5. Data Hub Central: Data Hub Central 5.7.1 is now available. Download links for GitHub and Maven Central are available.
6. Data Hub Central Community: DHCCE 5.7.1 is now available on GitHub.
7. Apache Spark Connector: Spark connector 1.0.1 is now available at:
8. AWS Glue Connector:

Glue connector 1.0.1 is now available at:

Please find the documentation here:

9. Pega Connector: Upgrade to ml-gradle 4.3.4.

1.6. MarkLogic Open Source and Community-owned projects

1.6.1. Un-Impacted versions


Community Libraries


1. MuleSoft Connector: MuleSoft applications do not run in Tomcat containers or get packaged as WARs, so the affected Spring versions are not vulnerable in this context. The current MuleSoft Connector does not meet the prerequisites, even though it has a dependency on ml-javaclient-util, which includes affected Spring Framework libraries. However, the ml-javaclient-util Spring dependencies should still be updated.
2. ml-javaclient-util: Affected Spring versions are present in the dependencies for 4.2.0 and the latest 4.3.0, but it should be safe as-is unless bundled into a Tomcat/Spring app. However, the ml-javaclient-util Spring dependencies should still be updated.

1.6.2. Impacted versions

Details will be updated here if any are identified.


1.7. Contact and Links

MarkLogic is dedicated to supporting our customers, partners, and developer community to ensure their safety. If you have a registered support account, feel free to contact us with any additional questions.


This article will show you how to add a Fast Data Directory (FDD) to an existing forest.


The fast data directory stores transaction journals and stands. When the directory becomes full, larger stands will be merged into the data directory. Once the size of the fast data directory approaches its limit, then stands are created in the data directory.

Although it is not possible to add an FDD path to a currently-existing forest, it is possible to do the following:

1. Destroy an existing forest configuration (while preserving the data)

2. Recreate a forest with the same name and data, with an FDD added


The queries below illustrate steps one and two of the process. Note that you can also do this with the Admin UI.

The query below will delete the forest configurations but not data.


1. Schedule a downtime window for this procedure (DO NOT DO THIS ON A LIVE PRODUCTION SYSTEM)

2. Ensure that all ingestion and merging has stopped

3. To be on the safe side, take a backup of the forest before applying this in production

4. Detach the forest before running these queries

1) Use the following API to delete an existing forest configuration

NOTE: make sure to set the $delete-data parameter to fn:false().

admin:forest-delete(
$config as element(configuration),
$forest-ids as xs:unsignedLong*,
$delete-data as xs:boolean
) as element(configuration)

2) Use the following API to create a new forest that points to the old data directory and includes the configured FDD:

admin:forest-create(
$config as element(configuration),
$forest-name as xs:string,
$host-id as xs:unsignedLong,
$data-directory as xs:string?,
[$large-data-directory as xs:string?],
[$fast-data-directory as xs:string?]
) as element(configuration)

Here's an example query that uses these APIs:

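xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()

(: placeholder values: replace these before running :)
let $forest-name := "YOUR_FOREST_NAME"
let $database-name := "YOUR_DATABASE_NAME"
let $new-fast-data := "YOUR_NEW_FAST_DATA_DIR"

(: preserve the host and path values of the old forest :)
let $forest-id := admin:forest-get-id($config, $forest-name)
let $host-id := admin:forest-get-host($config, $forest-id)
let $old-data := admin:forest-get-data-directory($config, $forest-id)
let $old-large-data := admin:forest-get-large-data-directory($config, $forest-id)

(: step 1: delete only the forest configuration; fn:false() keeps the data on disk :)
let $config := admin:forest-delete($config, $forest-id, fn:false())
let $_ := admin:save-configuration($config)

(: step 2: recreate the forest with the same name and data directory plus the FDD,
   and attach it to its database :)
let $config1 := admin:get-configuration()
let $config1 := admin:forest-create($config1, $forest-name, $host-id,
                  $old-data, $old-large-data, $new-fast-data)
let $config1 := admin:database-attach-forest($config1,
                  admin:database-get-id($config1, $database-name),
                  admin:forest-get-id($config1, $forest-name))
return admin:save-configuration($config1)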

You can create and attach the forest in a single transaction. The same procedure is also possible using the Admin UI, as two separate transactions: first delete only the forest configuration (without its data), then recreate the forest.

After attaching the forest, please reindex and data will then migrate to FDD. Note that the sample query needs to be executed on the host where the forest resides.




MarkLogic has shipped with a REST API since MarkLogic 7.

In MarkLogic 8 the REST API was vastly expanded, allowing ways for MarkLogic Database administrators to manage almost all common MarkLogic administration tasks over an HTTP connection to MarkLogic's REST endpoints.

This Knowledgebase article will cover some examples of common administration tasks and will show some working examples to give you a taste of what can be done if you're using the latest version of MarkLogic Server.

While there are a significant number of examples throughout our extensive documentation in this area, many of these make use of curl. In this knowledgebase article, we're going to use XQuery calls to demonstrate how the payloads are structured.

Creating a backup using a call to the REST API (XQuery)

In the example code below, we demonstrate a call that will perform a backup of the Documents forest which places the backup in the /tmp directory.

Running the query in the above code example will return a response (in JSON format) containing a job ID for the requested task:

{
  "job-id": "4903378997555340415",
  "host-name": "yourhostnamehere"
}

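The same kind of request can also be sent from Python. This sketch assumes the Manage app server on port 8002 with digest authentication and a POST of a backup-database operation to /manage/v2/databases/{name}; the response fields follow the example above, while the "forest" and "backup-dir" body fields and the helper names are assumptions to check against the Management API documentation.

```python
import json
import urllib.parse
import urllib.request

MANAGE_PORT = 8002  # default Manage app server port (assumption: not changed)


def backup_body(forests, backup_dir):
    """JSON body for the backup-database operation."""
    return {"operation": "backup-database", "forest": list(forests), "backup-dir": backup_dir}


def post_manage(host, database, body, user, password, port=MANAGE_PORT):
    """POST a JSON operation to /manage/v2/databases/{name} and decode the JSON reply."""
    url = f"http://{host}:{port}/manage/v2/databases/{urllib.parse.quote(database, safe='')}"
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with opener.open(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, post_manage("localhost", "Documents", backup_body(["Documents"], "/tmp"), "admin", "password") would be expected to return a dictionary with "job-id" and "host-name" keys like the response shown above.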
The next example will demonstrate a status check for a given job ID

Query the status of an active or recent job

The above query will return a response that would look like this:

{
  "job-id": "4903378997555340415",
  "host-name": "yourhostnamehere",
  "status": "completed"
}

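A backup job can be polled with the job-id and host-name returned by the backup request until it reports a final status. In this Python sketch, send is any callable you provide that POSTs a JSON body to /manage/v2/databases/{name} and returns the decoded JSON reply; "completed" comes from the response shown above, while treating "failed" and "cancelled" as final states is an assumption.

```python
import time

# "completed" appears in the response above; the other two values are assumptions.
TERMINAL = {"completed", "failed", "cancelled"}


def backup_status_body(job):
    """Build a backup-status body from the job-id/host-name of a backup request."""
    return {"operation": "backup-status", "job-id": job["job-id"], "host-name": job["host-name"]}


def wait_for_backup(send, job, poll_seconds=5.0, timeout=3600.0):
    """Poll until the job reports a terminal status; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        status = send(backup_status_body(job)).get("status")
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"backup job {job['job-id']} still reports {status!r}")
        time.sleep(poll_seconds)
```

Keeping the HTTP call behind send makes the polling loop easy to test and lets you reuse whatever authentication setup you already have.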
Further reading on the MarkLogic REST API:

Alternatives to Configuration Manager


The MarkLogic Server Configuration Manager provided a read-only user interface to the MarkLogic Admin UI and could be used for saving and restoring configuration settings. The Configuration Manager tool was deprecated starting with MarkLogic 9.0-5, and is no longer available in MarkLogic 10.


There are a number of alternatives to the Configuration Manager. Most of the options take advantage of the MarkLogic Admin API, either directly or behind the scenes. The following is a list of the most commonly used options:

  • Manual Configuration
  • ml-gradle
  • Configuration Management API

Manual Configuration

For a single environment, the following knowledgebase article covers the process of Transporting Resources to a New Cluster.


For a repeatable process, the most widely used approach is ml-gradle.

A project is created in Gradle with the desired configurations. The project can then be used to deploy to any environment (test, QA, production, etc.), creating a known configuration that can be maintained under source control, which is a best practice.

Similar to Configuration Manager, ml-gradle also allows for exporting the configuration of an existing cluster. You can refer to transporting configuration using ml-gradle for more details.

While ml-gradle is an open source community project that is not directly supported, it enjoys very good community and developer support.  The underlying APIs that ml-gradle uses are fully supported by MarkLogic.

Configuration Management API

An additional option is to use the Configuration Management API directly to export and import resources.


Both ml-gradle and the Configuration Management API use the MarkLogic Admin API behind the scenes but, for most use cases, our recommendation is to use ml-gradle rather than writing the same functionality from scratch.

Alternatives to Ops Director


The MarkLogic Ops Director provided a basic dashboard for monitoring the health of one or more MarkLogic Server clusters, and for sending out basic alerts based on predefined conditions. It has been deprecated starting with MarkLogic 10.0-5, and will no longer be supported as of November 14, 2021. Our experience has shown that customers are most effective at monitoring MarkLogic Server by integrating commercial off-the-shelf monitoring tools with our Management APIs.

Monitoring DHS

Note: Our Data Hub Service customers are not impacted by this announcement. One of the benefits of using our Data Hub Service is that the MarkLogic Corporation will manage and monitor the underlying MarkLogic Server instances for you.


There are a number of different alternatives to Ops Director, depending on your requirements and existing monitoring infrastructure. Ops Director used the Management API to obtain the required information, specifically the /manage/v2/logs endpoint to read the logs and look for any "Critical" or "Error" messages using a regular expression (regex). These endpoints are still available, and administrators can use them from shell or Python scripts, which can also include alerting.
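As a minimal sketch of the scripted approach, the Python below applies the same kind of regex filter to ErrorLog lines. To avoid assuming anything about the response format of /manage/v2/logs, it reads the local file /var/opt/MarkLogic/Logs/ErrorLog.txt; text retrieved from the endpoint could be passed to the same find_alerts function. The function names and the exit-status convention are illustrative only.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches ErrorLog lines such as:
# 2015-06-20 10:13:39.746 Emergency: Stand /space/... has 213404541 fragments.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Za-z]+): (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    message: str


def find_alerts(lines: Iterable[str],
                levels: Iterable[str] = ("Critical", "Error")) -> Iterator[LogEntry]:
    """Yield the entries whose level is one of `levels`; other lines are skipped."""
    wanted = set(levels)
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") in wanted:
            yield LogEntry(**match.groupdict())


def scan_error_log(path: str = "/var/opt/MarkLogic/Logs/ErrorLog.txt") -> int:
    """Print matching entries and return a shell-style exit status (1 if any were found)."""
    with open(path, encoding="utf-8", errors="replace") as log:
        alerts = list(find_alerts(log))
    for entry in alerts:
        print(f"{entry.timestamp} [{entry.level}] {entry.message}")
    return 1 if alerts else 0
```

Returning a non-zero status when matches are found lets a cron job or an existing monitoring agent decide how to alert, rather than building notification logic into the script.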

Detecting and Reporting Failover Events

If there is also a requirement to monitor at the host or database level, there are REST API endpoints available for any scripted solution. Performance-related data stored in the Meters database can also be accessed via REST.
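A scripted monitor would typically poll resource URLs under /manage/v2. The sketch below only builds those URLs; it assumes the Management API is on the default port 8002 and that the "status" and "metrics" views are requested with format=json. The helper name is hypothetical, and the shape of the returned JSON is not modeled here.

```python
from typing import Optional
from urllib.parse import quote, urlencode

MANAGE_BASE = "http://localhost:8002/manage/v2"  # assumption: default Manage app server


def manage_url(resource: str, name: Optional[str] = None, view: str = "status") -> str:
    """Build a Management API URL such as /manage/v2/hosts?view=status&format=json."""
    path = f"{MANAGE_BASE}/{resource}"
    if name:
        path += "/" + quote(name)  # e.g. a specific database or host
    return f"{path}?{urlencode({'view': view, 'format': 'json'})}"
```

For example, manage_url("hosts") would target host status for the whole cluster, while manage_url("databases", "Documents", view="metrics") would target performance data for one database; a monitoring tool would fetch these with digest authentication.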

The MarkLogic Monitoring History can also be extended to provide some basic visualizations.

Hacking Monitoring History

Commercial Alternatives

If your requirements are more complex than can easily be met by the options above, there are many commercial monitoring solutions that can be used to monitor MarkLogic. These include Elastic/Kibana, Splunk, Datadog and New Relic, among others. Many organizations already use enterprise monitoring applications provided by a commercial vendor, and leveraging the existing solution will typically be the simplest option. If a monitoring solution already exists within your organization, check whether there is an existing plugin, extension or library for monitoring MarkLogic.

If a plugin, extension or library does not already exist, most monitoring solutions also allow for retrieving data from REST endpoints, allowing them to pull metrics directly from MarkLogic even if there is no pre-existing integration.

Available Plugins - Extensions - Libraries

Here is a sample of some of the available options that other customers use to monitor their MarkLogic infrastructure. These options are listed here for reference only. MarkLogic Support does not provide support for any issues encountered using the products mentioned here. Please refer to the solution vendor or the GitHub project page for any issues encountered.


MarkLogic Monitoring for Splunk provides configurations and pre-built dashboards that deliver real-time visibility into Error, Access, and Audit log events to monitor and analyze MarkLogic logs with Splunk.


Monitor MarkLogic with Datadog




New Relic

MarkLogic New Relic Plugin

Note: There is currently a published New Relic plugin that works with the latest versions of MarkLogic. However, New Relic has decided to deprecate plugins in favor of New Relic Integrations. New Relic currently limits plugin access to accounts that have accessed plugins in the last 30 days, and plans to discontinue this access in June 2021.

Other Resources


On Internet Explorer 9 and Internet Explorer 10, the Application Services UI should be run in Compatibility View.


When using the Application Services UI in Internet Explorer 9 or Internet Explorer 10, you may notice some minor UI bugs. These bugs occur only within MarkLogic Application Services, NOT within applications built with it. They can be avoided by running IE 9 or IE 10 in Compatibility View.

To configure Compatibility View in IE 9 or IE 10:

1. Press ALT-T to bring up the Tools menu
2. On the Tools menu, click 'Compatibility View Settings' 
3. Add the domain to the list of domains to render in compatibility view.


Customers frequently ask for advice on managing backups outside the standard XQuery APIs or the web interface provided by MarkLogic.

This Knowledgebase article demonstrates two approaches for integrating the backup of a MarkLogic database into your DevOps workflow, allowing such processes to be scripted or managed outside the product.

Creating a backup using the REST API

You can use the REST API to perform a database backup and to check on its status at any given time.

The examples below use XQuery to make the calls to the REST API over HTTP, but they can be adapted to work with cURL; cURL versions of the calls are also given.

The process

Here is an example that demonstrates a backup of the Documents database:

Running this should return a job ID as part of the response (this example uses JSON to format the response, but you can easily change that by modifying the header elements in the sample above to request application/xml instead):

{"job-id":"8774639830166037592", "host-name":"yourhostnamehere"}

Below is an example that demonstrates checking for the status of a given backup with the job-id given in the first step:

Example: using cURL (instead of XQuery)

Adapting the above examples so they work from cURL instead, you can generate a call that looks like this:

curl -s -X POST --anyauth -u username:password --header "Content-Type:application/json" -d '{"operation": "backup-database", "backup-dir": "/tmp/backup", "journal-archiving": true, "include-replicas": true}' "http://localhost:8002/manage/v2/databases/Documents?format=json"

And to check on the status, the cURL payload could be modified to look like this:

{"operation": "backup-status", "job-id" : "8774639830166037592","host-name": "yourhostnamehere"}
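For teams that prefer Python to cURL, the following is a minimal sketch of the same two calls using only the standard library. It reuses the payloads and the localhost:8002 URL from the cURL examples above and assumes digest authentication (which --anyauth would negotiate); the function names are hypothetical.

```python
import json
import urllib.request

MANAGE = "http://localhost:8002/manage/v2"  # same host/port as the cURL example


def backup_payload(backup_dir, journal_archiving=True, include_replicas=True):
    """Body for the backup-database operation shown in the cURL example."""
    return {"operation": "backup-database", "backup-dir": backup_dir,
            "journal-archiving": journal_archiving, "include-replicas": include_replicas}


def status_payload(job_id, host_name):
    """Body for the backup-status operation, using values from the first response."""
    return {"operation": "backup-status", "job-id": job_id, "host-name": host_name}


def database_url(database):
    return f"{MANAGE}/databases/{database}?format=json"


def post_json(database, payload, user, password):
    """POST a JSON operation to the database endpoint using digest authentication."""
    url = database_url(database)
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    request = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                     headers={"Content-Type": "application/json"},
                                     method="POST")
    with opener.open(request, timeout=60) as response:
        return json.load(response)

# Example (not executed here):
# job = post_json("Documents", backup_payload("/tmp/backup"), "username", "password")
# print(post_json("Documents", status_payload(job["job-id"], job["host-name"]),
#                 "username", "password"))
```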

Further reading


Customers using the MarkLogic AWS CloudFormation templates may encounter a situation where someone has deleted an EBS volume that stored MarkLogic data (mounted at /var/opt/MarkLogic). Because the volume and the associated data are no longer available, the host is unable to rejoin the cluster.

Getting the host to rejoin the cluster can be complicated, but it will typically be worth the effort if you are running an HA configuration with Primary and Replica forests.

This article details the procedures to get the host to rejoin the cluster.

Preparing the New Volume and New Host

The easiest way to create the new volume is using a snapshot of an existing host's MarkLogic data volume.  This saves the work of manually copying configuration files between hosts, which is necessary to get the host to rejoin the cluster.

In the AWS EC2 Dashboard:Elastic Block Store:Volumes section, create a snapshot of the data volume from one of the operational hosts.

Next, in the AWS EC2 Dashboard:Elastic Block Store:Snapshots section, create a new volume from the snapshot in the correct Availability Zone, and note the new volume ID for later use.

(optional) Update the name of the new volume to match the format of the other data volumes

(optional) Delete the snapshot

Edit the Auto Scaling Group of the missing host and increase its Desired Capacity by 1.

This will trigger the Auto Scaling Group to bring up a new instance. 

Attaching the New Volume to the New Instance

Once the instance is online and startup is complete, connect to the new instance via SSH.

Ensure MarkLogic is not running by stopping the service and checking for any remaining processes.

  • sudo service MarkLogic stop
  • pgrep -la MarkLogic

Remove /var/opt/MarkLogic if it exists and is on the root partition (that is, it is not a separately mounted volume).

  • sudo rm -rf /var/opt/MarkLogic
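Before running rm -rf, one way to confirm that /var/opt/MarkLogic really lives on the root partition is to compare device numbers: a directory on the same device as / is not a separate mount. The sketch below is an optional sanity check, not part of the documented procedure, and the function names are hypothetical.

```python
import os


def on_same_filesystem(path: str, reference: str = "/") -> bool:
    """Return True if `path` is on the same device as `reference` (not a separate mount)."""
    return os.stat(path).st_dev == os.stat(reference).st_dev


def data_dir_on_root(path: str = "/var/opt/MarkLogic") -> bool:
    """True only if the directory exists and sits on the root partition."""
    return os.path.isdir(path) and on_same_filesystem(path, "/")
```

If data_dir_on_root() returns False while the directory exists, a volume is already mounted there and should be investigated rather than deleted.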

Edit /var/local/mlcmd and update the volume id listed in the MARKLOGIC_EBS_VOLUME variable to the volume created above.

  • MARKLOGIC_EBS_VOLUME="[new volume id],:25::gp2::,*"
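Because only the volume ID portion of that assignment should change, a scripted edit can swap just that field and keep the ",:25::gp2::,*" suffix intact. This is a sketch: it assumes a single MARKLOGIC_EBS_VOLUME line in the format shown above (optionally prefixed with "export"), and the helper name is hypothetical. The same edit is needed later in this article for the Launch Configuration user data.

```python
import re

# Matches e.g. MARKLOGIC_EBS_VOLUME="vol-0123,:25::gp2::,*" (an optional "export " is tolerated).
EBS_LINE = re.compile(
    r'^([ \t]*(?:export[ \t]+)?MARKLOGIC_EBS_VOLUME=")([^,"]*)(,[^\n]*)$', re.MULTILINE
)


def set_ebs_volume(text: str, new_volume_id: str) -> str:
    """Swap only the volume ID in the MARKLOGIC_EBS_VOLUME assignment, keeping the rest."""
    updated, count = EBS_LINE.subn(
        lambda m: m.group(1) + new_volume_id + m.group(3), text
    )
    if count != 1:
        raise ValueError(f"expected exactly one MARKLOGIC_EBS_VOLUME line, found {count}")
    return updated
```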

Run mlcmd to attach and mount the new volume to /var/opt/MarkLogic on the instance

  • sudo /opt/MarkLogic/mlcmd/bin/mlcmd init-volumes-from-system
  • Check that the volume has been correctly attached and mounted

Remove the contents of /var/opt/MarkLogic/Forests (if any exist).

  • sudo rm -rf /var/opt/MarkLogic/Forests/*

Run mlcmd to sync the new volume information to the DynamoDB table

  • sudo /opt/MarkLogic/mlcmd/bin/mlcmd sync-volumes-to-mdb

Configuring MarkLogic With Empty /var/opt/MarkLogic

If you did not create your volume from a snapshot as detailed above, complete the following steps. If you created your volume from a snapshot, skip these steps and continue with the section Configuring MarkLogic and Rejoining Existing Cluster.

  • Start the MarkLogic service, wait for it to complete its initialization, then stop the MarkLogic service:
    • sudo service MarkLogic start
    • sudo service MarkLogic stop
  • Move the configuration files out of /var/opt/MarkLogic/
    • sudo mv /var/opt/MarkLogic/*.xml /secure/place (using default settings; destination can be adjusted)
  • Copy the configuration files from one of the working instances to the new instance
    • Configuration files are stored here: /var/opt/MarkLogic/*.xml
    • Place a copy of the xml files on the new instance under /var/opt/MarkLogic

Configuring MarkLogic and Rejoining Existing Cluster

Note the host-id of the missing host found in /var/opt/MarkLogic/hosts.xml

  • For example, if the missing host is ip-10-0-64-14.ec2.internal
    • sudo grep "ip-10-0-64-14.ec2.internal" -B1 /var/opt/MarkLogic/hosts.xml

  • Edit /var/opt/MarkLogic/server.xml and update the value for host-id to match the value retrieved above
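The grep above relies on the host-id line sitting directly before the host-name line. A structure-aware alternative is sketched below, assuming (as the grep output suggests) that each <host> element in hosts.xml has <host-id> and <host-name> children, and that server.xml holds a single numeric <host-id>. Namespaces are ignored rather than hard-coded, server.xml is edited with a targeted substitution so the rest of the file is left untouched, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Drop any '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def find_host_id(hosts_xml: str, host_name: str) -> str:
    """Return the <host-id> of the <host> whose <host-name> matches host_name."""
    for element in ET.fromstring(hosts_xml).iter():
        if _local(element.tag) != "host":
            continue
        fields = {_local(child.tag): (child.text or "").strip() for child in element}
        if fields.get("host-name") == host_name and "host-id" in fields:
            return fields["host-id"]
    raise KeyError(f"{host_name} not found in hosts.xml")


def set_server_host_id(server_xml: str, host_id: str) -> str:
    """Replace the first <host-id> value, leaving the rest of server.xml unchanged."""
    updated, count = re.subn(
        r"(<host-id>)\s*\d+\s*(</host-id>)",
        lambda m: m.group(1) + host_id + m.group(2),
        server_xml,
        count=1,
    )
    if count != 1:
        raise ValueError("no numeric <host-id> element found in server.xml")
    return updated
```

Keep a copy of the original server.xml before writing the result back, and make the change only while MarkLogic is stopped.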

Start MarkLogic and view the ErrorLog for any issues

  • sudo service MarkLogic start; sudo tail -f /var/opt/MarkLogic/Logs/ErrorLog.txt

You should see messages about forests synchronizing (if you have local disk failover enabled, with replicas) and changing states from wait or async replication to sync replication.  Once all the forests are either 'open' or 'sync replicating', then your cluster is fully operational with the correct number of hosts.

At this point you can fail back to the primary forests on the new instances to rebalance the workload for the cluster.

You can also re-enable the 'xdqp ssl enabled' setting by setting it to true on the Group Configuration page, if you disabled it as part of these procedures.

Update the Userdata In the Auto Scaling Group

To ensure that the correct volume will be attached if the instance is terminated, the user data needs to be updated in a new Launch Configuration.

Copy the Launch Configuration associated with the missing host.

Edit the details

  • (optional) Update the name of the Launch Configuration
  • Update the User data variable MARKLOGIC_EBS_VOLUME and replace the old volume id with the id for the volume created above.
    • MARKLOGIC_EBS_VOLUME="[new volume id],:25::gp2::,*"
  • Save the new Launch Configuration

Edit the Auto Scaling Group associated with the new node

Change the Launch Configuration to the one that was just created and save the Auto Scaling Group.

Next Steps

Now that normal operations have been restored, it's a good opportunity to ensure you have all the necessary database backups, and that your backup schedule has been reviewed to ensure it meets your requirements.


Microsoft Azure Key Vault TLS certificates are being migrated from the existing Baltimore CyberTrust Root CA to the DigiCert Global Root G2 CA.

Impact on MarkLogic Server

Because MarkLogic Server does not currently ship with the DigiCert Global Root G2 CA certificate in its Certificate Authorities store, the following issues can occur if MarkLogic uses a new or migrated Azure Key Vault endpoint:

1. Any call to an Azure Key Vault endpoint using xdmp:http-* (XQuery) or xdmp.http* (JavaScript) will fail with a certificate verification error.

2. If Azure Key Vault is used as an external Key Management Store (KMS), calls to the Azure Key Vault to retrieve the encryption keys will fail, and any encrypted Forests will not be mounted.


Until MarkLogic Server is updated to include the required DigiCert Global Root G2 certificate, you can use the following procedure to address these issues.

1. Download the DigiCert Global Root G2 CA certificate and import it into the MarkLogic Security database using the Admin UI (Configure > Security > Certificate Authorities > Import).

DigiCert Global Root G2 Download

2. For users who have enabled Encryption at Rest in MarkLogic Server, the following additional step is required.

i. Copy the DigiCert Global Root G2 PEM certificate downloaded above to the MarkLogic Server node.

ii. Append the PEM contents to the following file in the MarkLogic Server installation directory.



Note: This file will need to be updated on each node in a MarkLogic Server Cluster
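Because the append has to be repeated on every node, it helps if it is safe to run more than once. The sketch below appends the certificate only when it is not already present; ca_file stands for the file referred to in step ii, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Tuple

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def merge_pem(existing: str, pem: str) -> Tuple[str, bool]:
    """Return (new_text, changed); the certificate is appended only if not already present."""
    pem = pem.strip()
    if not (pem.startswith(PEM_BEGIN) and pem.endswith(PEM_END)):
        raise ValueError("input does not look like a PEM certificate")
    if pem in existing:
        return existing, False
    separator = "" if not existing or existing.endswith("\n") else "\n"
    return f"{existing}{separator}{pem}\n", True


def append_pem_file(ca_file: str, pem_file: str) -> bool:
    """Apply merge_pem to files on disk; returns True if ca_file was modified."""
    target = Path(ca_file)
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    updated, changed = merge_pem(existing, Path(pem_file).read_text(encoding="utf-8"))
    if changed:
        target.write_text(updated, encoding="utf-8")
    return changed
```

Back up the target file first; running append_pem_file a second time on the same node leaves the file unchanged.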


Backup/Restore settings for Local Disk Failover

When configuring backups for a database, the 'include replica forests' setting is important for handling forest failover events. When 'include replica forests' is set to 'true', both the master and the replica forests are included in the database backup.

This KB article will go over an example failover scenario, and will show how a scheduled backup/restore works with different 'include replica forests' and 'journal archiving' settings.


Consider a 3-node cluster with hosts Host-A, Host-B and Host-C, and a database 'backup-test' with the following forest assignments (forests ending with 'p' are primary and those ending with 'r' are replicas). Under normal conditions, the primary forests will be in the 'open' state, and the replica forests will be in the 'sync replicating' state.

Host A                        Host B                        Host C
forest-1p (open)              forest-2p (open)              forest-3p (open)
forest-3r (sync replicating)  forest-1r (sync replicating)  forest-2r (sync replicating)

Failover and Forest states

Now consider what happens when Host-A goes offline. When failover completes, the replica of Host-A's primary forest (forest-1r on Host-B) takes over as the acting master. The forest states will then be:

Host A                        Host B                        Host C
forest-1p (disabled)          forest-2p (open)              forest-3p (open)
forest-3r (disabled)          forest-1r (open)              forest-2r (sync replicating)

Backup Examples: 

When 'Include replica Forests' is false and 'Journal Archiving' is true

forest-1p is disabled, and the corresponding replica forest-1r is now open because of the failover. Backups will not succeed during this time because replica forests have not been configured for backups. The following 'Warning' level message will be logged:

Warning: Not backing up database backup-test because first forest master forest-1p is not available, and replica backups aren't enabled

When Host-A is brought up again, the forest states will be:

forest-1p - sync replicating
forest-1r - open

At this time, backups will succeed, and because journal archiving is enabled, journals will be written to the backup data.

However, you will not be able to do a 'point-in-time restore' using journal archiving. When the configured master is not the acting master and backup is not enabled for replicas, the following error occurs when a restore to a point in time is attempted:

Operation failed with error message: xdmp:database-restore((xs:unsignedLong("5138652658926200166"), "/space/20160927-1125008228810", xs:dateTime("2016-09-27T11:06:21-07:00"), fn:true(), ()) -- Unable to restore replica forest forest-1r because the master forest forest-1p is not also restored, or is not acting master. Check server logs.

To get past this, the forests need to be failed back so that the 'configured master' is the same as the 'acting master'.

When 'Include replica forests' is true and 'Journal Archiving' is true

In this case, backups will succeed when forests are failed over to their replica forests, because replica forests are configured for backups. And, because journal archiving is enabled, journals will also be written to the backup data.

Even in this case, as in the previous one, point-in-time restore will not work until the forests are failed back.

Related documentation

MarkLogic Administrator's Guide: Backing up and Restoring a Database Following Local Disk Failover 

MarkLogic Administrator's Guide: Restoring Databases with Journal Archiving

MarkLogic Knowledgebase Article: Understanding the role of journals in relation to backup and restore journal archiving

MarkLogic Knowledgebase Article: Database backup / restore and local disk failover

Before executing significant operational procedures on production systems, such as

  • Production Go Live events;
  • Major version Upgrades;
  • Adding/removing nodes to a cluster;
  • Deploying a new application or an application upgrade;
  • ...

MarkLogic recommends:

  • Thorough testing of any operational procedures on non-production systems.
  • Opening a ticket with MarkLogic Technical Support to give them a heads up, along with any useful collateral that would help expedite diagnostics of issues if any occur, such as
    • The finalized plan & timeline or schedule of the operational procedure
    • A support dump, taken before the operational procedure, to record the configuration of the system ahead of time. This may come in handy if an incident occurs, as we may want to know the actual changes that had been made. You can create a MarkLogic Server support dump from the Admin UI by selecting the 'Support' tab; select scope=cluster, detail=status only, destination=browser, then save the output to disk. Attach the support dump to the ticket as a file, either as an email attachment or by uploading it through our support portal.
    • A few days of error logs from before the operational event so that we can determine whether artifacts in the error logs are new or whether they existed prior to the event.
    • You can alternatively turn Telemetry on before the event and force an upload of the support dump & error logs.
    • Any architecture or design details of the system that you are able to share.
  • Please make sure that all individuals who are responsible for the event and who may need to contact the MarkLogic Technical Support team are registered MarkLogic Support contacts. They can register for an account per instructions available at  They will want to register before the event as ONLY registered support contacts can create a ticket with MarkLogic Technical Support. We do not want registration and entitlement verification to get in the way of the ability to work on an urgent production issue.
  • Review the MarkLogic Support Handbook - The following sections in the "HOW TO RECEIVE SUPPORT SERVICES" chapter of the handbook are useful to be acquainted with before an incident occurs
    • Section: What to do Prior to Logging a Service Request 
    • Section: Working with Support
    • Section: Escalation Process
    • Section: Understanding Case Priority and Response Time Targets
  • For urgent issues (production outages), remember that you can raise an urgent incident per the instructions in the support handbook; MarkLogic takes urgent incidents seriously, as every urgent issue results in a text message being sent to every support engineer, engineering management and the senior executive at MarkLogic. 
  • Enable Debug level logging so that any issues that arise can be more easily diagnosed.  Debug level logging does not have any noticeable impact on system performance.
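If the change should be scripted as part of the pre-event checklist, a sketch of a Management API request is shown below. It assumes the group is named Default, that the Manage app server is on port 8002, and that the group property is exposed as file-log-level; verify the property name against your server's Management API documentation before relying on it. The request would be sent with digest authentication, for example through urllib's HTTPDigestAuthHandler.

```python
import json
import urllib.request


def log_level_request(level: str = "debug", group: str = "Default",
                      base: str = "http://localhost:8002/manage/v2") -> urllib.request.Request:
    """Build a PUT that sets the group's file log level via the Management API."""
    body = json.dumps({"file-log-level": level}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/groups/{group}/properties",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```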


In some cases, it is necessary to change the default environment variables of a MarkLogic Server installation or configuration.

Making Changes to Defaults

When changes to the default configurations need to be made, we recommend using /etc/marklogic.conf to make those changes. The file will not exist in a default installation, and should be manually created. We recommend the file only contain the variables that are being changed or added. This file will also be unaffected by MarkLogic upgrades.

Note: We do not recommend making changes to /etc/sysconfig/MarkLogic, as this file is part of the MarkLogic installation package, and it may be replaced or changed during a MarkLogic upgrade with no notification. Any direct file customizations will be overwritten and lost, which can result in various problems when the MarkLogic service is restarted.

During startup, MarkLogic will first source its own environment variable file, and then it will source /etc/marklogic.conf, which ensures the locally defined variables take precedence.
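The precedence rule can be modeled in a few lines: values from MarkLogic's own file are loaded first, and anything set in /etc/marklogic.conf then overrides them. This is a simplified sketch for checking which value wins. It only understands plain NAME=value lines (with or without "export") and does not execute shell code or expand variables. The function names are hypothetical.

```python
import shlex
from typing import Dict


def parse_exports(text: str) -> Dict[str, str]:
    """Collect simple NAME=value assignments, with or without a leading 'export'."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        try:
            tokens = shlex.split(line, comments=True)
        except ValueError:  # unbalanced quotes etc.; not a simple assignment
            continue
        if tokens[:1] == ["export"]:
            tokens = tokens[1:]
        for token in tokens:
            name, sep, value = token.partition("=")
            if sep and name.isidentifier():
                values[name] = value
    return values


def effective_settings(defaults_text: str, marklogic_conf_text: str) -> Dict[str, str]:
    """Model the startup order: MarkLogic's own file first, /etc/marklogic.conf last (wins)."""
    merged = parse_exports(defaults_text)
    merged.update(parse_exports(marklogic_conf_text))
    return merged
```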

Changing the Default Data Directory

A common use of the /etc/marklogic.conf file is to change the default data directory (/var/opt/MarkLogic).

export MARKLOGIC_DATA_DIR="/my/custom/path/MarkLogic"

If that file exists when the server is first initialized, then MarkLogic will run from the custom location. If MarkLogic has already been initialized, then you may need to stop the service and manually move /var/opt/MarkLogic to your custom location.

Using the MarkLogic AMI

When using the MarkLogic AMI without the MarkLogic CloudFormation template, it is necessary to create /etc/marklogic.conf to disable the Managed Cluster feature.


If this is done after the instance is launched, then you may encounter the issue mentioned in the KB SVC_SOCHN Warning During Start Up on AWS.

Common Configurable Variables

  • MARKLOGIC_INSTALL_DIR - Where the MarkLogic binaries are installed
  • MARKLOGIC_DATA_DIR - Where MarkLogic stores configurations and forest data
  • MARKLOGIC_EC2_HOST - Whether MarkLogic will utilize EC2 specific features and settings
  • MARKLOGIC_AZURE_HOST - Whether MarkLogic will utilize Azure specific features and settings
  • MARKLOGIC_MANAGED_NODE - Whether MarkLogic will utilize the Managed Cluster feature
  • MARKLOGIC_USER - User that MarkLogic runs as
  • MARKLOGIC_HOSTNAME - Manually set the MarkLogic host name. Must be set prior to initialization, or the hostname from the OS will be used
  • TZ - Allows for MarkLogic to operate with a different time zone setting than the OS

Further reading

Best Practice for Adding an Index in Production


It is sometimes necessary to remove or add an index on your production cluster. For a large database with more than a few GB of content, reindexing can be a time- and resource-intensive process that can affect query performance while it runs. This article describes some strategies for avoiding the pain points associated with changing your database configuration on a production cluster.

Preparing your Server for Production

In general, high performance production search implementations run with tight controls on the automatic features of MarkLogic Server. 

  • Re-indexer disabled by default
  • Format-compatibility set to the latest format
  • Index-detection set to none.
  • On a very large cluster (several dozen or more hosts), consider running with expunge-locks set to none
  • On large clusters with insufficient resources, consider bumping up the default group settings
    • xdqp-timeout: from 10 to 30
    • host-timeout: from 30 to 90

Increasing the xdqp and host timeouts prevents the server from disconnecting prematurely when a data node is busy, which could otherwise trigger a false failover event. However, these changes also increase the time it takes to fail over when a host genuinely fails in an HA configuration.
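If these baseline settings are applied by script rather than through the Admin UI, they can be expressed as (URL, JSON body) pairs for PUT requests against the Management API. This sketch assumes the Manage app server on port 8002 and that the property names match the setting names used above (reindexer-enable, index-detection, expunge-locks, xdqp-timeout, host-timeout). Format-compatibility is left out because its value syntax is not covered here. The helper name is hypothetical.

```python
from typing import Dict, List, Tuple, Union

BASE = "http://localhost:8002/manage/v2"  # assumption: default Manage app server
Props = Dict[str, Union[bool, int, str]]


def production_settings(database: str, group: str = "Default",
                        very_large_cluster: bool = False,
                        raise_timeouts: bool = False) -> List[Tuple[str, Props]]:
    """(url, JSON body) pairs for PUT requests applying the production baseline above."""
    db_props: Props = {"reindexer-enable": False, "index-detection": "none"}
    if very_large_cluster:          # several dozen or more hosts
        db_props["expunge-locks"] = "none"
    changes = [(f"{BASE}/databases/{database}/properties", db_props)]
    if raise_timeouts:              # large clusters with insufficient resources
        changes.append((f"{BASE}/groups/{group}/properties",
                        {"xdqp-timeout": 30, "host-timeout": 90}))
    return changes
```

Keeping the settings in one function makes it easy to review them under source control alongside the rest of the deployment configuration.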

Preparing to Re-index

When an index configuration must be changed in production, you should:

  • First, index-detection should be set back to automatic
  • Then, the index configuration change should be made

When you have Database Replication Configured:

If you have to add or modify indexes on a database that has database replication configured, make sure the same changes are made on the Replica cluster as well. Starting with MarkLogic Server 9.0-7, index data is also replicated from the Master to the Replica, but the server does not automatically check whether both sides have the same index settings. Reindexing is disabled by default on a replica cluster. However, when the database replication configuration is removed (such as after a disaster), the replica database will reindex as necessary. It is therefore important that the Replica database index configuration matches the Master's, to avoid unnecessary reindexing.

Note: If you are on a version prior to 9.0-7 - When adding/updating index settings, it is recommended that you update the settings on the Replica database before updating those on the Master database; this is because changes to the index settings on the Replica database only affect newly replicated documents and will not trigger reindexing on existing documents.

Further reading -

Master and Replica Database Index Settings

Database Replication - Indexing on Replica Explained

  • Finally, the reindexer should be enabled during off-hours to reindex the content.

Reindexing works by reloading all the URIs that are affected by the index change. This process tends to create many new and deleted fragments, which then need to be merged. Because reindexing is very CPU and disk I/O intensive, the reindexer-throttle can be set to 3 or 2 to minimize its impact.

After the Re-index

After the re-index has completed, it is important to return to the old settings by disabling the reindexer and setting index-detection back to none.

If you're reindexing over several nights or weekends, be sure to allow some time for the merging to complete. So for example, if your regular busy time starts at 5AM, you may want to disable the reindexer at around midnight to make sure all your merging is completed before business hours.
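The sequence above (enable index detection, change the index, reindex off-hours with a lower throttle, pause each night so merges can finish, then restore the production settings) can be captured as database property sets for each phase. This is a sketch under the assumption that the Management API property names match the setting names in this article and that the throttle accepts values from 1 to 5; the phase names and helper are hypothetical.

```python
from typing import Dict, Union

Props = Dict[str, Union[bool, int, str]]


def reindex_window(phase: str, throttle: int = 2) -> Props:
    """Database properties for each step of an off-hours reindex."""
    if phase == "prepare":   # before changing the index configuration
        return {"index-detection": "automatic"}
    if phase == "start":     # start of the off-hours window
        if not 1 <= throttle <= 5:
            raise ValueError("reindexer-throttle is expected to be between 1 and 5")
        return {"reindexer-enable": True, "reindexer-throttle": throttle}
    if phase == "pause":     # end of each night, leaving time for merges to finish
        return {"reindexer-enable": False}
    if phase == "finish":    # once reindexing has completed
        return {"reindexer-enable": False, "index-detection": "none"}
    raise ValueError(f"unknown phase: {phase!r}")
```

A scheduler could send the "start" properties at the beginning of the window and the "pause" properties a few hours before business hours, switching to "finish" only after the reindex is complete.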

By following the above recommendations, you should be able to complete a large re-index without any disruption to your production environment.


MarkLogic Server can ingest and query all sorts of data, such as XML, text, JSON, binary, generic, etc. There are some things to consider when choosing to simply load data "as-is" vs. doing some degree of data modeling or data transformation prior to ingestion.


Loading data "as-is" can minimize time and complexity during ingest or document creation. However, it can sometimes mean more complex, slower-performing queries. It may also require more storage-intensive index settings.

In contrast, doing some degree of data transformation prior to ingestion can sometimes result in dramatic improvements in query performance and storage space utilization due to reduced indexing requirements.

An Example

A simple example demonstrates how a data model can affect performance. Consider the data model used by Apple's iTunes:

<plist version="1.0">
<dict>
  <key>Major Version</key><integer>10</integer>
  <key>Minor Version</key><integer>1</integer>
  <key>Application Version</key><string>10.1.1</string>
  <key>Show Content Ratings</key><true/>
  <key>Tracks</key><dict>
    <key>290</key><dict>
      <key>Track ID</key><integer>290</integer>
      <key>Name</key><string>01-03 Good News</string>
    </dict>
  </dict>
</dict>
</plist>

Note the multiple sibling <key> elements at multiple levels, where the containing elements at both levels share the same name (in this case, <dict>). Suppose you want to query a document like this for "Application Version". In this case, time will be spent performing index resolution for the enclosing element (here, <key>). Unfortunately, because multiple sibling elements share the same element name, all of them will need to be retrieved and then evaluated to see which of them actually match the given query criteria. Consider a slightly revised data model instead:


<iTunesLibrary version="1.0">
  <app-version>10.1.1</app-version>
  <name>01-03 Good News</name>
</iTunesLibrary>

Here, we only need to query and therefore retrieve and evaluate the single <app-version> element, instead of multiple retrievals/evaluations as in the previous example data model.  
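The difference in work can be illustrated outside MarkLogic with a small Python analogy. This does not model MarkLogic's index resolution; it only shows that the key/value plist shape forces a scan over sibling <key> elements, while the revised shape allows a single direct lookup. The plist is trimmed to its top-level keys, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

PLIST = """<plist version="1.0"><dict>
  <key>Major Version</key><integer>10</integer>
  <key>Minor Version</key><integer>1</integer>
  <key>Application Version</key><string>10.1.1</string>
  <key>Show Content Ratings</key><true/>
</dict></plist>"""

REVISED = """<iTunesLibrary version="1.0">
  <app-version>10.1.1</app-version>
  <name>01-03 Good News</name>
</iTunesLibrary>"""


def plist_lookup(xml_text, wanted):
    """Walk sibling <key> elements until the wanted one; return (value, keys inspected)."""
    children = list(ET.fromstring(xml_text).find("dict"))
    inspected = 0
    for i, child in enumerate(children):
        if child.tag == "key":
            inspected += 1
            if child.text == wanted:
                return children[i + 1].text, inspected
    return None, inspected


def revised_lookup(xml_text):
    """The value has its own element name, so one direct lookup suffices."""
    return ET.fromstring(xml_text).findtext("app-version")
```

Here plist_lookup(PLIST, "Application Version") has to inspect three <key> elements before it finds the value, whereas revised_lookup(REVISED) reads <app-version> directly.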

At Scale

Although this is a simple example, when processing millions or even billions of records, eliminating small processing steps can have a significant performance impact.


Handling large amounts of data can be expensive in terms of both computing resources and runtime. It can also sometimes result in application errors or partial execution. In general, if you’re dealing with large amounts of data as either output or input, the most scalable and robust approach is to break-up that workload into a series of smaller and more manageable batches.

Of course there are other available tactics. It should be noted, however, that most of those other tactics will have serious disadvantages compared to batching. For example:

  • Configuring time limit settings through the Admin UI to allow for longer request timeouts - since you can only increase timeouts so much, this is best considered a short-term tactic for only slightly larger workloads.
  • Eliminating resource bottlenecks by adding more resources – often easier to implement compared to modifying application code, though with the downside of additional hardware and software license expense. Like increased timeouts, there can be a point of diminishing returns when throwing hardware at a problem.
  • Tuning queries to improve your query efficiency – this is actually a very good tactic to pursue, in general. However, if workloads are sufficiently large, even the most efficient implementation of your request will eventually need to work over subset batches of your inputs or outputs.

For more detail on the above non-batching options, please refer to XDMP-CANCELED vs. XDMP-EXTIME.


1. If you can't break up the data into a series of smaller batches, use xdmp:save to write the full results from Query Console to the desired location, specified by a path on your file system. For details, see xdmp:save.

2. If you can break up the data into a series of smaller batches:

    a. Use batch tools like MLCP, which can export bulk output from MarkLogic Server to flat files, a compressed ZIP file, or an MLCP database archive. For details, see Exporting Content from MarkLogic Server.

    b. Reduce the size of the desired result set until it saves successfully, then save the full output in a series of batches.

    c. Page through the result set:

        i. If dealing with documents, cts:uris is excellent for paging through a list of URIs. See cts:uris for more details.

        ii. If using Semantics:

            1. Consider exporting the triples from the database using the Semantics REST endpoints.

            2. Use the start and pageLength URL parameters, which can be passed with your SPARQL query request to return the results in batches. See GET /v1/graphs/sparql for further details.
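A minimal sketch of paging through SPARQL results with start and pageLength is shown below. It assumes a REST API instance on port 8000, that the query is passed in the query parameter of GET /v1/graphs/sparql, and that start counts results from 1; the endpoint constant and helper name are assumptions to check against your environment. Including ORDER BY in the query keeps page boundaries stable between requests.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "http://localhost:8000/v1/graphs/sparql"  # assumed REST instance


def sparql_page_url(query: str, page: int, page_length: int = 1000,
                    endpoint: str = SPARQL_ENDPOINT) -> str:
    """URL for the given 1-based page; `start` counts results from 1."""
    if page < 1 or page_length < 1:
        raise ValueError("page and page_length must be positive")
    start = (page - 1) * page_length + 1
    params = {"query": query, "start": start, "pageLength": page_length}
    return f"{endpoint}?{urlencode(params)}"
```

A client would request page 1, 2, 3 and so on until a page returns fewer than page_length results.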


1. If you're looking to update more than a few thousand fragments at a time, you'll definitely want to use some sort of batching.

    a. For example, you could run a script in batches of, say, 2,000 fragments by doing something like [1 to 2000] and filtering out fragments that already have your newly added element. You could also look into using batch tools like MLCP.

    b. Alternatively, you could split your input into smaller batches, then spawn each of those batches as a job on the Task Server, which has a configurable queue. See:

        i. xdmp:spawn

        ii. xdmp:spawn-function

2. Alternatively, you could use an external, community-developed tool like CoRB to batch process your content. See Using Corb to Batch Process Your Content - A Getting Started Guide.

3. If using Semantics and querying triples with SPARQL:

    a. You can use the LIMIT keyword to further restrict the result set size of your SPARQL query. See The LIMIT Keyword.

    b. You can also use the OFFSET keyword for pagination. This keyword can be used with the LIMIT and ORDER BY keywords to retrieve different slices of data from a dataset; for example, you can create pages of results with different offsets. See The OFFSET Keyword.
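Both batching ideas above reduce to the same arithmetic: split the work into fixed-size slices. The sketch below shows a generic helper that groups a stream of URIs into batches of 2,000 (each batch could then become one spawned task or one request), and a helper that appends ORDER BY, LIMIT and OFFSET to a SPARQL SELECT so that each page is a stable, non-overlapping slice. The helper names are hypothetical, and the SPARQL helper assumes the caller supplies a query without its own ORDER BY, LIMIT or OFFSET.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(items: Iterable[str], size: int = 2000) -> Iterator[List[str]]:
    """Split any iterable (e.g. a stream of document URIs) into lists of at most `size`."""
    if size < 1:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def sparql_page(select_body: str, order_by: str, page: int, page_size: int = 1000) -> str:
    """Append ORDER BY/LIMIT/OFFSET so each 0-based page is a stable slice of the results."""
    return f"{select_body}\nORDER BY {order_by}\nLIMIT {page_size}\nOFFSET {page * page_size}"
```

Because batches consumes its input lazily, the full list of URIs never needs to be held in memory at once.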


This article outlines various factors influencing the performance of the xdmp:collection-delete function, and provides general best practices for improving the performance of large collection deletes.

What are collections?

Collections in MarkLogic Server are used to organize documents in a database. Collections are a powerful and high-performance mechanism to define and manage subsets of documents.

How are collections different from directories?

Although both collections and directories can be used for organizing documents in a database, there are some key differences. For example:

  • Directories are hierarchical, whereas collections are not. Consequently, collections do not require member documents to conform to any URI patterns. Additionally, any document can belong to any collection, and any document can also belong to multiple collections
  • You can delete all documents in a collection with the xdmp:collection-delete function. Similarly, you can delete all documents in a directory (as well as all recursive subdirectories and any documents in those directories) with a different function call - xdmp:directory-delete
  • You can set properties on a directory. You cannot set properties on a collection

For further details, see Collections versus Directories.

What is the use of the xdmp:collection-delete function?

xdmp:collection-delete is used to delete all documents in a database that belong to a given collection - regardless of their membership in other collections.

  • Use of this function always results in the specified unprotected collection disappearing. For details, see Implicitly Defining Unprotected Collections
  • Removing a document from a collection and using xdmp:collection-delete are similarly contingent on users having appropriate permissions to update the document(s) in question. For details, see Collections and Security
  • If there are no documents in the specified collection, then nothing is deleted, and the function still returns the empty sequence

What factors affect performance of xdmp:collection-delete?

The speed of xdmp:collection-delete depends on several factors:

Is there a fast operation mode available within the call xdmp:collection-delete?

Yes. The call xdmp:collection-delete("collection-uri") can potentially be fast in that it won't retrieve fragments. Be aware, however, that xdmp:collection-delete will retrieve fragments (and therefore perform much more slowly) when your database is configured with any of the following:

What are the general best practices for improving the performance of large collection deletes?

  • Batch your deletes
    • You could use an external/community developed tool like CoRB to batch process your content
    • Tools like CoRB allow you to create a "query module" (this could be a call to cts:uris to identify documents from a number of collections) and a "transform module" that works on each URI returned. CoRB will run the URI query and will use the results to feed a thread pool of worker threads. This can be very useful when dealing with large bulk processing. See: Using Corb to Batch Process Your Content - A Getting Started Guide
  • Alternatively, you could split your input (for example, URIs of documents inside a collection that you want to delete) into smaller batches
    • Spawn each of those batches to jobs on the Task Server instead of trying to delete an entire collection in a single transaction
    • Use xdmp:spawn-function to kick off the deletion of each batch; be careful not to overflow the Task Server queue, however
      • Don't spawn single-document deletes
      • Instead, make batches of a size that works most efficiently for your specific use case
    • The Task Server has a fixed queue size, which you can increase as necessary
  • Scope deletes more narrowly with the use of cts:collection-query
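As a sketch of batched deletes, the URIs can be gathered with cts:collection-query and each batch spawned to the Task Server as its own transaction. The collection name "old-data" and the batch size of 1000 are hypothetical. If the number of batches exceeds the Task Server queue size, spawning will fail, so size the batches accordingly or spawn them in waves.

```xquery
xquery version "1.0-ml";

(: Hypothetical collection URI and batch size; tune the batch size for your workload :)
declare variable $COLLECTION as xs:string := "old-data";
declare variable $BATCH-SIZE as xs:integer := 1000;

(: Requires the URI lexicon: collect the URIs of all documents in the collection :)
let $uris := cts:uris((), (), cts:collection-query($COLLECTION))
for $i in 1 to xs:integer(fn:ceiling(fn:count($uris) div $BATCH-SIZE))
let $batch := fn:subsequence($uris, ($i - 1) * $BATCH-SIZE + 1, $BATCH-SIZE)
(: Each spawned task deletes one batch in its own update transaction :)
return xdmp:spawn-function(
  function() { for $uri in $batch return xdmp:document-delete($uri) },
  <options xmlns="xdmp:eval"><update>true</update></options>
)
```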




MarkLogic Server delivers performance at scale, whether we're talking about large amounts of data, users, or parallel requests. However, people do run into performance issues from time to time. Most of those performance issues can be found ahead of time via well-constructed and well-executed load testing and resource provisioning.

There are three main aspects to load testing against and resource provisioning for MarkLogic:

  1. Building your load testing suite
  2. Examining your load testing results
  3. Addressing hot spots

Building your load testing suite

The biggest issue we see with problematic load testing suites is unrepresentative load. The inaccuracy can be in the form of missing requests, missing query inputs, unanticipated query inputs, unanticipated or underestimated data growth rates, or even a population of requests that skews towards different load profiles compared to production traffic. For example - a given load test might heavily exercise query performance, only to find in production that ingest requests represent the majority of traffic. Alternatively, perhaps one kind of query represents the bulk of a given load test when in reality that kind of query is dwarfed by the number of invocations of a different kind of query.

Ultimately, to be useful, a given load test needs to be representative of production traffic: the less representative a load test is, the less useful it will be.

Examining your load testing results

Beginning with version 7.0, MarkLogic Server ships a Monitoring History dashboard, visible from any host in your cluster on port 8002 under /history. The Monitoring History dashboard illustrates the usage of resources such as CPU, RAM, disk I/O, etc., at both the cluster and individual host levels. It also illustrates the occurrence of read and write locks over time. It's important to get a handle on both resource and lock usage in the course of your load test, as both will limit the performance of your application, but the way to address those performance issues depends on which class of usage is most prevalent.

Addressing hot spots

By having a representative load test and closely examining your load testing results, you'll likely find hot spots or slow performing parts of your application. MarkLogic Server's Monitoring History allows you to correlate resource and lock usage over time against the workload being submitted by your load tests. Once you find a hot spot, it's worthwhile examining it more closely by either running those requests in isolation or at larger scales. For example, you could run 4x and 16x the number of parallel requests, or 4x and 16x the number of inputs to an individual request - both of which will give you an idea of how the suspect requests scale in response to increased load.

Once you've found a hot spot - what should you do about it? Well, that ultimately depends on the kind of usage you're seeing in your cluster's Monitoring History. If it's clear that your suspect requests are running into a resource bound (for example, 100% utilization of CPU/RAM/disk I/O/etc.), then you'll either need to provision more of that limiting resource (either through more machines, or more powerful machines, or both), or reduce the amount of load on the system provisioned as-is. It may also be possible to re-architect the suspect request to be more efficient with regard to its resource usage.

Alternatively, you may find that your system is not, in fact, resource bound: there appear to be plenty of spare CPU cycles, free RAM, low amounts of disk I/O, etc. If you're seeing poor performance in that situation, you'll almost always see large spikes in the number of read/write locks taken as your suspect requests work through the system. Provisioning more hardware may help to some small degree in the presence of read/write locks, but what is really needed is to re-architect the requests to take as few locks as possible, and preferably to run completely lock free.





While there are many different ways to define schemas in MarkLogic Server, one should be aware of both the location strategy the server uses to find a schema and the different locations in which your particular schema may reside.

Schema Location

Schemas can reside in either the Schemas database defined for your content database, or within the server's Config directory.  If there is no explicit schema map defined, the server will use the following schema location strategy:

1) If the XQuery program explicitly references a schema for the namespace in question, MarkLogic Server uses this reference.
2) Otherwise, MarkLogic Server searches the schema database for an XML schema document whose target namespace is the same as the namespace of the element that MarkLogic Server is trying to type.
3) If no matching schema document is found in the database, MarkLogic Server looks in its Config directory for a matching schema document.
4) If no matching schema document is found in the Config directory, no schema is found.

There can sometimes be issues with step #2 when there are multiple schema documents in the schema database whose target namespace matches the namespace of the element that MarkLogic Server is trying to type. In that situation, it would be best to explicitly define a default schema mapping - schema maps can be defined through the Admin API or the Admin User Interface. Be aware that you can define schema mappings at both the group level (in which case the mapping would then apply to all application servers in the group) or at the individual application server level.

Best Practices

Now that we know how the server locates schemas and where schemas can potentially reside, what are the best practices?

In general, it's best to localize your schema impacts as narrowly as possible. For example, rather than using a single shared Schemas database or the server's one and only Config directory, it is better to define a specific Schemas database for the relevant content database. Similarly, unless you know you need a schema mapping to apply to every application server in a group, it is better to define your schema mappings at the application server level rather than the group level.


Although not exhaustive, this article lists some best practices for the use of MarkLogic Server with Amazon's VPC.


  1. Nodes within a MarkLogic cluster need to communicate with one another directly, without the presence of a load balancer in-between them.
  2. Whether in the context of a VPC or not, before attempting to join a node to a cluster, one should verify that each node is able to ping and ssh to the other (and vice versa). If you're not able to ping or ssh from one machine to another, then issues seen during a MarkLogic cluster join are very likely to be caused by the network configuration and should be diagnosed at the network layer.
  3. The following items should be double-checked when using VPCs:
    1. If a private subnet is used for any MarkLogic instance, that subnet needs access to the public internet for the following situations:
      1. If Managed Cluster support is used, MarkLogic requires access to AWS services which require outbound connectivity to the internet (at minimum to the AWS service web sites).
      2. If foreign clusters are used then MarkLogic needs to connect to all hosts in the foreign cluster
      3. If Amazon S3 is used then MarkLogic needs to communicate with the S3 public web services.
    2. It is assumed that the creator of the VPC has properly configured every subnet on which MarkLogic will be installed to have outbound internet access. There are many ways that private subnets can be configured to communicate outbound to the public internet. NAT instances are one example [AWS VPC NAT]. Another option is using DirectConnect to route outbound traffic through the organization's internet connection.
    3. All subnets which host instances running MarkLogic in the same cluster need to be able to communicate via port 7999.
    4. Command-line administration of each server requires inbound ssh connectivity, with port 22 accessible from either a VPN or a public subnet.
    5. With regard to application traffic (as opposed to intra-cluster traffic, as seen during cluster joining), connectivity to the MarkLogic server(s) needs to be open to whichever applications require it. Application traffic can be sent through an internal or external load balancer, a VPN, direct access from applications in the same subnet, or routing through another subnet.


This knowledgebase article contains critical tips and best practices you'll need to know to best use MarkLogic Server with your favorite BI Tools.

BI Tool Q&A

Q: What's a TDE? Is that a Tableau Data Extract?

A: In MarkLogic terms, TDE stands for Template Driven Extraction. A template is a document (XML or JSON) that declares how a view is to be populated. It defines a context -- the root path of all the documents that are involved in this view -- then, for each column in the view, it defines a column name, type, and a path to the data inside the document. You can define the value of a column using several pieces of data in the document, plus some functions, even some programming operations such as IF. For example, if your documents have the "last-updated" year and month and day in different parts of the document, your Template can pull in those three pieces, concatenate them, then cast the result as a date.
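As an illustration of the date example above, the following sketch defines a hypothetical template for documents rooted at /order with separate year, month, and day elements. The element names, schema/view names, and the assumption that the parts are zero-padded (so the concatenation forms a valid xs:date) are all hypothetical. The template is checked with tde:validate before it would be inserted.

```xquery
xquery version "1.0-ml";
declare namespace tde = "http://marklogic.com/xdmp/tde";

(: Hypothetical template: one row per /order document, with a lastUpdated
   column built from separate year, month and day elements :)
let $template :=
  <template xmlns="http://marklogic.com/xdmp/tde">
    <context>/order</context>
    <rows>
      <row>
        <schema-name>sales</schema-name>
        <view-name>orders</view-name>
        <columns>
          <column>
            <name>id</name>
            <scalar-type>string</scalar-type>
            <val>id</val>
          </column>
          <column>
            <name>lastUpdated</name>
            <scalar-type>date</scalar-type>
            <val>xs:date(fn:concat(year, "-", month, "-", day))</val>
          </column>
        </columns>
      </row>
    </rows>
  </template>
(: Validate the template before inserting it into the schemas database :)
return tde:validate($template)
```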

Q: When modifying TDEs, do I need to reindex?

A: TDEs map an SQL-like view on top of MarkLogic. If you change an existing view, you do need to reindex the database. Before kicking off a resource- and time-intensive reindex, however, be aware that there are some TDE configurations that cannot be updated. You can read more about exactly which kinds of TDEs may or may not be updated at the following knowledgebase article: Updating a TDE View.

Q: Can MarkLogic handle queries that require a large number of columns?

A: Yes, but you'll want to pay attention to potential performance impacts. In general, it's much better to spread a large number of columns across multiple TDEs, instead of having a single TDE containing all those same columns. Data modeling is also important here - TDEs should be meaningful with regard to their intended use. Definitely check out MLU's Data Modeling Series, in particular Progressive Transformation using the Envelope Pattern and Impact of Normalization: Lessons Learned.

Q: What are some common patterns and antipatterns for good performance with BI tools?

A: First, avoid using Nullable columns in filters and drilldowns. There are optimizations in MarkLogic Server's SQL engine to detect patterns with "null", but different BI tools generate their code in different ways, which can sometimes result in code that circumvents those optimizations. In general, if performance is a priority, it's usually better to use an actual value such as "N/A" or "0".

Second, enable Query Reduction or similar options in your BI tool of choice. Without this option, if you choose to filter on a year - say "2018" - and then also select "2019", multiple SQL queries will be sent to MarkLogic in quick succession unnecessarily.

Q: What do I need to watch out for when connecting my BI tool to MarkLogic?

A: If performance is a priority, exercise caution when using joins. In general, the best practice is to create collections of data in MarkLogic that represent the subsets of data needed externally as closely as possible. You can learn more about what tools are available to see how many and what kind of joins are being used by your query in the What debugging tools are available for Optic, SQL, or SPARQL code in MarkLogic Server? knowledgebase article, and you can learn more about how to create more meaningful data models and subsets of your data models in the aforementioned MLU's Data Modeling Series, as well as in the MarkLogic World presentation Getting the Most from MarkLogic Semantics (also available in video form).



If you're looking to use any of the interfaces built on top of MarkLogic's semantics engine (Optic API, SQL, or SPARQL), you'll want to make sure you're using the best practices itemized in this knowledgebase article. It's not unusual to see performance improvements of one or even two orders of magnitude as a result. Note that this article is really just a distillation of the MarkLogic World presentation "Getting the Most from MarkLogic Semantics", available in both pdf and YouTube formats.

Best Practices for Using Semantics at Scale

1) Scope your query - more constrained queries will do less work, and will therefore take less time

  • Trim resultsets early
  • Partition
    • Query partitions or subsets of your data, instead of your entire database
    • Define partitions with Collections
    • Make use of your partitions with collection queries
    • Use cts:query to partition even further
  • Keep like-triples in the same document
  • Use MarkLogic indexes to scope a query
    • Collection query (or SPARQL FROM) to partition the RDF space
    • Put ontologies and other lookup/mapping triples into their own graphs/collections
    • Consider pushing-down some SPARQL FILTERs to the document

2) Pay attention to your data model

3) Resultset size specific tips

  • For small resultsets – from SPARQL, get the docs with a search
  • For large resultsets
    • Get docs in a single read, no joins
    • Large result sets may incur connection churning overhead – paginate large resultsets to ensure connection reuse

4) Hardware tips

  • Add more memory - allows the optimizer to choose faster plans
  • Add more hardware - allows for increased parallelization

5) Avoid unnecessary work

  • Re-use queries with bind variables - the query plan is cached for 5 minutes
  • Dedup processing
    • De-duplication has no effect on results if you have no duplicate triples and/or you use DISTINCT
    • Skipping dedup processing can result in substantial performance improvements
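Several of these practices can be combined in a single sem:sparql call. In this sketch the collection "ontology" and the subject IRI are hypothetical: the query is scoped to one collection through sem:store, the query text stays constant while the subject is passed as a bind variable, and dedup processing is switched off. Turning dedup off assumes your release supports the dedup=off option of sem:sparql and that your data has no duplicate triples.

```xquery
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

(: Keep the query text constant and pass values through bindings,
   so the cached query plan can be reused :)
declare variable $QUERY as xs:string := '
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT ?label WHERE { ?s rdfs:label ?label }';

(: Scope the query to a hypothetical "ontology" collection :)
let $store := sem:store((), cts:collection-query("ontology"))
(: Bind ?s to a hypothetical subject IRI :)
let $bindings := map:entry("s", sem:iri("http://example.com/id/123"))
(: dedup=off skips duplicate-triple removal; only safe if there are no duplicates :)
return sem:sparql($QUERY, $bindings, "dedup=off", $store)
```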


Backing up multiple databases simultaneously may make some of the backups fail with error XDMP-FORESTOPIN.



While configuring a scheduled backup, one can also choose to back up the associated auxiliary databases, such as Security, Schemas, and Triggers. Generally, all the content databases share these auxiliary databases, so issues may arise when more than one scheduled backup tries to back up the same auxiliary database. When two backups try to back up the same auxiliary database, the backup will fail with an XDMP-FORESTOPIN error. This error generally occurs when the system attempts to start one forest operation (backup, restore, remove, clear, etc.) while another, exclusive operation is already in progress. For example, starting a new backup while a previous backup is still in progress.



One should be extra cautious while configuring scheduled backups and selecting auxiliary databases with them. If one really wants to back up the auxiliary databases with the content database, then one needs to pay special attention to the timing and ensure that no two backups of the same auxiliary database overlap.

Because most applications don't make frequent changes to their auxiliary databases, MarkLogic recommends scheduling backups for them separately, instead of selecting them together with the content databases.


Problems can occur when trying to explicitly search (or not search) parts of documents when using a global configuration approach to include and exclude elements.

Global Approach

Including and excluding elements in a document using a global configuration approach can lead to unexpected results that are complex to diagnose.  The global approach will require positions to be enabled in your index settings, expanding the disk space requirements of your indexes and may result in greater processing time of your position dependent queries.  It may also require adjustments to your data model to avoid unintended includes or excludes; and may require changes to your queries in order to limit the number of positions used.

If circumstances dictate that you must instead use the less preferred global configuration approach, you can read more about including and excluding elements in word queries in the MarkLogic Server documentation.

Recommended Approach

In general, it's better to define specific fields, which are a mechanism designed to restrict your query to portions of documents based on elements. You can read more about fields in the MarkLogic Server documentation.
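Once a field has been defined, querying it is straightforward. This sketch assumes a field named "body-text" (hypothetical) already exists on the database and includes only the elements you want searched:

```xquery
xquery version "1.0-ml";

(: Assumes a hypothetical field named "body-text" is already configured on the
   database; only content in the field's included elements is matched :)
cts:search(
  fn:doc(),
  cts:field-word-query("body-text", "marklogic")
)[1 to 10]
```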




In MarkLogic 8, support for native JSON and server side JavaScript was introduced.  We discuss how this affects the support for XML and XQuery in MarkLogic 8.


In MarkLogic 8, you can absolutely use XML and XQuery. XML and XQuery remain central to MarkLogic Server now and into the future. JavaScript and JSON are complementary to XQuery and XML. In fact, you can even work with XML from JavaScript or JSON from XQuery.  This allows you to mix and match within an application—or even within an individual query—in order to use the best tool for the job.
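A small sketch of this interoperability, assuming MarkLogic 8 or later: the XQuery below constructs a JSON object node directly and evaluates a JavaScript expression with xdmp:javascript-eval. The property names are hypothetical.

```xquery
xquery version "1.0-ml";

(: Build a JSON object node directly in XQuery (MarkLogic 8+ node constructors) :)
let $json := object-node { "title": "Report", "pages": 12 }
(: Evaluate a JavaScript expression from XQuery :)
let $sum := xdmp:javascript-eval("1 + 2")
(: XPath works on JSON nodes, so the title property can be selected by name :)
return ($json/title, $json/pages, $sum)
```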

See also:

Server-side JavaScript and JSON vs XQuery and XML in MarkLogic Server

XQuery and JavaScript interoperability


Sometimes you may find that there are one or more tasks that are taking too long to complete or are hogging too many server resources, and you would like to remove them from the Task Server.  This article presents a way to cancel active tasks in the Task Server.


To cancel active tasks in the Task Server, you can browse to the Admin UI, navigate to the Status tab of the Group's Task Server, and cancel the tasks. However, this may get tedious if there are many tasks to be terminated.

As an alternative, you can use the server monitoring built-ins to programmatically find and cancel the tasks. The MarkLogic Server API documentation includes information on all the built-in functions you will need.

Sample Script

Here is a sample script that removes the task based on the path to the module that is being executed:

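xquery version "1.0-ml";
(: Find this host's Task Server, then cancel every active request running the given module :)
let $host-id := xdmp:host()
let $host-task-server-id := xdmp:host-status($host-id)//*:task-server/*:task-server-id/text()
let $task-server-status := xdmp:server-status($host-id, $host-task-server-id)
let $task-server-requests := $task-server-status/*:request-statuses
let $scheduled-task-request := $task-server-requests/*:request-status[*:request-text = "/PATH/TO/SCHEDULED/TASK/MODULE.XQY"]/*:request-id/text()
for $request-id in $scheduled-task-request
return xdmp:request-cancel($host-id, $host-task-server-id, $request-id)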


MarkLogic stores all signed certificates, private keys, and Certificate Authority certificates inside the Security database. The Security database also stores users, passwords, roles, privileges, and many other authentication-related configurations. While setting up a DR (Disaster Recovery) cluster, many administrators prefer to replicate the Security database to the DR cluster to avoid re-configuring the same users, roles, privileges, etc. on the DR cluster.

Security database replication presents design challenges and issues when accessing application servers on the DR cluster.

  • Certificates installed in the Master cluster's Security database are replicated to the DR cluster's Security database. However, those replicated certificates are not useful to the DR cluster, since signed certificates are typically tied to a single host (exceptions include SAN and wildcard certificates).
  • At the same time, because the replicated Security database is read-only, we cannot install new signed certificates on the DR cluster.

This article discusses the different aspects of the above problem and provides a solution.

Configuration: Security Database replicated to DR Cluster

For the purposes of this article, we will consider a 3-node Master cluster coupled to a 3-node DR cluster, where the Security database is replicated from the Master to the DR cluster. We will also have an Application Server on the Master cluster attached to the "DemoTemp1" certificate template.

Screenshots: Master cluster hosts and DR cluster hosts.

Issues in the DR Cluster

Certificate Authentication based on CN field 

When client browsers connect to the application server using HTTPS, they check to make sure your SSL Certificate matches the host name in the address bar. There are three ways for browsers to find a match:

  1.    The host name (in the address bar) exactly matches the Common Name (CN) in the certificate's Subject.
  2.    The host name matches a Wildcard Common Name. For example, a wildcard common name such as *.example.com matches host names one level below that domain.
  3.    The host name is listed in the Subject Alternative Name field.

The most common form of SSL name matching is the first option: the SSL client compares the server name to the Common Name in the server's certificate.

Since the temporary signed certificates replicated from the Master cluster have the Master cluster nodes' host names in their CN fields, the Application Server on the DR cluster will fail when used with these MarkLogic-generated temporary signed certificates.

Certificate Requests

When we attach a template on the DR cluster to an application server, MarkLogic Server generates a temporary signed certificate for every node in the cluster in the Application Server's group.

Screenshots: Certificate Template status on the Master cluster and on the DR cluster.

To install a certificate signed by a third party in place of the temporary signed certificate, we need to generate certificate requests. You can generate certificate requests in MarkLogic for all nodes using the Request button under "Needed Certificate Request" on the Certificate Template "Status" tab.

  • On the Master cluster, MarkLogic generates 3 certificate requests, each with a CN field matching one of the 3 nodes. All 3 new certificate requests are stored internally in the Security database.
  • On the DR cluster, clicking Request results in an ERROR, since the DR cluster has a replicated Security database that is in a read-only ("open replica") state, i.e. Security database updates are not allowed.

Pending Certificate Requests

Each certificate request is intended for a specific node, because the request originator incorporates the node's FQDN into the certificate's CN field when generating the request. MarkLogic Server uses the hostname (which in most cases matches your FQDN) as the CN field value in the certificate request.

Certificate requests generated on the Master cluster are stored in the Security database, which is replicated to the DR cluster's Security database (when Security database replication is configured). However, certificate requests generated on the Master cluster are not relevant to the DR cluster, as they have the Master cluster nodes' FQDNs in their CN fields.

Screenshots: Certificate Template status after generating requests, on the Master cluster and on the DR cluster.


To install signed certificates intended for the DR cluster, where the certificate CN field matches the FQDN of a DR cluster node, we need to install the DR cluster's signed certificates on the Master cluster. Those certificates will then be replicated to the DR cluster through the normal database replication of the Security database.

Step 1. Generate Certificate Request (intended for DR nodes).

You generate the certificate requests using XQuery in Query Console against the Security database on the Master cluster itself, but the values used in your XQuery are the FQDNs of the DR/Replica cluster nodes. For example, for the first node in the DR cluster, you would run the query below from Query Console on any node of the Master cluster, against the Security database. Change the FQDN value for each node and run the query three times in total.

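xquery version "1.0-ml";
import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";
(: "dr-node1.example.com" is a hypothetical placeholder for the DR node's FQDN :)
pki:generate-certificate-request(pki:template-get-id(pki:get-template-by-name("DemoTemp1")), "dr-node1.example.com", (), ())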

Step 2. Download Certificate Request and Get them Signed.

We should be able to see the certificate requests for each node (Master as well as DR nodes) on the Certificate Template Status tab in both the Master cluster and the DR cluster Admin UIs. Download them and get them signed by your favorite Certificate Authority.

Screenshots: Certificate Template status showing the requests generated from Query Console, on the Master cluster and on the DR cluster.

Step 3. Install All Signed Certificates (for Master + DR Nodes) on Master Cluster 

Install all signed certificates (including the certificates intended for the Replica cluster) on the Master cluster, using the Import tab of the Certificate Template in the Admin UI. If we try to install certificates on the DR/Replica cluster from the Admin UI, we will get an XDMP-FORESTNOT (Forest Security not available: open replica) error. The Application Server on the DR cluster will find the appropriate certificate for each node from the list of all certificates. The screenshots below show the status of the certificate template on the Master cluster as well as on the DR cluster (both should be identical).

Screenshots: final Certificate Template status on the Master cluster and on the DR cluster.

Step 4. Importing Pre-Signed Cert where Keys are generated outside of MarkLogic.

Please read "Import pre-signed Certificate and Key for MarkLogic HTTPS App Server" to import certificate requests and keys generated outside of MarkLogic. For our purpose, we will need to import the certificates (and their respective keys) for both clusters (Master as well as DR/Replica) from Query Console on the Master cluster itself.



Each node in a MarkLogic Server cluster has a hostname, a human-readable nickname corresponding to the network address of the device. MarkLogic retrieves the hostname from the underlying operating system during installation. On Linux, we can retrieve the platform hostname by running "$ hostname" from a shell prompt.

$ hostname

In most environments, the hostname is the same as the platform's Fully-Qualified-Domain-Name (FQDN). However, there are scenarios where the hostname could be different from the FQDN. In such environments, you would use the FQDN, rather than the hostname, to connect to the platform.

$ ping

PING ( 56(84) bytes of data.

64 bytes from ( icmp_seq=1 ttl=64 time=0.011 ms

When installing a certificate into a certificate template in an environment where the hostname and FQDN differ, MarkLogic looks at the CN field in the installed certificate to find a matching hostname in the cluster. However, since the CN field (reflecting the FQDN) does not match the hostname known to MarkLogic, MarkLogic does not assign the installed certificate to any specific host in the cluster.

Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic, OU=Eng,

Installing Certificates in this scenario results in the installed Certificate not replacing the Temporary Certificate, and the Temporary Certificate will still be used with HTTPS App Server instead of the installed Certificates.

This article details different solutions to address this issue. 


1) Hostname change

By default, MarkLogic picks the hostname value presented by the underlying operating system. However, we can always change the hostname string stored in MarkLogic Server after installation using the Admin API function admin:host-set-name.

Changing the hostname in MarkLogic (to reflect the FQDN) will not affect the underlying platform/OS hostname values, but will result in MarkLogic being able to find the correct host for the installed certificate (CN field = hostname), and thus being able to link the installed certificate to a specific host in the cluster.
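A minimal sketch of option 1, assuming the query is run on the host to be renamed and using a hypothetical FQDN. Changing a host name affects the cluster configuration, so try it on a non-production cluster first.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Hypothetical FQDN: replace with the value in your certificate's CN field :)
let $fqdn := "node1.example.com"
let $config := admin:get-configuration()
(: Rename the host this query runs on; only MarkLogic's stored name changes,
   not the operating system hostname :)
let $config := admin:host-set-name($config, xdmp:host(), $fqdn)
return admin:save-configuration($config)
```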

2) XQuery code linking Installed Cert to specific Host

You can also run the XQuery code below from Query Console against the Security database (as the content source) to update the certificate XML documents in the Security database, linking the installed certificate to a specific host.

Please change the Certificate Template-Name, and Host-Name in below XQuery to reflect values from your environment.

xquery version "1.0-ml";

import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

(: Change to your hostname string :)
(: if Qconsole is launched from the same host, then below can be used as well :)
(: let $hostname := xdmp:host-name()    :)
let $hostname :=""
let $hostid := admin:host-get-id(admin:get-configuration(), $hostname)

(: FQDN name matching Certificate CN field value :)
let $fqdn := ""

(: Change to your Template Name string :)
let $templateid := pki:template-get-id(pki:get-template-by-name("YourTemplateName"))

for $i in cts:uris()
(   (: locate Cert file with Public Key :)
    and fn:doc($i)//pki:certificate/pki:authority=fn:false()
    and fn:doc($i)//pki:certificate/pki:host-name=$fqdn
return <h1> Cert File - {$i} 
{xdmp:node-insert-child(doc($i)/pki:certificate, <pki:host-id>{$hostid}</pki:host-id>)}
    (: extract cert-id :)
    let $certid := fn:doc($i)//pki:certificate/pki:certificate-id
    for $j in cts:uris()
        (: locate Cert file with Private key :)
        and fn:doc($j)//pki:certificate-private-key/pki:certificate-id=$certid
    return <h2> Cert Key File - {$j}
    {xdmp:node-insert-child(doc($j)/pki:certificate-private-key, <pki:host-id>{$hostid}</pki:host-id>)}
} </h1>

Note that the code above does not replace or overwrite the temporary certificate; however, the App Server will use the installed certificate from this point on instead of the temporary certificate. You can also delete the now-unused temporary certificate document from Query Console without any negative effect.

3) Certificate with Subject Alternative Name (SAN Cert)

You can also ask your IT team (or certificate issuer) for a certificate whose Subject Alternative Name (subjectAltName) matches MarkLogic's understanding of the host. During installation of the certificate, MarkLogic looks for alternative names and links the certificate to the correct host based on the subjectAltName field.


Further Reading


Introduction: When you may need to change the state of forests

In most cases, all forests in your MarkLogic cluster will be configured to allow all (any) updates to be made.

You can check the current state of a forest in Query Console by calling the built-in function xdmp:forest-updates-allowed() with the forest's ID.

In the majority of cases, this function returns "all", indicating that the forest is in a state that allows incoming queries to read data from the forest and to update content in (and add new content to) that forest.
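For example, this sketch reports the state of every forest in one database; "Documents" is a placeholder database name:

```xquery
xquery version "1.0-ml";

(: Report the updates-allowed state of every forest in one database.
   "Documents" is a placeholder; change it to your database name. :)
for $forest-id in xdmp:database-forests(xdmp:database("Documents"))
return fn:concat(
  xdmp:forest-name($forest-id), ": ",
  xdmp:forest-updates-allowed($forest-id))
```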

At any given time, a forest can be configured to be in one of four different states:

  • all
  • read-only
  • delete-only
  • flash-backup

You may want to change the state of the forests in a given database for several reasons:

  • To run your application in maintenance mode, where data can be read but no on-disk data can be changed
  • When you are migrating data from a legacy database or removing data from a given forest
  • When you need to quiesce all forests in a given database for long enough to make a file-level backup of the forest data

Forest states explained

Sample state management module

Below is an example template for modifying the state of all forests in a given database:
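The original template is not reproduced here. As a minimal sketch under stated assumptions, the XQuery below sets every forest of one database to the same state with admin:forest-set-updates-allowed; "Documents" and "read-only" are placeholders, and the helper local:set-state is hypothetical:

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Placeholders: the database to change and the target state, which must be
   one of "all", "read-only", "delete-only" or "flash-backup". :)
declare variable $database-name as xs:string := "Documents";
declare variable $new-state as xs:string := "read-only";

(: Apply the new state to each forest in turn, passing the updated
   configuration through the recursion. :)
declare function local:set-state($config, $forest-ids as xs:unsignedLong*)
{
  if (fn:empty($forest-ids)) then $config
  else local:set-state(
    admin:forest-set-updates-allowed($config, $forest-ids[1], $new-state),
    fn:subsequence($forest-ids, 2))
};

let $forest-ids := xdmp:database-forests(xdmp:database($database-name))
let $config := local:set-state(admin:get-configuration(), $forest-ids)
return admin:save-configuration($config)
```

To bring the forests back into normal service, run it again with $new-state set to "all".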

Further reading

Forest States
Setting Forests to "read only"
Setting Forests to "delete only"

Link to Example Code



This article discusses some of the issues you should think about when preparing to change the IP addresses of MarkLogic Server hosts.


If the hostnames stay the same, then changing IP addresses should not have any adverse side effects since none of the default MarkLogic Server settings require an IP address.

Here are some caveats:

  1. Make sure there are no application servers that have an 'address' setting to an IP address that will no longer be accessible/exist after the change.
  2. Similarly, make sure there are no external (to MarkLogic Server) dependencies on the original IP addresses.
  3. Make sure you allow some time (on the order of minutes) for the updated DNS records to propagate across the DNS servers before bringing up MarkLogic Server.
  4. Make sure the hosts themselves are reachable via the standard Unix channels (ping, ssh, etc.) before starting MarkLogic Server.
  5. Make sure you test this in a non-production environment before you implement it in production.
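For the first caveat, the sketch below lists the App Servers whose address setting is not the default "0.0.0.0" (listen on all interfaces), i.e. the ones that may be bound to a specific IP address. It relies on the Admin API functions admin:get-appserver-ids, admin:appserver-get-address and admin:appserver-get-name; please confirm them against the Admin API reference for your version:

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: List App Servers bound to a specific address rather than to all
   interfaces ("0.0.0.0"), since only those depend on an IP address. :)
let $config := admin:get-configuration()
for $server-id in admin:get-appserver-ids($config)
let $address := admin:appserver-get-address($config, $server-id)
where $address ne "0.0.0.0"
return fn:concat(
  admin:appserver-get-name($config, $server-id), ": ", $address)
```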


If you have an existing MarkLogic Server instance running on EC2, there may be circumstances where you need to change the size of available storage.

This article discusses approaches to ensure a safe increase in the amount of available storage for your EC2 instances without compromising MarkLogic data integrity.

This article assumes that you have started your cluster using the CloudFormation templates provided by MarkLogic.

The recommended method (I.) is to shut down the cluster, do the resize using snapshots and start again. If you wish to avoid downtime an alternative procedure (II.) using multiple volumes and rebalancing is described below.

In both procedures we are recommending a single, large EBS volume as opposed to multiple smaller ones because:

1. Larger EBS volumes have faster IO as described by the Amazon EBS Volume types at

2. You have to keep enough spare capacity on every single volume to allow for merges.  MarkLogic disk space requirements are described in our Installation Guide.

I. Resizing using AWS snapshots

This is the recommended method. This procedure follows the same steps as official Amazon AWS documentation, but highlights MarkLogic specific steps. Please review AWS Documentation in detail before proceeding:

1. Make sure that you have an up to date backup of your data and a working restore plan.

2. Stop the MarkLogic cluster by going to AWS Console -> CloudFormation -> Actions -> Update Stack


Click through the pages, leaving all other settings intact, but change Nodes to 0; then review and confirm the stack update. This will stop the cluster.

This is also covered in the MarkLogic EC2 documentation:

4. Create a snapshot of the volume to resize.

5. Create a new volume from the snapshot.

Ensure that the new volume is sufficiently large to cover MarkLogic disk space requirements (generally at least 1.5x of the planned total forest size).

6. Detach the old volume.

7. Attach the newly expanded volume.

Steps 4-7 are exactly as covered in the AWS documentation and have no MarkLogic-specific parts.

8. Restart MarkLogic cluster, by going to AWS Console -> CloudFormation -> Actions -> Update Stack and changing Nodes to the original setting.

9. Connect to the machine using SSH and resize the file system to match the new volume size. This is covered in the AWS documentation; the commands are:

- resize2fs for ext3 and ext4

- xfs_growfs for XFS

10. The new volume will have a different ID. You need to update the CloudFormation template so that the data volumes are retained and remounted when the cluster or nodes are restarted. The easiest way is to use the mlcmd shell script provided by MarkLogic. Also using SSH, run the following:

/opt/MarkLogic/bin/mlcmd sync-volumes-to-mdb

This will synchronise the EBS volume id with the CloudFormation template.

At this point the procedure is complete. You can delete the old EBS volume and, once you have verified that everything is working fine, also delete the snapshot created in step 4.

II. Resizing with no downtime, using MarkLogic Rebalancing

This method avoids cluster downtime, but it is slightly more complicated than procedure I, and rebalancing takes additional time and adds load to the cluster while it runs. In most cases procedure I takes far less time to complete, but the cluster is down for its duration; with this procedure, the cluster can serve requests at all times.

This procedure follows the same steps as official Amazon AWS documentation where possible, but highlights MarkLogic specific steps. Please review AWS Documentation in detail before proceeding:

The procedure is described in more detail in the MarkLogic Server on Amazon EC2 Guide at

1. Create a new volume.

Ensure that the new volume is sufficiently large to cover MarkLogic disk space requirements (generally at least 1.5x of the planned total forest size).

2. Attach the volume to the EC2 instance. Note the EC2 device name, for example /dev/sdg, and see here how it maps to a device name in Linux and in Red Hat:

3. SSH into the instance and execute the /opt/MarkLogic/bin/mlcmd init-volumes-from-system command to create a filesystem for the volume and update the Metadata Database with the new volume configuration. The init-volumes-from-system command will output a detailed report of what it is doing. Note the mount directory of the volume from this report.

4. Once the volume is attached and mounted to the instance, log into the Administrator Interface on that host and create a forest or forests, specifying host name of the instance and the mount directory of the volume as the forest Data Directory. For details on how to create a forest, see Creating a Forest in the Administrator's Guide.

5. Once the status of the new forest is set to "open", attach the new forest(s) to the database and retire all the forest(s) on the old volume. If you only have 1 data volume then this includes forests for Schemas, Security, Triggers, Modules etc. It is possible to script this part using XQuery, JS or REST:
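A minimal XQuery sketch of this step using the Admin API. The database and forest names are hypothetical placeholders, the new forest is assumed to exist already (step 4) and to be open, and the script has to be run once for each database whose forests live on the old volume:

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Placeholders: the database, the new forest created in step 4, and the
   forest(s) on the old volume that should be retired. :)
declare variable $database-name as xs:string := "Documents";
declare variable $new-forest as xs:string := "Documents-new-1";
declare variable $old-forests as xs:string* := ("Documents-1", "Documents-2");

(: Retire each old forest in turn. Retired forests stay attached, and the
   rebalancer moves their fragments to the remaining forests. :)
declare function local:retire($config, $db-id as xs:unsignedLong, $names as xs:string*)
{
  if (fn:empty($names)) then $config
  else local:retire(
    admin:database-retire-forest($config, $db-id,
      admin:forest-get-id($config, $names[1])),
    $db-id,
    fn:subsequence($names, 2))
};

let $config := admin:get-configuration()
let $db-id := xdmp:database($database-name)
let $config := admin:database-attach-forest($config, $db-id,
                 admin:forest-get-id($config, $new-forest))
return admin:save-configuration(local:retire($config, $db-id, $old-forests))
```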

This will trigger rebalancing: database fragments will start to move to the new forests. This process can take several hours or days, depending on the size of the data, and the Admin UI will show you an estimate.

The Admin UI for this is covered here:

and here is more information on rebalancing:

6. Once the old forest(s) have 0 fragments in them you can detach them and delete the old forest(s). The migration to a new volume is complete.

7. Optionally, remove the old volume. If your original volume held data only, it should be empty after this procedure and you can:

a) unmount the volume in Linux

b) delete the volume in AWS EC2 console

c) issue /opt/MarkLogic/bin/mlcmd sync-volumes-to-mdb. This will preserve the new volume mappings in the CloudFormation template, and the volumes will be preserved and remounted when nodes are restarted or even terminated.


A common use case in many business applications is to find whether an element exists in a document. This article provides ways to find such documents and explains points to consider when designing a solution.



In general, the existence of an element in a document can be checked with a cts:element-query whose inner query is an empty cts:and-query.
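For a single document, a minimal sketch of that check looks like this; "/example.xml" is a placeholder URI and myElement is the example element name used in this article:

```xquery
xquery version "1.0-ml";

(: true if the document contains at least one myElement. The empty
   cts:and-query matches any content, so only the element's presence counts. :)
cts:contains(
  fn:doc("/example.xml"),
  cts:element-query(xs:QName("myElement"), cts:and-query(())))
```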


Note the empty cts:and-query construct: an empty cts:and-query matches all fragments, so the element query matches any fragment that contains the element, whatever its content.

Hence, running a cts:search with this query brings back all the documents that contain the element "myElement".


Wrapping the query in a cts:not-query brings back all the documents that do *not* contain the element "myElement".
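Both searches can be sketched as follows. Counting with xdmp:estimate is a choice made here to keep the output short; replace xdmp:estimate(...) with the cts:search call itself to get the documents:

```xquery
xquery version "1.0-ml";

(: Matches any fragment that contains myElement, whatever its content. :)
declare variable $has-my-element :=
  cts:element-query(xs:QName("myElement"), cts:and-query(()));

(
  (: number of documents that contain myElement :)
  xdmp:estimate(cts:search(fn:doc(), $has-my-element)),
  (: number of documents that do not contain myElement :)
  xdmp:estimate(cts:search(fn:doc(), cts:not-query($has-my-element)))
)
```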


A search using cts:not-query is only guaranteed to be accurate if the query being negated is resolved accurately from the indexes. Hence, to check for the existence of a specific XPath, that XPath needs to be indexed.
For example, if you want to find documents that have /path/1/A (and not /path/2/A), create a field index for the path /path/1/A and use it in your query instead.


Things to remember

1.) Use unique element names within a single document, i.e. try not to use the same element name in multiple places within a document if the occurrences have different meanings for your use case. Either give them different element names or put them in different namespaces to remove any ambiguity. For example, if you have an element "table" in two places in a single document, you can put them in different namespaces, such as html:table and furniture:table, or name them differently, such as html_table and furniture_table.

2.) If element names are unique within a document, you don't need to create additional indexes. If element names are not unique within a document and you are interested only in a specific XPath, create path (field) indexes on those XPaths and use them in your not-query.



MarkLogic Server has shipped with full support for the W3C XML Schema specification and schema validation capabilities since version 4.1 (released in 2009).

These features allow for the validation of complete XML documents or elements within documents against an existing XML Schema (or group of Schemas), whose purpose is to define the structure, content, and typing of elements within XML documents.

You can read more about the concepts behind XML Schemas and MarkLogic's support for schema based validation in our documentation:

Caching XML Schema data

In order to ensure the best possible performance at scale, all user created XML Schemas are cached in memory on each individual node within the cluster using a portion of that node's Expanded Tree Cache.

Best practices when making changes to pre-existing XML Schemas: clearing the Expanded Tree Cache

When you redeploy a revised XML Schema to an existing schema database, MarkLogic may sometimes refer to an older, cached version of the schema data associated with a given document.

Therefore, as a best practice, whenever you deploy a new or revised version of a schema that you maintain, you may need to clear the cache to ensure that all cached data for older versions of your schemas has been evicted.

If you don't clear the cache, MarkLogic may still refer to the old, cached schema, and as a result you may get errors like:

XDMP-LEXVAL (...) Invalid lexical value

You can clear all data stored in the Expanded Tree Cache in two ways:

  1. By restarting MarkLogic service on every host in the cluster. This will automatically clear the cache, but it may not be practical on production clusters.
  2. By calling xdmp:expanded-tree-cache-clear() on each host in the cluster. You can run the function in Query Console or via a REST endpoint; you will need a user with admin rights to actually clear the cache.

An example script has been provided that demonstrates the use of XQuery to execute the call to clear the Expanded Tree Cache against each host in the cluster:
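The script itself is not reproduced here. As a minimal sketch of the same idea, the XQuery below clears the cache only on the host that evaluates it (the host Query Console is connected to) and then lists all hosts in the cluster, so you can confirm that you have run it on each one:

```xquery
xquery version "1.0-ml";

(: Clears the Expanded Tree Cache on the evaluating host only; repeat on
   every host in the cluster. Requires a user with admin rights. :)
(
  xdmp:expanded-tree-cache-clear(),
  fn:concat("Expanded Tree Cache cleared on ", xdmp:host-name()),
  (: list every host so you can check that none has been missed :)
  for $host-id in xdmp:hosts()
  return fn:concat("cluster host: ", xdmp:host-name($host-id))
)
```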

Please contact MarkLogic Support if you encounter any issues with this process.

Related KB articles and links:


XDMP-ODBCRCVMSGTOOBIG can occur when a non-ODBC process attempts to connect to an ODBC application server. This can happen, for example, when an HTTP application has been accidentally configured to point to the ODBC port, or when a load balancer is sending HTTP health checks to an ODBC port. A number of common error messages can indicate whether this is the case.

Identifying Errors and Causes

One method of determining the cause of an XDMP-ODBCRCVMSGTOOBIG error is to take the size value and convert it to characters. For example, given the following error message:

2019-01-01 01:01:25.014 Error: ODBCConnectionTask::run: XDMP-ODBCRCVMSGTOOBIG size=1195725856, conn=

The size, 1195725856, can be converted to the hexadecimal value 47 45 54 20, which can be converted to the ASCII value "GET ".  So what we see is a GET request being run against the ODBC application server.

Common Errors and Values

Error                                    Hexadecimal    Characters
XDMP-ODBCRCVMSGTOOBIG size=1195725856    47 45 54 20    "GET "
XDMP-ODBCRCVMSGTOOBIG size=1347769376    50 55 54 20    "PUT "
XDMP-ODBCRCVMSGTOOBIG size=1347375956    50 4F 53 54    "POST"
XDMP-ODBCRCVMSGTOOBIG size=1212501072    48 45 4C 50    "HELP"
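The conversion can be scripted. The sketch below (local:size-to-ascii is a hypothetical helper, not a MarkLogic built-in) splits the size value into its four bytes, most significant first, and maps each byte to its character. It assumes all four bytes are printable ASCII, since a zero byte would not be a valid XML character:

```xquery
xquery version "1.0-ml";

(: Decode a size value into the four ASCII characters it was read from. :)
declare function local:size-to-ascii($size as xs:integer) as xs:string
{
  fn:codepoints-to-string(
    for $divisor in (16777216, 65536, 256, 1)  (: 256^3, 256^2, 256^1, 256^0 :)
    return ($size idiv $divisor) mod 256)
};

for $size in (1195725856, 1347769376, 1347375956, 1212501072)
return fn:concat($size, " = 0x", xdmp:integer-to-hex($size),
                 ' = "', local:size-to-ascii($size), '"')
```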


XDMP-ODBCRCVMSGTOOBIG errors do not affect the operation of MarkLogic Server, but they can fill the error logs with clutter. Determining that the errors are caused by an HTTP request to an ODBC port can help to identify the root cause, so that the issue can be resolved.


Meters data can be a good resource for getting an approximation of the number of requests being managed by the server at a given time. It's also important to understand how Meters data is generated, should there be a discrepancy between the Meters samples, and the entries in the access log.

Meters Request Data

The Meters data is designed to record a sampling of activity every few seconds; it is not designed to accurately record requests that arrive less often than every few seconds. Request rates are 15-second moving averages, recalculated every second and available in real time through the xdmp:host-status, xdmp:server-status and xdmp:forest-status built-in functions.

Meters Samples

The metering subsystem samples these real-time rates on the minute and saves the samples in the Meters database. Meters sampled data of events that occur less frequently than the moving average period will be lower than the number of access log entries. The difference between the two will depend on when the last event happened and when the sample was taken.

This means that if an event happens once a minute, the request rate rises when the event happens but then decays away within a few seconds. If the sample is taken after the rate has decayed, the saved Meters data will be lower than the actual number of requests.
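To make the arithmetic concrete, here is a worked example under an assumption not stated above: that the rate is a plain 15-second sliding-window average (the exact smoothing MarkLogic uses may differ, so the numbers are illustrative only). With one request per minute, the rate is non-zero for only 15 of every 60 seconds, so a once-a-minute sample taken at a uniformly distributed point in the minute misses the request three times out of four:

```latex
% Assumption: r(t) is a plain 15-second sliding-window average,
% where N(a, b] is the number of requests in the interval (a, b].
r(t) = \frac{N(t-15,\, t]}{15}\ \text{requests/s}

% One request at t = 0 of every minute (times in seconds):
r(t) = \begin{cases}
  1/15 & 0 \le t < 15 \\
  0    & 15 \le t < 60
\end{cases}

% A once-a-minute sample taken at a uniformly distributed offset s in [0, 60):
P\bigl(r(s) = 0\bigr) = \frac{60 - 15}{60} = \frac{3}{4}
```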


As a result of this sampling method, it is not unusual for Meters data to under-report the number of requests in certain circumstances.


In MarkLogic Server v7.0-2, the separate tokenizer keys for languages where MarkLogic provides only generic language support were removed, so that these languages now all use the same key. Greek, for example, falls into this class of languages. This change was made as part of an optimization for languages in which MarkLogic Server has advanced stemming and tokenization support.

Stemmed searches that include characters from languages that do not have advanced language support, performed on MarkLogic Server v7.0-2 or later releases, against content loaded on a version previous to v7.0-2, may not return the expected results.


In order to successfully run these stemmed searches, you can either:

  • Reindex the database; or
  • Reinsert the affected documents (i.e. the documents that contain characters in languages for which MarkLogic Server only has generic language support).

If these are not possible in your environment, you can always run the query unstemmed.
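For example, the test.xml check used in the example in this article can be run without stemming by passing the "unstemmed" option to cts:word-query. This is a sketch that assumes the database has the word searches index enabled, so that an unstemmed query can be resolved from the indexes:

```xquery
xquery version "1.0-ml";

(: Same check as the test.xml example, with stemming switched off. :)
(
  xdmp:estimate(cts:search(fn:doc("test.xml"), cts:word-query("ε", "unstemmed"))),
  cts:contains(fn:doc("test.xml"), cts:word-query("ε", "unstemmed"))
)
```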

An Example

The following example demonstrates the issue

  1. On MarkLogic Server version 7.0-1, insert a document (test.xml) that contains the Greek character 'ε'.
  2. Run this query 
    xdmp:estimate( cts:search( doc('test.xml'), 'ε')),
    cts:contains( doc('test.xml'), 'ε')
  3. The query will return the correct results: 1, true
  4. Upgrade MarkLogic Server to version 7.0-3 or later and run the query again
  5. The query will return incorrect results: 0, false 
  6. Reindex the database and re-run the query
  7. The query will return the correct result once again.


As the Configuration Manager has been deprecated starting with MarkLogic 9.0-5, a common question is how to migrate the configuration of a database or an application server from an older MarkLogic instance to a newer one, or between any two MarkLogic Server versions after 9.0-4.

This article outlines the steps on how to migrate the resource configuration information from one server to other using Gradle and ml-gradle plugin.


As a prerequisite, install and configure a compatible Gradle version (6.x) and the latest ml-gradle plugin (version 4.1.1 at the time of writing) on the client machine (a local machine, or the machine from which the Gradle project will run).


The entire process is divided into two major parts: exporting the resource configuration from the source cluster, and importing the resource configuration into the destination cluster.

1. Exporting resource configuration from the source cluster/host:

On the machine where gradle is installed and the plug-in is configured, create a project as suggested in

In the example steps below, the source project is /Migration.

1.1 Creating the new project with the source details:

While creating this new project, provide the source MarkLogic Server host, username, password, REST port and multiple-environment details at the command-line prompts. Once the project has been created, verify the source server details in the file.

macpro-user1:Migration user1$ gradle mlNewProject
Starting a Gradle Daemon (subsequent builds will be faster)
> Configure project :For Jackson Kotlin classes support please add "com.fasterxml.jackson.module:jackson-module-kotlin" to the classpath 
> Task :mlNewProject
Welcome to the new project wizard. Please answer the following questions to start a new project. Note that this will overwrite your current build.gradle and files, and backup copies of each will be made.

[ant:input] Application name: [myApp]
[ant:input] Host to deploy to: [SOURCEHOST]
[ant:input] MarkLogic admin username: [admin]
[ant:input] MarkLogic admin password: [admin]
[ant:input] REST API port (leave blank for no REST API server):
[ant:input] Test REST API port (intended for running automated tests; leave blank for no server):
[ant:input] Do you want support for multiple environments?  ([y], n)
[ant:input] Do you want resource files for a content database and set of users/roles created? ([y], n)
Making directory: ~/Migration/src/main/ml-config
Making directory: ~/Migration/src/main/ml-modules
Use '--warning-mode all' to show the individual deprecation warnings.


1 actionable task: 1 executed

Once the build succeeds, the src/main/ml-config and src/main/ml-modules directories are created under the project directory.

1.2 Exporting the configuration of required resources:

Once the new project is created, export the required resources from the source host/cluster. To do so, create a properties file (outside the project directory), as suggested in the documentation, listing all the resources that need to be exported to the destination cluster. In that properties file, specify the names of the resources (databases, forests, app servers, etc.) using the appropriate keys, with comma-delimited values.

For example, a sample properties file looks like below: 


Once the file is created, run the below: 

macpro-user1:Migration user1$ gradle -PpropertiesFile=~/ mlExportResources

> Task :mlExportResources
Exporting resources to: ~/Migration/src/main/ml-config

Exported files:
Export messages:
The 'forest' key was removed from each exported database so that databases can be deployed before forests.
The 'range' key was removed from each exported forest, as the forest cannot be deployed when its value is null.
The exported user files each have a default password in them, as the real password cannot be exported for security reasons.
Use '--warning-mode all' to show the individual deprecation warnings.


1 actionable task: 1 executed

Once the build succeeds, the exported resources and their configuration files are written under the project directory (in src/main/ml-config, as shown in the output above).

With this step finished, the export of the required resources from the source cluster is complete and ready to be imported into the new/destination cluster.

2. Importing Resources and the configuration on new/Destination host/Cluster:

To import the resource configuration into the destination host/cluster, create another new project and use the export created in step 1.2, "Exporting the configuration of required resources". Once the configuration files have been copied into the new project, make the necessary modifications to reflect the new cluster (such as hosts and other dependencies), and then deploy the configuration to the destination cluster.

2.1 Creating a new project for the import with the Destination Host/cluster details:

While creating this new project, provide the destination MarkLogic Server host, username, password, REST port and multiple-environment details at the command-line prompts. Once the project has been created, verify the destination server details in the file. In the example steps below, the destination project is /ml10pro.

macpro-user1:ml10pro user1$ gradle mlNewProject
> Task :mlNewProject
Welcome to the new project wizard. Please answer the following questions to start a new project.

Note that this will overwrite your current build.gradle and files, and backup copies of each will be made.
[ant:input] Application name: [myApp]
[ant:input] Host to deploy to: [destination host]
[ant:input] MarkLogic admin username: [admin]
[ant:input] MarkLogic admin password: [admin]
[ant:input] REST API port (leave blank for no REST API server):
[ant:input] Do you want support for multiple environments?  ([y], n)
[ant:input] Do you want resource files for a content database and set of users/roles created? ([y], n)

Making directory: /Users/rgunupur/Downloads/ml10pro/src/main/ml-config
Making directory: /Users/rgunupur/Downloads/ml10pro/src/main/ml-modules
Use '--warning-mode all' to show the individual deprecation warnings.


1 actionable task: 1 executed

Once the project is created, the src/main/ml-config and src/main/ml-modules directories are created under the project directory, as shown in the output above.


2.2 Copying the required configuration files from Source project to destination project:

In this step, copy the configuration files that were created by exporting the resource configuration from the source server in step 1.2, "Exporting the configuration of required resources".

For example, 

macpro-user1:ml10pro user1$ cp -R ~/Migration/src/main/ml-config/. ~/ml10pro/src/main/ml-config/

After copying, the exported configuration files are located under ~/ml10pro/src/main/ml-config.


After copying the configuration files from source to destination, review every configuration file and make the necessary changes. For example, the host details must be updated to the destination server's host details. Similarly, make any other changes that your requirements call for.

For example, in the file ~/ml10pro/src/main/ml-config/forests/<database>/<forestname>.xml you will see the entry:

"host" : "Sourceserver_IP_Address",
Change the host details to reflect the destination host. After the change, it should look like:
"host" : "Destination_IP_Address",
Similarly, for each forest, define the host of the specific node on which that forest should reside.
For example, if forest 1 has to be on node 1, define forest1.xml with:
"host" : "node1_host",
Any other configuration parameters that need to change must likewise be updated in the corresponding resource file under the destination ml-config directory.
Best Practice:
Because this involves modifying the configuration files, back them up and keep them under version control (for example Git or SVN) so that the modifications can be tracked and reverted.
If the same configuration needs to be deployed to multiple environments (such as PROD, QA and TEST), all that is needed is to have the files created for each environment where this configuration needs to be deployed. As explained in step 2.1, "Creating a new project for the import with the Destination Host/cluster details", the property values for the different environments need to be provided while creating the project so that the files for the different environments are created.

2.3 Importing the configuration (Running mlDeploy):

In this step, import the configuration that was exported from the source and copied into this project. After making sure that all the configuration files have been copied from the source and modified with the correct host details and any other required changes, run the following:

macpro-user1:ml10pro user1$ gradle mlDeploy
> Task :mlDeleteModuleTimestampsFile

Module timestamps file /Users/rgunupur/Downloads/ml10pro/build/ml-javaclient-util/ does not exist, so not deleting
Use '--warning-mode all' to show the individual deprecation warnings.


3 actionable tasks: 3 executed

Once the build is successful, go to the admin console of the destination server and verify that all the required configurations have been imported from the source server.
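One way to do that check from Query Console, instead of the Admin UI, is the sketch below. It lists the databases, the forests (with the host each forest is assigned to) and the App Servers on the destination cluster so you can compare them with the exported files; it only reads the configuration and changes nothing:

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Read-only summary of the deployed configuration on this cluster. :)
let $config := admin:get-configuration()
return (
  for $db-id in xdmp:databases()
  return fn:concat("database: ", xdmp:database-name($db-id)),
  for $forest-id in xdmp:forests()
  return fn:concat("forest: ", xdmp:forest-name($forest-id), " on host ",
                   xdmp:host-name(admin:forest-get-host($config, $forest-id))),
  for $server-id in admin:get-appserver-ids($config)
  return fn:concat("app server: ", admin:appserver-get-name($config, $server-id))
)
```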


Further reading:

For more information, refer to our documentation and knowledge base articles:



This Knowledgebase article outlines the procedure to enable HTTPS on an AWS Elastic Load Balancer (ELB) using Route 53 or an external supplier as the DNS provider and with an AWS generated certificate.

The AWS Certificate Manager (ACM) automatically manages and renews the certificate and this certificate will be accepted by all current browsers without any security exceptions.

The downside is that you do need control over your Hosted DNS name entry - either through Route 53 or through another provider.


  1. MarkLogic AWS Cluster
  2. An AWS Route 53 hosted Domain or similar externally hosted Domain; the procedure described in this article assumes that Route 53 is being used, however where possible we have tried to detail the changes needed and these should also be applicable for another external DNS provider.


  1. Click on your hostname in Route 53 to edit it

  1. Create a new Alias Record Set to point to your Elastic Load Balancer.

  1. In the Record Set entry on the right hand side, enter an Alias name for your ELB host, select Alias and from the Alias Target select the ELB load balancer to use, then click the Create button to update the Route 53 entry.

  1. In can take a little while for AWS to propagate the DNS update throughout the network but once it is available it is worth checking that you are able to reach your MarkLogic cluster using the new address, e.g.

  1. Once the Route 53 entry is updated and available you will need to request a new certificates through ACM, if you have other certificates already in ACM you can select Request a certificate

Otherwise select Get Started with Provision Certificates and select Request a public certificate

  1. Enter your required Certificate domain name and click Next:

Note: This should match your DNS Alias name entry created in Step 3.

In addition you can also add additional records such as a "Wildcard" entry, this is particularly useful if you want to use the same certificate for multiple hostnames, e.g if you have Clusters identified by versions such as ml9.[yourdomain].com & ml10.[yourdomain].com

  1. Select DNS as the Validation Method and click "Review"

  1. Before confirming and proceeding check the Hostnames are correct as certificates with invalid hosts names will not be usable.

  1. To complete validation, AWS will require you to add random CNAME entries to the DNS record to confirm that you are the owner. If you are using Route 53 this is as simple as selecting each entry in turn, numbers will vary depending on the number of Doamin name entries you specified in step 6, and clicking "Create record in Route 53". Once all entries have been created click Continue

  1. If the update is successful a Success message is displayed

  1. If your DNS Hostname is provided by an external provider you will need to download the entries using the "Export DNS configuration to a file link" and provide this information to your DNS provider to make the necessary updates.

The file is a simple CSV file and specifies one or more CNAME entries that need to be created with the required name and values. Once the AWS DNS validation process picks up these changes have been made the certificate creation process will be completed automatically.

Domain Name,Record Name,Record Type,Record Value
  1. Once the Certificate has been validated by either of the methods in Steps 9 or 11 the certificate will be marked as Issued and be available for the Load Balancer to use.

  1. Configure the ELB for HTTPS And the new AWS generated Certificate
  2. Edit the ELB Listeners and change the Cipher

  1. (Optional) For production environments it is recommended to allow TLSv1.2 only

  1. Next, select the certificate and repeat Steps 15 and 16 for each listener that you want to secure.

  1. From the ACM available certificates select the newly generated certificate for this domain and click Save

  1. Save the Listeners updates and ensure the update was successful.

  1. You should now be able to access your MarkLogic cluster securely over HTTPS using the AWS generated certificate.


HAProxy is a free, fast and reliable solution offering high availability, load balancing and proxying for TCP and HTTP-based applications.

MarkLogic 8 (8.0-8 and above) and MarkLogic 9 (9.0-4 and above) include improvements to allow you to use HAProxy to connect to MarkLogic Server.

MarkLogic Server supports balancing application requests using both the HAProxy TCP and HTTP balancing modes depending on the transaction mode being used by the MarkLogic application as detailed below:

  1. For single-statement auto-commit transactions running on MarkLogic version 8.0.7 and earlier or MarkLogic version 9.0.3 and earlier, only TCP mode balancing is supported. This is because the SessionID cookie and transaction ID (txid) are only generated as part of a multi-statement transaction.
  2. For multi-statement transactions or for single-statement auto-commit transactions running on MarkLogic version 8.0.8 and later or MarkLogic version 9.0.4 and later both TCP and HTTP balancing modes can be configured.

Refer to the "Understanding Transactions in MarkLogic Server" and "Single vs. Multi-statement Transactions" sections of the MarkLogic documentation to determine whether your application is using single or multi-statement transactions.

Note: Attempting to use HAProxy in HTTP mode with Single-statement transactions prior to MarkLogic versions 8.0.8 or 9.0.4 can lead to unpredictable results.

Example configurations

The following example configurations detail only the parameters relevant to enabling load balancing of a MarkLogic application; for details of all parameters that can be used, please refer to the HAProxy documentation.

TCP mode balancing

The following configuration is an example of how to balance requests across a 3-node MarkLogic application using the "roundrobin" balance algorithm, with client persistence based on the source IP address (stick-table entries expire after 30 minutes). The health of each node is checked every second by a TCP probe to the application server port.

backend app
mode tcp
balance roundrobin
stick-table type ip size 200k expire 30m
stick on src
default-server inter 1s
server app1 ml-node-1:8012 check id 1
server app2 ml-node-2:8012 check id 2
server app3 ml-node-3:8012 check id 3
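To make the interaction between `balance roundrobin` and `stick on src` concrete, here is a simplified Python model (not HAProxy code): the first request from a source IP takes the next server in rotation, and the stick table then pins that IP to the same node. It deliberately ignores health checks and the 30-minute stick-table expiry; the class name `StickyRoundRobin` is hypothetical.

```python
from itertools import cycle


class StickyRoundRobin:
    """Simplified model of `balance roundrobin` combined with `stick on src`."""

    def __init__(self, servers):
        self._rotation = cycle(servers)  # roundrobin order: app1, app2, app3, app1, ...
        self._stick_table = {}           # source IP -> server chosen for it

    def route(self, src_ip):
        # Only a source IP not yet in the stick table consumes a roundrobin slot;
        # known IPs always return to the server they were first assigned.
        if src_ip not in self._stick_table:
            self._stick_table[src_ip] = next(self._rotation)
        return self._stick_table[src_ip]


if __name__ == "__main__":
    lb = StickyRoundRobin(["app1", "app2", "app3"])
    for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.4"]:
        print(ip, "->", lb.route(ip))
```

The repeated request from 10.0.0.1 returns to app1 rather than advancing the rotation, which is what keeps a client's requests on the same MarkLogic node.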

HTTP mode balancing

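The following configuration is an example of how to balance requests across a 3-node MarkLogic application using the "roundrobin" balance algorithm, with persistence based on the "SessionID" cookie set by MarkLogic Server (in prefix mode, HAProxy prepends each server's cookie value, e.g. app1, to the existing SessionID value).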

The health of each node is checked by issuing an HTTP GET request to the MarkLogic health check port (7997) and checking for the "Healthy" response.

backend app
mode http
balance roundrobin
cookie SessionID prefix nocache
option httpchk GET / HTTP/1.1\r\nHost:\ monitoring\r\nConnection:\ close
http-check expect string Healthy
server app1 ml-node-1:8012 check port 7997 cookie app1
server app2 ml-node-2:8012 check port 7997 cookie app2
server app3 ml-node-3:8012 check port 7997 cookie app3
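The `cookie SessionID prefix` line can be hard to picture. The sketch below is a simplified Python model, assuming HAProxy's prefix mode joins the server's cookie value and the original MarkLogic value with `~`: on the way out it rewrites the SessionID value, and on the way in it strips the prefix and uses it to pick the server. The function names and the sample SessionID value are hypothetical, and server health and redispatch are not modeled.

```python
SEP = "~"  # assumed separator between the server's cookie value and the original value


def on_response(session_id, server_cookie):
    """Response path: prefix the SessionID value set by MarkLogic with the server's cookie."""
    return f"{server_cookie}{SEP}{session_id}"


def on_request(cookie_value, server_cookies):
    """Request path: return (server to use, SessionID value forwarded to MarkLogic).

    A value without a known prefix means there is no persistence yet, so the
    server is None and the normal roundrobin choice applies.
    """
    prefix, sep, original = cookie_value.partition(SEP)
    if sep and prefix in server_cookies:
        return prefix, original  # MarkLogic sees its own, unmodified SessionID
    return None, cookie_value
```

For example, a SessionID of `1234567890` set by the node behind `app2` reaches the browser as `app2~1234567890`, and the next request is routed back to `app2` with the prefix removed.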


MarkLogic Server organizes Trusted Certificate Authorities (CA) by Organization Name.  Trusted Certificate Authorities are the issuers of digital certificates, which in turn are used to certify the public key on behalf of the named subject as given in the certificate.  These certificates are used in the authentication process by:

  1. A MarkLogic Application Server configured to use SSL (HTTPS).
  2. Any Web Client which is making a connection to a MarkLogic Application Server over HTTPS (in the case of SSL Client Authentication).

Example Scenarios

Consider the following example:

$ openssl x509 -in CA.pem -text -noout
        Version: 3 (0x2)
        Serial Number: 18345409437988140316 (0xfe97fcaf8a61b51c)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, ST=CA, L=San Carlos, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA
            Not Before: Nov 30 04:08:31 2015 GMT
            Not After : Nov 29 04:08:31 2020 GMT
        Subject: C=US, ST=CA, L=San Carlos, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA

In this example, the Subject field of the Trusted CA shows that the CA certificate will be listed under the organization name "MarkLogic Corporation" (O=MarkLogic Corporation) in MarkLogic's list of Certificate Authorities.

You can view the full list of currently configured Trusted Certificate Authorities by logging into the MarkLogic administration Application Server (on port 8001) and viewing the status page: Configure -> Security -> Certificate Authorities

Trusted CA Certificate without Organization name (O=)

In some cases, there are legitimate Trusted CA Certificates which do not contain any further information about the Organization responsible for the certificate.

The example below shows a sample self-signed root CA (DemoLab CA) which highlights this scenario:

$ openssl x509 -in DemoLabCA.pem -text -noout
        Version: 3 (0x2)
        Serial Number: 12836463831212471403 (0xb22447d80f91b46b)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: CN=DemoLab CA
            Not Before: Nov 30 05:23:13 2015 GMT
            Not After : Nov 29 05:23:13 2020 GMT
        Subject: CN=DemoLab CA

If this certificate were loaded into MarkLogic, no name would appear in the list of Certificate Authorities provided by the administration Application Server at Configure -> Security -> Certificate Authorities.

In the case of the above example, it would be difficult to use a certificate signed by DemoLab CA (and to use DemoLab CA as our Trusted Certificate Authority), as MarkLogic only lists certificates that are associated with an Organization.
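As an illustration of why DemoLab CA has no entry, this simplified Python sketch pulls the `O=` attribute out of a Subject line as printed by `openssl x509 -text` and groups certificates by it; a subject with only `CN=` ends up under `None`. It assumes the comma-separated text format shown above (it does not handle escaped commas inside values) and is not MarkLogic's own implementation.

```python
from collections import defaultdict


def organization(subject):
    """Return the O= value from a Subject line such as
    'C=US, ST=CA, O=MarkLogic Corporation, CN=MarkLogic CA', or None if absent."""
    for rdn in subject.split(","):
        key, _, value = rdn.strip().partition("=")
        if key == "O":  # "OU" is a different attribute and is not matched
            return value
    return None


def group_by_organization(subjects):
    """Group Subject lines by organization name (None = no O= attribute)."""
    groups = defaultdict(list)
    for subject in subjects:
        groups[organization(subject)].append(subject)
    return dict(groups)
```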


To work around this issue, we can configure MarkLogic to use the certificate by scripting it in Query Console.

1) Loading the CA using Query Console

Start by using a call to pki:insert-trusted-certificates to load the Trusted CA into MarkLogic. The sample Query Console code below demonstrates this process (please ensure this query is executed against the Security database).

Make a note of the id value returned by MarkLogic. It is an unsigned long (xs:unsignedLong), which can be used later to retrieve that certificate.

2) Attach Trusted CA with "SSL Client Certificate Authorities" using Query Console

The next step is to associate the certificate that we just inserted from our filesystem (DemoLabCA.pem) with a given MarkLogic Application Server. Once this is done, any client connecting to that application server over SSL will be presented with the certificate, and DemoLab CA will be matched using its Common Name value (Common Name eq "DemoLab CA").

3) Verify the attached Trusted CA for Client Certificate Authorities

Executing the above code should return the same identifier (for the Trusted CA) as was returned by the code executed in step 1. Additionally, we can see that our Application Server (DemoAppServer) is now configured to expect SSL client certificates signed by DemoLab CA.

Further Reading


MarkLogic Server is engineered to scale out horizontally by easily adding forests and nodes. Be aware, however, that when adding resources horizontally, you may also be introducing additional demand on the underlying resources.


On a single node, you will see some performance improvement from adding forests, due to increased parallelization. There is a point of diminishing returns, though, beyond which the number of forests can overwhelm the available resources such as CPU, RAM, or I/O bandwidth. Internal MarkLogic research (as of April 2014) shows the sweet spot to be around six forests per host (assuming modern hardware). Note that there is a hard limit of 1024 primary forests per database, and the general recommendation is that the total number of forests should not grow beyond 1024 per cluster.
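The figures above can be turned into a quick back-of-the-envelope check. This Python sketch assumes all primary forests belong to a single database and that replica forests count toward the 1024-per-cluster guideline; `plan_forests` is a hypothetical helper, and the six-forests-per-host default is only the April 2014 guideline quoted above.

```python
PRIMARY_FORESTS_PER_DB_LIMIT = 1024   # hard limit per database
TOTAL_FORESTS_PER_CLUSTER_GUIDE = 1024  # general recommendation per cluster


def plan_forests(hosts, primaries_per_host=6, replicas_per_primary=0):
    """Rough forest-count check for a cluster layout (hypothetical helper)."""
    primaries = hosts * primaries_per_host
    total = primaries * (1 + replicas_per_primary)  # assumption: replicas count toward the total
    return {
        "primary_forests": primaries,
        "total_forests": total,
        "within_primary_limit": primaries <= PRIMARY_FORESTS_PER_DB_LIMIT,
        "within_cluster_guideline": total <= TOTAL_FORESTS_PER_CLUSTER_GUIDE,
    }
```

For instance, 100 hosts with six primaries each stay under the per-database limit (600), but adding one replica per primary brings the total to 1200, beyond the cluster guideline.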

At the cluster level, you should see performance improvements from adding hosts, but attention should be paid to any potentially shared resources. For example, since resources such as CPU, RAM, and I/O bandwidth would now be split across multiple nodes, overall performance is likely to decrease if additional nodes are provisioned virtually on a single underlying server. Similarly, when adding additional nodes to the same underlying SAN storage, you'll want to pay careful attention to making sure there's enough I/O bandwidth to accommodate the number of nodes you want to connect.

More generally, adding capacity above a bottleneck tends to exacerbate performance issues. If you find that performance has actually decreased after horizontally scaling out some part of your stack, it is likely that a part of your infrastructure below the layer you changed is being overwhelmed by the additional demand introduced by the added capacity.


MarkLogic application servers keep a connection open after completing and responding to a request, waiting for a new request, until the Keep-Alive timeout expires. However, there is an exception: the connection is closed regardless of the timeout settings when the content length is unknown before the reply is started. This article provides further insight into connection closing with respect to payload size.

HTTP Header


In general, application servers communicating in HTTP send the Content-Length header as part of their response HTTP Headers to indicate how many bytes of data the client application should expect to receive. For example,

HTTP/1.1 200 OK
Content-type: application/sparql-results+json; charset=UTF-8
Server: MarkLogic
Content-Length: 1264
Connection: Keep-Alive
Keep-Alive: timeout=5

This requires application servers to know the length of the entire response before the very first bytes (the response HTTP headers) are put onto the wire. For small amounts of data, calculating the content length is fast; for large amounts of content, the calculation may be time-consuming, in the extreme case making the server appear unresponsive to the client while it calculates the entire response length. Additionally, the server may need to bring the entire content into memory, putting a further burden on server resources.

A related situation occurs when the response is compressed.  The final compressed length isn't known up front.


To allow servers to begin transmitting dynamically generated content before knowing its total size, HTTP/1.1 supports chunked transfer encoding. This technique is widely used in music and video streaming and in other industries. Chunked encoding removes the need to know the entire content length before sending a portion of the data, which makes the server appear more responsive.
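To show what chunked encoding looks like on the wire, here is a minimal Python sketch of HTTP/1.1 chunked transfer coding: each chunk is preceded by its size in hexadecimal, and a zero-size chunk marks the end, so the sender never needs the total length. It omits chunk extensions and trailers, and the function names are illustrative.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings using HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would signal the end of the body prematurely
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk, no trailers
    return bytes(out)


def decode_chunked(body):
    """Reassemble a chunked body (no chunk extensions or trailers handled)."""
    data, pos = bytearray(), 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end], 16)  # chunk size is hexadecimal
        pos = line_end + 2
        if size == 0:
            return bytes(data)
        data += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
```

Each chunk can be written as soon as it is produced; the receiver only learns the body is complete when it reads the `0` chunk.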

MarkLogic Server 11 adds both compression and chunking capabilities; see HTTP Compression and Chunking.

Connection Close

In MarkLogic Server v7 and v8, MarkLogic Server closes the connection after transmitting content greater than 1 MB, which allows it to avoid calculating the content length in advance. The client will therefore not see a Content-Length header in the HTTP response for larger (>1 MB) content; instead, it will receive a "Connection: close" header. After sending the entire content, MarkLogic Server terminates the connection to indicate to the client that the end of the content has been reached.

Closing the existing connection for content larger than 1 MB is an exception to the Keep-Alive configuration. This may result in unexpected behavior in clients that rely on MarkLogic Server respecting the Keep-Alive configuration, so this behavior should be accounted for when designing a client application's connection pool.

Client applications may have to send a TCP SYN again to establish a new connection for the subsequent request, which adds the overhead of a TCP three-way handshake before the next request can be sent. However, for larger payloads (>1 MB), where the data transfer already involves many more round trips, the overhead of the three-way handshake is negligible.

Similarly, if the client accepts gzip compression, the length is unknown and the connection will be closed after the reply.  Turning on chunking will create a reply where the entire length of the response does not need to be known ahead of time, and so keep-alive can be maintained.
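From the client's side, the rules above boil down to how the end of a response body is found. This simplified Python sketch (ignoring HEAD requests and 1xx/204/304 responses) classifies a response by its headers and reports whether the connection can be reused for the next request; the function names are hypothetical.

```python
def _lower(headers):
    # Header names and these token values are case-insensitive
    return {k.lower(): v.lower() for k, v in headers.items()}


def body_framing(headers):
    """How an HTTP/1.1 client finds the end of a response body."""
    h = _lower(headers)
    if "chunked" in h.get("transfer-encoding", ""):
        return "chunked"           # read chunks until the zero-size chunk
    if "content-length" in h:
        return "content-length"    # read exactly N bytes
    return "read-until-close"      # the server closing the connection marks the end


def can_reuse_connection(headers):
    """True if the connection can stay open for the next request."""
    tokens = {t.strip() for t in _lower(headers).get("connection", "").split(",")}
    return body_framing(headers) != "read-until-close" and "close" not in tokens
```

Applied to the example response shown earlier (Content-Length: 1264, Connection: Keep-Alive), the connection can be reused; a large v7/v8 response carrying only "Connection: close" cannot.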

Further Reading


CSV files are a very common data exchange format, often used as an export format for spreadsheets, databases and other applications. Depending on the application, you may be able to change the delimiter character to a hash (#), an asterisk (*), etc. One common default delimiter is the tab character. Content Pump supports reading and loading such delimited files.


The Content Pump -delimiter option defines which delimiter is used to split the columns. Passing a tab as the value of the delimiter option on the command line isn't straightforward.

Loading tab-delimited data files with Content Pump can result in an error message like the following:

mlcp>bin/ IMPORT -host localhost -port 9000 -username admin -password secret -input_file_path sample.csv -input_file_type delimited_text -delimiter '    ' -mode local
13/08/21 15:10:20 ERROR contentpump.ContentPump: Error parsing command arguments: 
13/08/21 15:10:20 ERROR contentpump.ContentPump: Missing argument for option: delimiter
usage: IMPORT [-aggregate_record_element <QName>]

Depending on the command line shell, the tab needs to be escaped so that the shell passes it through correctly:

On bash shell, this should work: -delimiter $'\t'
On Bourne shell, this should work: -delimiter 'Ctrl+V followed by tab' 
Alternatively, you can use: -delimiter \x09
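If you launch mlcp from a program rather than an interactive shell, you can sidestep shell escaping entirely by passing the arguments as a list, so the tab reaches mlcp unchanged. This Python sketch assumes the mlcp launcher is `bin/mlcp.sh` and reuses the connection values from the example above; `build_mlcp_import_args` is a hypothetical helper.

```python
def build_mlcp_import_args(mlcp_script, host, port, username, password, input_path):
    """Build an argv list for a tab-delimited mlcp import (hypothetical helper)."""
    return [
        mlcp_script, "IMPORT",
        "-host", host,
        "-port", str(port),
        "-username", username,
        "-password", password,
        "-input_file_path", input_path,
        "-input_file_type", "delimited_text",
        "-delimiter", "\t",  # a literal tab: no shell parses this list, so no escaping
        "-mode", "local",
    ]


if __name__ == "__main__":
    args = build_mlcp_import_args(
        "bin/mlcp.sh", "localhost", 9000, "admin", "secret", "sample.csv")
    print(args)  # the delimiter is displayed as '\t' in the list repr
    # To launch mlcp for real (no shell is involved by default):
    #     import subprocess; subprocess.run(args, check=True)
```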

If none of these work, another approach is to use the -options_file /path/to/options-file parameter. The options file can contain all of the same parameters as the command line. The benefit of using an options file is that the command line is simpler and characters are interpreted as intended. The first line of the options file is always the action, such as IMPORT or EXPORT, followed by pairs of lines: the first line of each pair is the option name and the second is the value for that option.

A sample could look like the following:

' '

Make sure the file is saved in UTF-8 format to avoid any parsing problems. To define a tab as delimiter, place a real tab between single quotes (i.e. '<tab>')
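Generating the options file programmatically guarantees that a real tab character is written between the single quotes. This Python sketch follows the layout described above (action first, then option name and value on alternating lines) and writes UTF-8; the helper name and the option values are illustrative.

```python
from pathlib import Path


def write_options_file(path, action, options):
    """Write an mlcp options file: the action on the first line, then one line
    for each option name followed by one line for its value."""
    lines = [action]
    for name, value in options.items():
        lines += [name, value]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, `write_options_file("sample.options", "IMPORT", {"-input_file_type": "delimited_text", "-delimiter": "'\t'"})` writes the tab, wrapped in single quotes, as the delimiter value.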

To use this option file with mlcp execute the following command:

Linux, Mac, Solaris:

mlcp>bin/ -options_file /path/to/sample.options

Windows:

mlcp>bin/mlcp.bat -options_file /path/to/sample.options

The options file can take any parameter that mlcp understands. It is important that the action command is defined on the first line. It is also possible to use both command line parameters and an options file; command line parameters take precedence over those defined in the options file.


Occasionally, the owner of the MarkLogic data directory can be changed. This can create problems where MarkLogic Server is unable to read and/or write its own files, but it is easily corrected.

MarkLogic Server user


The default location for the data directory on Linux is /var/opt/MarkLogic and the default owner is daemon.

If you are using a nondefault (non-daemon) user to run MarkLogic, for example mlogic, you would usually have the following setting:

    export MARKLOGIC_USER=mlogic



Correct the data directory ownership

If the file ownership is incorrect, the way forward is to change the ownership back to the correct user.  For example, if using the default user daemon:

1.  Stop MarkLogic Server.

2.  Make sure that the user you are using is correct and available on this machine.

3.  Change the ownership of all the MarkLogic files (by default /var/opt/MarkLogic and any/all forests for this node) to daemon.  The change needs to be made recursively below the directory to include all files.  Assuming all nodes in the cluster run as daemon, you can use another unaffected node as a check.  You may need to use root/sudo permissions to change owner.  For example:

chown -R daemon:daemon /var/opt/MarkLogic

4.  Start MarkLogic Server.  It should now come up as the correct user and be able to manage its files.
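To find out which files need fixing, before or after step 3, a sketch like the following lists every path under a directory whose owner differs from the expected user. It is Unix-only (it uses the `pwd` module), does not follow symbolic links, and `files_not_owned_by` is a hypothetical helper; run it with enough permissions to read the whole tree.

```python
import os
import pwd


def files_not_owned_by(root, expected_user="daemon"):
    """Return paths under root (including root itself) not owned by expected_user."""
    expected_uid = pwd.getpwnam(expected_user).pw_uid
    wrong = []
    if os.lstat(root).st_uid != expected_uid:
        wrong.append(root)
    for dirpath, dirnames, filenames in os.walk(root):  # does not follow symlinked dirs
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != expected_uid:   # lstat: inspect links themselves
                wrong.append(path)
    return wrong
```

For example, `files_not_owned_by("/var/opt/MarkLogic")` should return an empty list once ownership has been corrected for a daemon installation.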



MarkLogic Server allows you to set up an alerting application to notify users when new content is available that matches a predefined query. This can be achieved through the Alerting API with the Content Processing Framework (CPF). CPF is designed to keep state for documents, so it is easy to use CPF to keep track of when a document in a particular scope is created or updated, and then perform some action on that document. However, although alerting works for document inserts and updates, it does not occur for document deletes. You will have to create a custom CPF pipeline to catch the delete through an appropriate status transition.


To achieve alerting for document deletes, you will have to write your own custom pipeline with a status transition to handle deletes. For example:

   <annotation>custom delete action</annotation>

The higher 'priority' value and 'always' = true indicate that the custom pipeline takes precedence over the default Status Change Handling pipeline for document deletes. Similarly, in the action module, you can write your own custom code for alerting.

Note: By default, when a document is deleted, the on-delete pre-commit trigger is fired and it calls the action in the Status Change Handling pipeline (if enabled) for the ‘delete’ status transition. It is recommended that you do not modify this pipeline, as doing so can cause compatibility problems in future upgrades and releases of MarkLogic Server.


Packer from HashiCorp is a provisioning tool, allowing for the automated creation of machine images, extending the ability to manage infrastructure to machine images. Packer supports a number of different image types including AWS, Azure, Docker, VirtualBox and VMWare.

Packer can be used to create a customized MarkLogic Amazon Machine Image (AMI) which can then be deployed to AWS and used in a Cluster. We recommend using the official MarkLogic AMIs whenever possible, and making the necessary customizations to the official images. This ensures that MarkLogic Support is able to quickly diagnose any issues that may occur, as well as reducing the risk of running MarkLogic in a way that is not fully supported.

The KB article, Customizing MarkLogic with Packer and Terraform, covers the process of customizing the official MarkLogic AMI using Packer.

Setting Up Packer

For the purpose of this example, I will assume that you have already installed the AWS CLI, with the correct credentials, and you have installed Packer.

Packer Templates

A Packer template is a JSON configuration file that is used to define the image that we want to build. Templates have a number of keys available for defining the machine image, but the most commonly used ones are builders, provisioners and post-processors.

  • builders are responsible for creating the images for various platforms.
  • provisioners is the section used to install and configure software running on machines before turning them into images.
  • post-processors are actions applied to the images after they are created.

Creating a Template

For our example, we are going to build from the official Amazon Linux 2 AMI, install the required prerequisite packages, install MarkLogic, and apply some customizations before creating a new image.

Defining Variables

Variables help make the build more flexible, so we will utilize a separate variables file, marklogic_vars.json, to define parts of our build.

{
  "vpc_region": "us-east-1",
  "vpc_id": "vpc-06d3506111cea30d0",
  "vpc_public_sn_id": "subnet-03343e69ae5bed127",
  "vpc_public_sg_id": "sg-07693eb077acb8635",
  "instance_type": "t3.large",
  "ssh_username": "ec2-user",
  "ami_filter": "amzn2-ami-hvm-2.*-ebs",
  "ami_owner": "amazon",
  "binary_source": "./",
  "binary_dest": "/tmp/",
  "marklogic_binary": "MarkLogic-10.0-4.2.x86_64.rpm"
}

Here we've identified the instance details so our image can be launched, as well as the filter values, ami_filter and ami_owner, that will help us retrieve the correct base image for our AMI. We are also identifying the name of the MarkLogic binary, along with some path details on where to find it locally, and where to place it on the remote host.

Creating Our Template

Now that we have some of the specific build details defined, we can create our template, marklogic_ami.json. In this case we are going to use the builders and provisioners keys in our template.

{
    "builders": [
        {
            "type": "amazon-ebs",
            "region": "{{user `vpc_region`}}",
            "vpc_id": "{{user `vpc_id`}}",
            "subnet_id": "{{user `vpc_public_sn_id`}}",
            "associate_public_ip_address": true,
            "security_group_id": "{{user `vpc_public_sg_id`}}",
            "source_ami_filter": {
                "filters": {
                    "virtualization-type": "hvm",
                    "name": "{{user `ami_filter`}}",
                    "root-device-type": "ebs"
                },
                "owners": ["{{user `ami_owner`}}"],
                "most_recent": true
            },
            "instance_type": "{{user `instance_type`}}",
            "ssh_username": "{{user `ssh_username`}}",
            "ami_name": "ml-{{isotime \"2006-01-02-1504\"}}",
            "tags": {
                "Name": "ml-packer"
            }
        }
    ],
    "provisioners": [
        {
            "type": "shell",
            "script": "./"
        },
        {
            "destination": "{{user `binary_dest`}}",
            "source": "{{user `binary_source`}}{{user `marklogic_binary`}}",
            "type": "file"
        },
        {
            "type": "shell",
            "inline": [ "sudo yum -y install /tmp/{{user `marklogic_binary`}}" ]
        }
    ]
}

In the builders section we have defined the network and security group configurations and the source AMI details. We have also defined the naming convention for our new AMI (ml-YYYY-MM-DD-HHMM) with ami_name and added a Name tag, ml-packer. Both of these will make it easier to find our AMI when it comes time to deploy it.
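The `isotime` format string uses Go's reference-time layout, where `2006-01-02-1504` stands for year-month-day-hourminute. The Python sketch below produces the same AMI name for a given moment, assuming Packer evaluates `isotime` in UTC; `ami_name` here is just an illustrative helper.

```python
from datetime import datetime, timezone


def ami_name(ts):
    """Python equivalent of ml-{{isotime "2006-01-02-1504"}} (assumes a UTC timestamp)."""
    # Go layout -> strftime: 2006=%Y, 01=%m, 02=%d, 15=%H (24-hour), 04=%M
    return ts.astimezone(timezone.utc).strftime("ml-%Y-%m-%d-%H%M")
```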


In our example, we are using the shell provisioner to execute a script against the machine, the file provisioner to copy the MarkLogic binary file to the machine, and the shell provisioner to install the MarkLogic binary, all of which will be run prior to creating the image. There are also provisioners available for Ansible, Salt, Puppet, Chef, and PowerShell, among others.

Provisioning Script

For our custom image, we've determined that we need to install Git, create a symbolic link that MarkLogic needs on Amazon Linux 2, and set up /etc/marklogic.conf to disable the MarkLogic Managed Cluster feature, all of which we will do inside a script. We've named the script, and it is stored in the same directory as our Packer template.

#!/bin/bash -x
echo "**** Starting ****"
echo "**** Creating LSB symbolic link ****"
sudo ln -s /etc/system-lsb /etc/redhat-lsb
echo "**** Installing Git ****"
sudo yum install -y git
echo "**** Setting Up /etc/marklogic.conf ****"
echo "export MARKLOGIC_MANAGED_NODE=0" >> /tmp/marklogic.conf
sudo cp /tmp/marklogic.conf /etc/
echo "**** Finishing ****"

Executing Our Build

Now that we've completed setting up our build, it's time to use Packer to create the image.

packer build -debug -var-file=marklogic_vars.json marklogic_ami.json

Here you can see that we are telling Packer to do a build using marklogic_ami.json and referencing our variables file with the -var-file flag. We've also added the -debug flag which will disable parallelism and enable debug mode. In debug mode, Packer will stop after each step and prompt you to hit Enter to go to the next step.

The last part of the build output will print out the details of our new image:

Wrapping Up

We have now created a customized MarkLogic AMI using Packer, which can be used to deploy a self managed cluster.


If you're looking at the MarkLogic Admin UI on port 8001, you may have noticed that the status page for a given database displays the last backup dateTime for a given database.

We have been asked in the past how this gets computed so the same check can be performed using your own code.

This Knowledgebase article will show examples that utilise XQuery to get this information and will explore the possibility of retrieving this using the MarkLogic ReST API

XQuery: How does the code work?

The simple answer is in the forest status for each of the forests in the database (note these values only appear if you have created a backup already).  For the sake of these examples, let's say we have a database (called "test") which contains 12 forests (test-1 to test-12).  We can get the backup status for these using a call to our ReST API:


In the results returned, you should see something like this:

last-backup : 2016-02-12T12:30:39.916Z datetime
last-incr-backup : 2016-02-12T12:37:29.085Z datetime

In generating that status page, what the MarkLogic code does is to create an aggregate: a database doesn't contain documents in MarkLogic; it contains forests and those forests contain documents.

Continuing the example above (with a database called "test" containing 12 forests) if I run the following:

This will return the forest status(es) for all forests in the database "test" and return the forest names using XPath, so in this case, we would see:

<forest-name xmlns="">test-1</forest-name>
<forest-name xmlns="">test-12</forest-name>

Our admin UI is interrogating each forest in turn for that database and finding out the metrics for the last backup.  So to put that into context, if we ran the following:

This gives us:

<last-backup xmlns="">2016-02-12T12:30:39.946Z</last-backup>
<last-backup xmlns="">2016-02-12T12:30:39.925Z</last-backup>

The code (or the status report) doesn't need values for all 12 forests; it just needs the time at which the last forest completed its backup (because that is when the backup as a whole completed), so our code calls fn:max:

Which gives us the max value (as these are all xs:dateTimes, it's finding the most recent date), which in the case of this example is:


The same is true for the last incremental backup (note that all we're changing here is the XPath to get to the correct element):

So we can get the max value for this by getting the most recent time across all forests:

This would give us 2016-02-12T12:37:29.161Z
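The same aggregation can be sketched outside MarkLogic, for example after fetching forest status through the REST API. This Python sketch assumes each forest's status has already been reduced to a dict with optional `last-backup` and `last-incr-backup` strings (forests that have never been backed up lack the keys) and takes the most recent value of each, just as fn:max does over xs:dateTime values; `database_backup_status` is a hypothetical helper.

```python
from datetime import datetime


def _parse(ts):
    # xs:dateTime strings such as 2016-02-12T12:30:39.946Z; "Z" is rewritten so
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def database_backup_status(forest_statuses):
    """Per field, return the most recent timestamp across all forests (or None)."""
    result = {}
    for field in ("last-backup", "last-incr-backup"):
        values = [f[field] for f in forest_statuses if field in f]
        result[field] = max(values, key=_parse) if values else None
    return result
```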

Using the ReST API

The ReST API also allows you to get this information but you'd need to jump through a few hoops to get to it; the ReST API status for a given database would give you the names of all the forests attached to that database:


And from there you could GET the information for all of those forests:


Once you'd got all those values, you could do what MarkLogic's admin code does and get the max values for them - although at this stage, it might make more sense to write a custom endpoint that returns this information, something like:

Where you could make a call to that module to get the aggregates (e.g.):


This would return the database status for any given parameter-name that is passed in.



When searching for matches using OR'ed word queries where the matches overlap (i.e. the text of one query is contained in another query), the results of cts:highlight are not as desired.


For example:

let $p := <p>From the memoirs of an accomplished artist</p>
let $query :=
  cts:or-query((
    cts:word-query("accomplished artist"),
    cts:word-query("memoirs of an accomplished artist")))
return cts:highlight($p, $query, <m>{$cts:text}</m>)


 The desired outcome of this would be:

               <p>From the <m>memoirs of an accomplished artist</m> </p>

 Whereas, the actual results are:

                <p>From the <m>memoirs of an </m> <m>accomplished artist</m></p>


This behavior is by design and the results are expected: cts:highlight breaks up overlapping areas into separate matches.

The cts:highlight built-in variables – $cts:queries and $cts:action help in understanding how this works, as well as to work-around this problem.

  $cts:queries --> returns the matching queries for each of the matched texts.

  $cts:action --> can be used with xdmp:set to specify what should happen next

  • "continue" - (default) Walk the next match. If there are no more matches, return all evaluation results.
  • "skip" - Skip walking any more matches and return all evaluation results
  • "break" - Stop walking matches and return all evaluation results

   For example, replacing the return statement in the original query with the following:

 cts:highlight($p, $query, <m>{$cts:text, $cts:queries}</m>)

returns:

<p>From the
  <m>memoirs of an
    <cts:word-query xmlns:cts="">
      <cts:text xml:lang="en">memoirs of an accomplished artist</cts:text>
    </cts:word-query>
  </m>
  <m>accomplished artist
    <cts:word-query xmlns:cts="">
      <cts:text xml:lang="en">memoirs of an accomplished artist</cts:text>
    </cts:word-query>
    <cts:word-query xmlns:cts="">
      <cts:text xml:lang="en">accomplished artist</cts:text>
    </cts:word-query>
  </m>
</p>

These results give us a better understanding of how the text is being matched. We can see that "accomplished artist" is matched by both word queries, 'accomplished artist' and 'memoirs of an accomplished artist', which is why the cts:highlight results differ from what we expected.

To work around this problem, we can insert a small piece of code:

let $p := <p>From the memoirs of an accomplished artist</p>
let $query :=
  cts:or-query((
    cts:word-query("accomplished artist"),
    cts:word-query("memoirs of an accomplished artist")))
return cts:highlight($p, $query,
  (: text matched by more than one query is an overlap: return nothing for it :)
  if (count($cts:queries) gt 1) then xdmp:set($cts:action, "continue")
  (: otherwise wrap the full text of the single matching query :)
  else (
    let $matched-text := <x>{$cts:queries}</x>/cts:word-query/cts:text/data(.)
    return <m>{$matched-text}</m>
  )
)

returns:

<p>From the <m>memoirs of an accomplished artist</m></p>



Please note that this solution relies on assumptions about what's inside the or-query, but this example could be modified to handle other overlapping situations.






      Packer from HashiCorp is an open source provisioning tool, allowing for the automated creation of machine images, extending the ability to manage infrastructure to machine images. Packer supports a number of different image types including AWS, Azure, Docker, VirtualBox and VMWare.

      These powerful tools, Packer and Terraform, can be used together to deploy a MarkLogic cluster to AWS with the MarkLogic CloudFormation Template, using a customized Amazon Machine Image (AMI). The MarkLogic CloudFormation Template is the method MarkLogic recommends for building out MarkLogic clusters within AWS. By default, the MarkLogic CloudFormation Template uses the official MarkLogic AMIs.

      While this guide will cover some portions of Terraform, the primary focus will be on using Packer to customize an official MarkLogic AMI. For more detailed information on Terraform, we recommend reading Deploying MarkLogic to AWS with Terraform, which covers using Terraform in more depth and includes the example files referenced later in this article.

      Setting Up Packer

      For the purpose of this example, I will assume that you have already installed the AWS CLI, with the correct credentials, and you have installed Packer.

      Packer Templates

      A Packer template is a JSON configuration file that is used to define the image that we want to build. Templates have a number of keys available for defining the machine image, but the most commonly used ones are builders, provisioners and post-processors.

      • builders create the images for various platforms.
      • provisioners install and configure software on the machine before it is turned into an image.
      • post-processors are actions applied to the images after they are created.

      Creating a Template

      For our example, we are going to take the official MarkLogic AMI and apply some customizations before creating a new image.

      Defining Variables

      Variables help make the build more flexible, so we will use a separate variables file, vars.json, to define parts of our build.

      "vpc_region": "us-east-1",
      "vpc_id": "vpc-06d3506111cea30d0",
      "vpc_public_sn_id": "subnet-03343e69ae5bed127",
      "vpc_public_sg_id": "sg-07693eb077acb8635",
      "ami_filter": "release-MarkLogic-10*",
      "ami_owner": "679593333241",
      "instance_type": "t3.large",
      "ssh_username": "ec2-user"

      Creating Our Template

      Now that we have defined some of the specific build details, we can create our template, base_ami.json. In this case we are going to use the builders and provisioners keys in our build.

        "builders": [
            "type": "amazon-ebs",
            "region": "{{user `vpc_region`}}",
            "vpc_id": "{{user `vpc_id`}}",
            "subnet_id": "{{user `vpc_public_sn_id`}}",
            "associate_public_ip_address": true,
            "security_group_id": "{{user `vpc_public_sg_id`}}",
            "source_ami_filter": {
              "filters": {
              "virtualization-type": "hvm",
              "name": "{{user `ami_filter}}",
              "root-device-type": "ebs"
              "owners": ["{{user `ami_owner`}}"],
              "most_recent": true
            "instance_type": "{{user `instance_type`}}",
            "ssh_username": "{{user `ssh_username`}}",
            "ami_name": "ml-{{isotime \"2006-01-02-1504\"}}",
            "tags": {
              "Name": "ml-packer"
        "provisioners": [
            "type": "shell",
            "script": "./"
            "destination": "/tmp/",
            "source": "./marklogic.conf",
            "type": "file"
            "type": "shell",
            "inline": [ "sudo mv /tmp/marklogic.conf /etc/marklogic.conf" ]

      In the builders section we have defined the network and security group configurations and the source AMI details. We have also defined the naming convention (ml-YYYY-MM-DD-HHMM) for our new AMI with ami_name, and added a Name tag with the value ml-packer. Both of those will make it easier to find our AMI when it is time to use it with Terraform.
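Every build setting in the template is read through {{user `...`}}, so a variable missing from vars.json is easy to overlook. The following Python sketch is a hypothetical helper, not part of Packer. It lists the user variables that a legacy JSON template references but the var file does not define. The template string in the usage lines is a shortened, made-up example.

```python
import json
import re

# Matches the legacy Packer JSON user-variable syntax, e.g. {{user `vpc_region`}}
USER_VAR = re.compile(r"\{\{\s*user\s+`([^`]+)`\s*\}\}")


def missing_user_vars(template_text: str, vars_text: str) -> set:
    """Return names referenced as {{user `name`}} in the template but absent from the var file."""
    referenced = set(USER_VAR.findall(template_text))
    defined = set(json.loads(vars_text))  # iterating the flat JSON object yields its keys
    return referenced - defined


# Hypothetical, shortened template text and var file contents
template = '{"region": "{{user `vpc_region`}}", "owners": ["{{user `ami_owner`}}"]}'
print(missing_user_vars(template, '{"vpc_region": "us-east-1"}'))  # {'ami_owner'}
```

In practice you would pass in the contents of base_ami.json and vars.json. An empty result means every referenced variable has a value.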


      In our example, we are using the shell provisioner to execute a script against the machine, the file provisioner to copy the marklogic.conf file to the machine, and the shell provisioner to move the file to /etc/, all of which will be run prior to creating the image. There are also provisioners available for Ansible, Salt, Puppet, Chef, and PowerShell, among others.

      Provisioning Script

      For our custom image, we've determined that we need an additional piece of software installed, which we will do inside a script. We've named the script, and it is stored in the same directory as our packer template.

      #!/bin/bash
      # Stop the provisioning step (and the Packer build) if any command fails
      set -e
      echo "**** Starting ****"
      echo "Installing Git"
      sudo yum install -y git
      echo "**** Finishing ****"

      Executing Our Build

      Now that we've completed setting up our build, it's time to use packer to create the image.

      packer build -debug -var-file=vars.json base_ami.json

      Here you can see that we are telling packer to do a build using base_ami.json and referencing our variables file with the -var-file flag. We've also added the -debug flag which will disable parallelism and enable debug mode. In debug mode, packer will stop after each step and prompt you to hit Enter to go to the next step.

      The last part of the build output will print out the details of our new image:

      ==> Builds finished. The artifacts of successful builds are:
      --> amazon-ebs: AMIs were created:
      us-east-1: ami-0100....

      Terraform and the MarkLogic CloudFormation Template

      At this point we have our image and want to use it when deploying the MarkLogic CloudFormation Template. Unfortunately, there is no simple way to do this, because the MarkLogic CloudFormation Template does not provide an option to specify a custom AMI. Fortunately, Terraform has functions that we can use to modify the template.


      First, we want to add a couple of entries to our existing Terraform variables file.

      variable "ami_tag" {
        type    = string
        default = "ml-packer"
      }

      variable "search_string" {
        type    = string
        default = "ImageId: "
      }

      The first variable, ami_tag, is the tag we added to the AMI when it was built. The second variable, search_string, is described in the Updates to Terraform Root Module section below.

      Data Source

      To retrieve the AMI, we need to define a data source. In this case it will be an aws_ami data source. We are going to call the file

      data "aws_ami" "ml_ami" {
        # Only consider AMIs that are ready to launch
        filter {
          name   = "state"
          values = ["available"]
        }

        # Match the Name tag set by the Packer template
        filter {
          name   = "tag:Name"
          values = [var.ami_tag]
        }

        owners      = ["self"]
        most_recent = true
      }

      So we are filtering the available AMIs, only looking at ones that are owned by our own account (self), tagged with the value that we defined in our variables file, and then if more than one AMI is returned, using the most recent.
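The selection rule is easier to see spelled out, so the following Python sketch applies the same filters to a list of image records. The records and account ID are hypothetical, and Tags is simplified to a plain dict rather than the Key/Value list that EC2 returns. The sketch only mirrors what the data source is configured to do and is not how Terraform implements it.

```python
from typing import Dict, List, Optional


def pick_ami(images: List[Dict], account_id: str, tag_value: str) -> Optional[Dict]:
    """Apply the aws_ami filters above and return the newest matching image."""
    candidates = [
        img for img in images
        if img["State"] == "available"                    # filter: state = available
        and img["OwnerId"] == account_id                  # owners = ["self"]
        and img.get("Tags", {}).get("Name") == tag_value  # filter: tag:Name = var.ami_tag
    ]
    if not candidates:
        return None  # the real data source reports an error when nothing matches
    # most_recent = true: CreationDate values share one ISO 8601 format, so they sort chronologically
    return max(candidates, key=lambda img: img["CreationDate"])


images = [  # hypothetical image records
    {"ImageId": "ami-old", "State": "available", "OwnerId": "111122223333",
     "CreationDate": "2021-01-05T10:00:00.000Z", "Tags": {"Name": "ml-packer"}},
    {"ImageId": "ami-new", "State": "available", "OwnerId": "111122223333",
     "CreationDate": "2021-06-01T09:30:00.000Z", "Tags": {"Name": "ml-packer"}},
    {"ImageId": "ami-pending", "State": "pending", "OwnerId": "111122223333",
     "CreationDate": "2021-07-01T08:00:00.000Z", "Tags": {"Name": "ml-packer"}},
    {"ImageId": "ami-other", "State": "available", "OwnerId": "111122223333",
     "CreationDate": "2021-08-01T08:00:00.000Z", "Tags": {"Name": "something-else"}},
]
print(pick_ami(images, "111122223333", "ml-packer")["ImageId"])  # ami-new
```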

      Updates to Terraform Root Module

      Now we are ready to make a couple of updates to our Terraform root module file to integrate the new AMI into our deployment. In our last example, we used the MarkLogic CloudFormation template from its S3 bucket. For this deployment, we are going to use a local copy of the template, mlcluster-template.yaml.

      Replace the template_url line with the following line:

      template_body = replace(file("./mlcluster-template.yaml"), "/${var.search_string}.*/","${var.search_string} ${}")

      When we updated our Terraform variables file, we created the variable search_string. In the MarkLogic CloudFormation Template, the value for the Image ID is determined by the region and by whether you are running the Essential Enterprise or Bring Your Own License version of MarkLogic Server. Here we use the replace function with a regular expression to rewrite that line so that it references the AMI we just created with Packer, which the aws_ami data source has already retrieved.
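The following Python sketch performs the same substitution so you can see what the replace call does to the template text. The YAML excerpt and the AMI ID are made up for illustration and are not taken from the real mlcluster-template.yaml. The regex works line by line because "." does not match a newline in either Python or Terraform's regex engine. Each match therefore stops at the end of its ImageId line.

```python
import re

SEARCH_STRING = "ImageId: "       # same default as the Terraform search_string variable
AMI_ID = "ami-0123456789abcdef0"  # placeholder for the ID of the AMI built with Packer

# Made-up CloudFormation excerpt; the real template's ImageId expression differs
TEMPLATE = """\
    Properties:
      ImageId: !FindInMap [AMIMap, !Ref 'AWS::Region', Enterprise]
      InstanceType: !Ref InstanceType
"""


def pin_image_id(template: str, ami_id: str) -> str:
    """Rewrite everything after 'ImageId: ' on each matching line, as the replace() call does."""
    pattern = re.escape(SEARCH_STRING) + ".*"  # '.' stops at the newline, so only the rest of the line is replaced
    # Like the Terraform replacement: the search string, one extra space, then the AMI ID
    return re.sub(pattern, lambda _match: f"{SEARCH_STRING} {ami_id}", template)


print(pin_image_id(TEMPLATE, AMI_ID))
```

The extra space after "ImageId:" in the output is harmless in YAML, because leading whitespace in a plain scalar value is ignored.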

      Deploying with Terraform

      Now we are ready to run Terraform to deploy our cluster. First, we want to double-check that the template looks correct before we attempt to create the CloudFormation stack. The output of terraform plan shows the CloudFormation template that will be deployed; check it to make sure that the value for ImageId shows our desired AMI.

      Once we have confirmed that our new AMI is being referenced, we can run terraform apply to create a new stack using the template. To validate the result, open a command line on one of the new hosts and check that Git is installed and that /etc/marklogic.conf exists.

      Wrapping Up

      At this point, we have now customized the official MarkLogic AMI to create our own AMI using Packer. We have then used Terraform to update the MarkLogic CloudFormation Template and to deploy a CloudFormation stack based on the updated template.


      Long URI prefix may lead to imbalance in data distribution among the forests. 


      The database assignment policy is set to 'Bucket', the rebalancer is enabled, and no fragments are pending rebalancing. However, data is unevenly distributed across the forests attached to the database: a few forests have a much higher number of fragments than the others.

      Root cause

      With the bucket assignment policy, the document URI is hashed to select a specific bucket. The bucket policy algorithm maps a document’s URI to one of 16K “buckets,” with each bucket associated with a forest. A table mapping buckets to forests is stored in memory for fast assignment.

      The bucket algorithm does not consider the whole URI when computing the hash that determines the bucket. Instead, URI-based bucket determination relies largely on the initial characters of the URI.

      If document URIs share a long common prefix, they will hash to the same value and land in the same bucket even when their suffixes differ. The larger the common prefix, the more skewed the distribution becomes.
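To see why a shared prefix collapses documents into one bucket, consider the following Python sketch. It is a toy model only. MarkLogic's actual bucket hash is not described in this article. The sketch assumes a hash over just the first 32 characters of the URI and a simplified bucket-to-forest mapping (bucket % forest count). Both of these are hypothetical choices made for illustration.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 16 * 1024  # the article's 16K buckets
PREFIX_CHARS = 32        # hypothetical: the toy hash only looks at this many leading characters


def toy_bucket(uri: str) -> int:
    """Hypothetical bucket function: hash only the first PREFIX_CHARS characters of the URI."""
    digest = hashlib.sha256(uri[:PREFIX_CHARS].encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


def forest_counts(uris, forest_count: int) -> Counter:
    """Count documents per forest using a simplified bucket -> forest table (bucket % forest_count)."""
    return Counter(toy_bucket(u) % forest_count for u in uris)


# Every URI shares the same first 32 characters, so every document lands in one bucket
long_prefix = [
    f"/Prime/InternationalTradeDay/Activity/AccountId/ABC0001/ID/ABC0001-XYZ-{n}.json"
    for n in range(1000)
]
# Short, distinct URIs: the whole URI fits inside the hashed prefix
short = [f"/ABC0001-XYZ-{n}.json" for n in range(1000)]

print(forest_counts(long_prefix, 4))  # a single forest holds all 1000 documents
print(forest_counts(short, 4))        # documents are spread across all four forests
```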


      To confirm whether fragments are unevenly distributed across the forests in a database, you can run the query below. It returns up to 100 sample document URIs from each forest, so you can check whether the forests with a higher number of fragments contain URIs with a common prefix.

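      xquery version "1.0-ml";

      (: For each forest in the database, return up to 100 sample URIs.
         cts:uris requires the URI lexicon to be enabled on the database. :)
      for $forest in xdmp:database-forests(xdmp:database('<dbname>'))
      let $uris := for $u in cts:uris((), (), (), (), $forest)[1 to 100]
                   return <uri>{$u}</uri>
      return
        <forests>
          <forest>{xdmp:forest-name($forest)}</forest>
          <uris>{$uris}</uris>
        </forests>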


      We recommend that document URIs avoid long names with a common prefix. Commonly repeated URI components can often be moved into collections instead.

      Example URI: /Prime/InternationalTradeDay/Activity/AccountId/ABC0001/BusinessDate/2021-06-14/CurrencyCode/USD/ID/ABC0001-XYZ-123.json

      Could become: /ABC0001-XYZ-123.json, in the collections "USD" and "Prime", with the date "2021-06-14" stored in a date element of the document.

      This is only an example. The suggestion is to choose a URI naming pattern that avoids a large common prefix, or to capture shared values in collections instead.

      You can use the xdmp:document-assign built-in to verify whether URIs are distributed as expected by the bucket algorithm.


      What is Data Hub?

      The MarkLogic Data Hub is an open-source software interface that works to:

      1. ingest data from multiple sources
      2. harmonize that data
      3. master that data
      4. then search and analyze that data

      It runs on MarkLogic Server, and together, they provide a unified platform for mission-critical use cases.


      How do I install Data Hub?

      Please see the Install Data Hub documentation.

      What software is required for Data Hub installation?


      What is MarkLogic Data Hub Central?

      Hub Central is the Data Hub graphical user interface.


      What are the ways to ingest data in Data Hub?

      • Hub Central (note that Quick Start has been deprecated since Data Hub 5.5)
      • Data Hub Gradle Plugin
      • Data Hub Client JAR
      • Data Hub Java APIs
      • Data Hub REST APIs
      • MarkLogic Content Pump (MLCP)


      What is the recommended batch size for matching steps?

      • The best batch size for a matching step can vary depending on the average number of matches expected
      • Larger average number of matches should use smaller batch sizes
      • A batch size of 100 is the recommended starting point


      What is the recommended batch size for merging steps?

      The merge batch size should always be 1.


      How do I kill a long running flow in Data Hub?

      At the moment, Data Hub does not provide a feature to stop or kill a long-running flow.

      If you encounter this issue, please provide support with the following information to help us investigate further:

      • Error logs and exception traces from the time the job was started
      • The job document for the step in question
        • You can find that document under the "data-hub-JOBS" db using the job ID
          • Open the query console
          • Select data-hub-JOBS db from the dropdown
          • Hit explore
          • Enter the job ID in the search field and press Enter:
            • E.g.: *21d54818-28b2-4e56-bcfe-1b206dd3a10a*
          • You'll see the document in the results

      Note: If you want to force it, you can cycle the Java program and stop the requests from the corresponding app server status page on the Admin UI.


      What do we do if we are receiving SVC-EXTIME error consistently while running the merging step?

      “SVC-EXTIME” generally occurs when a query or other operation exceeds its processing time limit. There are various reasons behind this error. For example,

      • Lack of physical resources
      • Infrastructure level slowness
      • Network issues
      • Server overload 
      • Document locking issues

      Additionally, you need to review the step where you match documents to see how many URIs you are trying to merge in one go. 

      • Reduce the batch size to a value that balances overall throughput against the time each request takes, so that individual requests stay within the time limit that triggers SVC-EXTIME
      • Modify your matching step to work with fewer matches per each run rather than a huge number of matches
      • Turning ON the SM-MATCH and SM-MERGE traces would give a good indication of what it is getting stuck on. Do note, however, to turn them OFF once the issue has been detected/resolved.


      What are the best practices for performing Data Hub upgrades?

      • Note that Data Hub versions depend on MarkLogic Server versions - if your Data Hub version requires a different MarkLogic Server version, you MUST upgrade your MarkLogic Server installation before upgrading your Data Hub version
      • Take a backup
      • Perform extensive testing with all use-cases on lower environments
      • Refer to release notes (some Data Hub upgrades require reindexing), upgrade documentation, version compatibility with MarkLogic Server


      How can I encrypt my password in Gradle files used for Data Hub?

      You may need to store the password in encrypted Gradle properties and reference the property in the configuration file. 



      How can I create a Golden Record using Data Hub?

      A golden record is a single, well-defined version of all the data entities in an organizational ecosystem.

      • In Data Hub Central, once you have gone through the process of ingest, map and master, the documents in the sm-<EntityType>-mastered collection are considered golden records


      What authentication method does Data Hub support?

      Data Hub primarily supports basic and digest authentication. The configuration for username/password authentication is provided when deploying your application.

      How do I know which MarkLogic Server version is compatible with my Data Hub version?

      Refer to Version Compatibility matrix.

      Can we deploy multiple DHF projects on the same cluster?

      This operation is NOT supported.

      Can we perform offline/disconnected Data Hub upgrades?

      This is NOT supported, but you can refer to this example to see one potential approach

      TDE Generation in Data Hub

      For production purposes, you should configure your own TDEs instead of depending solely on TDEs generated by Data Hub (which may not be optimized for performance or scale).

      Where does gradle download all the dependencies we need to install DHF from?

      Below is the list of sites that Gradle will use in order to resolve dependencies:

      This tool is helpful to figure out what the dependencies are:

      • It provides a shareable and centralized record of a build that provides insights into what happened and why
      • You can create build scans using this tool and even publish those results at to see where Gradle is trying to download each dependency from under the "Build Dependencies" section on the results page.


      In the Scalability, Availability & Failover Guide, the node communication section describes a quorum as >50% of the nodes in a cluster.

      Is it possible for a database to be available for reads and writes, even if a quorum of nodes is not available in the cluster?

      The answer is yes, there are configurations and sequences of events that can lead to forests remaining online when there are fewer than 50% of the hosts being online.


      If a single forest in a database is not available, the database is not accessible. Conversely, as long as all of a database's forests are available in the cluster, the database will be available for reads and writes regardless of any quorum issues.

      Of course, the Security database must also be available in the cluster for the cluster to function.

      Forest Availability: Simple Case

      In the simplest case, if you have a forest that is not configured with either local disk failover or shared disk failover and as long as the forest's host is online and exists in the cluster, the forest will be available regardless of any quorum issues.

      To explain this case in more detail, consider a 3-node MarkLogic cluster with hosts host-a, host-b and host-c. host-a is initialized first as the primary host: it is the first host set up in the cluster and the host containing the master Security database. host-b and host-c are then joined to host-a to complete the cluster.

      Shortly after that, if we shut down both joiner hosts (host-b and host-c) so that only host-a remained online, we would see a chain of messages in the primary host's ErrorLog indicating that there was no longer quorum within the cluster:

      2020-05-21 01:19:14.632 Info: Detected quorum (3 online, 1 suspect, 0 offline)
      2020-05-21 01:19:18.570 Warning: Detected suspect quorum (3 online, 2 suspect, 0 offline)
      2020-05-21 01:19:29.715 Info: Disconnecting from domestic host because it has not responded for 30 seconds.
      2020-05-21 01:19:29.715 Info: Disconnected from domestic host
      2020-05-21 01:19:29.715 Info: Detected suspect quorum (2 online, 1 suspect, 1 offline)
      2020-05-21 01:19:33.668 Info: Disconnecting from domestic host because it has not responded for 30 seconds.
      2020-05-21 01:19:33.668 Info: Disconnected from domestic host
      2020-05-21 01:19:33.668 Warning: Detected no quorum (1 online, 0 suspect, 2 offline)
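The counts in these messages follow the >50% rule. The following Python sketch is our own reading of them, inferred from these four lines rather than taken from any published MarkLogic algorithm. It assumes that suspect hosts are included in the online count, which is what makes each line total three hosts. Under that reading, the cluster has quorum when more than half of the hosts are online and not suspect. It has "suspect quorum" when a majority is only reached by counting suspect hosts.

```python
import re
from typing import Tuple

COUNTS = re.compile(r"\((\d+) online, (\d+) suspect, (\d+) offline\)")


def parse_counts(line: str) -> Tuple[int, int, int]:
    """Extract (online, suspect, offline) from a 'Detected ... quorum' log line."""
    match = COUNTS.search(line)
    if match is None:
        raise ValueError(f"no host counts in: {line!r}")
    online, suspect, offline = (int(g) for g in match.groups())
    return online, suspect, offline


def classify(online: int, suspect: int, offline: int) -> str:
    """Hypothetical interpretation of the counts, inferred from the messages above."""
    total = online + offline       # assumption: suspect hosts are counted within "online"
    responsive = online - suspect  # online hosts that are not suspect
    if responsive > total / 2:
        return "quorum"
    if online > total / 2:
        return "suspect quorum"
    return "no quorum"


log = [
    "Info: Detected quorum (3 online, 1 suspect, 0 offline)",
    "Warning: Detected suspect quorum (3 online, 2 suspect, 0 offline)",
    "Info: Detected suspect quorum (2 online, 1 suspect, 1 offline)",
    "Warning: Detected no quorum (1 online, 0 suspect, 2 offline)",
]
for line in log:
    print(classify(*parse_counts(line)), "<-", line)
```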

      Under these circumstances, we would be able to access the host's admin GUI on port 8001 and it would respond without issue.  We would be able to access Query Console on that host on port 8000 and would be able to inspect the primary host's databases.  We would also be able to access the Monitoring History on port 8002 - all directly from the primary host.

      In this scenario, because the primary host remains online while the joining hosts are offline, and because we have not yet set up failover anywhere, there is no requirement for quorum, so host-a remains accessible.

      If host-a also happened to have a database with forests that only resided on that host, these would be available for queries at this time.  However, this is a fairly limited use case because in general, if you have a 3-node cluster, you would have a database whose forests reside on all three hosts in the cluster with failover forests configured on alternating hosts. 

      As soon as you do this, losing a single host matters. Without failover configured, the database becomes unavailable because a crucial forest is offline. With failover forests configured, you would still be able to access the database on the remaining two hosts.

      However, if you then shut down another host, you would lose quorum (which is a requirement for failover).

      Forest Availability: Local Disk Failover

      For forests configured for local disk failover, the sequence of events is important:

      In response to a host failure that makes an "open" forest inaccessible, the forest will failover to the configured forest replica as long as a quorum exists and the configured replica forest was in the "sync replicating" state. In this case, the configured replica forest will transition to the "open" state; the configured replica forest becomes the acting master forest and is available to the database for both reads and writes.

      Additionally, an "open" forest will not go offline in response to another host being evicted from the cluster.

      However, once cluster quorum is lost, forest failovers will no longer occur.


      Depending on how your forests are distributed in the cluster and on the order of host failures, it is possible for a database to remain online even when there is no longer a quorum of hosts in the cluster.

      Of course, databases with many forests spread across many hosts typically can't stay online if you lose quorum because some forest(s) will become unavailable.
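The following Python simulation shows how the order of host failures determines whether a database survives. It is a toy model of the rules in this section, not MarkLogic's implementation, and it rests on four assumptions:

- Quorum means more than 50% of hosts are online.
- A forest fails over only if quorum still exists after the failure and its configured replica's host is online. Replicas are assumed to be in the "sync replicating" state.
- Forests on surviving hosts stay open whether or not quorum exists.
- A database is available only while all of its forests are.

Host names follow the host-a/host-b/host-c example above.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

HOSTS = {"host-a", "host-b", "host-c"}


@dataclass
class Forest:
    name: str
    master_host: str                   # host of the configured master
    replica_host: Optional[str]        # host of the configured replica; None means no failover
    acting_host: Optional[str] = None  # host currently serving the forest; None means offline

    def __post_init__(self) -> None:
        self.acting_host = self.master_host  # the configured master starts as acting master


def has_quorum(online: Set[str], all_hosts: Set[str]) -> bool:
    """Quorum: more than 50% of the cluster's hosts are online."""
    return len(online) > len(all_hosts) / 2


def fail_hosts(forests: List[Forest], all_hosts: Set[str], order: List[str]) -> Set[str]:
    """Take hosts offline one at a time, in the given order, applying the failover rules."""
    online = set(all_hosts)
    for host in order:
        online.discard(host)
        quorum = has_quorum(online, all_hosts)
        for forest in forests:
            if forest.acting_host != host:
                continue  # forests on surviving hosts stay open, with or without quorum
            if quorum and forest.replica_host in online:
                forest.acting_host = forest.replica_host  # replica becomes acting master
            else:
                forest.acting_host = None  # no quorum or no usable replica: forest goes offline
    return online


def database_available(forests: List[Forest]) -> bool:
    """A database is available only while every one of its forests is available."""
    return all(forest.acting_host is not None for forest in forests)


def run(placement: List[Tuple[str, str, Optional[str]]], order: List[str]) -> Tuple[bool, bool]:
    """Return (database available?, cluster has quorum?) after the failures in `order`."""
    forests = [Forest(name, master, replica) for name, master, replica in placement]
    online = fail_hosts(forests, HOSTS, order)
    return database_available(forests), has_quorum(online, HOSTS)


single = [("F1", "host-a", "host-b")]
spread = [("F1", "host-a", "host-b"), ("F2", "host-b", "host-c"), ("F3", "host-c", "host-a")]

print(run(single, ["host-a", "host-c"]))  # (True, False): failed over while quorum existed
print(run(single, ["host-c", "host-a"]))  # (False, False): host-a lost after quorum was gone
print(run(spread, ["host-a", "host-b"]))  # (False, False): forests spread over every host
```

The first two runs fail the same two hosts in a different order and end with different outcomes. The third run shows why a database whose forests are spread across all hosts cannot survive losing quorum.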


      Even though it is possible to have a functioning cluster with less than a quorum of hosts online, you should not architect your high availability solution to depend on it.


      This article discusses what happens when you backup or restore your database after a local disk failover event on one of the database forests.


      MarkLogic Server provides high availability in the event of a data node failure. Data node failures can include operating system crashes, MarkLogic Server restarts, power failures, or persistent system failures such as hardware failures. With forest-level failover enabled and configured, a machine that hosts a forest can go down and the MarkLogic Server cluster automatically recovers from the outage and continues to process queries without any immediate action by an administrator. In MarkLogic Server, if a forest becomes unavailable, the entire database to which the forest is attached becomes unavailable for further query operations. Without failover, such a failure requires manual intervention by an administrator, either to reconfigure the forest to another host or to remove the forest from the configuration. With failover, you can configure the forest to automatically switch to a replica forest on a different host. MarkLogic Server failover provides high availability and maintains data and transactional integrity in the event of a data node failure.

      The failover scenarios are well documented on our developer web site.

      Local Disk Failover

      You can configure a forest on another host to serve as a replica forest, which will take over when the primary master forest's host goes offline. Local-disk failover allows you to create one or more replica forests for each primary forest. Replica forests contain the exact same data as the primary forest and are kept consistent transactionally.

      It is helpful to use the following terms to refer to the forest configurations and states:

      • Configured Master is the forest which is originally configured as the primary forest.
      • Configured Replica is a forest on another host that is configured as a replica forest of the primary. 
      • Acting Master is the forest that is serving as the master forest, regardless of the configuration.
      • Acting Replica is the forest that is serving as the replica forest, regardless of the configuration.

      Database Backup when a forest is failed over

      If you attempt to take a database backup or perform a database restore while one of the database's forests has failed over to its replica (i.e. the Configured Replica is serving as Acting Master), you may get XDMP-FORESTNOTOPEN or XDMP-HOSTDOWN errors.

      When a database backup takes place, by default everything associated with the database is backed up. You can also choose to back up individual forests, in which case only the forests selected when configuring the backup are backed up.

      Replica forests are backed up only when the 'Include replica forests' option is enabled. If you have not configured the backup to include replica forests, a replica forest will not be backed up even if it is the acting master. If the Configured Master is also not available, then neither forest will be backed up. In this circumstance, you may see a message in the error logs similar to "Warning: Not backing up database test because first forest master is not available, and replica backups aren't enabled."

      Restore when a forest is failed over

      Restores will fail if executed while a forest is failed over (i.e. the Configured Replica is serving as Acting Master). In this circumstance, you may see a message in the error logs similar to "Operation failed with error message. Check server logs." or "XDMP-HOSTDOWN".

      How to detect if a forest is failed over

      In the Admin UI:

      1. Click the Forests icon in the left tree menu.
      2. Click the Summary tab.
      3. Check whether the configured replica is in the open state. If it is, the Configured Replica is serving as Acting Master.

      At the time of the failover event, you may see messages in the Error Log similar to:
      2013-10-03 12:49:53.873 Info: Disconnecting from domestic host in cluster 16599165797432706248 because it has not responded for 30 seconds.
      2013-10-03 12:49:53.873 Info: Disconnected from host
      2013-10-03 12:49:53.873 Info: Unmounted forest test_P
      2013-10-03 12:49:53.875 Info: Forest test_R assuming the role of master with new precise time 13808297938747190
      2013-10-03 12:49:53.875 Debug: Recovering undo on forest test_R
      2013-10-03 12:49:53.875 Debug: Recovered undo at endTimestamp 13807844927734200 minQueryTimestamp 0 on forest test_R

      Revert back from the failover state:

      When the configured master is the acting replica, this is considered the "failover state". To revert, you must restart either the acting master forest or the host on which the acting master forest is locally mounted. After the restart, the forest will automatically revert to the Configured Master if its host is online. To check the status of the forests, see the Forests Summary tab in the Admin Interface.


      For backup and restore to work correctly, clusters configured with local disk failover must have no forests in a failed over state. If a cluster is configured with local disk failover and some of its forests have failed over to their local disk replicas, the conditions causing the failover must be resolved and the cluster must be returned to its original forest configuration before backup and restore operations can resume.


      From the documentation:

      Queries on a Replica database must run at a timestamp that lags the current cluster commit timestamp due to replication lag. Each forest in a Replica database maintains a special timestamp, called a Non-blocking Timestamp, that indicates the most current time at which it has complete state to answer a query. As the Replica forest receives journal frames from its Master, it acknowledges receipt of each frame and advances its nonblocking timestamp to ensure that queries on the local Replica run at an appropriate timestamp. Replication lag is the difference between the current time on the Master and the time at which the oldest unacknowledged journal frame was queued to be sent to the Replica.
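The replication lag definition above reduces to a small calculation. In the following Python sketch, the JournalFrame type and its fields are hypothetical stand-ins for the master's bookkeeping, and times are plain seconds on the master's clock. Treating "no unacknowledged frames" as zero lag is also an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JournalFrame:
    queued_at: float    # master clock time (seconds) when the frame was queued for the replica
    acknowledged: bool  # True once the replica has acknowledged receipt


def replication_lag(master_now: float, frames: List[JournalFrame]) -> float:
    """Lag = master's current time minus queue time of the oldest unacknowledged frame."""
    pending = [f.queued_at for f in frames if not f.acknowledged]
    if not pending:
        return 0.0  # assumption: nothing outstanding means the replica is caught up
    return master_now - min(pending)


frames = [
    JournalFrame(queued_at=90.0, acknowledged=True),
    JournalFrame(queued_at=95.0, acknowledged=False),
    JournalFrame(queued_at=98.0, acknowledged=False),
]
print(replication_lag(100.0, frames))  # 5.0
```

The acknowledged frame at 90.0 does not count. The lag is measured from the oldest frame still waiting (95.0).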



      Consider the following customer scenario:

      • The storage the database resides on at one site fails.
      • This requires the customer to run for a period of time on a single site.
      • The storage / MarkLogic server are recovered at the site where the failure occurred.
      • The customer needs to re-establish replication between the two sites


      Q: Should we tune the lag limit to suit our application?

      A: We have found in our own performance testing that increasing the lag limit beyond the default is typically not helpful.

      When the master has a sustained rate of updates, a large lag limit causes it to run quickly ahead of the replica, then stall for an extended period of time until the replica catches up. This pattern repeats over and over and gives inconsistent performance on the master.

      A smaller lag limit causes the master to suspend updates more frequently but for shorter periods of time, resulting in more consistent perceived performance.

      Q: Is there any option to restore the replica database to a point in time from a backup of the master database & re-initiate replication from that point onwards?

      A: It's fine to restore a backup to the failed system when it comes back online and before configuring replication in the reverse direction.

      Q: Is there a limit to how old a backup of the replica database can be (e.g. can a replica be restored from months back in comparison to the master) and will it still sync back to the master without issue? And does this depend on what journal data is available?

      A: There is no limit to how old a backup can be; the system will calculate all the deltas and apply them.

      Q: Are there any documented API built-ins for any of these things?

      A: Indeed; all the replication information is available through a call to xdmp:forest-status()



      Q: Can you also advise if the replication lag limit mentioned in section 1.2.5 and the related possibility of transactions stalling on the master database applies during the bulk replication phase?

      A: As long as the replica's forests are in "open replica" state, the replica will respond to queries at any commit timestamp it is able to support irrespective of whether replication is lagged.

      MarkLogic 5 introduced an application server setting for multi-version concurrency control. By default it is set to contemporaneous, meaning queries run at the latest timestamp at which any transaction has committed, irrespective of whether other transactions are still in flight.

      Conversely, if nonblocking is chosen (i.e. if you create an application server to query a replica database and you set multi-version concurrency control to nonblocking), the server will choose the last timestamp where all pending transactions are known to have successfully committed.

      If you wish to evaluate a query against a replica database you can use xdmp:database-nonblocking-timestamp() to determine the most current query timestamp that will not block.


      Database Replication replicates fragments/documents from a source database to a target database. You may see different database sizes between Master and Replica databases, even when their active fragment counts are the same. This article provides an overview of the factors and reasons behind this observation.

      Database Replication:

      Database Replication operates at the forest level by copying journal frames from a forest in the Master database and replaying them on a corresponding forest in the foreign Replica database. As a result, a group of documents that resides in a single stand of the Master database does not necessarily end up in a single stand of the Replica database when the journal frames are replayed. In other words, the distribution of fragments across stands differs between the master and its replicas.

      Also note that Master and Replica forests can be distributed differently across hosts in each cluster. Even when they are distributed identically (Master DB forest name to Replica DB forest name), you could still see a different number of stands between them.

      Database Size, Deleted Fragment and Merge:

      Current database size depends on a number of factors, such as the number of documents, the indexes, and the deleted fragments in stands. The number of deleted fragments in any stand in turn depends on the merge policy, the background merge process, available processing cycles, Linux memory configuration, memory usage at any given time, and the application's usage pattern.


      The Master and Replica clusters are separate entities: although connected, they operate independently. The Replica database on the target cluster provides data consistency with the Master, but the way data is spread across stands, including the retention of deleted fragments, can differ between the Master and Replica clusters. Hence you may see different sizes for the Master and Replica databases, even when the active fragments are the same.



      If your MarkLogic Server has its logging level set to "Debug", it is common to see a chain of 'Detecting' and 'Detected' messages like these in your ErrorLogs:

      2015-01-27 11:11:04.407 Debug: Detected indexes for database Documents: ss, fp, fcs, fds, few, fep, sln
      2015-01-27 11:11:04.407 Debug: Detecting compatibility for database Documents
      2015-01-27 11:11:04.407 Debug: Detected compatibility for database Documents

      These messages appear immediately after forests are unmounted and subsequently remounted by MarkLogic Server. Detecting indexes is a relatively lightweight operation and usually has minimal impact on performance.

      What would cause the forests to be unmounted and remounted?

      • Forest failovers
      • Heavy network activity leading to a cluster (XDQP) "Heartbeat" timeout
      • Changes made to forest configuration or indexes
      • Any incident that may cause a "Hung" message

      Apart from forest state changes (unmount/mount), these messages can also appear due to other events requiring index detection.

      What are "Hung" messages?

      A "Hung" message very often indicates a loss of connection to the IO subsystem (especially when forests are mounted on network attached storage rather than local disk). Hung messages are explained in a little more detail in this Knowledgebase article:

      What do the "Detected" messages mean and what can I do about them?

      Whenever you see a group of "Detecting" messages:

      2015-01-14 13:06:26.016 Debug: Detecting indexes for database XYZ

      This indicates an event where MarkLogic Server chose (or was required) to unmount and remount forests; the event may also be evident in your ErrorLogs.

      The "Detecting indexes" message appears soon after a remount. It indicates that MarkLogic Server is examining forest data to check whether any reindexing work is required, for every database available to the node that has forests attached. When the check finishes, a "Detected indexes" line is logged:

      2015-01-14 13:06:26.687 Debug: Detected indexes for database XYZ: ss, wp, fp, fcs, fds, ewp, evp, few, fep

      This line indicates that the scan has completed and lists the indexes the database has been configured with. For the line above, these are:

      • ss: stemmed searches
      • wp: word positions
      • fp: fast phrase searches
      • fcs: fast case sensitive searches
      • fds: fast diacritic sensitive searches
      • ewp: element word positions
      • evp: element value positions
      • few: fast element word searches
      • fep: fast element phrase searches

      From this list, we can determine which indexes were detected. These messages occur after every remount if index detection is set to automatic in the database configuration.
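      If you want to scan many of these lines, the abbreviations can be decoded mechanically. The Python sketch below assumes the log line format shown above and only knows the abbreviations this article explains; any other code, such as "sln" in the earlier example, is reported as unknown rather than guessed.

```python
import re
from typing import List, Optional, Tuple

# Abbreviations as explained in this article for the example log line.
# Codes the article does not explain are reported as "unknown".
INDEX_NAMES = {
    "ss": "stemmed searches",
    "wp": "word positions",
    "fp": "fast phrase searches",
    "fcs": "fast case sensitive searches",
    "fds": "fast diacritic sensitive searches",
    "ewp": "element word positions",
    "evp": "element value positions",
    "few": "fast element word searches",
    "fep": "fast element phrase searches",
}

DETECTED_RE = re.compile(r"Detected indexes for database (?P<db>[^:]+): (?P<codes>.+)$")


def decode_detected_indexes(line: str) -> Optional[Tuple[str, List[Tuple[str, str]]]]:
    """Return (database, [(code, meaning), ...]) for a 'Detected indexes' line,
    or None if the line is some other message."""
    match = DETECTED_RE.search(line.strip())
    if match is None:
        return None
    codes = [c.strip() for c in match.group("codes").split(",") if c.strip()]
    return match.group("db"), [(c, INDEX_NAMES.get(c, "unknown")) for c in codes]


db, indexes = decode_detected_indexes(
    "2015-01-14 13:06:26.687 Debug: Detected indexes for database XYZ: "
    "ss, wp, fp, fcs, fds, ewp, evp, few, fep"
)
for code, meaning in indexes:
    print(f"{db}: {code} -> {meaning}")
```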

      Every time the forest is remounted, in addition to a recovery process (where the journals are scanned to ensure that all logged transactions were safely committed to on-disk stands), the server performs a number of other checks. These are controlled by three database-level options:

      • format compatibility
      • index detection
      • expunge locks

      In MarkLogic 7, all three settings default to "automatic", so if logging is set to "Debug" level, you will see these options being worked through on remount:

      2015-01-14 13:06:26.016 Debug: Detecting indexes for database XYZ (represents the task for "automatic" index detection where the reindexer checks for configuration changes)
      2015-01-14 13:06:26.687 Debug: Detecting compatibility for database XYZ (represents the task for "automatic" format compatibility where the on-disk stand format is detected)

      These default values may change across releases of MarkLogic Server. In MarkLogic 8, expunge locks is set to none, but the other two are still set to automatic.

      Can these values be changed safely and what happens if I change these?

      Unmount/remount times can be made much shorter by changing these settings from automatic, but there are some caveats. Setting format compatibility to 5.0 (the on-disk stand format is still 5.0 even as of MarkLogic 8) should cause the "Detecting compatibility" messages to disappear and speed up remount times. However, the on-disk stand format is likely to change in a future release of the product, so this setting would need to be revisited before upgrading.

      The same is true for disabling index detection, but note that changes to the database's index settings will then no longer cause the reindexer to perform any checks on remount; you would need to re-enable index detection for changes to the index settings to be reindexed.
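      Before changing these settings, it may help to measure how long each detection step actually takes on your system. The Python sketch below pairs each "Detecting ..." Debug line with the next matching "Detected ..." line for the same database and reports the gap in milliseconds. It assumes the timestamp and message layout shown in the examples above, and that the two lines for a given database and check type are not interleaved with another run of the same check.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) Debug: "
    r"(?P<phase>Detecting|Detected) (?P<kind>indexes|compatibility) "
    r"for database (?P<db>[^:]+?)(?::.*)?$"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def detection_durations(lines: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Milliseconds between each 'Detecting <kind> for database <db>' line
    and the matching 'Detected <kind> for database <db>' line after it."""
    started: Dict[Tuple[str, str], datetime] = {}
    durations: Dict[Tuple[str, str], float] = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a detection message
        when = datetime.strptime(match.group("ts"), TS_FORMAT)
        key = (match.group("db"), match.group("kind"))
        if match.group("phase") == "Detecting":
            started[key] = when
        elif key in started:
            durations[key] = (when - started.pop(key)) / timedelta(milliseconds=1)
    return durations
```

      Feeding it an ErrorLog captured around a remount gives a per-database view of where remount time goes, which is a more concrete basis for deciding whether moving away from automatic is worth the caveats above.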

      Related Reading

      How to handle XDQP-TIMEOUT on a busy cluster


      This article will provide steps to debug applications using the Alerting API that are not triggering an alert.


      1) Check that all required components (config, actions, rules) are present in the database where alerting is set up. Run the attached script 'getalertconfigs.xqy' through Query Console and review the output.

      2) As documented in our Search Developer's Guide, test the alert manually with alert:invoke-matching-actions():


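      import module namespace alert = "http://marklogic.com/xdmp/alert" at "/MarkLogic/alert.xqy";
      alert:invoke-matching-actions("my-alert-config-uri", (: replace with your alert config URI :)
            <doc>hello world</doc>, <options/>)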

      3) Run the rule's query against the database to check that the expected documents are returned.

      Take the query text from the rule and run it through Query Console using a cts:search() on the database.  This will confirm whether the expected documents are a positive match.  If the documents are returned and no alert is triggered, then further debugging will be needed on the configuration or the query may need to be modified.


      Division operations involving integer or long datatypes may generate XDMP-DECOVRFLW in MarkLogic 7. This is the expected behavior but it may not be obvious upon initial inspection.  

      For example, two similar queries with different input values, executed in Query Console on a Linux/Mac machine running MarkLogic 7, give the following results:

      1. This query returns correct results

      let $estimate := xs:unsignedLong("220")
      let $total := xs:unsignedLong("1600")
      return $estimate div $total * 100

      ==> 13.75

      2. This query returns the XDMP-DECOVRFLW error

      let $estimate := xs:unsignedLong("227")
      let $total := xs:unsignedLong("1661")
      return $estimate div $total * 100

      ==> ERROR : XDMP-DECOVRFLW: (err:FOAR0002)


      The following defines relevant behaviors in MarkLogic 7 and previous releases.

      • In MarkLogic 7, if all the operands involved in a div operation are xs:integer, xs:long or other integer sub-types, the result of the div operation is stored as xs:decimal.
      • In versions prior to MarkLogic 7, if an xs:decimal value was large and occupied all digits, it was implicitly cast to xs:double for further operations; beginning with MarkLogic 7, this implicit casting no longer occurs.
      • xs:decimal can accommodate 18 digits as a datatype.
      • In MarkLogic 7 on Linux and Mac, an xs:decimal can occupy all digits depending on the actual value (227 div 1661 = 0.1366646598434677905), so all 18 digits of the xs:decimal are occupied.
      • MarkLogic 7 on Windows does not perform division with full decimal precision (227 div 1661 produces 0.136664659843468); as a result, not all 18 digits of the xs:decimal are occupied.
      • MarkLogic 7 generates an overflow exception (FOAR0002) when an operation is performed on an xs:decimal that is already at full decimal precision.

      In the example above, multiplying the result by 100 raises the error on Linux/Mac, while it works on Windows.


      We recommend using xs:double for all division-related operations, explicitly casting the value to the larger data type.

      For example, both of these return results:

      xs:double($estimate) div $total * 100

      $estimate div $total * xs:double(100)
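      The behavior above can be reproduced outside MarkLogic with a small Python model. This is a sketch built on an assumption that the article does not state: that a MarkLogic 7 xs:decimal keeps a fixed scale and stores its digits in a signed 64-bit integer. The division uses 19 significant digits so the quotient matches the value shown above. Under that assumption, 220 div 1600 leaves plenty of headroom for the multiplication by 100, while the full-precision 227 div 1661 result does not; plain floating-point (double) arithmetic completes in both cases.

```python
from decimal import Decimal, localcontext

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def divide(a: int, b: int, digits: int = 19) -> Decimal:
    """Decimal division rounded to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(a) / Decimal(b)


def fixed_scale_multiply(value: Decimal, factor: int) -> Decimal:
    """Toy model (assumption): keep value's scale and store its digits in a
    signed 64-bit integer; raise OverflowError when the product won't fit."""
    sign, digits, exponent = value.as_tuple()
    mantissa = int("".join(map(str, digits)))
    if sign:
        mantissa = -mantissa
    product = mantissa * factor
    if not INT64_MIN <= product <= INT64_MAX:
        raise OverflowError("decimal overflow (toy stand-in for FOAR0002)")
    return Decimal(product).scaleb(exponent)


print(fixed_scale_multiply(divide(220, 1600), 100))  # 13.7500
try:
    fixed_scale_multiply(divide(227, 1661), 100)
except OverflowError as err:
    print(err)
print(227 / 1661 * 100)  # double arithmetic, no overflow
```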






      The Admin UI provides the database options 'maintain last modified' and 'maintain directory last modified', which, when turned on, add properties to every document inserted into the database. When these properties no longer need to be retained, you may need to remove the property fragments of all documents in the database.


      Turning these options off for a database ensures that properties will not be created for new documents. However, turning these settings off does not remove existing document properties.


      To delete existing document properties, the following query can be used:




      Please make sure that the 'maintain last modified' and 'maintain directory last modified' options are turned off for the database, so that the property fragments are not recreated for the documents.




      Terraform from HashiCorp is a deployment tool that many organizations use to manage their infrastructure as code. It is platform agnostic, allowing for the deployment and configuration of on-site physical infrastructure, as well as cloud infrastructure such as AWS, Azure, vSphere and more.

      Terraform uses the HashiCorp Configuration Language (HCL) to allow for concise descriptions of infrastructure. HCL is a JSON-compatible language, designed to be both human and machine friendly.

      This powerful tool can be used to deploy a MarkLogic Cluster to AWS using the MarkLogic CloudFormation Template. The MarkLogic CloudFormation Template is the preferred method recommended by MarkLogic for building out MarkLogic clusters within AWS.

      Setting Up Terraform

      For the purpose of this example, I will assume that you have already installed Terraform and the AWS CLI, and have configured your credentials. You will also need a working directory that has been initialized using terraform init.

      Terraform Providers

      Terraform uses Providers to provide access to different resources. The Provider is responsible for understanding API interactions and exposing resources. The AWS Provider is used to provide access to AWS resources.

      Terraform Resources

      Resources are the most important part of the Terraform language. Resource blocks describe one or more infrastructure objects, like compute instances and virtual networks.

      The aws_cloudformation_stack resource allows Terraform to create a stack from a CloudFormation template.

      Choosing a Template

      MarkLogic provides two templates for creating a managed cluster in AWS.

      • MarkLogic cluster in new VPC
      • MarkLogic cluster in an existing VPC
      I've chosen to deploy my cluster to an existing VPC. When deploying to an existing VPC, you will need to gather the VPC ID, as well as the Subnet IDs for the public and private subnets.

      Defining Variables

      The MarkLogic CF Template takes a number of input variables, including the region, availability zones, instance types, EC2 keys, encryption keys, licenses and more. We have to define our variables so they can be used as part of the resource.

      Variables in HCL can be declared in a separate file, which allows for deployment flexibility. For instance, you can create a Development resource and a Production resource by using different variable files.

      Here is a snippet from our variables file:

      variable "cloudform_resource_name" {
        type    = string
        default = "Dev-Cluster-CF"
      }
      variable "stack_name" {
        type    = string
        default = "Dev-Cluster"
      }
      variable "ml_version" {
        type    = string
        default = "10.0-4"
      }
      variable "availability_zone_names" {
        type    = list(string)
        default = ["us-east-1a", "us-east-1b", "us-east-1c"]
      }

      In the snippet above, you'll notice that we've defined the availability_zone_names as a list. The MarkLogic CloudFormation template won't take a list as an input, so later we will join the list items into a string for the template to use.

      This also applies to any of the other lists defined in the variable files.
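      One way to keep Development and Production values apart, as suggested above, is to generate a separate variable file per environment. The Python sketch below renders Terraform's JSON variable-file format (*.tfvars.json) using the variable names from the snippet above; the Production values are hypothetical placeholders. Each file would then be passed explicitly, for example with terraform apply -var-file=dev.tfvars.json.

```python
import json
from pathlib import Path
from typing import List

# Variable names follow the HCL snippet above; the "prod" values are hypothetical.
ENVIRONMENTS = {
    "dev": {
        "cloudform_resource_name": "Dev-Cluster-CF",
        "stack_name": "Dev-Cluster",
        "ml_version": "10.0-4",
        "availability_zone_names": ["us-east-1a", "us-east-1b", "us-east-1c"],
    },
    "prod": {
        "cloudform_resource_name": "Prod-Cluster-CF",
        "stack_name": "Prod-Cluster",
        "ml_version": "10.0-4",
        "availability_zone_names": ["us-east-1a", "us-east-1b", "us-east-1c"],
    },
}


def render_var_file(env: str) -> str:
    """Return the JSON variable-file text for one environment."""
    return json.dumps(ENVIRONMENTS[env], indent=2) + "\n"


def write_var_files(out_dir: str = ".") -> List[Path]:
    """Write <env>.tfvars.json for every environment and return the paths."""
    paths = []
    for env in ENVIRONMENTS:
        path = Path(out_dir) / f"{env}.tfvars.json"
        path.write_text(render_var_file(env))
        paths.append(path)
    return paths
```

      The lists stay lists in these files; joining them into comma-separated strings for the template still happens in the resource definition.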

      Using the CloudFormation Resource

      Now we need to define the resource in HCL that will allow us to deploy a CloudFormation template to create a new stack.

      The first thing we need to do is tell Terraform which provider we will be using, defining some default options:

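          provider "aws" {
            profile = "default"
            # Credentials come from the AWS CLI profile above; to pass keys
            # as variables instead, uncomment both lines below.
            # access_key = var.access_key
            # secret_key = var.secret_key
            region = var.aws_region
          }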

      Next, we need to define the `aws_cloudformation_stack` configuration options, setting the variables that will be passed in when the stack is created:

          resource "aws_cloudformation_stack" "marklogic" {
            name         = var.cloudform_resource_name
            capabilities = ["CAPABILITY_IAM"]
            parameters = {
              IAMRole             = var.iam_role
              AdminUser           = var.ml_admin_user
              AdminPass           = var.ml_admin_password
              Licensee            = "My User - Development"
              LicenseKey          = "B581-REST-OF-LICENSE-KEY"
              VolumeSize          = var.volume_size
              VolumeType          = var.volume_type
              VolumeEncryption    = var.volume_encryption
              VolumeEncryptionKey = var.volume_encryption_key
              InstanceType        = var.instance_type
              SpotPrice           = var.spot_price
              KeyName             = var.key_name # hypothetical variable: name of an existing EC2 key pair
              AZ                  = join(",", var.availability_zone_names)
              LogSNS              = var.log_sns
              NumberOfZones       = var.number_of_zones
              NodesPerZone        = var.nodes_per_zone
              VPC                 = var.vpc_id
              PublicSubnets       = join(",", var.public_subnets)
              PrivateSubnets      = join(",", var.private_subnets)
            }
            template_url = "${var.template_base_url}${var.ml_version}/${var.template_file_name}"
          }

      Deploying the Cluster

      Now that we have defined our variables and our resources, it's time for the actual deployment.

      $> terraform apply

      This will show us the work that Terraform is going to attempt to perform, along with the settings that have been defined so far.

      Once we confirm that things look correct, we can go ahead and apply the resource.

      Now we can check the AWS Console to see our stack

      And we can also use the ELB to log in to the Admin UI

      Wrapping Up

      We have now deployed a 3-node cluster to an existing VPC using Terraform. The cluster is now ready to have our Data Hub, or other application, installed.

      Deploying MarkLogic in AWS with Ansible


      Ansible, owned by Red Hat, is an open source provisioning, configuration and application deployment tool that many organizations use to manage their infrastructure as code. Unlike options such as Chef and Puppet, it is agentless, using SSH to communicate between servers. Ansible also does not need a central host for orchestration; it can run from nearly any server, desktop or laptop. It supports many different platforms and services, allowing for the deployment and configuration of on-site physical infrastructure, as well as cloud and virtual infrastructure such as AWS, Azure, vSphere, and more.

      Ansible uses YAML as its configuration management language, making it easier to read than other formats. Ansible also uses Jinja2 for templating to enable dynamic expressions and access to variables.

      Ansible is a flexible tool that can be used to deploy a MarkLogic Cluster to AWS using the MarkLogic CloudFormation Template. The MarkLogic CloudFormation Template is the preferred method recommended by MarkLogic for building out MarkLogic clusters within AWS.

      Setting Up Ansible

      For the purpose of this example, I will assume that you have already installed Ansible, the AWS CLI, and the necessary python packages needed for Ansible to talk to AWS. If you need some help getting started, Free Code Camp has a good tutorial on setting up Ansible with AWS.

      Inventory Files

      Ansible uses inventory files to determine which servers to perform work on. They can also be used to customize settings for individual servers or groups of servers. For our example, we have set up our local system with all the prerequisites, so we need to tell Ansible how to treat the local connections. For this demonstration, here is my inventory, which I've named hosts:

      localhost              ansible_connection=local

      Ansible Modules

      Ansible modules are discrete units of code that are executed on a target. The target can be the local system or a remote node. Modules can be executed from the command line as ad-hoc commands, or as part of a playbook.

      Ansible Playbooks

      Playbooks are Ansible's configuration, deployment and orchestration language. Playbooks extend the power of Ansible and its modules from basic configuration or management all the way to complex, multi-tier infrastructure deployments.

      Choosing a Template

      MarkLogic provides two templates for creating a managed cluster in AWS.

      1. MarkLogic cluster in new VPC
      2. MarkLogic cluster in an existing VPC

      I've chosen to deploy my cluster to an existing VPC. When deploying to an existing VPC, you will need to gather the VPC ID, as well as the Subnet IDs for the public and private subnets.

      Defining Variables

      The MarkLogic CF Template takes a number of input variables, including the region, availability zones, instance types, EC2 keys, encryption keys, licenses and more. We have to define our variables so they can be used as part of the resource.

      Variables in Ansible can be declared in a separate file, which allows for deployment flexibility.

      Here is a snippet from our variables file:

      # vars file for marklogic template and version
      ml_version: '10.0-latest'
      template_file_name: 'mlcluster.template'
      template_base_url: ''


      # CF Template Deployment Variables
      aws_region: 'us-east-1'
      stack_name: 'Dev-Cluster-An3'
      IAMRole: 'MarkLogic'
      AdminUser: 'admin'

      Using the CloudFormation Module

      So now we need to create our playbook, and choose the module that will allow us to deploy a CloudFormation template to create a new stack. The cloudformation module allows us to create a CloudFormation stack.

      Next, we need to define the cloudformation configuration options, setting the variables that will be passed in when the stack is created.

      # Use a template from a URL
      - name: Ansible Test
        hosts: localhost
        vars_files:
          - ml-cluster-vars.yml
        tasks:
          - cloudformation:
              stack_name: "{{ stack_name }}"
              state: "present"
              region: "{{ aws_region }}"
              capabilities: ["CAPABILITY_IAM"]
              disable_rollback: true
              template_url: "{{ template_base_url + ml_version + '/' + template_file_name }}"
              template_parameters:
                IAMRole: "{{ IAMRole }}"
                AdminUser: "{{ AdminUser }}"
                AdminPass: "{{ AdminPass }}"
                Licensee: "{{ Licensee }}"
                LicenseKey: "{{ LicenseKey }}"
                KeyName: "{{ KeyName }}"
                VolumeSize: "{{ VolumeSize }}"
                VolumeType: "{{ VolumeType }}"
                VolumeEncryption: "{{ VolumeEncryption }}"
                VolumeEncryptionKey: "{{ VolumeEncryptionKey }}"
                InstanceType: "{{ InstanceType }}"
                SpotPrice: "{{ SpotPrice }}"
                AZ: "{{ AZ | join(', ') }}"
                LogSNS: "{{ LogSNS }}"
                NumberOfZones: "{{ NumberOfZones }}"
                NodesPerZone: "{{ NodesPerZone }}"
                VPC: "{{ VPC }}"
                PrivateSubnets: "{{ PrivateSubnets | join(', ') }}"
                PublicSubnets: "{{ PublicSubnets | join(', ') }}"
              tags:
                Stack: "ansible-test"

      Deploying the cluster

      Now that we have defined our variables and created our playbook, it's time for the actual deployment.

      ansible-playbook -i hosts ml-cluster-playbook.yml -vvv

      The -i option allows us to reference the inventory file we created. As the playbook runs, it will output as it starts and finishes tasks in the playbook.

      PLAY [Ansible Test] ************************************************************************************************************


      TASK [Gathering Facts] *********************************************************************************************************
      ok: [localhost]


      TASK [cloudformation] **********************************************************************************************************
      changed: [localhost]

      When the playbook finishes running, it will print out a recap which shows the overall results of the play.

      PLAY RECAP *********************************************************************************************************************
      localhost                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

      This recap tells us that 2 tasks ran successfully, resulting in 1 change and no failed tasks, which is our sign that things worked.

      If we want to see more information as the playbook runs, we can add one of the verbose flags (-v or -vvv) to show more detail about the parameters the playbook is using and the results.
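      When the playbook runs from a CI job, reading the recap can be automated. The Python sketch below parses a recap host line in the format shown above; treating failed=0 and unreachable=0 as success is our own reading of "things worked". In many pipelines the ansible-playbook exit status is the simpler signal, and this parser is mainly useful when you also want the individual counters.

```python
import re
from typing import Dict, Tuple

RECAP_RE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counts>.+)$")


def parse_recap_line(line: str) -> Tuple[str, Dict[str, int]]:
    """Split a PLAY RECAP host line into (host, {counter: value})."""
    match = RECAP_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a recap line: {line!r}")
    pairs = re.findall(r"(\w+)=(\d+)", match.group("counts"))
    return match.group("host"), {key: int(value) for key, value in pairs}


def play_succeeded(counts: Dict[str, int]) -> bool:
    """Nothing failed and every host was reachable."""
    return counts.get("failed", 0) == 0 and counts.get("unreachable", 0) == 0


host, counts = parse_recap_line(
    "localhost                  : ok=2    changed=1    unreachable=0    failed=0    "
    "skipped=0    rescued=0    ignored=0"
)
print(host, counts["ok"], counts["changed"], play_succeeded(counts))  # localhost 2 1 True
```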

      Now we can check the AWS Console to see our stack:

      And we can also use the ELB to log in to the Admin UI

      Wrapping Up

      We have now deployed a 3-node cluster to an existing VPC using Ansible. The cluster is now ready to have our Data Hub, or other application, installed. We can now use the git module to get our application code, and deploy our code using ml-gradle.

      Deploying REST API Search/Query Options in DHS

      REST API Query Options Overview

      You can use persistent or dynamic query options to customize your queries. MarkLogic Server comes configured with default query options. You can extend and modify the default options using /config/query/default.

      REST API Search options are defined per Group and App Server. When using ml-gradle, they are typically deployed by putting the files defining the options in the src/main/ml-modules/options directory of your gradle project. By default the options will be deployed to the Group/App Server that gradle is pointing at in the data-hub-MODULES database under /[GroupName]/[App Server]/rest-api/options/[name of file].

      REST API Query Options in DHS

      In DHS, query options are created under the Evaluator group for the data-hub-FINAL app server. One side effect of the DHS permissions is that users will not be able to see the files after they are deployed: the default permissions for the options file are rest-reader-internal and rest-admin-internal, which are not granted to the data-hub roles.

      To check that the search options have been deployed you can use the following curl command:

      • curl --anyauth --user username:password -k -X GET -H "Content-type: application/xml"[myOptions]

      If the options exist, you will get results. If the options do not exist, then you will get a 400 return, with a REST-INVALIDPARAM error.
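      The same check can be scripted. The Python sketch below requests a named options set from the /v1/config/query endpoint (the endpoint this article mentions for options deployed from ml-modules/options) and interprets the result as described above. The base URL is a placeholder because the host in the curl command is not preserved here. Registering both digest and basic authentication handlers is our approximation of curl's --anyauth.

```python
import urllib.error
import urllib.parse
import urllib.request


def options_url(base_url: str, name: str) -> str:
    """URL of a named query-options set under /v1/config/query."""
    return f"{base_url.rstrip('/')}/v1/config/query/{urllib.parse.quote(name)}"


def classify(status: int, body: str) -> str:
    """Interpret the response as described in this article."""
    if status == 200:
        return "present"
    if status == 400 and "REST-INVALIDPARAM" in body:
        return "missing"
    return f"unexpected ({status})"


def check_options(base_url: str, name: str, user: str, password: str) -> str:
    """GET the options set and classify the outcome."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(passwords),  # tried first
        urllib.request.HTTPBasicAuthHandler(passwords),
    )
    request = urllib.request.Request(
        options_url(base_url, name), headers={"Accept": "application/xml"}
    )
    try:
        with opener.open(request) as response:
            return classify(response.status, response.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        return classify(err.code, err.read().decode("utf-8", "replace"))
```

      For example, check_options("https://your-dhs-host:port", "myOptions", "username", "password") would return "present" or "missing", where the host, port and credentials are placeholders to fill in.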

      Deploying Options to Other App Servers and Groups

      Deploying Options to the Staging App Server

      Using src/main/ml-modules/options will only deploy the options to the Final app server. If you want to deploy the options to the Staging app server, you will need to define the options under src/main/ml-modules/root/Evaluator/data-hub-STAGING/rest-api/options.

      Deploying Options to Other Groups

      If the cluster is configured for auto-scaling, the dynamic e-nodes will belong to either the Analyzer, Curator or Operator group, so the search options will not be available for the dynamic e-nodes.

      To set the options for the app servers in other groups, you will also use src/main/ml-modules/root/[Group Name]/[App Server Name]/rest-api/options

      • src/main/ml-modules/root/Analyzer/data-hub-FINAL/rest-api/options
      • src/main/ml-modules/root/Operator/data-hub-FINAL/rest-api/options
      • ...etc

      When the options files are deployed this way, they get different permissions than when they are deployed via ml-modules/options. The permissions are rest-extension-user, data-hub-module-reader, data-hub-module-writer, tde-admin, and tde-view, but these differences do not appear to affect functionality.

      Deployment Failures

      When options are deployed with the rest of the non-REST modules in ml-modules/root/..., it uses the /v1/documents endpoint, which allows you to set the file permissions.

      When options are deployed from ml-modules/options, it uses the /v1/config/query endpoint, which does not allow you to set the file permissions.

      One effect of this difference is if you attempt to deploy the search options using both ml-modules/options and src/main/ml-modules/root/Evaluator/data-hub-FINAL/rest-api/options you will encounter a SEC-PERMDENIED error and the deployment will fail. If you encounter this error, ensure you aren't attempting to deploy the options in both locations.
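      Putting the directory convention and the duplicate-deployment caveat together, the Python sketch below lists the root/[Group Name]/[App Server Name]/rest-api/options directories to populate and copies an options file into each, skipping Evaluator/data-hub-FINAL when that server is already covered by src/main/ml-modules/options. The group names come from this article; which app servers exist in each group is an assumption to adjust for your service.

```python
import shutil
from pathlib import Path
from typing import List, Sequence

# Group names from this article; the app servers per group are an assumption.
GROUPS = ["Evaluator", "Analyzer", "Curator", "Operator"]
APP_SERVERS = ["data-hub-FINAL", "data-hub-STAGING"]


def option_dirs(project_root: str, groups: Sequence[str] = GROUPS,
                app_servers: Sequence[str] = APP_SERVERS,
                using_options_dir: bool = True) -> List[Path]:
    """Directories of the form root/[Group]/[App Server]/rest-api/options."""
    root = Path(project_root) / "src" / "main" / "ml-modules" / "root"
    dirs = []
    for group in groups:
        for server in app_servers:
            if using_options_dir and (group, server) == ("Evaluator", "data-hub-FINAL"):
                # Already deployed from src/main/ml-modules/options; deploying the
                # same options from both places fails with SEC-PERMDENIED.
                continue
            dirs.append(root / group / server / "rest-api" / "options")
    return dirs


def copy_options(options_file: str, project_root: str, **kwargs) -> List[Path]:
    """Copy one options file into every directory returned by option_dirs."""
    source = Path(options_file)
    copies = []
    for directory in option_dirs(project_root, **kwargs):
        directory.mkdir(parents=True, exist_ok=True)
        copies.append(Path(shutil.copy2(source, directory / source.name)))
    return copies
```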


      This KB article lists some available tools for continuous integration and automated deployment of MarkLogic Server.

      Deployment Options

      ml-gradle is a gradle plugin that can be used for configuration and application deployments. Application deployments are maintained as projects, which can be deployed to any environment: Development, QA, Production, etc.

      The MarkLogic Configuration Management API is a RESTful API that allows retrieving, generating, and applying configurations for MarkLogic clusters, databases, and application servers.

      The MarkLogic Management API is a REST-based API that allows you to administer MarkLogic Server and access MarkLogic Server instrumentation with no provisioning or set-up. You can use the API to perform administrative tasks such as initializing or extending a cluster; creating databases, forests, and App Servers; and managing tiered storage partitions. The API also provides the ability to easily capture detailed information on MarkLogic Server objects and processes, such as hosts, databases, forests, App Servers, groups, transactions, and requests from a wide variety of tools.
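      As a small illustration of the Management API, the Python sketch below lists the databases in a cluster with GET /manage/v2/databases?format=json on the REST Management port (8002). The JSON key path used to pull out database names is an assumption about the default list view and should be checked against your server's response. Digest authentication is also assumed.

```python
import json
import urllib.request
from typing import List


def database_names(payload: dict) -> List[str]:
    """Extract database names from a /manage/v2/databases JSON response.
    The key path is an assumption about the default list view."""
    items = payload["database-default-list"]["list-items"]["list-item"]
    return [item["nameref"] for item in items]


def fetch_databases(host: str, user: str, password: str, port: int = 8002) -> List[str]:
    """GET the database list from the Management API using digest auth."""
    base = f"http://{host}:{port}/"
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    with opener.open(base + "manage/v2/databases?format=json") as response:
        return database_names(json.load(response))
```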

      The MarkLogic Admin APIs provide a flexible toolkit for creating new and managing existing configurations of MarkLogic Server.

      Integration Testing

      MarkLogic Unit Test is a testing component that was originally part of the Roxy project. This component enables you to build unit tests that are written in and can test against both XQuery and Server-side JavaScript.

      Implementation Specific Tools

      CloudFormation Templates

      MarkLogic CloudFormation templates enable you to launch clusters with an Elastic Load Balancer, Elastic Block Storage, Auto Scaling Group, and so on. Your cluster can be in either one Availability Zone or three Availability Zones. Multiple nodes can be placed within each Availability Zone. You can choose whether to deploy to an existing VPC, or a new VPC. The templates can also be used with tools like Terraform and Ansible.


      The MarkLogic Python API aims to provide complete coverage of the capabilities in the MarkLogic REST API in idiomatic Python.


      Jenkins is often used with MarkLogic Server for building deployable artifacts, staging build artifacts, running automated tests, and deploying said artifacts. Jenkins has great REST endpoints that make it easy to get / put job configurations, and enable / disable jobs from scripts.

      Jenkins provides a driver for the continuous integration / continuous delivery process that can integrate with other tools. In combination with ml-gradle, it can be used to deploy modules and run unit tests on code check-in.

      One pipeline example used with Jenkins is to:

      1. Pull the code from Git
      2. Deploy to DEV with ml-gradle
      3. Run MarkLogic Unit Test
      4. Email a report of the success/failure
      5. Kick off job to deploy to another environment

      Note also that the most important best practice here is to make sure Jenkins runs primarily on a host other than a MarkLogic host.


      This article will help MarkLogic Administrators to monitor the health of their MarkLogic cluster. By studying the attached scripts, you will learn how to find out which hosts are down and which forests have failed over, enabling you to take the necessary recovery actions.

      Initial Setup

      On a separate Linux host (not a member of the cluster), download the file attachments from this article, making sure that they all reside within the same directory.

      Here is a general description of each file:

      cluster-name.conf - Example configuration file used by the scripts. Configures information for monitoring one ML cluster.
      - A very simple, low-load check that all the nodes of a cluster are up and running.
      - A more detailed check for essential cluster functionality, with alerting (paging and/or emails to DBAs) if warranted. This script relies on at least one external XQuery file (mon-report-failed-over-forests.xqy) and makes use of the REST MGMT API as well as REST XQuery requests.

      mon-report-failed-over-forests.xqy - External XQuery file used by


      Preparing the CONF File for Use on Your Cluster

      Before running the scripts, the cluster-name.conf needs to be customized for your specific cluster. Start by changing the file name to match the name of your cluster, e.g.,

      $ mv cluster-name.conf some-other-name.conf

      Where "some-other-name" is the actual name of the cluster, or of the application that is hosted on that cluster.

      Next, you will need to customize some of the internal variables inside the CONF file itself. Here are the contents of the cluster-name.conf file, as downloaded:

      # MarkLogic Credentials for the REST Management port - 8002
      # MarkLogic Credentials for the XQuery eval port - 8000

      ---------  end of listing ---------

      For CLUSTER_NAME, provide the cluster-name listed in the cluster's /var/opt/MarkLogic/clusters.xml file.

      For CLUSTER_NODES, write in the host-names for each node in your cluster.

      For USER_PW_MGMT, provide the user-name and password for the REST MANAGEMENT user, in the format name:password.

      For USER_PW_XQ, provide the user-name and password for the user who will execute the XQuery scripts, in the format name:password.

      The UNIX_USER is a local Unix username with the correct rwx access rights for this directory.

      PAGE_ADDRESSES and MAIL_ADDRESSES are the alert email addresses that will be notified whenever there is a failover event.
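      Because a typo in the CONF file only surfaces when the scripts run, a quick pre-flight check can help. The original listing is not preserved here, so the Python sketch below assumes the file uses shell-style KEY=value lines (with optional quotes and # comments). The variable names come from the descriptions above, and the name:password check mirrors the stated format.

```python
import shlex
from typing import Dict, List

REQUIRED_KEYS = ["CLUSTER_NAME", "CLUSTER_NODES", "USER_PW_MGMT", "USER_PW_XQ",
                 "UNIX_USER", "PAGE_ADDRESSES", "MAIL_ADDRESSES"]


def parse_conf(text: str) -> Dict[str, str]:
    """Parse shell-style KEY=value lines (assumed format), skipping blanks
    and # comments; surrounding quotes are removed by shlex."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = " ".join(shlex.split(value, comments=True))
    return values


def problems(values: Dict[str, str]) -> List[str]:
    """Missing or empty settings, plus credentials not in name:password form."""
    issues = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    for key in ("USER_PW_MGMT", "USER_PW_XQ"):
        if values.get(key) and ":" not in values[key]:
            issues.append(f"{key} should look like name:password")
    return issues
```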


      The script was designed to be run repeatedly at a fixed interval to keep tabs on system health; for example, it can be invoked by a cron job. A frequency of 5 to 120 minutes is a good candidate range. Ten minutes is a good choice if you would like to be woken up (on average) within 5 minutes of a failover event.
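      The arithmetic behind the ten-minute suggestion: if a failover is equally likely at any moment between runs, the average wait until the next run is half the interval. The Python sketch below computes that figure and builds a matching crontab schedule field. Restricting intervals to ones that divide an hour or a day evenly is our simplification, since other crontab step values produce uneven gaps.

```python
def cron_schedule(interval_minutes: int) -> str:
    """Crontab time fields that fire every `interval_minutes`."""
    if interval_minutes <= 0:
        raise ValueError("interval must be positive")
    if interval_minutes < 60 and 60 % interval_minutes == 0:
        return f"*/{interval_minutes} * * * *"
    if interval_minutes % 60 == 0 and 24 % (interval_minutes // 60) == 0:
        return f"0 */{interval_minutes // 60} * * *"
    raise ValueError("choose an interval that divides an hour or a day evenly")


def mean_detection_delay(interval_minutes: float) -> float:
    """Average wait between a failover and the next run, assuming failovers
    are equally likely at any moment between runs."""
    return interval_minutes / 2


print(cron_schedule(10), mean_detection_delay(10))  # */10 * * * * 5.0
print(cron_schedule(120))                           # 0 */2 * * *
```

      The resulting crontab entry might look like */10 * * * * /path/to/monitor-script some-other-name.conf, where the script path is hypothetical because the script names are not preserved in this article.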

      Setting up SSH Passwordless Login

      In the monitoring script, section (6) FOREST STATUS CHANGE requires ssh access to the cluster hosts, because this section greps through the MarkLogic Server ErrorLogs. To enable this part of the script to run without prompting the user, "ssh passwordless login" should be set up between the monitoring host and all the cluster hosts. There are many examples of how to do this on the internet, for example: Alternatively, this monitoring section can be commented out.

      Also regarding section (6), the "grep" command is set up to grep the latest 10 minutes of the ErrorLog. If this script is configured to run less often than every 10 minutes, the "grep" command line should be adapted to cover the full period between script runs.
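The time-window selection that the grep performs can be sketched in Python as follows. This is a minimal illustration, not the monitoring script's actual code: it assumes each ErrorLog entry begins with a timestamp of the form 2016-02-11 18:22:54.044, and the function name recent_log_lines is hypothetical. Setting window_minutes to the interval between script runs gives each run the full period since the previous one.

```python
from datetime import datetime, timedelta

# Assumed ErrorLog timestamp prefix, e.g. "2016-02-11 18:22:54.044" (23 characters).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TIMESTAMP_LENGTH = 23


def recent_log_lines(lines, now, window_minutes):
    """Return the lines whose timestamp falls within the last window_minutes before now."""
    cutoff = now - timedelta(minutes=window_minutes)
    selected = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            # Lines without a leading timestamp (e.g. continuation lines) are skipped.
            continue
        if cutoff <= stamp <= now:
            selected.append(line)
    return selected


# Usage (hypothetical path; the window should match the cron interval):
#   with open("ErrorLog.txt", encoding="utf-8", errors="replace") as log:
#       recent = recent_log_lines(log, datetime.now(), window_minutes=10)
```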

      Example Usage

      You are now ready to execute the failover monitoring scripts! Here is how you would execute them:

      $ ./ some-other-name.conf MY-CLUSTER-NAME

      $ ./ some-other-name.conf

      [where "some-other-name" and MY-CLUSTER-NAME are your actual CONF and cluster-name, as described above]

      Monitoring Multiple Clusters

      So, given a monitoring machine with a directory of cluster configuration files in the style of cluster-name.conf, those configuration files could be iterated through to monitor a suite of clusters from a single monitoring machine. It should be fairly easy to build a custom shell script to iterate through various cluster CONF files.
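A minimal Python sketch of such a wrapper is shown below. It assumes, hypothetically, that the monitoring script takes the path of a CONF file as its argument and that all CONF files live in one directory; the script path and the function names are placeholders, not part of the original monitoring scripts.

```python
import subprocess
from pathlib import Path


def cluster_configs(config_dir):
    """Return the *.conf files in config_dir, sorted by name."""
    return sorted(Path(config_dir).glob("*.conf"))


def monitor_commands(script, config_dir):
    """Build one command line per cluster configuration file."""
    return [[script, str(conf)] for conf in cluster_configs(config_dir)]


def run_all(script, config_dir):
    """Run the monitoring script once per cluster; return {config path: exit code}."""
    results = {}
    for command in monitor_commands(script, config_dir):
        completed = subprocess.run(command, check=False)
        results[command[1]] = completed.returncode
    return results
```

For example, run_all("./failover-monitor.sh", "/etc/ml-monitor") would run the (hypothetical) failover-monitor.sh once per CONF file and return each exit code, so a single cron entry for the wrapper can cover every cluster.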

      Final thought and Limitations

      Please be aware that the script is only partially implemented. In particular, the Replication Lag and Replication Failure sections are left as exercises for the user.

      This script is being presented as a backup, lowest common denominator monitoring solution. For a more complete solution, you should explore other options, such as Splunk or Nagios.





      According to Wikipedia, DevOps is a set of practices that combines software development (Dev) and IT operations (Ops) with the goal of shortening the Systems Development Lifecycle and providing continuous delivery with high software quality. This KB provides some guidance for system deployment and configuration, which can be integrated into an organization's DevOps processes.

      For more information on using MarkLogic as part of a Continuous Integration/Continuous Delivery process, see the KB  Deployment and Continuous Integration Tools.

      Deploying a Cluster

      Deploying a MarkLogic cluster that will act as the target environment for the application code being developed is one piece of the DevOps puzzle. The approach that is chosen will depend on many things, including the tooling already in use by an organization, as well as the infrastructure that will be used for the deployment.  We will cover two of the most common environments, On-Premise and Cloud.

      On-Premise Deployments

      On-Premise deployments, which can include using bare metal servers, or Virtual Machine infrastructure (such as VMware), are one common environment. You can deploy a cluster to an on-premise environment using tools such as shell scripts, or Ansible. In the Scripting Administrative Tasks Guide, there is a section on Scripting Cluster Management, which provides some examples of how a cluster build can be automated.

      Once the cluster is deployed, some of the specific configuration tasks that may need to be performed on the cluster can be done using the Management API.

      Cloud Deployments

      Cloud deployments utilize flexible compute resources provided by vendors such as Amazon Web Services (AWS), or Microsoft Azure.

      For AWS, MarkLogic provides an example CloudFormation template that can be used to deploy a cluster to Amazon's AWS EC2 environment. Tools like the AWS Command Line Interface (CLI), Terraform, or Ansible can be used to extend the MarkLogic CloudFormation template and automate the process of creating a cluster in the AWS EC2 environment. The template can be used to deploy a cluster using the AWS CLI, to Deploy a Cluster Using Terraform, or to Deploy a Cluster Using Ansible.

      For Azure, MarkLogic has provided Solution Templates for Azure which can be extended for automated deployments using the Azure CLI, Terraform or Ansible.

      As with on-premise deployments, configuration tasks can be performed on the cluster using the Management API.
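As an illustration of what a scripted configuration task might look like, the Python sketch below builds a Management API request that creates a database. It assumes the Management API is reachable on its default port 8002 with digest authentication, and that a JSON body containing database-name is enough for POST /manage/v2/databases; the host name, credentials and database name are hypothetical. Check the Management API reference for the full set of properties before relying on it.

```python
import json
import urllib.request

MANAGE_PORT = 8002  # assumed default port of the Manage app server


def create_database_request(host, database_name, port=MANAGE_PORT):
    """Build (but do not send) a POST /manage/v2/databases request."""
    payload = json.dumps({"database-name": database_name}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/manage/v2/databases",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def digest_opener(host, user, password, port=MANAGE_PORT):
    """Return an opener that answers the server's digest authentication challenge."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, f"http://{host}:{port}/", user, password)
    return urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))


# Usage (hypothetical host and credentials; this sends a real request):
#   opener = digest_opener("ml-host.example.com", "admin-user", "admin-password")
#   with opener.open(create_database_request("ml-host.example.com", "app-content")) as response:
#       print(response.status)
```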


      This is just a brief introduction into some aspects of DevOps processes for deploying and configuring a MarkLogic Cluster.


      After adding or removing a forest and its corresponding replica forest in a database, we have seen instances where the Rebalancer does not properly distribute the documents amongst the existing and newly added forests.

      In this particular instance, debug-level XDMP-HASHLOCKINGRETRY messages were reported repeatedly in the error logs. The messages look something like this:

      2016-02-11 18:22:54.044 Debug: Retrying HTTPRequestTask::handleXDBCRequest 83 because XDMP-HASHLOCKINGRETRY: Retry hash locking. Forests config hash does not match.

      2016-02-11 18:22:54.198 Debug: Retrying ForestRebalancerTask::run P_initial_p2_01 50 because XDMP-HASHLOCKINGRETRY: Retry hash locking. Forests config hash does not match.
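To see how often these retries occur, and which tasks are affected, the ErrorLog can be summarized with a short script. The Python sketch below is an assumption-based helper, not a MarkLogic tool: it relies only on the "Retrying <task> ... because <code>:" shape of the two messages above.

```python
import re
from collections import Counter

# Matches "... Retrying <task> <details> because <XDMP-CODE>: ..." as in the excerpt above.
RETRY_PATTERN = re.compile(r"Retrying (?P<task>\S+) .*because (?P<code>XDMP-[A-Z]+):")


def retry_counts(lines, code="XDMP-HASHLOCKINGRETRY"):
    """Count 'Retrying ... because <code>' messages per task name."""
    counts = Counter()
    for line in lines:
        match = RETRY_PATTERN.search(line)
        if match and match.group("code") == code:
            counts[match.group("task")] += 1
    return counts


# Usage (hypothetical path):
#   with open("ErrorLog.txt", encoding="utf-8", errors="replace") as log:
#       for task, count in retry_counts(log).most_common():
#           print(count, task)
```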


      Gather statistics about the rebalancer to see the number of documents being scheduled for movement. If you run the attached script “rebalancer-preview.xqy” in Query Console on your MarkLogic Server cluster, it produces rebalancer statistics in tabular format.

      • Note that you must first change the database name (YourDatabaseNameOnWhichNewForestsHaveBeenAdded) on the 3rd line of the XQuery script “rebalancer-preview.xqy”:

      declare variable $DATABASE as xs:string := xdmp:get-request-field("db", "YourDatabaseNameOnWhichNewForestsHaveBeenAdded");

      If you are experiencing this issue, the newly added forests will show zero in the “Total to be moved” column of the generated HTML page.


      Perform a cluster-wide restart to get past this issue. The restart is required to reload all of the configuration files across the cluster. The rebalancer will also check whether additional rebalancing work needs to occur. The rebalancer should then work as expected, and the XDMP-HASHLOCKINGRETRY messages should no longer appear in the logs. If you run the rebalancer-preview.xqy script again, the statistics should now show the number of documents being scheduled to be moved.

      You can also validate the rebalancer status from the Database Status page in the Admin UI.

      The XDMP-HASHLOCKINGRETRY rebalancer issue has been fixed in the latest MarkLogic Server releases. However, the rebalancer-preview.xqy script can still be used to help diagnose other perceived issues with the Rebalancer.


      Search fundamentals


      Difference between cts:contains and fn:contains

       1) fn:contains is a substring match, whereas cts:contains performs query matching

       2) cts:contains can therefore utilize general queries and stemming, whereas fn:contains cannot


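      For example, given a document Example.xml with the following content:

      <test>daily running makes you fit</test>

      •         fn:contains(fn:doc("Example.xml"), "ning") returns true, because "ning" is a substring of "running".

      •         cts:contains(fn:doc("Example.xml"), "ning") returns false, because no word in the document matches "ning".

      •         fn:contains(fn:doc("Example.xml"), "ran") returns false, because "ran" is not a substring of the text.

      •         cts:contains(fn:doc("Example.xml"), "ran") returns true when stemmed searches are enabled, because "ran" and "running" share the stem "run".
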
      The cts:contains examples check the document against cts:word-query queries. Stemming reduces words to their root form, allowing for smaller term lists.


      1) Words from different languages are treated differently and will not stem to the same root word entry as a word from another language.

      2) Note: Nouns will not stem to verbs and vice versa. For example, the word “runner” will not stem to “run”.
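The difference between the two kinds of matching can be illustrated with a small Python model. This is a deliberate simplification: the stem table is hypothetical and covers only the words in this example, and word_match only approximates what cts:contains does with a word query.

```python
import re

# Hypothetical stem table for this example only; MarkLogic's stemmer is
# dictionary-based and language-aware, and this table does not reproduce it.
TOY_STEMS = {"running": "run", "ran": "run", "runs": "run", "makes": "make"}


def stem(word):
    return TOY_STEMS.get(word, word)


def substring_match(text, needle):
    """fn:contains-style check: is needle a substring of text?"""
    return needle in text


def word_match(text, term, stemmed=True):
    """cts:contains-style check (simplified): does term match a whole word, optionally by stem?"""
    words = re.findall(r"\w+", text.lower())
    term = term.lower()
    if stemmed:
        return stem(term) in {stem(w) for w in words}
    return term in words


TEXT = "daily running makes you fit"
for needle in ("ning", "ran"):
    print(needle, substring_match(TEXT, needle), word_match(TEXT, needle))
```

Substring matching finds "ning" but not "ran", while stemmed word matching does the opposite. Because "runner" is not in the toy table, word_match("the runner", "run") is false, mirroring the note that nouns do not stem to verbs.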



      MarkLogic Server provides a variety of disaster recovery (DR) facilities, including full backup, incremental backup, and journal archiving, which, when combined with other MarkLogic features, can create a complete disaster recovery strategy. This paper shows some examples of how these features can be combined. It is not comprehensive, nor does it reflect features offered only in the latest releases.


      This article covers the topic from more than one perspective. First, it gives a quick overview of the metrics businesses use to measure the quality of their Disaster Recovery strategies. Next, it gives an overview of how to combine the features that MarkLogic offers in various categories.

      More?: High Availability and Disaster Recovery features, High Availability & Disaster Recovery datasheet, Scalability, Availability, and Failover Guide

      Disaster Recovery Criteria

      In order to configure MarkLogic Server to perform well in Disaster Recovery situations, we should first define what parameters we will use to measure each possible approach. For most situations, these four measures are used: 

      Long Term Retention Policy (LTR): Long Term Retention Policy can be driven by any number of business, regulatory and other criteria. It is included here because MarkLogic's backup files are often a key part of an LTR strategy. 

      Recovery Point Objective (RPO): The requirement for how up-to-date the database has to be post-recovery with respect to its state immediately before the incident that required recovery.

      Recovery Time Objective (RTO): The requirement for the time elapsed between the incident and the recovery to the RPO.

      Cost: The storage cost, the computational resource cost, and the operations cost of the overall deployment strategy.

      Flexible Replication Features

      Flexible replication can be used to support LTR objectives but is generally not useful for Disaster Recovery.

      More? Flexible Replication Guide

      Platform Support Features

      Flash backup provides a way to leverage backup features of your deployment platform while maintaining transaction integrity. Platform specific solutions can often achieve RPO and RTO targets that would be impossible through other means.

      More? Flash Backup

      High Availability Features

      Forest replication provides recovery from host failures.

      More? Scalability, Availability, and Failover Guide

      Disaster Recovery Features

      Database Replication

      Database Replication is the process of maintaining copies of forests on databases in multiple MarkLogic Server clusters.

      More? Understanding Database Replication


      Of all your backup options, full backups restore the quickest, but take the most time to back up and possibly the most storage space. Each full backup is a backup set, in that it contains everything you need to restore to the point of the backup.

      Full backups with journal archiving allow restores to a point after the backup, but the journal archive grows in an unbounded way with the number of transactions, and replaying the journals to get to your recovery point takes time proportional to the number of transactions in the journal archive, so over time, this becomes less efficient.

      With full + incremental backups, a backup set is a full backup plus the incremental backups taken after that full backup. Incremental backups are quick to take, but take longer to restore, and over time the backup set gets larger and larger, so it may end up consuming more backup space than a full backup alone (depending on your backup retention policy).

      Full + incremental backups with journal archiving have the same characteristics as incremental backups, except that you can roll forward from the most recent incremental. With this strategy, the journal archive doesn't grow in an unbounded way because the archive is purged when you take the next incremental backup. Note that if your RPO is between incremental backups, you must also enable a merge timestamp by setting the merge timestamp to a negative value (see below).

      More?: Administrator’s Guide to Backing Up and Restoring a Database, How does "point-in-time" recovery work with Journal Archiving?
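The notion of a backup set can be sketched in Python to show which backups a restore needs. This is a conceptual model under stated assumptions, not MarkLogic's restore logic: backup times are plain hour counts, the class and function names are hypothetical, and needs_journal_replay only applies when journal archiving is enabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    taken_at: int  # hours since an arbitrary starting point (illustrative unit)
    kind: str      # "full" or "incremental"


def restore_chain(backups, recovery_point):
    """Return the backup set needed to reach recovery_point: the latest full backup at or
    before it, followed by every incremental taken after that full backup (up to the point)."""
    usable = sorted((b for b in backups if b.taken_at <= recovery_point), key=lambda b: b.taken_at)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the recovery point")
    base = fulls[-1]
    incrementals = [b for b in usable if b.kind == "incremental" and b.taken_at > base.taken_at]
    return [base, *incrementals]


def needs_journal_replay(chain, recovery_point):
    """With journal archiving, changes after the last backup in the chain come from the journals."""
    return recovery_point > chain[-1].taken_at
```

For a weekly full backup with daily incrementals, a recovery point two days after the full backup needs the full backup, both incrementals, and a journal roll-forward for anything after the last incremental; the longer the chain, the longer the restore, as described above.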

      Forest Merge Configurations

      Forest merges recover the disk space occupied by deleted documents. A negative merge timestamp delays that permanent deletion. If we want incremental backups to contain all the fragments that were deleted since the last incremental backup then we want to set the delay to a period greater than the incremental backup period. This requires more disk space for the incremental backups and also requires additional space in the live database, but provides the most flexibility.

      Setting retain-until-backup on a given database (through the Admin UI or through an API call) has a similar effect, telling the server to keep the deleted fragments until a full backup or an incremental backup completes. Many clients choose to use both the negative merge timestamp and retain-until-backup options together.

      More?: admin:database-set-merge-timestamp, admin:database-set-retain-until-backup
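A minimal Python sketch for choosing a negative merge timestamp that covers one incremental backup period is shown below. It assumes that MarkLogic timestamps, and therefore a negative (relative) merge timestamp, are counted in 100-nanosecond ticks (10,000,000 per second); verify the unit against the admin:database-set-merge-timestamp documentation before using the value. The one-hour safety margin is an arbitrary illustrative default.

```python
import math

# Assumption: MarkLogic timestamps count 100-nanosecond ticks (10,000,000 per second),
# so a negative merge timestamp of -N retains deletions for roughly the last N ticks.
TICKS_PER_SECOND = 10_000_000
SECONDS_PER_HOUR = 3600


def negative_merge_timestamp(incremental_period_hours, margin_hours=1.0):
    """Return a negative (relative) merge timestamp that keeps deleted fragments for
    one incremental backup period plus a safety margin."""
    if incremental_period_hours <= 0 or margin_hours < 0:
        raise ValueError("period must be positive and margin must not be negative")
    delay_seconds = (incremental_period_hours + margin_hours) * SECONDS_PER_HOUR
    return -math.ceil(delay_seconds * TICKS_PER_SECOND)


print(negative_merge_timestamp(24))  # -900000000000 (25 hours)
```

Under that assumption, the value for a daily incremental backup would be passed to admin:database-set-merge-timestamp, possibly together with admin:database-set-retain-until-backup.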



      Planning to meet a Long Term Retention (LTR) policy, a Recovery Point Objective (RPO), a Recovery Time Objective (RTO), and a Cost goal is a key part of developing an overall MarkLogic deployment plan. MarkLogic offers a wealth of tools that complement each other when they are properly coordinated. As this article shows, the choices are many, broad, and interrelated.

      Regardless of the server version, MLCP does not support concurrent jobs if they are importing from/exporting to the same file.

      In general, MLCP jobs will perform best by maximizing the number of threads in a single MLCP job. Before 10.0-4.2, each MLCP job used 4 threads by default. Starting in 10.0-4.2, each MLCP job now uses the maximum number of threads available on the server as the default thread count (you can read more about this change in the 10.0-4.2 release notes).


      In the more recent versions of MarkLogic Server, there are checks in place to prevent the loading of invalid documents (such as documents with multiple root nodes).  However, documents loaded in earlier versions of MarkLogic Server can now result in duplicate URI or duplicate document errors being reported.

      Additionally, under normal operating conditions, a document/URI is saved in a single forest. If the load process is somehow compromised, users may see issues like duplicate URIs (i.e. the same URI in different forests) and duplicate documents (i.e. the same document/URI in the same forest).


      If the XDMP-DBDUPURI (duplicate URI) error is encountered, refer to our KB article "Handling XDMP-DBDUPURI errors" for procedures to resolve.

      If one doesn't see XDMP-DBDUPURI errors but running fn:doc() on a document returns multiple nodes then it could be a case of duplicate document in same forest.

      To check that the problem is actually duplicate documents, run either xdmp:describe(fn:doc(...)) or fn:count(fn:doc(...)). If these report more than one node, for example if xdmp:describe(fn:doc("/testdoc.xml")) returns (fn:doc("/testdoc.xml"), fn:doc("/testdoc.xml")) or fn:count(fn:doc("/testdoc.xml")) returns 2, then the problem is duplicate documents in the same forest (and not duplicate URIs).

      To fix duplicate documents, the document will need to be reloaded.

      Before reloading, you can look at the two versions to see whether they differ. Compare fn:doc("/testdoc.xml")[1] with fn:doc("/testdoc.xml")[2] to see if there is a difference, and to decide which one you want to reload.

      If there is a difference, it may also point to the operation that created the situation.


      This article discusses how the case sensitivity of a search term affects the search score, and therefore the final order of search results, for a secondary query that uses cts:boost-query with a weight. A case-insensitive word term is treated as the lowercase word term, so there is no difference in term frequencies or scores between a case-insensitive (any-case) search term and a lowercase search term used either with the "case-sensitive" option or with neither "case-sensitive" nor "case-insensitive" specified. If neither option is present, the text of the search term is used to determine case sensitivity.

      Understanding relevance score

      In MarkLogic Search, results are returned in relevance order: the most relevant results come first in the result sequence and the least relevant come last.
      More details on relevance score and its calculation are available at,

      One of the many ways to control the relevance score is to use a secondary query to boost it. This article uses examples of secondary queries that boost relevance scores to show how the case of the search term (upper, lower, or unspecified) affects relevance scores and the order of the returned results.

      A few examples to understand this scenario

      Consider a few scenarios in which the queries below use cts:boost-query with a weight to boost results containing the word "washington".

      Example 1: Search with lowercase search term and option for case not specified

      xquery version "1.0-ml";

      (: The search target fn:doc() and the trailing $hit are assumptions;
         the original listing did not show them. :)
      for $hit in
        cts:search(fn:doc(),
          cts:boost-query(
            cts:element-word-query(xs:QName("test"), "George"),
            cts:element-word-query(xs:QName("test"), "washington", (), 10.0)))
      return element hit {
        attribute score { cts:score($hit) },
        attribute fit { cts:fitness($hit) },
        attribute conf { cts:confidence($hit) },
        $hit
      }

      Results for Query1:
      <hit score="28276" fit="0.9393904" conf="0.2769644">
        <test>Washington, George... </test>
      </hit>
      <hit score="16268" fit="0.7125317" conf="0.2100787">
        <test>George washington was the first President of the United States of America...</test>
      </hit>


      Example 2: Search with lowercase search term and case-sensitive option

      xquery version "1.0-ml";

      (: The search target fn:doc() and the trailing $hit are assumptions;
         the original listing did not show them. :)
      for $hit in
        cts:search(fn:doc(),
          cts:boost-query(
            cts:element-word-query(xs:QName("test"), "George"),
            cts:element-word-query(xs:QName("test"), "washington", ("case-sensitive"), 10.0)))
      return element hit {
        attribute score { cts:score($hit) },
        attribute fit { cts:fitness($hit) },
        attribute conf { cts:confidence($hit) },
        $hit
      }

      Results for Query2:
      <hit score="28276" fit="0.9393904" conf="0.2769644">
        <test>Washington, George... </test>
      </hit>
      <hit score="16268" fit="0.7125317" conf="0.2100787">
        <test>George washington was the first President of the United States of America...</test>
      </hit>


      Example 3: Search with an uppercase search term and the case-insensitive option. Only the cts:boost-query is shown below; the rest of the query is the same as in the queries above.


      cts:boost-query(cts:element-word-query(xs:QName("test"),"George" ),
      cts:element-word-query(xs:QName("test"),"Washington",("case-insensitive"), 10.0) )

      Results for Query3:
      <hit score="28276" fit="0.9393904" conf="0.2769644">
        <test>Washington, George... </test>
      </hit>
      <hit score="16268" fit="0.7125317" conf="0.2100787">
        <test>George washington was the first President of the United States of America...</test>
      </hit>

      The queries above produce the same scores, with the same fitness and confidence values. This is because a case-insensitive word term is treated as the lowercase word term, so there is no difference in the frequencies of those two terms (any-case/case-insensitive and lowercase/case-sensitive), and therefore no difference in scoring. This is why the results for Query3 and Query2 have identical scores.
      Where case sensitivity is not specified, the text of the search term is used to determine it. In Query1 the search term contains no uppercase letters, so it is treated as "case-insensitive".


      Now let us look at a query with an uppercase search term and the case-sensitive option.

      Example 4: Search with an uppercase search term and the case-sensitive option. Only the cts:boost-query is shown below; the rest of the query is the same as in the queries above.


      cts:boost-query(cts:element-word-query(xs:QName("test"),"George" ),
      cts:element-word-query(xs:QName("test"),"Washington",("case-sensitive"), 10.0) )

      Results for Query4:
      <hit score="44893" fit="0.9172696" conf="0.3489831">
        <test>Washington, George was the first... </test>
      </hit>
      <hit score="256" fit="0.0692672" conf="0.0263533">
        <test>George washington was the first President of the United States of America...</test>
      </hit>


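      The scores for Query4 differ markedly: the document containing "Washington" (uppercase) receives the boost, while the document containing only the lowercase "washington" does not, which can change the final order of results.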


      When using a secondary query with cts:boost-query and a weight to boost certain search results, it is important to understand how the case of the search text affects the result sequence. A case-insensitive word term is treated as the lowercase word term, so there is no difference in the frequencies of any-case/case-insensitive and lowercase/case-sensitive search terms, and therefore no difference in scoring. For a search term that contains uppercase letters and uses the "case-sensitive" option, only exact-case matches are boosted, so the scores differ from those of a case-insensitive search. If neither "case-sensitive" nor "case-insensitive" is present, the text of the search term determines case sensitivity: if it contains no uppercase letters, "case-insensitive" is used; if it contains uppercase letters, "case-sensitive" is used.
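The case rule summarized above can be expressed as a small Python function. This models only the stated decision (an explicit option wins; otherwise uppercase letters imply case-sensitive) and the fact that case-insensitive terms share the lowercase term; it does not compute MarkLogic scores, and treating both options together as an error is an assumption of this sketch.

```python
CASE_OPTIONS = ("case-sensitive", "case-insensitive")


def effective_case(term, options=()):
    """Resolve case sensitivity: an explicit option wins; otherwise any uppercase
    letter in the term makes it case-sensitive."""
    explicit = [option for option in CASE_OPTIONS if option in options]
    if len(explicit) > 1:
        raise ValueError("specify at most one of case-sensitive / case-insensitive")
    if explicit:
        return explicit[0]
    return "case-sensitive" if any(ch.isupper() for ch in term) else "case-insensitive"


def term_key(term, options=()):
    """Simplified term key: a case-insensitive term is looked up as its lowercase form."""
    if effective_case(term, options) == "case-sensitive":
        return term
    return term.lower()


QUERIES = {
    "Query1": ("washington", ()),
    "Query2": ("washington", ("case-sensitive",)),
    "Query3": ("Washington", ("case-insensitive",)),
    "Query4": ("Washington", ("case-sensitive",)),
}

for name, (term, options) in QUERIES.items():
    print(name, effective_case(term, options), term_key(term, options))
```

Queries 1 to 3 resolve to the same lowercase key "washington", which is consistent with their identical scores, while Query4 uses the distinct key "Washington".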



      MarkLogic Server includes element level security (ELS), an addition to the security model that allows you to specify security rules on specific elements within documents. Using ELS, parts of a document may be concealed from users who do not have the appropriate roles to view them. ELS can conceal the XML element (along with properties and attributes) or JSON property so that it does not appear in any searches, query plans, or indexes - unless accessed by a user with appropriate permissions.

      ELS protects XML elements or JSON properties in a document using a protected path, where the path to an element or property within the document is protected so that only roles belonging to a specific query roleset can view the contents of that element or property. You specify that an element is part of a protected path by adding the path to the Security database. You also then add the appropriate role to a query roleset, which is also added to the Security database.

      ELS uses query rolesets to determine which elements will appear in query results. If a query roleset does not exist with the associated role that has permissions on the path, the role cannot view the contents of that path.


      1. A user with admin privileges can access documents with protected elements by using fn:doc to retrieve documents (instead of using a query). However, to see protected elements as part of query results, even a user with admin privileges will need to have the appropriate role(s).
      2. ELS applies to both XML elements and JSON properties; so unless spelled out explicitly, 'element' refers to both XML elements and JSON properties throughout this article.

      You can read more about how to configure Element Level Security here, and can see how this all works at this Element Level Security Example.


      One of the commonly used document level capabilities is 'update'. Be aware, however, that document level update is too powerful to be used with ELS permissions as someone with document level update privileges could update not only a node, but also delete the whole document. Consequently, a new document-level capability - 'node-update' - has been introduced. 'node-update' offers finer control when combined with ELS through xdmp:node-replace and xdmp:node-delete functions as they can be used to update/delete only the specified nodes of a document (and not the document itself in its entirety).

      Document-level vs Element-level security

      Unlike at the document-level:

      • 'update' and 'node-update' capabilities are equivalent at the element-level. However, at the document-level, if a user only has a 'node-update' privilege to a document, they cannot delete the document. In contrast, 'update' privileges allow that user to delete the document
      • 'Read', 'insert' and 'update' are checked separately at the element level i.e.:
        • read operations - only permissions with 'read' capability are checked
        • node update operations - only permissions with 'node-update' (update) capability are checked
        • node insert operations - only permissions with  'insert' capability are checked

      Note: read, insert, update and node-update can all be used at the element-level i.e., they can be part of the protected path definition.



      1. update: A node can be updated by any user that has an 'update' capability at the document-level
      2. node-update:  A node can be updated by any user with a 'node-update' capability as long as they have sufficient privileges at the element-level


      1. If a node is protected but no 'update/node-update' capabilities are explicitly granted to any user, that node can be updated by any user as long as they have 'update/node-update' capabilities at the document-level
      2. If any user is explicitly granted 'update/node-update' capabilities to that node at the element level, only that specific user is allowed to update/delete that node. Other users who are expected to have that capability must be explicitly granted that permission at the element level

      How does node-replace/node-delete work?

      When a node-replace/node-delete is called on a specific node:

      1. The user trying to update that node must have at least a 'node-update' (or 'update') capability to all the nodes up until (and including) the root node
      2. None of the descendant nodes of the node being replaced/deleted can be protected by different roles. If they are protected:
        1. 'node-delete' isn’t allowed as deleting this node would also delete the descendant node which is supposed to be protected
        2. 'node-replace' can be used to update the value (text node) of the node but replacing the node itself isn’t allowed

      Note: If a caller has the 'update' capability at the document level, there is no need to do element-level permission checks since such a caller can delete or overwrite the whole document anyway.
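The update and delete rules above can be summarized in a simplified Python model. It is an interpretation for illustration only: roles, capabilities and the Node structure are hypothetical, only element-level update grants are modelled, and "protected by different roles" is read as "protected for roles the user does not hold".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Roles granted 'update'/'node-update' on this node through a protected path;
    # an empty set means no element-level update grant restricts the node.
    update_roles: frozenset = frozenset()
    children: list = field(default_factory=list)


def may_update_node(node, user_roles, doc_caps):
    """Element-level update rule as summarized above (simplified)."""
    if "update" in doc_caps:
        return True  # document-level 'update' skips element-level checks
    if "node-update" not in doc_caps:
        return False
    if not node.update_roles:
        return True  # no explicit element-level grant: document-level capability suffices
    return bool(node.update_roles & user_roles)


def may_delete_node(node, ancestors, user_roles, doc_caps):
    """node-delete: every node up to the root must be updatable, and no descendant may be
    protected for roles the user does not hold."""
    if "update" in doc_caps:
        return True
    if not all(may_update_node(n, user_roles, doc_caps) for n in (node, *ancestors)):
        return False
    pending = list(node.children)
    while pending:
        child = pending.pop()
        if child.update_roles and not (child.update_roles & user_roles):
            return False
        pending.extend(child.children)
    return True
```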


      1. 'node-update' was introduced to offer finer control with ELS, in contrast to the document level 'update'
      2. 'update' and 'node-update' permissions behave the same at element-level, but differently at the document-level
        1. At document-level, 'update' is more powerful as it gives the user the permission to delete the entire document
        2. At the document-level, permissions overlap (one capability can imply others). In contrast, permissions are checked independently at the element-level
          1. At the document level, an update permission allows you to read, insert to and update the document
          2. At the element level, however, read, insert and update (node-update) are checked separately
            1. For read operations, only permissions with the read capability are checked
            2. For node update operations, only permissions with the node-update capability are checked
            3. For node insert operations, only permissions with the insert capability are checked (this is true even when compartments are used).
      3. Can I use ELS without document level security (DLS)?
        1. ELS cannot be used without DLS
        2. Consider DLS the outer layer of defense, whereas ELS is the inner layer - you cannot get to the inner layer without passing through the outer layer
      4. When to use DLS vs ELS?
        1. ELS offers finer control on the nodes of a document and whether to use it or not depends on your use-case. We recommend not using ELS unless it is absolutely necessary as its usage comes with serious performance implications
        2. In contrast, DLS offers better performance and works better at scale - but is not an ideal choice when you need finer control as it doesn’t allow node-level operations 
      5. How does ELS performance scale with respect to different operations?
        1. Ingestion - depends on the number of protected paths
          1. During ingestion, the server inspects every node for ELS to do a hash lookup against the names of the last steps from all protected paths
          2. For every protected path that matches the hash, the server does a full test of the node against the path - the higher the number of protected paths, the higher the performance penalty
          3. While the hash lookup is very fast, the full test is comparatively much slower, and the corresponding performance penalty increases when a large number of nodes match the last steps of the protected paths
            1. Consequently, we strongly recommend avoiding the use of wildcards at the leaf-level in protected paths
            2. For example: /foo/bar/* has a huge performance penalty compared to /foo/*/bar
        2. Updates - as with ingestion, ELS performance depends on the number of protected paths
        3. Query/Search - in contrast to ELS ingestion or update, ELS query performance depends on the number of query rolesets
          1. Because ELS query performance depends on the number of query rolesets, the concept of Protected PathSet was introduced in 9.0-4
          2. A Protected PathSet allows OR relationships between permissions on multiple protected paths that cover the same element
          3. Because query performance depends on the number of relevant query rolesets, it is highly recommended to use helper functions to obtain the query rolesets of nodes configured with element-level security
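The ingestion-time check described above, a fast lookup on the last step followed by a slower full test, can be illustrated with a toy Python model. It is not MarkLogic's implementation: the path matcher only understands plain steps and '*', and the node paths are made up. It shows why a leaf-level wildcard such as /foo/bar/* forces a full test for every node, while /foo/*/bar only tests nodes named bar.

```python
from collections import defaultdict


def last_step(path):
    """Name of the final step of a simple slash-separated path."""
    return path.rstrip("/").rsplit("/", 1)[-1]


def build_last_step_index(protected_paths):
    """Hash the protected paths by last step (the fast lookup); '*' collects leaf wildcards."""
    index = defaultdict(list)
    for path in protected_paths:
        index[last_step(path)].append(path)
    return index


def full_test(node_path, protected_path):
    """Toy full test: same depth, and each step equal or '*' (no '//', predicates, etc.)."""
    node_steps = node_path.strip("/").split("/")
    path_steps = protected_path.strip("/").split("/")
    return len(node_steps) == len(path_steps) and all(
        p == "*" or n == p for n, p in zip(node_steps, path_steps)
    )


def check_nodes(index, node_paths):
    """Return (number of full tests performed, node paths matched by a protected path)."""
    full_tests, protected = 0, []
    for node_path in node_paths:
        name = last_step(node_path)
        # Candidates: paths ending in this node's name plus paths ending in a wildcard.
        candidates = index.get(name, []) + index.get("*", [])
        matched = False
        for candidate in candidates:
            full_tests += 1
            matched = full_test(node_path, candidate) or matched
        if matched:
            protected.append(node_path)
    return full_tests, protected


NODES = ["/foo/bar/a", "/foo/bar/b", "/foo/x/bar", "/baz/qux", "/baz/quux"]
print(check_nodes(build_last_step_index(["/foo/bar/*"]), NODES))  # 5 full tests
print(check_nodes(build_last_step_index(["/foo/*/bar"]), NODES))  # 1 full test
```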



      Some customers have reported problems when attempting to access the Configuration Manager application. In the past, this has been attributed to part of the upgrade process failing for some reason (for example, a port required by MarkLogic already being in use) or, in some cases, to a default database having been removed by the customer at some earlier stage.

      XDMP-ARGTYPE Error

      If you see this error when you attempt to access the Configuration Manager:

      500 Internal Server Error XDMP-ARGTYPE XDMP-ARGTYPE: (err:XPTY0004) fn:concat( "could not initialize management plugins with scope: ", $reut:PLUGIN-SCOPE, ": ", xdmp:quote($e)) -- arg1 is not of type xs:anyAtomicType?

      Resolving the error

      Ensure you have an Extensions database configured by doing the following:

      • Log into the MarkLogic Admin interface on port 8001 - http://[your-host]:8001/
      • Under "Databases" box, ensure a database called Extensions is listed

      If it does not exist, download and run the script attached to this article (create-extensions-db.xqy).


      Does MarkLogic provide encryption at rest?

      MarkLogic 9

      MarkLogic 9 introduces the ability to encrypt 'data at rest' - data that is on media (on disk or in the cloud), as opposed to data that is being used in a process. Encryption can be applied to newly created files, configuration files, or log files. Existing data files can be encrypted by triggering a merge or re-index of the data.

      For more information about using Encryption at Rest, see Encryption at Rest in the MarkLogic Security Guide.

      MarkLogic 8 and earlier releases

      MarkLogic 8 and earlier releases do not provide support for encryption at rest for their own forests.

      Memory consumption

      Memory consumption patterns will be different when encryption is used:

      • To access unencrypted forest data MarkLogic normally uses memory-mapped files. When files are encrypted, MarkLogic instead decrypts the entire index to anonymous memory.
      • As a result, encrypted MarkLogic forests use more anonymous memory and less file-mapped memory than unencrypted forests.  
      • Without encryption at rest, when available memory is low, the operating system can evict file-backed pages from the working set and later page them back in directly from the files. With encryption at rest, the decrypted data lives in anonymous memory, so when memory is low the operating system must write those pages to swap.

      Using Amazon S3 Encryption For Backups

      If you are hosting your data locally, would like to back up remotely to S3, and your goal is that no unencrypted copies of your data can exist outside your local environment, you could back up locally and store the backups in S3 using AWS client-side encryption. MarkLogic does not support AWS client-side encryption, so this would need to be a solution outside MarkLogic.

      See also: MarkLogic documentation: S3 Storage.

      See also: AWS: Protecting Data Using Encryption.


      Here we compare XDBC servers and the Enhanced HTTP server in MarkLogic 8.


      XDBC servers are still fully supported in MarkLogic Server version 8. You can upgrade existing XDBC servers without making any changes and you can create new XDBC servers as you did in previous releases.

      The Enhanced HTTP Server is an additional feature on HTTP servers which is protocol and binary transport compatible with XCC clients, as long as you use the xcc.httpcompliant=true system property.

      The XCC protocol is actually just HTTP, but the details of how to handle body, headers, responses, etc., are "built in" to the XCC client libraries and the XDBC server. The HTTP server in MarkLogic 8 now shares the same low-level code and can dispatch XCC-like requests.


      This article talks about best practices for use of external proxies vs using rewriter rules in the Enhanced HTTP server.


      Whether to use external proxies or rewriter rules in the Enhanced HTTP application server is an application design tradeoff not dissimilar to using a single HTTP application server and an XQuery rewriter or endpoint that can dynamically dispatch to different databases and modules (using eval-in). The Enhanced HTTP server does this type of dispatching much more efficiently, but the concept is similar, with the same pros and cons.

      It is mostly an application and business management issue: by sharing the same port, you share the same server configuration (authentication, server settings), and the "outside world" sees only one port, so configuring port-based security on firewalls, routers, or load balancers is more difficult.


      A forest reindex timeout error may occur when there are transactions holding update locks on documents for an extended period of time. A reindexer process is started as a result of a database index change or a major MarkLogic Server upgrade.  The reindexer process will not complete until after update locks are released.

      Example error text seen in the MarkLogic Server ErrorLog.txt file:

      XDMP-FORESTERR: Error in reindex of forest Documents: SVC-EXTIME: Time limit exceeded


      Long-running transactions can occur if MarkLogic Server is participating in a distributed transaction environment. In this case, transactions are managed through a Resource Manager, and each transaction is executed as a two-phase commit: in the first phase, the transaction is prepared for a commit or a rollback; the actual commit or rollback occurs in the second phase. More details about XA transactions can be found in the Application Developer's Guide - Understanding Transactions in MarkLogic Server.

      In a situation where the Resource Manager gets disconnected between the two phases, all transactions may be left in a "prepare" state within MarkLogic Server. The Resource Manager maintains transaction information and will clean up transactions left in the "prepare" state after a successful reconnect. In the rare case where this does not happen, all transactions left in the "prepare" state will remain in the system until they are cleaned up manually. The method for manual intervention is described in the XCC Developer's Guide - Heuristically Completing a Stalled Transaction.

      For an XA transaction to take place, it must first prepare the execution for the commit. If updates are being made to pre-existing documents, update locks are held against the URIs of those documents. If reindexing occurs during this process, the reindexer will wait for these locks to be released before it can reindex those documents. Because the reindexer cannot complete while these XA transactions are pending, the hosts in the cluster are unable to finish the reindexing task and will eventually throw a timeout error.


      To avoid these kinds of reindexer timeouts, it is recommended that you check the database for outstanding XA transactions in the "prepare" state before starting a reindex. There are two ways to verify whether the database has outstanding transactions in the "prepare" state:

      • In the Admin UI, navigate to each forest of the database and review its status page; or
      • Run the following XQuery code (in Query Console):

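        xquery version "1.0-ml";
        declare namespace fo = "http://marklogic.com/xdmp/status/forest";

        (: list transaction coordinators still waiting in the "prepare" state :)
        for $f in xdmp:database-forests(xdmp:database())
        return xdmp:forest-status($f)//fo:transaction-coordinator[fo:decision-state = 'prepare']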
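If you have saved forest-status output (for example, to attach to a support ticket), the same "prepare" filter can be applied offline. The Python sketch below is a hedged illustration only: it assumes the forest-status namespace http://marklogic.com/xdmp/status/forest and the element names used in the XQuery above, and the `SAMPLE` document is a hypothetical, heavily trimmed stand-in rather than real server output.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed namespace of xdmp:forest-status output
FO_NS = "http://marklogic.com/xdmp/status/forest"

# Hypothetical, trimmed forest-status document used only for illustration
SAMPLE = """<forest-status xmlns="http://marklogic.com/xdmp/status/forest">
  <transaction-coordinators>
    <transaction-coordinator>
      <transaction-id>1111</transaction-id>
      <decision-state>prepare</decision-state>
    </transaction-coordinator>
    <transaction-coordinator>
      <transaction-id>2222</transaction-id>
      <decision-state>commit</decision-state>
    </transaction-coordinator>
  </transaction-coordinators>
</forest-status>"""


def prepared_transactions(forest_status_xml: str) -> List[str]:
    """Return the IDs of transaction coordinators whose decision state is 'prepare'."""
    root = ET.fromstring(forest_status_xml)
    ns = {"fo": FO_NS}
    ids = []
    # iter() searches at any depth, like the // step in the XQuery
    for coord in root.iter(f"{{{FO_NS}}}transaction-coordinator"):
        if coord.findtext("fo:decision-state", namespaces=ns) == "prepare":
            ids.append(coord.findtext("fo:transaction-id", default="", namespaces=ns))
    return ids
```

Applied to `SAMPLE`, only the coordinator in the "prepare" state is reported.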

      In the case where there are transactions in the "prepare" state, a roll-back can be executed:

      • In the Admin UI, click on the "rollback" link for each transaction; or
      • Run the following XQuery code (in Query Console):

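        xquery version "1.0-ml";
        declare namespace fo = "http://marklogic.com/xdmp/status/forest";

        (: roll back every transaction left in the "prepare" state:
           xdmp:xa-complete($forest, $txn-id, $commit, $remember), where $commit = false() means roll back :)
        for $f in xdmp:database-forests(xdmp:database())
        for $id in xdmp:forest-status($f)//fo:transaction-coordinator[fo:decision-state = 'prepare']/fo:transaction-id
        return xdmp:xa-complete($f, xs:unsignedLong($id), fn:false(), fn:false())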


      Query Console is an interactive web-based query development tool for writing and executing ad-hoc queries in XQuery, Server-Side JavaScript, SQL and SPARQL. Query Console enables you to quickly test code snippets, debug problems, profile queries, and run administrative XQuery scripts.  Query Console uses workspaces to assist users with organizing queries.  A user can have multiple workspaces, and each workspace can have multiple queries.


      In MarkLogic Server v9.0-11, v10.0-3, and earlier releases, users may experience a lag between pressing a key on the keyboard and the character appearing in the Query Console query window. This typically happens when one of the user's workspaces contains a large number of queries.


      A workaround to improve performance is to reduce the number of queries in each workspace.  The same number of queries can be managed by increasing the number of workspaces and reducing the number of queries in each workspace.  We suggest keeping no more than 30 queries in a workspace to avoid these latency issues.  
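As a planning aid, the sketch below splits a list of query names into groups of at most 30, following the guideline above. It is a hypothetical helper (`plan_workspaces` is not part of Query Console); it only computes a plan, and the queries still have to be moved between workspaces in Query Console itself.

```python
from typing import List

MAX_QUERIES_PER_WORKSPACE = 30  # guideline from the paragraph above


def plan_workspaces(query_names: List[str],
                    limit: int = MAX_QUERIES_PER_WORKSPACE) -> List[List[str]]:
    """Split query names into consecutive groups of at most `limit` queries."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [query_names[i:i + limit] for i in range(0, len(query_names), limit)]
```

For example, 75 queries would be spread across three workspaces holding 30, 30 and 15 queries.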

      The MarkLogic Development team is looking to improve the performance of Query Console, but at the time of this writing, this performance issue has not yet been resolved. 

      Further Reading

      Query Console User Guide


      Users of Java-based batch processing applications, such as CoRB, XQSync, mlcp, and the Hadoop connector, may have seen an error message containing "Premature EOF, partial header line read". Depending on how exceptions are handled, this may cause the Java application to exit with a stack trace, or simply to write the exception (and trace) to a log and continue.

      What does it mean?

      The premature EOF exception generally occurs when the connection to an application server is lost while the XCC driver is in the process of reading a result set. This can happen in a few scenarios:

      • The host became unavailable due to a hardware issue, segfault or similar issue;
      • The query timeout expired (although this is much more likely to yield an XDMP-EXTIME exception with a "Time limit exceeded" message);
      • Network interruption - a possible indicator of a network reliability problem such as a misconfigured load balancer or a fault in some other network hardware.

      What does the full error message look like?

      An example:

      INFO: completed 5063408/14048060, 103 tps, 32 active threads
       Feb 14, 2013 7:04:19 AM com.marklogic.developer.SimpleLogger logException
       SEVERE: fatal error
       com.marklogic.xcc.exceptions.ServerConnectionException: Error parsing HTTP
       headers: Premature EOF, partial header line read: ''
       [Session: user=admin, cb={default} [ContentSource: user=admin,
       cb={none} [provider: address=localhost/, pool=0/64]]]
       [Client: XCC/4.2-8]
       at com.marklogic.xcc.impl.SessionImpl.submitRequest(
       at Source)
       at Source)
       at java.util.concurrent.FutureTask$Sync.innerRun(
       at java.util.concurrent.Executors$
       at java.util.concurrent.FutureTask$Sync.innerRun(
       Caused by: Error parsing HTTP headers: Premature EOF,
       partial header line read: ''
       at com.marklogic.http.HttpHeaders.nextHeaderLine(
       at com.marklogic.http.HttpHeaders.parseResponseHeaders(
       at com.marklogic.http.HttpChannel.parseHeaders(
       at com.marklogic.http.HttpChannel.receiveMode(
       at com.marklogic.http.HttpChannel.getResponseCode(
       ... 11 more
       2013-02-14 07:04:19.271 WARNING [12] (AbstractRequestController.runRequest):
       Cannot obtain connection: Connection refused

      Configuration / Code: things to try when you first see this message

      One possible cause of errors like this is the JVM starting a garbage collection that takes long enough to exceed the server timeout setting. If this is the case, try adding the -XX:+UseConcMarkSweepGC Java option.

      Setting the "keep-alive" value to zero for the affected XDBC application server disables socket pooling (sockets will not be reused) and may help prevent this condition from arising. Disabling keep-alive is not expected to have a significant negative impact on performance, although thorough testing is still advised.


      Here we discuss various methods for sharing metering data with Support:  telemetry in MarkLogic 9 and exporting monitoring data.



      In MarkLogic 9, enabling telemetry collects, encrypts, packages, and sends diagnostic and system-level usage information about MarkLogic clusters, including metering, with minimal impact to performance. Telemetry sends information about your MarkLogic Servers to a protected and secure location where it can be accessed by the MarkLogic Technical Support Team to facilitate troubleshooting and monitor performance.  For more information see Telemetry.

      Meters database

      If telemetry is not enabled, make sure that monitoring history is enabled and data has been collected covering the time of the incident.  See Enabling Monitoring History on a Group for more details.

      Exporting data

      Either of the attached scripts can be used in lieu of a Meters database backup. Both provide the raw metering XML files for a defined period of time, which can be reloaded into MarkLogic and used with the standard tools.


      This XQuery export script needs to be executed in Query Console against the Meters database and will generate zip files stored in the defined folder for the defined period of time.

      Variables for start and end times, batch size, and output directory are set at the top of the script.

      The bash version uses MLCP to perform a similar export, but requires an XDBC server and an MLCP installation. By default, the script writes its output to a subdirectory called meters-export. See the attached script for details. An example command line is:

      ./ localhost admin admin "2018-04-12T00:00:00" "2018-04-14T00:00:00"

      Backup of Meters database

      A backup of the full Meters database will provide all the available raw data and is very useful, but is often very large and difficult to transfer, so an export of a defined time range is often requested.

      The best available way to export triples from MarkLogic at the moment is via the /v1/rows (POST) endpoint. You can make an HTTP POST call to this endpoint, attaching an op.fromSPARQL query that returns the desired set of triples, and select the output format of your choice (via the Accept request header or the row-format URL parameter).

      • Sample op.fromSPARQL payload:
        • op.fromSPARQL('SELECT * FROM <collection_name> WHERE {?s ?p ?o.}')
      • Sample curl POST command:
        • curl --anyauth --user <user_name>:<password> -i -X POST  -H "Content-type: application/vnd.marklogic.querydsl+javascript" -H "Accept: <output_type>" http://<host_name>:8000/v1/rows -d @./<payload_file_name>
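The same POST can be issued from Python's standard library instead of curl. This is a minimal sketch under stated assumptions: `build_rows_request` and `open_with_digest` are hypothetical helper names, the port (8000), endpoint and Content-type header are taken from the curl command above, and digest authentication is assumed to match what `--anyauth` negotiates with the default App-Services server.

```python
import urllib.request

# Sample Optic payload, mirroring the op.fromSPARQL example above
PAYLOAD = "op.fromSPARQL('SELECT * WHERE {?s ?p ?o.}')"


def build_rows_request(host: str, payload: str, accept: str = "application/json",
                       port: int = 8000) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/rows carrying an Optic DSL payload."""
    return urllib.request.Request(
        f"http://{host}:{port}/v1/rows",
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.marklogic.querydsl+javascript",
            "Accept": accept,  # e.g. application/json or text/csv
        },
        method="POST",
    )


def open_with_digest(req: urllib.request.Request, user: str, password: str):
    """Send the request using HTTP digest authentication (assumed server setting)."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, req.full_url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(mgr))
    return opener.open(req)
```

Keeping request construction separate from sending makes it easy to inspect headers before calling `open_with_digest(build_rows_request("localhost", PAYLOAD), user, password)` against a live server.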

      Alternative ways to export triples:

      • MLCP currently doesn’t offer a way to export triples but if you are okay with exporting them as XML files (through a collection name - for managed triples, graph name can be used as a collection name), you can do so by exporting those documents as files through MLCP
        • Note:  If you are working with embedded/unmanaged triples, there is a possibility of the resulting XML files consisting of XML elements that are not triples if you go with this alternative
      • You can also use the /v1/graphs endpoint to export triples but this endpoint only returns managed triples and if you need to export unmanaged triples, this is not an option
        • Note: This is not efficient in terms of performance when working with large sets of data



      Within a MarkLogic deployment, there can be multiple primary and replica objects. Those objects can be forests in a database, databases in a cluster, nodes in a cluster, and even clusters in a deployment. This article walks through several examples to clarify how all these objects hang together.

      Shared-disk vs. Local-disk failover

      Shared-disk failover requires a shared filesystem visible to all hosts in a cluster, and involves one copy of a data forest, managed by either its primary host, or its failover host (so forest1, assigned to host1, failing over to host2).

      Local-disk failover involves two copies of data in a primary and local disk failover replica forest (sometimes referred to as an "LDF"). Primary hosts manage primary forests, and failover hosts manage the corresponding synchronized LDF (forest1 on host1, failing over to replicaForest1 on host2).

      Database Replication

      In the same way that you can have multiple copies of data within a cluster (as seen in local-disk failover), you can also have multiple copies of data across clusters as seen in either database replication or flexible replication. Within a replicated environment you'll often see reference to primary/replica databases or primary/replica clusters. So this will often look like forest1 on host1 in cluster1, replicating to forest1 on host1 in cluster2. We can shorten forest names here to c1.h1.f1 and c2.h1.f1. Note that you can have both local disk failover and database replication going at the same time - so on your primary cluster, you'll have c1.h1.f1, as well as failover forest c1.h2.rf1, and your replica cluster will have c2.h1.f1, as well as its own failover forest c2.h2.rf1. All of these forest copies should be synchronized both within a cluster (c1.h1.f1 synced to c1.h2.rf1) and across clusters (c1.h1.f1 synced to c2.h1.f1).

      Configured/Intended vs. Acting

      At this point we've got two clusters, each with at least two nodes, where each node has at least one forest - so four forest copies, total (bear in mind that databases can have dozens or even hundreds of forests - each with their own failover and replication copies). The "configured" or "intended" arrangement is what your deployment looks like by design, when no failover or any other kind of events have occurred that would require one of the other forest copies to serve as the primary forest. Should a failover event occur, c1.h2.rf1 will transition from the intended LDF to the acting primary, and its host c1.h2 will transition from the intended failover host to the acting primary host. At this point, the intended primary forest c1.h1.f1 and its intended primary host c1.h1 will likely both be offline. Failing back is the process of reverting hosts and forests from their acting arrangement (in this case, acting primary forest c1.h2.rf1 and acting primary host c1.h2), back to their intended arrangement (c1.h1.f1 is both intended and acting primary forest, c1.h1 is both intended and acting primary host).
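The intended-versus-acting transitions above can be modelled as a tiny state machine for one primary/LDF pair. This is a conceptual sketch, not MarkLogic behaviour code: `ForestPair` and its methods are hypothetical, and re-synchronization is modelled as an explicit step even though how it happens depends on the deployment.

```python
from dataclasses import dataclass


@dataclass
class ForestPair:
    """Hypothetical model of an intended primary forest and its local-disk failover copy."""
    primary: str = "c1.h1.f1"    # intended primary forest
    replica: str = "c1.h2.rf1"   # intended LDF (replica)
    acting: str = "c1.h1.f1"     # forest currently acting as primary
    primary_online: bool = True
    primary_in_sync: bool = True

    def fail_over(self) -> None:
        # the intended primary goes offline and the LDF becomes acting primary
        self.primary_online = False
        self.primary_in_sync = False  # it will miss updates while offline
        self.acting = self.replica

    def primary_returns(self) -> None:
        # the host is back, but its forest is behind the acting primary
        self.primary_online = True

    def resync(self) -> None:
        # re-synchronization step (mechanism depends on the deployment)
        if self.primary_online:
            self.primary_in_sync = True

    def fail_back(self) -> None:
        # manual step: only safe once the intended primary is re-synchronized
        if not (self.primary_online and self.primary_in_sync):
            raise RuntimeError("re-synchronize the intended primary before failing back")
        self.acting = self.primary
```

The guard in `fail_back` reflects the warning that promoting a previously offline copy before re-synchronization can lose updates committed to the acting primary.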

      This distinction between intended vs. acting can even occur at the cluster level, where c1 is the intended/acting primary cluster, and c2 is the intended/acting replica cluster. Should something happen to your primary cluster c1, the intended replica cluster c2 will transition to the acting primary cluster while c1 is offline.


      • It's possible to have multiple copies of your data in a MarkLogic Server deployment
      • Under normal operations, these copies are synchronized with one another
      • Should failover events occur in a cluster, or catastrophic events occur to an entire cluster, you can shift traffic to the available previously synchronized copies
      • Failing back to your intended configuration is a manual operation
        • Make sure to re-synchronize copies that were offline with online copies
        • Shifting previously offline copies to acting primary before re-synchronization may result in data loss, as offline forests can overwrite updates previously committed to forests serving as acting primaries while the intended primary forests were offline


      To avoid index bloat, MarkLogic records word positions only once, in the word-query field. When word positions are needed to accurately match element-word queries, they are normally taken from the word-query field. When elements are excluded from the word-query field, the words under those elements are not indexed there, so their positions are not recorded. MarkLogic 7.0-5 and 8.0-1 included a code change to avoid false negatives when an element-word query expected positions for words in elements descended from excluded elements: positions from the word-query field are no longer used for element-word searches if the word-query field has exclusions.


      Unfortunately, this solution can sometimes result in false positives, as captured in 7.0-5 bug #33207 and 8.0-1 bug #32686 (you can read more about both of these bugs in our Fixed Bugs Report). Consequently, a follow-up refinement was shipped in 7.0-5.1 and 8.0-2 to allow the affected queries to be fully resolvable via indexes. To take advantage of this update, three changes are required:

      1) Upgrade to 7.0-5.1 or later, or 8.0-2 or later

      2) Database index settings must be updated to tell MarkLogic Server to use positions in this scenario and therefore avoid the previously seen false positives. There are two changes that could be made. Either:

      2a. The element in the element-word query must be explicitly included in the word-query field; or

      2b. All the word-query excluded elements must be configured as phrase-around elements.

      3) After the relevant database index settings are updated and the upgrade has been applied, a reindex must be performed

      If these changes are made, positions in the word-query field should then be used, which should then ultimately result in the elimination of false positives.
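The three required changes can be expressed as a checklist function that reports which steps are still outstanding. This is a hedged sketch: `remaining_steps`, `version_has_fix` and the version-string parsing are hypothetical helpers, and treating every major release after 8 as already containing the refinement is an assumption.

```python
from typing import Iterable, List, Tuple


def parse_version(version: str) -> Tuple[int, int, Tuple[int, ...]]:
    """Parse strings such as '7.0-5.1' or '8.0-2' into (major, minor, patch parts)."""
    base, _, patch = version.partition("-")
    major, minor = (int(x) for x in base.split(".")[:2])
    patch_parts = tuple(int(x) for x in patch.split(".")) if patch else (0,)
    return major, minor, patch_parts


def version_has_fix(version: str) -> bool:
    major, _minor, patch = parse_version(version)
    if major == 7:
        return patch >= (5, 1)
    if major == 8:
        return patch >= (2,)
    return major > 8  # assumption: later major releases carry the refinement


def remaining_steps(version: str, element: str, included: Iterable[str],
                    excluded: Iterable[str], phrase_around: Iterable[str],
                    reindexed: bool) -> List[str]:
    """List the steps (1, 2a/2b, 3 above) that are still missing."""
    steps = []
    if not version_has_fix(version):
        steps.append("upgrade to 7.0-5.1+ or 8.0-2+")
    # step 2: either 2a (element explicitly included) or 2b (all exclusions phrase-around)
    if not (element in set(included) or set(excluded) <= set(phrase_around)):
        steps.append("include the element in the word-query field, "
                     "or make every excluded element phrase-around")
    if not reindexed:
        steps.append("reindex the database")
    return steps
```

An empty result means all three conditions described above are met.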


      A "fast data directory" is configurable for each forest and can be set to a directory on a fast file system, such as one backed by SSDs (refer to Using a mix of SSD and spinning drives). When it is configured, MarkLogic Server tries to direct as many writes and seeks as it can to the Fast Data Directory (FDD), and therefore tries to place as many on-disk stands as possible on the FDD. Frequently updated documents tend to reside in smaller stands and are therefore more likely to reside on the FDD.

      This article attempts to explain how you should account for the FDD when sizing disk space for your MarkLogic Server.


      Forest journals will be placed on the fast data directory. 

      Each time an automatic merge is performed, MarkLogic Server will attempt to save the results onto the forest's fast data directory. If there is not sufficient space on the FDD, MarkLogic Server will use the forest's primary data directory. To preserve space for future small stands, MarkLogic Server is conservative in deciding whether to put merge destination stands on the FDD, which means that even if there is enough available space, it may store the result in the forest's regular data directory. For more details, refer to the MarkLogic white paper on the fundamentals of resource consumption.

      It is also important to know when the Fast Data Directory is not used. Stands created by manually triggered merges are not stored on the fast data directory, but in the forest's primary data directory; manual merges can be executed by calling the xdmp:merge function or from within the Admin UI. Forest migrate operations and backup restores also do not put stands in the fast data directory.
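The placement rules above can be summarized in a small decision function. This is a sketch only: MarkLogic's actual heuristic is not described here, so the `reserve_fraction` parameter (how much FDD free space is kept back for future small stands) is a hypothetical stand-in for its conservative behaviour, and `stand_destination` is an invented name.

```python
from enum import Enum


class StandOrigin(Enum):
    AUTO_MERGE = "automatic merge"
    MANUAL_MERGE = "manual merge (xdmp:merge or Admin UI)"
    FOREST_MIGRATE = "forest migrate"
    RESTORE = "backup restore"


def stand_destination(origin: StandOrigin, stand_bytes: int, fdd_free_bytes: int,
                      reserve_fraction: float = 0.5) -> str:
    """Return 'fdd' or 'primary' for a new stand (simplified model).

    reserve_fraction is hypothetical: the stand goes to the FDD only if it
    fits while leaving that fraction of the FDD's free space untouched.
    """
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    if origin is not StandOrigin.AUTO_MERGE:
        return "primary"  # manual merges, migrations and restores bypass the FDD
    if stand_bytes <= fdd_free_bytes * (1 - reserve_fraction):
        return "fdd"
    return "primary"  # conservative: may skip the FDD even when the stand would fit
```

The last case illustrates why the FDD's capacity should not be counted when sizing disk space for forest data: a stand can land on the primary data directory even when the FDD technically has room.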


      MarkLogic Server reserves some disk space in the FDD for checkpoints and journaling. However, since the Fast Data Directory is not used in some procedures, you should not count the size of the FDD when sizing the disk space needed for forest data.


      The Performance Considerations section of the Loading Content Into MarkLogic Server documentation states:

      "When you load content, MarkLogic Server performs updates transactionally, locking documents as needed and saving the content to disk in the journal before the transaction commits. By default, all documents are locked during an update and the journal is set to preserve committed transactions, even if the MarkLogic Server process ends unexpectedly."

      There are two types of locking which are specified at the database level:

      • Fast locking employs a hashed locking scheme (based on the URI) where each fragment URI has a designated forest, so the lock created during the insert is restricted only to that forest.
      • Setting up a database with "strict" locking will force the coordination of an update lock across all forests in the database (and across the cluster) until the insert has taken place.

      Fast locking has been the default setting for newly created MarkLogic databases since MarkLogic 5 (released October 2011).

      When should I use strict locking?

      If at any point in your code you specify the forest into which a document or fragment is inserted (a technique commonly referred to as in-forest evaluation), setting that database to "strict" locking is definitely the safest choice. If your code always allows the server to determine the target forest for the document or fragment, you're perfectly safe using fast locking.

      In the situation where two different people create the same document (with the same URI) and where fast locking was taking place, this would result in:

      • A transaction culminating in an insert into a given forest (as assigned by the ML node servicing the request) for the first fragment
      • An "update" transaction (in the same forest) where the first fragment is then marked as deleted
      • A new fragment takes the place of the first fragment to complete the second transaction

      Subsequent merges would then remove the stand entry for the first fragment (now deleted/replaced by the subsequent transaction).

      The fast option would not create a dangerous race condition unless your application would allow two different people to insert a document with the same URI into two different forests as two separate transactions and where URI assignment is handled by your XQuery/application layer; if the code responsible for making those transactions were to inadvertently assign the same URI to two different forests in a cluster, this could cause a problem that strict locking would guard against. If your application always allows MarkLogic to assign the forest for the document, there is no danger whatsoever in keeping to the server default of "fast" locking.
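The race described above can be illustrated with a toy model of a three-forest database. This is a hedged sketch: the SHA-256-based `server_assigned_forest` is a hypothetical stand-in for server-side assignment (MarkLogic's real assignment policy differs), and `Cluster` simply records which forests hold each URI. When the server picks the forest, repeated inserts of one URI always land in the same forest and replace each other; when application code picks forests (in-forest evaluation), the same URI can end up in two forests, which is the scenario the text says strict locking guards against.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Optional, Sequence

FORESTS = ["forest-1", "forest-2", "forest-3"]


def server_assigned_forest(uri: str, forests: Sequence[str] = FORESTS) -> str:
    """Hypothetical stand-in: a deterministic hash of the URI selects one forest."""
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    return forests[int.from_bytes(digest[:8], "big") % len(forests)]


class Cluster:
    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, str]] = defaultdict(dict)  # forest -> {uri: content}

    def insert(self, uri: str, content: str, forest: Optional[str] = None) -> None:
        # forest=None models letting the server choose; a value models in-forest evaluation
        target = forest or server_assigned_forest(uri)
        self.docs[target][uri] = content  # same forest: the later insert replaces the earlier

    def copies(self, uri: str) -> List[str]:
        """Forests that currently hold a document with this URI."""
        return [f for f, d in self.docs.items() if uri in d]
```

Inserting `/a.xml` twice without choosing a forest leaves one copy; inserting `/b.xml` into `forest-1` and then `forest-2` leaves two.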

      Additionally, consider what kind of failover your system is using. When using fast journaling with local-disk replication, the journal disk write needs to fail on both the master and replica nodes for data loss to occur, so there's no need for strict journaling in this scenario. In contrast, strict journaling should be used with shared-disk failover, as data loss is possible with fast journaling if a single node fails before the OS flushes the buffer to disk.

      Is there a performance implication in switching to strict locking?

      Fast locking will be faster than strict locking, but the performance penalty is largely going to be dependent on a number of factors; the number of forests in a given database, the number of nodes across which the database forests are spread and the speed at which all nodes in the cluster can coordinate a transaction across the cluster (Network/IO) will all have some (potentially minimal) impact.

      If the conditions of your application suit, we recommend staying with the default of fast locking on all your databases.

      There may be reasons for using 'strict' locking - especially if you are considering loading documents using in-forest-evaluation in your code.



      There are situations where the SVC-DIRREM, SVC-DIROPEN and SVC-FILRD errors occur on backups to an NFS mounted drive. This article explains how this condition can occur and describes a number of recommendations to avoid such errors.

      Under normal operating conditions, with proper mount options for a remote drive, MarkLogic Server is not expected to report SVC-xxxx errors. Most likely, these errors are the result of improper NFS mounting or other I/O issues.

      We will begin by exploring methods to narrow down the server which has the disk issue and then list some things to look into in order to identify the cause.

      Error Log and Sys Log Observation

      The following errors are typical MarkLogic Error Log entries seen during an NFS Backup that indicate an IO subsystem error.   The System Log files may include similar messages.

              Error: SVC-DIRREM: Directory removal error: rmdir '/Backup/directory/path': {OS level error message}

              Error: SVC-DIROPEN: Directory open error: opendir '/Backup/directory/path': {OS level error message}

              Error: Backup of forest 'forest-name' to 'Backup path' SVC-FILRD: File read error: open '/Backup/directory/path': {OS level error message}

      These SVC- error messages include the {OS level error message} retrieved from the underlying OS platform using the standard C library strerror() function. These messages are typically something like "Stale NFS file handle" or "No such file or directory".
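When scanning many log files (from several hosts), it helps to pull the SVC code, the failing call, the path and the OS message out of each line. The Python sketch below is a hedged illustration: the regular expression is written against the three message shapes shown above and may need adjusting for other formats, and `parse_svc_error` and `summarize` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Matches e.g. "SVC-DIRREM: Directory removal error: rmdir '/path': Stale NFS file handle"
SVC_RE = re.compile(
    r"(?P<code>SVC-[A-Z]+): (?P<desc>[^:']+): (?P<call>\w+) '(?P<path>[^']*)': (?P<os_msg>.+)$"
)


class SvcError(NamedTuple):
    code: str
    description: str
    call: str
    path: str
    os_message: str


def parse_svc_error(line: str) -> Optional[SvcError]:
    """Extract the SVC code, system call, path and OS message from one log line."""
    m = SVC_RE.search(line)
    if m is None:
        return None
    return SvcError(m["code"], m["desc"].strip(), m["call"], m["path"], m["os_msg"].strip())


def summarize(lines: Iterable[str]) -> Counter:
    """Count (SVC code, OS message) pairs, e.g. to spot hosts with stale file handles."""
    return Counter((e.code, e.os_message) for e in map(parse_svc_error, lines) if e)
```

Running `summarize` separately over each host's ErrorLog.txt is one way to see whether only a subset of hosts is affected.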

      If only a subset of hosts in the cluster is generating these types of errors ...

      You should compare the problem host's NFS configuration with the rest of the hosts in the cluster to make sure all of the configurations are consistent.

      • Compare nfs versions (rpm -qa | grep -i nfs)
      • Compare nfs configurations (mount -l -t nfs, cat /etc/mtab, nfsstat)
      • Compare platform version (uname -mrs, lsb_release -a) 
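Comparing mount options by eye across many hosts is error-prone. The sketch below assumes you have collected one `mount -l -t nfs` line per host for the same mount point (for example, `nas:/backup on /Backup type nfs (rw,hard,...)`); it treats options present on a majority of hosts as the baseline and reports, per host, which baseline options are missing and which extra options appear. The majority rule and the helper names are assumptions for illustration, and lines with a trailing filesystem label would need extra parsing.

```python
import re
from collections import Counter
from typing import Dict, Set


def mount_options(mount_line: str) -> Set[str]:
    """Return the comma-separated options inside the trailing parentheses."""
    m = re.search(r"\(([^)]*)\)\s*$", mount_line)
    if not m:
        raise ValueError(f"no option list found in: {mount_line!r}")
    return {opt.strip() for opt in m.group(1).split(",") if opt.strip()}


def option_differences(lines_by_host: Dict[str, str]) -> Dict[str, Dict[str, Set[str]]]:
    """Report options that differ from the majority baseline, per host."""
    opts = {host: mount_options(line) for host, line in lines_by_host.items()}
    counts = Counter(o for s in opts.values() for o in s)
    baseline = {o for o, n in counts.items() if n > len(opts) / 2}
    report = {}
    for host, s in opts.items():
        missing, extra = baseline - s, s - baseline
        if missing or extra:
            report[host] = {"missing": missing, "extra": extra}
    return report
```

With two hosts there is no majority for a differing option, so the difference shows up as an "extra" on whichever host has it.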

      NFS mount options 

      MarkLogic recommends the following NFS mount options: 'rw,bg,hard,nointr,noac,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0'

      • Vers=3 :  Must have NFS client version v3 or above
      • TCP : NFS must be configured to use TCP instead of default UDP
      • NOAC : To improve performance, NFS clients cache file attributes. Every few seconds, an NFS client checks the server's version of each file's attributes for updates. Changes that occur on the server in those small intervals remain undetected until the client checks the server again. The noac option prevents clients from caching file attributes so that applications can more quickly detect file changes on the server.
        • In addition to preventing the client from caching file attributes, the noac option forces application writes to become synchronous so that local changes to a file become visible on the server immediately. That way, other clients can quickly detect recent writes when they check the file's attributes.
        • Using the noac option provides greater cache coherence among NFS clients accessing the same files, but it exacts a significant performance penalty. As such, judicious use of file locking is encouraged instead. The DATA AND METADATA COHERENCE section of the nfs(5) man page contains a detailed discussion of these trade-offs.
        • NOTE: The noac option is a combination of the generic option sync, and the NFS-specific option actimeo=0.
      • ACTIMEO=0 : Using actimeo sets all of acregmin, acregmax, acdirmin, and acdirmax to the same value, here "0". If this option is not specified, the NFS client uses the default value for each of these options.
      • NOINTR : Selects whether to allow signals to interrupt file operations on this mount point. If neither option is specified (or if nointr is specified), signals do not interrupt NFS file operations. If intr is specified, system calls return EINTR if an in-progress NFS operation is interrupted by a signal.
        • Using the intr option is preferred to using the soft option because it is significantly less likely to result in data corruption.
        • The intr / nointr mount option is deprecated after kernel 2.6.25. Only SIGKILL can interrupt a pending NFS operation on these kernels, and if specified, this mount option is ignored to provide backwards compatibility with older kernels.
      • BG : If the bg option is specified, a timeout or failure causes the mount command to fork a child which continues to attempt to mount the export. The parent immediately returns with a zero exit code. This is known as a "background" mount.
      • HARD (vs soft) : Determines the recovery behavior of the NFS client after an NFS request times out. If neither option is specified (or if the hard option is specified), NFS requests are retried indefinitely. If the soft option is specified, then the NFS client fails an NFS request after retrans retransmissions have been sent, causing the NFS client to return an error to the calling application.
        • Note: A so-called "soft" timeout can cause silent data corruption in certain cases. As such, use the soft option only when client responsiveness is more important than data integrity. Using NFS over TCP or increasing the value of the retrans option may mitigate some of the risks of using the soft option. 
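A quick way to audit an fstab-style option string against the recommended list is sketched below. It is a hedged helper, not a MarkLogic tool: `check_against_recommended` and the `CONFLICTS` table (hard/soft, nointr/intr, tcp/udp) are assumptions drawn from the option descriptions above. It compares option strings literally, so options as reported back by the kernel (for example `proto=tcp` instead of `tcp`) would be flagged even when they are equivalent.

```python
from typing import Dict, List, Optional

RECOMMENDED = "rw,bg,hard,nointr,noac,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0"

# recommended option -> the option that contradicts it
CONFLICTS = {"hard": "soft", "nointr": "intr", "tcp": "udp"}


def parse_opts(options: str) -> Dict[str, Optional[str]]:
    """Parse 'a,b=1' into {'a': None, 'b': '1'}."""
    out: Dict[str, Optional[str]] = {}
    for item in options.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else None
    return out


def check_against_recommended(configured: str, recommended: str = RECOMMENDED) -> List[str]:
    """Return human-readable problems; an empty list means the options match."""
    want, have = parse_opts(recommended), parse_opts(configured)
    problems = []
    for key, value in want.items():
        if key not in have:
            problems.append(f"missing {key}" + (f"={value}" if value is not None else ""))
        elif have[key] != value:
            problems.append(f"{key}={have[key]} (recommended {value})")
    for good, bad in CONFLICTS.items():
        if bad in have:
            problems.append(f"{bad} conflicts with recommended {good}")
    return problems
```

For example, a mount using `soft` and `timeo=600` is reported as missing `hard` and `nointr`, with a `timeo` mismatch and a `soft` conflict.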

      Issue persists => Further debugging 

      If the issue persists after you have checked the NFS configuration and implemented the MarkLogic recommended NFS mount settings, you will need to debug the NFS connection while the issue is occurring. Enable rpcdebug for NFS on the hosts showing the NFS errors, and then analyze the resulting syslogs from a period in which the issue occurs:

              rpcdebug -m nfs -s all

       The resulting logs may give you additional information to help understand what the source of the failures are.



      It has long been possible to store binary files in MarkLogic. In the MarkLogic 5 release in 2011, binary support was enhanced to allow for even more control over binary files.

      The purpose of this Knowledgebase article is not to cover MarkLogic's binary support in depth but to demonstrate a technique for retrieving a list of URIs for binary files which are managed in a MarkLogic Database.

      Retrieving a list of binary document URIs from MarkLogic Server

      You can use a call to cts:uris to get back a list of all URIs pointing to binary documents in a given MarkLogic database. Note that this requires the URI lexicon to be enabled on that database.
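      A minimal sketch of such a call is shown below. It assumes MarkLogic 9 or later, where cts:document-format-query() is available, and a database with the URI lexicon enabled. On earlier releases you would instead have to filter the output of cts:uris() by testing each document for a binary() node, which is considerably slower.

      ```xquery
      xquery version "1.0-ml";

      (: List the URIs of every binary document in the current database.
         Assumes MarkLogic 9 or later (cts:document-format-query) and that
         the URI lexicon is enabled; cts:uris() is resolved from that lexicon. :)
      cts:uris((), (), cts:document-format-query("binary"))
      ```
      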

      Further reading

      People often want fine-grained entitlement control in the applications they build on top of MarkLogic Server. This article discusses two options and their performance implications.

      Best Practice

      Often, we'll see people attempt an implementation using MarkLogic users and roles. While MarkLogic Server can easily handle a large number of roles in total, you'll run into scalability and performance issues if you have a large number of roles per user. Additionally, you'll want to minimize the number of updates to documents in your Security database as every update requires Security caches to be re-validated, thus incurring a performance penalty.

      Instead, for a more scalable and performant solution, you will want to build your entitlements into your documents at the application level, then query those entitlement values with element range indexes on the elements containing those entitlement values.
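      As a sketch of this approach, suppose each document carries one or more hypothetical <entitlement> elements, and a string element range index on entitlement has been configured with the database's default collation. The application then combines the user's search with a range query over the entitlement values it has resolved for that user; how it resolves them (the $user-entitlements values below) is an assumption outside the scope of this example.

      ```xquery
      xquery version "1.0-ml";

      (: Hypothetical: entitlement values the application has resolved for
         the current user, e.g. from its own user profile store. :)
      declare variable $user-entitlements as xs:string+ := ("finance", "emea");

      (: Return the first 10 documents that match the search term AND carry
         at least one of the user's entitlements. With several values, the
         "=" operator matches any of them. Requires a string element range
         index on <entitlement> using the default collation. :)
      cts:search(fn:doc(),
        cts:and-query((
          cts:word-query("quarterly report"),
          cts:element-range-query(xs:QName("entitlement"), "=", $user-entitlements)
        )))[1 to 10]
      ```

      Because the entitlement check is just another term in the query, it is resolved from the range index alongside the rest of the search rather than by the Security database.
      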


      When attempting to start MarkLogic Server on older versions of Linux (unsupported platforms), a "Floating Point Exception" may prevent the server from starting.

      Example of the error text from system messages:

      kernel: MarkLogic[29472] trap divide error rip:2ae0d9eaa80f rsp:7fffd8ae7690 error:0


      Older Linux releases will, by default, use older libraries. When a software product such as MarkLogic Server is built using a newer version of gcc, it may fail to execute correctly on older systems. We have seen this in cases where the glibc library is out of date and does not contain certain symbols that were added in newer versions. Refer to the Red Hat bug that explains this issue:

      The recommended solution is to upgrade to a newer version of your Linux distribution.  While you may be able to resolve the immediate issue by only upgrading the glibc library, it is not recommended.


      Attached to this article is an XQuery module, "appserver-status.xqy", which generates a report on all requests currently "in-flight" across all application servers in your cluster.


      Run this module in Query Console (be sure to display the results as HTML output). It generates an HTML table showing all requests currently "in-flight" across all application servers in your cluster. For any transaction taking over 60 seconds, it provides extra detail to help identify bottlenecks where specific modules (or tasks) may be having an adverse effect on the overall performance of the cluster.

      The information generated by this module can be used in conjunction with any ticket opened with the support team where assistance is required to better understand and resolve performance issues relating to specific modules. This module could also be used in a situation where DBAs want to perform routine health checks on their cluster to find and identify slow running queries.
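      The attached module is not reproduced here, but as a minimal sketch of the underlying idea, xdmp:server-status() can be called for every app server on every host to list requests that have been running for more than 60 seconds. This assumes the user has the privileges needed to read server status (for example, the admin role), and that each in-flight request appears as a request-status element with request-id, elapsed-time and request-text children in the server status namespace.

      ```xquery
      xquery version "1.0-ml";

      declare namespace ss = "http://marklogic.com/xdmp/status/server";

      (: For each host, inspect the app servers of that host's group and
         report every in-flight request whose elapsed time exceeds 60 s. :)
      for $host in xdmp:hosts()
      for $server in xdmp:group-servers(xdmp:host-group($host))
      for $request in xdmp:server-status($host, $server)//ss:request-status
      where xs:dayTimeDuration($request/ss:elapsed-time) gt xs:dayTimeDuration("PT60S")
      return fn:string-join((
        xdmp:host-name($host),
        xdmp:server-name($server),
        fn:string($request/ss:request-id),
        fn:string($request/ss:elapsed-time),
        fn:string($request/ss:request-text)), " | ")
      ```
      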


      At the time of this writing (MarkLogic 9), MarkLogic Server cannot perform spherical queries, as the geospatial indexes do not support a true 3D coordinate system.  In situations where cylindrical queries are sufficient, you can create a 2D geospatial index and a separate range index on an altitude value. An "and-query" with these indexes would result in a cylindrical query.


      Consider the following sample document structure:

      Configure these 2 indexes for your content database:

      1. Geospatial element pair index with latitude localname ‘lat’, longitude localname ‘long’ and parent localname ‘location’
      2. Element range index with localname ‘alt’ and scalar type int

      Assuming you have data in your content database matching the above document structure, a query that combines these two indexes in a cts:and-query will return all documents whose location falls within the cylinder centered at 37.655983, -122.425525 with a radius of 1000 miles, and whose altitude is less than 4 miles.
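      A minimal sketch of such a query follows. It assumes documents in which a <location> element holds <lat> and <long> children and an <alt> element holds the altitude in miles, matching the two index definitions above. cts:circle radii are in miles by default.

      ```xquery
      xquery version "1.0-ml";

      (: Cylinder = a 2D circle on the geospatial element pair index AND an
         altitude bound on the int range index. Element names follow the
         index definitions above. :)
      cts:search(fn:doc(),
        cts:and-query((
          cts:element-pair-geospatial-query(
            xs:QName("location"), xs:QName("lat"), xs:QName("long"),
            cts:circle(1000, cts:point(37.655983, -122.425525))),
          (: xs:int matches the scalar type of the 'alt' range index :)
          cts:element-range-query(xs:QName("alt"), "<", xs:int(4))
        )))
      ```
      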

      Note that in MarkLogic Server 9 geospatial region match was introduced, so the above technique can be extended beyond cylinders.


      The MarkLogic Monitoring History dashboard (http://localhost:8002/history/) is probably the easiest way to gather monitoring history data, but almost all of the information available in the monitoring dashboard is also available over our REST APIs:

      Application Server Status details

      Information on Application Servers is available from the /manage/v2/servers endpoint. Here's an example for getting detailed metrics: http://localhost:8002/manage/v2/servers?group-id=Default&view=metrics&format=xml

      For Application Server status information, here's an example with detailed status: http://localhost:8002/manage/v2/servers?view=status&group-id=Default&format=xml&fullrefs=true

      To get the current status of a specific Application Server (for example, the TaskServer), add its name to the URI: http://localhost:8002/manage/v2/servers/TaskServer?group-id=Default&view=status&format=xml

      You can also get the configuration properties of a given application server (for example, "Admin") over the REST API: http://localhost:8002/manage/v2/servers/Admin/properties?group-id=Default&format=xml

      Database and Forest status details

      For databases and forests, you can similarly use the endpoints for /databases or /forests:

      Database level examples include:

      Forest level examples include:
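      The endpoints above can also be called from XQuery with xdmp:http-get(). The sketch below is illustrative rather than a reference: it targets localhost:8002 as in the URLs above, uses digest authentication (the Management API's default), uses placeholder credentials, and assumes a database and forest both named Documents, as in a default installation.

      ```xquery
      xquery version "1.0-ml";

      (: Placeholder credentials: substitute a user with the manage-user role. :)
      declare variable $auth-options :=
        <options xmlns="xdmp:http">
          <authentication method="digest">
            <username>monitor-user</username>
            <password>monitor-password</password>
          </authentication>
        </options>;

      (: Fetch one Management API resource and return the response body;
         xdmp:http-get returns the response header element, then the body. :)
      declare function local:manage-get($path as xs:string) as node()?
      {
        let $response := xdmp:http-get(fn:concat("http://localhost:8002", $path), $auth-options)
        return $response[2]
      };

      (
        local:manage-get("/manage/v2/databases/Documents?view=status&amp;format=xml"),
        local:manage-get("/manage/v2/forests/Documents?view=status&amp;format=xml")
      )
      ```

      Note that "&" inside an XQuery string literal must be written as "&amp;".
      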

      MarkLogic default Group Level Cache and Huge Pages settings

      The table below shows the default (and recommended) group level cache settings based on a few common RAM configurations for the 9.0-9.1 release of MarkLogic Server:

      Total RAM (MB) | List Cache (MB) | Compressed Tree Cache (MB) | Expanded Tree Cache (MB) | Triple Cache (MB) | Triple Value Cache (MB) | Default Huge Page Ranges
      8192 (8GB) | 1024 (1 partition) | 512 (1 partition) | 1024 (1 partition) | 512 (1 partition) | 1024 (2 partitions) | 1280 to 1994
      16384 (16GB) | 2048 (1 partition) | 1024 (2 partitions) | 2048 (1 partition) | 1024 (2 partitions) | 2048 (2 partitions) | 2560 to 3616
      24576 (24GB) | 3072 (1 partition) | 1536 (2 partitions) | 3072 (1 partition) | 1536 (2 partitions) | 3072 (4 partitions) | 3840 to 4896
      32768 (32GB) | 4096 (2 partitions) | 2048 (3 partitions) | 4096 (2 partitions) | 2048 (3 partitions) | 4096 (6 partitions) | 5120 to 6176
      49152 (48GB) | 6144 (2 partitions) | 3072 (4 partitions) | 6144 (2 partitions) | 3072 (4 partitions) | 6144 (8 partitions) | 7680 to 8736
      65536 (64GB) | 8064 (3 partitions) | 4032 (6 partitions) | 8064 (3 partitions) | 4096 (6 partitions) | 8192 (11 partitions) | 10080 to 11136
      98304 (96GB) | 12160 (4 partitions) | 6080 (8 partitions) | 12160 (4 partitions) | 6144 (8 partitions) | 12160 (16 partitions) | 15200 to 16256
      131072 (128GB) | 16384 (6 partitions) | 8192 (11 partitions) | 16384 (6 partitions) | 8192 (11 partitions) | 16384 (22 partitions) | 20480 to 21020
      147456 (144GB) | 18432 (6 partitions) | 9216 (12 partitions) | 18432 (6 partitions) | 9216 (12 partitions) | 18432 (24 partitions) | 23040 to 24096
      262144 (256GB) | 32768 (9 partitions) | 16384 (11 partitions) | 32768 (9 partitions) | 16128 (22 partitions) | 32256 (32 partitions) | 40320 to 42432
      524288 (512GB) | 65536 (22 partitions) | 32768 (32 partitions) | 65536 (32 partitions) | 32768 (32 partitions) | 65536 (32 partitions) | 81920 to 82460

      Note that these values are safe to use for MarkLogic 7 and above.

      For all the databases that ship with MarkLogic Server, the Huge Pages ranges in this table will cover the out-of-the-box configuration. Note that adding more forests will cause the second value in the range to increase.

      From MarkLogic Server 9.0-7 and above

      In the 9.0-7 release and above (and all versions of MarkLogic 10), automatic cache sizing was introduced; this setting is usually recommended.

      Note: For RAM size greater than 256GB, group cache settings are configured the same as for 256GB with automatic cache sizing. These can be changed using manual cache sizing.

      Maximum group level cache settings

      Assuming a server configured with 256GB of RAM or more, these are the maximum sizes for the three main group level caches, which together use 180GB (184320MB) per host:

      • Expanded Tree Cache - 73728 (72GB) (with 9 8GB partitions)
      • List Cache - 73728 (72GB) (with 9 8GB partitions)
      • Compressed Tree Cache - 36864 (36GB) (with 11 partitions of roughly 3.3GB each)

      We have found that configuring 4GB partitions for the Expanded Tree Cache and the List Cache generally works well; for this you would set the number of partitions to 18.

      For the Compressed Tree Cache, the number of partitions can be set to 22.

      Important note

      The maximum number of configurable partitions is 32.

      Each cache partition should be no more than 8192 MB.
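      The partition arithmetic above can be checked with a small sketch. The function name local:check-cache is hypothetical; it simply divides a cache size by its partition count and tests the two limits stated in this note (at most 32 partitions, at most 8192 MB per partition).

      ```xquery
      xquery version "1.0-ml";

      (: Per-partition size for a cache, checked against the limits above. :)
      declare function local:check-cache(
        $name as xs:string, $size-mb as xs:integer, $partitions as xs:integer
      ) as xs:string
      {
        let $per-partition := $size-mb div $partitions
        let $ok := $partitions le 32 and $per-partition le 8192
        return fn:concat($name, ": ", fn:round-half-to-even($per-partition, 1),
          " MB per partition", if ($ok) then " (within limits)" else " (exceeds limits)")
      };

      (
        local:check-cache("Expanded Tree Cache", 73728, 9),    (: 8192 MB :)
        local:check-cache("Expanded Tree Cache", 73728, 18),   (: 4096 MB :)
        local:check-cache("List Cache", 73728, 18),            (: 4096 MB :)
        local:check-cache("Compressed Tree Cache", 36864, 22)  (: about 1675.6 MB :)
      )
      ```
      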


      Understanding the timeout and time-limit configuration options that MarkLogic offers is important when working with queries. Some of these options are configured at the group level and others at the app server level. The documentation (links at the end) has an extensive list of these options and what each one is used for; in this article, we discuss two important ones, retry timeout and default time limit, and how they interact.

      Quick Overview

      The retry-timeout is the time, in seconds, before MarkLogic Server stops retrying a request, whereas the default-time-limit is the default time limit for any request when no other limit is specified.

      A deeper dive

      To elaborate, the "retry-timeout" (a group-level setting) is the total time the server will spend waiting to retry, not the time spent executing the request itself. So if a request fails within one millisecond with a retryable error, and retries happen roughly every 2 seconds, the server retries roughly 90 times with the default retry-timeout of 180 seconds (90 * 2 secs = 180 secs). The "default-time-limit" (an app-server-level setting), on the other hand, applies to each attempt: every time the request is retried, its request time is reset to 0 secs, and that attempt times out if it runs longer than the default-time-limit. If the attempt instead fails with another retryable error, the server waits around 2 secs, retries, and the request time is reset to 0 secs once more.

      The above behavior is better explained in a sequence of events listed below:

      -> start request time

           -> request time is set to 0 (with a limit of "default time limit" value)

      -> request fails with retryable error

      -> wait some time t1 before retrying (which is usually 2sec, in general)

      -> retry

           -> request time is reset to 0

      -> request fails with retryable error

      -> wait some time t2 before retrying

      -> retry

           -> request time is reset to 0

      -> request fails with retryable error

      -> wait some time t3 before retrying

      -> retry

           -> request time is reset to 0

      -> request succeeds

      -> end request time

      where the "retry-timeout" value limits all retry waits combined (t1 + t2 + t3) and the "default-time-limit" value limits each individual attempt.

      For instance, if the retry-timeout is 180 secs and the default-time-limit is 120 secs, the request can keep retrying until the 180 sec retry timeout is reached, because the 120 sec limit applies to each attempt and does not affect the retry timeout. However, any single attempt (including the original request) will time out at 120 secs.
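      The two limits can be written out as a worked equation using the symbols from the sequence above, where t_i is the wait before retry i, a_j is the duration of attempt j (j = 0 being the original request), and each wait is taken as roughly 2 seconds, as stated in the text:

      ```latex
      \sum_{i=1}^{n} t_i \le \text{retry-timeout}, \qquad
      a_j \le \text{default-time-limit} \quad (j = 0, 1, \dots, n)

      n \approx \frac{\text{retry-timeout}}{t_{\text{wait}}}
        = \frac{180\ \text{s}}{2\ \text{s}} = 90
      ```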

      Further reading


      MarkLogic Server has a notion of groups, which are sets of similarly configured hosts within a cluster.

      Application servers (and their respective ports) are scoped to their parent group.

      Therefore, you need to make sure that the host and its exposed port to which you're trying to connect both exist in the group where the appropriate application server is defined. For example, if you attempt to connect to a host defined in a group made up of d-nodes, you'll only see application servers and ports defined in the d-nodes group. If the application server you actually want is in a different group (say, e-nodes), you'll get a connection error, instead.


      Can I use any xdmp builtins to show which application servers are linked to particular groups?

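      Yes. xdmp:groups() returns the IDs of the groups in the cluster, and xdmp:group-servers() returns the IDs of the app servers in a given group; xdmp:group-name() and xdmp:server-name() turn those IDs into names.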
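      To also see the port each app server listens on, which is what matters for the connection errors described above, a minimal sketch using the Admin API (admin:group-get-appserver-ids() and admin:appserver-get-port()) could look like this. It only reads the configuration and never saves it.

      ```xquery
      xquery version "1.0-ml";

      import module namespace admin = "http://marklogic.com/xdmp/admin"
        at "/MarkLogic/admin.xqy";

      (: Report every group with the name and port of each app server
         defined in it. Read-only: the configuration is not saved. :)
      let $config := admin:get-configuration()
      for $group in admin:get-group-ids($config)
      return
        <group name="{admin:group-get-name($config, $group)}">{
          for $server in admin:group-get-appserver-ids($config, $group)
          return
            <appserver name="{admin:appserver-get-name($config, $server)}"
                       port="{admin:appserver-get-port($config, $server)}"/>
        }</group>
      ```
      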


      The errors 'XDMP-MODNOTFOUND - Module not found' and 'XDMP-NOPROGRAM - Server unable to build program from request' may occur when the requested module does not exist or the user does not have the right permissions on the module.


      When either of these errors is encountered, the first step is to check whether the requested XQuery/JavaScript module is actually present in the modules database. Make sure the document URI matches the 'root' of the relevant app server.

      The 'modules' field of the app server configuration specifies the database in which the app server locates application code (unless it is set to 'File-system'). When it is set to a specific database, only documents in that database whose URIs begin with the app server's 'root' directory are executable. For example, if the app server's 'root' is set to "/codebase/xquery/", only documents in the database whose URIs start with "/codebase/xquery/" are executable.

      If it is set to 'File-system', make sure the requested module exists in the location specified in the 'root' directory of the app server.

      Defining a 'File-system' location is common on single-node DEV systems but is not recommended in a clustered environment. To keep code deployment simple, use a modules database in clustered production systems.

      Once you have made sure that the module exists, the next step is to check whether the user has the right permissions to execute the module. More often than not, the error is caused by a permissions issue.

      (i) Check app-server privileges

      The 'privilege' field in the app server configuration, when set, specifies the execute privilege required to access the server. Only users who are assigned this privilege can access the server and the application code. Absence of this privilege may cause the XDMP-NOPROGRAM error.

      Make sure the user accessing the app server has the specified privilege. This can be checked with sec:user-privileges() (run against the Security database).

      The documentation here - contains more detailed information about privileges.

      (ii) Check permission on the requested module

      The user trying to access the application code/modules must have the 'execute' permission on the module. Make sure all XQuery documents grant 'read' and 'execute' permissions to a role held by the user trying to access them. You can verify this by calling xdmp:document-get-permissions() on the module URI against your modules database.
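      For example, a minimal sketch of that check, using a placeholder module URI under the /codebase/xquery/ root mentioned earlier:

      ```xquery
      xquery version "1.0-ml";

      (: Run against the modules database used by the app server.
         The URI is a placeholder for the module that fails to load. :)
      xdmp:document-get-permissions("/codebase/xquery/main.xqy")
      ```
      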


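      This returns a list of permissions on the document, showing the capability granted to each role, in the format below:

                    <sec:permission xmlns:sec="http://marklogic.com/xdmp/security">
                      <sec:capability>read</sec:capability>
                      <sec:role-id>…</sec:role-id>
                    </sec:permission>
                    <sec:permission xmlns:sec="http://marklogic.com/xdmp/security">
                      <sec:capability>execute</sec:capability>
                      <sec:role-id>…</sec:role-id>
                    </sec:permission>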

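      You can then map the role IDs to their role names as below (run this against the Security database):

                    import module namespace sec = "http://marklogic.com/xdmp/security" at "/MarkLogic/security.xqy";

                    (: placeholder IDs: use the sec:role-id values returned above :)
                    sec:get-role-names((xs:unsignedLong(1234567890), xs:unsignedLong(9876543210)))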

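      If you see that the module does not have execute permission for the user's role, the required permissions can be added as below (run this against your modules database; the module URI and role name are placeholders):

                   xdmp:document-add-permissions("/codebase/xquery/main.xqy",
                     (xdmp:permission("role-name", "read"),
                      xdmp:permission("role-name", "execute")))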








      Recent exploits in the TLS protocol, such as POODLE, FREAK, LogJam, and SLOTH, have rendered TLSv1.0, TLSv1.1 and SSLv3 largely obsolete.  Additionally, standards councils such as PCI (Payment Card Industry) and NIST (National Institute of Standards & Technology) are moving to disallow the use of these protocols.

      This article will describe the MarkLogic configuration changes needed to harden a MarkLogic HTTP Application Server so that only secure versions of TLSv1.2 are used and where clients attempting to connect with TLSv1.1 or earlier protocols are rejected.


      The TLS protocol versions accepted and the Cipher suites selected are controlled by the specification list set in the "SSL Ciphers" field on the HTTP App Server Configuration panel:

      The format of the specification list follows the OpenSSL format described in the OpenSSL cipher suite documentation and comprises one or more colon-separated (":") cipher strings, which control which cipher suites are enabled or disabled.

      The default specification used by MarkLogic enables ALL ciphers except those considered to provide LOW encryption, and orders them by @STRENGTH:

      ALL:!LOW:@STRENGTH

      While sufficient for many needs, the default settings still allow the negotiation of ciphers that are no longer considered secure, as well as weak signature algorithms such as MD2 and MD5. A cipher specification that permits only high-strength ciphers and disables weak or vulnerable ones enhances security.
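      The cipher specification can be set in the Admin Interface or with the Admin API. The sketch below uses admin:appserver-set-ssl-ciphers() on an app server named ssl-project-appserver in the Default group (the name used in the protocol examples in this article). The string HIGH:!aNULL:!MD5 is a common OpenSSL example of a high-strength-only specification, offered as an illustration rather than as the exact string this article recommends.

      ```xquery
      xquery version "1.0-ml";

      import module namespace admin = "http://marklogic.com/xdmp/admin"
        at "/MarkLogic/admin.xqy";

      (: Illustrative OpenSSL cipher specification: high-strength suites
         only, excluding anonymous (aNULL) suites and suites using MD5. :)
      declare variable $ciphers := "HIGH:!aNULL:!MD5";

      let $config := admin:get-configuration()
      let $appserver := admin:appserver-get-id($config,
        admin:group-get-id($config, "Default"), "ssl-project-appserver")
      return admin:save-configuration(
        admin:appserver-set-ssl-ciphers($config, $appserver, $ciphers))
      ```
      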


      For sites requiring even higher levels of security, restricting the ciphers available to a specific list can provide a more advanced level of control of the ciphers used. For example, the following cipher suite list restricts algorithms to those used by TLSv1.2 only. You should therefore disable all other TLS protocols as described below before using this setting in MarkLogic.


      The following string restricts the algorithms available by only permitting TLSv1.2 ciphers using a 256bit key. However, while this increases security even further, it is at risk of being incompatible with many browsers and applications and should only be used after thorough testing.


      At this stage, although the MarkLogic HTTP Application Server is now using more robust security, it will still permit a client to connect using TLSv1.0 or TLSv1.1. To comply with PCI DSS 3.2 and other recommended security standards, compliant sites must stop using TLSv1.0 before 30th June 2018, while NIST SP 800-52 requires that sites use TLSv1.1 at a minimum, with a recommendation to use TLSv1.2 where possible.

      Note: Since this article was written, MarkLogic Server has added an Admin API function to disable individual SSL and TLS protocol versions. If you are still running MarkLogic 8.0-5 or earlier, you can continue to use the solution outlined below. Otherwise, users of MarkLogic 9 or later should use admin:appserver-set-ssl-disabled-protocols() to control which SSL and TLS protocol versions are available. The following XQuery code, when run against the Security database, will disable all but TLSv1.2 on MarkLogic 9 or later.

      xquery version "1.0-ml";

      import module namespace admin = "http://marklogic.com/xdmp/admin"
            at "/MarkLogic/admin.xqy";

      (: Disable SSLv3, TLSv1.0 and TLSv1.1 so that only TLSv1.2 remains :)
      let $config := admin:get-configuration()
      let $appServer := admin:appserver-get-id($config,
          admin:group-get-id($config, "Default"), "ssl-project-appserver")
      return admin:save-configuration(
          admin:appserver-set-ssl-disabled-protocols($config, $appServer, ("SSLv3", "TLSv1", "TLSv1_1")))

      Run the following XQuery code to confirm that the weaker TLS protocols have been disabled:

      xquery version "1.0-ml";

      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";
      let $config := admin:get-configuration()
      let $appServer := admin:appserver-get-id($config, admin:group-get-id($config, "Default"), "ssl-project-appserver")
      return admin:appserver-get-ssl-disabled-protocols($config, $appServer)



      Warning: Disabling all but TLSv1.2 and restricting available ciphers may break connectivity with applications and browsers configured to use SSLv3, TLSv1.0 or TLSv1.1. MarkLogic recommends that you test thoroughly in a lower QA environment before disabling any algorithms and protocols in a production environment.

      HTTP Strict-Transport-Security

      The HTTP Strict-Transport-Security response header (often abbreviated as HSTS) informs browsers that the site should only be accessed using HTTPS and that any future attempts to access it using HTTP should automatically be converted to HTTPS.

      Set the "enable hsts header" to True to enable HSTS for the AppServer when dictated by your Security requirements.

      TLSv1.2 and browser support (MarkLogic 8 only)

      For TLSv1.2, older browsers should be upgraded to current versions.

      These changes may require users accessing your application to upgrade older browsers such as Firefox < 27.0 or Internet Explorer < 11.0, as these versions do not support TLSv1.2 by default.

      The MarkLogic App Server utilizes OpenSSL, which does not explicitly support enabling or disabling a specific TLS protocol version. However, you effectively get the same outcome by disabling all cipher suites associated with a particular version.

      SSLv3, TLSv1.0 and TLSv1.1 share the same common ciphers, so adding "!SSLv3" and "!TLSv1.0" to the cipher specification will cause all client connection attempts using any of these protocols, including TLSv1.1, to fail.


      Testing using the OpenSSL s_client utility shows that attempts to connect using TLSv1.0 fail with SSL alert 40, indicating no common cipher was available.

      openssl s_client -connect <host>:<port> -debug -tls1
      140735283961936:error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:s3_pkt.c:1472:SSL alert number 40
      140735283961936:error:1409E0E5:SSL routines:ssl3_write_bytes:ssl handshake failure:s3_pkt.c:656:

      Connecting using TLSv1.2, by contrast, succeeds:

      openssl s_client -connect <host>:<port> -debug -tls1_2
      New, TLSv1/SSLv3, Cipher is AES256-GCM-SHA384
      Server public key is 2048 bit
      Secure Renegotiation IS supported
      Compression: NONE
      Expansion: NONE
      No ALPN negotiated
      Protocol : TLSv1.2
      Cipher : AES256-GCM-SHA384

      Further reading

      On MarkLogic Security Certification

      How does MarkLogic Server's high-availability work in AWS?

      AWS provides fault tolerance within a geographic region through Availability Zones (AZs), while MarkLogic provides it through Local Disk Failover (LDF). If you're using AWS, the best practice is to place each MarkLogic node/EC2 instance in a different Availability Zone within a single region, so that a given data forest is in one AZ (AZ A) while its LDF forest is in a different AZ (AZ B). This way, if access to Availability Zone A is lost, the forests on the host in Availability Zone A fail over to their LDF forests on the host in Availability Zone B, preserving high availability within your MarkLogic cluster.

      Further reading:

      Should failover be configured for the Security forest?

      A cluster is not functional without its Security database. Consequently, it’s important to ensure high-availability of the Security database’s forest by configuring failover for that forest.
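      As a minimal sketch of configuring Local Disk Failover for the Security forest with the Admin API, the example below creates a replica forest on a second host and attaches it as the Security forest's replica. The replica forest name Security-replica and the host name host-b.example.com are hypothetical; in AWS, that host would ideally sit in a different Availability Zone from the one holding the Security forest.

      ```xquery
      xquery version "1.0-ml";

      import module namespace admin = "http://marklogic.com/xdmp/admin"
        at "/MarkLogic/admin.xqy";

      (: Hypothetical names: replica forest "Security-replica" on host
         "host-b.example.com", which must not be the Security forest's host. :)
      let $config := admin:get-configuration()
      let $config := admin:forest-create($config, "Security-replica",
        xdmp:host("host-b.example.com"), ())
      let $replica := admin:forest-get-id($config, "Security-replica")
      let $config := admin:forest-add-replica($config, xdmp:forest("Security"), $replica)
      return admin:save-configuration($config)
      ```
      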

      Further reading:

      Should my forests have more than one Local Disk Failover forest?

      High-availability through Local Disk Failover with one LDF forest is designed to allow the cluster to survive the failure of a single host. If you're using AWS, careful forest placement across AWS availability zones can provide high-availability even in the event of an entire availability zone going down. With rare exceptions, additional LDF forests are typically not worth the additional complexity and cost for the vast majority of MarkLogic deployments.

      If you configure Local Disk Failover with one LDF forest, coupled with Database Replication and backups, you will have enough copies of your data to survive anything from the failure of a single host to the loss of an entire availability zone.

      Do I still have high-availability post failover? What happens to the data forest? How can I fail back my forests to the way they were?

      When a failover event occurs, the LDF forest takes over as the acting data forest, and the configured data forest assumes the role of acting LDF forest as soon as it has successfully restarted. At this point, as long as both forests are still available, the cluster remains highly available, but with the forests reversing their originally intended roles. To fail the forests back into their originally intended roles, you will need to wait until the acting data forest (the originally intended LDF) and the acting LDF (the originally intended data forest) are synchronized, then manually restart the acting data forest/intended LDF. At that point, the acting LDF/intended data forest “fails back” to take over its original role of acting data forest, and the acting data forest/intended LDF once again assumes its original role of acting LDF. In short, failover is automatic, but failing back requires a manual restart of the acting data forest/intended LDF. When failing back, it's very important to wait until the forests are synchronized: if you fail back before they are fully synchronized, you'll lose any data in the acting data forest that has yet to be propagated back to the acting LDF/intended data forest.
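      A minimal sketch of checking synchronization before failing back follows. The forest names Documents (the configured data forest) and Documents-replica (its LDF forest) are hypothetical, and the sketch assumes the acting LDF reports the forest state "sync replicating" once it is synchronized with the acting data forest.

      ```xquery
      xquery version "1.0-ml";

      declare namespace fs = "http://marklogic.com/xdmp/status/forest";

      (: Hypothetical names: "Documents" is the configured data forest and
         "Documents-replica" its LDF forest. Before failing back, the acting
         LDF (here the intended data forest) should show "sync replicating". :)
      for $name in ("Documents", "Documents-replica")
      return fn:concat($name, ": ", xdmp:forest-status(xdmp:forest($name))/fs:state)

      (: Once synchronized, fail back by restarting the acting data forest
         (the intended LDF), for example:
         xdmp:forest-restart(xdmp:forest("Documents-replica")) :)
      ```
      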

      Further reading:

      Where does the hostname come from?

      • If the MARKLOGIC_HOSTNAME environment variable is set, its value is used as the hostname
      • If no such environment variable is configured, the gethostname() library function is called (rather than the gethostname() system call, since we use the GNU C Library - see notes here for more info), which internally calls the uname() function
        • uname() returns the nodename, which may or may not contain a '.' (you can see the same value by running the uname --nodename command in a terminal)
          • If the nodename contains a '.', we consider it a complete name and use it as the hostname
          • Otherwise, we take the first domain or search entry from resolv.conf and append it to the nodename, separated by a '.', and use the result as the hostname
            • E.g.: <uname_output>.<resolv.conf_entry>
            • Note: resolv.conf can contain both a domain and a search entry; the domain entry usually takes priority over search


      If you experience a hostname mismatch or any hostname issue in general, you can check the following:

      • The following commands/functions are different ways to return the hostname (and you can verify if there is a mismatch)
        • Functions:
        • Commands:
          • hostname
          • hostname -f  (returns the FQDN, including the '.')
          • hostname -d  (returns the DNS domain name)
      • Check the resolv.conf file (under /etc) to see if it contains the right domain/search entries
        • If it does and the issue still persists, restarting MarkLogic Server may help, because if MarkLogic is getting the hostname from this file, it does so at startup
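      From inside MarkLogic, you can also compare the machine's host name with the name under which the host is registered in the cluster configuration. The sketch below assumes that xdmp:hostname() reflects the operating-system host name as MarkLogic Server resolved it and that xdmp:host-name() returns the configured cluster host name; a difference between the two points to the kind of mismatch described here.

      ```xquery
      xquery version "1.0-ml";

      (: Compare the machine's host name as seen by MarkLogic Server with
         the name this host is registered under in the cluster. :)
      let $os-name := xdmp:hostname()
      let $cluster-name := xdmp:host-name(xdmp:host())
      return
        <hostname-check match="{$os-name eq $cluster-name}">
          <os-hostname>{$os-name}</os-hostname>
          <cluster-host-name>{$cluster-name}</cluster-host-name>
        </hostname-check>
      ```
      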

      Note: When you want to open a support ticket in this context, attaching the above information (the outputs of the above commands/functions and the contents of resolv.conf file) along with it would help speed up the investigation

      Possible issues with hostname mismatch:

      Introduction: getting more information about the bugs fixed between releases

      As a general recommendation, we encourage customers to keep the server up to date with patch releases in any case.

      If you would like a list of some of the published bugs that were addressed between two releases of the server (for example: 5.0-3 and 5.0-4.1), you can perform the following steps:

      - Log into the support portal at
      - Click on the "Fixed bugs" icon to take you to the bugtrack list
      - Select 5.0-3 in the From: dropdown box
      - Select 5.0-4.1 in the To: dropdown box
      - Click 'Show' to generate an HTML table or View PDF to export the results in a PDF document

      Step one: login

      Provide your credentials and use the form on the left-hand side to log in to access the support portal

      Log into the support portal

      Step two: select the "Fixed bugs" link from the icons on the page

      Select 'Fixed Bugs' to go to the bugtrack list

      Step three: select the release 'range' from the two dropdown lists on the Fixed Bugs page

      Use the Show button to update the page or download the list in PDF format as required

      Select the versions from the 'From' and 'To' lists to generate the report


      In Amazon Web Services, AMI IDs are unique to each region. There will be many cases when you want to use multiple regions (for example, maintaining two clusters in separate geographical regions). Below is an example of how to find the list of current AMIs.

      Log in to Amazon Web Services

      Example image showing the AWS Login Page

      Find your MarkLogic instance on Amazon AWS Marketplace

      Example image showing the MarkLogic 8 HVM in Amazon's Marketplace

      For example:

      Click continue

      Example Continue button

      View the table

      Choose the version of MarkLogic Server that you're planning to use from the version dropdown.

      Image of a table showing all AMI IDs available for this item in the AWS Marketplace

      You will see a table containing a list of all current regions and the corresponding AMI ID for our instances for each available region.

      Further reading


      MarkLogic Server has several different features that can help manage data across multiple database instances. Those features differ from each other in several important ways - this article will focus on high-level distinctions and will provide pointers to other materials to help you decide which of these features could work best for your particular use case.


      Backup/Restore - database backup and restore operations in MarkLogic Server provide consistent database-level views of your data. Propagating data from one instance to another via backup/restore involves a MarkLogic administrator using a completed backup from the source instance as the restore archive on the destination instance. You can read more about Backup/Restore here:
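      A minimal sketch of the backup side of this process follows. The database name Documents and the backup directory /space/backups/Documents are hypothetical. xdmp:database-backup() starts an asynchronous backup of the listed forests and returns a job ID.

      ```xquery
      xquery version "1.0-ml";

      (: Hypothetical database name and backup directory. Backs up all
         forests of "Documents" and returns the backup job ID. :)
      xdmp:database-backup(
        xdmp:database-forests(xdmp:database("Documents")),
        "/space/backups/Documents")
      ```

      On the destination instance, the administrator would then restore from the completed backup archive, for example with xdmp:database-restore().
      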

      Flexible Replication - can be used to maintain copies of data on multiple MarkLogic Servers. Unlike backup/restore (which relies on taking a consistent, database-level view of the data at a particular timestamp), Flexible Replication creates a copy of a document in another database and keeps that copy in sync (possibly with some time-lag/latency) with the original in the course of normal operations. You can read more about Flexible Replication here: Do note that:

      • Flexible Replication is asynchronous. Asynchronous Replication refers to a configuration in which the Master does not wait for confirmation that the update has been received by the Replica before sending further updates.
      • Flexible Replication does not use the same transaction boundaries on the replica as on the master. For example, 10 documents might be inserted in a single transaction on a Flexible Replication master. Those 10 documents will eventually be inserted on a Flexible Replication replica, but there is no guarantee that the replica instance will also use a single transaction to do so.

      Database Replication - is used to maintain copies of data on multiple MarkLogic Servers. Database Replication creates a copy of a document in another database and keeps that copy in sync (possibly with some time-lag/latency) with the original in the course of normal operations. You can read more about Database Replication here:

      Note that:

      a. Database Replication is, like Flexible Replication, asynchronous.

      b. In contrast to Flexible Replication, Database Replication operates by copying journal frames from the Master database and replaying the transactions described by those journal frames on the foreign Replica database.
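      The difference in transaction boundaries can be illustrated with a toy Python model. It does not use any MarkLogic API; it simply assumes that Database Replication replays each master transaction as a unit, while Flexible Replication may apply the same documents in replica-side batches of arbitrary size (here a hypothetical batch_size of 4). Each list shows how many of the ten documents a reader on the replica could see after each replica-side commit.

```python
# Toy model only: no MarkLogic API is used, and real replication is far richer.
from typing import Dict, List

Snapshot = Dict[str, str]  # uri -> document content


def replay_whole_transactions(transactions: List[Snapshot]) -> List[Snapshot]:
    """Database Replication style (simplified): each master transaction, as
    described by its journal frames, is replayed as one unit on the replica."""
    states: List[Snapshot] = []
    current: Snapshot = {}
    for txn in transactions:
        current = {**current, **txn}
        states.append(dict(current))
    return states


def replay_per_document(transactions: List[Snapshot], batch_size: int) -> List[Snapshot]:
    """Flexible Replication style (simplified): the same documents arrive, but
    in replica-side batches that need not match the master's transactions."""
    docs = [item for txn in transactions for item in txn.items()]
    states: List[Snapshot] = []
    current: Snapshot = {}
    for start in range(0, len(docs), batch_size):
        current = {**current, **dict(docs[start:start + batch_size])}
        states.append(dict(current))
    return states


# The master inserts ten documents in a single transaction.
master = [{f"/doc{i}.xml": "v1" for i in range(10)}]
print([len(s) for s in replay_whole_transactions(master)])  # [10]
print([len(s) for s in replay_per_document(master, 4)])     # [4, 8, 10]
```

      With per-document replay, the replica briefly exposes states (4 or 8 documents) that never existed on the master.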

      XA Transactions - MarkLogic Server can participate in distributed transactions by acting as a Resource Manager in an XA/JTA transaction. If there are multiple MarkLogic Server instances participating as XA resources in a given XA transaction, then it's possible to use that XA transaction as a synchronized means of replicating data across those multiple MarkLogic instances. You can read more about XA Transactions in MarkLogic Server here:


      Upgrading individual MarkLogic instances and clusters is generally straightforward and requires very little downtime. In most cases, shutting down the MarkLogic instance on each host in turn, uninstalling the current release, installing the updated release and restarting each MarkLogic instance is all you need to do.

      However, unanticipated problems do sometimes come to light. The purpose of this Knowledgebase article is to offer practical advice on the steps you can take to ensure the process goes as smoothly as possible; this is particularly important if you're planning an upgrade between major releases of the product.


      While the steps outlined under the process heading below offer practical advice as to what to do to ensure your data is safeguarded (by recommending that backups are taken prior to upgrading), another very useful step would be to ensure you have your current configuration files backed up.

      Each host in a MarkLogic cluster is configured using parameters which are stored in XML Documents that are available on each host. These are usually relatively small files and will zip up to a manageable size.

      If you cd to your "Data" directory (on Linux this is /var/opt/MarkLogic; on Windows this is C:\Program Files\MarkLogic\Data; and on OS X this is /Users/{username}/Library/Application Support/MarkLogic), you should see several XML files (assignments, clusters, databases, groups, hosts, server).

      Whenever MarkLogic updates any of these files, it creates a backup using the same naming convention used for older ErrorLog files (_1, _2, etc.). We recommend backing up all configuration files before following the steps under the next heading.
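      A minimal Python sketch of that backup step is shown below. It assumes the Linux default Data directory and assumes that the rotated copies match a <name>*.xml pattern (for example databases_1.xml); adjust both for your platform. The function name backup_config is hypothetical.

```python
# Sketch: archive a host's MarkLogic configuration files before an upgrade.
import zipfile
from pathlib import Path
from typing import List

# Configuration files listed above (assignments.xml, clusters.xml, ...).
CONFIG_NAMES = ["assignments", "clusters", "databases", "groups", "hosts", "server"]


def backup_config(data_dir: str, archive_path: str) -> List[str]:
    """Zip every <name>*.xml file found in data_dir and return the file names.

    The <name>*.xml pattern is an assumption meant to pick up MarkLogic's own
    rotated copies (e.g. databases_1.xml) along with the current files."""
    added: List[str] = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in CONFIG_NAMES:
            for path in sorted(Path(data_dir).glob(f"{name}*.xml")):
                zf.write(path, arcname=path.name)
                added.append(path.name)
    return added


# Example (Linux default Data directory; run as a user allowed to read it):
# backup_config("/var/opt/MarkLogic", "/tmp/marklogic-config-backup.zip")
```

      Run it on every host in the cluster, since each host holds its own copy of these files.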


      1) Take a backup for each database in your cluster

      2) Turn reindexing off for each database in your cluster

      3) Starting with the node hosting your Security and Schemas forests, uninstall the current MarkLogic maintenance release, then install the latest maintenance release of that feature release (for example, if you're currently running version 10.0-2, you'll want to update to the latest available MarkLogic 10 maintenance release; at the time of this writing, that is 10.0-4).

      4) Start up the host in your cluster hosting your Security and Schemas forests, then the remaining hosts in the cluster.

      5) Access the Admin UI on the node hosting your Security and Schemas forests and accept the license agreement, either for just that host (Accept button) or for all of the hosts in the cluster (Accept for Cluster button). If you choose the Accept for Cluster button, a summary screen appears showing all of the hosts in the cluster. Click the Accept for Cluster button to confirm acceptance (all of the hosts must be started in order to accept for the cluster). If you accepted the license for just the one host, you must go to the Admin Interface of each of the other hosts and accept the license there before that host can operate.

      6) If you're upgrading across feature releases, repeat steps 3-5 until you reach the desired feature and maintenance release on your cluster (for example, to upgrade from MarkLogic 8 to MarkLogic 10, after installing 8.0-latest you'll repeat steps 3-5 for 9.0-latest and then for 10.0-latest).

      7) After you've finished upgrading across all the relevant feature releases, re-enable reindexing for each database in your cluster.

      For more details, see the section “Upgrading a Cluster to a New Maintenance Release of MarkLogic Server” in the “Scalability, Availability, and Failover Guide”.
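      The order of installs implied by steps 3-6 can be sketched as a small Python helper. The function name upgrade_path and the "<major>.0-latest" placeholder are hypothetical; look up the actual latest maintenance release of each feature release before upgrading.

```python
# Sketch: the order of installs implied by steps 3-6 above.
from typing import List


def upgrade_path(current_version: str, target_major: int) -> List[str]:
    """Return the releases to install, in order.

    Versions look like "8.0-5"; "<major>.0-latest" is a placeholder for the
    newest maintenance release of each feature release at upgrade time."""
    current_major = int(current_version.split(".")[0])
    if target_major < current_major:
        raise ValueError("target release is older than the current release")
    # Step 3 brings the current feature release up to its latest maintenance
    # release; step 6 repeats that for each later feature release in turn.
    return [f"{major}.0-latest" for major in range(current_major, target_major + 1)]


print(upgrade_path("8.0-5", 10))   # ['8.0-latest', '9.0-latest', '10.0-latest']
print(upgrade_path("10.0-2", 10))  # ['10.0-latest']
```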

      If you've got database replication in place across both a master and replica cluster, then be aware that:

      1) You do not need to break replication between the clusters

      2) You should plan to upgrade both the master cluster and replica cluster. If you upgrade just the master, connectivity between the two clusters will stop due to different XDQP versions. 

      3) If the Security database isn't replicated, then there shouldn't be anything special you need to do other than upgrade the two clusters.

      4) If the Security database is replicated, do the following:

      • Upgrade the Replica cluster and run the upgrade scripts. This will update the Replica's Security database to indicate that it is current. It will also do any necessary configuration upgrades.
      • Upgrade the Master cluster and run the upgrade scripts. This will update the Master's Security database to indicate that it is current. It will also do any necessary configuration upgrades.

      For more details, see Updating Clusters Configured with Database Replication.

      Back-out Plan

      MarkLogic does not support restoring a backup made on a newer version of MarkLogic Server onto an older version of MarkLogic Server. Your back-out plan will need to take this into consideration.

      See the section below for recommendations on how this should be handled.

      Further reading

      Backing out of your upgrade: steps to ensure you can downgrade in an emergency

      Product release notes

      The "Upgrade Support" section of the release notes.

      All known incompatibilities between releases

      The "Upgrading from previous releases" section of the documentation

      MarkLogic Support Fixed Bug List


      spell:suggest() and spell:suggest-detailed() aren't simply looking for character differences between the provided strings and the strings in your dictionaries; they also factor in differences between the phonetic representations of those strings.


      There is an undocumented option that can be passed along to increase the phonetic-distance threshold (which is 1, by default). For example, consider the following:

      xquery version "1.0-ml";

      spell:suggest-detailed(
        ('customDictionary.xml'),
        'acknowledgment',
        <options xmlns="">
          <phonetic-distance>2</phonetic-distance>
        </options>
      )

      This returns:

      <spell:suggestion original="acknowledgment" xmlns:spell="">
        <spell:word distance="9" key-distance="2" word-distance="45" levenshtein-distance="1">acknowledgement</spell:word>
      </spell:suggestion>

      Note that the option "distance-threshold" corresponds to "distance" in the result, and "phonetic-distance" corresponds to "key-distance."

      Also note that increasing the phonetic-distance may cause spell:suggest() and spell:suggest-detailed() to use significantly more CPU. Metaphone keys are short, so a larger distance may match a very large fraction of the dictionary, and each of those matches then needs to be checked by the distance algorithms.
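      To see why short keys make a larger threshold expensive, the Python sketch below uses plain Levenshtein distance as a stand-in for MarkLogic's scoring (an assumption; the server's metaphone and distance code is not shown). Hypothetical three-letter keys over a six-letter alphabet play the role of metaphone keys, and fraction_within reports what share of all such keys falls within a given distance.

```python
# Illustration only: plain Levenshtein distance, not MarkLogic's scoring code.
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def fraction_within(key: str, alphabet: str, max_distance: int) -> float:
    """Share of every same-length key over `alphabet` within max_distance of key."""
    keys = ["".join(p) for p in product(alphabet, repeat=len(key))]
    close = sum(1 for k in keys if levenshtein(key, k) <= max_distance)
    return close / len(keys)


print(levenshtein("acknowledgment", "acknowledgement"))  # 1
for d in (1, 2):
    print(d, fraction_within("AKN", "AKNLJT", d))
```

      The first line matches the levenshtein-distance="1" attribute in the result above; the loop shows the candidate set growing sharply as the threshold rises from 1 to 2.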


      A database consists of one or more forests. A forest is a collection of documents (mostly XML trees, thus the name), implemented as a physical directory on disk. Each forest holds a set of documents and all their indexes. 

      When a new document is loaded into MarkLogic Server, the server puts this document in an in-memory stand and writes the action to an on-disk journal to maintain transactional integrity in case of system failure. After enough documents are loaded, the in-memory stand fills up and is flushed to disk, written out as an on-disk stand. As more documents are loaded, they go into a new in-memory stand. At some point this in-memory stand fills up as well and gets written out as yet another on-disk stand.

      To read a single term list, MarkLogic must read the term list data from each individual stand and unify the results. To keep the number of stands at a manageable level where that unification isn't a performance concern, MarkLogic runs merges in the background. A merge takes some of the stands on disk and creates a single new stand out of them, coalescing and optimizing the indexes and data, as well as removing any previously deleted fragments.

      Each forest has its own in-memory stand and set of on-disk stands. Loading and indexing content is a largely parallelizable activity, so splitting the loading effort across forests, and potentially across machines in a cluster, can help scale the ingestion work.
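      The stand lifecycle described above can be sketched as a toy Python model of a single forest. The in_memory_limit value is illustrative only (real in-memory stand sizes come from database settings), and fragments are represented by their URIs for simplicity.

```python
# Toy model of one forest: an in-memory stand, on-disk stands and merges.
# Sizes are illustrative; real stands are governed by database settings.
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Forest:
    in_memory_limit: int                                      # fragments before a flush
    in_memory: List[str] = field(default_factory=list)
    on_disk: List[List[str]] = field(default_factory=list)    # one list per on-disk stand
    deleted: Set[str] = field(default_factory=set)            # marked, not yet removed

    def insert(self, uri: str) -> None:
        self.in_memory.append(uri)
        if len(self.in_memory) >= self.in_memory_limit:
            self.flush()

    def flush(self) -> None:
        """Write the full in-memory stand out as a new on-disk stand."""
        if self.in_memory:
            self.on_disk.append(self.in_memory)
            self.in_memory = []

    def delete(self, uri: str) -> None:
        """Only mark the fragment; it stays on disk until a merge skips it."""
        self.deleted.add(uri)

    def merge(self) -> None:
        """Coalesce all on-disk stands into one, dropping deleted fragments."""
        if not self.on_disk:
            return
        fragments = [uri for stand in self.on_disk for uri in stand]
        self.on_disk = [[uri for uri in fragments if uri not in self.deleted]]
        self.deleted -= set(fragments)  # those fragments are now physically gone


forest = Forest(in_memory_limit=3)
for i in range(7):
    forest.insert(f"/doc{i}.xml")
print(len(forest.on_disk), len(forest.in_memory))  # 2 1
forest.delete("/doc1.xml")
forest.merge()
print(len(forest.on_disk), len(forest.on_disk[0]))  # 1 5
```

      A query against this forest would have to consult every stand in on_disk plus the in-memory stand, which is why merging fewer, larger stands keeps term-list unification cheap.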

      Deletions and Multi-Version Concurrency Control (MVCC)

      What happens if you delete or change a document? If you delete a document, MarkLogic marks the document as deleted but does not immediately remove it from disk. The deleted document will be removed from query results based on its deletion markings, and the next merge of the stand holding the document will bypass the deleted document when writing the new stand. MarkLogic treats any changed document like a new document, and treats the old version like a deleted document.

      This approach is known in database circles as Multi-Version Concurrency Control (MVCC).
      In an MVCC system, changes are tracked with a timestamp number that increments for each transaction as the database changes. Each fragment gets its own creation-time (the timestamp at which it was created) and deletion-time (the timestamp at which it was marked as deleted, starting at infinity for fragments not yet deleted).

      For a request that doesn't modify data the system gets a performance boost by skipping the need for any URI locking. The query is viewed as running at a certain timestamp, and throughout its life it sees a consistent view of the database at that timestamp, even as other (update) requests continue forward and change the data.
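      The visibility rule behind this consistent view can be sketched in a few lines of Python. This is a model of the rule as described here, not MarkLogic code, and the timestamps 100 and 105 are made-up values.

```python
# Sketch of the MVCC visibility rule described above; not MarkLogic code.
import math
from dataclasses import dataclass


@dataclass
class Fragment:
    uri: str
    creation: float = math.inf   # infinity while the fragment is still nascent
    deletion: float = math.inf   # infinity until the fragment is marked deleted


def visible(fragment: Fragment, query_timestamp: float) -> bool:
    """A query at timestamp t sees fragments created at or before t that were
    not yet deleted as of t."""
    return fragment.creation <= query_timestamp < fragment.deletion


old = Fragment("/a.xml", creation=100)
new = Fragment("/a.xml")                 # nascent: invisible to every query
print(visible(new, 10**9))               # False
# An update commits at timestamp 105: the new version becomes live and the
# old one is marked deleted at the same timestamp.
new.creation = 105
old.deletion = 105
print(visible(old, 104), visible(new, 104))  # True False
print(visible(old, 105), visible(new, 105))  # False True
```

      A query running at timestamp 104 keeps seeing the old version for its whole life, even after the update commits.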

      Updates and Deadlocks

      An update request, because it isn't read-only, has to use read/write locks to maintain system integrity while making changes. A read-lock request waits for any write-lock held on the URI by another request; a write-lock request waits for any read-lock or write-lock held on it by another request. An update has to obtain a read-lock before reading a document and a write-lock before changing (adding, deleting, modifying) a document. Lock acquisition is ordered, first-come first-served, and locks are released automatically at the end of a request.
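      The lock compatibility rules can be sketched as a small Python lock table. It is a model, not MarkLogic code: it only reports whether a request would have to wait and does not model the first-come first-served queue. The request names "A" and "B" are hypothetical.

```python
# Sketch of the per-URI read/write lock rules described above; not MarkLogic
# code. It reports whether a request would have to wait, but does not model
# the first-come first-served wait queue.
from collections import defaultdict
from typing import Dict, Set


class LockTable:
    def __init__(self) -> None:
        self.readers: Dict[str, Set[str]] = defaultdict(set)  # uri -> requests holding read-locks
        self.writers: Dict[str, str] = {}                     # uri -> request holding the write-lock

    def acquire_read(self, request: str, uri: str) -> bool:
        """A read-lock has to wait only for another request's write-lock."""
        if self.writers.get(uri, request) != request:
            return False
        self.readers[uri].add(request)
        return True

    def acquire_write(self, request: str, uri: str) -> bool:
        """A write-lock (or promotion of our own read-lock) has to wait for any
        other request's read-lock or write-lock."""
        if self.readers[uri] - {request} or self.writers.get(uri, request) != request:
            return False
        self.writers[uri] = request
        return True

    def release_all(self, request: str) -> None:
        """Locks are released automatically when a request ends."""
        for holders in self.readers.values():
            holders.discard(request)
        for uri in [u for u, owner in self.writers.items() if owner == request]:
            del self.writers[uri]


locks = LockTable()
print(locks.acquire_read("A", "/d.xml"), locks.acquire_read("B", "/d.xml"))  # True True
print(locks.acquire_write("A", "/d.xml"))  # False: B still holds a read-lock
locks.release_all("B")
print(locks.acquire_write("A", "/d.xml"))  # True: A promotes its read-lock
print(locks.acquire_read("B", "/d.xml"))   # False: A now holds the write-lock
```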

      In any lock-based system you have to worry about deadlocks, where two or more updates are stalled waiting on locks held by the other. In MarkLogic deadlocks are automatically detected with a background thread. When the deadlock happens on the same host in a cluster, the update farthest along (with the most locks) wins and the other update gets restarted. When it happens on different hosts, because lock count information isn't in the wire protocol, both updates start over.

      MarkLogic differentiates queries from updates using static analysis. Before running a request, it looks at the code to determine if it includes any calls to update functions. If so, it's an update. If not, it's a query. Even if at execution time the update doesn't actually invoke the updating function, it still runs as an update.
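      The deadlock-resolution rule can be sketched as a Python function over two deadlocked updates. It is a model of the rule as stated, not MarkLogic code; the host names are hypothetical, and the tie-breaking choice is an arbitrary assumption because the behaviour on equal lock counts is not described.

```python
# Sketch of the deadlock-resolution rule described above; not MarkLogic code.
from dataclasses import dataclass
from typing import List


@dataclass
class Update:
    name: str
    host: str
    locks_held: int


def updates_to_restart(a: Update, b: Update) -> List[str]:
    """Given two deadlocked updates, return the names of those that restart."""
    if a.host != b.host:
        # Lock counts are not in the wire protocol, so both start over.
        return [a.name, b.name]
    # Same host: the update farthest along (most locks) wins. What happens on a
    # tie is not described above; restarting `b` is an arbitrary assumption.
    loser = b if a.locks_held >= b.locks_held else a
    return [loser.name]


print(updates_to_restart(Update("u1", "host-a", 5), Update("u2", "host-a", 2)))  # ['u2']
print(updates_to_restart(Update("u1", "host-a", 5), Update("u2", "host-b", 2)))  # ['u1', 'u2']
```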

      For the most part, locking is not under the control of the user. The one exception is xdmp:lock-for-update($uri), which requests a write-lock on a document URI without actually having to issue a write, and in fact without the URI even having to exist.

      When a request potentially touches millions of documents (such as sorting a large data set to find the most recent items), a query request that runs lock-free will outperform an update request that needs to acquire read-locks and write-locks. In some cases you can speed up the query work by isolating the update work to its own transactional context. This technique only works if the update doesn't have a dependency on the outer query, but that turns out to be a common case. For example, let's say you want to execute a content search and record the user's search string to the database for tracking purposes. The database update doesn't need to be in the same transactional context as the search itself, and would slow things down if it were. In this case it's better to run the search in one context (read-only and lock-free) and the update in a different context. See the xdmp:eval() and xdmp:invoke() functions for documentation on how to invoke a request from within another request and manage the transactional contexts between the two.

      Document Lifecycle

      Let's track the lifecycle of a document from first load to deletion until the eventual removal from disk. A document load request acquires a write-lock for the target URI as part of the xdmp:document-load() function call. If any other request is already doing a write to the same URI, our load will block for it, and vice versa. At some point, when the full update request completes successfully (without any errors that would implicitly cause a rollback), the actual insertion work begins, processing the queue of update work orders. MarkLogic starts by parsing and indexing the document contents, converting the document from XML to a compressed binary fragment representation. The fragment gets added to the in-memory stand. At this point the fragment is considered a nascent fragment, a term you'll see sometimes on the administration console status pages. Being nascent means it exists in a stand but hasn't been fully committed. (On a technical level, nascent fragments have creation and deletion timestamps both set to infinity, so they can be managed by the system while not appearing in queries prematurely.) If you're doing a large transactional insert you'll accumulate a lot of nascent fragments while the documents are being processed. They stay nascent until they've been committed. Once the fragment is placed into the in-memory stand, the request is ready to commit. It obtains the next timestamp value, journals its intent to commit the transaction, and then makes the fragment available by setting the creation timestamp for the new fragment to the transaction's timestamp. At this point it's a durable transaction, replayable in event of server failure, and it's available to any new queries that run at this timestamp or later, as well as any updates from this point forward (even those in progress). As the request terminates, the write-lock gets released.

      Our document lives for a time in the in-memory stand, fully queryable and durable, until at some point the in-memory stand fills up and gets written to disk. Our document is now in an on-disk stand. Sometime later, based on merge algorithms, the on-disk stand will get merged with some other on-disk stands to produce a new on-disk stand. The fragment will be carried over, its tree data and indexes incorporated into the larger stand. This might happen several times.

      At some point a new request makes a change to the document, such as with an xdmp:node-replace() call. The request making the change first obtains a read-lock on the URI when it first accesses the document, then promotes the read-lock to a write-lock when executing the xdmp:node-replace() call. If another write-lock were already present on the URI from another executing update, the read-lock would have blocked until the other write-lock released. If another read-lock were already present, the lock promotion to a write-lock would have blocked. Assuming the update request finishes successfully, the work runs similar to before: parsing and indexing the document, writing it to the in-memory stand as a nascent fragment, acquiring a timestamp, journaling the work, and setting the creation timestamp to make the fragment live. Because it's an update, it has to mark the old fragment as deleted also, and does that by setting the deletion timestamp of the original fragment to the transaction timestamp. This combination effectively replaces the old fragment with the new. When the request concludes, it releases its locks. Our document is now deleted, replaced by the new version.

      The old fragment still exists on disk, of course. In fact, any query that was already in progress before the update incremented the timestamp, or any query doing time travel with an old timestamp, can still see it. Eventually the on-disk stand holding the fragment will be merged again, at which point the old fragment will be completely removed from the system: it won't be written into the new on-disk stand. That is, unless an administrator has set the database's merge timestamp to allow deep time travel. In that case the old fragment lives on, sticking around in case any new queries want to time travel to see old fragments.
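      Which fragments a merge carries over can be sketched in Python as follows. This is a simplified model, not MarkLogic code: merge_timestamp=None stands in for no merge timestamp being set, and the model ignores the protection of queries that are already in progress.

```python
# Sketch of which fragments survive a merge; not MarkLogic code.
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    uri: str
    creation: float
    deletion: float = math.inf  # infinity means "not deleted"


def merge_survivors(fragments: List[Fragment],
                    merge_timestamp: Optional[float] = None) -> List[Fragment]:
    """Fragments written into the new stand. merge_timestamp=None stands in for
    no merge timestamp being set (no deep time travel)."""
    kept = []
    for f in fragments:
        if math.isinf(f.deletion):
            kept.append(f)  # live fragment: always carried over
        elif merge_timestamp is not None and f.deletion > merge_timestamp:
            kept.append(f)  # a query at merge_timestamp or later could still see it
    return kept


versions = [Fragment("/a.xml", creation=100, deletion=105),  # old version
            Fragment("/a.xml", creation=105)]                # current version
print(len(merge_survivors(versions)))       # 1
print(len(merge_survivors(versions, 102)))  # 2
```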


      When interacting with the MarkLogic Technical Support Team, there will be times when you will need to either submit or retrieve large data content. MarkLogic maintains an SFTP (SSH file transfer protocol) server for this purpose. This article describes how to interact with the SFTP server at 


      Our SFTP service is a self-managed service and requires a public key to be uploaded to the support account profile for activation.  

      • Step 1. Generate an SSH key pair or re-use an existing one 
      • Step 2. Log in to the MarkLogic Support portal and click on the "My Profile" link
      • Step 3. Scroll down to the "Public Keys" section and click on "Add Key."
      • Step 4. Copy paste the content of your public key into the text field
      • Step 5. Update the profile by clicking on "Update."

      Our key upload accepts RSA, DSA and ECDSA public keys in OpenSSH, PuTTY, PKCS1 or PKCS8 format. The uploaded key will be converted to OpenSSH format automatically. Once a public key has been uploaded, it will be used to log in to the SFTP service for any ticket created in our Support Portal. We recommend rotating keys on a regular basis for security reasons.

      Connections use the default port 22. Please follow your IT/Security department's guidance on security requirements.

      Customer Access

      The account details to log in to our SFTP service will be provided in all email responses, or directly in the Support Portal after selecting an open ticket from the "My Ticket" list. The account details will be of the following format:

      "" and are different for each ticket. In general, SFTP accounts are created fully automatically in the backend and take only a few minutes to be ready. No contact is required for the setup, but please reach out if there are any questions or problems.

      Sharing any data requires only a few steps.

      Logging In

      1. Open your preferred SFTP client.
      2. Provide the user/account details "". Replace xyz-123-4567 with the ticket details provided in the email or in our Support Portal.
      3. Verify the private key selection of your client
      4. Log in or connect.

      You are now logged in to the MarkLogic SFTP site and in the ticket home directory, where all data will be shared for this ticket.

      Submitting Content to MarkLogic

      Uploading files doesn't require changing to any directory as they will be placed directly into the home folder.

      • To upload a file, use
        • drag and drop for UI-based clients
        • the put command for command line-based clients.
      • If an upload becomes disconnected, it can be resumed at any time using the resume feature of your SFTP client or the "reput" SFTP command.
      • Listing and deleting files are supported in the same way.
      • After files have been uploaded, our system will scan them, calculate the MD5 checksum, send a notification and add them to the ticket.
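      If you want to compare the checksum our system reports with your local copy, a minimal Python sketch using the standard hashlib module is shown below. The assumption is that the reported checksum is the usual hexadecimal MD5 digest; the file name in the example is hypothetical.

```python
# Sketch: compute a local MD5 to compare with the checksum reported after upload.
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so large uploads need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example (hypothetical file name):
# print(md5_of_file("support-dump.tar.gz"))
```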

      Retrieving Content from MarkLogic

      Downloading files is similar to uploading; no change of directory is required.

      • When MarkLogic Support uploads a file, a notification will be sent.
      • To download a file, use
        • drag and drop for UI-based clients
        • the get command for command line-based clients
      • In case the download is interrupted, it can be resumed at any time using the resume feature of your SFTP client or the "reget" SFTP command.

      Data life cycle

      Any data uploaded to the SFTP site will be available during the lifetime of an open ticket. After a ticket has been closed, the data will be deleted shortly afterwards. It can, of course, be deleted at any time while the ticket is being worked on, either on request or by using the SFTP client. If you delete data manually, we would appreciate being notified in advance, as it might still be needed for further investigation.


      The following article explains how MarkLogic Server uses in-memory caches and how they can be used to improve query execution.



      MarkLogic Server provides several caches that are used to improve the performance during query execution. When a query executes for the first time, the Server will populate these caches to store termlist and data fragments in memory.

      MarkLogic Server keeps a lot of its configuration information in databases, and has a lot of caches to make it run faster, but those caches get populated the first time things are accessed. The server also uses book-keeping terms in the indexes to keep track of whether all documents have been indexed with the current settings. MarkLogic caches this information, but has to query the indexes on the first request to warm the cache.

      The in-memory cache in MarkLogic Server holds data that was recently added to the system and is still in an in-memory stand; that is, it holds data that has not yet been written to disk.

      For updates, if there is no in-memory stand on a forest when a new document is inserted, the server will create it. This stand is big enough for thousands of documents, but the cost of creating it will be seen in the time taken for the first document added to it.


      How will the in-memory cache help improve query execution

      When a query is executed, in-memory data structures like range indexes and lexicons get pinned into RAM the first time they are used. The easiest way to speed things up is to "warm the caches" by running a small sample program that exercises the type-ahead before starting production. You can also keep the server warm by running a non-time-critical stub update at regular intervals (every 30 seconds to 1 minute). If the server is idle, this keeps the caches and the in-memory stand warm; if the server is busy, it adds only a small amount of extra work. Once this is done, the functionality will be fast for all users in all future sessions.


      MarkLogic requires running with only one master forest for the system databases, including the Security database. 

      The Security database is typically fairly small and there is no reason to have more than one forest for the Security database. Having more than one Security forest causes additional complexity during failover events, server upgrades, and restarts. A functioning Security database is critical to the stability of a MarkLogic Cluster and it is easier to recover from a host failure if the Security database is configured with only a single forest and a single replica forest. 

      In terms of high availability and forest failover, one local disk failover forest should be configured. In terms of database replication, a replica forest in the replica cluster should be configured.

      If you have more than one Security forest:

      We have seen incidents where customers attached more than one Security forest, either intentionally or inadvertently (scripting bug or user error), and ran into issues.

      When the database rebalancer is enabled for the database (default setting) and when a new forest is attached, the database will automatically redistribute the content across all attached forests. Problems can then arise when security forests are detached without preserving their content. This is true for any database, but is problematic when dealing with the Security database. 

      When a Security database forest is detached without first retiring it (and verifying documents are moved out of it), some Security documents will be removed from the database. This may lead to users being locked out of the cluster or render the cluster unusable.  If this occurs on your MarkLogic cluster, please contact MarkLogic Support to help with the repair.

      Best Practice

      • Do not configure more than one forest for any system database, including the Security database.
      • If you have multiple forests in your Security database and need to come back in line with our one-forest recommendation:
        • Retire the extra Security database forests;
        • Verify all extra forests are drained of content (zero documents / zero fragments);
        • Detach the extra forests.
      • Once your cluster is in line with our one forest recommendation, disable the rebalancer for the Security database.
      • Configure a single replica forest to achieve high availability.
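      The "verify drained before detaching" check can be sketched in Python. The forest names and counts below are hypothetical inputs (for example, copied from each forest's status page); gathering them from a live cluster is not shown.

```python
# Sketch of the "verify drained before detaching" check; forest names and the
# counts are hypothetical inputs, e.g. copied from each forest's status page.
from typing import Dict, List


def forests_safe_to_detach(counts: Dict[str, Dict[str, int]], keep: str) -> List[str]:
    """Return the extra (non-`keep`) forests that hold zero documents and zero
    fragments. Forests still holding content must stay attached until the
    retired forest has been drained."""
    return sorted(
        name for name, c in counts.items()
        if name != keep and c["documents"] == 0 and c["fragments"] == 0
    )


status = {
    "Security":   {"documents": 1500, "fragments": 1500},
    "Security-2": {"documents": 0,    "fragments": 0},
    "Security-3": {"documents": 12,   "fragments": 12},
}
print(forests_safe_to_detach(status, keep="Security"))  # ['Security-2']
```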

      Further reading

      Administering Security in MarkLogic

      Database Rebalancing in MarkLogic

      Restoring Security Database

      Security Database restore leading to lingering Certificate Template id in Config files

      As a guideline, the number of range indexes in a MarkLogic database should be kept to about 100. This is because:

      • In the interests of performance, MarkLogic Server indexes your content on ingest, then memory maps those indexes to serialized data structures on disk. Each of those memory maps requires some amount of RAM.
        • If you've got many thousands of indexes, you may run into a situation where system monitoring reports RAM to spare, but MarkLogic Server reports "SVC-MAPINI: Mapped file initialization error." In that case you're likely running up against Linux's default vm.max_map_count value.
        • Independent of SVC-MAPINI errors, the more range indexes you've configured, the longer it will take to perform forest operations.
      • If you find yourself configuring many hundreds or even thousands of range indexes, you should migrate your data modeling scheme to take advantage of Template Driven Extraction (TDE), which was specifically engineered to address this scenario.

      Additional Reading:


      This Knowledgebase article is a general guideline for backups using the journal archiving feature for both free space requirements and expected file sizes written to the archive journaling repository when archive journaling is enabled and active.

      The MarkLogic environment used here was an out-of-the-box MarkLogic 9.x installation with one change: a new directory was added specifically for storing the archive journal backup files.

      It is assumed that the reader of this article already has a basic understanding of the role of Journal Archiving in the Backup and Restore feature of MarkLogic Server. See the references below for further details.

      How much free space is needed for the Archive Journal files in a backup?

      MarkLogic Server uses the forest size of the active forest to confirm whether the journal archive repository has enough free space to accommodate that forest, but if additional forests already exist on the same volume, then there may be an issue in the Server's "free-space" calculation as the other forests are never used in the algorithm that calculates the free space available for the backup and/or archive journal repositories. Only one forest is used in the free-space calculation.

      In other words, if multiple forests exist on the same volume, there may not be enough free space available on that volume because of the additional forests, especially during a high rate of ingestion. If that is the case, provide enough free space on that volume to accommodate the sizes of all the forests: Required Free Space (approximately) = (Number of Forests) x (Size of Largest Forest).
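      The rule of thumb can be expressed as a short Python helper, with an optional comparison against the volume's actual free space via the standard shutil.disk_usage. The sizes in the example are illustrative, and the function names are hypothetical.

```python
# Sketch of the rule of thumb above; sizes are in GB and purely illustrative.
import shutil
from typing import Sequence


def required_free_space_gb(forest_sizes_gb: Sequence[float]) -> float:
    """(number of forests on the volume) x (size of the largest forest)."""
    if not forest_sizes_gb:
        return 0.0
    return len(forest_sizes_gb) * max(forest_sizes_gb)


def volume_has_room(volume_path: str, forest_sizes_gb: Sequence[float]) -> bool:
    """Compare the volume's actual free space with the estimate."""
    free_gb = shutil.disk_usage(volume_path).free / 1024 ** 3
    return free_gb >= required_free_space_gb(forest_sizes_gb)


print(required_free_space_gb([50, 80, 120]))  # 360
```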

      What can we expect to see in the journal archiving repository in terms of files sizes for specific ingestion types and sizes? That brings us to the other side.

      How is the Journal Archive repository filling up?

      1 MByte of raw XML data loaded into the server (as either a new document ingestion or a document update) will result in approximately 5 to 6 MBytes of data being written to the corresponding Journal Archive files.  Additionally, adding Range Indexes will contribute to a relatively small increase in consumed space.

      Ingesting/updating RDF data results in slightly less data being written to the journal archive files.

      In conclusion, for both new document ingestion and document updates, the typical expansion ratio of Journal Archive size to input file size is between 5 and 6, but it can be higher depending on the document structure and any added range indexes.
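      A quick Python estimate using the observed 5x to 6x ratio is shown below. The ratios are parameters because, as noted above, document structure and range indexes can push the real ratio higher, so the result is not an upper bound.

```python
# Sketch: rough journal archive growth using the 5x-6x ratio observed above.
from typing import Tuple


def estimate_journal_archive_mb(input_mb: float,
                                low_ratio: float = 5.0,
                                high_ratio: float = 6.0) -> Tuple[float, float]:
    """Return a (low, high) estimate in MB of archive journal data written."""
    return input_mb * low_ratio, input_mb * high_ratio


print(estimate_journal_archive_mb(200))  # (1000.0, 1200.0)
```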


      Problem summary: Sometimes you need to use a license key acquired from MarkLogic instead of the one that comes out of the box when subscribing to the AMIs on AWS. In that case, follow the steps below to change the license key.

      Please note that if the cluster was created using an Enterprise AMI (pay-as-you-go), it is not possible to change the license key details on the instances manually.

      However, if the cluster was created using CloudFormation templates, you have the option of updating the stack. To change the license key, perform the following steps:

      1. Modify the CloudFormation template by changing the AMI ID to a developer AMI ID or to a custom AMI based on the developer AMI.
      2. In CloudFormation, update the stack with the modified template and, while updating the stack, provide the new license key details.
      3. Once the stack update is successful, terminate the existing nodes from the EC2 dashboard so that new nodes get created with the new developer AMI.
      4. Once the cluster is back in a running state, verify through the Admin UI that the new license key has been applied. If it has not, change the license details through the Admin UI so that the cluster uses your own license key.

      If the cluster was created using a Developer AMI, or a custom AMI based on the developer AMI, without using CloudFormation templates, follow the steps below to update the license on every node:

      1. SSH into the instance using the key that was used when creating the instance.
      2. Stop MarkLogic Server on the node.
      3. Remove, or take a backup of, /var/local/mlcmd.conf.
      4. Create a file named marklogic.conf under /etc with the below entries
      5. Complete the above steps on all the nodes.
      6. Start the cluster by starting MarkLogic Server on each node.
      7. Log in to the MarkLogic Admin Interface, navigate to the license key page for every host and click the "OK" button. You will observe the server restarting with the new license key.

      Please make sure you test the above in a lower environment before implementing it in production. Please feel free to get back to us if you have any questions.


      Content processing applications often require multi-step processing. Each step in the process performs a particular task or set of tasks. The Content Processing Framework (CPF) in MarkLogic Server supports these types of multi-step conversion processes. Sometimes, during a document delete operation, a CPF action might fail with an 'XDMP-CONFLICTINGUPDATES' error, which can be seen in the document properties, for example:

      Sample message:

      <error:format-string>XDMP-CONFLICTINGUPDATES: xdmp:document-set-property("FILE-NAME", <cpf:state xmlns:cpf=""></cpf:state>) -- Conflicting updates xdmp:document-set-property("FILE-NAME", /cpf:state) and xdmp:document-delete("FILE-NAME")</error:format-string>

      This error message indicates that an update statement (for example, xdmp:document-set-property) is trying to update a document that is also being updated by another operation (for example, xdmp:document-delete) in the same transaction.



      Actions that delete the target URI need special handling, because MarkLogic CPF also keeps track of progress in the document properties, and a plain delete [ xdmp:document-delete($cpf:document-uri) ] cannot do that.

      The following are ways to achieve the expected behavior and avoid the XDMP-CONFLICTINGUPDATES error:

      1) Perform a "soft delete" on the document and let CPF take care of deleting it. This can be done by setting the document's processing status to "deleted" with the cpf:document-set-processing-status API function. Setting the processing status to "deleted" tells CPF to clean up the document and not to update its properties at the same time.

      cpf:document-set-processing-status( $uri-to-delete, "deleted" )

      Additional details can be found at:

      2) If you want to keep a record of the URI that is being deleted, you can delete the document's root node instead of the document itself. The CPF state can then still be recorded in the document properties, even though the document content is gone.


      Details at:

      Problem statement: AWS has updated the Lambda Python runtime version to python:3.9.v19 in the us-east regions. This version fails to satisfy some dependencies that are packaged with the MarkLogic Managed Cluster Lambda code, so creation of both the Managed ENI stack and the NodeManager stack fails. Stack creation works fine in other AWS regions (us-west-2, eu-central-1), where the Lambda runtime still uses python:3.9.v18.

      Proposed Solution: 

      1. For newly created clusters that use custom templates based on the MarkLogic Managed ENI and NodeManager templates, the following changes are needed.

      Managed ENI and NodeManager template reference (the RuntimeManagementConfig lines need to be added; the region "us-east-2" should be edited to match the region where the stack is created):

      Managed ENI

          Type: 'AWS::Lambda::Function'
          DependsOn: ManagedEniExecRole
          Properties:
            Code:
              S3Bucket: !Ref S3Bucket
              S3Key: !Join ['/', [!Ref S3Directory,'']]
            Handler: managedeni.handler
            Role: !GetAtt [ManagedEniExecRole, Arn]
            Runtime: python3.9
            RuntimeManagementConfig:
              RuntimeVersionArn: 'arn:aws:lambda:us-east-2::runtime:edb5a058bfa782cb9cedc6d534ac8b8c193bc28e9a9879d9f5ebaaf619cd0fc0'
              UpdateRuntimeOn: 'Manual'
            Timeout: '180'

      NodeManager

          Type: 'AWS::Lambda::Function'
          DependsOn: NodeManagerExecRole
          Properties:
            Code:
              S3Bucket: !Ref S3Bucket
              S3Key: !Join ['/', [!Ref S3Directory,'']]
            Handler: nodemanager.handler
            Role: !GetAtt [NodeManagerExecRole, Arn]
            Runtime: python3.9
            RuntimeManagementConfig:
              RuntimeVersionArn: 'arn:aws:lambda:us-east-2::runtime:edb5a058bfa782cb9cedc6d534ac8b8c193bc28e9a9879d9f5ebaaf619cd0fc0'
              UpdateRuntimeOn: 'Manual'
            Timeout: '180'

      2. For newly created clusters that use the default Lambda templates offered by MarkLogic ("ml-managedeni.template" and "ml-nodemanager.template"), the MarkLogic team has already patched the templates for versions 10.0-9.2 through 10.0-9.5 and 11.0.0 through 11.0.2. Customers on older MarkLogic 10 versions need to raise a support ticket, and we will address it.

      3. Customers who have existing stacks and perform upgrades on a regular basis should apply the following steps manually, one time, to the existing Managed ENI and NodeManager Lambda functions before performing any upgrades.

      Open the Managed ENI function in the AWS Lambda console, in the region where the stack was deployed.

      Under Runtime Settings → Edit runtime management configuration

      Select the Manual option and enter the ARN of the previous runtime, python:3.9.v18 (arn:aws:lambda:us-west-2::runtime:edb5a058bfa782cb9cedc6d534ac8b8c193bc28e9a9879d9f5ebaaf619cd0fc0). Edit the region in the ARN to match the region where your Lambda function is located.

      Repeat the same steps for the NodeManager Lambda function as well and save it before performing any upgrades.
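If you manage many stacks, the console steps above can also be scripted. The sketch below assumes boto3 with credentials for the target account, and it assumes (as the step above implies) that only the region part of the runtime ARN changes between regions. The function names you pass in are hypothetical placeholders for your own Managed ENI and NodeManager function names; verify the result in the console before upgrading.

```python
RUNTIME_HASH = (  # python:3.9.v18 runtime identifier quoted in this article
    "edb5a058bfa782cb9cedc6d534ac8b8c193bc28e9a9879d9f5ebaaf619cd0fc0"
)


def runtime_version_arn(region, runtime_hash=RUNTIME_HASH):
    """Build the Lambda runtime version ARN for the region the function lives in."""
    return f"arn:aws:lambda:{region}::runtime:{runtime_hash}"


def pin_runtime(function_names, region):
    """Set UpdateRuntimeOn=Manual and pin the runtime ARN for each function.

    Requires boto3 and AWS credentials; boto3 is imported lazily so the ARN
    helper above can be used without it.
    """
    import boto3

    client = boto3.client("lambda", region_name=region)
    arn = runtime_version_arn(region)
    for name in function_names:
        client.put_runtime_management_config(
            FunctionName=name,
            UpdateRuntimeOn="Manual",
            RuntimeVersionArn=arn,
        )
```

Example (hypothetical names): `pin_runtime(["my-managed-eni-fn", "my-node-manager-fn"], "us-east-1")`.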


      Sometimes, when a host is removed from a cluster improperly (for example, by some means other than the Admin UI or Admin API), the removed host can still try to communicate with its old cluster. The cluster recognizes it as a "foreign IP" and logs a message like the one below:

      2014-12-16 00:00:20.228 Warning: XDQPServerConnection::init( SVC-SOCRECV: Socket receive error: wait Timeout


      XDQP is the internal protocol that MarkLogic uses for communication among the hosts in a cluster, and it uses port 7999 by default. In this message, the local host is receiving socket connections from a foreign host.


      Debugging Procedure, Step 1

      To find out whether this message indicates a socket connection from an IP address that is not part of the cluster, the first place to look is the hosts.xml file. If the IP address is not found in hosts.xml, then it is a foreign IP. In that case, the following steps will help to identify the processes that are listening on port 7999.
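The hosts.xml check can be sketched as follows. This assumes each cluster member is listed in hosts.xml in a `host-name` element (the namespace is ignored), and that host names resolve to the IPs seen in the log; the function names are hypothetical. Read the file from your host's MarkLogic data directory and pass its contents in.

```python
import socket
import xml.etree.ElementTree as ET


def cluster_host_names(hosts_xml):
    """Return the text of every <host-name> element, ignoring XML namespaces."""
    root = ET.fromstring(hosts_xml)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "host-name" and el.text
    ]


def is_foreign_ip(ip, hosts_xml, resolve=socket.gethostbyname):
    """True if `ip` does not belong to any host listed in hosts.xml."""
    known = set()
    for name in cluster_host_names(hosts_xml):
        try:
            known.add(resolve(name))
        except OSError:
            # Unresolvable name: compare it literally (it may already be an IP).
            known.add(name)
    return ip not in known
```

Usage: `is_foreign_ip("203.0.113.7", open("hosts.xml").read())`, substituting the IP from your log message.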


      Debugging Procedure, Step 2

      To find out who is listening on XDQP ports, try running the following command in a shell window on each host:

            $ sudo netstat -tulpn | grep 7999

      You should only see MarkLogic as a listener:

           tcp 0 0* LISTEN 1605/MarkLogic

      If you see any other process listening on 7999, you have found your culprit. Shut down those processes and the messages will go away.
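When you check many hosts, the netstat output can be filtered automatically. This is a minimal sketch that assumes the Linux net-tools `netstat -tulpn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); the function name is hypothetical.

```python
def non_marklogic_listeners(netstat_output, port=7999):
    """Return PID/Program entries listening on `port` that are not MarkLogic."""
    suspects = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only TCP rows carry a State column; skip headers and UDP rows.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        local_address, state, program = fields[3], fields[5], fields[-1]
        if state != "LISTEN" or not local_address.endswith(f":{port}"):
            continue
        if not program.endswith("/MarkLogic"):
            suspects.append(program)
    return suspects
```

Pipe in the captured output of `sudo netstat -tulpn`; an empty list means only MarkLogic is listening on the port.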


      Debugging Procedure, Step 3

      If the issue persists, run tcpdump to trace packets to/from "foreign" hosts using the following command:

           tcpdump -n host {unrecognized IP}

      Shut down MarkLogic on those hosts. Also, shut down any other applications that are using port 7999.


      Debugging Procedure, Step 4

      If the cluster hosts are on AWS, you may also want to check your Elastic Load Balancer ports. This may be tricky, because instances change IP addresses when they are rebooted, so work with AWS Support to help you find the AMI or load balancer instance that is pinging your cluster.

      In the case that the "foreign host" is an elastic load balancer, be sure to remove port 7999 from its rotation/scheduler. In addition, you should set the load balancer to use port 7997 for the heartbeat functionality.


      Sometimes, when a cluster is under heavy load, your cluster may show a lot of XDQP-TIMEOUT messages in the error log. Often, a subset of hosts in the cluster may become so busy that the forests they host get unmounted and remounted repeatedly. Depending on your database and group settings, remounting a forest may be very time-consuming, because all hosts in the cluster are forced to do the extra work of index detection.

      Forest Remounts

      Every time a forest remounts, the error log will show a lot of messages like these:

      2012-08-27 06:50:33.146 Debug: Detecting indexes for database my-schemas
      2012-08-27 06:50:33.146 Debug: Detecting indexes for database Triggers
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Last-Login: sln
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Triggers: sln
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Schemas: sln
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Modules: sln
      2012-08-27 06:50:35.373 Debug: Detected indexes for database Security: sln
      2012-08-27 06:50:35.485 Debug: Detected indexes for database my-modules: sln
      2012-08-27 06:50:35.773 Debug: Detected indexes for database App-Services: sln
      2012-08-27 06:50:35.773 Debug: Detected indexes for database Fab: sln
      2012-08-27 06:50:35.805 Debug: Detected indexes for database Documents: ss, fp

      ... and so on ...

      This can go on for several minutes and will cost you more downtime than necessary, since you already know the indexes for each database.

      Improving the situation

      Here are some suggestions for improving this situation:

      1. Browse to Admin UI -> Databases -> my-database-name
      2. Set ‘index detection’ to ‘none’
      3. Set ‘expunge locks’ to ‘none’
      4. Click OK to make these changes effective.

      Repeat steps 1-4 for all active databases.

      Now tweak the group settings to make the cluster less sensitive to an occasional busy host:

      1. Browse to Admin UI -> Groups -> E-Nodes
      2. Set ‘xdqp timeout’ to 30
      3. Set ‘host timeout’ to 90
      4. Click OK to make this change effective.

      The database-level changes tell the server to speed up cluster startup time when a server node is perceived to be offline. The group changes make the hosts in that group a little more forgiving before declaring a host offline, which prevents forests from being unmounted when it is not really needed.
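The same settings can also be applied with a script instead of the Admin UI. The sketch below assumes the MarkLogic Management REST API on its default port 8002 with digest authentication, and assumes the JSON property names mirror the Admin UI labels (`index-detection`, `expunge-locks`, `xdqp-timeout`, `host-timeout`); the host, database, and group names are placeholders. Check the property names against the Management API documentation for your version before running it.

```python
import json
import urllib.parse
import urllib.request

MANAGE_PORT = 8002  # assumed default Management API port


def settings_requests(host, databases, group):
    """Build (url, payload) pairs for the database and group changes above."""
    base = f"http://{host}:{MANAGE_PORT}/manage/v2"
    requests_to_send = [
        (f"{base}/databases/{urllib.parse.quote(db)}/properties",
         {"index-detection": "none", "expunge-locks": "none"})
        for db in databases
    ]
    requests_to_send.append(
        (f"{base}/groups/{urllib.parse.quote(group)}/properties",
         {"xdqp-timeout": 30, "host-timeout": 90}))
    return requests_to_send


def apply_settings(requests_to_send, user, password):
    """PUT each JSON payload using HTTP digest authentication."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    for url, _ in requests_to_send:
        passwords.add_password(None, url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(passwords))
    for url, payload in requests_to_send:
        request = urllib.request.Request(
            url, data=json.dumps(payload).encode("utf-8"), method="PUT",
            headers={"Content-Type": "application/json"})
        with opener.open(request) as response:
            print(url, response.status)
```

Building the request list separately from sending it lets you review every URL and payload before anything changes on the cluster.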

      If, after making these changes, you still experience XDQP-TIMEOUTs, the next step is to contact MarkLogic Support for assistance. You should also alert your development team, in case a stray query is causing the data nodes to gather too many results.

      Related Reading

      XML Data Query Protocol (XDQP)


      Under normal operations, only a single user object is created for each user-name. However, when users are migrated from another security database and the recommended checks are not performed, duplicate user-names might be created.


      When there are duplicate user-names in the database, you may see the following message on the Admin UI or in the error logs:

      500: Internal Server Error
      XDMP-AS: (err:XPTY0004) get-element($col, "sec:user", "sec:user-name", $user-name, "SEC-USERDNE") -- Invalid coercion: (fn:doc("*******")/sec:user, fn:doc("*******")/sec:user) as element()?


      To fix duplicate user-names, the extra security object needs to be removed. You can delete one of the extra security objects, which should have a URI similar to: ******* (where "*******" represents the user IDs).


      To resolve the issue, follow the below steps:

      1. Perform a backup of your Security database in case manual recovery is required.

      2. Log in to Query Console with admin credentials.

      3. Select the "Security" database as the content source.

      4. Delete the security object by executing xdmp:document-delete($uri), with $uri set to the URI of the duplicate user.
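Before deleting anything, it helps to list exactly which user documents share a user-name. The sketch below assumes you have already exported (URI, user-name) pairs from the Security database by whatever means you prefer; the function name and sample URIs are hypothetical. It only groups the candidates and does not decide which copy to delete.

```python
from collections import defaultdict


def duplicate_user_names(user_docs):
    """Map each user-name that occurs more than once to its sorted document URIs.

    `user_docs` is an iterable of (uri, user_name) pairs.
    """
    by_name = defaultdict(list)
    for uri, name in user_docs:
        by_name[name].append(uri)
    return {name: sorted(uris) for name, uris in by_name.items() if len(uris) > 1}
```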


      When configuring a server to add a foreign cluster you may encounter the following error:

      Host does not match origin or inferred origin, or is otherwise untrusted.

      This error will typically occur when using MarkLogic Server versions prior to 10.0-6, in combination with Chrome versions newer than 84.x.

      Our recommendation to resolve this issue is to upgrade to MarkLogic Server 10.0-6 or newer. If that is not an option, then using a different browser, such as Mozilla Firefox, or downgrading to Chrome version 84.x may also resolve the error.

      Changes to Chrome

      Starting in version 85.x of Chrome, the default Referrer-Policy changed, and this change is what causes the error. The old default was no-referrer-when-downgrade, and the new value is strict-origin-when-cross-origin. When a website sets no policy, the browser's default setting is used. Websites are able to set their own policy, but it is common practice for websites to defer to the browser's default setting.
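To see what the policy change means on the wire, the sketch below is a simplified model of the two policies from the W3C Referrer Policy specification: under the old default a cross-origin request carries the full page URL in the Referer header, while under the new default it carries only the origin. It ignores details such as credential stripping, does not claim to reproduce MarkLogic's own origin check, and the host names are hypothetical.

```python
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url):
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname,
            parts.port or _DEFAULT_PORTS.get(parts.scheme))


def referer_header(referrer, target, policy):
    """Return the Referer value a browser would send, or None (simplified model)."""
    full_url = referrer.split("#", 1)[0]   # fragments are never sent
    ref = urlsplit(referrer)
    origin_only = f"{ref.scheme}://{ref.netloc}/"
    downgrade = ref.scheme == "https" and urlsplit(target).scheme == "http"
    if policy == "no-referrer-when-downgrade":        # Chrome default before 85
        return None if downgrade else full_url
    if policy == "strict-origin-when-cross-origin":   # Chrome default from 85
        if _origin(referrer) == _origin(target):
            return full_url
        return None if downgrade else origin_only
    raise ValueError(f"unsupported policy: {policy}")
```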

      A more detailed description can be found at


      For hosts that don't use a standard US locale (en_US), some lower-level calls can return data that MarkLogic Server cannot parse. An example of this is shown with a host configured with a different locale when making a call to the Cluster Status page (cluster-status.xqy):


      The problem

      The problem you have encountered is a known issue: MarkLogic Server uses a call to strtof() to parse the values as floats:

      Unfortunately, strtof() uses a locale-specific decimal point. The issue in this environment is likely due to the operating system using a numeric locale where the decimal point is a comma rather than a period.
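The failure mode can be illustrated without changing any system locale. Like C's strtof(), the sketch below parses the longest numeric prefix it recognizes and returns the unparsed remainder; with a comma as the decimal point, "0.75" stops at the period and yields 0. This is a simulation for illustration only (the function name is hypothetical); it is not MarkLogic's parser.

```python
import re


def strtof_like(text, decimal_point="."):
    """Mimic strtof(): return (value, unparsed_rest) for the longest numeric prefix."""
    dp = re.escape(decimal_point)
    match = re.match(rf"\s*[+-]?(\d+({dp}\d*)?|{dp}\d+)([eE][+-]?\d+)?", text)
    if not match:
        return 0.0, text  # strtof() returns 0 when nothing can be parsed
    number = match.group(0).strip().replace(decimal_point, ".")
    return float(number), text[match.end():]
```

With LC_NUMERIC set to en_US.UTF-8 as below, the decimal point is a period again, which corresponds to the first case in the tests.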

      Resolving the issue

      The workaround for this is as follows:

      1. Create a file called /etc/marklogic.conf (unless one already exists)

      2. Add the following line to /etc/marklogic.conf:

      export LC_NUMERIC=en_US.UTF-8

      After this is done, you can restart the MarkLogic process so the change is detected and try to access the cluster status again.


      This Knowledgebase article outlines the steps required to import an existing (pre-signed) certificate into MarkLogic Server and to configure a MarkLogic application server to use that certificate.

      Existing (Pre-signed) Certificate vs. Certificate Request Generated by MarkLogic

      MarkLogic allows you either to use an existing certificate or to generate a certificate request. The key difference between the two lies in who generates the public-private key pair and the other fields in the certificate.

      For a Pre-Signed Certificate: the keys already exist outside of MarkLogic Server, and a third-party tool would have populated the CN (Common Name) and other subject fields to generate a Certificate Request File (.csr) containing the public key.

      For a Certificate Request Generated by MarkLogic: new keys are generated by MarkLogic Server (it does this while creating the new template), while the CN and other fields are added by the MarkLogic Server administrator (or user) through the web-based Admin UI during New Certificate Template creation.

      The section in MarkLogic's online documentation on Creating a Certificate Template covers the steps required to generate a certificate template from within MarkLogic Server:


      Steps to Import Pre-Signed Certificate and Key into MarkLogic

      1) Create a Certificate Template 

      Create a new Certificate Template with fields similar to those of your existing pre-signed certificate.

      For example, given your current certificate file:

      [amistry@engrlab18-128-026 PreSignedCert]$ openssl x509 -in ML.pem -text 
              Version: 1 (0x0)
              Serial Number: 7 (0x7)
          Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=CA, L=San Carlos, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA
                  Not Before: Nov 30 04:12:33 2015 GMT
                  Not After : Nov 29 04:12:33 2017 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=DemoLab Corporation, OU=Engineering,
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (1024 bit)
      For the certificate above, we create a custom template in the Admin UI under Configure -> Security -> Certificate Templates, on the Create tab.
      We save the new template as "DemoLab Corporation Template".

      Note: The template fields are only placeholders for the signed certificate; MarkLogic mainly uses them to generate the Certificate Signing Request (.csr). For a certificate request generated by a third-party tool, it does NOT matter whether the template fields exactly match the final signed certificate.

      Once the signed certificate has been imported, the App Server uses it, and SSL clients see only the field values from the signed certificate (even if they differ from those on the template configuration page).

      2) Create an HTTPS App Server

      Please follow the Procedures for Enabling SSL on App Servers, except for the "Creating Certificate Template" part, since we have already created a template that matches our existing pre-signed certificate.

      3) Verify Pre-signed Certificate and Private Key file 

      Prior to installing a pre-signed certificate and private key the following verification should be performed to ensure that both certificate and key are valid and are in the correct format. 

      * Generate and display the certificate checksum using the OpenSSL utility

      [admin@sitea ~]# openssl x509 -noout -modulus -in cert.pem | openssl md5

      (stdin)= 2ddd2ca48ad2eb4eba082f5da3fd33ab

      * Generate and display the private key checksum

      [admin@sitea ~]# openssl rsa -noout -modulus -in key.key | openssl md5

      (stdin)= 2ddd2ca48ad2eb4eba082f5da3fd33ab

      The checksums from both commands should be identical. If the values do not match, or if you are prompted for additional information such as the private key password, then the certificate and private key are not valid and should be corrected before proceeding.
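The same check can be scripted by comparing the two moduli directly; the md5 step above only shortens them for reading. This sketch shells out to the same `openssl x509` and `openssl rsa` commands, so it covers RSA keys only. The `-passin pass:` argument is an assumption meant to make openssl fail instead of prompting when the key is passphrase-protected, which the text above treats as a key that needs correcting. Function names and file paths are hypothetical.

```python
import subprocess


def parse_modulus(openssl_output):
    """Extract the hex modulus from a 'Modulus=...' line printed by openssl."""
    line = openssl_output.strip()
    if not line.startswith("Modulus="):
        raise ValueError(f"unexpected openssl output: {line!r}")
    return line[len("Modulus="):]


def modulus(path, kind):
    """Run `openssl x509|rsa -noout -modulus -in path` and return the modulus."""
    cmd = ["openssl", kind, "-noout", "-modulus", "-in", path]
    if kind == "rsa":
        # Assumption: an empty passphrase makes openssl fail rather than prompt.
        cmd[2:2] = ["-passin", "pass:"]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_modulus(out)


def cert_matches_key(cert_path, key_path):
    """True when the certificate and private key share the same RSA modulus."""
    return modulus(cert_path, "x509") == modulus(key_path, "rsa")
```

Usage: `cert_matches_key("cert.pem", "key.key")`; a `CalledProcessError` means openssl could not read one of the files.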

      Note: Proceeding to the next step without verifying the certificate and the private key could lead to the MarkLogic server being made inaccessible. 

      Advisory: Private keys with a key length of 1024 bits or less are now considered insecure. When generating a private key, you should ensure a key length of 2048 bits or higher is used.

      4) Install Pre-signed Certificate and Key file to Certificate Template using Query Console

      Because the certificate was pre-signed, MarkLogic does not have the key that goes with it. We install the pre-signed certificate and key into MarkLogic using the XQuery below in Query Console.

      Note: The query must be run against the Security database.

      Please change the Certificate Template-Name, and Certificate/Key File location in below XQuery to reflect values from your environment.

      xquery version "1.0-ml";
      import module namespace pki = "" at "/MarkLogic/pki.xqy";
      import module namespace admin = "" at "/MarkLogic/admin.xqy";
      (: Update Template name for your environment :)
      let $templateid := pki:template-get-id(pki:get-template-by-name("TemplateName"))
      (: Path on the MarkLogic host that is readable by the MarkLogic server process (default daemon) :)
      (:   File suffix could also be .txt or other format :)
      let $path-to-cert := "/cert.pem"
      let $path-to-key := "/key.key"
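      return
        pki:insert-host-certificate(
          $templateid,
          xdmp:document-get($path-to-cert,
            <options xmlns="xdmp:document-get"><format>text</format></options>),
          xdmp:document-get($path-to-key,
            <options xmlns="xdmp:document-get"><format>text</format></options>)
        )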

      The query above associates our pre-signed certificate and key with the template created earlier, which is linked to the HTTPS App Server.

      Important note: pki:insert-trusted-certificates can also be used in place of pki:insert-host-certificate in the above example.


      This article discusses the effects of the incremental backup implementation on Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).


      With MarkLogic 8 you can have multiple daily incremental backups with minimal impact on database performance.

      Incremental backups complete more quickly than full backups, reducing the backup window. A smaller backup window enables more frequent backups, reducing the RPO of the database in case of disaster.

      However, RTO can be longer when using incremental backups compared to just full backups, because multiple backups must be restored to recover.

      There are two modes of operation when using incremental backups:

      Incremental since last full. Here, each incremental has to store all the data that has changed since the last full backup. Since a restore only has to go through a single incremental data set, the server is able to perform a faster restore. However, each incremental data set is bigger and takes longer to complete than the previous one, because it also stores all the changes that were included in the previous incremental.

      Please note when doing “Incremental since last full”:

      - Create a new incremental backup directory for each incremental backup

      - Call database-incremental-backup with incremental-dir set to the new incremental backup directory


      Incremental since last incremental. In this case, each new incremental stores only the changes since the last incremental (also known as delta backups). By storing only those changes, the incremental backup sets are smaller and faster to complete. However, a restore operation has to go through multiple data sets.

      Please note when doing “Incremental since last incremental”:

      - Create an incremental backup directory ONCE

      - Call database-incremental-backup with the same incremental backup directory.
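The RPO/RTO trade-off between the two modes can be made concrete with a toy model. The sketch below tracks one full backup followed by daily incrementals and reports each incremental's size and how many backup sets a restore of the latest state must read. It ignores compression, merges, and deletes, and the mode strings are illustrative labels, not MarkLogic parameter values.

```python
from itertools import accumulate


def incremental_plan(daily_changes, mode):
    """Model incremental backups taken after one full backup.

    daily_changes: amount of data changed each day (any unit, e.g. GB).
    Returns (incremental_sizes, backup_sets_needed_to_restore_latest).
    """
    if mode == "since-last-full":
        sizes = list(accumulate(daily_changes))    # each set repeats earlier changes
        sets_to_restore = 1 + (1 if sizes else 0)  # full + latest incremental
    elif mode == "since-last-incremental":
        sizes = list(daily_changes)                # each set holds only its delta
        sets_to_restore = 1 + len(sizes)           # full + every incremental
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sizes, sets_to_restore
```

With three days of 10, 20, and 30 units of change, the first mode writes 10, 30, and 60 units but restores from two sets, while the second writes only 10, 20, and 30 units but restores from four sets.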

      See also the documentation on Incremental Backup.



      Topic FAQ Link

      General Questions 

      General Questions

      MarkLogic FAQs - General Questions

      How do I work with MarkLogic Support?

      How to work with MarkLogic Support FAQ

      Customer Success

      MarkLogic FAQs - Customer Success

      Training and Community

      MarkLogic FAQs - Training & Community

      Product Support Matrix

      Product Support/Compatibility Matrix

      MarkLogic Server Fundamentals

      MarkLogic Fundamentals

      MarkLogic Support FAQ

      MarkLogic Server

      MarkLogic FAQs - MarkLogic Server

      Data Ingestion

      MarkLogic Content Pump (MLCP) FAQ


      MarkLogic Upgrade FAQ

      Common Error Messages

      Common Error Messages FAQ

      Database Replication

      Database Replication/Disaster Recovery FAQ


      MarkLogic Backup/Restore FAQ

      Local Disk Failover

      Local Disk Failover FAQ


      Search FAQ

      Template Driven Extraction (TDE)

      Template Driven Extraction FAQ


      Semantics FAQ


      Hadoop FAQ

      Geospatial Double Precision

      Geospatial Double Precision FAQ

      Geospatial Region Search

      Geospatial Region Search FAQ

      MarkLogic on Cloud

      MarkLogic on Amazon Web Services (AWS)

      MarkLogic on AWS FAQ

      Data Hub

      Data Hub Support FAQ

      Data Hub Support FAQ

      Data Hub 

      Data Hub FAQ

      Data Hub Service

      MarkLogic FAQs - Data Hub Service

      Indexing Best Practices

      MarkLogic Server indexes records (or documents/fragments) on ingest. When a database's index configuration is changed, the server will consequently reindex all matching records.

      Indexing and reindexing can be a CPU and I/O intensive operation. Reindexing creates a lot of new fragments, with the original fragments being marked for deletion. These deleted fragments will then need to be merged out. All of this activity can potentially affect query performance, especially in systems with under-provisioned hardware.

      Reindexing in Production

      If you need to add or modify an index on a production cluster, consider scheduling the reindex during a time when your cluster is less busy. If your database is too large to completely reindex during a single period of low usage, consider running the reindex over several periods of time. For example, if your low usage period is during a weekend, the process may look like:

      • Change your index configuration on a Friday night
      • Let the reindex run for most of the weekend
      • To pause the reindex, set the reindexer-enable field to 'false' for the database being reindexed. Be sure to allow sufficient time for the associated merging to complete before system load comes back.
      • If needed, reindexing can continue over the next weekend - the reindexer process will pick up where it left off before it was disabled.
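Pausing and resuming the reindexer on a schedule can be scripted. The sketch below assumes the MarkLogic Management REST API on its default port 8002 with digest authentication, and assumes the database property is exposed as `reindexer-enable` (the same name used above); the host and database names are placeholders.

```python
import json
import urllib.parse
import urllib.request


def reindexer_request(host, database, enable, port=8002):
    """Build a PUT request that sets reindexer-enable for one database."""
    url = (f"http://{host}:{port}/manage/v2/databases/"
           f"{urllib.parse.quote(database)}/properties")
    body = json.dumps({"reindexer-enable": enable}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"})


def send(request, user, password):
    """Send the request with HTTP digest authentication; return the HTTP status."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, request.full_url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(passwords))
    with opener.open(request) as response:
        return response.status
```

For example, a weekend job could call `send(reindexer_request(host, db, True), user, pw)` on Friday night and the same with `False` before load returns, leaving time for merges to finish.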

      You can refer to for more details on invoking reindexing on production.

            When you have Database Replication Configured

      If you have to add or modify indexes on a database that has database replication configured, make sure the same changes are made on the Replica cluster as well. Starting with MarkLogic Server 9.0-7, index data is also replicated from the Master to the Replica, but the server does not automatically check whether both sides have the same index settings. Reindexing is disabled by default on a replica cluster. However, when the database replication configuration is removed (such as after a disaster), the replica database will reindex as necessary. So it is important that the Replica database index configuration matches the Master’s, to avoid unnecessary reindexing.

      Further reading -

      Master and Replica Database Index Settings

      Database Replication - Indexing on Replica Explained

      Avoid Unused Range Indexes, Fields, and Path Indexes

      In addition to taking up extra disk space, Range, Field, and Path Indexes require extra work when it's time to reindex. Field and Path indexes may also require extra indexing passes.

      Avoid Using Namespaces to Implement Multi-Tenancy

      It's a common use case to want to create some kind of partition (or multiple partitions) between documents in a particular database. In such a scenario it's far better to 1) constrain the partitioning information to a particular element in a document (then include a clause over that element in your searches), than it is to 2) attempt to manage partitions via unique element namespaces corresponding to each partition. For example, given two documents in two different partitions, you'll want them to look like this:

      1a. <doc><partition>partition1</partition><name>Joe Smith</name></doc>

      1b. <doc><partition>partition2</partition><name>John Smith</name></doc>

      ...vs. something like this:

      2a. <doc xmlns:p="http://partition1"><p:name>Joe Smith</p:name></doc>

      2b. <doc xmlns:p="http://partition2"><p:name>John Smith</p:name></doc>

      Why is #1 better? In terms of searching the data once it's indexed, there's actually not much of a difference: one could easily create searches to accommodate both approaches. The issue is how the indexing works in practice. MarkLogic Server indexes all content on ingest. In scenario #2, every time a new partition is created, a new range element index needs to be defined in the Admin UI, which means your index settings have changed, which means the server now needs to reindex all of your content, not just the documents corresponding to the newly introduced partition. In contrast, for scenario #1, all that would need to be done is to ingest the documents corresponding to the new partition, which would then be indexed just like all the other existing content. There would be a need, however, to change the searches in scenario #1, as they would not yet include a clause to accommodate the new partition (for example: cts:element-value-query(xs:QName("partition"), "partition2")). But the overall impact of adding a partition in scenario #1 is a change to the searches, which is ultimately a far less intrusive change than reindexing your entire database as would be required in scenario #2. Note that in addition to a database-wide reindex, searches would also need to change in scenario #2.

      Keep an Eye on I/O Throughput

      Reindexing can lead to heavy merge activity and may lead to disk I/O bottlenecks if not managed carefully. If you have a system that is available 24-7 with no downtime window, then you may need to throttle the reindexer in order to keep the disk I/O to a minimum. We suggest the following database settings for reindexing a system that must always remain in use:

      • reindexer-throttle = 3
      • large-size-threshold = 1048576

      You can also adjust the following group settings to help limit background I/O:

      • background-io-limit = 100

      This will limit the background I/O for that group to 100 MB/sec per host, across all hosts in that group. This should only be configured if merges are causing problems: it is a way of throttling back the I/O used by the merging process. This is a good starting point, and it may be increased in increments of 50 if you find that your merges are progressing too slowly. Proceed with caution, as too low a background I/O limit can have negative performance or even catastrophic consequences.
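A quick calculation shows what a given limit means for merge duration on one host. The sketch below treats the limit as the only bottleneck, so the result is a rough lower bound; real merges compete with other work and will usually take longer. The helper names are hypothetical, and the increment schedule simply follows the "increase in steps of 50" guidance above.

```python
def merge_seconds(merge_mb, background_io_limit_mb_s):
    """Lower-bound seconds to move `merge_mb` of merge I/O at the given limit."""
    if background_io_limit_mb_s <= 0:
        raise ValueError("limit must be positive")
    return merge_mb / background_io_limit_mb_s


def next_limits(start=100, steps=3, increment=50):
    """Limits to try in turn if merges fall behind, starting above `start`."""
    return [start + increment * i for i in range(1, steps + 1)]
```

For example, 360,000 MB of merge I/O at 100 MB/sec takes at least 3,600 seconds (one hour); at 150 MB/sec the floor drops to 2,400 seconds.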

      General Recommendations

      In general, your indexing/reindexing and subsequent search experience will be better if you


      The MarkLogic Admin GUI is a convenient place to deploy a normal certificate infrastructure or to use the temporary certificates generated by MarkLogic. However, for certain advanced solutions and deployments, XQuery-based admin operations are needed to configure MarkLogic.

      This knowledgebase article discusses how to deploy a SAN or wildcard certificate in a cluster of three or more nodes.


      Certificate Types and MarkLogic Default Config

      Certificate Types

      In general, when browsers connect to a server using HTTPS, they check that the server's SSL certificate matches the host name in the address bar. There are three ways for browsers to find a match:

      a) The host name (in the address bar) exactly matches the Common Name in the certificate's Subject.

      b) The host name matches a wildcard Common Name. An example is given at the end of the article.

      c) The host name is listed in the Subject Alternative Name (SAN) field as part of the X509v3 extensions. An example is given at the end of the article.

      The most common form of SSL name matching is for the SSL client to compare the server name it connected to with the Common Name (CN field) in the server's Certificate. It's a safe bet that all SSL clients will support exact common name matching.

      MarkLogic allows this common scenario (a) to be configured from the Admin GUI; the rest of this article discusses deploying certificates for (b) and (c).
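The three matching rules can be sketched as a small function. This is a simplified model, assuming the common client behavior described in RFC 6125: a wildcard covers exactly one left-most label, and when SAN DNS entries are present they are checked instead of the CN (some clients ignore the CN entirely). The host names are hypothetical.

```python
def matches_pattern(host, pattern):
    """Exact match, or a single-label wildcard such as *.example.com."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower().rstrip(".")
    if pattern.startswith("*."):
        suffix = pattern[1:]                     # ".example.com"
        if not host.endswith(suffix):
            return False
        label = host[: -len(suffix)]
        return bool(label) and "." not in label  # wildcard covers exactly one label
    return host == pattern


def certificate_matches(host, common_name, san_dns_names=()):
    """Check SAN DNS entries when present, otherwise fall back to the CN."""
    if san_dns_names:
        return any(matches_pattern(host, name) for name in san_dns_names)
    return matches_pattern(host, common_name)
```

This is why a single wildcard or SAN certificate can serve every node of a cluster, while each temporary certificate only matches the one host named in its CN.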

      Default Admin GUI based Configuration 

      By default, MarkLogic generates a temporary certificate for every node in the group of the current cluster when a template is assigned in MarkLogic Server (the exception is when the template assignment is done through XQuery).

      The temporary certificate generated for each node has that node's hostname as its CN field, which is designed for the common scenario (a).

      There are two paths to install a CA-signed certificate in MarkLogic:

      1) Generate a certificate request, get it signed by a CA, and import it through the Admin GUI

      2) Generate a certificate request and private key outside of MarkLogic, get the certificate request signed by a CA, and import the signed certificate and private key using an admin script

      Problem Scenario

      In both of the above cases, while installing/importing the signed certificate, MarkLogic looks to replace the temporary certificate by comparing the CN field of the installed certificate with the CN field of the temporary certificate.

      Now, if we have a wildcard certificate (b) or a SAN certificate (c), the signed certificate's CN field will never match the temporary certificate's CN field, so MarkLogic will not remove the temporary certificates and will continue using them.



      After installing a SAN or wildcard certificate, you may therefore find that an App Server still uses a temporary certificate that was not replaced during the installation.

      Use the XQuery below against the Security database to remove all temporary certificates. The query requires the URI lexicon to be enabled (it is enabled by default). Change the certificate template name in the query to match your environment.

      xquery version "1.0-ml";
      import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";
      (: Run against the Security database; requires the URI lexicon :)
      let $hostIdList := admin:get-host-ids(admin:get-configuration())
      for $hostid in $hostIdList
        (: FQDN matching the installed certificate's CN field value; set for your environment :)
        let $fqdn := ""
        (: Change to your Template Name string :)
        let $templateid := pki:template-get-id(pki:get-template-by-name("YourTemplateName"))
        for $i in cts:uris()
        where
        (   (: locate the Cert file with the Public Key :)
            fn:doc($i)//pki:template-id = $templateid
            and fn:doc($i)//pki:authority = fn:false()
            and fn:doc($i)//pki:host-name = $fqdn
        )
        return <h1> Cert File - {$i} .. inserting host-id {$hostid}
        {
          (: associate the installed certificate with this host :)
          xdmp:node-insert-child(fn:doc($i)/pki:certificate, <pki:host-id>{$hostid}</pki:host-id>),
          (: extract cert-id :)
          let $certid := fn:doc($i)//pki:certificate/pki:certificate-id
          for $j in cts:uris()
          (: locate the Cert file with the Private Key :)
          where fn:doc($j)//pki:certificate-private-key/pki:certificate-id = $certid
          return <h2> Cert Key File - {$j} </h2>
        } </h1>

      The query above removes all temporary certificates (including the template CA) and their private keys, leaving only the installed certificate associated with the template and forcing all nodes to use the installed certificate.


      Example: SAN (Subject Alternative Name) Certificate

      For 3 node cluster (,,

      $ openssl x509 -in ML.pem -text -noout
              Version: 3 (0x2)
              Serial Number: 9 (0x9)
              Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=NY, L=NewYork, O=MarkLogic, OU=Engineering, CN=Support CA
                  Not Before: Apr 20 19:50:51 2016 GMT
                  Not After : Jun  6 19:50:51 2018 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic, OU=Eng,
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                  RSA Public Key: (1024 bit)
                      Modulus (1024 bit):
                      Exponent: 65537 (0x10001)
              X509v3 extensions:
                  X509v3 Key Usage: 
                      Key Encipherment, Data Encipherment
                  X509v3 Extended Key Usage: 
                      TLS Web Server Authentication
                  X509v3 Subject Alternative Name: 
          Signature Algorithm: sha1WithRSAEncryption

      Example: Wild-Card Certificate

      For 3 node cluster (,, 

      $ openssl x509 -in ML-wildcard.pem -text -noout
              Version: 1 (0x0)
              Serial Number: 7 (0x7)
              Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=NY, L=NewYork, O=MarkLogic, OU=Engineering, CN=Support CA
                  Not Before: Apr 24 17:36:09 2016 GMT
                  Not After : Jun 10 17:36:09 2018 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic Corporation, OU=Engineering Support, CN=*


      Okta provides secure identity management and single sign-on to any application, whether in the cloud, on-premises or on a mobile device.

      The following procedure describes how to integrate MarkLogic with Okta identity management and Microsoft Windows Active Directory using the Okta AD Agent.

      This document assumes that the users accessing MarkLogic are defined in the Windows Active Directory only and do not currently have Okta User Profiles defined.

      Authentication Flow

       The authentication flow in this scenario will be as follows:

      1. The user opens a browser connection to the site's single sign-on portal page.
      2. The user enters their Active Directory credentials.
      3. Okta verifies the user credentials using the Okta AD Agent.
      4. If successful, the user is presented with a selection of applications they can sign on to.
      5. The user selects the required application and Okta completes the sign-on using the stored user credentials.


      Prerequisites

      • MarkLogic Server version 8 or 9
      • Okta Admin account access
      • Okta AD Agent
      • Active Directory Server

      For the purpose of this document the following Active Directory user entry will be used as an example:

      # LDAPv3
      # base <dc=MarkLogic,dc=Local> with scope subtree
      # filter: (sAMAccountName=martin.warnes)
      # requesting: *
      # Martin Warnes, Users, marklogic.local
      dn: CN=Martin Warnes,CN=Users,DC=marklogic,DC=local
      objectClass: top
      objectClass: person
      objectClass: organizationalPerson
      objectClass: user
      cn: Martin Warnes
      sn: Warnes
      givenName: Martin
      distinguishedName: CN=Martin Warnes,CN=Users,DC=marklogic,DC=local
      sAMAccountName: martin.warnes
      memberOf: CN=mladmins,CN=Users,DC=marklogic,DC=local
      sAMAccountType: 805306368
      userPrincipalName: martin.warnes@marklogic.local


      1. By default, Okta uses the email address as the username. However, MarkLogic usernames cannot contain certain special characters, such as the @ symbol, so the sAMAccountName will be used to sign on to MarkLogic. This is configured later, in the Okta application definition.
      2. One or more memberOf attributes should be assigned to the Active Directory user entry; these will be used to assign MarkLogic roles without the need to configure duplicate user entries in the MarkLogic security database.

      Step 1. Create a MarkLogic External Security definition

       An External Security definition is required to authenticate and authorize Okta users against a Microsoft Windows Active Directory server.

       Full details on configuring an external security definition can be found at:

       You should ensure that both “authentication” and “authorization” are set to “ldap”. For details on the remaining settings, consult your Active Directory administrator.

      Step 2. Assign Active Directory group membership to MarkLogic Roles

      In order to assign the correct roles and permissions to Okta users, you will need to map Active Directory memberOf attributes to MarkLogic roles.

      In my example, the Active Directory user martin.warnes belongs to the following group:

       memberOf: CN=mladmins,CN=Users,DC=marklogic,DC=local

      To ensure that all members of this group are assigned the MarkLogic admin role, add the memberOf attribute value (CN=mladmins,CN=Users,DC=marklogic,DC=local) as an external name in the admin role.
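      The effect of an external-name mapping can be sketched as a simple lookup: a user receives every role whose external names include one of the user's memberOf values. This is an illustrative model under stated assumptions (DNs compared as case-insensitive strings, no nested group resolution), not MarkLogic's implementation; the function name and the second role mapping are hypothetical.

```python
from typing import Dict, Iterable, List


def roles_for_user(member_of: Iterable[str],
                   role_external_names: Dict[str, List[str]]) -> List[str]:
    """Return the roles whose external names contain any of the user's groups.

    Simplification: DNs are compared as case-insensitive strings; real LDAP
    DN comparison and nested group lookups are not modeled.
    """
    groups = {dn.casefold() for dn in member_of}
    return sorted(
        role
        for role, names in role_external_names.items()
        if any(name.casefold() in groups for name in names)
    )


# The example user from this article, plus a hypothetical second mapping
user_groups = ["CN=mladmins,CN=Users,DC=marklogic,DC=local"]
mapping = {
    "admin": ["CN=mladmins,CN=Users,DC=marklogic,DC=local"],
    "rest-reader": ["CN=mlreaders,CN=Users,DC=marklogic,DC=local"],
}
print(roles_for_user(user_groups, mapping))  # ['admin']
```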

      Step 3. Configure the MarkLogic AppServer

      For each App Server that you wish to integrate with Okta, you will need to set the “authentication” to “basic” and select the “external security” definition.

      As HTTP basic authentication is considered insecure, it is highly recommended that you secure the AppServer connection with HTTPS by configuring and selecting an “SSL certificate template”.

       Further details on configuring SSL for AppServers can be found at:

      Step 4. Install and Configure Okta AD Integration

      In order for Okta to authenticate your Active Directory users, you will first need to download and install the Okta AD Agent using the following instructions supplied by Okta

       Once it is installed, your Okta administrator will be able to complete the AD Agent configuration and select which AD users to import into Okta.

      Step 5. Create Okta MarkLogic application

      From the Okta Administrator interface, select “Add Application”, search for the Basic Authentication template, and click “Add”.

      On the “General Settings” tab, enter the MarkLogic AppServer URL, making sure to use HTTP or HTTPS depending on whether you have chosen to secure the listening port using TLS.

       Check the “Browser plugin auto-submit” option.

      On the Sign-On Options panel, select “Administrator sets username, password is the same as user’s Okta password”.

       For “Application username format”, select “AD SAM Account name” from the drop-down list.

      Once the Okta application is created, assign the users who are permitted to access the application.

      When you assign a user, you will be prompted to check the AD credentials. At this point, just check that Okta has selected the correct "sAMAccountName" value; the password cannot be modified.

      Repeat Step 5 for each AppServer you wish to access via the Okta SSO portal.

      Step 6. Sign-on to Okta SSO Portal

      All assigned MarkLogic applications should be shown:

      Selecting one of the MarkLogic applications should automatically log you in using your AD Credentials stored within Okta.

      Additional Reading


      MarkLogic Server provides pre-commit and post-commit triggers, which listen for certain events and then invoke a configured XQuery module after the event occurs. A common use case is to put a shared function in a library module that is imported by the different trigger modules invoked by various triggers. This article shows how to create and use such a shared library module in a post-commit trigger.


      This example shows a simple post commit trigger that fires when a new document is created.

      1. For this example, create a database 'minidb' and set its triggers database to itself (minidb). Also create another database, 'minimodule', to store all modules.

      2. Using Query Console, create a trigger by evaluating the trigger definition XQuery below against the triggers database (minidb):


      3. Create a module by running the XQuery below against the modules database:


      4. Insert a library module into the modules database (minimodule):


      5. Now insert the sample document into the content database (minidb):


      6. Check the output in logs:

      After a new document whose URI begins with "/mini" is inserted into the content database, the following message is written to the TaskServer log file:

      2018-04-25 11:40:50.224 Info: *****Document with /mini root /mini/test-25-1-1.xml was created.*****2018-04-25T11:40:50+05:30

      NOTE: Module imports are relative to root.


      1. Creating and Managing Triggers With triggers.xqy -


      We are always looking for ways to understand and address performance issues within the product, and to that end we have added the following new diagnostic features.

      New Trace Events in MarkLogic Server

      Some new diagnostic trace events have been added to MarkLogic Server:

      • Background Time Statistics - Background thread period and further processing timings are added to xdmp:host-status() output if this trace event is set.
      • Journal Lag 30 - A forest will now log a warning message if a frame takes more than 30 seconds to journal.
        • Please note that this limit can be adjusted down by setting the Journal Lag # trace event (where # is {1, 2, 5 or 10} seconds).
      • Canary Thread 10 - A new "canary thread" that does nothing but sleep for a second and check how long it was since it went to sleep.
        • It will log messages if the interval between sleeping has exceeded 10 seconds.
        • This can be adjusted down by setting the Canary Thread # trace event (where # is {1, 2, 5 or 10} seconds).
      • Canary Thread Histogram - Adding this trace event will cause MarkLogic to write to the ErrorLog a histogram of timings once every 10 minutes.
      • Forest Fast Query Lag 30 - By default, a forest will now warn if the fast query timestamp is lagging by more than 30 seconds.
        • This can be adjusted down by setting the Forest Fast Query Lag # (where # is {1, 2, 5, or 10} seconds).
        • Note that Warning level messages will be repeatedly logged at intervals while the lag limit is exceeded, with the time between logged messages doubling until it reaches 60 seconds.
        • There will be a final warning when the lag drops below the limit again as a way to bracket the period of lag.
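      The repeat-logging behavior described for the fast query lag warning (intervals that double until they reach 60 seconds) can be sketched as follows. The starting interval is not stated above, so the 1-second default here is an assumption, and the function is a hypothetical illustration rather than MarkLogic code.

```python
from typing import List


def warning_intervals(count: int, first: float = 1.0, cap: float = 60.0) -> List[float]:
    """Return the gaps (in seconds) between the first `count` repeated warnings.

    Each gap is double the previous one until it reaches `cap`, after which
    it stays at `cap`. `first` is an assumed starting interval.
    """
    intervals = []
    gap = min(first, cap)
    for _ in range(count):
        intervals.append(gap)
        gap = min(gap * 2, cap)
    return intervals


print(warning_intervals(9))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0, 60.0]
```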

      Examples of some of the new statistics can be viewed in the Admin UI by going to the following URL in a browser (replace hostname with the name of a node in your cluster and TheDatabase with the name of the database you would like to monitor):

      You can clear the forest insert and journal statistics by adding clear=true to your request, loading the following in a browser:

      These changes now feature in the current releases of both MarkLogic 7 and MarkLogic 8 and are available for download from our developer website:

      Hints for interpreting new diagnostic pages

      Here's some further detail on what the numbers mean.

      First, a note about how bucketing is performed on these diagnostic pages:

      For each operation category (e.g. Timestamp Wait, Semaphore, Disk), the wait time will fall into a range of values, which need to be bucketed.

      The bucketing algorithm starts with 1000 buckets covering the whole range, then collapses them into a small set of buckets that still cover the whole span of values. The algorithm aims to:

      1. End up with a small number of buckets

      2. Include extreme (outlier) values

      3. Spread out multiple values so that they are not too "bunched-up" and are therefore easier to interpret.
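      The three goals above can be illustrated with a simple collapsing scheme. This is a hypothetical sketch, not MarkLogic's actual bucketing algorithm: it histograms samples into many fine equal-width buckets, drops the empty ones, and then repeatedly merges the adjacent pair with the smallest combined count until only a few buckets remain. Merging never discards a range, so the result still spans the extreme values, and merging the sparsest neighbors first keeps dense regions in separate buckets. It assumes non-negative integer millisecond samples and a non-empty sample list.

```python
import math
from typing import List, Tuple

Bucket = Tuple[int, int, int]  # (low_ms, high_ms, count), bounds inclusive


def collapse_buckets(samples_ms: List[int], n_fine: int = 1000,
                     target: int = 5) -> List[Bucket]:
    """Hypothetical bucketing: fine histogram, then greedy merging down to `target`."""
    top = max(samples_ms) + 1
    width = max(1, math.ceil(top / n_fine))  # equal-width fine buckets
    counts = [0] * n_fine
    for s in samples_ms:
        counts[s // width] += 1
    # keep only non-empty fine buckets as [low, high, count]
    buckets = [[i * width, (i + 1) * width - 1, c]
               for i, c in enumerate(counts) if c]
    while len(buckets) > target:
        # merge the adjacent pair with the smallest combined count
        j = min(range(len(buckets) - 1),
                key=lambda k: buckets[k][2] + buckets[k + 1][2])
        low, _, c1 = buckets[j]
        _, high, c2 = buckets[j + 1]
        buckets[j:j + 2] = [[low, high, c1 + c2]]
    return [tuple(b) for b in buckets]


samples = [3] * 50 + [12] * 5 + [15] * 2 + [57] + [95]
print(collapse_buckets(samples, n_fine=10, target=3))
# [(0, 9, 50), (10, 19, 7), (50, 99, 2)]
```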

      Forest Journal Statistics (http://hostname:8001/forest-journal-statistics.xqy?database=TheDatabase)

      When we journal a frame, there are a sequence of operations.

      1. Wait on a semaphore to get access to the journal.
      2. Write to the journal buffer (possibly waiting for I/O if exceeding the 512k buffer)
      3. Send the frame to replica forests
      4. Send the frame to journal archive/database replica forests
      5. Release the semaphore so other threads can access the journal
      6. Wait for everything above to complete, if needed.
        1. If it's a synchronous op (e.g. prepare, commit, fast query timestamp), we wait for disk I/O
        2. If there are replica forests, we wait for them to acknowledge that they have journaled and replayed.
        3. If the journal archive or database replica is lagged, wait for it to no longer be lagged.

      We note the wall clock time before and after these various operations, so we can track how long they take.
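      Recording the elapsed time around each step, as described above, can be sketched with a generic helper. The phase names are taken loosely from the journaling steps listed above and the steps themselves are placeholders, so this is an illustration rather than MarkLogic's instrumentation. It uses time.monotonic(), a steady clock, instead of the system wall clock so that clock adjustments cannot produce negative durations.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_phases(phases: List[Tuple[str, Callable[[], None]]]) -> Dict[str, float]:
    """Run each (name, step) pair in order and record its elapsed time in ms."""
    timings: Dict[str, float] = {}
    for name, step in phases:
        start = time.monotonic()
        step()
        timings[name] = (time.monotonic() - start) * 1000.0
    return timings


# Placeholder steps standing in for the journaling sequence above
phases = [
    ("semaphore", lambda: None),
    ("journal write", lambda: time.sleep(0.01)),
    ("replication", lambda: None),
]
print(time_phases(phases))
```

      In a real system, each recorded duration would then be added to the histogram for its category.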

      On the replica side, we also measure the "Journal Replay" time which would be inserting into the in-memory stand, committing, etc.

      Here's an example for a master and its replica.

      Forest F-1-1

      Timestamp Wait
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          280     99.64    280          99.64
      50..59        1       0.36     281          100.00
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          816     100.00   816          100.00
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          204     99.51    204          99.51
      10..19        1       0.49     205          100.00
      Local-Disk Replication
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          804     99.26    804          99.26
      10..119       6       0.74     810          100.00
      Journal Archive

      No Information

      Database Replication

      No Information

      Journal Total
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          810     99.26    810          99.26
      10..119       6       0.74     816          100.00
      Journal Replay

      No Information

      Forest F-1-1-R

      Timestamp Wait

      No Information

      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          811     100.00   811          100.00
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          203     99.02    203          99.02
      10..59        2       0.98     205          100.00
      Local-Disk Replication

      No Information

      Journal Archive

      No Information

      Database Replication

      No Information

      Journal Total
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          809     99.75    809          99.75
      10..59        2       0.25     811          100.00
      Journal Replay
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          807     99.63    807          99.63
      10..119       3       0.37     810          100.00
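      The derived columns in these tables follow directly from the per-bucket counts: % is the bucket's share of all samples, Cumulative is a running total of the counts, and Cumulative % is that running total as a share of all samples. The sketch below recomputes them, assuming ordinary rounding to two decimal places; the function name is hypothetical.

```python
from typing import List, Tuple

Row = Tuple[str, int, float, int, float]


def histogram_rows(buckets: List[Tuple[str, int]]) -> List[Row]:
    """Compute (bucket, count, %, cumulative, cumulative %) for each bucket."""
    total = sum(count for _, count in buckets)
    rows: List[Row] = []
    running = 0
    for label, count in buckets:
        running += count
        rows.append((label, count,
                     round(100.0 * count / total, 2),
                     running,
                     round(100.0 * running / total, 2)))
    return rows


# "Timestamp Wait" for forest F-1-1 above
for row in histogram_rows([("0..9", 280), ("50..59", 1)]):
    print(row)
# ('0..9', 280, 99.64, 280, 99.64)
# ('50..59', 1, 0.36, 281, 100.0)
```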

      Forest Insert Statistics (http://hostname:8001/forest-insert-statistics.xqy?database=TheDatabase)

      When we're inserting a fragment into an in-memory stand, we also have a sequence of operations.

      1. Wait on a semaphore to get access to the in-memory stand.
      2. Wait on the insert throttle (e.g. if there are too many stands)
      3. Wait for the stand's journal semaphore, to serialize with the previous insert if needed.
      4. Release the stand insert semaphore.
      5. Journal the insert.
      6. Release the stand journal semaphore.
      7. Start the checkpoint task if the stand is full.

      As with the journal statistics, we note the wall clock time between these operations so we can track how long they're taking.

      On the replica side, the behavior is similar, although the journal and insert are in reverse order (we journal before inserting into the in-memory stand). If it's a database replica forest, we also have to regenerate the index information (Filled IPD).

      Here is an example for a master and its replica.

      Forest F-1-1

      Journal Throttle
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          606     100.00   606          100.00
      Insert Sem
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          604     99.67    604          99.67
      80..199       2       0.33     606          100.00
      Filled IPD

      No Information

      Stand Throttle
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          606     100.00   606          100.00
      Stand Insert
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          605     99.84    605          99.84
      100..109      1       0.17     606          100.00
      Journal Sem
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          604     99.67    604          99.67
      10..119       2       0.33     606          100.00
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          603     99.50    603          99.50
      10..119       3       0.50     606          100.00
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          597     98.51    597          98.51
      10..19        6       0.99     603          99.50
      200..229      3       0.50     606          100.00

      Forest F-1-1-R

      Journal Throttle

      No Information

      Insert Sem
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          606     100.00   606          100.00
      Filled IPD

      No Information

      Stand Throttle
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          606     100.00   606          100.00
      Stand Insert
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          605     99.84    605          99.84
      110..119      1       0.17     606          100.00
      Journal Sem
      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          606     100.00   606          100.00

      No Information

      Bucket (ms)   Count   %        Cumulative   Cumulative %
      0..9          605     99.84    605          99.84
      110..119      1       0.17     606          100.00

      Further reading

      To learn more about diagnostic trace events, please refer to our documentation and Knowledgebase articles and note that some trace events may only log information if logging is set to debug:

      Data Hub Framework (DHF) allows you to model your data according to business entities, and Template Driven Extraction (TDE) allows you to view these entities through a relational or a semantic lens. With DHF, TDE templates are now created automatically, so you can access your data as rows using SQL or the Optic API (see this video for more information). The Template Driven Extraction feature has been available in MarkLogic for some time, whereas the DHF-generated TDE feature was introduced in DHF 4.

      Recently, we have received reports of a couple of issues with the DHF-generated TDE feature, and we are investigating and resolving them. Although the feature is fully functional for the most part, while our investigation is in progress, if you are seeing issues with your DHF-generated TDE, we recommend treating the generated TDE as an example only and, based on it, creating your own TDE to handle the queries you would like to run.

      Helpful resources:


      The jemalloc library is included with the MarkLogic Server install and is configured to be used by default. Its use is recommended because it has shown a performance boost over the default Linux malloc library.

      There have been cases where even if configured, the library is not used.  This article will give possible solutions to debug that.


      ErrorLog message on startup if jemalloc is not loaded:

      Warning: Memory allocator is not jemalloc; check /etc/sysconfig/MarkLogic


      1) Make sure to use a superuser shell or sudo when running 'service MarkLogic restart'.

      2) Verify that the jemalloc library is present in the install directory (i.e. /opt/MarkLogic/lib/).

      3) Has the /etc/sysconfig/MarkLogic configuration file been modified from the default?  Try setting the configuration file back to the default and restarting the server.

      4) Confirm that /etc/sysconfig/MarkLogic contains the following lines:
      # preload jemalloc
      if [ -e $MARKLOGIC_INSTALL_DIR/lib/ ]; then
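      As an additional check that is not part of the steps above, on Linux you can look at a running process's memory map to see whether a jemalloc shared library was actually loaded. The sketch below assumes the standard /proc/<pid>/maps format and that the library's file name contains "jemalloc"; finding the MarkLogic process ID (for example with pidof or ps) is left to you, and reading another process's maps may require root.

```python
from pathlib import Path
from typing import List


def jemalloc_mappings(maps_text: str) -> List[str]:
    """Return the distinct mapped file paths whose name contains 'jemalloc'.

    Each line of /proc/<pid>/maps has five fixed fields (address, perms,
    offset, dev, inode) followed by an optional path name.
    """
    paths: List[str] = []
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and "jemalloc" in Path(fields[5]).name:
            if fields[5] not in paths:
                paths.append(fields[5])
    return paths


def process_uses_jemalloc(pid: int) -> bool:
    """Read /proc/<pid>/maps (Linux only) and report whether jemalloc is mapped."""
    maps_text = Path(f"/proc/{pid}/maps").read_text()
    return bool(jemalloc_mappings(maps_text))
```

      If process_uses_jemalloc returns False for the running server, the preload configuration above was not applied to that process.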


      For more information on the jemalloc library, please review the article provided by Facebook Engineering


      This article compares JSON support in MarkLogic Server versions 6, 7, and 8, and the upgrade path for JSON in the database.

      How is native JSON different than the previous JSON support?

      Previous versions of MarkLogic Server provided XQuery APIs that converted between JSON and XML. This translation is lossy in the general case, meaning developers were forced to make compromises on one or both ends of the transformation. Even though the transformation was implemented in C++, it still added significant overhead to ingestion. All of these issues go away with JSON as a native document format.
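      To see why a JSON-to-XML translation can be lossy, consider a deliberately naive mapping (hypothetical, and not MarkLogic's actual façade format): objects become nested elements, arrays become repeated elements, and scalars become text. Three different JSON documents then produce identical XML, so the original cannot be recovered from it.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any


def naive_json_to_xml(doc: dict) -> str:
    """Convert a JSON object to XML with a naive, lossy mapping."""
    root = ET.Element("json")

    def add(parent: ET.Element, key: str, value: Any) -> None:
        if isinstance(value, list):
            # The array wrapper is lost: items become repeated sibling elements.
            for item in value:
                add(parent, key, item)
        elif isinstance(value, dict):
            child = ET.SubElement(parent, key)
            for k, v in value.items():
                add(child, k, v)
        else:
            child = ET.SubElement(parent, key)
            # Numbers, booleans, and strings all become untyped text.
            child.text = value if isinstance(value, str) else json.dumps(value)

    for key, value in doc.items():
        add(root, key, value)
    return ET.tostring(root, encoding="unicode")


for doc in ({"a": 1}, {"a": "1"}, {"a": [1]}):
    print(json.dumps(doc), "->", naive_json_to_xml(doc))
# {"a": 1} -> <json><a>1</a></json>
# {"a": "1"} -> <json><a>1</a></json>
# {"a": [1]} -> <json><a>1</a></json>
```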

      How do I upgrade my JSON façade data to native JSON?

      For applications that use the previous JSON translation façade (for example: through the Java or REST Client APIs), MarkLogic 8 comes with sample migration scripts to convert JSON stored as XML into native JSON.

      The migration script will upgrade a database’s content and configuration from the XML format that MarkLogic 6 and 7 used to represent JSON data to native JSON, specifically converting documents in the namespace.
      If you are using the MarkLogic 7 JSON support, you will also need to migrate your code to use the native JSON support. The resulting application code is expected to be more efficient, but the migration will require minor changes to your application code.
      See also:
      Version 8 JSON incompatibilities


      MarkLogic Server provides several useful techniques for keeping values in memory or resolving values without having to scan documents on disk.


      There are a few options available:

      1. cts:element-values performs a lexicon lookup, so it gets those values directly from the range indexes. You can pass the "map" option to have the call return a map directly, as described in the documentation, which may give you what you need without any further work.


      2. Storing a map as a server field is a popular approach and is widely used for storing data that needs to be accessed routinely by queries.

      Bear in mind that this approach has a catch: the map is not available to all nodes in a cluster. It is only available on the node that evaluated the original request, so if you use this technique in a clustered environment, the results may not be what you expect.

      Also note that if you plan to store a large number of maps in server fields on nodes in the cluster, make sure the hosts are provisioned with enough memory to accommodate these maps on top of the group-level caches and the memory needed for query allocation, stands, range indexes, document retrieval, and the like.
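      The per-host nature of server fields can be modeled in a few lines: each host keeps its own field store, so a value set while evaluating a request on one host is simply absent on another. The Host class and host names are hypothetical; in MarkLogic itself the corresponding calls are xdmp:set-server-field and xdmp:get-server-field.

```python
from typing import Any, Dict, Optional


class Host:
    """Hypothetical model of one cluster node with its own server-field store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._fields: Dict[str, Any] = {}

    def set_server_field(self, field: str, value: Any) -> Any:
        self._fields[field] = value
        return value

    def get_server_field(self, field: str) -> Optional[Any]:
        # Only values set on *this* host are visible; nothing is shared.
        return self._fields.get(field)


node1, node2 = Host("node1"), Host("node2")
node1.set_server_field("lookup", {"key": "value"})
print(node1.get_server_field("lookup"))  # {'key': 'value'}
print(node2.get_server_field("lookup"))  # None
```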



      3. xdmp:set only allows you to set a value for the life of a single query but this technique can be useful in some circumstances - especially in situations where you're interested in keeping track of certain values throughout the processing of a module or a function within a module.


      4. If you have a large number of complex queries, particularly ones where lexicon lookups or range index calls won't resolve the data you need and where many documents must be retrieved from disk, you should consider using registered queries.


      Note that registered queries use the List Cache, so if you plan to adopt this method, we recommend careful testing to ensure your caches are sized sufficiently for the needs of your application.


      This article explains how to kill a long-running query and describes the related timeout configuration.

      Problem Scenario

      At some point, we've all run into an inefficient long running query. What should we do if we don't want to wait for the query to complete? If we cancel the browser request, that would end the connection, but it wouldn't end the program invocation (called a "request") on the MarkLogic Server side. On the server side, that program invocation would continue to run until the execution is complete.

      Most of the time, this isn't really an issue. The server, of course, is multi-threaded, handling many concurrent transactions. We can just cancel the browser request, move on, and let the query finish when it finishes. However, sometimes it becomes necessary to free up server resources by killing the query and starting over. To do this, we need access to the Admin interface. 

      Sample Long-Running Query

      Example only, please don't try this on any production machines!

      for $x in 1 to 1000000
      return collection()[1 + xdmp:random(1000)]
      This query is asking for 1,000,000 random documents, and will take a long time to execute. How can we cancel this query?

      How to Cancel/Kill the Query

      Go to the Administrative interface (at http://localhost:8001/ if you're running MarkLogic locally). At the top of the screen, you'll see a tab labeled "Status." Click that:


      This will take you to the "System Status" screen. This page reveals status information about hosts, databases, forests, and app servers. The App Server section is what we're concerned with. Scanning down the "Queries" column, we see that the "Admin" server is processing a query (namely, the one that generated the page we see). Everything looks okay so far. But just below that, we see that the "App-Services" server is just over 3 minutes into processing a query. That's our slow one. Query Console runs on the "App-Services" app server, which explains why we see it there. Go ahead and click the "App-Services" link:


      This takes us to the "App-Services" status page. So far, there's still no "cancel" button. One more click will reveal it ("show more"):


      We can now see an individual entry for the currently running query. Here we see it's called "eval.xqy"; that's the query module that Query Console invokes when you submit a query. If you were running your own query module (instead of using Query Console), then you would see its name here instead. To cancel the query, click the "[cancel]" link:


      One more click (on the confirmation page).


      This takes us back to the status page, where we see MarkLogic Server is in the process of canceling our query:


      The page above will continue to say "cancelling..." until it is refreshed, even though the query has already been killed and no longer exists.

      A quick refresh of the above page shows that the query is no longer present.



      What happens if you forget to cancel a query?

      MarkLogic will continue to execute the query until a time limit is reached, at which point the Server will cancel the query for you. For example, here's what Query Console eventually returns back if we don't bother to cancel the query:


      How long is this time limit?

      This depends on your server configuration. We can actually set the timeout in the query itself, using the xdmp:set-request-time-limit() function, but even that will be limited by your server's "max time limit."

      For example, on the "Configure" tab of my "App-Services" app server, you can see that the "default time limit" is set to 10 minutes (600 seconds), and the longest any query can allow itself to run (by setting its own request time limit) is one hour (3600 seconds):
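      The interaction between the two settings can be summarized as a small function: a request runs under the App Server's default time limit unless it sets its own limit with xdmp:set-request-time-limit(), and that requested limit cannot exceed the max time limit. This is a simplified model that assumes a larger requested value is capped at the maximum, consistent with the description above; the function name is hypothetical.

```python
from typing import Optional


def effective_time_limit(default_limit: int, max_limit: int,
                         requested: Optional[int] = None) -> int:
    """Return the time limit (seconds) a request runs under in this simplified model."""
    if requested is None:
        return min(default_limit, max_limit)
    # A limit requested by the query itself is still bounded by the max time limit.
    return min(requested, max_limit)


# Values from the App-Services example: default 600 s, max 3600 s
print(effective_time_limit(600, 3600))        # 600
print(effective_time_limit(600, 3600, 1800))  # 1800
print(effective_time_limit(600, 3600, 7200))  # 3600
```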



      Update and delete operations can be performance intensive and have negative effects on search performance when done in a conventional way, where data is updated or deleted in-place. To avoid these performance impacts during update and delete operations, MarkLogic Server updates and deletes "lazily."

      In MarkLogic Server, when you delete a document, it is not removed from disk immediately; instead, that document's fragments are marked as "obsolete." Marking a document as obsolete tags its fragments for later removal, and also hides its fragments from subsequent query results. Updates happen in a similar way: instead of updating in place, MarkLogic Server marks the old versions of the fragments in an old stand as "obsolete" for later deletion, while also creating new versions of those fragments in a new stand (initially an in-memory stand, which is eventually written to disk as a new on-disk stand).

      Eventually, merges move any unchanged fragments from an old stand into a new stand. Fragments marked obsolete are deleted once the merge that creates the new stand finishes, at which point the old stands that were used as input to that merge are removed from disk. Merging is very important: this is the mechanism by which MarkLogic Server frees up disk space, optimizes its on-disk data structures, and reduces the number of fragments evaluated during its queries and searches.

      Note that for a merge-min-ratio of n, you can expect up to 1/(n+1) of a stand to be deleted fragments before the stand is automatically merged.  See Overview of the Merge Policy Controls.
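      As a quick worked example of the ratio above, the helpers below compute the largest share of a stand, and the corresponding fragment count, that can be deleted fragments before an automatic merge is expected, for a given merge-min-ratio n. The 3,000,000-fragment stand is an arbitrary example and the function names are hypothetical.

```python
from fractions import Fraction


def max_deleted_fraction(merge_min_ratio: int) -> Fraction:
    """Upper bound on the deleted share of a stand before it is merged: 1/(n+1)."""
    return Fraction(1, merge_min_ratio + 1)


def max_deleted_fragments(stand_fragments: int, merge_min_ratio: int) -> int:
    """The same bound expressed as a fragment count (rounded down)."""
    return stand_fragments // (merge_min_ratio + 1)


for n in (1, 2, 4):
    print(n, max_deleted_fraction(n), max_deleted_fragments(3_000_000, n))
# 1 1/2 1500000
# 2 1/3 1000000
# 4 1/5 600000
```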

      While lazy deletion results in faster updates and deletes, be aware that residual impacts can be seen in terms of both disk space and query performance if merges are not done in a timely manner.

      Further reading:

      Multi-Version Concurrency Control
      How do updates work in MarkLogic Server?
      ML Performance: Understanding System Resources


      MarkLogic Server can be configured so that users are authenticated using an external authentication protocol, such as Lightweight Directory Access Protocol (LDAP) or Kerberos. These external agents serve as centralized points of authentication or repositories for user information from which authorization decisions can be made. If authentication does not work as expected after you follow the configuration instructions in our documentation, this article gives some additional debugging ideas.


      Check the following areas when LDAP authentication is not working as expected:

      1. Verify that the cyrus-sasl-md5 library is installed on the MarkLogic Server node.

      2. Run the following LDAP search command to check whether the LDAP server is set up properly.

      ldapsearch -H ldap://{Your LDAP Server URI}:389 -x -s base

      a. In the output of the command, make sure DIGEST-MD5 is listed as a supported SASL mechanism:

      supportedSASLMechanisms: DIGEST-MD5

      b. Identify the correct LDAP service name:

      e.g. ldapServiceName: MLTEST1.LOCAL:dc1$@MLTEST1.LOCAL

      3. On Windows platforms, the services.keytab file is created using Active Directory Domain Services (AD DS) on a Windows server. If you are using Active Directory Domain Services (AD DS) on a computer that is running Windows Server 2008 or Windows Server 2008 R2, be sure that you have installed the hot fix described in
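      Checking the ldapsearch output from step 2 for these two values can be scripted. The sketch below parses LDIF-style "attribute: value" lines from the command's standard output, joining LDIF continuation lines but not decoding base64 ("attr::") values, and then reports the SASL mechanisms and the ldapServiceName. The function name and the sample output are illustrative.

```python
from typing import Dict, List


def parse_ldif_attributes(output: str) -> Dict[str, List[str]]:
    """Collect 'attribute: value' pairs from ldapsearch output.

    Comment lines (#) are skipped and LDIF continuation lines (starting with a
    single space) are joined to the previous line.
    """
    lines: List[str] = []
    for raw in output.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    attrs: Dict[str, List[str]] = {}
    for line in lines:
        if not line or line.startswith("#") or ": " not in line:
            continue
        name, value = line.split(": ", 1)
        attrs.setdefault(name, []).append(value)
    return attrs


sample = """# extended LDIF
dn:
supportedSASLMechanisms: GSSAPI
supportedSASLMechanisms: DIGEST-MD5
ldapServiceName: MLTEST1.LOCAL:dc1$@MLTEST1.LOCAL
"""
attrs = parse_ldif_attributes(sample)
print("DIGEST-MD5" in attrs.get("supportedSASLMechanisms", []))  # True
print(attrs.get("ldapServiceName"))  # ['MLTEST1.LOCAL:dc1$@MLTEST1.LOCAL']
```

      To run it against a live server, capture the command's output first, for example with subprocess.run([...], capture_output=True, text=True).stdout.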

      Introduction: the issue

      MarkLogic performs nested lookups on the LDAP groups assigned to a user to determine which roles the user will be assigned. If the groups belong to multiple Active Directory domains within a federated Active Directory forest, MarkLogic user authorization can fail with a subordinate referral error, as seen below:

      2019-07-30 13:27:23.002 Notice: XDMP-LDAP: ldap_search_s failed on ldap server ldap:// Referral (10)


      MarkLogic has been configured to connect to the Local Domain Controller on port 389 (LDAP) or 636 (LDAPS); however, a Local Domain Controller can only search the domains to which it has access.


      For example, a user is a member of the following groups, which belong to two separate Active Directory domains, subA and subC.

      Using a Local Domain Controller for subA for external authorization would result in a login failure when attempting to perform the nested group lookup for the domain subC:

      member=CN=Group One,OU=OrgUnitAGroups,OU=OrgUnitA,DC=subA,DC=domain
      member=CN=Group Two,OU=OrgUnitAGroups,OU=OrgUnitA,DC=subA,DC=domain
      member=CN=Group Three,OU=OrgUnitCGroups,OU=OrgUnitC,DC=subC,DC=domain


      If you have multiple Active Directory domains federated into an Active Directory forest, you should use the Global Catalog port 3268 (LDAP) or 3269 (LDAPS) to prevent failures when searching for group memberships that are defined in other domains.
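The decision above can be sketched in Python. This is an illustration only: the helper names are hypothetical and not part of MarkLogic. It inspects just the DC= components of the example group DNs and returns the standard Active Directory port numbers, 389/636 for a domain controller and 3268/3269 (LDAP/LDAPS) for the Global Catalog.

```python
def domain_of(dn: str) -> str:
    """Join the DC= components of a distinguished name into a DNS-style name.
    Naive parsing: splits on every comma, so escaped commas (\\,) inside
    RDN values are not handled."""
    rdns = [rdn.strip() for rdn in dn.split(",")]
    return ".".join(rdn[3:] for rdn in rdns if rdn.upper().startswith("DC="))


def suggested_port(group_dns, use_ldaps=False):
    """Suggest a domain controller port when every group lives in one domain,
    otherwise a Global Catalog port (standard Active Directory numbers)."""
    domains = {domain_of(dn) for dn in group_dns}
    if len(domains) > 1:
        return 3269 if use_ldaps else 3268  # Global Catalog
    return 636 if use_ldaps else 389        # local domain controller


groups = [
    "CN=Group One,OU=OrgUnitAGroups,OU=OrgUnitA,DC=subA,DC=domain",
    "CN=Group Two,OU=OrgUnitAGroups,OU=OrgUnitA,DC=subA,DC=domain",
    "CN=Group Three,OU=OrgUnitCGroups,OU=OrgUnitC,DC=subC,DC=domain",
]
print(sorted({domain_of(dn) for dn in groups}))  # ['subA.domain', 'subC.domain']
print(suggested_port(groups))                    # 3268
```

With only the two subA groups, the sketch would suggest the domain controller port 389 instead.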

      Optional workaround

      A large number of nested groups can slow down logins. If you do not need to rely on nested lookups to determine group membership for MarkLogic roles (that is, all the required groups are returned by the initial user search request), consider setting the "ldap nested lookup" parameter to false in the External Security configuration.

      Doing this would also prevent subordinate domain searches and allow you to continue to use an Active Directory Domain Controller instead of switching to the Global Catalog.

      Further reading


      A leap second, as defined by Wikipedia, is "a one-second adjustment that is occasionally applied to Coordinated Universal Time (UTC) in order to keep its time of day close to the mean solar time. Without such a correction, time reckoned by Earth's rotation drifts away from atomic time because of irregularities in the Earth's rate of rotation."  At the time of this writing, the next leap second to be inserted is on June 30, 2015 at 23:59:60 UTC.

      For systems that use the Network Time Protocol (NTP) to synchronize time across all the hosts in their MarkLogic cluster, MarkLogic Server is not impacted by the leap second (that is, we expect everything to work fine at the MarkLogic layer).

      For systems where synchronizing the system clocks requires UTC time to be set backwards, this must be accounted for anywhere time-dependent data is stored. In this case, we recommend that customers implement NTP in their environment; otherwise, the application layer will need to handle discontinuous time.

      Transactional Consistency

      The algorithm that MarkLogic Server uses to maintain transactional consistency of data is not wall clock dependent and, as such, is not affected by the leap second.

      Network Time Protocol (NTP)

      NTP works very hard not to make time go backwards, because clock readings are constrained to always increase: every reading advances the NTP clock. NTP adjusts the clock gradually by slowing it down or speeding it up, rather than by making discrete changes, unless the time is off by a lot. A second is not a lot.  An hour is a lot. Regardless of the leap second, adjustments for computer clock drift can easily exceed a second and happen frequently.

      When Time Goes Backwards

      Left on their own without NTP, computer clocks are not very accurate. If synchronizing the system clocks on the hosts of a MarkLogic cluster requires setting the clocks backwards, the application layer will need to account for and handle discontinuous date-times in its data.
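One way an application can detect discontinuous time in stored data is to scan recorded wall-clock timestamps for backward steps. The Python sketch below is a hypothetical helper, not a MarkLogic API, and the readings are made-up sample values around a clock that was stepped back.

```python
from datetime import datetime


def backward_steps(timestamps):
    """Return (index, previous, current) wherever a sequence of recorded
    wall-clock timestamps moves backwards, i.e. where time is discontinuous."""
    return [
        (i, prev, cur)
        for i, (prev, cur) in enumerate(zip(timestamps, timestamps[1:]), start=1)
        if cur < prev
    ]


# Hypothetical readings taken around a clock that was set backwards.
readings = [
    datetime(2015, 6, 30, 23, 59, 58),
    datetime(2015, 7, 1, 0, 0, 0),
    datetime(2015, 6, 30, 23, 59, 59),  # clock set backwards here
    datetime(2015, 7, 1, 0, 0, 1),
]
for index, prev, cur in backward_steps(readings):
    print(f"step back at index {index}: {prev} -> {cur}")
```

For measuring elapsed time inside an application, Python's time.monotonic() is not affected by system clock updates, so it sidesteps this problem for durations (though not for stored wall-clock values).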

      The temporal feature was introduced in MarkLogic Server 8. If the system clock is adjusted backwards, there are conditions under which temporal document inserts and updates will fail with an appropriate error code. This is by design and expected.

      Our recommendation is to implement NTP on all hosts of a MarkLogic cluster to eliminate the need to handle discontinuous time at the application layer. 

      Further Reading

      Red Hat article on the Leap Second

      Microsoft Support article on the Leap Second



      MarkLogic Server implements security internally using query constraints, so lexicon search performance may be affected by those security query constraints. Lexicon searches performed with admin credentials are not subject to them.


      Query time grows in proportion to the number of matches from a given search across a set of documents (not the actual number of documents in your database). With security constraints in place, the same lexicon search produces significantly more matches than it does with admin credentials. To minimize the number of matches (and therefore query time) for a given lexicon search, you'll want to amp your lexicon searches to an admin user.


      If you are not able to upgrade straight away, as recommended in the Optic security advisory, the following steps can be used to disable the Optic query functionality.

      Note: This will disable the ability to run all Optic and SPARQL queries so this can only be done if applications do not rely on those features.


      The Optic and SPARQL query engines can be disabled via a script, or via the administration user interface.

      In both cases the sem:sparql privilege will be removed from all the relevant roles.

      Scripted privilege removal

      Run the script listed below to remove the sem:sparql privilege from all roles. The script removes the sem:sparql privilege from the four out-of-the-box roles, then prompts the user to remove the privilege from any custom roles that are found. Make a careful note of the affected roles if you intend to re-enable the privilege after upgrading your deployment.

      xquery version "1.0-ml";
      import module namespace sec="" at 
      let $ootb-sem-sparql-roles:=  ("optic-reader-internal",
      let $remove-privilege:=""
      return xdmp:invoke-function(function() {
          let $sem-sparql-priv:=sec:get-privilege($remove-privilege,"execute")  
          let $_ := if (fn:count($sem-sparql-priv) eq 0 ) then fn:error(xs:QName("PRIV-NOT-FOUND"),"sem-sparql privilege not found. Contact MarkLogic Support.") else ()
          let $_ := if (fn:count($sem-sparql-priv) gt 1 ) then fn:error(xs:QName("MULTIPLE-PRIVS-FOUND"),"Multiple sem-sparql privileges found. Contact MarkLogic Support.") else ()
          let $role-ids-having-sem-sparql:=$sem-sparql-priv/sec:role-ids/sec:role-id/xs:unsignedLong(.)
          let $role-names:=sec:get-role-names($role-ids-having-sem-sparql)/xs:string(.)
          let $ootb-roles-having-sem-sparql:=$role-names[. = $ootb-sem-sparql-roles]
          let $custom-roles-having-sem-sparql:=$role-names[fn:not(. = $ootb-sem-sparql-roles)]
          let $_ := if (fn:count($ootb-roles-having-sem-sparql) gt 0) then
                       xdmp:invoke-function(function() {
                    else ()
          return  if (fn:count($role-names) eq 0) then
                       "No roles have the sem:sparql privilege."
                    else
                       ("Removed sem:sparql from the following MarkLogic Server out-of-the-box roles:",
                        if (fn:count($ootb-roles-having-sem-sparql) eq 0) then "No OOTB roles have sem:sparql" else $ootb-roles-having-sem-sparql,
                        "The following non OOTB roles have sem:sparql which should be removed manually:",
                        if (fn:count($custom-roles-having-sem-sparql) eq 0) then "No custom roles present having sem:sparql" else $custom-roles-having-sem-sparql)

      Manual privilege removal

      Alternatively, the sem:sparql privilege can be removed manually via the Admin UI. From the side menu, select Security > Execute Privileges. Scroll to the sem:sparql privilege, click it, uncheck any roles that are selected, and click "OK". Make a careful note of the affected roles if you intend to re-enable the privilege after upgrading your deployment.

      For MarkLogic Server v6.0, the absolute maximum number of hosts in a MarkLogic Server cluster is 256, but the optimum is around 64.


      MarkLogic recommends the default "ordered" option for Linux ext3 and ext4 file-systems.

      File system administrators on Linux are often tempted to use the data=writeback option to achieve higher throughput from their file systems, but this comes with the side effects of potential data corruption and data-security breaches. This article explains both file system options with respect to MarkLogic Server.


      The Linux ext3 and ext4 file systems default to the "ordered" data mode, which writes data to the main file system before the associated metadata is committed to the journal.

      Both of these file systems go the extra mile to protect your files: with data=ordered (the default), they write the data associated with the metadata first, assuring file-system integrity to the application layer, which is essential for MarkLogic Server data integrity.


      Other journaled file systems, such as XFS and JFS, journal only metadata. To make ext3 and ext4 behave like XFS and these other journaled file systems, an administrator could set 'data=writeback' in their mount options.

      The 'data=writeback' mode does not preserve data ordering when writing to the disk, so commits to the journal may happen before the data is written to the file system. This method is faster because only the metadata is journaled, but it does not protect data integrity well in the face of a system failure.

      If there is a crash between the time when metadata is committed to the journal and when data is written to disk, the post-recovery metadata can point to incomplete, partially written, or incorrect data on disk, which can lead to corrupt data files. Additionally, data that was supposed to be overwritten in the file system could be exposed to users, resulting in a security risk.
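To check a host for this risk, you could scan the mount table for ext3/ext4 file systems mounted with data=writeback. The Python sketch below is a hypothetical helper that assumes lines in the /proc/mounts format (device, mount point, type, comma-separated options, dump, pass). The sample lines, including the mount point, are made-up examples.

```python
def risky_ext_mounts(mount_lines):
    """Return mount points of ext3/ext4 file systems mounted with data=writeback.

    Each line is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    risky = []
    for line in mount_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in ("ext3", "ext4") and "data=writeback" in options.split(","):
            risky.append(mountpoint)
    return risky


sample = [
    "/dev/sda1 / ext4 rw,relatime,data=ordered 0 0",
    "/dev/sdb1 /var/opt/MarkLogic ext4 rw,noatime,data=writeback 0 0",
    "/dev/sdc1 /backup xfs rw,relatime 0 0",
]
print(risky_ext_mounts(sample))  # ['/var/opt/MarkLogic']
```

On a real Linux host, you could pass the open /proc/mounts file object directly, because it iterates line by line.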

      Linus Torvalds comments on 'data=writeback'

      "it makes things much smoother, since now the actual data is no longer in the critical path for any journal writes, but anybody who thinks that's a solution is just incompetent.  We might as well go back to ext2 then. If your data gets written out long after the metadata hit the disk, you are going to hit all kinds of bad issues if the machine ever goes down."   -



      Running MarkLogic Server in AWS brings challenges that you may not experience in traditional IT data centers. The Managed Cluster feature helps mitigate those challenges with support for reliability, scalability, and high availability, as well as with tools that automatically handle some of the more problematic issues. The Managed Cluster feature works with AWS features to automatically create and provision the necessary AWS resources and to provide MarkLogic with the information needed to manage the cluster. More information here.

      Can I use customized AMIs/CFTs with the Managed Cluster feature?

      MarkLogic provides prebuilt Amazon Machine Images (AMIs) and ready-to-deploy CloudFormation Templates (CFTs). The CloudFormation templates, which are built on the AMIs, can be used to provision a MarkLogic Managed Cluster, and this is the best (and easiest) way to provision MarkLogic Managed Clusters.

      Please note that the Managed Cluster feature is packaged and tested with the standard AMI that we publish on the Amazon Marketplace and is designed to work with the CloudFormation Template that we offer.

      Customizations to either the AMI or the CFT may rely on account-specific dependencies to function properly, and such customizations could lead to issues in the customer environment. MarkLogic supports the use of the tested and published AMIs and CFTs for Managed Clusters; however, resolving issues that originate from customizations in AWS may require a consulting engagement.

      Can the UserData section within the MarkLogic offered CloudFormation Template be modified to work with the Managed Cluster feature?

      One such area where customization is strictly NOT recommended is the UserData section of our CloudFormation Template. The UserData property is populated with the data assigned to the variables described in the AWS Configuration Variables section of our documentation and any modification to UserData could impact the way the Managed Cluster feature should work and could in turn cause unforeseen issues. Therefore, customers who wish to take advantage of our Managed Cluster feature must make sure not to modify/customize the UserData section of our CFT in any way.

      What if I want to use custom AMIs/CFTs?

      If you wish to use custom templates or custom AMIs, we suggest disabling the MarkLogic Managed Cluster feature and managing MarkLogic externally, just as you would on-premises (you can use our Management API for that). To disable the feature, create the /etc/marklogic.conf file and add the following two lines to it before starting MarkLogic:


      More information on this here.



      Here we discuss management of temporal documents.


      In MarkLogic, a temporal document is managed as a series of versioned documents in a protected collection. The ‘original’ document inserted into the database is kept and never changes. Updates to the document are inserted as new documents with different valid and system times. A delete of the document is also inserted as a new document.

      In this way, a temporal document always retains knowledge of when the information was known in the real world and when it was recorded in the database.


      By default the normal xdmp:* document functions (e.g., xdmp:document-insert) are not permitted on temporal documents.

      The temporal module (temporal:* functions; see Temporal API) contains the functions used to insert, delete, and manage temporal documents.

      All temporal updates and deletes create new documents and in normal operations this is exactly what will be desired.

      See also the documentation: Managing Temporal Documents.

      Updates and deletes outside the temporal functions

      Note: normal use of the temporal feature will not require this sort of operation.

      The function temporal:collection-set-options can be used with the updates-admin-override option to specify that users with the admin role can change or delete temporal documents using non-temporal functions, such as xdmp:document-insert and xdmp:document-delete.

      For example, you might need to run a CoRB job or another administrative transform without updating the system dates on the documents, say, to change the values M/F to Male/Female.



      This article outlines several manual procedures for failing back after a failover event.

      What is failover?

      Failover in MarkLogic Server provides high availability for data nodes in the event of a d-node or forest-level failure. With failover enabled and configured, a host can go offline or unresponsive and a MarkLogic Server cluster automatically and gracefully recovers from the outage, continuing to process queries without any immediate action needed by an administrator.

      MarkLogic offers support for two varieties of failover at the forest level, both of which provide a high-availability solution for data nodes.

      • Local-disk failover: Allows you to specify a forest on another host to serve as a replica forest, which will take over in the event of the forest's host going offline. Multiple copies of the forest are kept on different nodes/filesystems in local-disk failover.
      • Shared-disk failover: Allows you to specify alternate nodes within a cluster to host forests in the event of a forest's primary host going offline. A single copy of the forest is kept in shared-disk failover.

      More information can be found at:

      How does failover work?

      The mechanism for how MarkLogic Server automatically fails over is described in our documentation at: How Failover Works

      When does failover occur?

      Scenarios that trigger a forest to failover are discussed in detail at:

      High level overview of failing back after a failover event

      If failover is configured, other hosts in the cluster automatically assume control of the forests (or replicas of the forests) of the failed host. However, when the failed host comes back up, control of the forests is not automatically transferred back to the original host. Manual intervention is required to fail back. If you have a failed-over forest and want to fail back, you'll need to:

      • Restart either the forest or the current host of that forest, if using shared-disk failover
      • Restart the acting data forest or restart the host of that forest, if using local-disk failover. You should only do this if the original primary forest is in the sync replicating state, which indicates that it is up-to-date and ready to take over. Updates written to an acting primary forest must be synchronized to acting replicas, else those updates will be lost after failing back. After restarting the acting data forest, the intended primary data forest will automatically open on the intended primary host.

      Make sure the primary host is safely back online before attempting to fail back the forest.

      You can read more about this procedure at: Reverting a Failed Over Forest Back to the Primary Host

      Local disk failover procedure for attaching replicas directly to the database and clearing the intended primary forests' error states

      If your primary data forests are in an error state, you'll need to clear those errors before failing back. This will usually require unmounting the primary forest copy, then directly mounting the local disk failover forest copy (or "LDF") to the relevant database. That procedure looks like:

      1. Make sure to turn OFF the rebalancer/reindexer at the database level - you don't want to unintentionally move data across forests when manually altering your database's forest topology.
      2. Break forest level replication between forests (i.e. - between the intended LDF replica (aka "acting primary") and intended primary forest currently in an error state)
      3. Detach the intended primary forest from database
      4. Attach the intended LDF replica (aka acting primary) forest directly to the database
      5. Make sure the database is online
      6. Delete the intended primary forest in error state
      7. Create a new forest with the same name as the now deleted intended primary forest
      8. Re-establish forest-level replication between the intended LDF replica (aka acting primary) forest and the newly created intended primary forest
      9. Let bulk replication repopulate the intended primary forest
      10. After bulk replication is finished, fail back as described above, so the intended primary forest is once again the acting primary forest, and the intended LDF replica is once again the acting LDF replica forest

      What is the procedure for failing forests back to the primary host in cases where the replicas are directly attached to the database?

      If intended LDF replicas are instead directly attached to the relevant database, forest or host restarts will not fail back correctly. Instead, you must rename the relevant forests:

      1. Forests that are currently attached to the database can be renamed - from their LDF replica naming scheme, to the desired primary forest naming scheme.
      2. Conversely, unattached primary forests can be renamed as LDF replicas, then configured as LDF replicas for the relevant database
      3. At this point, the server should detect that the current primary (which was previously the LDF replica) will have more recent data than the current LDF replica (which was previously the primary), which should then cause the server to populate the current LDF replica from the current primary

      What should be done in case of a disk failure?

      In the unlikely event a logical volume is lost, you'll want to restore from a copy of your data. That copy can take the form of:

      1. Local disk failover (LDF) replicas within the same cluster (assuming those copies are fully synchronized)
      2. Database Replication copies in your replication cluster (again, assuming those copies are fully synchronized)
      3. Backups, which might be missing updates made since the backup was taken

      You can restore from backups if you can afford to lose updates subsequent to that backup's timestamp and/or can re-apply whatever updates happened after the backup was taken.

      If you would instead prefer not to lose updates, use LDF replicas to sync back to replacement primary forests created on new volumes, failing back manually when done. If data was moved across forests in some way after the backup was taken, it is best to use LDF replicas, which avoids the possibility of database corruption in the form of duplicate URIs.

      Database Replication will allow you to maintain copies of forests on databases in multiple MarkLogic Server clusters. Once the replica database in the replica cluster is fully synchronized with its primary database, you may break replication between the two and then go on to use the replica cluster/database as the primary. Note: To enable Database Replication, a license key that includes Database Replication is required. You'll also need to ensure that all hosts are:

      1. Running the same maintenance release of MarkLogic Server
      2. Using the same Operating System
      3. Configured correctly for Database Replication


      • It's possible to have multiple copies of your data in a MarkLogic Server deployment
      • Under normal operations, these copies are synchronized with one another
      • Should failover events occur in a cluster, or catastrophic events occur to an entire cluster, you can shift traffic to the available previously synchronized copies
      • Failing back is a manual operation
        • Make sure to re-synchronize copies that were offline with online copies
        • Shifting previously offline copies to acting primary before re-synchronization may result in data loss, as offline forests can overwrite updates previously committed to LDF forests serving as acting primaries while the intended primary forests were offline

      Related materials:


      When CPF is installed on a database, a number of new documents are created in the Triggers database associated with that database.

      This Knowledgebase article is designed to show you what CPF creates on install, in the event that you want to safely disable and remove it from your system.

      Getting started

      Below is a layout of all databases and their associated document counts with a clean install of MarkLogic 9.0-2:

      Database ID            Database Name    Document Count
      8723423541597683063    App-Services     14
      12316032390759111212   Modules          0
      1695527226691932315    Fab              0
      11723073009075196192   Security         1526
      15818912922008798974   Triggers         0
      5212638700134402198    Documents        0
      4320540002505594119    Extensions       0
      9023394855382775954    Last-Login       0
      11598847197347642387   Schemas          0
      12603105430027950215   Meters           48

      Adding CPF

      After installing CPF on the Documents database (with conversion enabled), we now see:

      Database ID            Database Name    Document Count
      8723423541597683063    App-Services     15
      12316032390759111212   Modules          0
      1695527226691932315    Fab              0
      11723073009075196192   Security         1526
      15818912922008798974   Triggers         39
      5212638700134402198    Documents        0
      4320540002505594119    Extensions       0
      9023394855382775954    Last-Login       0
      11598847197347642387   Schemas          0
      12603105430027950215   Meters           498
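Subtracting the two tables shows which databases changed when CPF was installed. The short Python sketch below only does that arithmetic. The counts are copied from the tables, and the dictionary names are illustrative.

```python
# Document counts copied from the two tables (clean 9.0-2 install vs. after CPF).
before = {"App-Services": 14, "Modules": 0, "Fab": 0, "Security": 1526,
          "Triggers": 0, "Documents": 0, "Extensions": 0, "Last-Login": 0,
          "Schemas": 0, "Meters": 48}
after = {"App-Services": 15, "Modules": 0, "Fab": 0, "Security": 1526,
         "Triggers": 39, "Documents": 0, "Extensions": 0, "Last-Login": 0,
         "Schemas": 0, "Meters": 498}

# Keep only the databases whose counts changed, with the size of the change.
changed = {name: after[name] - before[name]
           for name in before if after[name] != before[name]}
print(changed)  # {'App-Services': 1, 'Triggers': 39, 'Meters': 450}
```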

      If we ignore Meters and App-Services, we can see that, by default, a CPF install creates a number of documents in the Triggers database:


      Files created by CPF

      One of these files is the CPF configuration.xml file.

      One of these documents describes the default domain which is created when CPF is installed:

      Default Documents

      Of the 39 files created, we can see from the URI listing above that the majority (28) of these are prefaced with These files describe each of the standard conversion pipelines that ship with the server. These are:

      Alerting (spawn)
      Calais Entity Enrichment Sample
      Conversion Processing
      Conversion Processing (Basic)
      Data Harmony Enrichment Sample
      DocBook Conversion
      Document Filtering (Properties)
      Document Filtering (XHTML)
      Entity Enrichment
      Flexible Replication
      HTML Conversion
      Janya Entity Enrichment Sample
      MS Office Conversion
      Office OpenXML Extract
      PDF Conversion
      PDF Conversion (Image Batching)
      PDF Conversion (Page Layout with Reblocking)
      PDF Conversion (Page Layout, Image Batching)
      PDF Conversion (Page Layout)
      PDF Conversion (Paged Text, No Rendering)
      Schema Validation
      SRA NetOwl Entity Enrichment Sample
      Status Change Handling
      Temis Entity Enrichment Sample
      WordprocessingML Process
      XHTML Conversion Processing
      XInclude Processing

      Seven of the files are triggers - all of which are namespaced with the cpf prefix:

      cpf:any-property Default Documents
      cpf:create Default Documents
      cpf:delete Default Documents
      cpf:state Default Documents
      cpf:status Default Documents
      cpf:update Default Documents

      Removing the core files created when CPF was initially installed will disable it from further functioning in your environment.

      Scripting the removal of default CPF components

      This GitHub gist demonstrates a method for removing the CPF configuration from a given database; in the example below, the "Triggers" database is specified:


      If you have an existing MarkLogic Server cluster running on EC2, there may be circumstances where you need to upgrade the existing AMI with the latest MarkLogic rpm available. You can also add a custom OS configuration.

      This article assumes that you have started your cluster using the CloudFormation templates with Managed Cluster feature provided by MarkLogic.

      To manually upgrade the MarkLogic AMI, follow these steps:

      1. Launch a new small MarkLogic instance from the AWS MarketPlace, based on the latest available image. For example, t2.small based on MarkLogic Developer 9 (BYOL). The instance should be launched only with the root OS EBS volume.
      Note: If you are planning to use the Pay-As-You-Go (PAYG) model, you must choose MarkLogic Essential Enterprise.
      a. Launch a MarkLogic instance from AWS MarketPlace, click Select and then click Continue:

      b. Choose instance type. For example, one of the smallest available, t2.small
      c. Configure instance details. For example, default VPC with a public IP for easy access
      d. Remove the second EBS data volume (/dev/sdf)
      e. Optional - Add Tags
      f. Configure Security Group - only SSH access is needed for the upgrade procedure
      g. Review and Launch
      Review step - AWS view:

      2. SSH into your new instance and switch the user to root in order to execute the commands in the following steps.

      $ sudo su -

      Note: As an option, you can also use "sudo ..." for each individual command.

      3. Stop MarkLogic and uninstall MarkLogic rpm:

      $ service MarkLogic stop
      $ rpm -e MarkLogic

      4. Update-patch the OS:

      $ yum -y update

      Note: If needed, restart the instance (For example: after a kernel upgrade/core-libraries).
      Note: If you would like to add more custom options/configuration/..., they should be done between steps 4 and 5.

      5. Install the new MarkLogic rpm
      a. Upload the MarkLogic rpm to the instance (for example, via "scp" or S3).
      b. Install the rpm:

      $ yum install [<path_to_MarkLogic_RPM>]/[MarkLogic_RPM]

      Note: Do not start MarkLogic at any point during the AMI preparation.

      6. Double check to be sure that the following files and log traces do not exist. If they do, they must be deleted.

      $ rm -f /var/local/mlcmd.conf
      $ rm -f /var/tmp/mlcmd.trace
      $ rm -f /tmp/

      7. Remove artifacts
      Note: Performing the following actions will remove the ability to ssh back into the baseline image. New credentials are applied to the AMI when launched as an instance. If you need to add/change something, mount the root drive to another instance to make changes.

      $ rm -f /root/.ssh/authorized_keys
      $ rm -f /home/ec2-user/.ssh/authorized_keys
      $ rm -f /home/ec2-user/.bash_history
      $ rm -rf /var/spool/mail/*
      $ rm -rf /tmp/userdata*
      $ rm -f [<path_to_MarkLogic_RPM>]/[MarkLogic_RPM]
      $ rm -f /root/.bash_history
      $ rm -rf /var/log/*
      $ sync

      8. Optional - Create an AMI from the stopped instance.[1] The AMI can be created at the end of step 7.

      $ init 0

      [1] For more information:

      At this point, your custom AMI should be ready and it can be used for your deployments. If you are using multiple AWS regions, you will have to copy the AMI as needed.

      Additional references:
      [2] Upgrading the MarkLogic AMI -


      Starting in MarkLogic Server version 10.0-7, XQuery FLWOR expressions that only use "let" will now stream results. Prior to 10.0-7, MarkLogic Server would have buffered results in memory. This change allows large result sets to be more easily streamed from XQuery modules.


      Due to this change, code that relies on the previous behavior of buffered results from FLWOR expressions with only a "let" may experience degraded performance if the results are iterated over multiple times. This is because once a streaming result has been exhausted, the query has to be rerun in order to iterate over it again.

      Best Practice

      Regardless of this change, the best practices are:

      • Treat all query calls as lazily evaluated expressions and iterate over them only once.
      • If the results need to be iterated over multiple times, wrap the search expression in xdmp:eager(), or iterate over the results once and assign them to a new variable.


      In 10.0-7 and prior versions, the following expression is lazily evaluated, and the search is run multiple times if the $results variable is iterated over multiple times:

      let $_ := xdmp:log("running search")
      let $results := cts:search(fn:collection(), cts:word-query("MarkLogic"))

      This behavior has not changed in 10.0-7. However, prior to 10.0-7, the following expression would short-circuit the lazy evaluation and buffer all of the results in memory:

      let $results :=
          let $_ := xdmp:log("running search")
          return cts:search(fn:collection(), cts:word-query("MarkLogic"))

      In 10.0-7, this is now consistent with the other form of the expression above and returns an iterator. The search will be run multiple times if the $results variable is iterated over multiple times.

      To achieve the buffering behavior in 10.0-7 or later releases, you can wrap cts:search() inside xdmp:eager() as follows:

      let $results :=
          let $_ := xdmp:log("running search")
          return xdmp:eager(cts:search(fn:collection(), cts:word-query("MarkLogic")))


      The xdmp:streamable function was added in MarkLogic 10.0-7 to help determine whether a variable will stream.

      Additional References

      For more information about lazy evaluation in MarkLogic, see the following resources


      Tuesday, February 1, 2022 : Released Pega Connector 1.0.1 which uses MLCP 10.0-8.2 with forced dependencies to log4j 2.17.1.

      Tuesday, January 25, 2022 : MarkLogic Server version 10.0-8.3 (CentOS 7.8 and 8) is now available on the Azure marketplace.

      Monday, January 17, 2022 : MarkLogic Server 10.0-8.3 is now available on the AWS marketplace.

      Monday, January 10, 2022 : MarkLogic Server 10.0-8.3 released with Log4j 2.17.1. (ref: CVE-2021-44832 ).

      Friday, January 7, 2022 : Fixed incorrect reference to log4j version included with MarkLogic 10.0-8.2 & 9.0-13.7.

      Wednesday, January 05, 2022 : Updated workaround to reference Log4j 2.17.1. (ref: CVE-2021-44832 ).

      Tuesday, December 28, 2021 : Added explicit note that for MarkLogic Server installations not on AWS, it is safe to remove the log4j files in the mlcmd/lib directory.

      Saturday, December 25, 2021: MLCP update to resolve CVE-2019-17571 is now available for download.

      Friday, December 24, 2021: AWS & Azure Marketplace update.

      Wednesday, December 22, 2021: additional detail regarding SumoCollector files; AWS & Azure Marketplace update; & MLCP note regarding CVE-2019-17571.

      Monday, December 20, 2021: Updated workaround to reference Log4j 2.17.0. (ref: CVE-2021-45105 )

      Friday, December 17, 2021: Updated for the availability of MarkLogic Server versions 10.0-8.2 and 9.0-13.7.

      Wednesday, December 15, 2021: Updated to include SumoLogic Collector reference for MarkLogic 10.0-6 through 10.0-7.3 on AWS.

      Tuesday, December 14, 2021: This article has been updated to account for the new guidance and remediation steps in CVE-2021-45046:

      "It was found that the fix to address CVE-2021-44228 in Apache Log4j 2.15.0 was incomplete in certain non-default configurations. This could allows attackers with control over Thread Context Map (MDC) input data when the logging configuration uses a non-default Pattern Layout with either a Context Lookup or a Thread Context Map pattern to craft malicious input data using a JNDI Lookup pattern resulting in a denial of service (DOS) attack. ..."

      Monday, December 13, 2021: Original article published.


      Important MarkLogic Security update on Log4j Remote Code Execution Vulnerability (CVE-2021-44228)


      A flaw in Log4j, a Java library for logging error messages in applications, is currently the most high-profile security vulnerability on the internet and carries a severity score of 10 out of 10. At MarkLogic, we take security very seriously and have been proactive in responding to all kinds of security threats. Log4j is developed by the Apache Software Foundation and is widely used by both enterprise apps and cloud services. The bug, tracked as CVE-2021-44228 and dubbed Log4Shell or LogJam, is an unauthenticated RCE (Remote Code Execution) vulnerability that allows complete system takeover on systems running Log4j 2.0-beta9 up to 2.14.1.

      As part of mitigation measures, Apache originally released Log4j 2.15.0 to address the maximum-severity CVE-2021-44228 RCE vulnerability. However, that fix was found to be incomplete (CVE-2021-45046), and Apache has since released Log4j 2.16.0. In releases prior to 2.16.0, this vulnerability can be mitigated by removing the JndiLookup class from the classpath. Components and products that use the log4j library are advised to upgrade to the latest release as soon as possible, because attackers are already searching for exploitable targets.

      MarkLogic Server

      MarkLogic Server version 10.0-8.3 now includes Log4j 2.17.1. (ref: CVE-2021-44832 ).

      MarkLogic Server versions 10.0-8.2 & 9.0-13.7 include log4j 2.16.0, replacing all previously included log4j modules affected by this vulnerability.

      MarkLogic Server versions 10.0-8.3 & 9.0-13.7 are available for download from our developer site at

      MarkLogic Server versions 10.0-8.3 & 9.0-13.7 are available on the AWS Marketplace.

      MarkLogic Server versions 10.0-8.3 (CentOS 7.8 and 8) & 9.0-13.7 (CentOS 8) VMs are available in the Azure marketplace. 

      MarkLogic Server does not use log4j2 within the core server product. 

      However, CVE-2021-44228 has been determined to impact the Managed Cluster System (MLCMD) in AWS.

      Note: log4j is included in the MarkLogic Server installation, but it is only used by MLCMD on AWS. For MarkLogic Server installations not on AWS, you can simply remove the log4j files in the mlcmd/lib directory (sudo rm /opt/MarkLogic/mlcmd/lib/log4j*).

      AWS customers can use the following workaround to mitigate exposure to the CVE.

      Impacted versions

      The versions that are affected by the Log4Shell vulnerability are:

      • 10.0-6.3 through 10.0-8.1 on AWS 
      • 9.0-13.4 through 9.0-13.6 on AWS 

      Earlier versions of MLCMD use a log4j version that is not affected by this vulnerability.

      How to check the log4j version used by the MarkLogic Managed Cluster System in AWS

      1. Access the instance/VM via SSH.
      2. Run the following command: ls /opt/MarkLogic/mlcmd/lib/ | grep "log4j"

      If the returned log4j jar files are versions 2.0-beta9 through 2.14.1, the system is affected by this vulnerability.

      An example response from a system containing the CVE:


      In the above case, the log4j dependencies are version 2.14.1, which is affected.
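      If you need to run this check across many hosts, it can be scripted. The Bash sketch below is an illustration only, not a MarkLogic-provided tool: the function names are hypothetical, it assumes jar names of the form <artifact>-<version>.jar (as in the removal step of the workaround), and it checks only the CVE-2021-44228 range stated above, so it does not flag 2.15.0 or later releases that were the subject of the follow-up CVEs in the update history.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not a MarkLogic tool. Flags log4j 2.x jars whose version
# is in the range this article lists for CVE-2021-44228 (2.0-beta9 to 2.14.1).
# Assumes names like log4j-core-2.14.1.jar or log4j-core-2.0-beta9.jar;
# log4j 1.x jars such as log4j-1.2.17.jar do not match and are skipped.

# Returns 0 if the given 2.x version string falls in the affected range.
log4j_version_affected() {
  local v="$1" minor beta
  minor="${v#2.}"           # "2.14.1" -> "14.1", "2.0-beta9" -> "0-beta9"
  minor="${minor%%[.-]*}"   # "14.1" -> "14", "0-beta9" -> "0"
  if [ "$minor" -ge 1 ] && [ "$minor" -le 14 ]; then
    return 0
  fi
  if [ "$minor" -eq 0 ]; then
    case "$v" in
      *-alpha*) return 1 ;;                    # 2.0 alphas predate the range
      *-beta*)
        beta="${v##*-beta}"                    # "2.0-beta9" -> "9"
        if [ "$beta" -ge 9 ]; then return 0; else return 1; fi ;;
      *) return 0 ;;                           # 2.0-rcN, 2.0, 2.0.x
    esac
  fi
  return 1
}

# Prints the status of each log4j 2.x jar in a directory (default: the mlcmd
# lib directory). Like grep, returns 0 if at least one affected jar is found.
check_log4j_dir() {
  local dir="${1:-/opt/MarkLogic/mlcmd/lib}" f name ver found=1
  local re='-(2\.[0-9]+(\.[0-9]+)?(-(alpha|beta|rc)[0-9]+)?)\.jar$'
  for f in "$dir"/log4j*.jar; do
    [ -e "$f" ] || continue                    # glob matched nothing
    name="${f##*/}"
    if [[ $name =~ $re ]]; then
      ver="${BASH_REMATCH[1]}"
      if log4j_version_affected "$ver"; then
        echo "AFFECTED: $name"
        found=0
      else
        echo "ok: $name"
      fi
    fi
  done
  return "$found"
}
```

      After sourcing the file, run check_log4j_dir /opt/MarkLogic/mlcmd/lib and review the lines marked AFFECTED.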


      The following workaround can be executed on a running MarkLogic service, without stopping it.


      1.  SSH into your EC2 instance. You must have sudo access in order to make the changes necessary for the fix.

      2.  Download and extract the Log4j 2.17.1 dependency from Apache.

      curl --output log4j.tar.gz && tar -xf log4j.tar.gz

      • If your EC2 instance does not have outbound external internet access, download the dependency onto a machine that does, and then scp the file over to the relevant EC2 instance via a bastion host.

      3. Move the relevant log4j dependencies to the /opt/MarkLogic/mlcmd/lib/ folder, for example:

      sudo mv ./apache-log4j-2.17.1-bin/log4j-core-2.17.1.jar /opt/MarkLogic/mlcmd/lib/
      sudo mv ./apache-log4j-2.17.1-bin/log4j-api-2.17.1.jar /opt/MarkLogic/mlcmd/lib/
      sudo mv ./apache-log4j-2.17.1-bin/log4j-1.2-api-2.17.1.jar /opt/MarkLogic/mlcmd/lib/

      4. Remove the old log4j dependencies:

      sudo rm /opt/MarkLogic/mlcmd/lib/log4j-core-2.14.1.jar
      sudo rm /opt/MarkLogic/mlcmd/lib/log4j-1.2-api-2.14.1.jar
      sudo rm /opt/MarkLogic/mlcmd/lib/log4j-api-2.14.1.jar
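      Steps 3 and 4 can be combined into a single helper that refuses to touch the lib directory unless every replacement jar is present, which avoids a half-finished swap if the download or extraction failed. This is a hypothetical Bash sketch, not part of MarkLogic; the function name and argument order are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not a MarkLogic tool: replaces one version of a set of
# log4j jars with another.
#   $1     extracted Apache distribution, e.g. ./apache-log4j-2.17.1-bin
#   $2     target lib directory, e.g. /opt/MarkLogic/mlcmd/lib
#   $3 $4  old version to remove, new version to install
#   $5...  artifact names, e.g. log4j-core log4j-api log4j-1.2-api
replace_log4j_jars() {
  local src="$1" lib="$2" old="$3" new="$4" a
  shift 4
  # Confirm every replacement jar exists before changing anything in $lib.
  for a in "$@"; do
    if [ ! -f "$src/$a-$new.jar" ]; then
      echo "missing $src/$a-$new.jar; nothing changed" >&2
      return 1
    fi
  done
  for a in "$@"; do
    mv "$src/$a-$new.jar" "$lib/" || return 1   # step 3: install the new jar
    rm -f "$lib/$a-$old.jar"                    # step 4: remove the old jar
  done
}
```

      For the AWS case above, something like sudo bash -c '. ./replace_log4j.sh && replace_log4j_jars ./apache-log4j-2.17.1-bin /opt/MarkLogic/mlcmd/lib 2.14.1 2.17.1 log4j-core log4j-api log4j-1.2-api' would apply steps 3 and 4, where replace_log4j.sh is a hypothetical file containing the function.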

      SumoLogic Collector

      AMIs for MarkLogic versions 10.0-6 through 10.0-7.3 were shipped with the SumoCollector libraries. These libraries are neither needed nor executed by MarkLogic Server. Starting with MarkLogic version 10.0-8, the SumoCollector libraries are no longer shipped with the MarkLogic AMIs.

      It is safe to remove those libraries from all instances that you have launched using any of the MarkLogic AMIs available in the Marketplace. You can remove the SumoCollector directory and all its files under /opt.

      Additionally, if you have created any clusters using the CloudFormation templates (managed cluster feature), we suggest that you delete the SumoCollector directory under /opt if it exists. Once MarkLogic releases new AMIs, you can update the stack with the new AMI ID and perform a rolling restart of nodes so that the permanent fix is in place.
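      A guarded version of this cleanup might look like the following Bash sketch. It assumes the directory is named exactly SumoCollector and sits directly under /opt, as described above; the function name is hypothetical. Run it with sudo, and confirm the path on your instances before deleting anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: removes the SumoCollector directory if it exists.
# $1 is the parent directory; it defaults to /opt as described above.
remove_sumo_collector() {
  local parent="${1:-/opt}"
  local target="$parent/SumoCollector"
  if [ -d "$target" ]; then
    rm -rf -- "$target"
    echo "removed $target"
  else
    echo "no SumoCollector directory under $parent"
  fi
}
```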

      Other Platforms

      For the impacted MarkLogic versions listed above running on platforms besides AWS, the log4j jars are included in the MarkLogic installation folder but are never used.  The steps listed in the workaround above can still be applied to these systems even though the systems themselves are not impacted.

      MarkLogic Java Client

      The MarkLogic Java Client API has neither a direct nor indirect dependency on log4j. The MarkLogic Java Client API does use the industry-standard SLF4J abstract interface for logging. Any conformant logging library can provide the concrete implementation of the SLF4J interface. By default, MarkLogic uses the logback implementation of the SLF4J interface. The logback library doesn't have the vulnerability that exists in the log4j library. Customers who have chosen to override logback with log4j may have the vulnerability. Such customers should either revert to the default logback library or follow the guidance provided by log4j to address the vulnerability:

      MarkLogic Data Hub & Hub Central

      MarkLogic Data Hub and Hub Central are not directly affected by the log4j vulnerability. Data Hub and Hub Central use Spring Boot, and Spring has an option to switch the default logging to log4j, which Data Hub does not use.
      The log4j-to-slf4j and log4j-api jars that are included in spring-boot-starter-logging cannot be exploited on their own. By default, MarkLogic Data Hub uses the logback implementation of the SLF4J interface.
      The logback library doesn't have the vulnerability that exists in the log4j library. Please refer to:

      MarkLogic Data Hub Service

      For MarkLogic Data Hub Service customers, no action is needed at this time. All systems have been thoroughly scanned and patched with the recommended fixes wherever needed. 

      MarkLogic Content Pump (MLCP)

      MarkLogic Content Pump 10.0-8.2 & 9.0-13.7 are now available for download from and GitHub. This release resolves the CVE-2019-17571 vulnerability.

      MLCP versions 10.0-1 through 10.0-8.2 and versions prior to 9.0-13.6 used an older version of log4j (1.2.17) that is not affected by the primary vulnerability discussed in this article (CVE-2021-44228), but MLCP versions prior to 10.0-8.2 are affected by the critical vulnerability CVE-2019-17571.

      MLCP v10.0-8.2 & MLCP v9.0-13.7 specific workaround for CVE-2021-44832

      The following workaround can be executed on a host with MLCP installed:

      1.  Download and extract the Log4j 2.17.1 dependency from Apache.

      curl --output log4j.tar.gz && tar -xf log4j.tar.gz

      2. Move the relevant log4j dependencies to the $MLCP_PATH/lib/ folder, for example:

      sudo mv ./apache-log4j-2.17.1-bin/log4j-core-2.17.1.jar $MLCP_PATH/lib/
      sudo mv ./apache-log4j-2.17.1-bin/log4j-api-2.17.1.jar $MLCP_PATH/lib/
      sudo mv ./apache-log4j-2.17.1-bin/log4j-1.2-api-2.17.1.jar $MLCP_PATH/lib/
      sudo mv ./apache-log4j-2.17.1-bin/log4j-jcl-2.17.1.jar $MLCP_PATH/lib/
      sudo mv ./apache-log4j-2.17.1-bin/log4j-slf4j-impl-2.17.1.jar $MLCP_PATH/lib/

      3. Remove the old log4j dependencies:

      sudo rm $MLCP_PATH/lib/log4j-core-2.17.0.jar
      sudo rm $MLCP_PATH/lib/log4j-1.2-api-2.17.0.jar
      sudo rm $MLCP_PATH/lib/log4j-api-2.17.0.jar
      sudo rm $MLCP_PATH/lib/log4j-jcl-2.17.0.jar
      sudo rm $MLCP_PATH/lib/log4j-slf4j-impl-2.17.0.jar
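      After the swap, a quick check can confirm that no log4j jar at another version remains in the MLCP lib directory. This hypothetical Bash sketch lists every log4j-*.jar whose name does not end in the expected version. Treat its output as something to review rather than a definitive pass or fail, since it flags any log4j jar that does not match.

```shell
#!/usr/bin/env bash
# Hypothetical post-check: lists log4j jars in a directory that are not at the
# expected version. Returns 0 only if every log4j-*.jar matches.
#   $1  lib directory (e.g. "$MLCP_PATH/lib")
#   $2  expected version (e.g. 2.17.1)
verify_log4j_jars() {
  local lib="$1" want="$2" f name status=0
  for f in "$lib"/log4j-*.jar; do
    [ -e "$f" ] || continue            # glob matched nothing
    name="${f##*/}"
    case "$name" in
      *-"$want".jar) ;;                # expected version: nothing to report
      *) echo "unexpected: $name"; status=1 ;;
    esac
  done
  return "$status"
}
```

      For example: verify_log4j_jars "$MLCP_PATH/lib" 2.17.1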

      Pega Connector

      The 1.0.0 Pega connector installer briefly runs MLCP 10.0-6.2 via gradle as part of the setup. MLCP 10.0-6.2 uses the old 1.2 log4j jar. The actual connector does not use log4j at runtime.  We have released Pega Connector 1.0.1 which uses MLCP 10.0-8.2 with forced dependencies to log4j 2.17.1.

      MarkLogic-supported client libraries, tools

      All other MarkLogic-supported client libraries, tools, and products are not affected by this security vulnerability.  

      Verified Not Affected

      The following MarkLogic Projects, Libraries and Tools have been verified by the MarkLogic Engineering team as not being affected by this vulnerability:

      • Apache Spark Connector
      • AWS Glue Connector
      • Corb-2
      • Data Hub Central Community Edition
      • Data Hub QuickStart
      • Jena Client - distribution not affected, but some tests contain log4j
      • Kafka Connector
      • MLCP - uses an older version of log4j that is not affected by CVE-2021-44228, but it is affected by CVE-2019-17571. See notes above.
      • ml-gradle
      • MuleSoft Connector - The MarkLogic Connector does not depend on log4j2, but it does leverage the MarkLogic Java Client API (see earlier comments)
      • Nifi Connector
      • XCC

      MarkLogic Open Source and Community-owned projects

      If you are using one of the MarkLogic open-source projects that has a direct or transitive dependency on Log4j 2 up to version 2.14.1, please either upgrade Log4j to version 2.16.0 or, in prior releases (<2.16.0), implement the workaround of removing the JndiLookup class from the classpath. Please refer to:

      MarkLogic is dedicated to supporting our customers, partners, and developer community to ensure their safety. If you have a registered support account, feel free to contact MarkLogic Support with any additional questions.

      More information about the log4j vulnerability can be found at or 


      A powerful new feature was added to MarkLogic 8 - the ability to build applications around a declarative HTTP rewriter. You can read more about MarkLogic Server's HTTP rewriter and some of the new features it provides in our documentation.

      This article will cover some basic tips for debugging applications that make use of this feature.

      Validating your rewriter rules (Using XML Schema)

      The rewriter adheres to an XML Schema. At runtime the rewriter is not validated against this schema; this is by design so that potentially minor errors don't risk taking your application offline. As a best practice, we recommend validating your rewriters manually every time you make a change. In order to do this, you can use MarkLogic Server or any other tool that supports XML validation (the schema is standard XSD 1.0).  If you want to view the schema, it's copied to Config/rewriter.xsd when you install the product.

      In order to validate from within MarkLogic using XQuery you can simply execute:

      validate { fn:doc("/path/to/your/rewriter.xml") }

      The above will validate the XML if your rewriter rules are stored in a database. If you're using the filesystem, you can use xdmp:document-get instead.

      Alternatively, you can copy / paste the XML body into Query Console and wrap it with a call to validate as below:

      validate { * Paste your rewriter rules here * }

      The above approach works as long as your rewriter XML contains no XQuery reserved syntax. For example, curly braces ({ }) in attribute values or text, such as regex quantifiers, are interpreted by XQuery as enclosed expressions.
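      Outside MarkLogic, any XSD 1.0 validator can be used. The following Bash sketch wraps xmllint from libxml2; the function name is hypothetical, the default schema path assumes a Linux installation under /opt/MarkLogic, and whether xmllint handles every construct in rewriter.xsd is an assumption. Because xmllint never parses the rules as XQuery, curly braces in the rewriter XML are not an issue with this approach.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around libxml2's xmllint for validating a rewriter file
# outside MarkLogic. The default schema path assumes a Linux install under
# /opt/MarkLogic; pass a different schema path as $2 if yours differs.
validate_rewriter() {
  local rewriter="$1" schema="${2:-/opt/MarkLogic/Config/rewriter.xsd}"
  if ! command -v xmllint >/dev/null 2>&1; then
    echo "xmllint not found; install the libxml2 utilities" >&2
    return 2
  fi
  # --noout suppresses echoing the document; exit status is 0 only if valid.
  xmllint --noout --schema "$schema" "$rewriter"
}
```

      For example: validate_rewriter /path/to/your/rewriter.xml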

      General rewriter debugging and tracing

      For simple "print"-style debugging, you can manually add trace statements at any point where an eval rule is allowed, like this:

      <trace event="customevent">data</trace>

      Then enable diagnostics (in your group settings) and add "customevent" as a trace event; your custom trace will now show up in ErrorLog.txt whenever that endpoint is accessed. To read more about the use of trace events in your applications, refer to this Knowledgebase article.

      You can also raise errors with custom error codes:

      <error code="MYAPP-EXCEPTION" data1="value1" data2="... 

      You can also add id attributes to rules. These ids are written to the trace output, which may aid debugging:

      <match id="match-id-for-myregex" regex=".* ...

      Useful diagnostic trace events

      Note that additional trace events can generate a lot of data and may slow your application down, so make sure they are not left enabled in a production-critical environment.

      Below are some trace events you can use and a brief description of what each trace event does:

      • Rewriter Parser: details of the parsing of the rewriter XML file.
      • Rewriter Evaluator: execution traces of rules as they are evaluated.
      • Rewriter Evaluator Verbose: additional (more verbose) tracing.
      • Declarative Rewriter: entry points into and out of the rewriter from the app server request handler.
      • Rewriter Print Rules: after parsing and validation of the rewriter, a full dump of the resulting internal data structures.

      Additional points to note

      The "Evaluator" trace events write to ErrorLog.txt on every request.

      The "Parser" trace event only fires once, and again whenever you update your rewriter.


      Prior to the 9.0-9 release, MarkLogic supported Oracle JDK 8. However, Oracle has announced the End of Public Updates of Java SE 8.

      What can we expect from MarkLogic?

      MarkLogic will support OpenJDK 9, OpenJDK 10 and OpenJDK 11 starting with MarkLogic Server 9.0-9 and associated products.

      These products/implementations include:

      From the 9.0-9 release onwards, we will no longer QA test our products with Oracle JDK.

      We will support Amazon Corretto JDK as part of our Amazon offerings. Corretto meets the Java SE standard and is certified compliant by AWS using the Java Technical Compatibility Kit.

      The latest version of MarkLogic Server is available to download from:

      JDK Requirements for Data Hub Framework (DHF) Users

      Requirements are discussed in further detail in the DHF documentation; however, it is important to note that versions of DHF prior to the 5.2 release require Java 8.

      JDK Requirements for MarkLogic on AWS

      The mlcmd script supports startup operations and advanced use of the Managed Cluster features. The mlcmd script is installed as an executable script in /opt/MarkLogic/bin/mlcmd.

      In order to run any mlcmd command, the user must be logged into the host and running as root or with root privileges. The hosts must also have Java installed and the java command in the PATH or JAVA_HOME set to the JRE or JDK home directory.

      If the cluster is configured using a MarkLogic AMI as-is, using a MarkLogic AMI to build custom AMIs, or using the CloudFormation templates to create the cluster, mlcmd is required at MarkLogic Server startup, and therefore so is the JDK.
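      Before relying on mlcmd, you can confirm that a Java executable is reachable in one of the two ways described above. This hypothetical Bash sketch checks JAVA_HOME first and then PATH; that order is an assumption, since the text only says either is acceptable.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the mlcmd Java requirement described above.
# Prints the java executable it finds, checking JAVA_HOME first, then PATH.
# The JAVA_HOME-first order is an assumption; the text only requires either.
find_java_for_mlcmd() {
  if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
    echo "$JAVA_HOME/bin/java"
    return 0
  fi
  if command -v java >/dev/null 2>&1; then
    command -v java
    return 0
  fi
  echo "java not found: install a JDK/JRE and add it to PATH or set JAVA_HOME" >&2
  return 1
}
```

      Because mlcmd runs as root, run the check the same way (for example, under sudo). sudo typically resets the environment, so a JAVA_HOME or PATH that works in your own shell may not be visible to root.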


      The default configuration of MarkLogic Application Servers is not vulnerable to the FREAK SSL attack.

      What is the FREAK SSL attack?

      Tuesday 2015/03/03 - Researchers from the miTLS team (a joint project between Inria and Microsoft Research) disclosed a new SSL/TLS vulnerability — the FREAK SSL attack (CVE-2015-0204). The vulnerability allows attackers to intercept HTTPS connections between vulnerable clients and servers and force them to use ‘export-grade’ cryptography, which can then be decrypted or altered.

      Read more about the FREAK SSL attack.

      Testing a webserver

      You can verify whether a webserver is vulnerable to the FREAK attack with this free SSL vulnerability checker.


      MarkLogic Server uses FIPS-capable OpenSSL to implement the Secure Sockets Layer (SSL v3) and Transport Layer Security (TLS v1) protocols. When you install MarkLogic Server, FIPS mode is enabled by default and SSL RSA keys are generated using secure FIPS 140-2 cryptography. This implementation disallows weak ciphers and uses only FIPS 140-2 approved cryptographic functions. Read more about OpenSSL FIPS mode in MarkLogic Server, and how to configure it.

      As long as FIPS mode was not explicitly disabled, MarkLogic Application Servers are not vulnerable to the FREAK SSL attack. 


      Eliminating the vulnerability for all configurations requires an update to the OpenSSL library. MarkLogic Server continually updates the implementation version of the OpenSSL library, so every MarkLogic Server maintenance release published after the discovery of this vulnerability will include an OpenSSL version that is not vulnerable to the FREAK attack.


      As long as FIPS mode is enabled, which is the default configuration, MarkLogic Application Servers are not vulnerable to the FREAK SSL attack


      What are Backup/Restore best practices?

      Please refer to our MarkLogic Support FAQ for more details.

      Should we back up default databases?

      Please refer to our MarkLogic Support FAQ for more details.

      Should I be backing up my local disk failover forests?

      Please refer to our Local Disk Failover FAQ for more details.

      In terms of disaster recovery (DR), how do I choose between backup/restore or replication?

      Please refer to our Database Replication FAQ for more details.

      How many copies of data do we have if we enable failover, Backup/Restore, Database Replication?

      Your primary cluster has its data forests (1st copy) and likely local disk failover forests (2nd) for high availability. Your replica cluster likely has its own data forests (3rd) and local disk failover forests (4th) for more up-to-date disaster recovery copies. You can also take backups from either environment (now 5 copies) for a less up-to-date DR copy.

      Analyze these options and set up accordingly, depending on your needs. You don't have to set up all of them, or have multiple replica forests or backup copies.

      On which environment should I take a backup? Primary or Replica cluster?

      In general, it's best to take the backup from whichever environment, primary or replica, can best accommodate the backup load. You are unlikely to need identical or near-identical backups from both.


      What does a MarkLogic Database Backup contain?

      By default, MarkLogic database backups are self-contained and include the following:

      • The configuration files.
      • The Security database, including all of its forests.
      • The Schemas database, including all of its forests.
      • The Triggers database, including all of its forests.
      • All of the forests of the database you are backing up.


      White Paper:

      What are the important points to note before performing Backups/Restore?

      Refer to the "Notes about Backup and Restore Operations" section in our documentation.


      Will there be any interruption in running queries/updates while backup runs?

      Most of the time, when a backup is running, all queries and updates proceed as usual. MarkLogic simply copies stand data from the source directory to the backup target directory, file by file. Stands are read-only except for the small Timestamps file, so this bulk copy can proceed without needing to interrupt any requests. Only at the very end of the backup does MarkLogic have to halt incoming requests briefly to write out a fully consistent view for the backup, flushing everything from memory to disk.


      White Paper:

      What is Flash Backup?

      In flash backup mode, you quiesce all forests in a given database for long enough to make a file-level backup of the forest data.

      White Paper:

      KB Article:

      What are the advantages of using MarkLogic backup over other options/methods?

      • Our Backup and Restore APIs use a timestamp to guarantee that a backup is consistent as of that timestamp. While the backup runs, the on-disk stands being backed up are retained until the backup has completed, and new updates can continue to take place (advancing the database forest timestamps). For these reasons, it is generally recommended as the safest strategy if you want to be able to restore from a crash.
      • Our Backup and Restore APIs also force a checkpoint with the forest journal files and any in-memory transactions just before the backup starts, meaning that all transactions up to the point at which the backup started are guaranteed to be in the backup set.
      • You can use backup methods other than those MarkLogic provides, but you need to make sure that no updates happen while the backup runs. Forests should be completely quiesced first. You do not need to stop MarkLogic Server to do this, but at the very least you need to place the forests into flash-backup mode, which allows queries to take place but prevents any transactions from making changes while the backup task runs.

      KB Article:

      Can we restore backups across feature releases of MarkLogic? 

      Yes, you can restore from older version to newer version - but not vice versa.

      KB Articles:

      Can we restore backups across different OS platforms?

      No, MarkLogic backup files are platform specific and should only be restored onto the same platform. This is true for both database and forest backups.


      KB Article:

      What is the role of Journals in relation to Backup and Restore?

      Refer to the Knowledgebase article for details.

      How does "point-in-time" recovery work with Journal Archiving?

      Refer to the documentation and Knowledgebase article for details.

      Do the journal archive files from a backup become invalid with the next backup?

      New journal archives are started when the next full backup is done. During the period of time that the new full backup is running, we archive journals to both the old and new location until we're sure the new full backup will complete successfully.


      Do the archive files normally get deleted with a subsequent backup?

      They are typically deleted when the corresponding full backup is deleted.


      How much free space is needed for the Journal Archive files in a Backup?

      The journal archive can be larger (for example, 6x), and its size depends entirely on how much data you are ingesting and how much time elapses between backups.

      KB Article:

      Can you explain resource consumption during Backup/Restore?

      Full backup/restore operations are resource-intensive (I/O, CPU, and memory) and should be scheduled during off-hours when possible.


      Is it possible to restore to a target database with a different number of forests than the source database?

      Yes, use the "Forest topology changed" option while restoring.


      KB Article:

      What is the recommended way to back up/restore multiple databases?

      Refer to our knowledgebase article for more details.

      How do I configure database backup rotation?

      You can configure the maximum number of full backups to keep by setting the "max backups" parameter (this does not apply to incremental backups). When you reach the specified maximum number of backups, the next backup deletes the oldest backup. Specify 0 to keep an unlimited number of backups. You can set this value in the Admin UI or through the APIs.


      What are the best practices for spacing incremental backups?

      Incremental backups are more resource-intensive than full backups, as they need to query the data to find the changes between backups. Monitor your system closely to ensure that the overhead of running many incremental backups is not affecting system performance, and that a subsequent backup does not start before the previous one has completed. Frequent incremental backups are not recommended; the general recommendation is to space them at least 6 hours apart.

      KB Article:

      Can you explain the directory structure for Incremental backups?

      If an incremental backup directory is specified, after the first incremental backup is done, the full backup can be archived to another location. The subsequent incremental backups do not need to examine the full backup.

      Once you restore an incremental backup, you can no longer use the previous full backup location for ongoing incremental backups. After the restore, you need to make a fresh full backup and use the full backup location for ongoing incremental backups. This means that after restore of an incremental backup, scheduled backups need to be updated to use the fresh full backup location.


      Why do Incremental backups take more time than Full backups?

      Incremental backups are expected to use more CPU and RAM because they run queries to determine what data has changed and needs to be backed up, whereas full backups simply back up all available forest data and are more likely to be I/O constrained. If the system is memory- or CPU-constrained while the incremental backup is running (for example, because other processes or queries are running), the incremental task takes lower priority and can take longer to run than a full backup. Also note that incremental backups are designed to minimize storage, not time.

      Note that incremental backups can be fast when little data has changed since the last incremental backup, or when the system is otherwise idle. However, incremental backups are usually given lower priority so that they consume the least amount of resources, which ultimately results in longer run times.

      Why use incremental backup when using journal archiving? Is this a recommended combination?

      Incremental backups are more compact than archived journals and are faster to restore.

      Incremental backup improves both restore time and also space requirements over journal archiving, but it's not an either/or decision - you can use both where appropriate.

      Restoring from incremental backup taken on a different cluster fails. What do I need to check?

      Every incremental backup will store a reference to the location of the previous incremental backup and the very first one will store a reference to the location of the full backup. These are stored in a file by the name BackupTag.txt. The restore job fetches the backup locations from this file, and if they still point to an older location, then incremental restore will fail in this scenario.


      KB Article:

      Why is MarkLogic Server backup slower than a file copy?

      Refer to our Knowledgebase article for more details.

      Can you explain how Backup/Restore with encryption works?
      • If any forest in the backup has encryption enabled, then the entire backup will be encrypted.
      • As long as the current database being restored is encrypted, the restored database will also be encrypted.
      • By default, the MarkLogic embedded KMS is automatically included in a backup. If you set the backup option to exclude the keystore, turning off its automatic inclusion, you are responsible for saving the keystore (the embedded KMS) to a secure location.


      How can I monitor MarkLogic Backup?
      • Check Database status page on the Admin UI
      • Use the MarkLogic APIs

      KB Article:


      MarkLogic 9 introduces Certificate based User Authentication, which allows users to log in to MarkLogic Server without being required to enter a user name and password. In previous versions, certificates were only used to restrict client access to MarkLogic Server under the Digest/Basic User Authentication Scheme. In addition to Certificate based User Authentication using internal users and external name verification, MarkLogic 9 also permits authenticating and authorizing user certificates against an LDAP or Active Directory database to permit access based on MarkLogic roles and LDAP group membership. By using this method of authentication and authorization, a site can maintain all user access externally without the need to manage a separate set of users within the MarkLogic Security database.

      This document will expand on the concepts and configuration examples described in the associated "MarkLogic Certificate based User Authentication" knowledge base article and will show the additional steps required to configure MarkLogic to authorize a User certificate against an LDAP or Active Directory. It is highly recommended that you make yourself familiar with the previous article as it covers in more detail the steps required to setup the MarkLogic App Server to ensure that TLS Client Authentication is configured correctly to request and verify the certificates that may be presented by the user.

      Creating the External Security definition

      To authorize users presenting a certificate, you should first create a new External Security definition, selecting “Certificate” for authentication and LDAP for authorization.


      Next, configure the LDAP server entry.



      • Unlike standard user authorization, when MarkLogic searches for the user certificate it performs a base-object search using the full certificate distinguished name (DN) rather than a sub-tree search from the “ldap base”. The MarkLogic UI currently requires an entry for the “ldap base” even though it is not used, so you will need to enter a dummy value to satisfy UI validation.
      • When performing the LDAP search, MarkLogic requests the “ldap attribute” value to use when creating the temporary userid. Take care when selecting this attribute to ensure that its value is unique across all certificate DNs that may be presented.
      • Ensure that the “ldap default user” has the required permissions to search for the certificate DN within the LDAP or Active Directory server and to return the required attributes.
      • MarkLogic uses the “memberOf” and “member” attributes to return group and group-of-group membership. If your LDAP or Active Directory server uses different attributes, such as “isMemberOf”, you can override them in the “memberOf” and “member” attribute fields.

      Configuring the App Server

      Configure the App Server to use “certificate” authentication, set “Internal Security” to false and select the external security definition created above.


      Enable TLS Client Authentication and configure the SSL Client Certificate Authorities that you will accept as signers of user certificates. Any presented certificate that is not signed by one of the specified CAs will be rejected.



      For more details on configuring the CA certificates required for certificate based authentication, please refer to the knowledge base article "MarkLogic Certificate based User Authentication".

      Configure MarkLogic Security Roles

      For each role specify one or more external names that match the “memberOf” attribute returned for the Certificate DN.

      To confirm that users are being authorized to the MarkLogic App Server correctly, connect using your browser or a command-line tool such as “curl”.

      MacPro-4505:~ $ curl -k --cert ./mluser1.p12:password https://localhost:8013
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "">
      <html xml:lang="en" xmlns="">
      <title>Welcome to the MarkLogic Test page.</title>
      <body><p>This application is running on MarkLogic Server version 9.0-1.1</p></body>

      Within the AppServer AccessLog, you should see a mapping for a new temporary userid to the expected role.

      External User(mluser1) is Mapped to Temp User(mluser1) with Role(s): mladmin
      ::1 - mluser1 [18/Jul/2017:16:07:05 +0100] "GET / HTTP/1.1" 200 347 - "curl/7.51.0"


      If a user is not able to connect using their certificate, first check whether the certificate Distinguished Name (DN) can be found in the LDAP or Active Directory database and whether that entry contains the required userid and memberOf attributes.

      Using a tool such as OpenSSL, determine the correct certificate Subject DN, for example:

      MacPro-4505:~ $ openssl x509 -in mluser1.pem -text
      Version: 3 (0x2)
      Serial Number: 1497030421 (0x593adf15)
      Signature Algorithm: sha256WithRSAEncryption
      Issuer: CN=User Signing Authority, O=MarkLogic, OU=Support
      Not Before: Jun 9 17:47:13 2017 GMT
      Not After : Jun 9 17:47:13 2018 GMT
      Subject: CN=mluser1, OU=Users, DC=MarkLogic, DC=Local
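      Since the base-object search must use the Subject DN exactly, it can help to derive the search base from the openssl output mechanically rather than retyping it. The following is a minimal shell sketch, assuming the openssl x509 -text layout shown above and that no RDN value itself contains ", "; the function name subject_to_base_dn is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the "Subject:" line of `openssl x509 -text`
# output (read on stdin) into a comma-separated DN usable as ldapsearch -b.
# Assumes no RDN value contains ", " (escaped commas are not handled).
subject_to_base_dn() {
  # Keep only the first "Subject:" line (not "Subject Public Key Info:"),
  # drop the label, then collapse ", " separators into ",".
  sed -n 's/^[[:space:]]*Subject:[[:space:]]*//p' | head -n 1 | sed 's/, /,/g'
}

# Usage:
#   openssl x509 -in mluser1.pem -noout -text | subject_to_base_dn
```

For the certificate above this yields CN=mluser1,OU=Users,DC=MarkLogic,DC=Local, the same DN used as the search base in the ldapsearch example (LDAP attribute type names are not case-sensitive).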
      Next, using an LDAP lookup tool such as “ldapsearch” (or "ldp.exe" on Microsoft Windows), perform a base-object search for the certificate DN, requesting the LDAP user attribute and the memberOf attribute (with entries matching your LDAP External Security settings).

      If either the userid or the memberOf attribute is missing, access will be denied.

      MacPro-4505:~ $ ldapsearch -H ldap:// -x -D "cn=manager,dc=marklogic,dc=local" -W -s base -b "cn=mluser1,ou=Users,dc=MarkLogic,dc=Local" "memberOf" "uid"
      # extended LDIF
      # LDAPv3
      # base <cn=mluser1,ou=Users,dc=MarkLogic,dc=Local> with scope baseObject
      # filter: (objectclass=*)
      # requesting: memberOf uid
      # mluser1, Users, MarkLogic.Local
      dn: cn=mluser1,ou=Users,dc=MarkLogic,dc=Local
      uid: mluser1
      memberOf: cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local
      # search result
      search: 2
      result: 0 Success
      If MarkLogic successfully locates the certificate DN and returns the required attributes, check whether the external names in the security role match (case-sensitively) the “memberOf” attribute returned by the LDAP search.
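      Both checks can be scripted against saved ldapsearch output. A minimal sketch, assuming plain, unwrapped LDIF lines as in the example above (base64-encoded "attr::" values and folded continuation lines are not handled); the function names are hypothetical. The comparison with the role's external name is deliberately exact and case-sensitive.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read ldapsearch (LDIF) output on stdin.

# Succeeds only if both the user attribute (e.g. uid) and memberOf are present.
has_required_attrs() {
  local user_attr=$1 ldif
  ldif=$(cat)
  grep -q "^${user_attr}: " <<<"$ldif" && grep -q '^memberOf: ' <<<"$ldif"
}

# Succeeds only if some memberOf value equals the role's external name
# exactly (fixed-string, whole-line, case-sensitive match).
external_name_matches() {
  local external_name=$1
  sed -n 's/^memberOf: //p' | grep -Fxq -- "$external_name"
}
```

For example, piping the ldapsearch output above into external_name_matches 'cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local' succeeds, while the same name in lower case does not.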

      The following XQuery can be used to show all the external names assigned to a specific role. 

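      (: execute this against the Security database :)
      xquery version "1.0-ml";
      import module namespace sec = "http://marklogic.com/xdmp/security"
          at "/MarkLogic/security.xqy";

      (: list the external names mapped to a role; "mladmin" is the example role above :)
      sec:role-get-external-names("mladmin")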



      If MarkLogic is still not able to authenticate users, a packet capture tool such as Wireshark is very useful for checking whether MarkLogic can contact the LDAP or Active Directory server and whether it receives the expected successful admin bind and search result for the certificate DN.

      The following example trace shows a successful BIND using the LDAP Default user followed by a successful search for the Certificate DN.




      MarkLogic 9 introduces Certificate based User Authentication, which allows users to log in to MarkLogic Server without entering a user name and password. In previous versions, certificates were only used to restrict client access to MarkLogic Server in combination with the Digest/Basic User Authentication scheme. Certificate based User Authentication can be configured using either internal users or external names.

      Certificate Authentication: Internal User vs External Name based Authentication:

      The difference between Internal User and External Name based authentication lies in whether the user named in the certificate CN field (demoUser1 in our example) exists as an internal user in the MarkLogic Security database, or whether the certificate Subject field (the whole Subject field as a DN) is mapped as an external name on an existing user.

      User Certificate Example:

      A few common steps and examples are listed below for clarity. In our example setup, the certificate presented by the App Server user (demoUser1) is as follows.

      $ openssl x509 -in UserCert.pem -text -noout
              Version: 1 (0x0)
              Serial Number: 7 (0x7)
          Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=NY, L=New York, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic DemoCA
                  Not Before: Jul 11 02:58:24 2017 GMT
                  Not After : Aug 27 02:58:24 2019 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic Corporation, OU=Engineering, CN=demoUser1
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (1024 bit)
                      Exponent: 65537 (0x10001)
          Signature Algorithm: sha1WithRSAEncryption

      CA Certificate (User Cert Signer) Import from Admin GUI

      In order for MarkLogic Server to accept the certificate presented by a user, the Certificate Authority (CA) certificate that signed the user certificate must be installed in MarkLogic. We can install the CA certificate shown below, which was used to sign the demoUser1 certificate, using the Admin GUI->Configure->Security->Certificate Authority Import tab.

      $ openssl x509 -in CACert.pem -text -noout
              Version: 3 (0x2)
              Serial Number: 9774683164744115905 (0x87a6a68cc29066c1)
          Signature Algorithm: sha256WithRSAEncryption
              Issuer: C=US, ST=NY, L=New York, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic DemoCA
                  Not Before: Jul 11 02:53:18 2017 GMT
                  Not After : Jul  6 02:53:18 2037 GMT
              Subject: C=US, ST=NY, L=New York, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic DemoCA
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (4096 bit)
                      Exponent: 65537 (0x10001)
              X509v3 extensions:
                  X509v3 Subject Key Identifier:
                  X509v3 Authority Key Identifier:
                  X509v3 Basic Constraints: critical
                  X509v3 Key Usage: critical
                      Digital Signature, Certificate Sign, CRL Sign
          Signature Algorithm: sha256WithRSAEncryption

      CA Certificate Import into MarkLogic from Query Console

      We can also import the above Certificate Authority into MarkLogic as a trusted CA by calling pki:insert-trusted-certificates from Query Console.

      (Please ensure this query is executed against the Security database)

      Certificate Template & Template CA import into Client (Browser/SSL Client)

      To enable an SSL App Server, we will either:

      1) Create a certificate template to use a self-signed certificate,

      or, 2) Import a pre-signed certificate into MarkLogic.

      In both cases, we will need to import the CA used to sign the certificate of the MarkLogic SSL App Server into the client browser/SSL client.

      Importing a Self Signed Certificate Authority into Windows

      Once the template is created, we link it to our App Server to enable SSL on that App Server.

      Certificate Authentication: CN as Internal User vs External Name based Internal User

      The difference between the two lies in whether the certificate CN field user (demoUser1 in our example) exists in the MarkLogic Security database as an internal user, or whether the user retrieved from the certificate Subject field is mapped as an external name to an existing user.

      1.) Certificate Authentication: Certificate CN field value as MarkLogic Security Database Internal User

      Steps to configure Certificate based User Authentication for our User demoUser1 as MarkLogic Internal User.

      a.) Create User "demoUser1" with necessary roles in MarkLogic Security (Internal User).


      b.) On the App Server page, set the authentication scheme to "Certificate" and Internal Security to "true". Unless you also want some users to be authenticated as external users, leave the External Security object set to "none".


      c.) On the App Server, also select the CA used to sign client/user certificates as an accepted Certificate Authority (see the CA Certificate section earlier for our example).


      Once configured, a browser with the demoUser1 user certificate installed can access the App Server and log in to MarkLogic as the internal user demoUser1. (Note: the internal user also needs the necessary roles to access resources as needed.)

      2.) Certificate Authentication: User Certificate Subject field value as External Name for Internal User

      Steps to configure Certificate based User Authentication for our User demoUser1 as MarkLogic External Name for Internal User "newUser1".

      a.) Create user "newUser1" with the necessary roles in MarkLogic Security (internal user), and configure the user certificate Subject field as an external name for that user.


      b.) Create an External Security object with Certificate based Authentication.


      c.) In the External Security object configuration itself, select the CA used to sign client/user certificates as an accepted Certificate Authority (see the CA Certificate section earlier for our example).

      Please note: this configuration is different from configuring the client CA on the App Server (which is required for the internal user case).


      d.) For external name (certificate Subject field) based linkage to an internal user, the App Server needs to point to our External Security object.





      What is MLCP? MarkLogic Content Pump (MLCP) is an open-source, Java-based command-line tool to import, export and copy data to or from databases.


      How do I install MLCP? Refer to our documentation and tutorial for this.

      What software is required for MLCP?

      • MarkLogic Server with XDBC App Server (MarkLogic 8 and later versions come with an XDBC App Server pre-configured on port 8000).
      • Oracle/Sun Java JRE 1.8 or later.
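      A quick way to confirm the Java requirement is to parse the version string reported by java -version. A minimal sketch, assuming the usual quoted version formats ("1.8.0_292" for Java 8 and older, "11.0.2" for later releases); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the Java major version from a version string
# such as "1.8.0_292" (-> 8) or "11.0.2" (-> 11).
java_major_version() {
  local v=$1
  v=${v%%[-_+]*}                        # drop suffixes such as _292, -ea, +9
  case $v in
    1.*) v=${v#1.}; echo "${v%%.*}" ;;  # legacy "1.x" scheme (Java 8 and older)
    *)   echo "${v%%.*}" ;;             # "9", "11.0.2", "17.0.1", ...
  esac
}

# Succeeds if the version meets MLCP's Java 1.8 (major version 8) minimum.
java_meets_mlcp_minimum() {
  [ "$(java_major_version "$1")" -ge 8 ]
}

# Usage (java -version writes to stderr):
#   ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   java_meets_mlcp_minimum "$ver" && echo "Java version OK for MLCP"
```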


      Can I connect to MLCP via Load Balancer?

      Yes. You can configure the MLCP tool to connect to a load balancer that sits in front of the MarkLogic Server cluster.


      What are the permissions needed for MLCP operations?
      • 'admin' role or
      • The necessary permissions listed in the documentation, plus additional privileges as needed (e.g., read/update privileges on the database)


      Does MLCP offer a way to export triples? MLCP currently doesn’t offer a way to export triples, but if you are okay with exporting them as XML files, you can export those documents as files through MLCP by collection name (for managed triples, the graph name can be used as the collection name).


      Can I configure MLCP to use SSL? Yes, please refer to our "Connecting to MarkLogic Using SSL" documentation for details.
      Can I configure Kerberos with MLCP? Yes. Please check Using MLCP With Kerberos for additional details.
      How do I ingest data in Data Hub Framework using MLCP? Check the "Ingest Using MLCP" section in our Data Hub Documentation for more details.
      Can we use MLCP to read from Amazon S3?

      There is currently no direct support between MLCP and Amazon S3.

      But you can consider using s3fs-fuse to mount the S3 Bucket as a local filesystem and then use MLCP.

      Can I filter the data by column values while importing CSV via MLCP? Not with MLCP itself, but you can use other tools such as CORB.


      How do I debug/troubleshoot MLCP issues? Check our MLCP Troubleshooting documentation.
      Can I export large files in compressed format? Yes, use the -compress option in MLCP's export command.


      What is the -fastload option and when should I use it? The -fastload option can significantly speed up ingestion during import and copy operations, but it can also cause problems if not used properly. Please check the documentation for tradeoffs and other considerations.


      How does MLCP handle failover?
      Failover support in MLCP is only available when running against MarkLogic 9 or later. With older MarkLogic versions, the job will fail if MLCP is connected to a host that becomes unavailable.


      Does MLCP support concurrent jobs? No, refer to our knowledge base article for details.

      What should I consider when configuring the -thread_count option for MLCP export?
      • By default the -thread_count is 4 (if -thread_count is not specified)
      • For best performance, you can configure this option to use the maximum number of threads supported by the app server in the group (maximum number of server threads allowed on each host in the group * the number of hosts in the group)
        • E.g.: For a 3-node cluster, this number will be 96 (32*3) where:
          • 32 is the max number of threads allowed on each host
          • 3 is the number of hosts in the cluster
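      The sizing rule above is simple arithmetic. As a sketch (the function name is hypothetical; host and credentials in the usage comment are placeholders), it can be computed and passed to MLCP's -thread_count option:

```shell
#!/usr/bin/env bash
# Upper bound for mlcp export -thread_count, per the rule of thumb above:
# (max server threads per host) * (number of hosts in the group).
max_export_thread_count() {
  local threads_per_host=$1 host_count=$2
  echo $(( threads_per_host * host_count ))
}

# Example from the 3-node cluster above:
#   max_export_thread_count 32 3     # prints 96
#   mlcp.sh export -host HOST -port 8000 -username USER -password PASS \
#     -output_file_path /tmp/export -thread_count "$(max_export_thread_count 32 3)"
```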



      What are the differences between MLCP and CORB? Check this MarkLogic Stackoverflow discussion for more details.


      How do I handle whitespace in URIs/folders while loading data with MLCP? Check our "Handling Whitespace in URIs" blog post for details.


      How can I use delimiter in MLCP?

      Please check these links for details

      Creating Documents from Delimited Text Files

      Ingesting Delimited Text with MLCP

      Loading tab delimited files

      Does MLCP support distributed (Hadoop) mode? No; MLCP distributed mode has been deprecated since MarkLogic 10.0-4.

      How can I invoke MLCP via a Gradle task?

      Check the github "MarkLogic Content Pump (mlcp) and Gradle" documentation for details. 


      The performance of data extraction and ingestion with mlcp depends on multiple factors, including the hardware capacity of the client node running mlcp. This article focuses solely on how to adjust the mlcp -thread_count and -thread_count_per_split options for better import and export performance, given the hardware and the data set size.

      mlcp Import

      For mlcp import jobs, there are two options for tuning threads:

      1. -thread_count

      -thread_count is the number of threads to spawn for concurrent loading. The total number of threads is controlled by -thread_count if it is specified; otherwise, mlcp uses its own calculated default thread count.

      2. -thread_count_per_split

      -thread_count_per_split is the maximum number of threads that can be assigned to each split. If you specify -thread_count_per_split, each input split will run with the specified number of threads.

      What if neither option is specified?

      Prior to mlcp 10.0-4.2, mlcp import uses a default thread count of 4 for concurrent loading.

      In mlcp 10.0-4.2 and later, a thread polling mechanism is used: during job initialization, mlcp polls the server to identify the maximum number of app server or XDBC server threads on the port that handles mlcp requests, and then uses this number as the default thread count.
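      The selection rules above can be summarized as a small decision function. This is a simplified model for illustration only, not mlcp's actual source; the function and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical model of how the import thread count is chosen:
#   $1 = value passed with -thread_count ("" if not specified)
#   $2 = "yes" if the mlcp version is 10.0-4.2 or later (polling available)
#   $3 = maximum server threads found by polling the target port
effective_import_thread_count() {
  local explicit=$1 polling=$2 polled_threads=$3
  if [ -n "$explicit" ]; then
    echo "$explicit"          # an explicit -thread_count takes precedence
  elif [ "$polling" = yes ]; then
    echo "$polled_threads"    # 10.0-4.2+: default to the polled thread count
  else
    echo 4                    # older mlcp: fixed default of 4
  fi
}
```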

      mlcp Export

      For mlcp export jobs, the only option for thread tuning is -thread_count.

      What if thread_count is not specified?

      If it is not specified, the default thread count for concurrent exporting is 4.


      For import: It is recommended to align the mlcp concurrent thread count with the maximum server threads allowed on all hosts in the group (preferably all the e-nodes) to achieve better performance. However, this may not help if your MarkLogic Server is I/O bound; increasing the concurrency of writes will not necessarily improve performance. Because the polling mechanism already uses the full concurrency of the current app server/XDBC server, running multiple mlcp jobs at the same time is not recommended.

      For export: It is good practice to try smaller thread counts such as 8, 16, 24, 32, 40 or 48 until the environment becomes I/O bound. Since mlcp exports content from multiple MarkLogic servers and writes it to the local file system of a single node, performance is largely restricted by the I/O capability of the machine running the mlcp job. Increasing the thread count further may harm performance, since the client consuming the data is much slower than the servers serving it. It may also result in long-running requests, which may time out (SVC-EXTIME exception) on the app server/XDBC server depending on the request timeout setting.
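      One way to apply this is to run the same export once per candidate thread count and compare elapsed times. A minimal sketch that only prints the commands to run; the host, port, credentials and output paths are placeholders, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print (not run) one mlcp export command per candidate -thread_count, so
# each can be timed in turn until throughput stops improving (I/O bound).
export_trial_commands() {
  local t
  for t in 8 16 24 32 40 48; do
    printf 'mlcp.sh export -host %s -port %s -username %s -password %s -output_file_path %s -thread_count %s\n' \
      localhost 8000 USER PASSWORD "/tmp/export-t$t" "$t"
  done
}
```

Writing each trial to a separate output directory keeps the runs independent, so a slower trial is not affected by files left over from the previous one.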



      MarkLogic may fail to start with the following XDMP-ENCODING error: Initialization: XDMP-ENCODING: (err:XQST0087) Unsupported character encoding: ascii. This is caused by a mismatch between the Linux locale character set and the UTF-8 character set required by MarkLogic.


      There are two primary causes to this error. The first is using service instead of systemctl to start MarkLogic on some Linux distros.  The second is related to the Linux language settings.

      Starting MarkLogic Service

      On an Azure MarkLogic VM, as well as some more recent Linux distros, you must use systemctl, and not service to start MarkLogic. To start the service, use the following command:

      • sudo systemctl start MarkLogic

      Linux Language Settings

      This issue occurs when the Linux locale LANG setting is not set to UTF-8. It can be corrected by changing the value of LC_ALL to "en_US.UTF-8". For default installations of MarkLogic, this should be done for the root user. To change the system-wide locale settings, /etc/locale.conf needs to be modified, which can be done with the localectl command:

      • sudo localectl set-locale LANG=en_US.UTF-8

      If MarkLogic is configured to run as a non-root user, the locale can be set in that user's environment using the $HOME/.i18n file. If the file does not exist, create it and ensure it contains the following:

      • export LANG="en_US.UTF-8"

      If that does not resolve the issue in the user environment, then you may need to look at setting LC_CTYPE, or LC_ALL for the locale.

      • LC_CTYPE will override the character set part of the LANG setting, but will not change other locale settings.
      • LC_ALL will override both LC_CTYPE and all locale configurations of the LANG setting.
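      The precedence described above (LC_ALL over LC_CTYPE over LANG for the character set) can be checked from a shell before starting MarkLogic. A minimal sketch, assuming bash; the function names are hypothetical, and the UTF-8 suffix match covers only the common spellings.

```shell
#!/usr/bin/env bash
# Return the locale that governs the character set, using POSIX precedence:
# LC_ALL overrides LC_CTYPE, which overrides LANG.
effective_ctype() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_CTYPE:-}" ]; then
    printf '%s\n' "$LC_CTYPE"
  else
    printf '%s\n' "${LANG:-}"
  fi
}

# Succeed if that locale names a UTF-8 character set.
is_utf8_locale() {
  case "$(effective_ctype)" in
    *.UTF-8|*.utf8|*.UTF8|*.utf-8) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: is_utf8_locale || echo "locale is not UTF-8: $(effective_ctype)"
```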


      Overlarge workloads, underprovisioned environments, or a combination of the two often result in false failovers, where MarkLogic Server perceives an overloaded node as unavailable. Failover events redistribute the affected node’s traffic to the remaining nodes in the cluster. False failover events, unfortunately, redistribute an overloaded node’s workload to the remaining nodes, which are likely similarly overloaded (and now fewer in number). While it’s possible to mitigate this scenario in the short term by allowing more time for nodes to talk to one another, long-term correction requires throttling workloads, increasing the environment’s hardware provisioning, or a combination of the two.

      What does failover look like in MarkLogic Server?
      High availability systems require continuity within a cluster. MarkLogic Server delivers high availability by providing fault tolerance - if a node in a MarkLogic cluster fails, other nodes automatically pick up the workload so that the data stored in forests is always available. 

      More specifically, failover in MarkLogic Server is designed to address data node (“d-node”) or forest-level failure. D-node failures can include operating system crashes, MarkLogic Server restarts, power failures, or persistent system failures (hardware failures, for example). A forest-level failure is any disk I/O or other failure that results in an error state on the forest. 

      Failover in MarkLogic Server is "hot" in the sense that switchover occurs immediately to failover hosts already running in the same cluster, with no node restarts required. Failing back from a failover host to the primary host, however, needs to be done manually and requires a node restart.

      When a node is perceived as no longer communicating with the rest of the cluster, and a quorum of greater than 50% of the nodes in the cluster vote to remove the affected node, then a failover event will occur automatically. A node is defined to no longer be communicating with the rest of the cluster when that node fails to respond to cluster wide heartbeats within the defined host timeout.

      What does false failover look like in MarkLogic Server?
      False failover events in MarkLogic Server occur when a node is present and working, but so overloaded that it can no longer respond to cluster wide heartbeats within the specified host timeout. In other words, during false failover events the affected node is so busy that it is unable to communicate its status to the other nodes in the cluster, and consequently unable to prevent the other nodes from voting to remove it from the cluster.

      There can be many reasons for a busy node or cluster. One that is often overlooked is the infrastructure, especially when virtualization is involved. Virtualization lets you get more out of your resources by allowing VMs to share them, under the assumption that not all systems will need their assigned resources at the same time. However, if multiple VMs are under load simultaneously, they can outstrip the available physical resources, because more than 100% of those resources have been assigned to the VMs. This condition is called "resource starvation".

      What should I do about false failover events in MarkLogic Server?
      Recall that a node is voted out when it can no longer respond to the rest of the cluster within the specified host timeout. It might be possible to mitigate false failovers in the short term by temporarily increasing the environment’s XDQP and host timeouts. Larger timeouts would give all the nodes in the cluster more time to respond to clusterwide heartbeats, which under heavy load should decrease the frequency of false failover events. That said - DO NOT get in the habit of simply increasing your timeouts to larger and larger values. Increasing timeout to avoid false failovers is, at best, a temporary/short term tactic.

      Long term correction of false failover events requires better alignment between your system's workloads and its hardware provisioning. You could, for example, reduce the workload, or spread the same workload over more time, or increase your system’s hardware provisioning. All of these tactics would free up the affected nodes to respond to the clusterwide heartbeat in a more timely manner, thereby avoiding false failover events. You can read more about aligning your workloads and hardware footprint at:

      1. MarkLogic Performance: Understanding System Resources
      2. Performance Issues in MarkLogic Server: what they look like - and what you should do about them


      MarkLogic Server is optimized for query performance - if you're coming from a relational database background, you might be surprised by how much storage and storage bandwidth might be used. To better understand this behavior, it's important to recall the following:

      Speed over storage savings - While it makes sense to minimize storage footprint from a storage utilization perspective, MarkLogic Server trades space for time to take advantage of rapidly falling storage prices.

      Lazy Deletes - To better prioritize query performance, in MarkLogic Server record deletions happen in the form of "lazy deletes" where the record (or "document") is first marked as "obsolete" and consequently hidden from query results. The work of actually deleting any one record is deferred for a later time, when multiple obsolete documents can be removed and your remaining data optimized all at the same time and in bulk during a merge operation.

      Index on ingest - MarkLogic Server indexes documents as they're ingested. If your data model and index configuration are where they need to be, your data is ready to be queried as soon as it's in a MarkLogic Server database. If your index configuration isn't quite where you want it, however, revising it means reindexing your entire database, creating lots of obsolete documents and potentially resulting in multiple large merge operations. This is why it's always better in MarkLogic Server to optimize your index settings in smaller environments before propagating them to your bigger environments, and why you'll want to make fewer, bigger index configuration changes instead of many small ones. Each index configuration change, regardless of size, will trigger a reindex, so you'll want to minimize the number of reindexes you perform rather than the number of changes in any one reindex.

        In addition to reindexing, other aspects of MarkLogic Server that take up significant storage bandwidth include:

        • Rebalancing - which redistributes your data across your database
        • Local disk failover/database replication - both make copies of your data, and those copies need their own resources
        • Backup/restore - backup is making a copy of your data, and restore is effectively a mass update of your data
        • Mass updates of existing documents - Because of the way updates are performed in MarkLogic Server, updating a large number of existing records will create a large number of obsolete documents, and consequently result in lots of large merge operations. To help reduce performance overhead, and if you have no need to preserve attributes of your existing data, you might want to consider simply reloading data into an empty database instead (which avoids creating obsolete documents and the consequent merges)


        Understanding System Resources
        Understanding MarkLogic Minimum Disk Space Requirements
        MarkLogic - Lazy Deletes
        Mass Updates - "node-replace" vs "document-insert"


        A MarkLogic cluster is a group of inter-connected individual machines (often called “nodes” or “hosts”) that work together to perform computationally intensive tasks. Clustering offers scalability and high-availability by avoiding single-points of failure. This knowledgebase article contains tips and best practices around clustering, especially in the context of scaling out.

        How many nodes should I have in a cluster?

        If you need high-availability, there should be a minimum of three nodes in a cluster to satisfy quorum requirements.
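        The arithmetic behind the three-node minimum follows from the quorum rule described earlier (strictly more than 50% of the nodes must agree before a node can be voted out). A minimal sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# has_quorum TOTAL AVAILABLE succeeds when AVAILABLE nodes form a strict
# majority (more than 50%) of a TOTAL-node cluster.
has_quorum() {
  local total=$1 available=$2
  [ $(( available * 2 )) -gt "$total" ]
}

# 3 nodes, 1 lost: 2 of 3 > 50%   -> quorum kept, failover can proceed
# 2 nodes, 1 lost: 1 of 2 = 50%   -> no quorum
```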

        Anything special about deploying on AWS?

        Quorum requirements hold true even in a cloud environment where you have Availability Zones (or AZs). In addition to possible node failure, you can also defend against possible AZ failure by splitting your d-Nodes and e-Nodes evenly across three availability zones.

        Load distribution after failover events

        If a d-node experiences a failover event, the remaining d-nodes pick up its workload so that the data stored in its forests remains available.

        Failover forest topology is an important factor in both high-availability and load-distribution within a cluster. Consider the example below of a 3-node cluster where each node has two data forests (dfs) and two local disk-failover forests (ldfs):

        • Case 1: In the event of a failover, if both dfs (df1.1 and df1.2) from node1 fail over to node2, the load on node2 would double (100% to 200%, where node2 would now be responsible for its own two forests - df2.1 and df2.2 - as well as the additional two forests from node1 - ldf1.1 and ldf1.2)
        • Case 2: In the event of a failover, if we instead set up the replica forests so that when node1 goes down, df1.1 fails over to node2 and df1.2 fails over to node3, then the load increase per node is reduced. Instead of one node going from 100% to 200% load, two nodes would go from 100% to 150%, where node2 is now responsible for its two original forests - df2.1 and df2.2 - plus one of node1's failover forests (ldf1.1), and node3 is now responsible for its two original forests - df3.1 and df3.2 - plus the other failover forest (ldf1.2)
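        The percentages in both cases follow from a simple calculation, assuming every node starts at 100% load with the same number of forests and the failed node's forests are split evenly across the receiving nodes. A sketch with a hypothetical function name (integer percentages):

```shell
#!/usr/bin/env bash
# Load (%) on each receiving node after one node fails, when the failed
# node's forests are spread evenly over RECEIVERS surviving nodes.
load_after_failover() {
  local receivers=$1
  echo $(( 100 + 100 / receivers ))
}

# load_after_failover 1   -> 200   (Case 1: both forests go to node2)
# load_after_failover 2   -> 150   (Case 2: one forest each to node2 and node3)
```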

        Growing or scaling out your cluster

        If you need to fold in additional capacity to your cluster, try to add nodes in "rings of three." Each ring of three can have its own independent failover topology, where nodes 1, 2, and 3 will fail over to each other as described above, and nodes 4, 5, and 6 will fail over to each other separate from the original ring of three. This results in minimal configuration changes for any nodes already in your cluster when adding capacity.
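        As an illustration of the ring layout (a hypothetical helper, assuming nodes are numbered from 1 and always added three at a time), each node's failover partners are simply the other two members of its ring:

```shell
#!/usr/bin/env bash
# List the nodes a given node fails over to: the other members of its ring.
ring_failover_partners() {
  local node=$1
  local first=$(( (node - 1) / 3 * 3 + 1 ))   # first node of this ring
  local n partners=()
  for n in "$first" $(( first + 1 )) $(( first + 2 )); do
    if [ "$n" -ne "$node" ]; then partners+=("$n"); fi
  done
  echo "${partners[*]}"
}

# ring_failover_partners 1   -> 2 3
# ring_failover_partners 5   -> 4 6
```

Adding nodes 7, 8 and 9 later creates a new ring without changing the partners of any existing node.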

        Important related takeaways

        • In addition to the standard MarkLogic Server clustering requirements, you'll also want to pay special attention to the hardware specification of individual nodes
          • Although the hardware specification doesn’t have to be exactly the same across all nodes, it is highly recommended that all d-nodes be of the same specification because cluster performance will ultimately be limited by the slowest d-node in the system
          • You can read more about the effect of slow d-nodes in a cluster in the "Check the Slowest D-Node" section of our "Performance Testing With MarkLogic" whitepaper
        • Automatic fail-back after a failover event is not supported in MarkLogic because of the risk of unintentional overwrites, which could result in accidental data loss. Should a failover event occur, human intervention is typically required to fail back manually. You can read more about the considerations involved in failing a forest back in the following knowledge base article: Should I flip failed over forests back to their respective masters? What are the risks if I leave them?




        What does it mean?



        This error may sometimes be encountered when:

        • A restore is attempted while a backup task is running
        • Another process has locked the backup directory

        This error is seen when:

        • The disk containing the backup directory runs out of space
        • There's a bad disk configuration
        • The backup destination disk is unmounted
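        Two of these causes can be checked before a backup starts. The Python sketch below, which uses only the standard library, reports whether an expected mount point is still mounted and whether the backup directory has enough free space. The function name, the paths, and the required-size estimate are assumptions you supply; it is a pre-flight check, not part of MarkLogic Server.

```python
import os
import shutil


def check_backup_destination(backup_dir, required_bytes, mount_point=None):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    # Unmounted backup disk: the expected mount point is no longer a mount.
    if mount_point is not None and not os.path.ismount(mount_point):
        problems.append(f"{mount_point} is not a mount point")
    if not os.path.isdir(backup_dir):
        problems.append(f"{backup_dir} does not exist or is not a directory")
        return problems
    # Out of space: compare free bytes with your estimate of the backup size.
    free = shutil.disk_usage(backup_dir).free
    if free < required_bytes:
        problems.append(f"{backup_dir} has {free} bytes free, {required_bytes} needed")
    return problems


# Hypothetical paths and size estimate (500 GiB):
print(check_backup_destination("/backups/marklogic", 500 * 1024**3, mount_point="/backups"))
```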


        Indicates that an operation such as a merge, backup or query was explicitly canceled. This can occur:

        • Through the Admin Interface
        • By calling an explicit cancellation function, such as xdmp:request-cancel()
        • When a client breaks the network socket connection to the server while a query is running 


        MarkLogic Server expects the system clocks to be synchronized across all the nodes in a cluster, as well as between Primary and Replica clusters. Clock skew (or drift) between hosts should be kept below 0.5 seconds; skew greater than 30 seconds triggers XDMP-CLOCKSKEW errors and can impact cluster availability.
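        A minimal sketch of applying these two thresholds, assuming you already have clock readings (in epoch seconds) sampled from each host at roughly the same moment. In practice the offsets are better measured with your NTP tooling; the function name and input format here are hypothetical.

```python
ACCEPTABLE_SKEW_S = 0.5   # from the text: acceptable skew is below 0.5 seconds
CLOCKSKEW_ERROR_S = 30.0  # from the text: above 30 seconds triggers XDMP-CLOCKSKEW


def classify_skew(host_times):
    """host_times: mapping host name -> clock reading in epoch seconds."""
    # The cluster-wide skew is the gap between the fastest and slowest clocks.
    skew = max(host_times.values()) - min(host_times.values())
    if skew < ACCEPTABLE_SKEW_S:
        status = "ok"
    elif skew <= CLOCKSKEW_ERROR_S:
        status = "warning"
    else:
        status = "clockskew-error"
    return skew, status


print(classify_skew({"host1": 1700000000.0, "host2": 1700000000.2, "host3": 1700000001.0}))
# (1.0, 'warning')
```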


        Indicates that an update statement attempted to update a document in a way that conflicts with other updates in the same statement. For example:

        • A single update statement that updates a node and then attempts to add a child element to that same node
        • A single update statement that inserts a document and then attempts to insert a node into that document
        • A single update statement that attempts to insert a document at the same URI twice
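        The three examples above share one pattern: two operations in the same statement target the same new document, or overlapping nodes within one document. The toy Python model below flags exactly those pairs. It is an illustration of the listed cases under simplified assumptions (node paths as plain strings, same-node or ancestor/descendant overlap treated as a conflict), not MarkLogic's actual conflict-detection rules, and the operation names are hypothetical.

```python
def find_conflicts(updates):
    """updates: list of (kind, uri, path) tuples from ONE statement.

    kind is "insert-document" (path is None) or a node-level kind such as
    "replace-node" or "insert-child" with an XPath-like path string.
    """
    conflicts = []
    for i, (kind_a, uri_a, path_a) in enumerate(updates):
        for kind_b, uri_b, path_b in updates[i + 1:]:
            if uri_a != uri_b:
                continue
            # Inserting a document conflicts with any other update to that URI.
            if "insert-document" in (kind_a, kind_b):
                conflicts.append((uri_a, kind_a, kind_b))
            # Node-level updates conflict when they touch overlapping nodes.
            elif _overlaps(path_a, path_b):
                conflicts.append((uri_a, kind_a, kind_b))
    return conflicts


def _overlaps(p, q):
    """True if p and q are the same node or one is an ancestor of the other."""
    return p == q or p.startswith(q + "/") or q.startswith(p + "/")


print(find_conflicts([("replace-node", "/a.xml", "/root/item"),
                      ("insert-child", "/a.xml", "/root/item")]))
# [('/a.xml', 'replace-node', 'insert-child')]
```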


        Indicates that the same URI occurred in multiple forests of the same database. Under normal operating conditions, duplicate URIs are not allowed to occur, but there are ways for programmers and administrators to bypass the server's safeguards.
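        To see which URIs are affected, you can compare the URI lists of the database's forests. The Python sketch below assumes you have already exported each forest's URIs (how you do so is not shown here) into a mapping of forest name to URIs, and reports every URI that appears in more than one forest. The function name is hypothetical.

```python
from collections import defaultdict


def find_duplicate_uris(forest_uris):
    """forest_uris: mapping forest name -> iterable of document URIs in that forest.

    Returns {uri: sorted list of forests} for every URI found in more than one forest.
    """
    forests_by_uri = defaultdict(set)
    for forest, uris in forest_uris.items():
        for uri in uris:
            forests_by_uri[uri].add(forest)
    return {uri: sorted(forests)
            for uri, forests in forests_by_uri.items() if len(forests) > 1}


print(find_duplicate_uris({"f1": ["/a.xml", "/b.xml"], "f2": ["/b.xml"]}))
# {'/b.xml': ['f1', 'f2']}
```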


        Indicates that MarkLogic Server detected a deadlock. The appropriate corrective action depends on whether the error is frequent or infrequent, and on whether it is logged as a ‘debug’ level or a ‘notice’ level message.
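        For readers unfamiliar with the term, a deadlock is a cycle of transactions in which each is waiting on a lock held by the next. The generic Python sketch below finds such a cycle in a hypothetical wait-for graph using depth-first search. It illustrates the concept only and is not how MarkLogic Server detects deadlocks internally.

```python
def find_deadlock(waits_for):
    """waits_for: mapping transaction id -> set of transaction ids it waits on.

    Returns one cycle as a list (first id repeated at the end), or None.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    color = {}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for other in waits_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GREY:
                # Found a transaction already on the current path: a cycle.
                return path[path.index(other):] + [other]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(waits_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


print(find_deadlock({"t1": {"t2"}, "t2": {"t1"}}))  # ['t1', 't2', 't1']
```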


        Indicates that MarkLogic has run out of room in the expanded tree cache during query evaluation and consequently cannot continue evaluating the query.


        Indicates that a query or other operation exceeded its processing time limit. This can be caused by:

        • Inefficient queries
        • A processing time limit that is too low for the workload
        • Resource bottlenecks


        Indicates that in-memory storage is full, resulting in the forest stands being written out to disk. These messages are informational only, not errors: MarkLogic Server is working as expected. However, if they consistently appear more often than once per minute, increasing the corresponding 'in-memory' settings for the affected database may be appropriate.
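        To check the once-per-minute guideline against your own logs, you can measure how often the message appears. This Python sketch assumes you have already filtered the ErrorLog down to the in-memory-storage messages (the exact message text is not given here) and that each line starts with a timestamp in the format of the Emergency example earlier in this article, such as 2015-06-20 10:13:39.746. The function name is hypothetical.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # e.g. "2015-06-20 10:13:39.746"


def messages_per_minute(log_lines):
    """Average rate (messages per minute) of already-filtered, timestamped log lines."""
    # The timestamp occupies the first 23 characters of each line.
    times = sorted(datetime.strptime(line[:23], TIMESTAMP_FORMAT) for line in log_lines)
    if len(times) < 2:
        return 0.0
    span_minutes = (times[-1] - times[0]).total_seconds() / 60
    if span_minutes == 0:
        return float("inf")
    # n messages define n - 1 intervals across the observed span.
    return (len(times) - 1) / span_minutes


lines = [
    "2015-06-20 10:00:00.000 Info: example message",
    "2015-06-20 10:00:30.000 Info: example message",
    "2015-06-20 10:01:00.000 Info: example message",
]
print(messages_per_minute(lines))  # 2.0, i.e. more often than once per minute
```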

        • MarkLogic Server uses its list cache to hold search term lists in memory
        • If you execute a particularly non-selective or inefficient query, it will fail if the size of its search term lists exceeds the allocated list cache.


        Both errors indicate that the requested module does not exist or that the user does not have the required permissions on the module.

        MarkLogic on AWS

        What are 504 Timeout errors? How do you resolve them?

        504 timeout errors indicate that the load balancer may be closing the connection before the server responds to the request. To avoid them, make sure the load balancer's idle timeout setting is long enough to receive responses from MarkLogic Server.
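        If your load balancer is an AWS Application Load Balancer, its idle timeout is the idle_timeout.timeout_seconds attribute (1 to 4000 seconds, default 60). The Python sketch below assumes a boto3 "elbv2" client and a load balancer ARN that you supply; the helper name is hypothetical. Classic Load Balancers use a different attribute structure, and the value you choose should exceed your longest expected MarkLogic request.

```python
ALB_IDLE_TIMEOUT_RANGE = range(1, 4001)  # ALB accepts 1-4000 seconds (default 60)


def set_alb_idle_timeout(elbv2_client, load_balancer_arn, seconds):
    """Raise an Application Load Balancer's idle timeout.

    elbv2_client is expected to be a boto3 client, e.g. boto3.client("elbv2");
    it is passed in so the helper can be exercised without AWS access.
    """
    if seconds not in ALB_IDLE_TIMEOUT_RANGE:
        raise ValueError("ALB idle timeout must be between 1 and 4000 seconds")
    return elbv2_client.modify_load_balancer_attributes(
        LoadBalancerArn=load_balancer_arn,
        # Attribute values are passed as strings.
        Attributes=[{"Key": "idle_timeout.timeout_seconds", "Value": str(seconds)}],
    )
```

        Usage, with a placeholder ARN: set_alb_idle_timeout(boto3.client("elbv2"), "arn:aws:elasticloadbalancing:...", 300).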


        • The error SVC-AWSCRED indicates that either no AWS security credentials are configured or there are issues recognizing the IAM role
        • If you are using a non-managed instance built from a custom AMI:
          • Add the following to your /etc/marklogic.conf file (create the file if it is not present)