Knowledgebase

Introduction

This article discusses the "Stand <path> has <n> fragments" messages that may appear in the error log or system log files. These messages can appear at different log levels (Notice, Warning, Error, Critical, Alert, and Emergency); the severity increases as the number of fragments in a single stand grows, indicating increasing risk.

Fragment counts and their corresponding Log levels:

 In MarkLogic 8 and MarkLogic 9, the fragment count thresholds within a single stand for the log levels are:  

  • At around 84 million fragments, MarkLogic Server will report this with a Notice level log message
  • At around 109 million fragments, MarkLogic Server will report this with a Warning level log message
  • At around 134 million fragments, MarkLogic Server will report this with an Error level log message
  • At around 159 million fragments, MarkLogic Server will report this with a Critical level log message
  • At around 184 million fragments, MarkLogic Server will report this with an Alert level log message
  • At around 209 million fragments, MarkLogic Server will report this with an Emergency level log message

At 256 million fragments, your data may be at risk of becoming corrupted due to integer overflow. The log level reflects the risk and is intended to get your attention at higher stand fragment counts.

Emergency level log entries

Consider an example Error Log entry where the following information is observed:

2015-06-20 10:13:39.746 Emergency: Stand /space/Data/Forests/App-Services/00000fae has 213404541 fragments.

At all levels, the messages should be monitored and managed, but at the Emergency level, you will need to take corrective action soon.  

Corrective Actions

Note that it is the number of fragments in a stand that is important, not the number of fragments in a forest. The actions that you take should decrease the size of stands in a forest.

Some of the actions you can take:

  • If not already configured, MarkLogic databases should be configured with a merge-max-size value smaller than the current forest size (Databases created in MarkLogic 7 or MarkLogic 8 have a default value of 32GB).
  • If merge-max-size is already configured for the database, decrease the value of this setting (see the sketch after this list).
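
As a minimal sketch of that second option (assuming a database named "Documents" and a new limit of 16 GB, both of which you should adapt to your environment), the merge-max-size setting can be lowered through the Admin API:

xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db-id := admin:database-get-id($config, "Documents")
return
  (: merge-max-size is expressed in MB; 16384 MB = 16 GB :)
  admin:save-configuration(
    admin:database-set-merge-max-size($config, $db-id, 16384)
  )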

Summary

Occasionally, you might see an "Invalid Database Online Event" error in your MarkLogic Server Error Log. This article will help explain what this error means, as well as provide some ways to resolve it.

What the Error Means

The XDMP-INVDATABASEONLINEEVENT error means that something went wrong during the database online trigger event. There are many situations that can trigger this event, such as a server restart, or when any of the databases has a configuration change. In most cases, this error is harmless - it is just giving you information.

Resolving the Error

We often see this error when the user id that is baked into the database online event created by CPF is no longer valid, and the net effect is that CPF's restart handling is not functioning. We believe reinstalling CPF should fix this issue.

If re-installing CPF does not resolve this error, you will want to further analyze and debug the code that is invoked by the restart trigger.


Details:

Upon boot of CentOS 6.3, MarkLogic users may encounter the following warning:

WARNING: at fs/hugetlbfs/inode.c:951 hugetlb_file_setup+0x227/0x250() (Not tainted)

MarkLogic 6.0 and earlier have not been certified to run on CentOS 6.3. This message appears because MarkLogic uses a resource that has been deprecated in CentOS 6.3. The message can be ignored, as it will not cause any issues with MarkLogic performance. Although this example specifically calls out CentOS 6.3, this message could potentially occur in other MarkLogic/Linux combinations.

Introduction

Some customers have reported seeing kernel level messages like this in their /var/log/messages file:

Jan 31 17:41:46 ml-c1-u3 kernel: [17467686.201893] TCP: Possible SYN flooding on port 7999. Sending cookie

This may also be seen as part of the output from a call to dmesg and could possibly follow a stack trace, for example:

[<ffffffff810d3d27>] ? audit_syscall_entry+0x1d7/0x200 
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
possible SYN flooding on port 7999. Sending cookies.
possible SYN flooding on port 7999. Sending cookies.

What does it mean?

The tcp_syncookies configuration is likely enabled on your system.  You can check for this by viewing the contents of /proc/sys/net/ipv4/tcp_syncookies

$ cat /proc/sys/net/ipv4/tcp_syncookies
1

If the value returned is 1 (as per the example above), then tcp_syncookies are enabled for this host.

Possible SYN flooding

A SYN flood is a form of denial-of-service attack in which an attacker sends a succession of SYN requests to a target's system in an attempt to consume enough server resources to make the system unresponsive to legitimate traffic.

Source: Wikipedia https://en.wikipedia.org/wiki/SYN_flood

You would expect to see evidence of a SYN flood when a "flood" of TCP SYN messages is sent to the host. Under normal operation, the kernel acknowledges each incoming SYN with a SYN-ACK, which the client then answers with an ACK; during a flood, those final ACK messages from the client never arrive. This process (or pattern) is known as the three-way handshake, and its goal is to firmly establish communication on both the server and the client.

In the event of a real attack, a SYN flood will most likely originate from a fake IP address; during an attack, the client performing the "flood" is not waiting for the SYN-ACK response back from the server it is attacking.

Under normal operation (i.e., without SYN cookies), TCP connections are kept half-open after the first SYN is received, because of the handshake mechanism used to establish TCP connections. Since there is a limit to how many half-open connections the kernel can maintain at any given time, exhausting that limit is what characterizes the problem as an attack.

The term half-open refers to TCP connections whose state is out of synchronization between the two communicating hosts, possibly due to a crash of one side. A connection which is in the process of being established is also known as embryonic connection.

Source: Wikipedia https://en.wikipedia.org/wiki/TCP_half-open

If SYN cookies are enabled, the kernel doesn't track half-open connections. Instead, it relies on the sequence number in the subsequent ACK datagram to verify that the ACK follows a legitimate SYN and SYN-ACK, which establishes full communication between client and server. Because half-open connections no longer need to be tracked, SYN floods cease to be a problem.

In the case of MarkLogic, this message can appear if the rate of incoming messages is perceived by the kernel as being unusually high. In this case, it is not indicative of a real SYN flooding attack, but to the TCP/IP stack the traffic exhibits the same characteristics, so the kernel responds by reporting a possible (false) attack.

Notes from the kernel documentation

See the section of the kernel documentation for tcp_syncookies - BOOLEAN for some further information regarding this feature:

The syncookies feature attempts to protect a socket from a SYN flood attack. This should be used as a last resort, if at all. This is a violation of the TCP protocol, and conflicts with other areas of TCP such as TCP extensions. It can cause problems for clients and relays. It is not recommended as a tuning mechanism for heavily loaded servers to help with overloaded or misconfigured conditions. For recommended alternatives see tcp_max_syn_backlog, tcp_synack_retries, and tcp_abort_on_overflow.

Further down, they state:

Note, that syncookies is fallback facility. It MUST NOT be used to help highly loaded servers to stand against legal connection rate. If you see SYN flood warnings in your logs, but investigation shows that they occur because of overload with legal connections, you should tune another parameters until this warning disappear. See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.

Source: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

Tuning on a MarkLogic Server

Any dmesg output indicating "possible SYN flooding on port 7999" may appear in tandem with very heavy XDQP (TCP) traffic within a MarkLogic cluster - this link provides further detail in relation to a similar scenario with Apache HTTP server. You can tune your TCP settings to try to avoid SYN Flooding error messages, but SYN flooding can also be a symptom of a system under resource pressure. 

If a MarkLogic Server instance sees SYN flooding messages on a system that is otherwise healthy, and the messages occur because of normal and expected MarkLogic Server communications, you may want to increase the backlog (tcp_max_syn_backlog) or adjust some of the other settings (such as tcp_synack_retries or tcp_abort_on_overflow). However, if SYN flooding messages only occur on a system that is under resource pressure, then solving the resource issue should be the focus.

How to disable SYN cookies

You can disable syncookies by adding the following line to /etc/sysctl.conf:

# disable TCP SYN Flood Protection
net.ipv4.tcp_syncookies = 0

Also note that the new setting will only take effect after a host reboot (or after reloading the settings with sysctl -p).

Further reading

Introduction

After upgrading to MarkLogic 10.x from any previous version of MarkLogic, examples of the following Warning and Notice level messages may be observed in the ErrorLog:

Warning: Lexicon '/var/opt/MarkLogic/Forests/Documents/00000006/c4ea1b602ee84a34+Lexicon' collation='http://marklogic.com/collation/zh-Hant' out of order


Notice: Repairing out of order lexicon /var/opt/MarkLogic/Forests/Documents/00000006/c4ea1b602ee84a34+Lexicon collation 'http://marklogic.com/collation/zh-Hant' version 0 to 602

Warning: String range index /space/Forests/Documents/0006ef0e/c0dc932d1b4bcaae-37c6e3905909f64e+string collation 'http://marklogic.com/collation/' out of order. 

Notice: Repairing out of order string range index /space/Forests/Documents/0006ef0e/c0dc932d1b4bcaae-37c6e3905909f64e+string collation 'http://marklogic.com/collation/' version 0 to 602

Starting with MarkLogic 10.0, the server now automatically checks for any lexicons or string range indexes that may be in need of repair.  Lexicons and range indexes perform "self-healing" in non-read-only stands whenever a lexicon/range index is opened within the stand.

Reason

This is due to changes introduced to the behavior of MarkLogic's root collation.

Starting with MarkLogic 10.0, the root collation has been modified, along with all collations that derive from it, which means there may be some subtle differences in search ordering.

For more information on the specifics of these changes, please refer to http://www.unicode.org/Public/UCA/6.0.0/CollationAuxiliary.html

This helps the server to support newer collation features, such as reordering entire blocks of script characters (for example: Latin, Greek, and others) with respect to each other. 

Implementing these changes has, under some circumstances, improved the performance of wildcard matching by more effectively limiting the character ranges that search scans (and returns) for wildcard-based matching.

Based on our testing, we believe this new ordering yields better performance in a number of circumstances, although it does create the need to perform full reindexing of any lexicon or string range index using the root collation.

MarkLogic Server will now check lexicons and string range indexes and will try to repair them where necessary.  During the evaluation, MarkLogic Server will skip making further changes if any of the following conditions apply:

(a) They are already ordered according to the latest specification provided by ICU (1.8 at the time of writing)

(b) MarkLogic Server has already checked the stand and associated lexicons and indexes

(c) The indexes use codepoint collation (in which case, MarkLogic Server will be unable to change the ordering).

Whenever MarkLogic performs any repairs, it will always log a message at Notice level to inform users of the changes made.  If for any reason, MarkLogic Server is unable to make changes (e.g. a forest is mounted as read-only), MarkLogic will skip the repair process and nothing will be logged.

As these changes have been introduced from MarkLogic 10 onwards, you will most likely observe these messages in cases where recent upgrades (from prior releases of the product) have just taken place.

Repairs are performed on a stand by stand basis, so if a stand does not contain any values that require ordering changes, you will not see any messages logged for that stand.

Also, if any ordering issues are encountered during the process of a merge of multiple stands, there will only be one message logged for the merge, not one for each individual stand involved in that merge.

Summary

  • Repairs will take place for any stand found to have a lexicon or string range index whose collation ordering is out of order and out of date (e.g. utilising a collation described by an earlier version of ICU), unless that stand is mounted as read-only.
  • Any repair will generate Notice messages when maintenance takes place.
  • This check/repair takes place whenever a lexicon or string range index is opened: for any string range index, lexicon call (e.g. cts:values), or range query (e.g. cts:element-range-query), and during merges.
  • The check looks for ICU version mismatches as well as items that are out of order; for any lexicon or string range index that uses an older ordering but requires no further changes, no further action will be taken for that stand.

Known side effects

If the string range index or lexicon is very large, repairing can cause some performance overhead and may impact search performance during the repair process.

Solution

These messages can be avoided by issuing a full reindex of your databases immediately after performing your upgrade to MarkLogic 10.
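
One way to script such a reindex (a minimal sketch, assuming a database named "Documents" and that the reindexer is enabled for that database) is to advance the database's reindexer timestamp so that all existing fragments become candidates for reindexing; a full reindex can also be started from the Admin UI database configuration page.

xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db-id := admin:database-get-id($config, "Documents")
(: advance the reindexer timestamp to "now" so all older fragments are reindexed :)
let $now := xdmp:wallclock-to-timestamp(fn:current-dateTime())
return
  admin:save-configuration(
    admin:database-set-reindexer-timestamp($config, $db-id, $now)
  )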

Introduction

Forests in MarkLogic Server may be in one of several mount states. On mounting, local disk failover forests or database replication forests should both eventually reach the sync replicating or async replicating state. There are occasions, however, where local disk failover or database replication forests will sometimes get stuck in the wait replication state. This knowledgebase article will itemize many of these wait replication scenarios, as well as the operational tactics to use in response. 

Wait replication scenarios

Wait replication as a result of lack of quorum

A quorum in MarkLogic Server represents more than 50% of the total number of nodes in the cluster. It's very important to note that this is the total number of nodes - regardless of group membership, forest assignment, whether nodes are running or not, etc. - if a machine exists in the hosts.xml configuration file and in the list of hosts in the Admin UI, it contributes to the total count.

While it's possible to run a MarkLogic cluster with only a subset of the configured nodes up, it's not a recommended configuration. In addition, if the number of active nodes in your cluster no longer exceeds 50% of the total, you might run into forests in the wait replication state due to the lack of quorum.
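
A quick sketch for checking the numbers on your own cluster (run from Query Console on any host):

xquery version "1.0-ml";

let $total := fn:count(xdmp:hosts())
return fn:concat(
  "Configured hosts: ", $total,
  "; minimum hosts online for quorum: ", $total idiv 2 + 1
)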

What to do about it? You'll need to alter your cluster's configuration to meet the quorum requirement. That can mean either removing missing nodes from the cluster's configuration (essentially telling the cluster to stop looking for those missing nodes), or alternatively bringing up nodes that are currently part of the configuration, but not actively returning heartbeats (effectively letting the cluster see nodes it expects to be there). 

You can read more about quorum at the following knowledgebase articles:

Wait replication as a result of mixed file permissions

The root MarkLogic process is simply a restarter process which waits for the non-root (daemon) process to exit. If the daemon process exits abnormally, for any reason, the root process will fork and exec another process under the daemon process. The root process runs no XQuery scripts, opens no sockets, and accesses no database files. While it's possible to run the MarkLogic process as a non-root user, be very careful about forest file permissions - if your configured MarkLogic user doesn't have the necessary permissions, you might see wait replication and an inability to correctly failover to local disk failover forests when necessary - in which case you'll need to set your forest file permissions correctly to move forward. You can read more about running the MarkLogic process as a non-root user at:

Wait replication due to upgrading in the wrong order

Per our documentation, when upgrading you must first upgrade your replica environment, then subsequently upgrade your master environment.

If your cluster upgrades aren't done in the correct order, you're going to need to:

  1. Decouple your master and replica clusters, then stop the replica cluster

  2. Edit your replica cluster's databases.xml to remove entries with Security database replication

  3. Start the replica cluster, beginning with the node that hosts the Security forest

  4. Manually run http://(hostname of node hosting the Security forest):8001/security-upgrade-go.xqy?force=true

  5. Re-couple your master and replica clusters

You can read more about upgrading environments using database replication at:

Wait replication because you downgraded

MarkLogic Server does not support downgrades. If you do attempt to downgrade your installation, your replica forests will be stuck in wait replication.

What to do about it? As in the case of upgrading in the wrong order, you'll need to manually run http://(hostname of node hosting the Security forest):8001/security-upgrade-go.xqy?force=true. You can read more about MarkLogic Server and downgrades at:

Wait replication because your master and replica forest names don't match

By default, the "Connect Forests by Name" option is set to true. This means the server has certain expectations around how master and replica forests should be named.

What to do about it? Set "Connect Forests by Name" to false, then manually connect master and replica forests. You can read more about wait replication due to forest name mismatch at:

Wait replication as a result of merge blackouts (completely disabled merges)

What is merging and why do we need merge blackouts?

MarkLogic Server does lazy deletes, which marks documents obsolete (but doesn't actually delete them). Merges are when obsolete documents are actually deleted - in bulk, while also optimizing your data. Merge blackouts prevent this deferred deletion and optimization from happening. Merge blackouts can also sometimes result in wait replication. Consider a database that has both master and local disk failover forests where you have configured a merge blackout with the “disable merges completely” option (instead of “limit merges to” option). If a node failure on any of the nodes holding some of these forests were to occur during the merge blackout period, as soon as the failed node comes back online, all the forests associated with that specific node go into a “wait replication” state until the merge blackout period ends or is manually removed.

Notes:

  • Avoid completely disabling merges
  • If you do need to control merges, it's much better to set the maximum merge size in your blackout to a smaller number (“limit merges to” option)

Introduction

When configuring database replication, it is important to note that the Connect Forests by Name field is true by default. This works great because, when new forests of the same name are later added to the Master and Replica databases, they will be automatically configured for Database Replication.

The issue

The problem arises when you use replica forest names that do not match the original Master forest names. In that case, you may find that failover events cause forests to get stuck in the Wait Replication state. The usual methods of failing back to the designated masters will not work - restarting the replicas will not work, and neither will shutting down cluster/removing labels/restarting cluster.

Resolution

In this case, the way to fix the issue is to set Connect Forests by Name to false, and then you must manually connect the Master forests on the local cluster to the Replica forests on the foreign cluster, as described in the documentation: Connecting Master and Replica Forests with Different Names.

It is worth noting that, starting with MarkLogic 7, you are also allowed to rename the replica forests. Once you rename the replica forests to the same names as the forests of the designated master database (e.g., the Security database should have a Security forest in both the master and replica), they will be automatically configured for Database Replication, as expected.
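
As a hedged sketch (the forest names here are purely illustrative), a replica forest can be renamed with the Admin API:

xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
(: rename the hypothetical replica forest "Security-R" to match the master's "Security" :)
let $forest-id := admin:forest-get-id($config, "Security-R")
return
  admin:save-configuration(
    admin:forest-rename($config, $forest-id, "Security")
  )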

Updates

Wednesday, April 13, 2022: This article was updated with new releases for Data Hub Framework (DHF 5.7.1) and Data Hub Central (Data Hub Central 5.7.1).

Monday, April 04, 2022: This article was updated to account for the new guidance and remediation steps for CVE-2022-22965.

Thursday, March 31, 2022: Original article published.

Subject :

(Spring4Shell) CVE-2022-22965: Spring Framework RCE via Data Binding on JDK 9+

Summary :

On Wednesday, March 30, 2022, reports emerged of a new remote code execution flaw that affects the Spring Framework. This vulnerability, popularly known as "Spring4Shell", is a new, previously unknown security vulnerability.

The CVE designation is CVE-2022-22965, with a CVSS score of 9.8. Spring has acknowledged the vulnerability and released versions 5.3.18 and 5.2.20 to patch the issue, as well as version 2.6.6 of Spring Boot.

MarkLogic is aware of this vulnerability and is in the process of assessing the impact on our products and Client APIs.

Update on Analysis as of 4/22/2022 - 

1.1. MarkLogic Server

MarkLogic Server, whether on-premises or on AWS/Azure, is not vulnerable to CVE-2022-22965.

There is no known impact on the Admin GUI, Query Console, or Monitoring History/Dashboard.

1.2. MarkLogic Java Client

No direct impact: the Java Client API only uses spring-jdbc 5.2.7.

It does not meet the prerequisites for the exploit listed in CVE-2022-22965 (see https://help.marklogic.com/Knowledgebase/Article/View/spring4shell-cve-2022-22965-spring-framework-rce-via-data-binding-on-jdk-9).

These are the prerequisites for the exploit:

  • JDK 9 or higher
  • Apache Tomcat as the Servlet container
  • Packaged as WAR
  • spring-webmvc or spring-webflux dependency

spring-jdbc has a transitive dependency on spring-core and spring-beans (identified as vulnerable). Hence, we have upgraded spring-jdbc to version 5.3.18, which is available in the latest Java Client API 5.5.3 release on DMC and GitHub.

1.3. MarkLogic Data Hub & Hub Central

MarkLogic Data Hub and Data Hub Central are impacted. Data Hub Framework (DHF) 5.7.1 is now available.

1.4. MarkLogic Data Hub Service

  • Hub Central is impacted. The Hub Central component exists only on DHS versions >= 3.0. Customers using Hub Central in DHS who wish to update dependencies or versions once the new version is available should contact MarkLogic Support with a request directed to the attention of the Cloud Services team.
  • mlcmd is not affected.
  • Sumo Logic is not affected. Sumo Logic Support validated that it is not vulnerable to known exploitable CVE-2022-22965 methods. The Sumo Logic collector is also not vulnerable to known Spring Cloud framework exploitation methods. Out of an abundance of caution, Sumo Logic will be updating its Sumo Logic Service; however, no action is required on your part.

1.5. MarkLogic-supported client libraries and tools

1.5.1. Un-Impacted versions  

  1. XCC: No action is needed at this time. All systems have been thoroughly scanned and patched with the recommended fixes wherever needed.
  2. MLCP: No action is needed at this time. All systems have been thoroughly scanned and patched with the recommended fixes wherever needed.
  3. mlcmd: mlcmd uses XMLSH and is not affected by this vulnerability.

1.5.2. Impacted versions

  1. Java Client Util: ml-javaclient-util 4.3.1 is now available on GitHub and Maven Central.
  2. ml-gradle/ml-app-deployer: ml-gradle 4.3.4 is now available on GitHub; ml-app-deployer 4.3.3 is now available on GitHub and Maven Central.
  3. Data Hub Framework: DHF 5.7.1 is now available; download links for GitHub and Maven Central are available.
  4. Data Hub Client Jar: a new Data Hub Client Jar is available; a download link for GitHub is available.
  5. Data Hub Central: Data Hub Central 5.7.1 is now available; download links for GitHub and Maven Central are available.
  6. Data Hub Central Community: DHCCE 5.7.1 is now available on GitHub.
  7. Apache Spark Connector: Spark connector 1.0.1 is now available at https://developer.marklogic.com/products/spark/
  8. AWS Glue Connector: Glue connector 1.0.1 is now available at https://aws.amazon.com/marketplace/pp/prodview-ws7nrqwwj3qec (documentation: https://docs.marklogic.com/cloudservices/aws/release-notes/release-notes-aws-dhs-tools.html)
  9. Pega Connector: Upgrade to ml-gradle 4.3.4.

1.6. MarkLogic Open Source and Community-owned projects

1.6.1. Un-Impacted versions

  1. MuleSoft Connector: MuleSoft applications do not run in Tomcat containers or get packaged as WARs, so the affected Spring versions are not vulnerable. The current MuleSoft Connector does not fall into the prerequisites, even though it does have a dependency on ml-javaclient-util, which includes affected Spring Framework libraries. Nevertheless, the ml-javaclient-util Spring dependencies should be updated.
  2. ml-javaclient-util: The dependencies for 4.2.0 and the latest 4.3.0 include affected Spring versions, but the library should be safe as-is unless it is bundled into a Tomcat/Spring app. Nevertheless, the ml-javaclient-util Spring dependencies should be updated.

1.6.2. Impacted versions

Details will be updated here if any are identified.


1.7. Contact and Links

MarkLogic is dedicated to supporting our customers, partners, and developer community to ensure their safety. If you have a registered support account, feel free to contact support@marklogic.com with any additional questions.

Introduction

This article will show you how to add a Fast Data Directory (FDD) to an existing forest.

Details

The fast data directory stores transaction journals and as many stands as will fit. As the fast data directory approaches its limit, larger stands are merged into the data directory, and new stands are created in the data directory instead of the fast data directory.

Although it is not possible to add an FDD path to a currently-existing forest, it is possible to do the following:

1. Destroy an existing forest configuration (while preserving the data)

2. Recreate a forest with the same name and data, with an FDD added

 

The queries below illustrate steps one and two of the process. Note that you can also do this through the Admin UI.

The query below will delete the forest configuration but not the data.

Preparation:

1. Schedule a downtime window for this procedure (DO NOT DO THIS ON A LIVE PRODUCTION SYSTEM)

2. Ensure that all ingestion and merging has stopped

3. To be on the safe side, take a backup of the forest before applying this procedure in production

4. Detach the forest before running these queries


1) Use the following API to delete an existing forest configuration

NOTE: make sure to set the $delete-data parameter to false().

admin:forest-delete(
  $config as element(configuration),
  $forest-ids as xs:unsignedLong*,
  $delete-data as xs:boolean
) as element(configuration)


2) Use the following API to create a new forest pointing to the old data directory, along with the newly configured FDD:

admin:forest-create(
  $config as element(configuration),
  $forest-name as xs:string,
  $host-id as xs:unsignedLong,
  $data-directory as xs:string?,
  [$large-data-directory as xs:string?],
  [$fast-data-directory as xs:string?]
) as element(configuration)



Here's an example query that uses these APIs:

xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()

(: preserve some path values from the old forest :)
let $forest-name := "YOUR_FOREST_NAME"
let $new-fast-data := "YOUR_NEW_FAST_DATA_DIR"
let $forest-id := admin:forest-get-id($config, $forest-name)
let $old-data := admin:forest-get-data-directory($config, $forest-id)
let $old-large-data := admin:forest-get-large-data-directory($config, $forest-id)

(: delete the old forest configuration (but not its data) :)
let $config := admin:forest-delete($config, $forest-id, fn:false())

(: recreate the forest with the same name and data directories, plus the new FDD :)
let $config := admin:forest-create(
  $config,
  $forest-name,
  xdmp:host(),
  $old-data,
  $old-large-data,
  $new-fast-data
)

(: apply both changes with a single configuration save :)
return admin:save-configuration($config)

You can also create and attach the forest in a single transaction. The same procedure is possible through the Admin UI (as two separate steps), i.e. deleting only the forest configuration without deleting its data.

After attaching the forest, reindex the database; the data will then migrate to the fast data directory. Note that the sample query needs to be executed on the host where the forest resides.

Introduction

MarkLogic has shipped with a REST API since MarkLogic 7.

In MarkLogic 8 the REST API was vastly expanded, allowing ways for MarkLogic Database administrators to manage almost all common MarkLogic administration tasks over an HTTP connection to MarkLogic's REST endpoints.

This Knowledgebase article will cover some examples of common administration tasks and will show some working examples to give you a taste of what can be done if you're using the latest version of MarkLogic Server.

While there are a significant number of examples throughout our extensive documentation in this area, many of these make use of cURL. In this Knowledgebase article, we're going to use XQuery calls to demonstrate how the payloads are structured.

Creating a backup using a call to the REST API (XQuery)

In the example code below, we demonstrate a call that will perform a backup of the Documents database, placing the backup in the /tmp directory.

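A minimal sketch of such a call is shown below; the hostname, port and credentials are placeholder assumptions that you will need to adapt:

xquery version "1.0-ml";

let $payload := object-node {
  "operation" : "backup-database",
  "backup-dir" : "/tmp"
}
return
  xdmp:http-post(
    "http://localhost:8002/manage/v2/databases/Documents?format=json",
    <options xmlns="xdmp:http">
      <authentication method="digest">
        <username>admin</username>
        <password>admin</password>
      </authentication>
      <headers>
        <content-type>application/json</content-type>
      </headers>
    </options>,
    $payload
  )
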
Running the query in the above code example will return a response (in JSON format) containing a job ID for the requested task:

{
"job-id": "4903378997555340415", 
"host-name": "yourhostnamehere"
}

The next example will demonstrate a status check for a given job ID.

Query the status of an active or recent job

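A sketch of the status check, using placeholder credentials and the job ID returned above:

xquery version "1.0-ml";

let $payload := object-node {
  "operation" : "backup-status",
  "job-id" : "4903378997555340415",
  "host-name" : "yourhostnamehere"
}
return
  xdmp:http-post(
    "http://localhost:8002/manage/v2/databases/Documents?format=json",
    <options xmlns="xdmp:http">
      <authentication method="digest">
        <username>admin</username>
        <password>admin</password>
      </authentication>
      <headers>
        <content-type>application/json</content-type>
      </headers>
    </options>,
    $payload
  )
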
The above query will return a response that would look like this:

{
"job-id": "4903378997555340415", 
"host-name": "yourhostnamehere", 
"status": "completed"
}

Further reading on the MarkLogic REST API:

Alternatives to Configuration Manager

Overview

The MarkLogic Server Configuration Manager provided a read-only user interface to the MarkLogic Admin UI and could be used for saving and restoring configuration settings. The Configuration Manager tool was deprecated starting with MarkLogic 9.0-5, and is no longer available in MarkLogic 10.

Alternatives

There are a number of alternatives to the Configuration Manager. Most of the options take advantage of the MarkLogic Admin API, either directly or behind the scenes. The following is a list of the most commonly used options:

  • Manual Configuration
  • ml-gradle
  • Configuration Management API

Manual Configuration

For a single environment, the following Knowledgebase article covers the process of Transporting Resources to a New Cluster.

ml-gradle

For a repeatable process, the most widely used approach is ml-gradle.

A project would be created in Gradle, with the desired configurations. The project can then be used to deploy to any environment - test, prod, qa etc - creating a known configuration that can be maintained under source control, which is a best practice.

Similar to Configuration Manager, ml-gradle also allows for exporting the configuration of an existing cluster. You can refer to transporting configuration using ml-gradle for more details.

While ml-gradle is an open source community project that is not directly supported, it enjoys very good community and developer support.  The underlying APIs that ml-gradle uses are fully supported by MarkLogic.

Configuration Management API

An additional option is to use the Configuration Management API directly to export and import resources.
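
As a brief, hedged sketch, the current resource configuration can be exported with a GET request to the /manage/v3 endpoint (the hostname, port and credentials below are placeholder assumptions):

xquery version "1.0-ml";

(: export the current resource configuration as JSON via the Configuration Management API :)
xdmp:http-get(
  "http://localhost:8002/manage/v3?format=json",
  <options xmlns="xdmp:http">
    <authentication method="digest">
      <username>admin</username>
      <password>admin</password>
    </authentication>
  </options>
)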

Summary

Both ml-gradle and the Configuration Management API use the MarkLogic Admin API behind the scenes but, for most use cases, our recommendation is to use ml-gradle rather than writing the same functionality from scratch.

Alternatives to Ops Director

Overview

The MarkLogic Ops Director provided a basic dashboard for monitoring the health of one or more MarkLogic Server clusters, and for sending out basic alerts based on predefined conditions. It has been deprecated starting with MarkLogic 10.0-5, and will no longer be supported as of November 14, 2021. Our experience has shown that our customers are most effective monitoring MarkLogic Server by integrating commercial off-the-shelf monitoring tools with our Management APIs.

Monitoring DHS

Note: Our Data Hub Service customers are not impacted by this announcement. One of the benefits of using our Data Hub Service is that the MarkLogic Corporation will manage and monitor the underlying MarkLogic Server instances for you.

Alternatives

There are a number of different alternatives to Ops Director, depending on your requirements, and existing monitoring infrastructure. Ops Director used the Management API to obtain the required information, specifically the /manage/v2/logs endpoint to read the logs and look for any "Critical" or "Error" messages using a Regular Expression (Regex). These endpoints are still available, and could be leveraged by administrators with shell or python scripts, which could also include alerting.
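
For example, a monitoring script could poll the /manage/v2/logs endpoint and filter entries with a regular expression; the sketch below uses placeholder credentials and assumes the default management port 8002:

xquery version "1.0-ml";

(: fetch ErrorLog entries that match "Error" or "Critical" :)
xdmp:http-get(
  "http://localhost:8002/manage/v2/logs?filename=ErrorLog.txt&amp;regex=(Error|Critical)&amp;format=json",
  <options xmlns="xdmp:http">
    <authentication method="digest">
      <username>admin</username>
      <password>admin</password>
    </authentication>
  </options>
)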

Detecting and Reporting Failover Events

If there is also a requirement to monitor at the Host or Database level there are REST API endpoints available for any scripted solution. Performance related data stored in the Meters database can also be accessed via REST.

The MarkLogic Monitoring History can also be extended to provide some basic visualizations.

Hacking Monitoring History

Commercial Alternatives

If your requirements are more complex than can be easily met by the options above, there are many commercial monitoring solutions that can be used to monitor MarkLogic. These include Elastic/Kibana, Splunk, DataDog and NewRelic, among others. Many organizations are already using enterprise monitoring applications provided by a commercial vendor. Leveraging the existing solutions will typically be the simplest option. If a monitoring solution already exists within your organization, you can check to see if there is an existing plugin, extension or library for monitoring MarkLogic.

If a plugin, extension or library does not already exist, most monitoring solutions also allow for retrieving data from REST endpoints, allowing them to pull metrics directly from MarkLogic even if there is not a pre-existing solution.

Available Plugins - Extensions - Libraries

Here are a sample of the available options that are being used by other customers to monitor their MarkLogic infrastructure. These options are listed here for reference only. MarkLogic Support does not provide support for any issues encountered using the products mentioned here. Please refer to the solution vendor, or the GitHub project page, for any issues encountered.

Splunk

MarkLogic Monitoring for Splunk provides configurations and pre-built dashboards that deliver real-time visibility into Error, Access, and Audit log events to monitor and analyze MarkLogic logs with Splunk.

DataDog

Monitor MarkLogic with Datadog

AppDynamics

https://github.com/Appdynamics/marklogic-monitoring-extension

Elastic/Kibana

Elastic/Kibana

New Relic

MarkLogic New Relic Plugin

Note: Currently there is a published New Relic Plugin that works with the latest versions of MarkLogic. However, New Relic has decided to deprecate plugins in favor of New Relic Integrations. Currently New Relic has limited plugin access to accounts that have accessed plugins in the last 30 days, but they plan to discontinue this access in June 2021.

Other Resources

Summary

On Internet Explorer 9 and Internet Explorer 10, application services UI should be run in Compatibility Mode.

Details:

When using the Application Services UI in Internet Explorer 9 or Internet Explorer 10, you may notice some minor UI bugs. These minor UI bugs occur only within MarkLogic Application Services, NOT within applications built with it. These UI bugs can be avoided if you run IE 9 or IE 10 in Compatibility View.

Instructions on how to configure compatibility modes in IE 9 or IE 10: 

1. Press ALT-T to bring up the Tools menu
2. On the Tools menu, click 'Compatibility View Settings' 
3. Add the domain to the list of domains to render in compatibility view.

Introduction

A question that customers frequently ask is for advice on managing backups outside the standard XQuery APIs or the web interface provided by MarkLogic.

This Knowledgebase article demonstrates two approaches to allow you to integrate the backup of a MarkLogic database into your dev-ops workflow by allowing such processes to be scripted or managed outside the product.

Creating a backup using the ReST API

You can use the ReST API to perform a database backup and to check on the status at any given time.

The examples listed below use XQuery to make the calls to the ReST API over http but you could similarly adapt the below examples to work with cURL - examples will also be given for this approach.

The process

Here is an example that demonstrates a backup of the Documents database:

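The XQuery sketch below mirrors the cURL call shown later in this article, enabling journal archiving and including replicas; the credentials and backup directory are placeholder assumptions:

xquery version "1.0-ml";

let $payload := object-node {
  "operation" : "backup-database",
  "backup-dir" : "/tmp/backup",
  "journal-archiving" : fn:true(),
  "include-replicas" : fn:true()
}
return
  xdmp:http-post(
    "http://localhost:8002/manage/v2/databases/Documents?format=json",
    <options xmlns="xdmp:http">
      <authentication method="digest">
        <username>admin</username>
        <password>admin</password>
      </authentication>
      <headers>
        <content-type>application/json</content-type>
      </headers>
    </options>,
    $payload
  )
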
Running this should give you a job id as part of the response (in this example, we're using JSON to format the response but this can easily be changed by modifying the headers elements in the above sample to return application/xml instead):

{"job-id":"8774639830166037592", "host-name":"yourhostnamehere"}

Below is an example that demonstrates checking for the status of a given backup with the job-id given in the first step:

Example: using cURL (instead of XQuery)

Adapting the above examples so they work from cURL instead, you can generate a call that looks like this:

curl -s -X POST  --anyauth -u username:password --header "Content-Type:application/json" -d '{"operation": "backup-database", "backup-dir": "/tmp/backup", "journal-archiving": true, "include-replicas": true}'  http://localhost:8002/manage/v2/databases/Documents\?format\=json

And to check on the status, the cURL payload could be modified to look like this:

{"operation": "backup-status", "job-id" : "8774639830166037592","host-name": "yourhostnamehere"}

Further reading

Summary

Customers using the MarkLogic AWS Cloud Formation Templates may encounter a situation where someone has deleted an EBS volume that stored MarkLogic data (mounted at /var/opt/MarkLogic). Because the volume and the associated data are no longer available, the host is unable to rejoin the cluster.

Getting the host to rejoin the cluster can be complicated, but it will typically be worth the effort if you are running an HA configuration with Primary and Replica forests.

This article details the procedures to get the host to rejoin the cluster.

Preparing the New Volume and New Host

The easiest way to create the new volume is using a snapshot of an existing host's MarkLogic data volume.  This saves the work of manually copying configuration files between hosts, which is necessary to get the host to rejoin the cluster.

In the AWS EC2 Dashboard:Elastic Block Store:Volumes section, create a snapshot of the data volume from one of the operational hosts.

Next, in the AWS EC2 Dashboard:Elastic Block Store:Snapshots section, create a new volume from the snapshot in the correct zone and note the new volume id for use later.

(optional) Update the name of the new volume to match the format of the other data volumes

(optional) Delete the snapshot

Edit the Auto Scaling Group with the missing host to bring up a new instance, by increasing the Desired Capacity by 1

This will trigger the Auto Scaling Group to bring up a new instance. 

Attaching the New Volume to the New Instance

Once the instance is online and startup is complete, connect to the new instance via ssh.

Ensure MarkLogic is not running, by stopping the service and checking for any remaining processes.

  • sudo service MarkLogic stop
  • pgrep -la MarkLogic

Remove /var/opt/MarkLogic if it exists, and is mounted on the root partition.

  • sudo rm -rf /var/opt/MarkLogic

Edit /var/local/mlcmd and update the volume id listed in the MARKLOGIC_EBS_VOLUME variable to the volume created above.

  • MARKLOGIC_EBS_VOLUME="[new volume id],:25::gp2::,*"

Run mlcmd to attach and mount the new volume to /var/opt/MarkLogic on the instance

  • sudo /opt/MarkLogic/mlcmd/bin/mlcmd init-volumes-from-system
  • Check that the volume has been correctly attached and mounted

Remove contents of /var/opt/MarkLogic/Forests (if they exist)

  • sudo rm -rf /var/opt/MarkLogic/Forests/*

Run mlcmd to sync the new volume information to the DynamoDB table

  • sudo /opt/MarkLogic/mlcmd/bin/mlcmd sync-volumes-to-mdb

Configuring MarkLogic With Empty /var/opt/MarkLogic

If you did not create your volume from a snapshot as detailed above, complete the following steps.  If you created your volume from a snapshot, then skip these steps, and continue with Configuring MarkLogic and Rejoining Existing Cluster

  • Start the MarkLogic service, wait for it to complete its initialization, then stop the MarkLogic service:
    • sudo service MarkLogic start
    • sudo service MarkLogic stop
  • Move the configuration files out of /var/opt/MarkLogic/
    • sudo mv /var/opt/MarkLogic/*.xml /secure/place (using default settings; destination can be adjusted)
  • Copy the configuration files from one of the working instances to the new instance
    • Configuration files are stored here: /var/opt/MarkLogic/*.xml
    • Place a copy of the xml files on the new instance under /var/opt/MarkLogic

Configuring MarkLogic and Rejoining Existing Cluster

Note the host-id of the missing host found in /var/opt/MarkLogic/hosts.xml

  • For example, if the missing host is ip-10-0-64-14.ec2.internal
    • sudo grep "ip-10-0-64-14.ec2.internal" -B1 /var/opt/MarkLogic/hosts.xml

  • Edit /var/opt/MarkLogic/server.xml and update the value for host-id to match the value retrieved above

Start MarkLogic and view the ErrorLog for any issues

  • sudo service MarkLogic start; sudo tail -f /var/opt/MarkLogic/Logs/ErrorLog.txt

You should see messages about forests synchronizing (if you have local disk failover enabled, with replicas) and changing states from wait or async replication to sync replication.  Once all the forests are either 'open' or 'sync replicating', then your cluster is fully operational with the correct number of hosts.

At this point you can fail back to the primary forests on the new instances to rebalance the workload for the cluster.

You can also re-enable the 'xdqp ssl enabled' setting, by setting the value to true on the Group Configuration page, if you disabled the setting as part of these procedures.

Update the Userdata In the Auto Scaling Group

To ensure that the correct volume will be attached if the instance is terminated, the Userdata needs to be updated in a Launch Configuration.

Copy the Launch Configuration associated with the missing host.

Edit the details

  • (optional) Update the name of the Launch Configuration
  • Update the User data variable MARKLOGIC_EBS_VOLUME and replace the old volume id with the id for the volume created above.
    • MARKLOGIC_EBS_VOLUME="[new volume id],:25::gp2::,*"
  • Save the new Launch Configuration

Edit the Auto Scaling Group associated with the new node

Change the Launch Configuration to the one that was just created and save the Auto Scaling Group.

Next Steps

Now that normal operations have been restored, it's a good opportunity to ensure you have all the necessary database backups, and that your backup schedule has been reviewed to ensure it meets your requirements.

Introduction

Microsoft Azure Key Vault TLS certificates are being migrated to use the DigiCert Root G2 CA from the existing Baltimore CA.

Impact on MarkLogic Server

As MarkLogic Server does not currently ship with the DigiCert Root G2 CA certificate in the Certificate Authorities store, the following issues can occur if MarkLogic uses a new or migrated Azure Key Vault endpoint.

1. Any call to an Azure Key Vault endpoint service using xdmp:http-* (XQuery) or xdmp.http* (Javascript) will fail with a certificate verification error.

2. If Azure Key Vault is used as an external Key Management Store (KMS), calls to the Azure Key Vault to retrieve the encryption keys will fail, and any encrypted Forests will not be mounted.

Solution

Until MarkLogic Server is updated to include the required DigiCert Root G2 Certificate, you can use the following procedures to address these issues.

1. Download the DigiCert Root G2 CA certificate and import it to the MarkLogic Security Database using the Admin UI. Configure->Security->Certificate Authorities->Import

DigiCert Global Root G2 Download

2. For users who have enabled Encryption at Rest in MarkLogic Server, the following additional step is required.

i. Copy the DigiCert Global Root G2 PEM certificate downloaded above to the MarkLogic Server node.

ii. Append the PEM contents to the following file in the MarkLogic Server installation directory.

./MarkLogic/Config/azurekeyvault-ca.pem

-----BEGIN CERTIFICATE-----
MIIDdzCCAl+gAwIBAgIEAgAAuTANBgkqhkiG9w0BAQUFADBaMQswCQYDVQQGEwJJ
RTESMBAGA1UEChMJQmFsdGltb3JlMRMwEQYDVQQLEwpDeWJlclRydXN0MSIwIAYD
VQQDExlCYWx0aW1vcmUgQ3liZXJUcnVzdCBSb290MB4XDTAwMDUxMjE4NDYwMFoX
DTI1MDUxMjIzNTkwMFowWjELMAkGA1UEBhMCSUUxEjAQBgNVBAoTCUJhbHRpbW9y
ZTETMBEGA1UECxMKQ3liZXJUcnVzdDEiMCAGA1UEAxMZQmFsdGltb3JlIEN5YmVy
VHJ1c3QgUm9vdDCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAKMEuyKr
mD1X6CZymrV51Cni4eiVgLGw41uOKymaZN+hXe2wCQVt2yguzmKiYv60iNoS6zjr
IZ3AQSsBUnuId9Mcj8e6uYi1agnnc+gRQKfRzMpijS3ljwumUNKoUMMo6vWrJYeK
mpYcqWe4PwzV9/lSEy/CG9VwcPCPwBLKBsua4dnKM3p31vjsufFoREJIE9LAwqSu
XmD+tqYF/LTdB1kC1FkYmGP1pWPgkAx9XbIGevOF6uvUA65ehD5f/xXtabz5OTZy
dc93Uk3zyZAsuT3lySNTPx8kmCFcB5kpvcY67Oduhjprl3RjM71oGDHweI12v/ye
jl0qhqdNkNwnGjkCAwEAAaNFMEMwHQYDVR0OBBYEFOWdWTCCR1jMrPoIVDaGezq1
BE3wMBIGA1UdEwEB/wQIMAYBAf8CAQMwDgYDVR0PAQH/BAQDAgEGMA0GCSqGSIb3
DQEBBQUAA4IBAQCFDF2O5G9RaEIFoN27TyclhAO992T9Ldcw46QQF+vaKSm2eT92
9hkTI7gQCvlYpNRhcL0EYWoSihfVCr3FvDB81ukMJY2GQE/szKN+OMY3EU/t3Wgx
jkzSswF07r51XgdIGn9w/xZchMB5hbgF/X++ZRGjD8ACtPhSNzkE1akxehi/oCr0
Epn3o0WC4zxe9Z2etciefC7IpJ5OCBRLbf1wbWsaY71k5h+3zvDyny67G7fyUIhz
ksLi4xaNmjICq44Y3ekQEe5+NauQrz4wlHrQMz2nZQ/1/I6eYs9HRCwBXbsdtTLS
R9I4LtD+gdwyah617jzV/OeBHRnDJELqYzmp
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDjjCCAnagAwIBAgIQAzrx5qcRqaC7KGSxHQn65TANBgkqhkiG9w0BAQsFADBh
MQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGlnaUNlcnQgSW5jMRkwFwYDVQQLExB3
d3cuZGlnaWNlcnQuY29tMSAwHgYDVQQDExdEaWdpQ2VydCBHbG9iYWwgUm9vdCBH
MjAeFw0xMzA4MDExMjAwMDBaFw0zODAxMTUxMjAwMDBaMGExCzAJBgNVBAYTAlVT
MRUwEwYDVQQKEwxEaWdpQ2VydCBJbmMxGTAXBgNVBAsTEHd3dy5kaWdpY2VydC5j
b20xIDAeBgNVBAMTF0RpZ2lDZXJ0IEdsb2JhbCBSb290IEcyMIIBIjANBgkqhkiG
9w0BAQEFAAOCAQ8AMIIBCgKCAQEAuzfNNNx7a8myaJCtSnX/RrohCgiN9RlUyfuI
2/Ou8jqJkTx65qsGGmvPrC3oXgkkRLpimn7Wo6h+4FR1IAWsULecYxpsMNzaHxmx
1x7e/dfgy5SDN67sH0NO3Xss0r0upS/kqbitOtSZpLYl6ZtrAGCSYP9PIUkY92eQ
q2EGnI/yuum06ZIya7XzV+hdG82MHauVBJVJ8zUtluNJbd134/tJS7SsVQepj5Wz
tCO7TG1F8PapspUwtP1MVYwnSlcUfIKdzXOS0xZKBgyMUNGPHgm+F6HmIcr9g+UQ
vIOlCsRnKPZzFBQ9RnbDhxSJITRNrw9FDKZJobq7nMWxM4MphQIDAQABo0IwQDAP
BgNVHRMBAf8EBTADAQH/MA4GA1UdDwEB/wQEAwIBhjAdBgNVHQ4EFgQUTiJUIBiV
5uNu5g/6+rkS7QYXjzkwDQYJKoZIhvcNAQELBQADggEBAGBnKJRvDkhj6zHd6mcY
1Yl9PMWLSn/pvtsrF9+wX3N3KjITOYFnQoQj8kVnNeyIv/iPsGEMNKSuIEyExtv4
NeF22d+mQrvHRAiGfzZ0JFrabA0UWTW98kndth/Jsw1HKj2ZL7tcu7XUIOGZX1NG
Fdtom/DzMNU+MeKNhJ7jitralj41E6Vf8PlwUHBHQRFXGU7Aj64GxJUTFy8bJZ91
8rGOmaFvE7FBcf6IKshPECBV1/MUReXgRPTqh5Uykw7+U0b6LJ3/iyK5S9kJRaTe
pLiaWN0bfVKfjllDiIGknibVb63dDcY3fe0Dkhvld1927jyNxF1WW6LZZm6zNTfl
MrY=
-----END CERTIFICATE-----

Note: This file will need to be updated on each node in a MarkLogic Server Cluster

References

https://learn.microsoft.com/en-us/azure/security/fundamentals/tls-certificate-changes

https://docs.marklogic.com/pki:insert-trusted-certificates

Backup/Restore settings for Local Disk Failover

When configuring backups for a database, the 'include replica forests' setting is important in order to handle forest failover events. When 'include replica forests' is set to 'true', both the master and the replica forests will be included in the database backup.

This KB article will go over an example failover scenario, and will show how a scheduled backup/restore works with different 'include replica forests' and 'journal archiving' settings.

Scenario

Consider a 3-node cluster with hosts Host-A, Host-B and Host-C, and a database 'backup-test' with the following forest assignments (forests ending with 'p' are primary and those ending with 'r' are replica). Under normal conditions, the primary forests will be in the 'open' state, and the replica forests will be in the 'sync replicating' state.

Host A: forest-1p (open), forest-3r (sync replicating)
Host B: forest-2p (open), forest-1r (sync replicating)
Host C: forest-3p (open), forest-2r (sync replicating)


Failover and Forest states

Now consider what happens when Host-A goes offline. When Host-A's primary forests complete failover, their replica forests take over. The following will be the forest state layout when this happens:

Host A: forest-1p (disabled), forest-3r (disabled)
Host B: forest-2p (open), forest-1r (open)
Host C: forest-3p (open), forest-2r (sync replicating)
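
You can confirm the current state of each forest with a quick query; this is a minimal sketch using xdmp:forest-status with the forest names from this example:

xquery version "1.0-ml";

for $name in ("forest-1p", "forest-1r", "forest-2p", "forest-2r", "forest-3p", "forest-3r")
return fn:concat($name, ": ", xdmp:forest-status(xdmp:forest($name))/*:state)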

Backup Examples: 

When 'Include replica Forests' is false and 'Journal Archiving' is true

Forest forest-1p is disabled, and the corresponding replica forest-1r is now open because of the failover. In this case, a backup task will not succeed during this time because replica forests have not been configured for backups. The following 'Warning' level message will be logged:

Warning: Not backing up database backup-test because first forest master forest-1p is not available, and replica backups aren't enabled

When Host-A is brought up again, the forest states will be:

forest-1p - sync replicating
forest-1r - open

At this time, backups will succeed and because journal archiving is enabled, journals will be written to the backup data.

However, you will not be able to do a "point in time restore" using journal archiving. When the configured master is not the acting master and backup is not enabled for replicas, the following error occurs when a restore to a point in time is attempted:

Operation failed with error message: xdmp:database-restore((xs:unsignedLong("5138652658926200166"), "/space/20160927-1125008228810", xs:dateTime("2016-09-27T11:06:21-07:00"), fn:true(), ()) -- Unable to restore replica forest forest-1r because the master forest forest-1p is not also restored, or is not acting master. Check server logs.

To get past this, the forests need to be failed back in order to make the 'configured master' same as the 'acting master'

When 'Include replica forests' is true and 'Journal Archiving' is true

In this case, backups will succeed when forests are failed over to their replica forests because replica forests are configured for backups. And, because journal archiving is enabled, journals will be also written to the backup data.

Even in this case, as in the previous case, point-in-time restore will not work until the forests are failed back.

Related documentation

MarkLogic Administrator's Guide: Backing up and Restoring a Database Following Local Disk Failover 

MarkLogic Administrator's Guide: Restoring Databases with Journal Archiving

MarkLogic Knowledgebase Article: Understanding the role of journals in relation to backup and restore journal archiving

MarkLogic Knowledgebase Article: Database backup / restore and local disk failover

Before executing significant operational procedures on production systems, such as

  • Production Go Live events;
  • Major version Upgrades;
  • Adding/removing nodes to a cluster;
  • Deploying a new application or an application upgrade;
  • ...

MarkLogic recommends:

  • Thorough testing of any operational procedures on non-production systems.
  • Opening a ticket with MarkLogic Technical Support to give them a heads up, along with any useful collateral that would help expedite diagnostics of issues if any occur, such as
    • The finalized plan & timeline or schedule of the operational procedure
    • support dump, taken before the operational procedure, in order to record the configuration of the system ahead of time; This may come in handy if an incident occurs as we may want to know the actual changes that had been made. You can create a MarkLogic Server support dump from our Admin UI by selecting the 'Support' tab; select scope=cluster, detail=status only, destination=browser -> save output to disk. Attach the support dump to the ticket as a file either as an email attachment or uploading through our support portal. 
    • A few days of error logs from before the operational event so that we can determine whether artifacts in the error logs are new or whether they existed prior to the event.
    • You can alternatively turn Telemetry on before the event and force an upload of the support dump & error logs.
    • Any architecture or design details of the system that you are able to share.
  • Please make sure that all individuals who are responsible for the event and who may need to contact the MarkLogic Technical Support team are registered MarkLogic Support contacts. They can register for an account per instructions available at https://help.marklogic.com/marklogic/AccountRequest.  They will want to register before the event as ONLY registered support contacts can create a ticket with MarkLogic Technical Support. We do not want registration and entitlement verification to get in the way of the ability to work on an urgent production issue.
  • Review the MarkLogic Support Handbook - http://www.marklogic.com/files/Mark_Logic_Support_Handbook.pdf. The following sections in the "HOW TO RECEIVE SUPPORT SERVICES" chapter of the handbook are useful to be acquainted with before an incident occurs
    • Section: What to do Prior to Logging a Service Request 
    • Section: Working with Support
    • Section: Escalation Process
    • Section: Understanding Case Priority and Response Time Targets
  • For urgent issues (production outages), remember that you can raise an urgent incident per the instructions in the support handbook; MarkLogic takes urgent incidents seriously, as every urgent issue results in a text message being sent to every support engineer, engineering management and the senior executive at MarkLogic. 
  • Enable Debug level logging so that any issues that arise can be more easily diagnosed. Debug level logging does not have any noticeable impact on system performance (a sketch using the Admin API follows this list).
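
As a minimal sketch of that last item (assuming the group is named "Default"), the file log level can be switched to debug with the Admin API:

xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $group-id := admin:group-get-id($config, "Default")
return
  admin:save-configuration(
    admin:group-set-file-log-level($config, $group-id, "debug")
  )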

Summary

In some cases it is necessary to change the default environment variables of a MarkLogic Server installation or configuration.

Making Changes to Defaults

When changes to the default configurations need to be made, we recommend using /etc/marklogic.conf to make those changes. The file will not exist in a default installation, and should be manually created. We recommend the file only contain the variables that are being changed or added. This file will also be unaffected by MarkLogic upgrades.

Note: We do not recommend making changes to /etc/sysconfig/MarkLogic, as this file is part of the MarkLogic installation package, and it may be replaced or changed during a MarkLogic upgrade with no notification. Any direct file customizations will be overwritten and lost, which can result in various problems when the MarkLogic service is restarted.

During startup, MarkLogic will first source its own environment variable file, and then it will source /etc/marklogic.conf, which ensures the locally defined variables take precedence.

Changing the Default Data Directory

A common use of the /etc/marklogic.conf file is to change the default data directory (/var/opt/MarkLogic).

export MARKLOGIC_DATA_DIR="/my/custom/path/MarkLogic"

If that file exists when the server is first initialized, then MarkLogic will run from the custom location. If MarkLogic has already been initialized, then you may need to stop the service and manually move /var/opt/MarkLogic to your custom location.

Using the MarkLogic AMI

When using the MarkLogic AMI without the MarkLogic CloudFormation template, it is necessary to create /etc/marklogic.conf to disable the Managed Cluster feature.

export MARKLOGIC_MANAGED_NODE=0

If this is done after the instance is launched, then you may encounter the issue mentioned in the KB SVC_SOCHN Warning During Start Up on AWS.

Common Configurable Variables

  • MARKLOGIC_INSTALL_DIR - Where the MarkLogic binaries are installed
  • MARKLOGIC_DATA_DIR - Where MarkLogic stores configurations and forest data
  • MARKLOGIC_EC2_HOST - Whether MarkLogic will utilize EC2 specific features and settings
  • MARKLOGIC_AZURE_HOST - Whether MarkLogic will utilize Azure specific features and settings
  • MARKLOGIC_MANAGED_NODE - Whether MarkLogic will utilize the Managed Cluster feature
  • MARKLOGIC_USER - User that MarkLogic runs as
  • MARKLOGIC_HOSTNAME - Manually set the MarkLogic host name. Must be set prior to initialization or the hostname from the OS will be used
  • TZ - Allows for MarkLogic to operate with a different time zone setting than the OS
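
For example, a minimal /etc/marklogic.conf might override just a couple of the variables above (the paths and values shown here are purely illustrative):

# /etc/marklogic.conf - list only the variables you want to override
export MARKLOGIC_DATA_DIR="/my/custom/path/MarkLogic"
export MARKLOGIC_USER="mlogic"
export TZ="UTC"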

Further reading

Best Practice for Adding an Index in Production

Summary

It is sometimes necessary to add or remove an index on your production cluster. For a large database with more than a few GB of content, reindexing can be a time- and resource-intensive process that can affect query performance while the server is reindexing. This article points out some strategies for avoiding some of the pain points associated with changing your database configuration on a production cluster.

Preparing your Server for Production

In general, high performance production search implementations run with tight controls on the automatic features of MarkLogic Server. 

  • Re-indexer disabled by default
  • Format-compatibility set to the latest format
  • Index-detection set to none.
  • On a very large cluster (several dozen or more hosts), consider running with expunge-locks set to none
  • On large clusters with insufficient resources, consider bumping up the default group settings
    • xdqp-timeout: from 10 to 30
    • host-timeout: from 30 to 90

Increasing the xdqp and host timeouts will prevent the server from disconnecting prematurely when a data node is busy, which could otherwise trigger a false failover event. However, these changes will also increase the legitimate time to failover in an HA configuration. 

Preparing to Re-index

When an index configuration must be changed in production:

  • First, set index-detection back to automatic
  • Then, make the index configuration change

When you have Database Replication Configured:

If you have to add or modify indexes on a database which has database replication configured, make sure the same changes are made on the Replica cluster as well. Starting with MarkLogic Server version 9.0-7, index data is also replicated from the Master to the Replica, but the server does not automatically check whether both sides have the same index settings. Reindexing is disabled by default on a replica cluster. However, when the database replication configuration is removed (such as after a disaster), the replica database will reindex as necessary. So it is important that the Replica database index configuration matches the Master's to avoid unnecessary reindexing.

Note: If you are on a version prior to 9.0-7 - When adding/updating index settings, it is recommended that you update the settings on the Replica database before updating those on the Master database; this is because changes to the index settings on the Replica database only affect newly replicated documents and will not trigger reindexing on existing documents.

Further reading -

Master and Replica Database Index Settings

Database Replication - Indexing on Replica Explained

  • Finally, the reindexer should be enabled during off-hours to reindex the content.

Reindexing works by reloading all the URIs affected by the index change. This process tends to create many new and deleted fragments, which then need to be merged. Given that reindexing is very CPU and disk I/O intensive, the reindexer-throttle can be set to 3 or 2 to minimize the impact of the reindex.
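
As a sketch of how this can be scripted through the Admin API (the database name "Documents" is illustrative - adjust for your environment):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db := xdmp:database("Documents")
(: lower the throttle to reduce reindexing impact, then enable the reindexer :)
let $config := admin:database-set-reindexer-throttle($config, $db, 3)
let $config := admin:database-set-reindexer-enable($config, $db, fn:true())
return admin:save-configuration($config)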

After the Re-index

After the re-index has completed, it is important to return to the old settings by disabling the reindexer and setting index-detection back to none.

If you're reindexing over several nights or weekends, be sure to allow some time for the merging to complete. So for example, if your regular busy time starts at 5AM, you may want to disable the reindexer at around midnight to make sure all your merging is completed before business hours.

By following the above recommendations, you should be able to complete a large re-index without any disruption to your production environment.

Summary

MarkLogic Server can ingest and query all sorts of data, such as XML, text, JSON, binary, generic, etc. There are some things to consider when choosing to simply load data "as-is" vs. doing some degree of data modeling or data transformation prior to ingestion.

Details

Loading data "as-is" can minimize time and complexity during ingest or document creation. That can, however, sometimes mean more complex, slower performing queries. It may also mean more storage space intensive indexing settings.

In contrast, doing some degree of data transformation prior to ingestion can sometimes result in dramatic improvements in query performance and storage space utilization due to reduced indexing requirements.

An Example

A simple example will demonstrate how a data model can affect performance. Consider the data model used by Apple's iTunes:

<plist version="1.0">
<dict>
  <key>Major Version</key><integer>10</integer>
  <key>Minor Version</key><integer>1</integer>
  <key>Application Version</key><string>10.1.1</string>
  <key>Show Content Ratings</key><true/>
  <dict>
    <key>Track ID</key><integer>290</integer>
    <key>Name</key><string>01-03 Good News</string>
          …
  </dict>
</dict>
 

Note the multiple <key> sibling elements, at multiple levels - where both levels are named the same thing (in this case, <dict>). Let's say you wanted to query a document like this for "Application Version." In this case, time will be spent performing index resolution for the encompassing element (here, <key>). Unfortunately, because there are multiple sibling elements all sharing the same element name, all of those sibling elements will need to be retrieved and then evaluated to see which of them actually match the given query criteria. Consider a slightly revised data model, instead:

 

<iTunesLibrary version="1.0">
<application>
  <major-version>10</major-version>
  <minor-version>1</minor-version>
  <app-version>10.1.1</app-version>
  <show-content-ratings>true</show-content-ratings>
  <tracks>
    <track-id>290</track-id>
    <name>01-03 Good News</name>
          …
  </tracks>
</application>

Here, we only need to query and therefore retrieve and evaluate the single <app-version> element, instead of multiple retrievals/evaluations as in the previous example data model.  
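
For instance, a query for a specific application version can now resolve directly against the single <app-version> element (a minimal sketch, using the example values above):

cts:search(fn:collection(),
  cts:element-value-query(xs:QName("app-version"), "10.1.1"))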

At Scale

Although this is a simple example, when processing millions or even billions of records, eliminating small processing steps could have significant performance impact.

BEST PRACTICES FOR EXPORTING AND IMPORTING DATA IN BULK

Handling large amounts of data can be expensive in terms of both computing resources and runtime. It can also sometimes result in application errors or partial execution. In general, if you’re dealing with large amounts of data as either output or input, the most scalable and robust approach is to break-up that workload into a series of smaller and more manageable batches.

Of course there are other available tactics. It should be noted, however, that most of those other tactics will have serious disadvantages compared to batching. For example:

  • Configuring time limit settings through Admin UI to allow for longer request timeouts - since you can only increase timeouts so much, this is best considered a short term tactic for only slightly larger workloads.
  • Eliminating resource bottlenecks by adding more resources – often easier to implement compared to modifying application code, though with the downside of additional hardware and software license expense. Like increased timeouts, there can be a point of diminishing returns when throwing hardware at a problem.
  • Tuning queries to improve your query efficiency – this is actually a very good tactic to pursue, in general. However, if workloads are sufficiently large, even the most efficient implementation of your request will eventually need to work over subset batches of your inputs or outputs.

For more detail on the above non-batching options, please refer to XDMP-CANCELED vs. XDMP-EXTIME.

WAYS TO EXPORT LARGE AMOUNTS OF DATA FROM MARKLOGIC SERVER

1.    If you can’t break-up the data into a series of smaller batches - use xdmp:save to write out the full results from query console to the desired folder, specified by the path on your file system. For details, see xdmp:save.

2.    If you can break-up the data into a series of smaller batches:

            a.    Use batch tools like MLCP, which can export bulk output from MarkLogic server to flat files, a compressed ZIP file, or an MLCP database archive. For details, see Exporting Content from MarkLogic Server.

            b.    Reduce the size of the desired result set until it saves successfully, then save the full output in a series of batches.

            c.    Page through result set:

                               i.     If dealing with documents, cts:uris is excellent for paging through a list of URIs (see the sketch after this list). Take a look at cts:uris for more details.

                               ii.     If using Semantics

                                             1.    Consider exporting the triples from the database using the Semantics REST endpoints.

                                             2.    Take a look at the URL parameters start and pageLength – these parameters can be used with your SPARQL query to return the results in batches.  See GET /v1/graphs/sparql for further details.
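
As a sketch of the cts:uris paging approach from item 2.c.i above (the collection name and batch size are illustrative, and the URI lexicon must be enabled on the database):

xquery version "1.0-ml";
(: recursively page through URIs, 1,000 at a time :)
declare function local:page($start as xs:string?) {
  let $batch := cts:uris($start, ("limit=1000"), cts:collection-query("my-collection"))
  return
    if (fn:empty($batch)) then ()
    else (
      $batch,  (: process or save this page of URIs here :)
      if (fn:count($batch) lt 1000) then ()
      else local:page(fn:concat($batch[fn:last()], " "))  (: start the next page just after the last URI :)
    )
};
local:page(())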

WAYS TO IMPORT LARGE AMOUNTS OF DATA INTO MARKLOGIC SERVER

1.    If you’re looking to update more than a few thousand fragments at a time, you'll definitely want to use some sort of batching.

             a.     For example, you could run a script in batches of say, 2000 fragments, by doing something like [1 to 2000], and filtering out fragments that already have your newly added element. You could also look into using batch tools like MLCP

             b.    Alternatively, you could split your input into smaller batches, then spawn each of those batches to jobs on the Task Server, which has a configurable queue. See:

                            i.     xdmp:spawn

                            ii.    xdmp:spawn-function

2.    Alternatively, you could use an external/community developed tool like CoRB to batch process your content. See Using Corb to Batch Process Your Content - A Getting Started Guide

3.    If using Semantics and querying triples with SPARQL:

              a.    You can make use of the LIMIT keyword to further restrict the result set size of your SPARQL query. See The LIMIT Keyword

              b.    You can also use the OFFSET keyword for pagination. This keyword can be used with the LIMIT and ORDER BY keywords to retrieve different slices of data from a dataset. For example, you can create pages of results with different offsets. See  The OFFSET Keyword

Introduction

This article outlines various factors influencing the performance of the xdmp:collection-delete function and provides general best practices for improving the performance of large collection deletes.

What are collections?

Collections in MarkLogic Server are used to organize documents in a database. Collections are a powerful and high-performance mechanism to define and manage subsets of documents.

How are collections different from directories?

Although both collections and directories can be used for organizing documents in a database, there are some key differences. For example:

  • Directories are hierarchical, whereas collections are not. Consequently, collections do not require member documents to conform to any URI patterns. Additionally, any document can belong to any collection, and any document can also belong to multiple collections
  • You can delete all documents in a collection with the xdmp:collection-delete function. Similarly, you can delete all documents in a directory (as well as all recursive subdirectories and any documents in those directories) with a different function call - xdmp:directory-delete
  • You can set properties on a directory. You cannot set properties on a collection

For further details, see Collections versus Directories.

What is the use of the xdmp:collection-delete function?

xdmp:collection-delete is used to delete all documents in a database that belong to a given collection - regardless of their membership in other collections.

  • Use of this function always results in the specified unprotected collection disappearing. For details, see Implicitly Defining Unprotected Collections
  • Removing a document from a collection and using xdmp:collection-delete are similarly contingent on users having appropriate permissions to update the document(s) in question. For details, see Collections and Security
  • If there are no documents in the specified collection, then nothing is deleted, and the function still returns the empty sequence

What factors affect performance of xdmp:collection-delete?

The speed of xdmp:collection-delete depends on several factors:

Is there a fast operation mode available within the call xdmp:collection-delete?

Yes. The call xdmp:collection-delete("collection-uri") can potentially be fast in that it won't retrieve fragments. Be aware, however, that xdmp:collection-delete will retrieve fragments (and therefore perform much more slowly) when your database is configured with any of the following:

What are the general best practices in order to improve the performance of large collection deletes?

  • Batch your deletes
    • You could use an external/community developed tool like CoRB to batch process your content
    • Tools like CoRB allow you to create a "query module" (this could be a call to cts:uris to identify documents from a number of collections) and a "transform module" that works on each URI returned. CoRB will run the URI query and will use the results to feed a thread pool of worker threads. This can be very useful when dealing with large bulk processing. See: Using Corb to Batch Process Your Content - A Getting Started Guide
  • Alternatively, you could split your input (for example, the URIs of documents inside a collection that you want to delete) into smaller batches (see the sketch after this list)
    • Spawn each of those batches to jobs on the Task Server instead of trying to delete an entire collection in a single transaction
    • Use xdmp:spawn-function to kick off deletions of one document at a time - be careful not to overflow the task server queue, however
      • Don't spawn single document deletes
      • Instead, use a batch size that works most efficiently for your specific use case
    • One of the restrictions on the Task Server is that there is a set queue size - you should be able to increase the queue size as necessary
  • Scope deletes more narrowly with the use of cts:collection-query
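
As a sketch of the batching approach described above (the collection name, batch size, and options are illustrative - tune them to your use case and keep an eye on the Task Server queue):

xquery version "1.0-ml";
(: delete the documents in a collection in batches of 500 via spawned tasks :)
let $uris := cts:uris((), (), cts:collection-query("obsolete-docs"))
let $batch-size := 500
for $i in 1 to xs:int(fn:ceiling(fn:count($uris) div $batch-size))
let $batch := fn:subsequence($uris, ($i - 1) * $batch-size + 1, $batch-size)
return
  xdmp:spawn-function(
    function() { for $uri in $batch return xdmp:document-delete($uri) },
    <options xmlns="xdmp:eval"><update>true</update></options>)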

Related knowledgebase articles:

 

Introduction

MarkLogic Server delivers performance at scale, whether we're talking about large amounts of data, users, or parallel requests. However, people do run into performance issues from time to time. Most of those performance issues can be found ahead of time via well-constructed and well-executed load testing and resource provisioning.

There are three main aspects to load testing against and resource provisioning for MarkLogic:

  1. Building your load testing suite
  2. Examining your load testing results
  3. Addressing hot spots

Building your load testing suite

The biggest issue we see with problematic load testing suites is unrepresentative load. The inaccuracy can be in the form of missing requests, missing query inputs, unanticipated query inputs, unanticipated or underestimated data growth rates, or even a population of requests that skews towards different load profiles compared to production traffic. For example - a given load test might heavily exercise query performance, only to find in production that ingest requests represent the majority of traffic. Alternatively, perhaps one kind of query represents the bulk of a given load test when in reality that kind of query is dwarfed by the number of invocations of a different kind of query.

Ultimately, to be useful, a given load test needs to be representative of production traffic. Unfortunately, the less representative a load test is, the less useful it will be.

Examining your load testing results

Beginning with version 7.0, MarkLogic Server ships a Monitoring History dashboard, visible from any host in your cluster at port 8002/history. The Monitoring History dashboard will illustrate the usage of resources such as CPU, RAM, disk I/O, etc... both at the cluster and individual host levels. The Monitoring History dashboard will also illustrate the occurrence of read and write locks over time. It's important to get a handle on both resource and lock usage in the course of your load test as both will limit the performance of your application - but the way to address those performance issues depends on which class of usage is most prevalent.

Addressing hot spots

By having a representative load test and closely examining your load testing results, you'll likely find hot spots or slow performing parts of your application. MarkLogic Server's Monitoring History allows you to correlate resource and lock usage over time against the workload being submitted by your load tests. Once you find a hot spot, it's worthwhile examining it more closely by either running those requests in isolation or at larger scales. For example, you could run 4x and 16x the number of parallel requests, or 4x and 16x the number of inputs to an individual request - both of which will give you an idea of how the suspect requests scale in response to increased load.

Once you've found a hot spot - what should you do about it? Well, that ultimately depends on the kind of usage you're seeing in your cluster's Monitoring History. If it's clear that your suspect requests are running into a resource bound (for example, 100% utilization of CPU/RAM/disk I/O/etc.), then you'll either need to provision more of that limiting resource (either through more machines, or more powerful machines, or both), or reduce the amount of load on the system provisioned as-is. It may also be possible to re-architect the suspect request to be more efficient with regard to its resource usage.

Alternatively, you may find that your system is not, in fact, seeing a resource bound - where it appears there are plenty of spare CPU cycles/free RAM/low amounts of disk I/O/etc. If you're seeing poor performance in that situation, it's almost always the case that you'll instead see large spikes in the number of read/write locks taken as your suspect requests work through the system. Provisioning more hardware resources may help to some small degree in the presence of read/write locks, but what really needs to happen is the requests need to be re-architected to use as few locks as possible, and preferably to run completely lock free.

 

 

 

Introduction

While there are many different ways to define schemas in MarkLogic Server, one should be aware of both the location strategy the server will use (defined here: http://docs.marklogic.com/guide/admin/schemas), as well as the different locations in which your particular schema may reside.

Schema Location

Schemas can reside in either the Schemas database defined for your content database, or within the server's Config directory.  If there is no explicit schema map defined, the server will use the following schema location strategy:

1) If the XQuery program explicitly references a schema for the namespace in question, MarkLogic Server uses this reference.
2) Otherwise, MarkLogic Server searches the schema database for an XML schema document whose target namespace is the same as the namespace of the element that MarkLogic Server is trying to type.
3) If no matching schema document is found in the database, MarkLogic Server looks in its Config directory for a matching schema document.
4) If no matching schema document is found in the Config directory, no schema is found.

There can sometimes be issues with step #2 when there are multiple schema documents in the schema database whose target namespace matches the namespace of the element that MarkLogic Server is trying to type. In that situation, it would be best to explicitly define a default schema mapping - schema maps can be defined through the Admin API or the Admin User Interface. Be aware that you can define schema mappings at both the group level (in which case the mapping would then apply to all application servers in the group) or at the individual application server level.

Best Practices

Now that we know how the server locates schemas and where schema can potentially reside - what are the best practices?

In general, it's best to localize your schema impacts as narrowly as possible. For example, instead of using a single Schemas database or the server's one and only Config directory, it would instead be better to define a specific Schemas database that would be used for the relevant content database. Similarly, unless you know you need a defined schema mapping to apply to every application server in a group, it would instead be better to define your schema mappings at the application server level as opposed to the group level.

Summary

Although not exhaustive, this article lists some best practices for the use of MarkLogic Server and Amazon's VPC

Details

  1. Nodes within a MarkLogic cluster need to communicate with one another directly, without the presence of a load balancer in-between them.
  2. Whether in the context of a VPC or not, before attempting to join a node to a cluster, one should verify that each node is able to ping or ssh to the other nodes (and vice versa). If you're not able to ping or ssh from one machine to another, then issues seen during a MarkLogic cluster join are very likely to be localized to the network configuration and should be diagnosed at the network layer.
  3. The following items should be double-checked when using VPCs:
    1. If a private subnet is used for any MarkLogic instance, that subnet needs access to the public internet for the following situations:
      1. If Managed Cluster support is used, MarkLogic requires access to AWS services which require outbound connectivity to the internet (at minimum to the AWS service web sites).
      2. If foreign clusters are used then MarkLogic needs to connect to all hosts in the foreign cluster
      3. If Amazon S3 is used then MarkLogic needs to communicate with the S3 public web services.
    2. It is assumed that the creator of the VPC has properly configured all subnets on which MarkLogic will be installed to have outbound internet access. There are many ways that private subnets can be configured to communicate outbound to the public internet. NAT instances are one example [AWS VPC NAT]. Another option is using DirectConnect to route outbound traffic through the organization's internet connection.
    3. All subnets which host instances running MarkLogic in the same cluster need to be able to communicate via port 7999.
    4. Inbound ssh connectivity is required for command line administration of each server requiring port 22 to be accessible from either a VPN or a public subnet.
    5. With regard to application traffic (as opposed to intra-cluster traffic as seen during cluster joining) connectivity to the MarkLogic server(s) needs to be open to whatever applications for which it is required. Application traffic can be sent through an internal or external load balancer, a VPN, direct access from applications in the same subnet or routing through another subnet.

Introduction

This knowledgebase article contains critical tips and best practices you'll need to know to best use MarkLogic Server with your favorite BI Tools.

BI Tool Q&A

Q: What's a TDE? Is that a Tableau Data Extract?

A: In MarkLogic terms, TDE stands for Template Driven Extraction. A template is a document (XML or JSON) that declares how a view is to be populated. It defines a context -- the root path of all the documents that are involved in this view -- then, for each column in the view, it defines a column name, type, and a path to the data inside the document. You can define the value of a column using several pieces of data in the document, plus some functions, even some programming operations such as IF. For example, if your documents have the "last-updated" year and month and day in different parts of the document, your Template can pull in those three pieces, concatenate them, then cast the result as a date.
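
For illustration only, a minimal template along those lines might look like the following (the context path, view names, and column paths are invented for this sketch):

<template xmlns="http://marklogic.com/xdmp/tde">
  <context>/track</context>
  <rows>
    <row>
      <schema-name>music</schema-name>
      <view-name>tracks</view-name>
      <columns>
        <column>
          <name>name</name>
          <scalar-type>string</scalar-type>
          <val>name</val>
        </column>
        <column>
          <name>last_updated</name>
          <scalar-type>date</scalar-type>
          <!-- concatenate the year/month/day parts and cast the result as a date -->
          <val>xs:date(fn:concat(updated/year, "-", updated/month, "-", updated/day))</val>
        </column>
      </columns>
    </row>
  </rows>
</template>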

Q: When modifying TDEs, do I need to reindex?

A: TDEs map an SQL-like view on top of MarkLogic. If you change an existing view, you do need to reindex the database. Before kicking off a resource- and time- intensive reindex, however, be aware that there are some TDE configurations that cannot be updated. You can read more about exactly which kinds of TDEs may or may not be updated at the following knowledgebase article: Updating a TDE View.

Q: Can MarkLogic handle queries that require a large number of columns?

A: Yes, but you'll want to pay attention to potential performance impacts. In general, it's much better to spread a large number of columns across multiple TDEs, instead of having a single TDE containing all those same columns. Data modeling is also important here - TDEs should be meaningful with regard to their intended use. Definitely check out MLU's Data Modeling Series, in particular Progressive Transformation using the Envelope Pattern and Impact of Normalization: Lessons Learned.

Q: What are some common patterns and antipatterns for good performance with BI tools?

A: First, avoid using Nullable columns in filters and drilldowns. There are optimizations in MarkLogic Server's SQL engine to detect patterns with "null" - but different BI tools generate their queries in different ways, which can sometimes result in code that circumvents those optimizations. In general, if performance is a priority, it's usually better to use an actual value such as "N/A" or "0".

Second, enable Query Reduction or similar options in your BI tool of choice. Without this option, if you choose to filter on a year - say "2018" - and then also select "2019", multiple SQL queries will be sent to MarkLogic in quick succession unnecessarily.

Q: What do I need to watch out for when connecting my BI tool to MarkLogic?

A: If performance is a priority, exercise caution when using joins. In general, the best practice is to create collections of data in MarkLogic that represent the subsets of data needed externally as closely as possible. You can learn more about what tools are available to see how many and what kind of joins are being used by your query in the What debugging tools are available for Optic, SQL, or SPARQL code in MarkLogic Server? knowledgebase article, and you can learn more about how to create more meaningful data models and subsets of your data models in the aforementioned MLU's Data Modeling Series, as well as in the MarkLogic World presentation Getting the Most from MarkLogic Semantics (also available in video form).

References

Introduction

If you're looking to use any of the interfaces built on top of MarkLogic's semantics engine (Optic API, SQL, or SPARQL) - you'll want to make sure you're using the best practices itemized in this knowledgebase article. It's not unusual to see one or even two orders of magnitude performance improvements, as a result. Note that this article is really just a distillation of the MarkLogic World presentation "Getting the Most from MarkLogic Semantics" - available in both pdf and YouTube formats.

Best Practices for Using Semantics at Scale

1) Scope your query - more constrained queries will do less work, and will therefore take less time

  • Trim resultsets early
  • Partition
    • Query partitions or subsets of your data, instead of your entire database
    • Define partitions with Collections
    • Make use of your partitions with collection queries
    • Use cts:query to partition even further (see the sketch after this list)
  • Keep like-triples in the same document
  • Use MarkLogic indexes to scope a query
    • Collection query (or SPARQL FROM) to partition the RDF space
    • Put ontologies and other lookup/mapping triples into their own graphs/collections
    • Consider pushing-down some SPARQL FILTERs to the document
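
A minimal sketch of the collection-based partitioning tips above (the collection name and predicate IRI are illustrative):

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";

(: restrict the SPARQL query to triples held in documents in the "sales-2019" collection :)
sem:sparql(
  "SELECT ?product WHERE { ?sale <http://example.org/product> ?product }",
  (), (),
  sem:store((), cts:collection-query("sales-2019")))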

2) Pay attention to your data model

3) Resultset size specific tips

  • For small resultsets – from SPARQL, get the docs with a search
  • For large resultsets
    • Get docs in a single read, no joins
    • Large result sets may incur connection churning overhead – paginate large resultsets to ensure connection reuse

4) Hardware tips

  • Add more memory - allows the optimizer to choose faster plans
  • Add more hardware - allows for increased parallelization

5) Avoid unnecessary work

  • Re-use queries with bind variables - the query plan is cached for 5 minutes
  • Dedup processing
    • De-duplication has no effect on results if you have no duplicate triples and/or you use DISTINCT
    • Skipping dedup processing can result in substantial performance improvements

Introduction

Backing up multiple databases simultaneously may cause some of the backups to fail with the error XDMP-FORESTOPIN.

 

Details

While configuring a scheduled backup, one can also choose to back up the associated auxiliary databases such as Security, Schemas, and Triggers. Generally, all the content databases share these auxiliary databases, so issues may arise when more than one scheduled backup tries to back up the same auxiliary database. When two backups try to back up the same auxiliary database, the backup will fail with an XDMP-FORESTOPIN error. This error generally occurs when the system attempts to start one forest operation (backup, restore, remove, clear, etc.) while another, exclusive operation is already in progress - for example, starting a new backup while a previous backup is still in progress.

 

Recommendations

Be extra cautious when configuring scheduled backups that include auxiliary databases. If you really want to back up the auxiliary databases together with the content database, pay special attention to the schedule and ensure that no two backups of the same auxiliary database overlap.

Because most applications don't make frequent changes to their auxiliary databases, MarkLogic recommends scheduling backups for them separately, instead of selecting them together with the content databases.

Introduction

Problems can occur when trying to explicitly search (or not search) parts of documents when using a global configuration approach to include and exclude elements.

Global Approach

Including and excluding elements in a document using a global configuration approach can lead to unexpected results that are complex to diagnose. The global approach requires positions to be enabled in your index settings, which expands the disk space requirements of your indexes and may increase the processing time of your position-dependent queries. It may also require adjustments to your data model to avoid unintended includes or excludes, and may require changes to your queries in order to limit the number of positions used.

If circumstances dictate that you must instead use the less preferred global configuration approach, you can read more about including/excluding elements in word queries here: http://docs.marklogic.com/guide/admin/wordquery#id_77008

Recommended Approach

In general, it's better to define specific fields, which are a mechanism designed to restrict your query to portions of documents based on elements. You can read more about fields here: http://docs.marklogic.com/guide/admin/fields
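
For example, once a field (here hypothetically named "searchable-content") has been defined on the database, queries can be restricted to just that portion of your documents:

(: "searchable-content" is an illustrative field name defined in the Admin UI or Admin API :)
cts:search(fn:collection(),
  cts:field-word-query("searchable-content", "marklogic"))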

 

 

Introduction

In MarkLogic 8, support for native JSON and server-side JavaScript was introduced. Here we discuss how this affects support for XML and XQuery in MarkLogic 8.

Details

In MarkLogic 8, you can absolutely use XML and XQuery. XML and XQuery remain central to MarkLogic Server now and into the future. JavaScript and JSON are complementary to XQuery and XML. In fact, you can even work with XML from JavaScript or JSON from XQuery.  This allows you to mix and match within an application—or even within an individual query—in order to use the best tool for the job.
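
As a small illustration of working with JSON from XQuery, a JSON node can be built from a map and returned directly (the values here are invented):

xquery version "1.0-ml";
let $track := map:map()
let $_ := map:put($track, "track-id", 290)
let $_ := map:put($track, "name", "01-03 Good News")
return xdmp:to-json($track)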

See also:

Server-side JavaScript and JSON vs XQuery and XML in MarkLogic Server

XQuery and JavaScript interoperability

Introduction

Sometimes you may find that there are one or more tasks that are taking too long to complete or are hogging too many server resources, and you would like to remove them from the Task Server.  This article presents a way to cancel active tasks in the Task Server.

Details

To cancel active tasks in the Task Server, you can browse to the Admin UI, navigate to the Status tab of the Group's Task Server, and cancel the tasks. However, this may get tedious if there are many tasks to be terminated.

As an alternative, you can use the server monitoring built-ins to programmatically find and cancel the tasks. The MarkLogic Server API documentation includes information for all the built-in functions you will need (refer to http://docs.marklogic.com/xdmp/server-monitoring).

Sample Script

Here is a sample script that removes the task based on the path to the module that is being executed:

xquery version "1.0-ml";

(: find the Task Server on this host :)
let $host-id := xdmp:host()
let $host-task-server-id := xdmp:host-status($host-id)//*:task-server/*:task-server-id/text()
let $task-server-status := xdmp:server-status($host-id, $host-task-server-id)
let $task-server-requests := $task-server-status/*:request-statuses
(: select the request(s) currently executing the given module :)
let $scheduled-task-request := $task-server-requests/*:request-status[*:request-text = "/PATH/TO/SCHEDULED/TASK/MODULE.XQY"]/*:request-id/text()
return
   xdmp:request-cancel($host-id, $host-task-server-id, $scheduled-task-request)

Summary

MarkLogic stores all signed Certificates, private keys, and Certificate Authority Certificates inside the Security Database. The Security Database also stores users, passwords, roles, privileges, and many other authentication-related configurations. When setting up a DR (Disaster Recovery) cluster, many administrators prefer to replicate the Security Database to the DR cluster to avoid re-configuring the DR cluster with the same users, roles, privileges, etc.

Security Database replication presents design challenges when accessing Application Servers on the DR cluster.

  • Certificates installed in the Master Cluster Security Database will get replicated to the DR cluster Security Database; however, those replicated Certificates are not useful to the DR Cluster, since Signed Certificates are typically tied to a single host (exceptions include SAN and Wildcard Certificates).  
  • At the same time, we are not able to install new Signed Certificates on the DR Cluster, as the replicated Security Database is read-only.

This article discusses the different aspects of the above problem and provides a solution.

Configuration: Security Database replicated to DR Cluster

For the purposes of this article, we will consider a 3-node Master cluster coupled to a 3-node DR cluster, where the Security DB is replicated from the Master to the DR Cluster. We will also have an Application Server in the Master cluster configured with the "DemoTemp1" Certificate Template attached. 

       Master_Cluster_Hosts.png         DR_Cluster_Hosts.png

Issues in DR Cluster.

Certificate Authentication based on CN field 

When client browsers connect to the application server using HTTPS, they check to make sure your SSL Certificate matches the host name in the address bar. There are three ways for browsers to find a match:

  1.    The host name (in the address bar) exactly matches the Common Name (CN) in the certificate's Subject.
  2.    The host name matches a Wildcard Common Name. For example, www.example.com matches the common name *.example.com.
  3.    The host name is listed in the Subject Alternative Name field.

The most common form of SSL name matching is the first option - the SSL client compares the server name to the Common Name in the server's certificate. 

Since the Temporary Signed Certificates have the CN fields of the Master Cluster nodes, the Application Server on the DR Cluster will fail host name matching when used with the MarkLogic-generated Temporary Signed Certificates.

Certificate Requests

When we attach a Template to any application server on the DR Cluster and generate a certificate request, MarkLogic Server generates a Temporary Signed Certificate for all the nodes in the cluster in the Application Server's group.

Master_Cert_Template_Status.png    DR_Cert_Template_Status_1.png

To install Certificates signed by a 3rd party, replacing the temporary Signed Certificates, we will need to generate certificate requests. You can generate certificate requests in MarkLogic for all nodes using the Request button under "Needed Certificate Request" on the Certificate Template "Status" tab.

  • On the Master cluster, MarkLogic will generate 3 Certificate requests, with the CN field matching each of the 3 nodes. All 3 new Certificate Requests are stored internally in the Security Database.
  • On the DR Cluster, clicking the Request button will result in an ERROR, since the DR Cluster has a replicated Security Database that is in a read-only ("open replica") state, i.e. Security database updates are not allowed.

Pending Certificate Requests

Each Certificate request is intended for a specific individual node, as the Certificate request originator incorporates the client FQDN into the Certificate CN field during request generation. MarkLogic Server will use the hostname (which in most cases matches your FQDN) as the CN field value in the Certificate Request.

Certificate requests generated on the Master Cluster are stored in the Security Database, which will get replicated to the DR Cluster Security Database (as/when Security DB replication is configured). However, Certificate requests generated on the Master Cluster are not relevant to the DR Cluster, as they have the Master Cluster nodes' FQDNs as CN fields.

Master_Cert_Template_Status_Post_Request.png    DR_Cert_Template_Status_Post_Request.png

Solution

To install Signed Certificates intended for the DR Cluster, where the Certificate CN field matches the FQDN of a DR Cluster node, we will need to install the DR cluster's Signed Certificates on the Master Cluster. Those certificates will then be replicated to the DR Cluster through the normal database replication of the Security database. 

Step 1. Generate Certificate Request (intended for DR nodes).

Generate the Certificate requests using XQuery in Query Console against the Security database on the Master cluster itself, but with the values used in your XQuery being the DR/Replica Cluster nodes' FQDNs. For example, for the first node in the DR Cluster, "engrlab-130-026.engrlab.marklogic.com", you would run the query below from Query Console on any node of the Master Cluster against the Security Database. Change the FQDN value for each node and run the query a total of 3 times.

xquery version "1.0-ml";
import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";

(: generate a request for the named template, using the DR node's FQDN as the common name and DNS name :)
pki:generate-certificate-request(
  pki:template-get-id(pki:get-template-by-name("DemoTemp1")),
  "engrlab-130-026.engrlab.marklogic.com",
  "engrlab-130-026.engrlab.marklogic.com",
  ())

Step 2. Download Certificate Request and Get them Signed.

We should now be able to see the Certificate requests pertaining to each node (for Master as well as DR nodes) on the Certificate Template status tab in both the Master Cluster GUI and the DR Cluster GUI. Download them and get them signed by your preferred Certificate Authority.

Master_Cert_Template_Status_QC_Request.png    DR_Cert_Template_Status_QC_Request.png

Step 3. Install All Signed Certificates (for Master + DR Nodes) on Master Cluster 

Install all Signed Certificates (including the Certificates intended for the Replica Cluster) through the Master Cluster Admin GUI Certificate Template Import tab. If we try to install Certificates on the DR/Replica cluster from the Admin GUI, we will get an XDMP-FORESTNOT (Forest Security not available: open replica) error. The Application Server on the DR Cluster will find the appropriate Certificate for each node from the list of all Certificates. The screenshot below shows the status of the Certificate Template on the Master cluster as well as the DR cluster (both should be identical).

Master_Cert_Template_Status_Final.png    DR_Cert_Template_Status_Final.png

Step 4. Importing Pre-Signed Cert where Keys are generated outside of MarkLogic.

Please read "Import pre-signed Certificate and Key for MarkLogic HTTPS App Server" to import a Certificate/Key pair generated outside of MarkLogic. For our purposes, we will need to import the Certificates (and their respective Keys) for both Clusters (Master as well as DR/Replica) from Query Console on the Master Cluster itself.

Further Reading

Summary

Each node in a MarkLogic Server cluster has a hostname, a human-readable nickname corresponding to the network address of the device. MarkLogic retrieves the hostname from the underlying operating system during installation. On Linux, we can retrieve the platform hostname value by running "hostname" from a shell prompt. 

$ hostname

129-089.engrlab.marklogic.com

In most environments, the hostname is the same as the platform's Fully Qualified Domain Name (FQDN). However, there are scenarios where the hostname could be different from the FQDN. In such environments you would use the FQDN (engrlab-129-089.engrlab.marklogic.com) to connect to the platform instead of the hostname.

$ ping engrlab-129-089.engrlab.marklogic.com

PING engrlab-129-089.engrlab.marklogic.com (172.18.129.89) 56(84) bytes of data.

64 bytes from engrlab-129-089.engrlab.marklogic.com (172.18.129.89): icmp_seq=1 ttl=64 time=0.011 ms

When installing a Certificate into a Certificate Template in environments where the hostname and FQDN do not match, MarkLogic looks at the CN field in the installed Certificate to find a matching hostname in the cluster. However, since the CN field (reflecting the FQDN) does not match the hostname known to MarkLogic, MarkLogic does not assign the installed Certificate to any specific host in the cluster.

Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic, OU=Eng, CN=engrlab-129-089.engrlab.marklogic.com

Installing Certificates in this scenario results in the installed Certificate not replacing the Temporary Certificate, and the Temporary Certificate will still be used by the HTTPS App Server instead of the installed Certificate.

This article details different solutions to address this issue. 

Solution:

1) Hostname change

By default MarkLogic picks the hostname value presented by the underlying operating system. However we can always change the hostname string stored in MarkLogic Server after installation using Admin API admin:host-set-name ( http://docs.marklogic.com/admin:host-set-name )

Changing the hostname in MarkLogic (to reflect the FQDN) will not affect the underlying platform/OS hostname values, but will result in MarkLogic being able to find the correct host for the installed Certificate (CN field = hostname), and thus being able to link the installed Certificate to a specific host in the cluster.
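
A minimal sketch of that change (the FQDN shown is the example value used in this article - substitute your own):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

(: rename the current host so that it matches the certificate's CN field :)
admin:save-configuration(
  admin:host-set-name(admin:get-configuration(), xdmp:host(),
    "engrlab-129-089.engrlab.marklogic.com"))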

2) XQuery code linking Installed Cert to specific Host

You can also use the XQuery code below from Query Console against the Security DB (as the content source) to update the Certificate XML files in the Security DB, linking the installed Certificate to a specific host.

Please change the Certificate Template name and hostname in the XQuery below to reflect values from your environment.

xquery version "1.0-ml";

import module namespace pki = "http://marklogic.com/xdmp/pki"  at "/MarkLogic/pki.xqy";
import module namespace admin = "http://marklogic.com/xdmp/admin"  at "/MarkLogic/admin.xqy";

(: Change to your hostname string :)
(: if Qconsole is launched from the same host, then below can be used as well :)
(: let $hostname := xdmp:host-name()    :)
let $hostname :="129-089.engrlab.marklogic.com"
let $hostid := admin:host-get-id(admin:get-configuration(), $hostname)

(: FQDN name matching Certificate CN field value :)
let $fqdn := "engrlab-129-089.engrlab.marklogic.com"

(: Change to your Template Name string :)
let $templateid := pki:template-get-id(pki:get-template-by-name("YourTemplateName"))

for $i in cts:uris()
where 
(   (: locate Cert file with Public Key :)
    fn:doc($i)//pki:certificate/pki:template-id=$templateid 
    and fn:doc($i)//pki:certificate/pki:authority=fn:false()
    and fn:doc($i)//pki:certificate/pki:host-name=$fqdn
)
return <h1> Cert File - {$i} 
{xdmp:node-delete(doc($i)//pki:certificate/pki:host-id)}
{xdmp:node-insert-child(doc($i)/pki:certificate, <pki:host-id>{$hostid}</pki:host-id>)}
{
    (: extract cert-id :)
    let $certid := fn:doc($i)//pki:certificate/pki:certificate-id
    for $j in cts:uris()
    where 
    (
        (: locate Cert file with Private key :)
        fn:doc($j)//pki:certificate-private-key/pki:template-id=$templateid 
        and fn:doc($j)//pki:certificate-private-key/pki:certificate-id=$certid
    )
    return <h2> Cert Key File - {$j}
    {xdmp:node-delete(doc($j)//pki:certificate-private-key/pki:host-id)}
    {xdmp:node-insert-child(doc($j)/pki:certificate-private-key, <pki:host-id>{$hostid}</pki:host-id>)}
    </h2>
} </h1>
 

Also, note that the above will not replace/overwrite the temporary Certificate; however, the App Server will start using the installed Certificate from this point on instead of the Temporary Certificate. One can also delete the now-unused Temporary Certificate file from Query Console without any negative effect.

3) Certificate with Subject Alternative Name (SAN Cert)

You can also ask your IT department (or Certificate issuer) to provide a Certificate with a Subject Alternative Name (subjectAltName) that matches MarkLogic's understanding of the host. During the installation of the Certificate, MarkLogic will look for alternative names and link the Certificate to the correct host based on the subjectAltName field.

 

Further Reading

 

Introduction: When you may need to change the state of forests

In most cases, all forests in your MarkLogic cluster will be configured to allow all (any) updates to be made.

If we consider running the following example in Query Console:
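
(A minimal sketch using admin:forest-get-updates-allowed; the forest name "Documents" is illustrative.)

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

(: report the updates-allowed state of the named forest :)
admin:forest-get-updates-allowed(admin:get-configuration(), xdmp:forest("Documents"))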

In the majority of cases, calling the above function should return "all", indicating that the forest is in a state to allow incoming queries to read data from the forest and to allow queries to update content (and to add new content) into that forest.

At any given time, a forest can be configured to be in one of four different states:

  • all
  • read-only
  • delete-only
  • flash-backup

You may want to change the state of the forests in a given database for several reasons:

  • read-only - To run your application in maintenance mode, where data can be read but no data on-disk can be changed
  • delete-only - In a situation where you are migrating data from a legacy database or removing data from a given forest
  • flash-backup - In a situation where you need to quiesce all forests in a given database for long enough to allow you to make a file-level backup of the forest data

Forest states explained

Sample state management module

Below is an example template for modifying the state of all forests in a given database:
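
(A minimal sketch; the database name and target state are illustrative - adjust them for your environment.)

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

declare variable $DATABASE := "Documents";   (: illustrative database name :)
declare variable $STATE := "flash-backup";   (: all | read-only | delete-only | flash-backup :)

(: thread the configuration through each forest, setting its updates-allowed state :)
let $config :=
  fn:fold-left(
    xdmp:database-forests(xdmp:database($DATABASE)),
    admin:get-configuration(),
    function($c, $forest) { admin:forest-set-updates-allowed($c, $forest, $STATE) })
return admin:save-configuration($config)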

Further reading

Forest States
http://docs.marklogic.com/guide/admin/forests#id_43487
Setting Forests to "read only"
http://docs.marklogic.com/guide/admin/forests#id_72520
Setting Forests to "delete only"
http://docs.marklogic.com/guide/admin/forests#id_20932

Link to Example Code

flash-backup.xqy

Introduction

This article discusses some of the issues you should think about when preparing to change the IP addresses of a MarkLogic Server.

Detail: 

If the hostnames stay the same, then changing IP addresses should not have any adverse side effects since none of the default MarkLogic Server settings require an IP address.

Here are some caveats:

  1. Make sure there are no application servers that have an 'address' setting to an IP address that will no longer be accessible/exist after the change.
  2. Similarly, make sure there are no external (to MarkLogic Server) dependencies on the original IP addresses.
  3. Make sure you allow some time (on the order of minutes) for DNS and routing changes to propagate before bringing up MarkLogic Server.
  4. Make sure the hosts themselves are reachable via the standard Unix channels (ping, ssh, etc) before starting MarkLogic Server.
  5. Make sure you test this in a non-production environment before you implement it in production.

Introduction

If you have an existing MarkLogic Server instance running on EC2, there may be circumstances where you need to change the size of available storage.

This article discusses approaches to ensure a safe increase in the amount of available storage for your EC2 instances without compromising MarkLogic data integrity.

This article assumes that you have started your cluster using the CloudFormation templates provided by MarkLogic.

The recommended method (I) is to shut down the cluster, do the resize using snapshots, and start it again. If you wish to avoid downtime, an alternative procedure (II) using multiple volumes and rebalancing is described below.

In both procedures we are recommending a single, large EBS volume as opposed to multiple smaller ones because:

1. Larger EBS volumes have faster IO as described by the Amazon EBS Volume types at http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

2. You have to keep enough spare capacity on every single volume to allow for merges.  MarkLogic disk space requirements are described in our Installation Guide.

I. Resizing using AWS snapshots

This is the recommended method. This procedure follows the same steps as official Amazon AWS documentation, but highlights MarkLogic specific steps. Please review AWS Documentation in detail before proceeding:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html

1. Make sure that you have an up to date backup of your data and a working restore plan.

2. Stop the MarkLogic cluster by going to AWS Console -> CloudFormation -> Actions -> Update Stack

aws-update-stack.png

Click through the pages and leave all other settings intact, but change Nodes to 0, then review and confirm updating the stack. This will stop the cluster.

This is also covered in the MarkLogic EC2 documentation:

https://docs.marklogic.com/guide/ec2/managing#id_59478

4. Create a snapshot of the volume to resize.

5. Create a new volume from the snapshot.

Ensure that the new volume is sufficiently large to cover MarkLogic disk space requirements (generally at least 1.5x of the planned total forest size).

6. Detach the old volume.

7. Attach the newly expanded volume.

Steps 4-7 are exactly as covered in AWS documentation and have no Marklogic specific parts.

8. Restart MarkLogic cluster, by going to AWS Console -> CloudFormation -> Actions -> Update Stack and changing Nodes to the original setting.

9. Connect to the machine using SSH and resize the logical partition to match the new size. This is covered in AWS documentation, the commands are:

- resize2fs for ext3 and ext4

- xfs_growfs for xfs

10. The new volume will have a different id. You need to update the CloudFormation template so that the data volumes are retained and remounted when the cluster or nodes are restarted. The easiest way is to use the mlcmd shell script provided by MarkLogic. While connected via SSH, run the following:

/opt/MarkLogic/bin/mlcmd sync-volumes-to-mdb

This will synchronise the EBS volume id with the CloudFormation template.

At this point the procedure is complete and you can delete the old EBS volume and once you have verified that everything is working fine, also delete the snapshot created in step 4.

II. Resizing with no downtime, using MarkLogic Rebalancing

This method avoids cluster downtime, but it is slightly more complicated than procedure I, and rebalancing will take additional time and add load to the cluster while it runs. In most cases procedure I takes far less time to complete; however, the cluster is down for the duration. With this procedure the cluster can serve requests at all times.

This procedure follows the same steps as official Amazon AWS documentation where possible, but highlights MarkLogic specific steps. Please review AWS Documentation in detail before proceeding:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html

The procedure is described in more detail in the MarkLogic Server on Amazon EC2 Guide at https://docs.marklogic.com/guide/ec2/managing#id_81403

1. Create a new volume.

Ensure that the new volume is sufficiently large to cover MarkLogic disk space requirements (generally at least 1.5x of the planned total forest size).

2. Attach the volume to the EC2 instance. Please take a note of the EC2 device mount point, for example /dev/sdg and see here where it maps to in Linux and in RedHat: https://docs.marklogic.com/guide/ec2/managing#id_17077

3. SSH into the instance and execute the /opt/MarkLogic/bin/mlcmd init-volumes-from-system command to create a filesystem for the volume and update the Metadata Database with the new volume configuration. The init-volumes-from-system command will output a detailed report of what it is doing. Note the mount directory of the volume from this report.

4. Once the volume is attached and mounted to the instance, log into the Administrator Interface on that host and create a forest or forests, specifying host name of the instance and the mount directory of the volume as the forest Data Directory. For details on how to create a forest, see Creating a Forest in the Administrator's Guide.

5. Once the status of the new forest is set to "open", attach the new forest(s) to the database and retire all the forest(s) on the old volume. If you only have 1 data volume then this includes forests for Schemas, Security, Triggers, Modules etc. It is possible to script this part using XQuery, JS or REST:

https://docs.marklogic.com/admin:forest-create
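
A minimal sketch of the attach-and-retire step in XQuery (the forest and database names are illustrative):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

(: attach the forest created on the new volume and retire a forest on the old volume :)
let $config := admin:get-configuration()
let $db := xdmp:database("Documents")
let $config := admin:database-attach-forest($config, $db, xdmp:forest("Documents-new-01"))
let $config := admin:database-retire-forest($config, $db, xdmp:forest("Documents-old-01"))
return admin:save-configuration($config)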

This will trigger rebalancing - database fragments will start to move to the new forests. This process will take several hours or days, depending on the size of the data, and the Admin UI will show you an estimate.

The Admin UI for this is covered here: https://docs.marklogic.com/guide/admin/forests#id_93728

and here is more information on rebalancing: https://docs.marklogic.com/guide/admin/database-rebalancing#id_87979

6. Once the old forest(s) have 0 fragments in them you can detach them and delete the old forest(s). The migration to a new volume is complete.

7. Optionally remove the old volume. If your original volume was data-only, it should be empty after this procedure and you can:

a) unmount the volume in Linux

b) delete the volume in AWS EC2 console

c) issue /opt/MarkLogic/bin/mlcmd sync-volumes-to-mdb. This will preserve the new volume mappings in the Cloud Formation template and the volumes will be preserved and remounted when nodes are restarted or even terminated.

Introduction

A common use case in many business applications is to find whether an element exists in any document. This article provides ways to find such documents and explains points that should be considered while designing a solution.

 

Solution

In general, the existence of an element in a document can be checked by using the XQuery below.

cts:element-query(xs:QName('myElement'),cts:and-query(()))

Note the empty cts:and-query construct here. An empty cts:and-query is used to fetch all fragments.

Hence, running the search query below will bring back all the documents having the element "myElement".
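For instance, a minimal sketch of such a search (an unconstrained search over the whole database) might look like:

cts:search(
  fn:collection(),
  cts:element-query(xs:QName("myElement"), cts:and-query(()))
)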

 

Wrapping the query in cts:not-query will bring back all the documents *not* having the element "myElement", for example:
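A sketch of the negated search, under the same assumptions as above:

cts:search(
  fn:collection(),
  cts:not-query(
    cts:element-query(xs:QName("myElement"), cts:and-query(()))
  )
)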

 

A search using cts:not-query is only guaranteed to be accurate if the underlying query that is being negated is accurate from its index resolution. Therefore, to check for the existence of a specific XPath, we need to index that XPath. For example, if you want to find documents having /path/1/A (and not /path/2/A), you can create a field index for the path /path/1/A and then use that field in your query instead.

 

Things to remember

1.) Use unique element names within a single document, i.e. try not to use the same element name in multiple places within a document if they have different meanings for your use case. Either give them different element names or put them under different namespaces to remove any ambiguity. For example, if you have the element "table" in two places in a single document, you can put them under different namespaces such as html:table & furniture:table, or you can name them differently such as html_table & furniture_table.

2.) If element names are unique within a document, you don't need to create additional indexes. If element names are not unique within a document and you are interested in only a specific XPath, create path (field) indexes on those XPaths and use them in your not-query.

 

Introduction

MarkLogic Server has shipped with full support for the W3C XML Schema specification and schema validation capabilities since version 4.1 (released in 2009).

These features allow for the validation of complete XML documents or elements within documents against an existing XML Schema (or group of Schemas), whose purpose is to define the structure, content, and typing of elements within XML documents.

You can read more about the concepts behind XML Schemas and MarkLogic's support for schema based validation in our documentation:

https://docs.marklogic.com/guide/admin/schemas

Caching XML Schema data

In order to ensure the best possible performance at scale, all user created XML Schemas are cached in memory on each individual node within the cluster using a portion of that node's Expanded Tree Cache.

Best practices when making changes to pre-existing XML Schemas: clearing the Expanded Tree Cache

In some cases, when you redeploy a revised XML Schema to an existing schema database, MarkLogic can refer to an older, cached version of the schema data associated with a given document.

Therefore, it's important to note that whenever you plan to deploy a new or revised version of a Schema that you maintain, as a best practice, it may be necessary to clear the cache in order to ensure that you have evicted all cached data stored for older versions of your schemas.

If you don't clear the cache, you may sometimes get references to the old, cached schema and, as a result, you may get errors like:

XDMP-LEXVAL (...) Invalid lexical value

You can clear all data stored in the Expanded Tree Cache in two ways:

  1. By restarting the MarkLogic service on every host in the cluster. This will automatically clear the cache, but it may not be practical on production clusters.
  2. By calling the xdmp:expanded-tree-cache-clear() function on each host in the cluster. You can run the function in Query Console or via a REST endpoint, and you will need a user with admin rights to actually clear the cache.

An example script that uses XQuery to execute the call to clear the Expanded Tree Cache against each host in the cluster is sketched below.
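This is only a minimal sketch: it assumes that the default App-Services application server (port 8000, which exposes the /v1/eval endpoint) is reachable on every host, and the credentials shown are placeholders for a user with sufficient privileges.

xquery version "1.0-ml";
(: run xdmp:expanded-tree-cache-clear() on every host by POSTing to each host's /v1/eval endpoint :)
for $host in xdmp:hosts()
let $url := fn:concat("http://", xdmp:host-name($host), ":8000/v1/eval")
return
  xdmp:http-post($url,
    <options xmlns="xdmp:http">
      <authentication method="digest">
        <username>admin</username>          <!-- placeholder -->
        <password>admin-password</password> <!-- placeholder -->
      </authentication>
      <headers>
        <content-type>application/x-www-form-urlencoded</content-type>
      </headers>
    </options>,
    text { fn:concat("xquery=", xdmp:url-encode("xdmp:expanded-tree-cache-clear()")) })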

Please contact MarkLogic Support if you encounter any issues with this process.

Related KB articles and links:

Summary

XDMP-ODBCRCVMSGTOOBIG can occur when a non-ODBC process attempts to connect to an ODBC application server. This can happen, for example, when an HTTP application has been accidentally configured to point to the ODBC port, or when a load balancer is sending HTTP health checks to an ODBC port. There are a number of common error messages that can indicate whether this is the case.

Identifying Errors and Causes

One method of determining the cause of an XDMP-ODBCRCVMSGTOOBIG error is to take the size value and convert it to characters. For example, given the following error message:

2019-01-01 01:01:25.014 Error: ODBCConnectionTask::run: XDMP-ODBCRCVMSGTOOBIG size=1195725856, conn=10.0.0.101:8110-10.0.0.103:54736

The size, 1195725856, can be converted to the hexadecimal value 47 45 54 20, which can be converted to the ASCII value "GET ".  So what we see is a GET request being run against the ODBC application server.
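As a rough illustration, the same conversion can be done in Query Console; this is just a sketch of the arithmetic, not part of the server's own logic:

xquery version "1.0-ml";
let $size := 1195725856
let $hex  := xdmp:integer-to-hex($size)    (: "47455420" :)
return
  fn:codepoints-to-string(
    for $i in 1 to (fn:string-length($hex) idiv 2)
    return xdmp:hex-to-integer(fn:substring($hex, 2 * $i - 1, 2))
  )                                         (: "GET " :)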

Common Errors and Values

Error                                     Hexadecimal     Characters
XDMP-ODBCRCVMSGTOOBIG size=1195725856     47 45 54 20     "GET "
XDMP-ODBCRCVMSGTOOBIG size=1347769376     50 55 54 20     "PUT "
XDMP-ODBCRCVMSGTOOBIG size=1347375956     50 4F 53 54     "POST"
XDMP-ODBCRCVMSGTOOBIG size=1212501072     48 45 4C 50     "HELP"

Conclusion

XDMP-ODBCRCVMSGTOOBIG errors do not affect the operation of MarkLogic Server, but they can cause error logs to fill up with clutter. Determining that the errors are caused by an HTTP request to an ODBC port can help to identify the root cause, so the issue can be resolved.

Summary

Meters data can be a good resource for getting an approximation of the number of requests being managed by the server at a given time. It's also important to understand how Meters data is generated, should there be a discrepancy between the Meters samples and the entries in the access log.

Meters Request Data

The Meters data is designed to record a sampling of activity every few seconds; it is not designed to accurately record request rates for events that occur much less frequently than that. Request rates are 15-second moving averages, recalculated every second and available in real time through the xdmp:host-status, xdmp:server-status and xdmp:forest-status built-in functions.
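For example, the current moving average can be read with a query along these lines (a sketch only; it assumes a single-group cluster and an application server named "App-Services"):

xdmp:server-status(xdmp:host(), xdmp:server("App-Services")[1])//*:request-rate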

Meters Samples

The metering subsystem samples these real-time rates on the minute and saves the samples in the Meters database. Meters sampled data of events that occur less frequently than the moving average period will be lower than the number of access log entries. The difference between the two will depend on when the last event happened and when the sample was taken.

This means that if an event happens once a minute, the request rate will rise when the event happens, but then decay away within a few seconds. If the sample is taken after the event has decayed, the saved Meters data will be lower than the actual number of requests.

Conclusion

As a result of the Meters sampling method, it is not unusual for Meters to under-report the number of requests in certain circumstances.

Summary

In MarkLogic Server v7.0-2, the tokenizer keys, for languages where MarkLogic provides generic language support, were removed so that they now all use the same key. For example, Greek falls into this class of languages. This change was made as part of an optimization for languages in which MarkLogic Server has advanced stemming and tokenization support.  

Stemmed searches that include characters from languages that do not have advanced language support, performed on MarkLogic Server v7.0-2 or later releases, against content loaded on a version previous to v7.0-2, may not return the expected results.

Resolution

In order to successfully run these stemmed searches, you can either:

  • Reindex the database; or
  • Reinsert the affected documents (i.e. the documents that contain characters in languages for which MarkLogic Server only has generic language support).

If these are not possible in your environment, you can always run the query unstemmed.

An Example

The following example demonstrates the issue

  1. On MarkLogic Server version 7.0-1, insert a document (test.xml) that contains the Greek character 'ε'.
  2. Run this query 
    xdmp:estimate( cts:search( doc('test.xml'), 'ε')),
    cts:contains( doc('test.xml'), 'ε')
  3. The query will return the correct results: 1, true
  4. Upgrade MarkLogic Server to version 7.0-3 or later and run the query again
  5. The query will return incorrect results: 0, false 
  6. Reindex the database and re-run the query
  7. The query will return the correct result once again.
     

Introduction:

As the Configuration Manager has been deprecated starting with MarkLogic version 9.0-5, a common question is how to move the configuration of a database or an application server from an older MarkLogic instance to a newer one, or between any two versions of MarkLogic Server after 9.0-4.

This article outlines the steps to migrate the resource configuration information from one server to another using Gradle and the ml-gradle plugin.

Pre-Requisite

As a prerequisite, have a compatible Gradle version (6.x) and the latest ml-gradle plugin (4.1.1 at the time of writing) installed and configured on the client machine (the local machine, or whichever machine the Gradle project will be run from).

Solution:

The entire process is divided into two major parts: exporting the resource configuration from the source cluster, and importing the resource configuration onto the destination cluster.

1. Exporting resource configuration from the source cluster/host:

On the machine where gradle is installed and the plug-in is configured, create a project as suggested in https://github.com/marklogic-community/ml-gradle#start-using-ml-gradle

In the example steps below, the source project directory is /Migration.

1.1 Creating the new project with the source details:

While creating this new project, provide the source MarkLogic host, username, password, REST port, and multiple-environment details on the command line. Once the project creation is successful, you can verify the source server details in the gradle.properties file.

macpro-user1:Migration user1$ gradle mlNewProject
Starting a Gradle Daemon (subsequent builds will be faster)
> Configure project :For Jackson Kotlin classes support please add "com.fasterxml.jackson.module:jackson-module-kotlin" to the classpath 
> Task :mlNewProject
Welcome to the new project wizard. Please answer the following questions to start a new project. Note that this will overwrite your current build.gradle and gradle.properties files, and backup copies of each will be made.

[ant:input] Application name: [myApp]
<--<-<--<-------------> 0% EXECUTING [20s]
[ant:input] Host to deploy to: [SOURCEHOST]
<-------------> 0% EXECUTING [30s]
<-------------> 0% EXECUTI[ant:input] MarkLogic admin username: [admin]
<-------------> 0% EXECUTING [34s]
[ant:input] MarkLogic admin password: [admin]
<-<---<--<-------------> 0% EXECUTING [39s]
[ant:input] REST API port (leave blank for no REST API server):
<---<-------------> 0% EXECUTING [50s]
[ant:input] Test REST API port (intended for running automated tests; leave blank for no server):
<-------------> 0% EXECUTING [1m 1s]
[ant:input] Do you want support for multiple environments?  ([y], n)
<-------------> 0% EXECUTING [1m 6s]
[ant:input] Do you want resource files for a content database and set of users/roles created? ([y], n)
<-------------> 0% EXECUTING [1m 22s]
Writing: gradle.properties
Making directory: ~/Migration/src/main/ml-config
Making directory: ~/Migration/src/main/ml-modules
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.6.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 1m 27s

1 actionable task: 1 executed

Once this build is successful, you will see a new directory structure created under the project directory.

1.2 Exporting the configuration of required resources:

Once the new project is created, export the required resources from the source host/cluster by creating a properties file (not in the project directory, but in some other directory), as suggested in the documentation, listing all the resources that need to be exported to the destination cluster. In that properties file, specify the names of the resources (databases, forests, app servers, etc.) using the keys mentioned below, with comma-delimited values:

For example, a sample properties file looks like below:

file.properties: 

cpfConfigs=my-domain-1 
databases=my-database1,my-database2
domains=my-domain-1,my-domain-2 
groups=my-group 
pipelines=my-pipeline-1 
privilegesExecute=my-privilege-1
privilegesUri=my-privilege-2
roles=my-role-1,my-role-2
servers=my-server-1,my-server-2
tasks=/path/to/task.xqy,/path/to/other/task.xqy
triggers=my-trigger-1,my-trigger-2 
users=user1,user2

Once the file is created, run the below: 

macpro-user1:Migration user1$ gradle -PpropertiesFile=~/file.properties mlExportResources

> Task :mlExportResources
Exporting resources to: ~/Migration/src/main/ml-config

Exported files:
~/Migration/src/main/ml-config/databases/Documents.json
.
.
.
~/Migration/src/main/ml-config/security/users/miguser.json
Export messages:
The 'forest' key was removed from each exported database so that databases can be deployed before forests.
The 'range' key was removed from each exported forest, as the forest cannot be deployed when its value is null.
The exported user files each have a default password in them, as the real password cannot be exported for security reasons.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.6.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 1s

1 actionable task: 1 executed

Once this build is successful, a directory structure is created under the project directory that includes the resources that have been exported and their config files.

With this step finished, the export of the required resources from the source cluster is complete. These configuration files are now ready to be imported into the new/destination cluster.

2. Importing Resources and the configuration on new/Destination host/Cluster:

To import the resource configuration onto the destination host/cluster, again create a new project and use the export that was created in step 1.2 Exporting the configuration of required resources. Once these configuration files are copied to the new project, make the necessary modifications to reflect the new cluster (such as hosts and other dependencies) and then deploy the configuration to the destination cluster.

2.1 Creating a new project for the import with the Destination Host/cluster details:

While creating this new project, provide the destination MarkLogic host, username, password, REST port, and multiple-environment details on the command line. Once the project creation is successful, verify the destination server details in the gradle.properties file. In the example steps below, the destination project directory is /ml10pro.

macpro-user1:ml10pro user1$ gradle mlNewProject
> Task :mlNewProject
Welcome to the new project wizard. Please answer the following questions to start a new project.

Note that this will overwrite your current build.gradle and gradle.properties files, and backup copies of each will be made.
[ant:input] Application name: [myApp]
<-------------> 0% EXECUTING [11s]
[ant:input] Host to deploy to: [destination host]

<-------------> 0% EXECUTING [25s]
[ant:input] MarkLogic admin username: [admin]

<-------------> 0% EXECUTING [28s]
[ant:input] MarkLogic admin password: [admin]

<-------------> 0% EXECUTING [36s]
[ant:input] REST API port (leave blank for no REST API server):

<-------------> 0% EXECUTING [41s]
[ant:input] Do you want support for multiple environments?  ([y], n)

<-------------> 0% EXECUTING [44s]
[ant:input] Do you want resource files for a content database and set of users/roles created? ([y], n)

<-------------> 0% EXECUTING [59s]
Writing: gradle.properties

Making directory: /Users/rgunupur/Downloads/ml10pro/src/main/ml-config
Making directory: /Users/rgunupur/Downloads/ml10pro/src/main/ml-modules
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.6.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 59s

1 actionable task: 1 executed

Once the project is created, you can observe the below directory structure created:

 

2.2 Copying the required configuration files from Source project to destination project:

In this step, copy the configuration files that were created by exporting the resource configuration from the source server in step “1.2 Exporting the configuration of required resources”.

For example, 

macpro-user1:ml10pro user1$ cp -R ~/Migration/src/main/ml-config  ~/ml10pro/src/main/ml-config

After copying, the directory structure in this project looks like below:

NOTE:

After copying the configuration files from source to destination, please review each configuration file and make the necessary changes; for example, the host details should be updated to the destination server host details. Similarly, perform any other changes that are needed for your requirements.

For example, in the ~/ml10pro/src/main/ml-config/forests/<database>/<forestname>.xml file you will see the entry:

"host" : "Sourceserver_IP_Adress",

Change the host details to reflect the destination host details. After the change, it should look like:

"host" : "Destination_IP_Adress",

Similarly, for each forest, define the host details of the specific node that is required. For example, if forest 1 has to be on node 1, define forest1.xml with:

"host" : "node1_host",

Likewise, any other configuration parameters that have to be updated must be updated in that specific resource file under the destination ml-config directory.
Best Practice:
As this involves modifying the configuration files, it is advisable to keep backups and maintain version control (such as GitHub or SVN) to track the modifications.
If there is a requirement to deploy the same configuration to multiple environments (such as PROD, QA, or TEST), all that is needed is a gradle.properties file for each environment where this configuration needs to be deployed. As explained in step 2.1 Creating a new project for the import with the Destination Host/cluster details, the property values for the different environments need to be provided while creating the project so that the gradle.properties files for the different environments are created.

2.3 Importing the configuration (Running mlDeploy):

In this step, import the configuration that was exported from the source and copied into this project. After making sure that the configuration files are all copied from the source and modified with the correct host details and other required changes, run the following:

macpro-user1:ml10pro user1$ gradle mlDeploy
> Task :mlDeleteModuleTimestampsFile

Module timestamps file /Users/rgunupur/Downloads/ml10pro/build/ml-javaclient-util/module-timestamps.properties does not exist, so not deleting
Use '--warning-mode all' to show the individual deprecation warnings.See https://docs.gradle.org/6.6.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 44s

3 actionable tasks: 3 executed

Once the build is successful, go to the admin console of the destination server and verify that all the required configurations have been imported from the source server.

 

Further read:

For more information, refer to our documentation and knowledge base articles:

https://help.marklogic.com/Knowledgebase/Article/View/686/0/transporting-configuration-to-a-new-cluster

https://help.marklogic.com/knowledgebase/article/View/alternatives-to-configuration-manager

https://github.com/marklogic-community/ml-gradle

https://github.com/marklogic-community/ml-gradle/wiki/Resource-reference

https://developer.marklogic.com/code/ml-gradle/

 

Introduction

This Knowledgebase article outlines the procedure to enable HTTPS on an AWS Elastic Load Balancer (ELB) using Route 53 or an external supplier as the DNS provider and with an AWS generated certificate.

The AWS Certificate Manager (ACM) automatically manages and renews the certificate and this certificate will be accepted by all current browsers without any security exceptions.

The downside is that you do need control over your Hosted DNS name entry - either through Route 53 or through another provider.

Prerequisites

  1. MarkLogic AWS Cluster
  2. An AWS Route 53 hosted Domain or similar externally hosted Domain; the procedure described in this article assumes that Route 53 is being used, however where possible we have tried to detail the changes needed and these should also be applicable for another external DNS provider.

Procedure

  1. Click on your hostname in Route 53 to edit it

  2. Create a new Alias Record Set to point to your Elastic Load Balancer.

  3. In the Record Set entry on the right hand side, enter an Alias name for your ELB host, select Alias and from the Alias Target select the ELB load balancer to use, then click the Create button to update the Route 53 entry.

  4. It can take a little while for AWS to propagate the DNS update throughout the network, but once it is available it is worth checking that you are able to reach your MarkLogic cluster using the new address, e.g.

  5. Once the Route 53 entry is updated and available you will need to request a new certificate through ACM; if you have other certificates already in ACM you can select Request a certificate

Otherwise select Get Started with Provision Certificates and select Request a public certificate

  6. Enter your required Certificate domain name and click Next:

Note: This should match your DNS Alias name entry created in Step 3.

In addition, you can also add additional records such as a "Wildcard" entry; this is particularly useful if you want to use the same certificate for multiple hostnames, e.g. if you have clusters identified by versions such as ml9.[yourdomain].com & ml10.[yourdomain].com

  7. Select DNS as the Validation Method and click "Review"

  8. Before confirming and proceeding, check the hostnames are correct, as certificates with invalid host names will not be usable.

  9. To complete validation, AWS will require you to add random CNAME entries to the DNS record to confirm that you are the owner. If you are using Route 53 this is as simple as selecting each entry in turn (numbers will vary depending on the number of Domain name entries you specified in step 6) and clicking "Create record in Route 53". Once all entries have been created click Continue

  10. If the update is successful a Success message is displayed

  11. If your DNS Hostname is provided by an external provider you will need to download the entries using the "Export DNS configuration to a file" link and provide this information to your DNS provider to make the necessary updates.

The file is a simple CSV file and specifies one or more CNAME entries that need to be created with the required name and values. Once the AWS DNS validation process picks up these changes have been made the certificate creation process will be completed automatically.

Domain Name,Record Name,Record Type,Record Value
marklogic.[yourdomain].com,_c3949adef7f9a61dd6865a13e65acfdb.marklogic.[yourdomain].com.,CNAME,_7ec4e5ce2cf31212e20ce68d9d0ab9fd.kirrbxfjtw.acm-validations.aws.
*.[yourdomain].com,_9b2138934ee9bbe8562af4c66591d2de.[yourdomain].com.,CNAME,_924153c45d53922d31f7d254a216aed0.kirrbxfjtw.acm-validations.aws.
  12. Once the Certificate has been validated by either of the methods in Steps 9 or 11, the certificate will be marked as Issued and be available for the Load Balancer to use.

  13. Configure the ELB for HTTPS and the new AWS generated Certificate.
  14. Edit the ELB Listeners and change the Cipher

  15. (Optional) For production environments it is recommended to allow TLSv1.2 only

  16. Next select the Certificate and repeat Steps 15 and 16 for each listener that you want to secure.

  17. From the ACM available certificates select the newly generated certificate for this domain and click Save

  18. Save the Listeners updates and ensure the update was successful.

  19. You should now be able to access your MarkLogic cluster securely over HTTPS using the AWS generated certificate.

Introduction

HAProxy (http://www.haproxy.org/) is a free, fast and reliable solution offering high availability, load balancing and proxying for TCP and HTTP-based applications.

MarkLogic 8 (8.0-8 and above) and MarkLogic 9 (9.0-4 and above) include improvements to allow you to use HAProxy to connect to MarkLogic Server.

MarkLogic Server supports balancing application requests using both the HAProxy TCP and HTTP balancing modes depending on the transaction mode being used by the MarkLogic application as detailed below:

  1. For single-statement auto-commit transactions running on MarkLogic version 8.0.7 and earlier or MarkLogic version 9.0.3 and earlier, only TCP mode balancing is supported. This is due to the fact that the SessionID cookie and transaction id (txid) are only generated as part of a multi-statement transaction.
  2. For multi-statement transactions or for single-statement auto-commit transactions running on MarkLogic version 8.0.8 and later or MarkLogic version 9.0.4 and later both TCP and HTTP balancing modes can be configured.

The Understanding Transactions in MarkLogic Server and Single vs. Multi-statement Transactions sections of the MarkLogic documentation should be referenced to determine whether your application is using single- or multi-statement transactions.

Note: Attempting to use HAProxy in HTTP mode with Single-statement transactions prior to MarkLogic versions 8.0.8 or 9.0.4 can lead to unpredictable results.

Example configurations

The following example configurations detail only the parameters relevant to enabling load balancing of a MarkLogic application; for details of all the parameters that can be used, please refer to the HAProxy documentation.

TCP mode balancing

The following configuration is an example of how to balance requests to a 3-node MarkLogic application using the "roundrobin" balance algorithm based on the source IP address. The health of each node is checked by a TCP probe to the application server every 1 second.

backend app
mode tcp
balance roundrobin
stick-table type ip size 200k expire 30m
stick on src
default-server inter 1s
server app1 ml-node-1:8012 check id 1
server app2 ml-node-2:8012 check id 2
server app3 ml-node-3:8012 check id 3

HTTP mode balancing

The following configuration is an example of how to balance requests to a 3-node MarkLogic application using the "roundrobin" balance algorithm based on the "SessionID" cookie inserted by the MarkLogic server.

The health of each node is checked by issuing an HTTP GET request to the MarkLogic health check port and checking for the "Healthy" response.

backend app
mode http
balance roundrobin
cookie SessionID prefix nocache
option httpchk GET / HTTP/1.1\r\nHost:\ monitoring\r\nConnection:\ close
http-check expect string Healthy
server app1 ml-node-1:8012 check port 7997 cookie app1
server app2 ml-node-2:8012 check port 7997 cookie app2
server app3 ml-node-3:8012 check port 7997 cookie app3

Summary

MarkLogic Server organizes Trusted Certificate Authorities (CA) by Organization Name.  Trusted Certificate Authorities are the issuers of digital certificates, which in turn are used to certify the public key on behalf of the named subject as given in the certificate.  These certificates are used in the authentication process by:

  1. A MarkLogic Application Server configured to use SSL (HTTPS).
  2. Any Web Client which is making a connection to a MarkLogic Application Server over HTTPS (in the case of SSL Client Authentication).

Example Scenarios

Consider the following example:

$openssl x509 -in CA.pem -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 18345409437988140316 (0xfe97fcaf8a61b51c)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, ST=CA, L=San Carlos, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA
        Validity
            Not Before: Nov 30 04:08:31 2015 GMT
            Not After : Nov 29 04:08:31 2020 GMT
        Subject: C=US, ST=CA, L=San Carlos, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA

In this example, judging from the Trusted CA Subject field, the CA certificate will be listed under the Organization name "MarkLogic Corporation" (O=MarkLogic Corporation) in MarkLogic's list of Certificate Authorities.

You can view the full list of currently configured Trusted Certificate Authorities by logging into the MarkLogic administration Application Server (on port 8001) and viewing the status page: Configure -> Security -> Certificate Authorities

Trusted CA Certificate without Organization name (O=)

In some cases, there are legitimate Trusted CA Certificates which do not contain any further information about the Organization responsible for the certificate.

The example below shows a sample self signed root CA (DemoLab CA) which highlights this scenario:

$openssl x509 -in DemoLabCA.pem  -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 12836463831212471403 (0xb22447d80f91b46b)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: CN=DemoLab CA
        Validity
            Not Before: Nov 30 05:23:13 2015 GMT
            Not After : Nov 29 05:23:13 2020 GMT
        Subject: CN=DemoLab CA

If this certificate were to be loaded into MarkLogic, no name would appear in the list of Certificate Authorities provided through the administration Application Server at Configure -> Security -> Certificate Authorities.

In the case of the above example, it would be difficult to use the certificate validated by DemoLab CA (and to use DemoLab CA as our Trusted Certificate Authority) as MarkLogic will only list certificates that are associated with an Organization.

Solution

To work around this issue, we can configure MarkLogic to use the certificate through some scripting in Query Console.

1) Loading the CA using Query Console

Start by using a call to pki:insert-trusted-certificates to load the Trusted CA into MarkLogic.  The sample Query Console code below demonstrates this process (Please ensure this query is executed against the Security database)
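A minimal sketch of this step is shown below; the file path is a placeholder:

xquery version "1.0-ml";
import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";

(: read the PEM file from the filesystem and insert it as a trusted certificate;
   the call returns an xs:unsignedLong certificate id :)
pki:insert-trusted-certificates(
  xdmp:document-get("/tmp/DemoLabCA.pem",
    <options xmlns="xdmp:document-get"><format>text</format></options>))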

Make a note of the id value returned by MarkLogic. It will be an unsigned long (xs:unsignedLong) that can be used later to retrieve that certificate.

2) Attach Trusted CA with "SSL Client Certificate Authorities" using Query Console

The next step is to associate the certificate that we just inserted from our filesystem (DemoLabCA.pem) with a given MarkLogic Application Server. Once this is done, any client connecting to that application server over SSL will be presented with the certificate, and DemoLab CA will be used to match the certificate using the Common Name value (Common Name eq "DemoLab CA").
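A sketch of this association using the Admin API (the group, application server name, and certificate id below are placeholders; substitute the id returned in step 1):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

let $config       := admin:get-configuration()
let $appserver-id := admin:appserver-get-id($config, admin:group-get-id($config, "Default"), "DemoAppServer")
let $cert-id      := xs:unsignedLong("12345678901234567890")  (: placeholder - use the id from step 1 :)
return admin:save-configuration(
  admin:appserver-set-ssl-client-certificate-authorities($config, $appserver-id, $cert-id))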

3) Verify the attached Trusted CA for SSL Client Certificate Authorities
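A sketch of the verification query (same placeholder names as above):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

let $config       := admin:get-configuration()
let $appserver-id := admin:appserver-get-id($config, admin:group-get-id($config, "Default"), "DemoAppServer")
return admin:appserver-get-ssl-client-certificate-authorities($config, $appserver-id)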

Executing the above code should return the same identifier (for the Trusted CA) as returned as result of the code executed in step 1. Additionally, we can see that our Application Server (DemoAppServer) is now configured to expect an SSL Client Certificate Authority signed by DemoLab CA.

Further Reading

Introduction

MarkLogic Server is engineered to scale out horizontally by easily adding forests and nodes. Be aware, however, that when adding resources horizontally, you may also be introducing additional demand on the underlying resources.

Details

On a single node, you will see some performance improvement in adding additional forests, due to increased parallelization. There is a point of diminishing returns, though, where the number of forests can overwhelm the available resources such as CPU, RAM, or I/O bandwidth. Internal MarkLogic research (as of April 2014) shows the sweet spot to be around six forests per host (assuming modern hardware). Note that there is a hard limit of 1024 primary forests per database, and it is a general recommendation that the total number of forests should not grow beyond 1024 per cluster.

At cluster level, you should see performance improvements in adding additional hosts, but attention should be paid to any potentially shared resources. For example, since resources such as CPU, RAM, and I/O bandwidth would now be split across multiple nodes, overall performance is likely to decrease if additional nodes are provisioned virtually on a single underlying server. Similarly, when adding additional nodes to the same underlying SAN storage, you'll want to pay careful attention to making sure there's enough I/O bandwidth to accommodate the number of nodes you want to connect.

More generally, additional capacity above a bottleneck generally exacerbates performance issues. If you find your performance has actually decreased after horizontally scaling out some part of your stack, it is likely that a part of your infrastructure below the part at which you made changes is being overwhelmed by the additional demand introduced by the added capacity.

Summary

MarkLogic Application Servers will keep a connection open after completing and responding to a request, waiting for another new request, until the Keep Alive timeout expires. However, there is an exception scenario where the connection will close regardless of timeout settings when the content is larger than 1 MB. This article is intended to provide further insight into connection close with respect to payload size.

HTTP Header

Content-Length

In general, Application Servers communicating in HTTP send the Content-Length header as part of their response HTTP Headers to indicate how many bytes of data the client application should expect to receive. For example

HTTP/1.1 200 OK
Content-type: application/sparql-results+json; charset=UTF-8
Server: MarkLogic
Content-Length: 1264
Connection: Keep-Alive
Keep-Alive: timeout=5

This requires Application Servers to know the length of the entire response data before the very first bytes (the response HTTP headers) are put onto the wire. For small amounts of data, the time to calculate the content length is fast; for large amounts of content, the calculation may be time consuming, with the extreme case being that the client finds the server unresponsive due to the delay in calculating the entire response length. Additionally, the server may need to bring the entire content into a memory buffer, putting further burden on server resources.

Chunked-encoding

To allow servers to begin transmitting dynamically-generated content before knowing the total size of that content, HTTP 1.1 supports chunked encoding. This technique is widely used in music & video streaming and other industries. Chunked encoding eliminates the need to know the entire content length before sending a portion of the data, thus making the server look more responsive.

At the time of this writing, MarkLogic Server (v8.0-6 and earlier releases) does not support chunked encoding. However, do look for this feature in future releases of MarkLogic Server.

Connection Close

In MarkLogic Server v7 and v8, MarkLogic Server closes the connection after transmitting content greater than 1 MB, which allows MarkLogic to avoid calculating the content length in advance. The client will not see a Content-Length header for larger (>1 MB) content in the HTTP response from MarkLogic. Instead it will receive a Connection: close header in the HTTP response. After sending the entire content, MarkLogic Server will terminate the connection to indicate to the client that the end of the content has been reached.

Closing the existing connection for content larger than 1 MB is an exception to the Keep-Alive configuration. This may result in unexpected behavior on clients that rely on MarkLogic Server respecting the Keep-Alive configuration, so this behavior should be accounted for when designing the client application's connection pool.

Client applications may have to send a TCP SYN again to establish a new connection for the next request, which adds the overhead of a TCP 3-way handshake. However, in the context of transferring a larger payload (>1 MB), where many more round trips are added to the overall communication, the overhead of the TCP 3-way handshake is very small.

Further Reading

Summary

CSV files are a very common data exchange format, often used as an export format for spreadsheets, databases and other applications. Depending on the application, you might be able to change the delimiter character to a # (hash), * (asterisk), etc. A common delimiter is the tab character. Content Pump supports reading and loading such CSV files.

Detail

The Content Pump -delimiter option defines which delimiter will be used to split the columns. Defining a tab as the value for the delimiter option on the command line isn't straightforward.

Loading tab-delimited data files with Content Pump can result in an error message like the following:

mlcp>bin/mlcp.sh IMPORT -host localhost -port 9000 -username admin -password secret -input_file_path sample.csv -input_file_type delimited_text -delimiter '    ' -mode local
13/08/21 15:10:20 ERROR contentpump.ContentPump: Error parsing command arguments: 
13/08/21 15:10:20 ERROR contentpump.ContentPump: Missing argument for option: delimiter
usage: IMPORT [-aggregate_record_element <QName>]
... 

Depending on the command line shell, a tab needs to be escaped so that it is understood by the shell:

On bash shell, this should work: -delimiter $'\t'
On Bourne shell, this should work: -delimiter 'Ctrl+V followed by tab' 
An alternative is to use: -delimiter \x09 

If none of these work, another approach is to use the -options_file /path/to/options-file parameter. The options file can contain all of the same parameters as the command line does. The benefit of using an options file is that the command line is simpler and characters are interpreted as intended. The options file contains multiple lines: the first line is always the action, such as IMPORT or EXPORT, followed by pairs of lines where the first line of each pair is the option name and the second is the value for that option.

A sample could look like the following:

IMPORT
-host
localhost
-port
9000
-username
admin
-password
secret
-input_file_path
/path/to/sample.csv
-delimiter
' '
-input_file_type
delimited_text


Make sure the file is saved in UTF-8 format to avoid any parsing problems. To define a tab as delimiter, place a real tab between single quotes (i.e. '<tab>')

To use this option file with mlcp execute the following command:

Linux, Mac, Solaris:

mlcp>bin/mlcp.sh -options_file /path/to/sample.options

Windows:

mlcp>bin/mlcp.bat -options_file /path/to/sample.options

The options file can take any parameter which mlcp understands. It is important that the action command is defined on the first line. It is also possible to use both command line parameters and the options file; command line parameters take precedence over those defined in the options file.

Summary

There are sometimes circumstances where the MarkLogic data directory owner can be changed. This can create problems where MarkLogic Server is unable to read and/or write its own files, but it is easily corrected.

MarkLogic Server user

There are sometimes circumstances where the MarkLogic data directory owner can be changed; this can create problems where MarkLogic Server is unable to read and/or write its own files.

The default location for the data directory on Linux is /var/opt/MarkLogic and the default owner is daemon.

If you are using a non-default (non-daemon) user to run MarkLogic, for example mlogic, you would usually have 

    export MARKLOGIC_USER=mlogic

in 

    /etc/marklogic.conf 

Correct the data directory ownership

If the file ownership is incorrect, the way forward is to change the ownership back to the correct user.  For example, if using the default user daemon:

1.  Stop MarkLogic Server.

2.  Make sure that the user you are using is correct and available on this machine.

3.  Change the ownership of all the MarkLogic files (by default /var/opt/MarkLogic and any/all forests for this node) to daemon.  The change needs to be made recursively below the directory to include all files.  Assuming all nodes in the cluster run as daemon, you can use another unaffected node as a check.  You may need to use root/sudo permissions to change owner.  For example:

chown -R daemon:daemon /var/opt/MarkLogic

4.  Start MarkLogic Server.  It should now come up as the correct user and able to manage its files.

References

Introduction:

MarkLogic Server allows you to set up an alerting application to notify users when new content is available that matches a predefined query. This can be achieved through the Alerting API with the Content Processing Framework (CPF). CPF is designed to keep state for documents, so it is easy to use CPF to keep track of when a document in a particular scope is created or updated, and then perform some action on that document. However, although alerting works for document updates and inserts, it does not occur for document deletes. You will have to create a custom CPF pipeline to catch the delete through an appropriate status transition.

Details

To achieve alerting for document delete, you will have to write your own custom pipeline with status transition to handle deletes. For example:

<status-transition>
   <annotation>custom delete action</annotation>
   <status>deleted</status>
   <priority>5000</priority>
   <always>true</always>
   <default-action>
       <module>/custom-delete-action.xqy</module>
   </default-action>
</status-transition>

The higher 'priority' value and 'always' = true indicate that the custom pipeline takes precedence over the default Status Change Handling pipeline for document deletes. In the action module, you can then write your custom alerting code.
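As a rough sketch, the action module might follow the usual CPF action pattern; the alerting logic shown is only a placeholder xdmp:log call and should be replaced with your own Alerting API calls:

xquery version "1.0-ml";
(: /custom-delete-action.xqy - invoked by the custom status transition above :)
import module namespace cpf = "http://marklogic.com/cpf" at "/MarkLogic/cpf/cpf.xqy";

declare variable $cpf:document-uri as xs:string external;
declare variable $cpf:transition as node() external;

if (cpf:check-transition($cpf:document-uri, $cpf:transition)) then (
  (: placeholder alerting logic :)
  xdmp:log(fn:concat("Alert: document deleted: ", $cpf:document-uri)),
  cpf:success($cpf:document-uri, $cpf:transition, ())
)
else ()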

Note: By default, when a document is deleted, the on-delete pre-commit trigger is fired and it calls the action in the Status Change Handling pipeline (if enabled) for the ‘delete’ status transition. It is recommended that you do not modify this pipeline, as doing so can cause compatibility problems in future upgrades and releases of MarkLogic Server.

Summary

Packer from HashiCorp is a provisioning tool, allowing for the automated creation of machine images, extending the ability to manage infrastructure to machine images. Packer supports a number of different image types including AWS, Azure, Docker, VirtualBox and VMWare.

Packer can be used to create a customized MarkLogic Amazon Machine Image (AMI) which can then be deployed to AWS and used in a Cluster. We recommend using the official MarkLogic AMIs whenever possible, and making the necessary customizations to the official images. This ensures that MarkLogic Support is able to quickly diagnose any issues that may occur, as well as reducing the risk of running MarkLogic in a way that is not fully supported.

The KB article, Customizing MarkLogic with Packer and Terraform, covers the process of customizing the official MarkLogic AMI using Packer.

Setting Up Packer

For the purpose of this example, I will assume that you have already installed the AWS CLI, with the correct credentials, and you have installed Packer.

Packer Templates

A Packer template is a JSON configuration file that is used to define the image that we want to build. Templates have a number of keys available for defining the machine image, but the most commonly used ones are builders, provisioners and post-processors.

  • builders are responsible for creating the images for various platforms.
  • provisioners is the section used to install and configure software running on machines before turning them into images.
  • post-processors are actions applied to the images after they are created.

Creating a Template

For our example, we are going to build from the official Amazon Linux 2 AMI, where we will install the required prerequisite packages, install MarkLogic, and apply some customizations before creating a new image.

Defining Variables

Variables help make the build more flexible, so we will utilize a separate variables file, marklogic_vars.json, to define parts of our build.

{
  "vpc_region": "us-east-1",
  "vpc_id": "vpc-06d3506111cea30d0",
  "vpc_public_sn_id": "subnet-03343e69ae5bed127",
  "vpc_public_sg_id": "sg-07693eb077acb8635",
  "instance_type": "t3.large",
  "ssh_username": "ec2-user",
  "ami_filter": "amzn2-ami-hvm-2.*-ebs",
  "ami_owner": "amazon",
  "binary_source": "./",
  "binary_dest": "/tmp/",
  "marklogic_binary": "MarkLogic-10.0-4.2.x86_64.rpm"
}

Here we've identified the instance details so our image can be launched, as well as the filter values, ami_filter and ami_owner, that will help us retrieve the correct base image for our AMI. We are also identifying the name of the MarkLogic binary, along with some path details on where to find it locally, and where to place it on the remote host.

Creating Our Template

Now that we have some of the specific build details defined, we can create our template, marklogic_ami.json. In this case we are going to use the build and provisioners keys in our build.

{
    "builders": [
      {
        "type": "amazon-ebs",
        "region": "{{user `vpc_region`}}",
        "vpc_id": "{{user `vpc_id`}}",
        "subnet_id": "{{user `vpc_public_sn_id`}}",
        "associate_public_ip_address": true,
        "security_group_id": "{{user `vpc_public_sg_id`}}",
        "source_ami_filter": {
          "filters": {
          "virtualization-type": "hvm",
          "name": "{{user `ami_filter`}}",
          "root-device-type": "ebs"
          },
          "owners": ["{{user `ami_owner`}}"],
          "most_recent": true
        },
        "instance_type": "{{user `instance_type`}}",
        "ssh_username": "{{user `ssh_username`}}",
        "ami_name": "ml-{{isotime \"2006-01-02-1504\"}}",
        "tags": {
          "Name": "ml-packer"
        }
      }
    ],
    "provisioners": [
      {
        "type": "shell",
        "script": "./marklogicInit.sh"
      },
      {
        "destination": "{{user `binary_dest`}}",
        "source": "{{user `binary_source`}}{{user `marklogic_binary`}}",
        "type": "file"
      },
      {
        "type": "shell",
        "inline": [ "sudo yum -y install /tmp/{{user `marklogic_binary`}}" ]
      }
    ]
  }

In the build section we have defined the network and security group configurations and the source AMI details. We have also defined the naming convention (ml-YYYY-MM-DD-TTTT) for our new AMI with ami_name and added a tag, ml-packer. Both of those will make it easier to find our AMI when it comes time to deploy it.

Provisioners

In our example, we are using the shell provisioner to execute a script against the machine, the file provisioner to copy the MarkLogic binary file to the machine, and the shell provisioner to install the MarkLogic binary, all of which will be run prior to creating the image. There are also provisioners available for Ansible, Salt, Puppet, Chef, and PowerShell, among others.

Provisioning Script

For our custom image, we've determined that we need to install Git, create a symbolic link that MarkLogic needs on Amazon Linux 2, and set up /etc/marklogic.conf to disable the MarkLogic Managed Cluster feature, all of which we will do inside a script. We've named the script marklogicInit.sh, and it is stored in the same directory as our Packer template.

#!/bin/bash -x
echo "**** Starting setup.sh ****"
echo "**** Creating LSB symbolic link ****"
sudo ln -s /etc/system-lsb /etc/redhat-lsb
echo "**** Installing Git ****"
sudo yum install -y git
echo "**** Setting Up /etc/marklogic.conf ****"
echo "export MARKLOGIC_MANAGED_NODE=0" >> /tmp/marklogic.conf
sudo cp /tmp/marklogic.conf /etc/
echo "**** Finishing setup.sh ****"

Executing Our Build

Now that we've completed setting up our build, it's time to use Packer to create the image.

packer build -debug -var-file=marklogic_vars.json marklogic_ami.json

Here you can see that we are telling Packer to do a build using marklogic_ami.json and referencing our variables file with the -var-file flag. We've also added the -debug flag which will disable parallelism and enable debug mode. In debug mode, Packer will stop after each step and prompt you to hit Enter to go to the next step.

The last part of the build output will print out the details of our new image:

Wrapping Up

We have now created a customized MarkLogic AMI using Packer, which can be used to deploy a self-managed cluster.

Introduction

If you're looking at the MarkLogic Admin UI on port 8001, you may have noticed that the status page for a given database displays the last backup dateTime for that database.

We have been asked in the past how this gets computed so the same check can be performed using your own code.

This Knowledgebase article will show examples that utilise XQuery to get this information and will explore the possibility of retrieving it using the MarkLogic ReST API.

XQuery: How does the code work?

The simple answer is in the forest status for each of the forests in the database (note these values only appear if you have created a backup already).  For the sake of these examples, let's say we have a database (called "test") which contains 12 forests (test-1 to test-12).  We can get the backup status for these using a call to our ReST API:

http://localhost:8002/manage/LATEST/forests/test-1?view=status&format=html

In the results returned, you should see something like this:

last-backup : 2016-02-12T12:30:39.916Z datetime
last-incr-backup : 2016-02-12T12:37:29.085Z datetime

In generating that status page, what the MarkLogic code does is to create an aggregate: a database doesn't contain documents in MarkLogic; it contains forests and those forests contain documents.

Continuing the example above (with a database called "test" containing 12 forests) if I run the following:
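For example, a sketch using the database name from the example above:

xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

for $forest-id in xdmp:database-forests(xdmp:database("test"))
return xdmp:forest-status($forest-id)/fs:forest-name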

This will return the forest status(es) for all forests in the database "test" and return the forest names using XPath, so in this case, we would see:

<forest-name xmlns="http://marklogic.com/xdmp/status/forest">test-1</forest-name>
[...]
<forest-name xmlns="http://marklogic.com/xdmp/status/forest">test-12</forest-name>

Our admin UI is interrogating each forest in turn for that database and finding out the metrics for the last backup.  So to put that into context, if we ran the following:
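For example, a sketch of that query:

xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

for $forest-id in xdmp:database-forests(xdmp:database("test"))
return xdmp:forest-status($forest-id)/fs:last-backup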

This gives us:

<last-backup xmlns="http://marklogic.com/xdmp/status/forest">2016-02-12T12:30:39.946Z</last-backup>
[...]
<last-backup xmlns="http://marklogic.com/xdmp/status/forest">2016-02-12T12:30:39.925Z</last-backup>

The code (or the status report) doesn't want values for all 12 forests; it just wants the time the last forest completed the backup (because that's the real time the backup completed), so our code runs a call to fn:max:
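A sketch of that aggregate:

xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

fn:max(
  for $forest-id in xdmp:database-forests(xdmp:database("test"))
  return xdmp:forest-status($forest-id)/fs:last-backup/xs:dateTime(.)
)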

Which gives us the max value (as these are all xs:dateTimes, it's finding the most recent date), which in the case of this example is:

2016-02-12T12:30:39.993Z

The same is true for the last incremental backup (note that all we're changing here is the XPath to get to the correct element):
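For example (again a sketch):

xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

for $forest-id in xdmp:database-forests(xdmp:database("test"))
return xdmp:forest-status($forest-id)/fs:last-incr-backup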

So we can get the max value for this by getting the most recent time across all forests:
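Along these lines:

xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

fn:max(
  for $forest-id in xdmp:database-forests(xdmp:database("test"))
  return xdmp:forest-status($forest-id)/fs:last-incr-backup/xs:dateTime(.)
)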

This would give us 2016-02-12T12:37:29.161Z

Using the ReST API

The ReST API also allows you to get this information but you'd need to jump through a few hoops to get to it; the ReST API status for a given database would give you the names of all the forests attached to that database:

http://localhost:8002/manage/LATEST/databases/test

And from there you could GET the information for all of those forests:

http://localhost:8002/manage/LATEST/forests/test-1?view=status&format=html
[...]
http://localhost:8002/manage/LATEST/forests/test-12?view=status&format=html

Once you'd got all those values, you could do what MarkLogic's admin code does and get the max values for them - although at this stage, it might make more sense to write a custom endpoint that returns this information, something like:
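A sketch of such an endpoint (a main module; the element names in the response are illustrative only, and the request field name matches the db parameter shown below):

xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

let $db := xdmp:get-request-field("db")
let $statuses :=
  for $forest-id in xdmp:database-forests(xdmp:database($db))
  return xdmp:forest-status($forest-id)
return
  <backup-status database="{$db}">
    <last-backup>{fn:max($statuses/fs:last-backup/xs:dateTime(.))}</last-backup>
    <last-incr-backup>{fn:max($statuses/fs:last-incr-backup/xs:dateTime(.))}</last-incr-backup>
  </backup-status>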

Where you could make a call to that module to get the aggregates (e.g.):

http://[server]:[port]/[modulename.xqy]?db=test

This would return the backup status for whichever database name is passed in as the parameter.

 

Problem:

When searching for matches using OR'ed word-queries where there are overlapping matches (i.e. one query contains the text of another query), the results of a cts:highlight query may not be as desired.

 

For example:

let $p := <p>From the memoirs of an accomplished artist</p>
let $query :=
  cts:or-query((
    cts:word-query("accomplished artist"),
    cts:word-query("memoirs of an accomplished artist")
  ))
return cts:highlight($p, $query, <m>{$cts:text}</m>)

 

The desired outcome of this would be:

    <p>From the <m>memoirs of an accomplished artist</m></p>

Whereas the actual results are:

    <p>From the <m>memoirs of an </m> <m>accomplished artist</m></p>

 

This behavior is by design and the results are expected: cts:highlight breaks up overlapping areas into separate matches.

The cts:highlight built-in variables – $cts:queries and $cts:action – help in understanding how this works, as well as in working around this problem.

  $cts:queries --> returns the matching queries for each of the matched texts.

  $cts:action --> can be used with xdmp:set to specify what should happen next

  • "continue" - (default) Walk the next match. If there are no more matches, return all evaluation results.
  • "skip" - Skip walking any more matches and return all evaluation results
  • "break" - Stop walking matches and return all evaluation results

For example, replacing the return statement in the original query with the following:

return cts:highlight($p, $query,
  <m>{$cts:text,
      <number-of-matches>{count($cts:queries)}</number-of-matches>,
      <matched-by>{$cts:queries}</matched-by>}</m>)

 

==>

<p>From the
  <m>memoirs of an
    <number-of-matches>1</number-of-matches>
    <matched-by>
      <cts:word-query xmlns:cts="http://marklogic.com/cts">
        <cts:text xml:lang="en">memoirs of an accomplished artist</cts:text>
      </cts:word-query>
    </matched-by>
  </m>
  <m>accomplished artist
    <number-of-matches>2</number-of-matches>
    <matched-by>
      <cts:word-query xmlns:cts="http://marklogic.com/cts">
        <cts:text xml:lang="en">memoirs of an accomplished artist</cts:text>
      </cts:word-query>
      <cts:word-query xmlns:cts="http://marklogic.com/cts">
        <cts:text xml:lang="en">accomplished artist</cts:text>
      </cts:word-query>
    </matched-by>
  </m>
</p>

 

These results give us a better understanding of how the text is being matched. We can see that " accomplished artist" is matched by both the word-queries 'accomplished artist' and 'memoirs of an accomplished artist'; hence the results of cts:highlight seem different.

To work around this problem, we can insert a small piece of code: 

 

let $p := <p>From the memoirs of an accomplished artist</p>
let $query :=
  cts:or-query((
    cts:word-query("accomplished artist"),
    cts:word-query("memoirs of an accomplished artist")
  ))
return cts:highlight($p, $query,
  ( if (count($cts:queries) gt 1) then xdmp:set($cts:action, "continue")
    else
      ( let $matched-text := <x>{$cts:queries}</x>/cts:word-query/cts:text/data(.)
        return <m>{$matched-text}</m> )
  ))

 

==>

 

<p>From the <m>memoirs of an accomplished artist</m></p>

 

 

Please note that this solution relies on assumptions about what's inside the or-query, but this example could be modified to handle other overlapping situations.

 

   

 




      Summary

      Packer from HashiCorp is an open source provisioning tool, allowing for the automated creation of machine images, extending the ability to manage infrastructure to machine images. Packer supports a number of different image types including AWS, Azure, Docker, VirtualBox and VMWare.

      These powerful tools can be used together to deploy a MarkLogic Cluster to AWS using the MarkLogic CloudFormation Template, using a customized Amazon Machine Image (AMI). The MarkLogic CloudFormation Template is the preferred method recommended by MarkLogic for building out MarkLogic clusters within AWS. By default the MarkLogic CloudFormation Template uses the official MarkLogic AMIs.

      While this guide will cover some portions of Terraform, the primary focus will be using Packer to customize an official MarkLogic AMI. For more detailed information on Terraform, we recommend reading Deploying MarkLogic to AWS with Terraform, which also includes the example files referenced later in this article.

      Setting Up Packer

      For the purpose of this example, I will assume that you have already installed and configured the AWS CLI with the correct credentials, and that you have installed Packer.

      Packer Templates

      A Packer template is a JSON configuration file that is used to define the image that we want to build. Templates have a number of keys available for defining the machine image, but the most commonly used ones are builders, provisioners and post-processors.

      • builders are responsible for creating the images for various platforms.
      • provisioners is the section used to install and configure software running on machines before turning them into images.
      • post-processors are actions applied to the images after they are created.

      Creating a Template

      For our example, we are going to take the official MarkLogic AMI and apply some customizations before creating a new image.

      Defining Variables

      Variables help make the build more flexible, so we will utilize a separate variables file, vars.json, to define parts of our build.

      {
      "vpc_region": "us-east-1",
      "vpc_id": "vpc-06d3506111cea30d0",
      "vpc_public_sn_id": "subnet-03343e69ae5bed127",
      "vpc_public_sg_id": "sg-07693eb077acb8635",
      "ami_filter": "release-MarkLogic-10*",
      "ami_owner": "679593333241",
      "instance_type": "t3.large",
      "ssh_username": "ec2-user"
      }

      Creating Our Template

      Now that we have some of the specific build details defined, we can create our template, base_ami.json. In this case we are going to use the builders and provisioners keys in our build.

      {
        "builders": [
          {
            "type": "amazon-ebs",
            "region": "{{user `vpc_region`}}",
            "vpc_id": "{{user `vpc_id`}}",
            "subnet_id": "{{user `vpc_public_sn_id`}}",
            "associate_public_ip_address": true,
            "security_group_id": "{{user `vpc_public_sg_id`}}",
            "source_ami_filter": {
              "filters": {
              "virtualization-type": "hvm",
              "name": "{{user `ami_filter}}",
              "root-device-type": "ebs"
              },
              "owners": ["{{user `ami_owner`}}"],
              "most_recent": true
            },
            "instance_type": "{{user `instance_type`}}",
            "ssh_username": "{{user `ssh_username`}}",
            "ami_name": "ml-{{isotime \"2006-01-02-1504\"}}",
            "tags": {
              "Name": "ml-packer"
            }
          }
      ],
        "provisioners": [
          {
            "type": "shell",
            "script": "./baseInit.sh"
           },
          {
            "destination": "/tmp/",
            "source": "./marklogic.conf",
            "type": "file"
          },
          {
            "type": "shell",
            "inline": [ "sudo mv /tmp/marklogic.conf /etc/marklogic.conf" ]
          }
        ]
      }

      In the builders section we have defined the network and security group configurations and the source AMI details. We have also defined the naming convention (ml-YYYY-MM-DD-TTTT) for our new AMI with ami_name and added a tag, ml-packer. Both of those will make it easier to find our AMI when it is time to use it with Terraform.

      Provisioners

      In our example, we are using the shell provisioner to execute a script against the machine, the file provisioner to copy the marklogic.conf file to the machine, and the shell provisioner to move the file to /etc/, all of which will be run prior to creating the image. There are also provisioners available for Ansible, Salt, Puppet, Chef, and PowerShell, among others.

      Provisioning Script

      For our custom image, we've determined that we need an additional piece of software installed, which we will do inside a script. We've named the script baseInit.sh, and it is stored in the same directory as our packer template.

      #!/bin/bash
      echo "**** Starting setup.sh ****"
      echo "Installing Git"
      sudo yum install -y git
      echo "**** Finishing setup.sh ****"

      Executing Our Build

      Now that we've completed setting up our build, it's time to use packer to create the image.

      packer build -debug -var-file=vars.json base_ami.json

      Here you can see that we are telling packer to do a build using base_ami.json and referencing our variables file with the -var-file flag. We've also added the -debug flag which will disable parallelism and enable debug mode. In debug mode, packer will stop after each step and prompt you to hit Enter to go to the next step.

      The last part of the build output will print out the details of our new image:

      ==> Builds finished. The artifacts of successful builds are:
      --> amazon-ebs: AMIs were created:
      us-east-1: ami-0100....

      Terraform and the MarkLogic CloudFormation Template

      At this point we have our image and want to use it when deploying the MarkLogic CloudFormation Template. Unfortunately there is no simple way to do this, as the MarkLogic CloudFormation Template does not have the option to specify a custom AMI. Fortunately Terraform has some functions available that we can use to make the changes to the Template.

      Variables

      First we want to add a couple entries to our existing Terraform variables file.

      variable "ami_tag" {
        type = string
        default = "ml-packer"
      }

      variable "search_string" {
        type = string
        default = "ImageId: "
      }

      The first variable, ami_tag, is the tag we added to the AMI when it was built. The second variable, search_string, is described in the Updates to Terraform Root Module section below.

      Data Source

      To retrieve the AMI, we need to define a data source. In this case it will be an aws_ami data source. We are going to call the file data-source.tf.

      data "aws_ami" "ml_ami" {
        filter {
          name = "state"
          values = ["available"]
        }

        filter {
          name = "tag:Name"
          values = ["${var.ami_tag}"]
        }
        owners = ["self"]
        most_recent = true
      }

      So we are filtering the available AMIs, only looking at ones that are owned by our own account (self), tagged with the value that we defined in our variables file, and then if more than one AMI is returned, using the most recent.

      Updates to Terraform Root Module

      Now we are ready to make a couple of updates to our Terraform root module file to integrate the new AMI into our deployment. In our last example, we used the MarkLogic CloudFormation template from its S3 bucket. For this deployment, we are going to use a local copy of the template, mlcluster-template.yaml.

      Replace the template_url line with the following line:

      template_body = replace(file("./mlcluster-template.yaml"), "/${var.search_string}.*/","${var.search_string} ${data.aws_ami.ml_ami.id}")

      When we updated our Terraform variables file, we created the variable search_string. In the MarkLogic CloudFormation Template, the value for the Image ID is determined by the region and by whether you are running the Essential Enterprise or Bring Your Own License version of MarkLogic Server. Here we use a regular expression with the replace function to update that line so it references the AMI we just created with Packer, which we retrieved with the data source above.

      Deploying with Terraform

      Now we are ready to run Terraform to deploy our cluster. First we want to double check that the template looks correct before we attempt to create the CloudFormation stack. The output of terraform plan will show the CloudFormation template that will be deployed. Check the output to make sure that the value for ImageId shows our desired AMI.

      Once we have confirmed our new AMI is being referenced, we can then run terraform apply to create a new stack using the template. This can be validated by opening a command line on one of the new hosts and checking whether Git is installed and whether /etc/marklogic.conf exists.

      Wrapping Up

      At this point, we have now customized the official MarkLogic AMI to create our own AMI using Packer. We have then used Terraform to update the MarkLogic CloudFormation Template and to deploy a CloudFormation stack based on the updated template.

      Summary

      Long URI prefix may lead to imbalance in data distribution among the forests. 

      Observation

      The database assignment policy is set to 'bucket'. The rebalancer is enabled and no fragments are pending rebalancing; however, data is imbalanced across the forests associated with the database. A few forests have a higher number of fragments than the other forests in the database.

      Root cause

      With the bucket assignment policy, the document URI is hashed to map it to a specific bucket. The bucket policy algorithm maps a document's URI to one of 16K "buckets," with each bucket being associated with a forest. A table mapping buckets to forests is stored in memory for fast assignment.

      The bucket algorithm does not consider the whole URI when computing the hash that determines the bucket; the hash relies largely on the initial characters of the URI.

      If document URIs share a long common prefix, they will tend to hash to the same value, and therefore the same bucket, even when their suffixes differ. As a result, data distribution becomes skewed when there is a long common prefix.

      Analysis

      To confirm whether there is an uneven number of fragments across the forests in a database, you can run the query below, which returns 100 sample document URIs from each forest. You can then review whether the URIs in the forests with higher fragment counts share a common prefix.

      xquery version "1.0-ml";

      for $i in xdmp:database-forests(xdmp:database('<dbname>'))
          let $uri := for $j in cts:uris((),(),(),(), $i)[1 to 100]
                      return <uri>{$j}</uri>
      return <forests><forest>{$i}</forest><uris>{$uri}</uris></forests>

      Recommendation

      We recommend that document URIs not have long names with common prefixes. Certain common components of a document URI can be moved into collections instead.

      Example uri -  /Prime/InternationalTradeDay/Activity/AccountId/ABC0001/BusinessDate/2021-06-14/CurrencyCode/USD/ID/ABC0001-XYZ-123.json

      Can become -  /ABC0001-XYZ-123.json, with the collections "USD" and "Prime", and a date element in the document with the value "2021-06-14".

      The above is just an example; the suggestion is to adopt a URI naming pattern that avoids long common prefixes, or to capture common values as collections.

      You can use the xdmp:document-assign built-in to verify whether URIs are distributed per the bucket algorithm, as in the sketch below.

      https://docs.marklogic.com/xdmp:document-assign
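
      The following is a minimal sketch (not taken from the article) that checks how sample URIs sharing a long common prefix would be assigned across a hypothetical 4-forest database; the URI prefix, the forest count, and the third "bucket" policy argument are assumptions based on the xdmp:document-assign documentation linked above.

      xquery version "1.0-ml";
      (: Sketch: count how many sample URIs with a long common prefix land on each
         forest index. The URI prefix, the forest count, and the "bucket" policy
         argument are assumptions for illustration. :)
      let $forest-count := 4
      let $uris :=
        for $i in 1 to 1000
        return fn:concat("/Prime/InternationalTradeDay/Activity/ABC0001-XYZ-", $i, ".json")
      let $assignments :=
        for $uri in $uris
        return xdmp:document-assign($uri, $forest-count, "bucket")
      for $slot in fn:distinct-values($assignments)
      order by $slot
      return fn:concat("forest index ", $slot, ": ", fn:count($assignments[. eq $slot]), " documents")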


      What is Data Hub?

      The MarkLogic Data Hub is an open-source software interface that works to:

      1. ingest data from multiple sources
      2. harmonize that data
      3. master that data
      4. then search and analyze that data

      It runs on MarkLogic Server, and together, they provide a unified platform for mission-critical use cases.

      Documentation:

      How do I install Data Hub?

      Please see the referenced documentation Install Data Hub

      What software is required for Data Hub installation?

      Documentation:

      What is MarkLogic Data Hub Central?

      Hub Central is the Data Hub graphical user interface

      Documentation:

      What are the ways to ingest data in Data Hub?

      • Hub Central (note that Quick Start has been deprecated since Data Hub 5.5)
      • Data Hub Gradle Plugin
      • Data Hub Client JAR
      • Data Hub Java APIs
      • Data Hub REST APIs
      • MarkLogic Content Pump (MLCP)

      Documentation:

      What is the recommended batch size for matching steps?

      • The best batch size for a matching step could vary due to the average number of matches expected
      • Larger average number of matches should use smaller batch sizes
      • A batch size of 100 is the recommended starting point

      Documentation:

      What is the recommended batch size for merging steps?

      The merge batch size should always be 1

      Documentation:

      How do I kill a long running flow in Data Hub?

      At the moment, the feature to stop/kill a long running flow in DataHub isn't available.

      If you encounter this issue, please provide support with the following information to help us investigate further:

      • Error logs and exception traces from the time the job was started
      • The job document for the step in question
        • You can find that document under the "data-hub-JOBS" db using the job ID
          • Open the query console
          • Select data-hub-JOBS db from the dropdown
          • Hit explore
          • Enter the Job ID in the search field and hit enter:
            • E.g.: *21d54818-28b2-4e56-bcfe-1b206dd3a10a*
          • You'll see the document in the results

      Note: If you want to force it, you can cycle the Java program and stop the requests from the corresponding app server status page on the Admin UI.
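
      Alternatively, the following minimal Query Console sketch (run against the data-hub-JOBS database; the job ID shown is the example value from the steps above) can locate the job document:

      xquery version "1.0-ml";
      (: Sketch: find the job document for a given job ID. Run this against the
         data-hub-JOBS database; the job ID below is the example value above. :)
      let $job-id := "21d54818-28b2-4e56-bcfe-1b206dd3a10a"
      for $doc in cts:search(fn:doc(), cts:word-query($job-id))[1 to 10]
      return xdmp:node-uri($doc)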

      KB Article:

      What do we do if we are receiving SVC-EXTIME error consistently while running the merging step?

      “SVC-EXTIME” generally occurs when a query or other operation exceeds its processing time limit. There are various reasons behind this error. For example,

      • Lack of physical resources
      • Infrastructure level slowness
      • Network issues
      • Server overload 
      • Document locking issues

      Additionally, you need to review the step where you match documents to see how many URIs you are trying to merge in one go. 

      • Reduce the batch size to a value that gives a balance between processing time and performance (the SVC-EXTIME timeout error)
      • Modify your matching step to work with fewer matches per each run rather than a huge number of matches
      • Turning ON the SM-MATCH and SM-MERGE traces would give a good indication of what it is getting stuck on. Remember, however, to turn them OFF once the issue has been diagnosed and resolved.

      Documentation:

      What are the best practices for performing Data Hub upgrades?

      • Note that Data Hub versions depend on MarkLogic Server versions - if your Data Hub version requires a different MarkLogic Server version, you MUST upgrade your MarkLogic Server installation before upgrading your Data Hub version
      • Take a backup
      • Perform extensive testing with all use-cases on lower environments
      • Refer to release notes (some Data Hub upgrades require reindexing), upgrade documentation, version compatibility with MarkLogic Server

      KB Article:

      How can I encrypt my password in Gradle files used for Data Hub?

      You may need to store the password in encrypted Gradle properties and reference the property in the configuration file. 

      Documentation:

      Blog:

      How can I create a Golden Record using Data Hub?

      A golden record is a single, well-defined version of all the data entities in an organizational ecosystem.

      • In the Data Hub Central, once you have gone through the process of ingest, map and master, the documents in the sm-<EntityType>-mastered collection would be considered as golden records

      KB article:

      What authentication method does Data Hub support?

      DataHub primarily supports basic and digest authentication. The configuration for username/password authentication is provided when deploying your application.

      How do I know the compatible MarkLogic server version with Data Hub version?

      Refer to Version Compatibility matrix.

      Can we deploy multiple DHF projects on the same cluster?

      This operation is NOT supported.

      Can we perform offline/disconnected Data Hub upgrades?

      This is NOT supported, but you can refer to this example to see one potential approach

      TDE Generation in Data Hub

      For production purposes, you should configure your own TDEs instead of depending solely on the TDEs generated by Data Hub (which may not be optimized for performance or scale)

      Where does gradle download all the dependencies we need to install DHF from?

      Below is the list of sites that Gradle will use in order to resolve dependencies:

      Gradle build scans are helpful for figuring out what the dependencies are:

      • It provides a shareable and centralized record of a build that provides insights into what happened and why
      • You can create build scans using this tool and even publish those results at https://scans.gradle.com to see where Gradle is trying to download each dependency from under the "Build Dependencies" section on the results page.



      Introduction

      In the Scalability, Availability & Failover Guide, the node communication section describes a quorum as >50% of the nodes in a cluster.

      Is it possible for a database to be available for reads and writes, even if a quorum of nodes is not available in the cluster?

      The answer is yes; there are configurations and sequences of events that can lead to forests remaining online when fewer than 50% of the hosts in the cluster are online.

      Details

      If a single forest in a database is not available, the database will not be accessible. It is also true that as long as all of a database's forests are available in the cluster, the database will be available for reads and writes regardless of any quorum issues.

      Of course, the Security database must also be available in the cluster for the cluster to function.

      Forest Availability: Simple Case

      In the simplest case, if you have a forest that is not configured with either local disk failover or shared disk failover, then as long as the forest's host is online and exists in the cluster, the forest will be available regardless of any quorum issues.

      To explain this case in more detail: suppose we have a 3-node MarkLogic cluster containing 3 hosts (let's call them host-a, host-b and host-c). We initialize host-a as the primary host (so it is the first host set up in the cluster and is the host containing the master security database), and we then join host-b and host-c to host-a to complete the cluster.

      Shortly after that, if we shut both the joiner hosts (host-b and host-c) down, so only host-a remained online, we would see a chain of messages in the primary host's ErrorLog indicating that there was no longer quorum within the cluster:

      2020-05-21 01:19:14.632 Info: Detected quorum (3 online, 1 suspect, 0 offline)
      2020-05-21 01:19:18.570 Warning: Detected suspect quorum (3 online, 2 suspect, 0 offline)
      2020-05-21 01:19:29.715 Info: Disconnecting from domestic host host-b.example.marklogic.com because it has not responded for 30 seconds.
      2020-05-21 01:19:29.715 Info: Disconnected from domestic host host-b.example.marklogic.com
      2020-05-21 01:19:29.715 Info: Detected suspect quorum (2 online, 1 suspect, 1 offline)
      2020-05-21 01:19:33.668 Info: Disconnecting from domestic host host-c.example.marklogic.com because it has not responded for 30 seconds.
      2020-05-21 01:19:33.668 Info: Disconnected from domestic host host-c.example.marklogic.com
      2020-05-21 01:19:33.668 Warning: Detected no quorum (1 online, 0 suspect, 2 offline)

      Under these circumstances, we would be able to access the host's admin GUI on port 8001 and it would respond without issue.  We would be able to access Query Console on that host on port 8000 and would be able to inspect the primary host's databases.  We would also be able to access the Monitoring History on port 8002 - all directly from the primary host.

      In this scenario, because the primary host remains online and the joining hosts are offline; and because we have not yet set up failover anywhere, there is no requirement for quorum, so host-a remains accessible.

      If host-a also happened to have a database with forests that only resided on that host, these would be available for queries at this time.  However, this is a fairly limited use case because in general, if you have a 3-node cluster, you would have a database whose forests reside on all three hosts in the cluster with failover forests configured on alternating hosts. 

      As soon as you do this, if you lose one host and you don't have failover configured, the database becomes unavailable (due to a crucial forest being offline). If you had failover forests configured, you would still be able to access the database on the remaining two hosts.

      However, if you then shut down another host, you would lose quorum (which is a requirement for failover).

      Forest Availability: Local Disk Failover

      For forests configured for local disk failover, the sequence of events is important:

      In response to a host failure that makes an "open" forest inaccessible, the forest will failover to the configured forest replica as long as a quorum exists and the configured replica forest was in the "sync replicating" state. In this case, the configured replica forest will transition to the "open" state; the configured replica forest becomes the acting master forest and is available to the database for both reads and writes.

      Additionally, an "open" forest will not go offline in response to another host being evicted from the cluster.

      However, once cluster quorum is lost, forest failovers will no longer occur.

      Conclusion

      Depending on how your forests are distributed in the cluster and depending on the order of host failures, it is possible for a database to remain online even when there is no longer a quorum of hosts in the cluster.

      Of course, databases with many forests spread across many hosts typically can't stay online if you lose quorum because some forest(s) will become unavailable.

      Recommendation

      Even though it is possible to have a functioning cluster with less than a quorum of hosts online, you should not architect your high availability solution to depend on it.

      Summary

      This article discusses what happens when you backup or restore your database after a local disk failover event on one of the database forests.

      Introduction

      MarkLogic Server provides high availability in the event of a data node failure. Data node failures can include operating system crashes, MarkLogic Server restarts, power failures, or persistent system failures such as hardware failures. With forest-level failover enabled and configured, a machine that hosts a forest can go down and the MarkLogic Server cluster automatically recovers from the outage and continues to process queries without any immediate action needed by an administrator. In MarkLogic Server, if a forest becomes unavailable, then the entire database to which that forest is attached becomes unavailable for further query operations. Without failover, such a failure requires manual intervention (by an administrator) to either reconfigure the forest to another host or to remove the forest from the configuration (cluster). With failover, you can configure the forest to automatically switch to a replica forest on a different host. MarkLogic Server failover provides high availability and maintains data and transactional integrity in the event of a data node failure.

      The failover scenarios are well documented on our developer web site.

      Local Disk Failover

      Local-disk failover allows you to configure a forest on another host to serve as a replica forest, which will take over when the primary (master) forest's host goes offline. You can create one or more replica forests for each primary forest. Replica forests contain the exact same data as the primary forest and are kept transactionally consistent.

      It is helpful to use the following terms to refer to the forest configurations and states:

      • Configured Master is the forest which is originally configured as the primary forest.
      • Configured Replica is a forest on another host that is configured as a replica forest of the primary. 
      • Acting Master is the forest that is serving as the master forest, regardless of the configuration.
      • Acting Replica is the forest that is serving as the replica forest, regardless of the configuration.

      Database Backup when a forest is failed over

      If you attempt to take a database backup or perform a database restore when one of the forests of the database has failed over to its replica (i.e. the Configured Replica is serving as Acting Master), it may result in XDMP-FORESTNOTOPEN or XDMP-HOSTDOWN errors.

      When a database backup takes place, by default, everything associated with the database gets backed up. You can also choose to back up individual forests (only the forests selected while configuring the backup are backed up).

      Replica forests will only be backed up when the 'Include replica forests' option is enabled.  If you have not configured the backup to include replica forests, then the replica forests will not be backed up even if one of them is the acting master. If the Configured Master is also not available, then neither forest will be backed up. In this circumstance, you may see a message in the error logs similar to "Warning: Not backing up database test because first forest master is not available, and replica backups aren't enabled."

      Restore when a forest is failed over

      Restores will fail if executed when a forest is failed over (i.e. the Configured Replica is serving as Acting Master). In this circumstance, you may see a message in the error logs similar to "Operation failed with error message. Check server logs." or "XDMP-HOSTDOWN".

      How to detect if a forest is failed over

      In the Admin UI:

      1. Click the Forests icon in the left tree menu;
      2. Click the Summary tab;
      3. If you see the configured replica in the open state, this indicates that the Configured Replica is serving as Acting Master.

      At the time of the failover event, you may see messages in the Error Log similar to:
      2013-10-03 12:49:53.873 Info: Disconnecting from domestic host rh6v-intel64-9.marklogic.com in cluster 16599165797432706248 because it has not responded for 30 seconds.
      2013-10-03 12:49:53.873 Info: Disconnected from host rh6v-intel64-9.marklogic.com
      2013-10-03 12:49:53.873 Info: Unmounted forest test_P
      2013-10-03 12:49:53.875 Info: Forest test_R assuming the role of master with new precise time 13808297938747190
      2013-10-03 12:49:53.875 Debug: Recovering undo on forest test_R
      2013-10-03 12:49:53.875 Debug: Recovered undo at endTimestamp 13807844927734200 minQueryTimestamp 0 on forest test_R

      Revert back from the failover state:

      When the configured master is the acting replica, the forest is considered to be in the "failover state".  In order to revert back, you must either restart the forest that is currently acting as master or restart the host on which that forest is locally mounted. After the restart, the Configured Master will automatically resume the master role if its host is online. To check the status of the forests, see the Forests Summary tab in the Admin Interface. 
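
      Where a scripted approach is preferred, the following is a minimal sketch (using the example forest test_R from the log excerpt above) that restarts the forest currently acting as master so the Configured Master can resume its role, provided its host is back online:

      xquery version "1.0-ml";
      (: Sketch: restart the forest currently acting as master (here the example
         forest "test_R" from the log excerpt above). After the restart, the
         Configured Master resumes the master role if its host is online. :)
      xdmp:forest-restart(xdmp:forest("test_R"))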


      Conclusion 

      For backup and restore to work correctly, clusters configured with local disk failover must have no forests in a failed over state. If a cluster is configured with local disk failover, and if some of its forests are failed over to their local disk replicas, the conditions causing the fail over must be resolved, and the cluster must be returned to the original forest configuration before backup and restore operations may resume.

      INTRODUCTION

      From the documentation:

      Queries on a Replica database must run at a timestamp that lags the current cluster commit timestamp due to replication lag. Each forest in a Replica database maintains a special timestamp, called a Non-blocking Timestamp, that indicates the most current time at which it has complete state to answer a query. As the Replica forest receives journal frames from its Master, it acknowledges receipt of each frame and advances its nonblocking timestamp to ensure that queries on the local Replica run at an appropriate timestamp. Replication lag is the difference between the current time on the Master and the time at which the oldest unacknowledged journal frame was queued to be sent to the Replica.

      To read more:

      http://tinyurl.com/7zwq4l2

      SCENARIO

      Consider the following customer scenario:

      • The storage the database resides on at one site fails.
      • This requires the customer to run for a period of time on a single site.
      • The storage / MarkLogic server are recovered at the site where the failure occurred.
      • The customer needs to re-establish replication between the two sites

      QUESTIONS AND ANSWERS

      Q: Should we tune the lag limit to suit our application?

      A: We have found in our own performance testing that increasing the lag limit beyond the default is typically not helpful.

      When the master has a sustained rate of updates, a large lag limit causes it to run quickly ahead of the replica, then stall for an extended period of time until the replica catches up. This pattern repeats over and over and gives inconsistent performance on the master.

      A smaller lag limit causes the master to suspend updates more frequently but for shorter periods of time, resulting in more consistent perceived performance.

      Q: Is there any option to restore the replica database to a point in time from a backup of the master database & re-initiate replication from that point onwards?

      A: It's fine to restore a backup to the failed system when it comes back online and before configuring replication in the reverse direction.

      Q: Is there a limit to how old a backup of the replica database can be (e.g. can a replica be restored from months back in comparison to the master) and will it still sync back to the master without issue? And does this depend on what journal data is available?

      A: There is no limit to how old a backup can be; the system will calculate all the deltas and apply them.

      Q: Are there any documented API built-ins for any of these things?

      A: Indeed; all the replication information is available through a call to xdmp:forest-status()

      xdmp:forest-status( 
        xdmp:database-forests( 
          xdmp:database("MyDatabase"), 
          fn:true()))

      For further information:

      http://tinyurl.com/d6vbpk4

      Q: Can you also advise if the replication lag limit mentioned in section 1.2.5 and the related possibility of transactions stalling on the master database applies during the bulk replication phase?

      A: As long as the replica's forests are in "open replica" state, the replica will respond to queries at any commit timestamp it is able to support irrespective of whether replication is lagged.

      A new feature in MarkLogic 5 is an application server setting for multi-version concurrency control (by default this is set to contemporaneous - meaning it will run from the latest timestamp that any query has committed - irrespective of whether there are still transactions in-flight).

      Conversely, if nonblocking is chosen (i.e. if you create an application server to query a replica database and you set multi-version concurrency control to nonblocking), the server will choose the last timestamp where all pending transactions are known to have successfully committed.

      If you wish to evaluate a query against a replica database you can use xdmp:database-nonblocking-timestamp() to determine the most current query timestamp that will not block.
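
      As a minimal sketch (the database name "MyReplicaDB" is hypothetical, and using the eval timestamp option for a read-only point-in-time query is an assumption), the non-blocking timestamp can be retrieved and used when evaluating a query against the replica:

      xquery version "1.0-ml";
      (: Sketch: run a point-in-time query against a replica database at its
         non-blocking timestamp. "MyReplicaDB" is a hypothetical database name. :)
      let $db := xdmp:database("MyReplicaDB")
      let $ts := xdmp:database-nonblocking-timestamp($db)
      return
        xdmp:eval('fn:count(fn:doc())', (),
          <options xmlns="xdmp:eval">
            <database>{$db}</database>
            <timestamp>{$ts}</timestamp>
          </options>)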

      Introduction

      Database Replication replicates fragments/documents from a source database to a target database. You may see different database sizes (even when active fragment counts are the same) between the Master and Replica databases. This article provides an overview of the variables and reasons behind this observation.

      Database Replication:

      Database Replication operates at the forest level by copying journal frames from a forest in the Master database and replaying them on a corresponding forest in the foreign replica database. This means that when journal frames are replayed in the replica database, a group of documents that resides in a single stand of the master database does not necessarily reside in the same stand on the replica database - i.e. the distribution of fragments within stands is different between the master and the replica.

      Also, note that Master and Replica forests can be distributed differently across hosts in each cluster. Even when they are distributed identically (Master DB forest name to Replica DB forest name), you could still see a different number of stands between them.

      Database Size, Deleted Fragment and Merge:

      The current database size depends on a number of factors, such as the number of documents, the indexes, the number of deleted fragments in stands, etc. The number of deleted fragments in any stand itself depends on the merge policy, the background merge process, the processing cycles available, the Linux memory configuration, memory usage at any given time, and the application usage pattern.

      Conclusion:

      The Master cluster and the Replica cluster are separate entities. Although connected, they operate independently. The Replica database on the target cluster provides data consistency. However, how data is spread across stands, including the retention of deleted fragments, will differ between the Master and Replica clusters. Hence you may see different sizes between Master and Replica databases, even when the active fragment counts are the same.

      Further Reading

      Introduction

      If your MarkLogic Server has its logging level set to "Debug", it's common to see a chain of 'Detecting' and 'Detected' messages that look like this in your ErrorLogs:

      2015-01-27 11:11:04.407 Debug: Detected indexes for database Documents: ss, fp, fcs, fds, few, fep, sln
      2015-01-27 11:11:04.407 Debug: Detecting compatibility for database Documents
      2015-01-27 11:11:04.407 Debug: Detected compatibility for database Documents

      This message will appear immediately after forests are unmounted and subsequently remounted by MarkLogic Server. Detecting indexes is a relatively lightweight operation and usually has minimal impact on performance.

      What would cause the forests to be unmounted and remounted

      • Forest failovers
      • Heavy network activity leading to a cluster (XDQP) "Heartbeat" timeout
      • Changes made to forest configuration or indexes
      • Any incident that may cause a "Hung" message

      Apart from the forest state changes (unmount/mount), this message can also appear due to other events requiring index detection.

      What are "Hung" messages?

      Whenever you see a "Hung" message it's very often indicative of a loss of connection to the IO subsystem (especially the case when forests are mounted on network attached storage rather than local disk). Hung messages are explained in a little more detail in this Knowledgebase article:
      https://help.marklogic.com/Knowledgebase/Article/View/35/0/hung-messages-in-the-errorlog

      What do the "Detected" messages mean and what can I do about them?

      Whenever you see a group of "Detecting" messages:

      2015-01-14 13:06:26.016 Debug: Detecting indexes for database XYZ

      ...it means there was an event where MarkLogic chose to (or was required to) unmount and remount forests (the event may also be evident in your ErrorLogs).

      The detecting index message will occur soon after a remount, indicating that MarkLogic Server is examining forest data to check whether any reindexing work is required for all databases available to the node which have Forests attached:

      2015-01-14 13:06:26.687 Debug: Detected indexes for database XYZ: ss, wp, fp, fcs, fds, ewp, evp, few, fep

      This 'Detected indexes' line indicates that the scan has completed and that the database has been identified as having been configured with a number of indexes. For the line above, these are:

      • ss - stemmed searches
      • wp - word positions
      • fp - fast phrase searches
      • fcs - fast case sensitive searches
      • fds - fast diacritic sensitive searches
      • ewp - element word positions
      • evp - element value positions
      • few - fast element word searches
      • fep - fast element phrase searches

      From this list, we are able to determine which indexes were detected.  These messages will occur after every remount if you have index detection set to automatic in the database configuration.

      Every time the forest is remounted, in addition to a recovery process (where the Journals are scanned to ensure that all transactions logged were safely committed to on-disk stands), there are a number of other tests the server will do. These are configured with three options at database level:

      • format compatibility
      • index detection
      • expunge locks

      By default, these three settings are configured with the "automatic" setting (in MarkLogic 7), so if you have logging set to "Debug" level, you'll know that these options are being worked through on remount:

      2015-01-14 13:06:26.016 Debug: Detecting indexes for database XYZ (represents the task for "automatic" index detection where the reindexer checks for configuration changes)
      2015-01-14 13:06:26.687 Debug: Detecting compatibility for database XYZ (represents the task for "automatic" format compatibility where the on-disk stand format is detected)

      These default values may change across releases of MarkLogic Server. In MarkLogic 8, expunge locks is set to none but the other two are still set to automatic.

      Can these values be changed safely and what happens if I change these?

      Unmounting / remounting times can be made much shorter by configuring these settings away from automatic but there are some caveats involved; if you need to upgrade to a future release of the product, it's likely that the on-disk stand format may change (it's still 5.0 even when MarkLogic 8 is released) and so setting format compatibility to 5.0 should cause the "Detecting compatibility" messages to disappear and speed up remount times.

      The same is true for disabling index detection but it's important to note that changing index settings on the database will no longer cause the reindexer to perform any checks on remount; in this case you would need to enable this for changes to database index settings to be reindexed.

      Related Reading

      How to handle XDQP-TIMEOUT on a busy cluster

      Summary

      This article will provide steps to debug applications using the Alerting API that are not triggering an alert.

      Details

      1) Check that all required components are present in the database where alerting is set up: config, actions, rules. Run the attached script 'getalertconfigs.xqy' through the Query Console and review the output.

      2) As documented in our Search Developer's Guide, test the alert manually with alert:invoke-matching-actions().

      Example:

      alert:invoke-matching-actions("my-alert-config-uri", 
            <doc>hello world</doc>, <options/>)

      3) Use the rule's query to test against the database to check that the expected documents are returned by the query.

      Take the query text from the rule and run it through Query Console using a cts:search() on the database. This will confirm whether the expected documents are a positive match. If the documents are returned and no alert is triggered, then further debugging will be needed on the configuration, or the query may need to be modified.
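
      For example, a minimal sketch (assuming a hypothetical rule whose query text is the word query "hello world") to list which documents the rule's query matches:

      xquery version "1.0-ml";
      (: Sketch: run the rule's query text directly against the database to see
         which documents would match. "hello world" is a hypothetical rule text. :)
      for $doc in cts:search(fn:doc(), cts:word-query("hello world"))[1 to 10]
      return xdmp:node-uri($doc)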

      Introduction 

      Division operations involving integer or long datatypes may generate XDMP-DECOVRFLW in MarkLogic 7. This is the expected behavior but it may not be obvious upon initial inspection.  

      For example, similar queries with similar but different input values, executed in Query Console on a Linux/Mac machine running MarkLogic 7, give the following results:

      1. This query returns correct results

      let $estimate := xs:unsignedLong("220")

      let $total := xs:unsignedLong("1600")

      return $estimate div $total * 100

      ==> 13.75

      2. This query returns the XDMP-DECOVRFLW error:

       

      let $estimate := xs:unsignedLong("227")

      let $total := xs:unsignedLong("1661")

      return $estimate div $total * 100

      ==> ERROR : XDMP-DECOVRFLW: (err:FOAR0002)

      Details

      The following defines relevant behaviors in MarkLogic 7 and previous releases.

      • In MarkLogic 7, if all the operands involved in a div operation are integer, long, or integer sub-types in XML, then the resulting value of the div operation is stored as an xs:decimal.
      • In versions prior to MarkLogic 7, if an xs:decimal value was large and occupied all available digits, it was implicitly cast into an xs:double for further operations - i.e. beginning with MarkLogic 7, implicit casting no longer occurs in this situation.
      • xs:decimal can accommodate 18 digits as a datatype.
      • In MarkLogic 7 on Linux and Mac, an xs:decimal can occupy all digits depending upon the actual value ( 227 div 1661 = 0.1366646598434677905 ), so all available digits of the xs:decimal are occupied.
      • MarkLogic 7 on Windows does not perform division with full decimal precision ( 227 div 1661 produces 0.136664659843468 ); as a result, not all available digits of the xs:decimal are occupied.
      • MarkLogic 7 will generate an overflow exception (FOAR0002) when an operation is performed on an xs:decimal that is already at full decimal precision.

      In the example above, multiplying the result by 100 gives an error on Linux/Mac, while it is OK on Windows.

      Recommendations:

      We recommend that xs:double be used for division-related operations in order to explicitly cast the resulting value to a larger datatype.

      For example, these will return results:

      xs:double($estimate) div $total * 100

      $estimate div $total * xs:double(100)
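
      Putting it together, the previously failing example completes once the cast is applied:

      xquery version "1.0-ml";
      (: The failing example from above, with the numerator cast to xs:double so
         the result is a double and no decimal overflow occurs. :)
      let $estimate := xs:unsignedLong("227")
      let $total := xs:unsignedLong("1661")
      return xs:double($estimate) div $total * 100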


       

       

       

      Context:

      There are options 'maintain last modified' and 'maintain directory last modified' on the Admin UI for a database, which when turned on add properties to every document inserted in the database.  There may be a need to remove all the property fragments of all the documents in the database when the properties no longer need to be retained.

      Problem:

      Turning these options off for a database ensures that properties will not be created for new documents. However, existing document properties will not be removed by turning these settings off.

      Solution:

      To delete existing document properties, the following query can be used:

       

      xdmp:node-delete(xdmp:document-properties("your-document-uri"))

       

      Please make sure that 'maintain last modified' and 'maintain directory last modified' options are turned off for the database, so that the property fragment does not get recreated for the document.
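
      For a database with many documents, the following is a hedged sketch of a batch-oriented variant (it assumes the URI lexicon is enabled, and for large databases the work should be batched, for example via spawned tasks or a tool such as CoRB, rather than run in one transaction):

      xquery version "1.0-ml";
      declare namespace prop = "http://marklogic.com/xdmp/property";
      (: Sketch: remove the last-modified property from a sample of documents.
         Assumes the URI lexicon is enabled; adjust the limit and batch the work
         for large databases. :)
      for $uri in cts:uris((), "limit=100")
      for $lm in xdmp:document-properties($uri)//prop:last-modified
      return xdmp:node-delete($lm)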

       

       

      Summary

      Terraform from HashiCorp is a deployment tool that many organizations use to manage their infrastructure as code. It is platform agnostic, allowing for the deployment and configuration of on-site physical infrastructure, as well as cloud infrastructure such as AWS, Azure, VSphere and more.

      Terraform uses the HashiCorp Configuration Language (HCL) to allow for concise descriptions of infrastructure. HCL is a JSON-compatible language, and was designed to be both human and machine friendly.

      This powerful tool can be used to deploy a MarkLogic Cluster to AWS using the MarkLogic CloudFormation Template. The MarkLogic CloudFormation Template is the preferred method recommended by MarkLogic for building out MarkLogic clusters within AWS.

      Setting Up Terraform

      For the purpose of this example, I will assume that you have already installed Terraform and the AWS CLI, and that you have configured your credentials. You will also need to have a working directory that has been initialized using terraform init.

      Terraform Providers

      Terraform uses Providers to provide access to different resources. The Provider is responsible for understanding API interactions and exposing resources. The AWS Provider is used to provide access to AWS resources.

      Terraform Resources

      Resources are the most important part of the Terraform language. Resource blocks describe one or more infrastructure objects, like compute instances and virtual networks.

      The aws_cloudformation_stack resource allows Terraform to create a stack from a CloudFormation template.

      Choosing a Template

      MarkLogic provides two templates for creating a managed cluster in AWS.

      • MarkLogic cluster in new VPC
      • MarkLogic cluster in an existing VPC
      I've chosen to deploy my cluster to an existing VPC. When deploying to an existing VPC, you will need to gather the VPC ID, as well as the Subnet IDs for the public and private subnets.

      Defining Variables

      The MarkLogic CF Template takes a number of input variables, including the region, availability zones, instance types, EC2 keys, encryption keys, licenses and more. We have to define our variables so they can be used as part of the resource.

      Variables in HCL can be declared in a separate file, which allows for deployment flexibility. For instance, you can create a Development resource and a Production resource using different variable files.

      Here is a snippet from our variables file:

      variable "cloudform_resource_name" {
      type = string
      default = "Dev-Cluster-CF"
      }
      variable "stack_name" {
      type = string
      default = "Dev-Cluster"
      }
      variable "ml_version" {
      type = string
      default = "10.0-4"
      }
      variable "availability_zone_names" {
      type = list(string)
      default = ["us-east-1a","us-east-1b","us-east-1c"]
      }
      ...

      In the snippet above, you'll notice that we've defined the availability_zone_names as a list. The MarkLogic CloudFormation template won't take a list as an input, so later we will join the list items into a string for the template to use.

      This also applies to any of the other lists defined in the variable files.

      Using the CloudFormation Resource

      So now we need to define the resource in HCL that will allow us to deploy a CloudFormation template to create a new stack.

      The first thing we need to do is tell Terraform which provider we will be using, defining some default options:

          provider "aws" {
          profile = "default"
          #access_key = var.access_key
          secret_key = var.secret_key
          region = var.aws_region
          }

      Next, we need to define the `aws_cloudformation_stack` configuration options, setting the variables that will be passed in when the stack is created:

          resource "aws_cloudformation_stack" "marklogic" {
          name = var.cloudform_resource_name
          capabilities = ["CAPABILITY_IAM"]
      
      
          parameters = {
          IAMRole = var.iam_role
          AdminUser = var.ml_admin_user
          AdminPass = var.ml_admin_password
          Licensee = "My User - Development"
          LicenseKey = "B581-REST-OF-LICENSE-KEY"
          VolumeSize = var.volume_size
          VolumeType = var.volume_type
          VolumeEncryption = var.volume_encryption
          VolumeEncryptionKey = var.volume_encryption_key
          InstanceType = var.instance_type
          SpotPrice = var.spot_price
          KeyName = var.secret_key
          AZ = join(",","${var.avail_zone}")
          LogSNS = var.log_sns
          NumberOfZones = var.number_of_zones
          NodesPerZone = var.nodes_per_zone
          VPC = var.vpc_id
          PublicSubnets = join(",","${var.public_subnets}")
          PrivateSubnets = join(",","${var.private_subnets}")
          }
          template_url = "${var.template_base_url}${var.ml_version}/${var.template_file_name}"
          }

      Deploying the Cluster

      Now that we have defined our variables and our resources, it's time for the actual deployment.

      $> terraform apply

      This will show us the work that Terraform is going to attempt to perform, along with the settings that have been defined so far.

      Once we confirm that things look correct, we can go ahead and apply the resource.

      Now we can check the AWS Console to see our stack.

      And we can also use the ELB to log in to the Admin UI.

      Wrapping Up

      We have now deployed a 3 node cluster to an existing VPC using Terraform. The cluster is now ready to have our Data Hub, or other application installed.

      Deploying MarkLogic in AWS with Ansible

      Summary

      Ansible, owned by Red Hat, is an open source provisioning, configuration and application deployment tool that many organizations use to manage their infrastructure as code. Unlike options such as Chef and Puppet, it is agentless, utilizing SSH to communicate between servers. Ansible also does not need a central host for orchestration; it can run from nearly any server, desktop or laptop. It supports many different platforms and services, allowing for the deployment and configuration of on-site physical infrastructure, as well as cloud and virtual infrastructure such as AWS, Azure, VSphere, and more.

      Ansible uses YAML as its configuration management language, making it easier to read than other formats. Ansible also uses Jinja2 for templating to enable dynamic expressions and access to variables.

      Ansible is a flexible tool that can be used to deploy a MarkLogic Cluster to AWS using the MarkLogic CloudFormation Template. The MarkLogic CloudFormation Template is the preferred method recommended by MarkLogic for building out MarkLogic clusters within AWS.

      Setting Up Ansible

      For the purpose of this example, I will assume that you have already installed Ansible, the AWS CLI, and the necessary python packages needed for Ansible to talk to AWS. If you need some help getting started, Free Code Camp has a good tutorial on setting up Ansible with AWS.

      Inventory Files

      Ansible uses inventory files to help determine which servers to perform work on. They can also be used to customize settings for individual servers or groups of servers. For our example, we have set up our local system with all the prerequisites, so we need to tell Ansible how to treat the local connections. For this demonstration, here is my inventory, which I've named hosts:

      [local]
      localhost              ansible_connection=local

      Ansible Modules

      Ansible modules are discrete units of code that are executed on a target. The target can be the local system, or a remote node. The modules can be executed from the command line, as an ad-hoc command, or as part of a playbook.

      Ansible Playbooks

      Playbooks are Ansible's configuration, deployment and orchestration language. Playbooks are how the power of Ansible and its modules is extended from basic configuration and management all the way to complex, multi-tier infrastructure deployments.

      Choosing a Template

      MarkLogic provides two templates for creating a managed cluster in AWS.

      1. MarkLogic cluster in new VPC
      2. MarkLogic cluster in an existing VPC

      I've chosen to deploy my cluster to an existing VPC. When deploying to an existing VPC, you will need to gather the VPC ID, as well as the Subnet IDs for the public and private subnets.

      Defining Variables

      The MarkLogic CF Template takes a number of input variables, including the region, availability zones, instance types, EC2 keys, encryption keys, licenses and more. We have to define our variables so they can be used as part of the resource.

      Variables in Ansible can be declared in a separate file, which allows for deployment flexibility.

      Here is a snippet from our variables file:

      # vars file for marklogic template and version
      ml_version: '10.0-latest'
      template_file_name: 'mlcluster.template'
      template_base_url: 'https://marklogic-template-releases.s3.amazonaws.com/'

       

      # CF Template Deployment Variables
      aws_region: 'us-east-1'
      stack_name: 'Dev-Cluster-An3'
      IAMRole: 'MarkLogic'
      AdminUser: 'admin'
      ...

      Using the CloudFormation Module

      So now we need to create our playbook, and choose the module that will allow us to deploy a CloudFormation template to create a new stack. The cloudformation module allows us to create a CloudFormation stack.

      Next, we need to define the cloudformation configuration options, setting the variables that will be passed in when the stack is created.

      # Use a template from a URL
      - name: Ansible Test
        hosts: local

       

        vars_files:
          - ml-cluster-vars.yml

       

        tasks:
          - cloudformation:
              stack_name: "{{ stack_name }}"
              state: "present"
              region: "{{ aws_region }}"
              capabilities: "CAPABILITY_IAM"
              disable_rollback: true
              template_url: "{{ template_base_url+ml_version+'/'+ template_file_name }}"
            args:
              template_parameters:
                IAMRole: "{{ IAMRole }}"
                AdminUser: "{{ AdminUser }}"
                AdminPass: "{{ AdminPass }}"
                Licensee: "{{ Licensee }}"
                LicenseKey: "{{ LicenseKey }}"
                KeyName: "{{ KeyName }}"
                VolumeSize: "{{ VolumeSize }}"
                VolumeType: "{{ VolumeType }}"
                VolumeEncryption: "{{ VolumeEncryption }}"
                VolumeEncryptionKey: "{{ VolumeEncryptionKey }}"
                InstanceType: "{{ InstanceType }}"
                SpotPrice: "{{ SpotPrice }}"
                AZ: "{{ AZ | join(', ') }}"
                LogSNS: "{{ LogSNS }}"
                NumberOfZones: "{{ NumberOfZones }}"
                NodesPerZone: "{{ NodesPerZone }}"
                VPC: "{{ VPC }}"
                PrivateSubnets: "{{ PrivateSubnets | join(', ') }}"
                PublicSubnets: "{{ PublicSubnets | join(', ') }}"
              tags:
                Stack: "ansible-test"

      Deploying the cluster

      Now that we have defined our variables and created our playbook, it's time for the actual deployment.

      ansible-playbook -i hosts ml-cluster-playbook.yml -vvv

      The -i option allows us to reference the inventory file we created. As the playbook runs, it will output as it starts and finishes tasks in the playbook.

      PLAY [Ansible Test] ************************************************************************************************************

       

      TASK [Gathering Facts] *********************************************************************************************************
      ok: [localhost]

       

      TASK [cloudformation] **********************************************************************************************************
      changed: [localhost]

      When the playbook finishes running, it will print out a recap which shows the overall results of the play.

      PLAY RECAP *********************************************************************************************************************
      localhost                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

      This recap tells us that 2 tasks ran successfully, resulted in 1 change, and no failed tasks, which is our sign that things worked.

      If we want to see more information as the playbook runs, we can add one of the verbose flags (-v or -vvv) to provide more information about the parameters the script is running, and the results.

      Now we can check the AWS Console to see our stack:

      And we can also use the ELB to log in to the Admin UI.

      Wrapping Up

      We have now deployed a 3 node cluster to an existing VPC using Ansible. The cluster is now ready to have our Data Hub or other application installed. We can now use the git module to get our application code, and deploy it using ml-gradle.

      Deploying REST API Search/Query Options in DHS

      REST API Query Options Overview

      You can use persistent or dynamic query options to customize your queries. MarkLogic Server comes configured with default query options. You can extend and modify the default options using /config/query/default.

      REST API Search options are defined per Group and App Server. When using ml-gradle, they are typically deployed by putting the files defining the options in the src/main/ml-modules/options directory of your gradle project. By default the options will be deployed to the Group/App Server that gradle is pointing at in the data-hub-MODULES database under /[GroupName]/[App Server]/rest-api/options/[name of file].
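
      For reference, here is a minimal sketch of an options file; the file name (my-options.xml), constraint name, and element name are hypothetical and would be adjusted for your own content model:

      <!-- src/main/ml-modules/options/my-options.xml -->
      <options xmlns="http://marklogic.com/appservices/search">
        <constraint name="category">
          <range type="xs:string" facet="true">
            <element ns="" name="category"/>
          </range>
        </constraint>
        <return-results>true</return-results>
      </options>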

      REST API Query Options in DHS

      In DHS, query options are created under the Evaluator Group for the data-hub-FINAL app server. One side effect of the permissions for DHS is that users will not be able to see the files after they are deployed. The default permissions for the options file are rest-reader-internal and rest-admin-internal, which are not granted to the data-hub roles.

      To check that the search options have been deployed you can use the following curl command:

      • curl --anyauth --user username:password -k -X GET -H "Content-type: application/xml" https://myService.a.marklogicsvc.com:8011/v1/search?options=[myOptions]

      If the options exist, you will get results. If the options do not exist, then you will get a 400 return, with a REST-INVALIDPARAM error.

      Deploying Options to Other App Servers and Groups

      Deploying Options to the Staging App Server

      Using src/main/ml-modules/options will only deploy the options to the Final app server. If you want to deploy the options to the Staging app server, then you will need to define the options under src/main/ml-modules/root/Evaluator/data-hub-STAGING/rest-api/options

      Deploying Options to Other Groups

      If the cluster is configured for auto-scaling, the dynamic e-nodes will belong to either the Analyzer, Curator or Operator group, so the search options will not be available for the dynamic e-nodes.

      To set the options for the app servers in other groups, you will also use src/main/ml-modules/root/[Group Name]/[App Server Name]/rest-api/options

      • src/main/ml-modules/root/Analyzer/data-hub-FINAL/rest-api/options
      • src/main/ml-modules/root/Operator/data-hub-FINAL/rest-api/options
      • ...etc

      When deploying the options files in this way, they get different permissions than when they are deployed via ml-modules/options. The permissions are rest-extension-user, data-hub-module-reader, data-hub-module-writer, tde-admin, and tde-view, but the differences do not appear to affect functionality.

      Deployment Failures

      When options are deployed with the rest of the non-REST modules in ml-modules/root/..., it uses the /v1/documents endpoint, which allows you to set the file permissions.

      When options are deployed from ml-modules/options, it uses the /v1/config/query endpoint, which does not allow you to set the file permissions.

      One effect of this difference is if you attempt to deploy the search options using both ml-modules/options and src/main/ml-modules/root/Evaluator/data-hub-FINAL/rest-api/options you will encounter a SEC-PERMDENIED error and the deployment will fail. If you encounter this error, ensure you aren't attempting to deploy the options in both locations.

      Introduction

      This KB article lists some available tools for continuous integration and for automating MarkLogic Server deployments.

      Deployment Options

      ml-gradle is a gradle plugin that can be used for configuration and application deployments. Application deployments are maintained as projects, which can be deployed to any environment - Development, QA, Production, etc.

      The MarkLogic Configuration Management API is a RESTful API that allows retrieving, generating, and applying configurations for MarkLogic clusters, databases, and application servers.

      The MarkLogic Management API is a REST-based API that allows you to administer MarkLogic Server and access MarkLogic Server instrumentation with no provisioning or set-up. You can use the API to perform administrative tasks such as initializing or extending a cluster; creating databases, forests, and App Servers; and managing tiered storage partitions. The API also provides the ability to easily capture detailed information on MarkLogic Server objects and processes, such as hosts, databases, forests, App Servers, groups, transactions, and requests from a wide variety of tools.

      The MarkLogic Admin APIs provide a flexible toolkit for creating new and managing existing configurations of MarkLogic Server.

      Integration Testing

      MarkLogic Unit Test is a testing component that was originally part of the Roxy project. This component enables you to build unit tests that are written in and can test against both XQuery and Server-side JavaScript.

      Implementation Specific Tools

      CloudFormation Templates

      MarkLogic CloudFormation templates enable you to launch clusters with an Elastic Load Balancer, Elastic Block Storage, Auto Scaling Group, and so on. Your cluster can be in either one Availability Zone or three Availability Zones. Multiple nodes can be placed within each Availability Zone. You can choose whether to deploy to an existing VPC, or a new VPC. The templates can also be used with tools like Terraform and Ansible.

      Python

      The MarkLogic Python API aims to provide complete coverage of the capabilities in the MarkLogic REST API in idiomatic Python.

      Jenkins

      Jenkins is often used with MarkLogic Server for building deployable artifacts, staging build artifacts, running automated tests, and deploying said artifacts. Jenkins has great REST endpoints that make it easy to get / put job configurations, and enable / disable jobs from scripts.

      Jenkins provides a driver to the continuous integration / continuous delivery process that can integrate with other tools. In combination with ml-gradle, it can be used to deploy modules and run unit tests on code check-in.

      One pipeline example used with Jenkins is to:

      1. Pull the code from Git
      2. Deploy to DEV with ml-gradle
      3. Run MarkLogic Unit Test
      4. Email a report of the success/failure
      5. Kick off job to deploy to another environment

      Note that the most important best practice here is to make sure Jenkins runs on a host other than a MarkLogic host.

      SUMMARY

      This article will help MarkLogic Administrators to monitor the health of their MarkLogic cluster. By studying the attached scripts, you will learn how to find out which hosts are down and which forests have failed over, enabling you to take the necessary recovery actions.

      Initial Setup

      On a separate Linux host (not a member of the cluster), download the file attachments from this article, making sure that they all reside within the same directory.

      Here is a general description of each file:

      cluster-name.conf - Example configuration file used by script. Configures information for monitoring one ML cluster. 

      ml-ck-for-life.sh - A very simple, low-load check that all the nodes of a cluster are up and running.

      ml-ck-for-health.sh - A more detailed check for essential cluster functionality with alerting (paging and/or emails to DBAs) if warranted. This script relies on at least one external XQuery file (mon-report-failed-over-forests.xqy) and makes use of the REST MGMT API as well as REST XQuery requests.

      mon-report-failed-over-forests.xqy - External XQuery file used by ml-ck-for-health.sh

       

      Preparing the CONF File for Use on Your Cluster

      Before running the scripts, the cluster-name.conf needs to be customized for your specific cluster. Start by changing the file name to match the name of your cluster, e.g.,

      $ mv cluster-name.conf some-other-name.conf

      Where "some-other-name" is the actual name of the cluster, or of the application that is hosted on that cluster.

      Next, you will need to customize some of the internal variables inside the CONF file itself. Here are the contents of the cluster-name.conf file, as downloaded:

      CLUSTER_NAME="CLUSTER-NAME"
      CLUSTER_NODES=( node1.my-company.com node2.my-company.com node3.my-company.com )
      # MarkLogic Credentials for the REST Management port - 8002
      USER_PW_MGMT=rest-manager-user:rest-manager-password
      # MarkLogic Credentials for the XQuery eval port - 8000
      USER_PW_XQ=user-name:user-password
      UNIX_USER=unix-user-name
      PAGE_ADDRESSES=ml.alert.page@my-company.com
      MAIL_ADDRESSES=ml.alert.mail@my-company.com

      ---------  end of listing ---------

      For CLUSTER_NAME, provide the cluster-name listed in the cluster's /var/log/MarkLogic/clusters.xml file.

      For CLUSTER_NODES, write in the host-names for each node in your cluster.

      For USER_PW_MGMT, provide the user-name and password for the REST MANAGEMENT user, the format is name:password.

      For USER_PW_XQ, provide the user-name and password for the user who will execute the XQuery scripts, the format is name:password.

      The UNIX_USER is a local Unix username with the correct rwx access rights for this directory.

      The PAGE_ADDRESSES & MAIL_ADDRESSES are the alert email addresses that will be notified whenever there is a failover event.

      Periodicity

      The script ml-ck-for-health.sh was created with the idea it would be run repeatedly at a certain interval to keep tabs on system health. For example, it can be configured to be invoked with a cron job. A frequency of 5 to 120 minutes is a good candidate range. Ten minutes is a good time if you would like to be woken up (on average) within 5 minutes of a failover event.

      Setting up SSH Passwordless Login

      In monitoring script ml-ck-for-health.sh, section (6), FOREST STATUS CHANGE, requires ssh access to the cluster hosts. That is because this section greps through MarkLogic Server ErrorLogs. To enable this part of the script to run without prompting the user, "ssh passwordless login" should be set up between the monitoring host and all the cluster hosts. There are many examples of how to do this on the internet, for example: http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/ Alternatively, this monitoring section can be commented out.

      Also regarding section (6), the "grep" command is set up to grep the latest 10 minutes from the ErrorLog. If this script is configured to run less often than every 10 minutes, the "grep" command line should be adapted to cover the desired period between script runs.

      Example Usage

      You are now ready to execute the failover monitoring scripts! Here is how you would execute them:


      $ ./ml-ck-for-health.sh some-other-name.conf MY-CLUSTER-NAME

      $ ./ml-ck-for-life.sh some-other-name.conf

      [where "some-other-name" and MY-CLUSTER-NAME are your actual CONF and cluster-name, as described above]

      Monitoring Multiple Clusters

      Given a monitoring machine with a directory of cluster configuration files in the style of cluster-name.conf, those configuration files could be iterated through to monitor a suite of clusters from a single machine. It should be fairly easy to build a custom shell script that iterates through the various cluster CONF files.

      Final thought and Limitations

      Please be aware that the ml-ck-for-health.sh script is only partially implemented. In particular, the Replication Lag and Replication Failure sections are left as exercises for the user.

      This script is being presented as a backup, lowest common denominator monitoring solution. For a more complete solution, you should explore other options, such as Splunk or Nagios.

       

       

       

      Introduction

      According to Wikipedia, DevOps is a set of practices that combines software development (Dev) and IT operations (Ops) with the goal of shortening the Systems Development Lifecycle, and providing continuous delivery with high software quality. This KB will provide some guidance for system deployment and configuration, which can be integrated into an organization's DevOps processes.

      For more information on using MarkLogic as part of a Continuous Integration/Continuous Delivery process, see the KB  Deployment and Continuous Integration Tools.

      Deploying a Cluster

      Deploying a MarkLogic cluster that will act as the target environment for the application code being developed is one piece of the DevOps puzzle. The approach that is chosen will depend on many things, including the tooling already in use by an organization, as well as the infrastructure that will be used for the deployment.  We will cover two of the most common environments, On-Premise and Cloud.

      On-Premise Deployments

      On-Premise deployments, which can include using bare metal servers, or Virtual Machine infrastructure (such as VMware), are one common environment. You can deploy a cluster to an on-premise environment using tools such as shell scripts, or Ansible. In the Scripting Administrative Tasks Guide, there is a section on Scripting Cluster Management, which provides some examples of how a cluster build can be automated.

      Once the cluster is deployed, some of the specific configuration tasks that may need to be performed on the cluster can be done using the Management API.

      Cloud Deployments

      Cloud deployments utilize flexible compute resources provided by vendors such as Amazon Web Services (AWS), or Microsoft Azure.

      For AWS, MarkLogic provides an example CloudFormation template that can be used to deploy a cluster to Amazon's AWS EC2 environment. Tools like the AWS Command Line Interface (CLI), Terraform or Ansible can be used to extend the MarkLogic CloudFormation template and automate the process of creating a cluster in the AWS EC2 environment. The template can be used to deploy a cluster using the AWS CLI, to Deploy a Cluster Using Terraform, or to Deploy a Cluster Using Ansible.

      For Azure, MarkLogic has provided Solution Templates for Azure which can be extended for automated deployments using the Azure CLI, Terraform or Ansible.

      As with the on-premise deployments, configuration tasks can be performed on the cluster using the Management API.

      Summary

      This is just a brief introduction into some aspects of DevOps processes for deploying and configuring a MarkLogic Cluster.

      Summary:

      After adding or removing a forest and its corresponding replica forest in a database, we have seen instances where the Rebalancer does not properly distribute the documents amongst existing and newly added forests.

      For this particular instance, the XDMP-HASHLOCKINGRETRY debug-level message was reported repeatedly in the error logs.  The messages would look something like: 

      2016-02-11 18:22:54.044 Debug: Retrying HTTPRequestTask::handleXDBCRequest 83 because XDMP-HASHLOCKINGRETRY: Retry hash locking. Forests config hash does not match.

      2016-02-11 18:22:54.198 Debug: Retrying ForestRebalancerTask::run P_initial_p2_01 50 because XDMP-HASHLOCKINGRETRY: Retry hash locking. Forests config hash does not match.

      Diagnosing

      Gather statistics about the rebalancer to see the number of documents being scheduled. If you run attached script “rebalancer-preview.xqy” in the query console of your MarkLogic Server cluster, it will produce rebalancer statistics in tabular format.

      • Note that you must first change the database name (YourDatabaseNameOnWhichNewForestsHaveBeenAdded) on the 3rd line of the XQuery script “rebalancer-preview.xqy”:

      declare variable $DATABASE as xs:string := xdmp:get-request-field("db", "YourDatabaseNameOnWhichNewForestsHaveBeenAdded");

      If experiencing this issue, the newly added forests will show zero in the “Total to be moved” column in the generated html page.

      Resolving

      Perform a cluster wide restart in order to get past this issue.  The restart is required to reload all of the configuration files across the cluster.  The rebalancer will also check to see if additional rebalancing work needs to occur. The rebalancer should work as expected now and the XDMP-HASHLOCKINGRETRY messages should no longer appear in the logs. If you run the rebalancer-preview.xqy script again, the statistics should now show the number of documents being scheduled to be moved.

      You can also validate the rebalancer status from the Database Status page in the Admin UI.

      The XDMP-HASHLOCKINGRETRY rebalancer issue has been fixed in the latest MarkLogic Server releases.  However, the rebalancer-preview.xqy script can be used to help diagnose other perceived issues with the Rebalancer.

       

      Search fundamentals

       

      Difference between cts:contains and fn:contains

       1) fn:contains is a substring match, whereas cts:contains performs query matching

       2) cts:contains can therefore utilize general queries and stemming, whereas fn:contains does not

       

      For example:

       

      Example.xml

      <test>daily running makes you fit</test>

       

      •         fn:contains(fn:doc("Example.xml"), "ning") returns true

      •         cts:contains(fn:doc("Example.xml"), "ning") returns false

      •         fn:contains(fn:doc("Example.xml"), "ran") returns false

      •         cts:contains(fn:doc("Example.xml"), "ran") returns true

       

       

      Note:

      The cts:contains examples check the document against cts:word-query matches.  Stemming reduces words down to their root, allowing for smaller term lists.

      1) Words from different languages are treated differently, and will not stem to the same root word entry from another language.

      2) Nouns will not stem to verbs and vice versa. For example, the word "runner" will not stem to "run".
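
      To see how stemming treats a particular word, you can run cts:stem in Query Console. A quick sketch (the output depends on the language and stemming configuration of your installation):

      xquery version "1.0-ml";
      (: "running" typically stems to "run"; the noun "runner" does not stem to the verb "run" :)
      (cts:stem("running"), cts:stem("runner"))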


      Introduction

      MarkLogic Server provides a variety of disaster recovery (DR) facilities, including full backup, incremental backup, and journal archiving, that, when combined with other MarkLogic features, can create a complete disaster recovery strategy. This paper shows some examples of how these features can be combined. It is not comprehensive, nor does it reflect features offered only in the latest releases.

      Details

      This article first gives a quick overview of the metrics used by businesses to measure the quality of their Disaster Recovery strategies, and then an overview of how the features that MarkLogic offers in various categories can be combined.

      More?: High Availability and Disaster Recovery features, High Availability & Disaster Recovery datasheet, Scalability, Availability, and Failover Guide

      Disaster Recovery Criteria

      In order to configure MarkLogic Server to perform well in Disaster Recovery situations, we should first define what parameters we will use to measure each possible approach. For most situations, these four measures are used: 

      Long Term Retention Policy (LTR): Long Term Retention Policy can be driven by any number of business, regulatory and other criteria. It is included here because MarkLogic's backup files are often a key part of an LTR strategy. 

      Recovery Point Objective (RPO): The requirement for how up-to-date the database has to be post-recovery with respect to its state immediately before the incident that required recovery.

      Recovery Time Objective (RTO): The requirement for the time elapsed between the incident and the recovery to the RPO.

      Cost: The storage cost, the computational resource cost, and the operations cost of the overall deployment strategy.

      Flexible Replication Features

      Flexible replication can be used to support LTR objectives but is generally not useful for Disaster Recovery.

      More? Flexible Replication Guide

      Platform Support Features

      Flash backup provides a way to leverage backup features of your deployment platform while maintaining transaction integrity. Platform specific solutions can often achieve RPO and RTO targets that would be impossible through other means.

      More? Flash Backup

      High Availability Features

      Forest replication provides recovery from host failures.

      More? Scalability, Availability, and Failover Guide

      Disaster Recovery Features

      Database Replication

      Database Replication is the process of maintaining copies of forests on databases in multiple MarkLogic Server clusters.

      More? Understanding Database Replication

      Backups

      Of all your backup options, full backups restore the quickest, but take the most time to back up and possibly the most storage space. Each full backup is a backup set in that it contains everything you need to restore to the point of the backup.

      Full backups with journal archiving allow restores to a point after the backup, but the journal archive grows in an unbounded way with the number of transactions, and replaying the journals to get to your recovery point takes time proportional to the number of transactions in the journal archive, so over time, this becomes less efficient.

      With full + incremental backups, a backup set is a full backup, plus the incremental backups taken after that full backup. Incremental backups are quick to backup, but take longer to restore, and over time the backup set gets larger and larger, so it may end up consuming more backup space than a full backup alone (depending on your backup retention policy).

      Full + incremental backups with journal archiving have the same characteristics as incremental backups, except that you can roll forward from the most recent incremental. With this strategy, the journal archive doesn't grow in an unbounded way because the archive is purged when you take the next incremental backup. Note that if your RPO is between incremental backups, you must also enable a merge timestamp by setting the merge timestamp to a negative value (see below).
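
      As a rough sketch, a full backup with journal archiving can be started from Query Console along these lines; the database name and paths here are hypothetical, and the xdmp:database-backup documentation should be consulted for the full argument list (including the incremental backup options):

      xquery version "1.0-ml";
      (: back up every forest in the database, with journal archiving enabled :)
      xdmp:database-backup(
        xdmp:database-forests(xdmp:database("Documents")),
        "/backups/Documents/",
        fn:true(),
        "/backups/Documents/journals/"
      )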

      More?: Administrator's Guide to Backing Up and Restoring a Database; How does "point-in-time" recovery work with Journal Archiving?

      Forest Merge Configurations

      Forest merges recover the disk space occupied by deleted documents. A negative merge timestamp delays that permanent deletion. If we want incremental backups to contain all the fragments that were deleted since the last incremental backup then we want to set the delay to a period greater than the incremental backup period. This requires more disk space for the incremental backups and also requires additional space in the live database, but provides the most flexibility.

      Setting retain-until-backup on a given database (through the Admin UI or through an API call) has a similar effect by telling the server to keep the deleted fragments until a full backup or an incremental backup completes. Many clients choose to use both the negative merge timestamp and retain-until-backup options together.
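
      As a sketch, both settings can be applied together via the Admin API; the database name and the negative value here are hypothetical, and the value is expressed in the server's merge timestamp units, so consult the admin:database-set-merge-timestamp documentation when choosing a period:

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin"
        at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $db := admin:database-get-id($config, "Documents")
      (: a hypothetical negative value; delays permanent deletion of merged-away fragments :)
      let $config := admin:database-set-merge-timestamp($config, $db, -864000000000)
      (: keep deleted fragments until a backup completes :)
      let $config := admin:database-set-retain-until-backup($config, $db, fn:true())
      return admin:save-configuration($config)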

      More?: admin:database-set-merge-timestamp  admin:database-set-retain-until-backup


      Conclusion

      Planning to meet a Long Term Retention (LTR) policy, a Recovery Point Objective (RPO), a Recovery Time Objective (RTO), and a cost goal is a key part of developing an overall MarkLogic deployment plan. MarkLogic offers a wealth of tools that can complement each other when they are properly coordinated. As is clear from this article, the choices are many, broad, and interrelated.

      Regardless of the server version, MLCP does not support concurrent jobs if they are importing from/exporting to the same file.

      In general, MLCP jobs will perform best by maximizing the number of threads in a single MLCP job. Before 10.0-4.2, each MLCP job used 4 threads by default. Starting in 10.0-4.2, each MLCP job now uses the maximum number of threads available on the server as the default thread count (you can read more about this change in the 10.0-4.2 release notes).

      Introduction

      In the more recent versions of MarkLogic Server, there are checks in place to prevent the loading of invalid documents (such as documents with multiple root nodes).  However, documents loaded in earlier versions of MarkLogic Server can now result in duplicate URI or duplicate document errors being reported.

      Additionally, under normal operating conditions, a document/URI is saved in a single forest. If the load process somehow gets compromised, then users may see issues like duplicate URIs (i.e., the same URI in different forests) and duplicate documents (i.e., the same document/URI in the same forest).

      Resolution

      If the XDMP-DBDUPURI (duplicate URI) error is encountered, refer to our KB article "Handling XDMP-DBDUPURI errors" for procedures to resolve.

      If one doesn't see XDMP-DBDUPURI errors but running fn:doc() on a document returns multiple nodes, then it could be a case of a duplicate document in the same forest.

      To check that the problem is actually duplicate documents, one can either do an xdmp:describe(fn:doc(...)) or fn:count(fn:doc(...)). If these commands return more than 1, e.g., xdmp:describe(fn:doc("/testdoc.xml")) returns (fn:doc("/testdoc.xml"), fn:doc("/testdoc.xml")) or fn:count(fn:doc("/testdoc.xml")) returns 2, then the problem is duplicate documents in the same forest (and not duplicate URIs).
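
      Putting those checks together, here is a small Query Console sketch (the URI is hypothetical):

      xquery version "1.0-ml";
      let $uri := "/testdoc.xml"
      return (
        (: a count greater than 1 indicates duplicate documents in the same forest :)
        fn:count(fn:doc($uri)),
        xdmp:describe(fn:doc($uri))
      )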

      To fix duplicate documents, the document will need to be reloaded.

      Before reloading, you can take a look at the two versions to see if there is a difference.  Check fn:doc("/testdoc.xml")[1] versus fn:doc("/testdoc.xml")[2] to see if there is a difference, and which one you want to reload.

      If there is a difference, that may also point to the operation that created the situation.

      Introduction

      This article discusses the effect of a search term's case sensitivity on the search score, and thus on the final order of search results, when a secondary query uses cts:boost-query and a weight. A case-insensitive word term is treated as the lower-case word term, so there is no difference in the frequencies and scores of results between an any-case/case-insensitive search term and a lowercase search term with the "case-sensitive" option, or when neither "case-sensitive" nor "case-insensitive" is specified. If neither option is present, the text of the search term is used to determine case sensitivity.

      Understanding relevance score

      In MarkLogic, search results are returned in relevance order. The most relevant results are first in the result sequence and the least relevant are last.
      More details on relevance scores and their calculation are available at https://docs.marklogic.com/guide/search-dev/relevance

      Of the many ways to control this relevance score, one is to use a secondary query to boost it: https://docs.marklogic.com/guide/search-dev/relevance#id_30927. This article uses examples with a secondary query to show the impact of the search term's case (upper, lower, or unspecified) on the relevance score and on the order of the results returned.

      A few examples to understand this scenario

      Consider a few scenarios where the queries below try to boost certain search results using cts:boost-query and a weight for the word "washington".

      Example 1: Search with lowercase search term and option for case not specified

      Query1:
      xquery version "1.0-ml";
      declare namespace html = "http://www.w3.org/1999/xhtml";

      for $hit in
      ( cts:search(
      fn:doc()/test,

      cts:boost-query(cts:element-word-query(xs:QName("test"),"George" ),
      cts:element-word-query(xs:QName("test"),"washington",(), 10.0) )
      )
      )

      return element hit {
      attribute score { cts:score($hit) },
      attribute fit { cts:fitness($hit) },
      attribute conf { cts:confidence($hit) },
      $hit
      }


      Results for Query1:
      <hit score="28276" fit="0.9393904" conf="0.2769644">
      <test>Washington, George... </test>
      </hit>
      ...
      ...
      <hit score="16268" fit="0.7125317" conf="0.2100787">
      <test>George washington was the first President of the United States of America...</test>
      </hit>
      ...

       

      Example 2: Search with lowercase search term and case-sensitive option

      Query2:
      xquery version "1.0-ml";
      declare namespace html = "http://www.w3.org/1999/xhtml";

      for $hit in
      ( cts:search(
      fn:doc()/test,

      cts:boost-query(cts:element-word-query(xs:QName("test"),"George" ),
      cts:element-word-query(xs:QName("test"),"washington",("case-sensitive"), 10.0) )
      )
      )

      return element hit {
      attribute score { cts:score($hit) },
      attribute fit { cts:fitness($hit) },
      attribute conf { cts:confidence($hit) },
      $hit
      }


      Results for Query2:
      <hit score="28276" fit="0.9393904" conf="0.2769644">
      <test>Washington, George... </test>
      </hit>
      ...
      ...
      <hit score="16268" fit="0.7125317" conf="0.2100787">
      <test>George washington was the first President of the United States of America...</test>
      </hit>
      ...

       

      Example 3: Search with an uppercase search term and the "case-insensitive" option in cts:boost-query, as shown below; the rest of the query is the same as the queries above

      Query3:

      cts:boost-query(cts:element-word-query(xs:QName("test"),"George" ),
      cts:element-word-query(xs:QName("test"),"Washington",("case-insensitive"), 10.0) )

      Results for Query3:
      <hit score="28276" fit="0.9393904" conf="0.2769644">
      <test>Washington, George... </test>
      </hit>
      ...
      ...
      <hit score="16268" fit="0.7125317" conf="0.2100787">
      <test>George washington was the first President of the United States of America...</test>
      </hit>
      ...


      Clearly, the queries above produce the same scores, with the same fitness and confidence values. This is because a case-insensitive word term is treated as the lower-case word term, so there can be no difference in the frequencies of those two terms (any-case/case-insensitive and lowercase/case-sensitive), and therefore no difference in scoring. Thus there is no difference between the results for Query2 and Query3.
      Where case sensitivity is not specified, the text of the search term is used to determine it. For Query1, the text of the search term contains no uppercase characters, so it is treated as "case-insensitive".

       

      Now let us take a look at a query with an uppercase word and the "case-sensitive" option.

      Example 4: Search with an uppercase search term and the "case-sensitive" option in cts:boost-query, as shown below; the rest of the query is the same as the queries above

      Query4:

      cts:boost-query(cts:element-word-query(xs:QName("test"),"George" ),
      cts:element-word-query(xs:QName("test"),"Washington",("case-sensitive"), 10.0) )

      Results for Query4:
      <hit score="44893" fit="0.9172696" conf="0.3489831">
      <test>Washington, George was the first... </test>
      </hit>
      ...
      ...
      <hit score="256" fit="0.0692672" conf="0.0263533">
      <test>George washington was the first President of the United States of America...</test>
      </hit>
      ...

       

      As we can clearly see, the scores change for the Query4 results, and thus the final order of the results is also different.


      Conclusion:

      While using a secondary query with cts:boost-query and a weight to boost certain search results, it is important to understand the impact of the case of the search text on the result sequence. A case-insensitive word term is treated as the lower-case word term, so there can be no difference in the frequencies of any-case/case-insensitive and lowercase/case-sensitive search terms, and therefore no difference in scoring. For a search term with uppercase characters in its text and the "case-sensitive" option, scores are boosted as expected in comparison with a case-insensitive search. If neither "case-sensitive" nor "case-insensitive" is present, the text of the search term is used to determine case sensitivity: if the text contains no uppercase characters, it is treated as "case-insensitive"; if it contains uppercase characters, it is treated as "case-sensitive".

       

      Background

      MarkLogic Server includes element level security (ELS), an addition to the security model that allows you to specify security rules on specific elements within documents. Using ELS, parts of a document may be concealed from users who do not have the appropriate roles to view them. ELS can conceal the XML element (along with properties and attributes) or JSON property so that it does not appear in any searches, query plans, or indexes - unless accessed by a user with appropriate permissions.

      ELS protects XML elements or JSON properties in a document using a protected path, where the path to an element or property within the document is protected so that only roles belonging to a specific query roleset can view the contents of that element or property. You specify that an element is part of a protected path by adding the path to the Security database. You also then add the appropriate role to a query roleset, which is also added to the Security database.

      ELS uses query rolesets to determine which elements will appear in query results. If a query roleset does not exist with the associated role that has permissions on the path, the role cannot view the contents of that path.

      Notes:

      1. A user with admin privileges can access documents with protected elements by using fn:doc to retrieve documents (instead of using a query). However, to see protected elements as part of query results, even a user with admin privileges will need to have the appropriate role(s).
      2. ELS applies to both XML elements and JSON properties; so unless spelled out explicitly, 'element' refers to both XML elements and JSON properties throughout this article.

      You can read more about how to configure Element Level Security here, and can see how this all works at this Element Level Security Example.
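
      As a minimal sketch of the two Security database pieces described above (the path and role name are hypothetical, and the exact helper function names should be verified against the Element Level Security documentation):

      xquery version "1.0-ml";
      import module namespace sec = "http://marklogic.com/xdmp/security"
        at "/MarkLogic/security.xqy";

      (: run this against the Security database, e.g. by selecting it in Query Console :)
      (
        (: protect the salary element so that only els-reader can read it :)
        sec:protect-path("/employee/salary", (), (xdmp:permission("els-reader", "read"))),

        (: a matching query roleset so protected content appears in els-reader's query results :)
        sec:add-query-rolesets(sec:query-rolesets(sec:query-roleset("els-reader")))
      )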

      Node-update

      One of the commonly used document-level capabilities is 'update'. Be aware, however, that document-level update is too powerful to be used with ELS permissions, as someone with document-level update privileges could not only update a node but also delete the whole document. Consequently, a new document-level capability - 'node-update' - has been introduced. 'node-update' offers finer control when combined with ELS through the xdmp:node-replace and xdmp:node-delete functions, as they can be used to update/delete only the specified nodes of a document (and not the document itself in its entirety).
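
      For illustration, here is a minimal sketch of a node-level update (the document URI and element names are hypothetical):

      xquery version "1.0-ml";
      (: replaces just the salary element, leaving the rest of the document untouched :)
      xdmp:node-replace(
        fn:doc("/employees/jane.xml")/employee/salary,
        <salary>95000</salary>
      )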

      Document-level vs Element-level security

      Unlike at the document-level:

      • 'update' and 'node-update' capabilities are equivalent at the element-level. However, at the document-level, a user who only has the 'node-update' capability on a document cannot delete the document, whereas the 'update' capability allows that user to delete the document
      • 'Read', 'insert' and 'update' are checked separately at the element level i.e.:
        • read operations - only permissions with 'read' capability are checked
        • node update operations - only permissions with 'node-update' (update) capability are checked
        • node insert operations - only permissions with  'insert' capability are checked

      Note: read, insert, update and node-update can all be used at the element-level i.e., they can be part of the protected path definition.

      Permissions:

      Document-level:

      1. update: A node can be updated by any user that has an 'update' capability at the document-level
      2. node-update:  A node can be updated by any user with a 'node-update' capability as long as they have sufficient privileges at the element-level
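
      As a sketch, document-level permissions granting 'node-update' (but not 'update') might be assigned at load time like this (the URI, content, and role names are hypothetical):

      xquery version "1.0-ml";
      (: els-editor can update individual nodes but cannot delete the whole document :)
      xdmp:document-insert(
        "/employees/jane.xml",
        <employee><name>Jane</name><salary>90000</salary></employee>,
        (xdmp:permission("els-reader", "read"),
         xdmp:permission("els-editor", "node-update"),
         xdmp:permission("els-editor", "insert"))
      )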

      Element-level:

      1. If a node is protected but no 'update/node-update' capabilities are explicitly granted to any user, that node can be updated by any user as long as they have 'update/node-update' capabilities at the document-level
      2. If any user is explicitly granted 'update/node-update' capabilities to that node at the element level, only that specific user is allowed to update/delete that node. Other users who are expected to have that capability must be explicitly granted that permission at the element level

      How does node-replace/node-delete work?

      When a node-replace/node-delete is called on a specific node:

      1. The user trying to update that node must have at least a 'node-update' (or 'update') capability to all the nodes up until (and including) the root node
        2. None of the descendant nodes of the node being replaced/deleted can be protected by a different role. If they are protected:
        1. 'node-delete' isn’t allowed as deleting this node would also delete the descendant node which is supposed to be protected
        2. 'node-replace' can be used to update the value (text node) of the node but replacing the node itself isn’t allowed

      Note: If a caller has the 'update' capability at the document level, there is no need to do element-level permission checks since such a caller can delete or overwrite the whole document anyway.

      Takeaways:

      1. 'node-update' was introduced to offer finer control with ELS, in contrast to the document level 'update'
      2. 'update' and 'node-update' permissions behave the same at element-level, but differently at the document-level
        1. At document-level, 'update' is more powerful as it gives the user the permission to delete the entire document
        2. All permissions talk to each other at document-level. In contrast, permissions are checked independently at the element-level
          1. At the document level, an update permission allows you to read, insert to and update the document
          2. At the element level, however, read, insert and update (node-update) are checked separately
            1. For read operations, only permissions with the read capability are checked
            2. For node update operations, only permissions with the node-update capability are checked
            3. For node insert operations, only permissions with the insert capability are checked (this is true even when compartments are used).
      3. Can I use ELS without document level security (DLS)?
        1. ELS cannot be used without DLS
        2. Consider DLS the outer layer of defense, whereas ELS is the inner layer - you cannot get to the inner layer without passing through the outer layer
      4. When to use DLS vs ELS?
        1. ELS offers finer control on the nodes of a document and whether to use it or not depends on your use-case. We recommend not using ELS unless it is absolutely necessary as its usage comes with serious performance implications
        2. In contrast, DLS offers better performance and works better at scale - but is not an ideal choice when you need finer control as it doesn’t allow node-level operations 
      5. How does ELS performance scale with respect to different operations?
        1. Ingestion - depends on the number of protected paths
          1. During ingestion, the server inspects every node for ELS to do a hash lookup against the names of the last steps from all protected paths
          2. For every protected path that matches the hash, the server does a full test of the node against the path - the higher the number of protected paths, the higher the performance penalty
          3. While the hash lookup is very fast, the full test is comparatively much slower - and the corresponding performance penalty increases when there are a large number of nodes that match the last steps of the protected paths
            1. Consequently, we strongly recommend avoiding the use of wildcards at the leaf-level in protected paths
            2. For example: /foo/bar/* has a huge performance penalty compared to /foo/*/bar
        2. Updates - as with ingestion, ELS performance depends on the number of protected paths
        3. Query/Search - in contrast to ELS ingestion or update, ELS query performance depends on the number of query rolesets
          1. Because ELS query performance depends on the number of query rolesets, the concept of Protected PathSet was introduced in 9.0-4
          2. A Protected PathSet allows OR relationships between permissions on multiple protected paths that cover the same element
          3. Because query performance depends on the number of relevant query rolesets, it is highly recommended to use helper functions to obtain the query rolesets of nodes configured with element-level security

      Further Reading

      Introduction

      Some customers have reported problems when attempting to access the Configuration Manager application. In the past, this has been attributed to part of the upgrade process failing for some reason (for example: a port required by MarkLogic already being used) or, in some cases, it was due to a default database being removed by the customer at some previous stage.

      XDMP-ARGTYPE Error

      If you see this error when you attempt to access the Configuration Manager:

      500 Internal Server Error XDMP-ARGTYPE XDMP-ARGTYPE: (err:XPTY0004) fn:concat( "could not initialize management plugins with scope: ", $reut:PLUGIN-SCOPE, ": ", xdmp:quote($e)) -- arg1 is not of type xs:anyAtomicType?

      Resolving the error

      Ensure you have an Extensions database configured by doing the following:

      • Log into the MarkLogic Admin interface on port 8001 - http://[your-host]:8001/
      • Under "Databases" box, ensure a database called Extensions is listed

      If it does not exist, download and run the script attached to this article (create-extensions-db.xqy).

      Summary

      Does MarkLogic provide encryption at rest?

      MarkLogic 9

      MarkLogic 9 introduces the ability to encrypt 'data at rest' - data that is on media (on disk or in the cloud), as opposed to data that is being used in a process. Encryption can be applied to newly created files, configuration files, or log files. Existing data files can be encrypted by triggering a merge or re-index of the data.

      For more information about using Encryption at Rest, see Encryption at Rest in the MarkLogic Security Guide.

      MarkLogic 8 and Earlier releases

      MarkLogic 8 does not provide support for encryption at rest for its own forests.

      Memory consumption

      Memory consumption patterns will be different when encryption is used:

      • To access unencrypted forest data MarkLogic normally uses memory-mapped files. When files are encrypted, MarkLogic instead decrypts the entire index to anonymous memory.
      • As a result, encrypted MarkLogic forests use more anonymous memory and less file-mapped memory than unencrypted forests.  
      • Without encryption at rest, when available memory is low, the operating system can throw out file pages from the working set and later page them in directly from files.  But with encryption at rest, when memory is low, the operating system must write them to swap.

      Using Amazon S3 Encryption For Backups

      If you are hosting your data locally, would like to back up to S3 remotely, and your goal is that there cannot possibly exist unencrypted copies of your data outside your local environment, then you could backup locally and store the backups to S3 with AWS Client-Side encryption. MarkLogic does not support AWS Client-Side encryption, so this would need to be a solution outside MarkLogic.

      See also: MarkLogic documentation: S3 Storage.

      See also: AWS: Protecting Data Using Encryption.

      Introduction

      Here we compare XDBC servers and the Enhanced HTTP server in MarkLogic 8.

      Details

      XDBC servers are still fully supported in MarkLogic Server version 8. You can upgrade existing XDBC servers without making any changes and you can create new XDBC servers as you did in previous releases.

      The Enhanced HTTP Server is an additional feature on HTTP servers which is protocol and binary transport compatible with XCC clients, as long as you use the xcc.httpcompliant=true system property.

      The XCC protocol is actually just HTTP, but the details of how to handle body, headers, responses, etc., are "built in" to the XCC client libraries and the XDBC server. The HTTP server in MarkLogic 8 now shares the same low-level code and can dispatch XCC-like requests.

      Introduction

      This article talks about best practices for use of external proxies vs using rewriter rules in the Enhanced HTTP server.

      Details

      Whether to use external proxies versus using rewriter rules in the Enhanced HTTP application server is an application design tradeoff, not dissimilar to using a single HTTP application server and an XQuery rewriter or endpoint that can dynamically dispatch to different databases and modules (using eval-in).  The Enhanced HTTP server does this type of dispatching much more efficiently, but the concept is similar, with the same pros and cons.

      It is mostly an application and business management issue—by sharing the same port you share the same server configuration (authentication, server settings) and the "outside world" only sees one port, so configuring port-based security on firewalls, routers, or load balancers is more difficult.

      Summary

      A forest reindex timeout error may occur when there are transactions holding update locks on documents for an extended period of time. A reindexer process is started as a result of a database index change or a major MarkLogic Server upgrade.  The reindexer process will not complete until after update locks are released.

      Example error text seen in the MarkLogic Server ErrorLog.txt file:

      XDMP-FORESTERR: Error in reindex of forest Documents: SVC-EXTIME: Time limit exceeded

      Detail

      Long running transactions can occur if MarkLogic Server is participating in a distributed transaction environment. In this case transactions are managed through a Resource Manager. Each transaction is executed as a two-phase commit. In the first phase, the transaction will be prepared for a commit or a rollback. The actual commit or rollback will occur in the second phase. More details about XA transactions can be found in the Application Developer's Guide - Understanding Transactions in MarkLogic Server

      In a situation where the Resource Manager gets disconnected between the two phases, all transactions may be left in a "prepare" state within MarkLogic Server. The Resource Manager maintains transaction information and will clean up transactions left in the "prepare" state after a successful reconnect. In the rare case where this doesn't happen, all transactions left in the "prepare" state will stay in the system until they are cleaned up manually. The method to manually intervene is described in the XCC Developer's Guide - Heuristically Completing a Stalled Transaction.

      In order for an XA transaction to take place, it needs to prepare the execution for the commit. If updates are being made to pre-existing documents, update locks are held against the URIs for those documents. When reindexing is occurring during this process, the reindexer will wait for these locks to be released before it can successfully reindex the new documents.   Because the reindexer is unable to complete due to these pending XA transactions, the hosts in the cluster are unable to completely finish the reindexing task and will eventually throw a timeout error.

      Mitigation

      To avoid these kinds of reindexer timeouts, it is recommended that the database is checked for outstanding XA transactions in the "prepare" state before starting a reindexing process. There are two ways to verify whether the database has outstanding transactions in the "prepare" state:

      • In the Admin UI, navigate  to each forest of the database and review the status page; or
      • Run the following XQuery code (in Query Console):

        xquery version "1.0-ml"; 
        declare namespace fo = "http://marklogic.com/xdmp/status/forest";   

        for $f in xdmp:database-forests(xdmp:database()) 
        return    
          xdmp:forest-status($f)//fo:transaction-coordinator[fo:decision-state = 'prepare']

      In the case where there are transactions in the "prepare" state, a roll-back can be executed:

      • In the Admin UI, click on the "rollback" link for each transaction; or
      • Run the following XQuery code (in Query Console):

        xquery version "1.0-ml"; 
        declare namespace fo = "http://marklogic.com/xdmp/status/forest";

        for $f in xdmp:database-forests(xdmp:database()) 
        return    
          for $id in xdmp:forest-status($f)//fo:transaction-coordinator[fo:decision-state = 'prepare']/fo:transaction-id/fn:string()
          return
            xdmp:xa-complete($f, $id, fn:false(), fn:false())

      Introduction

      Query Console is an interactive web-based query development tool for writing and executing ad-hoc queries in XQuery, Server-Side JavaScript, SQL and SPARQL. Query Console enables you to quickly test code snippets, debug problems, profile queries, and run administrative XQuery scripts.  Query Console uses workspaces to assist users with organizing queries.  A user can have multiple workspaces, and each workspace can have multiple queries.

      Issue

      In MarkLogic Server v9.0-11, v10.0-3 and earlier releases, users may experience delays, lag or latency between when a key is pressed on the keyboard, and when it appears in the Query Console query window.  This typically happens when there are a large number of queries in one of the users workspaces.

      Workaround

      A workaround to improve performance is to reduce the number of queries in each workspace.  The same number of queries can be managed by increasing the number of workspaces and reducing the number of queries in each workspace.  We suggest keeping no more than 30 queries in a workspace to avoid these latency issues.  

      The MarkLogic Development team is looking to improve the performance of Query Console, but at the time of this writing, this performance issue has not yet been resolved. 

      Further Reading

      Query Console User Guide

      Introduction

      Users of Java based batch processing applications, such as CoRB, XQSync, mlcp and the hadoop connector may have seen an error message containing "Premature EOF, partial header line read". Depending on how exceptions are managed, this may cause the Java application to exit with a stacktrace or to simply output the exception (and trace) into a log and continue.

      What does it mean?

      The premature EOF exception generally occurs in situations where a connection to a particular application server was lost while the XCC driver was in the process of reading a result set. This can happen in a few possible scenarios:

      • The host became unavailable due to a hardware issue, segfault or similar issue;
      • The query timeout expired (although this is much more likely to yield an XDMP-EXTIME exception with a "Time limit exceeded" message);
      • Network interruption - a possible indicator of a network reliability problem such as a misconfigured load balancer or a fault in some other network hardware.

      What does the full error message look like?

      An example:

      INFO: completed 5063408/14048060, 103 tps, 32 active threads
       Feb 14, 2013 7:04:19 AM com.marklogic.developer.SimpleLogger logException
       SEVERE: fatal error
       com.marklogic.xcc.exceptions.ServerConnectionException: Error parsing HTTP
       headers: Premature EOF, partial header line read: ''
       [Session: user=admin, cb={default} [ContentSource: user=admin,
       cb={none} [provider: address=localhost/127.0.0.1:8223, pool=0/64]]]
       [Client: XCC/4.2-8]
       at
       com.marklogic.xcc.impl.handlers.AbstractRequestController.runRequest(AbstractRequestController.java:116)
       at com.marklogic.xcc.impl.SessionImpl.submitRequest(SessionImpl.java:268)
       at com.marklogic.developer.corb.Transform.call(Unknown Source)
       at com.marklogic.developer.corb.Transform.call(Unknown Source)
       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
       at java.util.concurrent.FutureTask.run(FutureTask.java:166)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
       at java.util.concurrent.FutureTask.run(FutureTask.java:166)
       at
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       at
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       at java.lang.Thread.run(Thread.java:679)
       Caused by: java.io.IOException: Error parsing HTTP headers: Premature EOF,
       partial header line read: ''
       at com.marklogic.http.HttpHeaders.nextHeaderLine(HttpHeaders.java:283)
       at com.marklogic.http.HttpHeaders.parseResponseHeaders(HttpHeaders.java:248)
       at com.marklogic.http.HttpChannel.parseHeaders(HttpChannel.java:297)
       at com.marklogic.http.HttpChannel.receiveMode(HttpChannel.java:270)
       at com.marklogic.http.HttpChannel.getResponseCode(HttpChannel.java:174)
       at
       com.marklogic.xcc.impl.handlers.EvalRequestController.serverDialog(EvalRequestController.java:68)
       at
       com.marklogic.xcc.impl.handlers.AbstractRequestController.runRequest(AbstractRequestController.java:78)
       ... 11 more
       2013-02-14 07:04:19.271 WARNING [12] (AbstractRequestController.runRequest):
       Cannot obtain connection: Connection refused

      Configuration / Code: things to try when you first see this message

      A possible cause of errors like this is the JVM starting garbage collection and this process taking long enough to exceed the server timeout setting. If this is the case, try adding the -XX:+UseConcMarkSweepGC Java option.

      Setting the "keep-alive" value to zero for the affected XDBC application server will disable socket pooling and may help to prevent this condition from arising; with keep-alive set to zero, sockets will not be re-used. With this approach, it is understood that disabling keep-alive should not be expected to have a significant negative impact on performance, although thorough testing is nevertheless advised.

      Summary

      Here we discuss various methods for sharing metering data with Support:  telemetry in MarkLogic 9 and exporting monitoring data.

      Discussion

      Telemetry

      In MarkLogic 9, enabling telemetry collects, encrypts, packages, and sends diagnostic and system-level usage information about MarkLogic clusters, including metering, with minimal impact to performance. Telemetry sends information about your MarkLogic Servers to a protected and secure location where it can be accessed by the MarkLogic Technical Support Team to facilitate troubleshooting and monitor performance.  For more information see Telemetry.
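Telemetry is normally switched on per group from the Admin UI, but it can also be enabled with the Admin API. The following is a minimal sketch that assumes the MarkLogic 9 group-level setters admin:group-set-telemetry-log-level and admin:group-set-telemetry-metering, with illustrative setting values ("error", "daily"); check the Telemetry documentation for the exact functions and values supported by your release.

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $group  := admin:group-get-id($config, "Default")
      let $config := admin:group-set-telemetry-log-level($config, $group, "error")
      let $config := admin:group-set-telemetry-metering($config, $group, "daily")
      return admin:save-configuration($config)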

      Meters database

      If telemetry is not enabled, make sure that monitoring history is enabled and data has been collected covering the time of the incident.  See Enabling Monitoring History on a Group for more details.
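Monitoring history can be enabled from the Admin UI (group-level Monitoring History settings). As a sketch, and assuming the admin:group-set-performance-metering-enabled setter is available in your release, it can also be enabled with the Admin API:

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $group  := admin:group-get-id($config, "Default")
      return
        admin:save-configuration(
          admin:group-set-performance-metering-enabled($config, $group, fn:true()))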

      Exporting data

Either of the attached scripts can be used in lieu of a Meters database backup. Each provides the raw metering XML files for a defined period of time; these files can be reloaded into MarkLogic and used with the standard tools.

      exportMeters.xqy

      This XQuery export script needs to be executed in Query Console against the Meters database and will generate zip files stored in the defined folder for the defined period of time.

      Variables for start and end times, batch size, and output directory are set at the top of the script.

      get-raw.sh

      This bash version will use MLCP to perform a similar export but requires an XDBC server and MLCP installed. By default the script creates output in a subdirectory called meters-export. See the attached script for details. An example command line is

      ./get-raw.sh localhost admin admin "2018-04-12T00:00:00" "2018-04-14T00:00:00"

      Backup of Meters database

      A backup of the full Meters database will provide all the available raw data and is very useful, but is often very large and difficult to transfer, so an export of a defined time range is often requested.

The best available way to export triples from MarkLogic at the moment is via the /v1/rows (POST) endpoint. You can make an HTTP POST call to this endpoint with an op.fromSPARQL query that returns the desired set of triples, and select the output format of your choice (via the Accept request header or the row-format URL parameter).

      • Sample op.fromSPARQL payload:
        • op.fromSPARQL('SELECT * FROM <collection_name> WHERE {?s ?p ?o.}')
      • Sample curl POST command:
        • curl --anyauth --user <user_name>:<password> -i -X POST  -H "Content-type: application/vnd.marklogic.querydsl+javascript" -H "Accept: <output_type>" http://<host_name>:8000/v1/rows -d @./<payload_file_name>

      Alternative ways to export triples:

• MLCP currently doesn't offer a way to export triples, but if you are okay with exporting them as XML files, you can export the containing documents through MLCP by collection name (for managed triples, the graph name can be used as the collection name)
  • Note: if you are working with embedded/unmanaged triples, the resulting XML files may contain XML elements that are not triples if you go with this alternative
• You can also use the /v1/graphs endpoint to export triples, but that endpoint only returns managed triples; if you need to export unmanaged triples, it is not an option
        • Note: This is not efficient in terms of performance when working with large sets of data

      Further reading:

      Introduction

      Within a MarkLogic deployment, there can be multiple primary and replica objects. Those objects can be forests in a database, databases in a cluster, nodes in a cluster, and even clusters in a deployment. This article walks through several examples to clarify how all these objects hang together.

      Shared-disk vs. Local-disk failover

      Shared-disk failover requires a shared filesystem visible to all hosts in a cluster, and involves one copy of a data forest, managed by either its primary host, or its failover host (so forest1, assigned to host1, failing over to host2).

      Local-disk failover involves two copies of data in a primary and local disk failover replica forest (sometimes referred to as an "LDF"). Primary hosts manage primary forests, and failover hosts manage the corresponding synchronized LDF (forest1 on host1, failing over to replicaForest1 on host2).

      Database Replication

      In the same way that you can have multiple copies of data within a cluster (as seen in local-disk failover), you can also have multiple copies of data across clusters as seen in either database replication or flexible replication. Within a replicated environment you'll often see reference to primary/replica databases or primary/replica clusters. So this will often look like forest1 on host1 in cluster1, replicating to forest1 on host1 in cluster2. We can shorten forest names here to c1.h1.f1 and c2.h1.f1. Note that you can have both local disk failover and database replication going at the same time - so on your primary cluster, you'll have c1.h1.f1, as well as failover forest c1.h2.rf1, and your replica cluster will have c2.h1.f1, as well as its own failover forest c2.h2.rf1. All of these forest copies should be synchronized both within a cluster (c1.h1.f1 synced to c1.h2.rf1) and across clusters (c1.h1.f1 synced to c2.h1.f1).

      Configured/Intended vs. Acting

      At this point we've got two clusters, each with at least two nodes, where each node has at least one forest - so four forest copies, total (bear in mind that databases can have dozens or even hundreds of forests - each with their own failover and replication copies). The "configured" or "intended" arrangement is what your deployment looks like by design, when no failover or any other kind of events have occurred that would require one of the other forest copies to serve as the primary forest. Should a failover event occur, c1.h2.rf1 will transition from the intended LDF to the acting primary, and its host c1.h2 will transition from the intended failover host to the acting primary host. At this point, the intended primary forest c1.h1.f1 and its intended primary host c1.h1 will likely both be offline. Failing back is the process of reverting hosts and forests from their acting arrangement (in this case, acting primary forest c1.h2.rf1 and acting primary host c1.h2), back to their intended arrangement (c1.h1.f1 is both intended and acting primary forest, c1.h1 is both intended and acting primary host).

      This distinction between intended vs. acting can even occur at the cluster level, where c1 is the intended/acting primary cluster, and c2 is the intended/acting replica cluster. Should something happen to your primary cluster c1, the intended replica cluster c2 will transition to the acting primary cluster while c1 is offline.

      Takeaways

      • It's possible to have multiple copies of your data in a MarkLogic Server deployment
      • Under normal operations, these copies are synchronized with one another
      • Should failover events occur in a cluster, or catastrophic events occur to an entire cluster, you can shift traffic to the available previously synchronized copies
      • Failing back to your intended configuration is a manual operation
        • Make sure to re-synchronize copies that were offline with online copies
        • Shifting previously offline copies to acting primary before re-synchronization may result in data loss, as offline forests can overwrite updates previously committed to forests serving as acting primaries while the intended primary forests were offline

      Introduction

To avoid index bloat, MarkLogic records word positions in its indexes only once, in the word-query field. When word positions are needed to accurately match element-word queries, they are normally taken from the word-query field. When elements are excluded from the word-query field, words under those elements are not indexed, so their positions are not recorded. In MarkLogic 7.0-5 and 8.0-1, a code change was included to avoid false negatives resulting from an element-word query expecting positions from words in elements descended from excluded elements. That change was to not use positions from the word-query field for element-word searches when the word-query field has exclusions.

      Implications

Unfortunately, this solution can sometimes result in false positives - captured in 7.0-5 as bug #33207 and in 8.0-1 as bug #32686 (you can read more about both of these bugs in our Fixed Bugs Report). Consequently, a follow-up refinement was shipped in 7.0-5.1 and 8.0-2 to allow the affected queries to be fully resolvable via indexes. To take advantage of this update, three changes are required:

      1) Upgrade to 7.0-5.1 or later, or 8.0-2 or later

      2) Database index settings must be updated to tell MarkLogic Server to use positions in this scenario and therefore avoid the previously seen false positives. There are two changes that could be made. Either:

      2a. The element in the element-word query must be explicitly included in the word-query field

      ...or:

      2b. All the word-query excluded elements must be configured as phrase-around elements.

      3) After the relevant database index settings are updated and the upgrade has been applied, a reindex must be performed

If these changes are made, positions in the word-query field will be used, ultimately eliminating the false positives.
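As an illustration of option 2b, the sketch below adds a phrase-around for a hypothetical word-query-excluded element <note> (no namespace) on a database named "Documents". It assumes the admin:database-phrase-around constructor and admin:database-add-phrase-around setter, so verify the exact calls against the Admin API documentation for your release; a reindex is still required afterwards (step 3).

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $db     := admin:database-get-id($config, "Documents")
      return
        admin:save-configuration(
          admin:database-add-phrase-around($config, $db,
            admin:database-phrase-around("", "note")))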

      Introduction

       A "fast data directory" is configurable for each forest, and can be set to a directory built on a fast file system, such as one using SSDs. Refer to Using a mix of SSD and spinning drives. If configured MarkLogic Server will try to put as many writes and seeks to the Fast Data Directory (FDD) as it can. As such, it will try to put as many on disk stands as possible onto the FDD. Frequently updated documents tend to reside in the smaller stands and thus are more likely to reside on the FDD.

      This article attempts to explain how you should account for the FDD when sizing disk space for your MarkLogic Server.

      Details

      Forest journals will be placed on the fast data directory. 

Each time an automatic merge is performed, MarkLogic Server will attempt to save the results onto the forest's fast data directory. If there is not sufficient space on the FDD, MarkLogic Server will use the forest's primary data directory. To preserve space for future small stands, MarkLogic Server is conservative in deciding whether to put the merge destination stands on the FDD, which means that even if there is enough available space, it may store the result in the forest's regular data directory. For more details, refer to the fundamentals of resource consumption white paper.

It is also important to know when the Fast Data Directory is not used: stands created by manually triggered merges are not stored on the fast data directory, but in the forest's primary data directory (manual merges can be executed by calling the xdmp:merge function or from within the Admin UI). Likewise, forest-migrate operations and restoring backups do not put stands in the fast data directory.

      Conclusion

MarkLogic Server maintains some disk space in the FDD for checkpoints and journaling. However, since the Fast Data Directory is not used in some procedures, you should not count the size of the FDD when sizing the disk space needed for forest data.

      Introduction

      The Performance Considerations section of the Loading Content Into MarkLogic Server documentation states 

      "When you load content, MarkLogic Server performs updates transactionally, locking documents as needed and saving the content to disk in the journal before the transaction commits. By default, all documents are locked during an update and the journal is set to preserve committed transactions, even if the MarkLogic Server process ends unexpectedly."

      There are two types of locking which are specified at the database level:

      • Fast locking employs a hashed locking scheme (based on the URI) where each fragment URI has a designated forest, so the lock created during the insert is restricted only to that forest.
      • Setting up a database with "strict" locking will force the coordination of an update lock across all forests in the database (and across the cluster) until the insert has taken place.

Fast locking has been the default setting for newly created MarkLogic databases since MarkLogic 5 (released October 2011).

      When should I use strict locking?

If at any point in your code you are specifying the forest into which to insert a document or fragment (a technique commonly referred to as in-forest evaluation), configuring that database with "strict" locking is definitely the safest choice. If your code always allows the server to determine the target forest for the document/fragment, you're perfectly safe using fast locking.
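For clarity, "in-forest evaluation" means naming the destination forest explicitly at insert time. A minimal sketch (the URI, content, and forest name are illustrative):

      xquery version "1.0-ml";
      xdmp:document-insert(
        "/example/doc1.xml",
        <example>content</example>,
        xdmp:default-permissions(),
        (),                           (: collections :)
        0,                            (: quality :)
        xdmp:forest("my-forest-01"))  (: explicit forest placement :)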

      In the situation where two different people create the same document (with the same URI) and where fast locking was taking place, this would result in:

      • A transaction culminating in an insert into a given forest (as assigned by the ML node servicing the request) for the first fragment
      • An "update" transaction (in the same forest) where the first fragment is then marked as deleted
• A new fragment takes the place of the first fragment to complete the second transaction

      Subsequent merges would then remove the stand entry for the first fragment (now deleted/replaced by the subsequent transaction)

      The fast option would not create a dangerous race condition unless your application would allow two different people to insert a document with the same URI into two different forests as two separate transactions and where URI assignment is handled by your XQuery/application layer; if the code responsible for making those transactions were to inadvertently assign the same URI to two different forests in a cluster, this could cause a problem that strict locking would guard against. If your application always allows MarkLogic to assign the forest for the document, there is no danger whatsoever in keeping to the server default of "fast" locking.

Additionally, consider what kind of failover your system is using. When using fast journaling with local-disk failover, the journal disk write needs to fail on both master and replica nodes in order for data loss to occur - so there's no need for strict journaling in this scenario. In contrast, strict journaling should be used with shared-disk failover, as data loss is possible if fast journaling is used and a single node fails before the OS flushes the buffer to disk.

      Is there a performance implication in switching to strict locking?

      Fast locking will be faster than strict locking, but the performance penalty is largely going to be dependent on a number of factors; the number of forests in a given database, the number of nodes across which the database forests are spread and the speed at which all nodes in the cluster can coordinate a transaction across the cluster (Network/IO) will all have some (potentially minimal) impact.

      If the conditions of your application suit, we recommend staying with the default of fast locking on all your databases.

      There may be reasons for using 'strict' locking - especially if you are considering loading documents using in-forest-evaluation in your code.
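If you do decide to switch, locking is a database-level setting that can be changed in the Admin UI or via the Admin API. A sketch for a hypothetical database named "Documents", assuming the admin:database-set-locking setter:

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $db     := admin:database-get-id($config, "Documents")
      return
        admin:save-configuration(admin:database-set-locking($config, $db, "strict"))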

      Further reading

      https://docs.marklogic.com/guide/ingestion/performance

      Summary

      There are situations where the SVC-DIRREM, SVC-DIROPEN and SVC-FILRD errors occur on backups to an NFS mounted drive. This article explains how this condition can occur and describes a number of recommendations to avoid such errors.

      Under normal operating conditions, with proper mounting options for a remote drive, MarkLogic Server does not expect to report SVC-xxxx errors.  Most likely, these errors are a result of improper nfs disk mounting or other IO issues.

      We will begin by exploring methods to narrow down the server which has the disk issue and then list some things to look into in order to identify the cause.

      Error Log and Sys Log Observation

      The following errors are typical MarkLogic Error Log entries seen during an NFS Backup that indicate an IO subsystem error.   The System Log files may include similar messages.

              Error: SVC-DIRREM: Directory removal error: rmdir '/Backup/directory/path': {OS level error message}

              Error: SVC-DIROPEN: Directory open error: opendir '/Backup/directory/path': {OS level error message}

              Error: Backup of forest 'forest-name' to 'Bakup path' SVC-FILRD: File read error: open '/Backup/directory/path': {OS level error message}

These SVC- error messages include the {OS level error message} retrieved from the underlying OS platform using the generic C runtime strerror() call. These messages are typically something like "Stale NFS file handle" or "No such file or directory".

If only a subset of hosts in the cluster are generating these types of errors ...

      You should compare the problem host's NFS configuration with rest of the hosts in the cluster to make sure all of the configurations are consistent.

      • Compare nfs versions (rpm -qa | grep -i nfs)
      • Compare nfs configurations (mount -l -t nfs, cat /etc/mtab, nfsstat)
      • Compare platform version (uname -mrs, lsb_release -a) 

      NFS mount options 

      MarkLogic recommends the NFS Mount settings - 'rw,bg,hard,nointr,noac,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0'

      • Vers=3 :  Must have NFS client version v3 or above
      • TCP : NFS must be configured to use TCP instead of default UDP
      • NOAC : To improve performance, NFS clients cache file attributes. Every few seconds, an NFS client checks the server's version of each file's attributes for updates. Changes that occur on the server in those small intervals remain undetected until the client checks the server again. The noac option prevents clients from caching file attributes so that applications can more quickly detect file changes on the server.
        • In addition to preventing the client from caching file attributes, the noac option forces application writes to become synchronous so that local changes to a file become visible on the server immediately. That way, other clients can quickly detect recent writes when they check the file's attributes.
        • Using the noac option provides greater cache coherence among NFS clients accessing the same files, but it extracts a significant performance penalty. As such, judicious use of file locking is encouraged instead. The DATA AND METADATA COHERENCE section contains a detailed discussion of these trade-offs.
        • NOTE: The noac option is a combination of the generic option sync, and the NFS-specific option actimeo=0.
• ACTIMEO=0 : Using actimeo sets all of acregmin, acregmax, acdirmin, and acdirmax to the same "0" value. If this option is not specified, the NFS client uses the defaults for each of these options listed above.
      • NOINTR : Selects whether to allow signals to interrupt file operations on this mount point. If neither option is specified (or if nointr is specified), signals do not interrupt NFS file operations. If intr is specified, system calls return EINTR if an in-progress NFS operation is interrupted by a signal.
        • Using the intr option is preferred to using the soft option because it is significantly less likely to result in data corruption.
        • The intr / nointr mount option is deprecated after kernel 2.6.25. Only SIGKILL can interrupt a pending NFS operation on these kernels, and if specified, this mount option is ignored to provide backwards compatibility with older kernels.
      • BG : If the bg option is specified, a timeout or failure causes the mount command to fork a child which continues to attempt to mount the export. The parent immediately returns with a zero exit code. This is known as a "background" mount.
      • HARD (vs soft) : Determines the recovery behavior of the NFS client after an NFS request times out. If neither option is specified (or if the hard option is specified), NFS requests are retried indefinitely. If the soft option is specified, then the NFS client fails an NFS request after retrans retransmissions have been sent, causing the NFS client to return an error to the calling application.
        • Note: A so-called "soft" timeout can cause silent data corruption in certain cases. As such, use the soft option only when client responsiveness is more important than data integrity. Using NFS over TCP or increasing the value of the retrans option may mitigate some of the risks of using the soft option. 

      Issue persists => Further debugging 

If the issue persists after checking the NFS configuration and implementing the MarkLogic recommended NFS mount settings, you will need to debug the NFS connection during an issue period. You should enable rpcdebug for NFS on the hosts showing the NFS errors, and then analyze the resulting syslogs during a period that is experiencing the issues:

              rpcdebug -m nfs -s all

The resulting logs may give you additional information to help identify the source of the failures.

       

      Introduction

      It has long been possible to store binary files in MarkLogic. In the MarkLogic 5 release in 2011, binary support was enhanced to allow for even more control over binary files.

      The purpose of this Knowledgebase article is not to cover MarkLogic's binary support in depth but to demonstrate a technique for retrieving a list of URIs for binary files which are managed in a MarkLogic Database.

      Retrieving a list of binary document URIs from MarkLogic Server

The following approach uses a call to cts:uris to get back a list of all URIs pointing to binary documents in a given MarkLogic database; note that this example assumes that you have the URI lexicon enabled in your database:
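(The original code is not reproduced here; the following is a minimal sketch of the idea. It lists all URIs from the URI lexicon and keeps those whose root node is a binary node. Because the filtering step retrieves each document, it is less efficient than a purely index-resolved query, but it works on any database with the URI lexicon enabled.)

      xquery version "1.0-ml";
      for $uri in cts:uris((), ())
      where fn:doc($uri)/node() instance of binary()
      return $uri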

      Further reading

      People often want fine-grained entitlement control in the applications they build on top of MarkLogic Server. This article discusses two options and their performance implications.

      Best Practice

      Often, we'll see people attempt an implementation using MarkLogic users and roles. While MarkLogic Server can easily handle a large number of roles in total, you'll run into scalability and performance issues if you have a large number of roles per user. Additionally, you'll want to minimize the number of updates to documents in your Security database as every update requires Security caches to be re-validated, thus incurring a performance penalty.

Instead, for a more scalable and performant solution, build your entitlements into your documents at the application level, then query those entitlement values using element range indexes on the elements that contain them.
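For example, a sketch with assumed names: if each document carries one or more <entitlement> elements backed by a string element range index, a search constrained to a user's entitlements might look like this (the query terms and entitlement values are illustrative):

      xquery version "1.0-ml";
      (: $user-entitlements would come from your application's authorization layer :)
      let $user-entitlements := ("sales-emea", "sales-apac")
      return
        cts:search(fn:collection(),
          cts:and-query((
            cts:word-query("quarterly report"),
            cts:element-range-query(xs:QName("entitlement"), "=", $user-entitlements))))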

      Summary

      When attempting to start MarkLogic Server on older versions of Linux (Non-supported platforms), a "Floating Point Exception" may prevent the server from starting.

      Example of the error text from system messages:

      kernel: MarkLogic[29472] trap divide error rip:2ae0d9eaa80f rsp:7fffd8ae7690 error:0

      Detail

      Older Linux kernels will, by default, utilize older libraries.  When a software product such as MarkLogic Server is built using a newer version of gcc, it is possible that it will fail to execute correctly on older systems.  We have seen it in cases where the glibc library is out of date, and not containing certain symbols that were added in newer versions. Refer to the RedHat bug that explains this issue: https://bugzilla.redhat.com/show_bug.cgi?id=482848

      The recommended solution is to upgrade to a newer version of your Linux distribution.  While you may be able to resolve the immediate issue by only upgrading the glibc library, it is not recommended.

      Introduction

Attached to this article is an XQuery module, "appserver-status.xqy", which will generate a report on all requests currently "in-flight" across all application servers in your cluster.

      Usage

Run this in Query Console (be sure to display results as HTML output); it will generate an HTML table showing all requests currently "in-flight" across all application servers in your cluster. For any transaction taking over 60 seconds, it provides extra detail to help identify bottlenecks where specific modules (or tasks) may be having an adverse effect on the overall performance of the cluster.

      The information generated by this module can be used in conjunction with any ticket opened with the support team where assistance is required to better understand and resolve performance issues relating to specific modules. This module could also be used in a situation where DBAs want to perform routine health checks on their cluster to find and identify slow running queries.

      Introduction

      At the time of this writing (MarkLogic 9), MarkLogic Server cannot perform spherical queries, as the geospatial indexes do not support a true 3D coordinate system.  In situations where cylindrical queries are sufficient, you can create a 2D geospatial index and a separate range index on an altitude value. An "and-query" with these indexes would result in a cylindrical query.

      Example

      Consider the following sample document structure:
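(The original sample is not reproduced here; the structure below is an illustrative stand-in consistent with the index configuration that follows.)

      <flight>
        <location>
          <lat>37.655983</lat>
          <long>-122.425525</long>
        </location>
        <alt>2</alt>
      </flight>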

      Configure these 2 indexes for your content database:

1. A geospatial element pair index with latitude localname 'lat', longitude localname 'long', and parent localname 'location'
2. A range element index on localname 'alt' with the int scalar type

      Assuming you have data in your content database matching above document structure, this query:
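(a sketch assuming the element names above; cts:circle radii are expressed in miles by default)

      xquery version "1.0-ml";
      cts:search(fn:collection(),
        cts:and-query((
          cts:element-pair-geospatial-query(
            xs:QName("location"), xs:QName("lat"), xs:QName("long"),
            cts:circle(1000, cts:point(37.655983, -122.425525))),
          cts:element-range-query(xs:QName("alt"), "<", xs:int(4)))))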

will return all documents whose location points fall within the cylinder centered at 37.655983, -122.425525 with a radius of 1000 miles and an altitude of less than 4 miles.

      Note that in MarkLogic Server 9 geospatial region match was introduced, so the above technique can be extended beyond cylinders.

      Introduction

The MarkLogic Monitoring History dashboard (http://localhost:8002/history/) is probably the easiest way to view monitoring history data, but almost all of the information available within the monitoring dashboard is also available over our REST APIs:

      Application Server Status details

Information on Application Servers can be found at https://docs.marklogic.com/REST/GET/manage/v2/servers and here's an example for getting detailed metrics - http://localhost:8002/manage/v2/servers?group-id=Default&view=metrics&format=xml

      For Application Server status information - https://docs.marklogic.com/REST/GET/manage/v2/servers@view=status and here's an example with detailed metrics http://localhost:8002/manage/v2/servers?view=status&group-id=Default&format=xml&fullrefs=true

      To access status information for a specific Application Server (for example, the TaskServer), you can get the current status by adding the name to the URI - http://localhost:8002/manage/v2/servers/TaskServer?group-id=Default&view=status&format=xml

      You can also get the configuration information for a given application server (for example: "Admin") over the ReST API - http://localhost:8002/manage/v2/servers/Admin/properties?group-id=Default&format=xml

      Database and Forest status details

      For databases and forests, you can similarly use the endpoints for /databases or /forests:

      Database level examples include:
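For example (illustrative URLs, using the built-in Documents database): http://localhost:8002/manage/v2/databases/Documents?view=status&format=xml for status, or http://localhost:8002/manage/v2/databases/Documents/properties?format=xml for configuration.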

      Forest level examples include:
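For example (illustrative URLs): http://localhost:8002/manage/v2/forests?view=status&format=xml for status across all forests, or http://localhost:8002/manage/v2/forests/Documents?view=status&format=xml for a specific forest.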

      MarkLogic default Group Level Cache and Huge Pages settings

      The table below shows the default (and recommended) group level cache settings based on a few common RAM configurations for the 9.0-9.1 release of MarkLogic Server:

      Total RAM List Cache Compressed Tree Cache Expanded Tree Cache Triple Cache Triple Value Cache Default Huge Page Ranges
      8192 (8GB) 1024 (1 partition) 512 (1 partition) 1024 (1 partition) 512 (1 partition) 1024 (2 partitions) 1280 to 1994
      16384 (16GB) 2048 (1 partition) 1024 (2 partitions) 2048 (1 partition) 1024 (2 partitions) 2048 (2 partitions) 2560 to 3616
      24576 (24GB) 3072 (1 partition) 1536 (2 partitions) 3072 (1 partition) 1536 (2 partitions) 3072 (4 partitions) 3840 to 4896
      32768 (32GB) 4096 (2 partitions) 2048 (3 partitions) 4096 (2 partitions) 2048 (3 partitions) 4096 (6 partitions) 5120 to 6176
      49152 (48GB) 6144 (2 partitions) 3072 (4 partitions) 6144 (2 partitions) 3072 (4 partitions) 6144 (8 partitions) 7680 to 8736
      65536 (64GB) 8064 (3 partitions) 4032 (6 partitions) 8064 (3 partitions) 4096 (6 partitions) 8192 (11 partitions) 10080 to 11136
      98304 (96GB) 12160 (4 partitions) 6080 (8 partitions) 12160 (4 partitions) 6144 (8 partitions) 12160 (16 partitions) 15200 to 16256
      131072 (128GB) 16384 (6 partitions) 8192 (11 partitions) 16384 (6 partitions) 8192 (11 partitions) 16384 (22 partitions) 20480 to 21020
147456 (144GB) 18432 (6 partitions) 9216 (12 partitions) 18432 (6 partitions) 9216 (12 partitions) 18432 (24 partitions) 23040 to 24096
262144 (256GB) 32768 (9 partitions) 16384 (11 partitions) 32768 (9 partitions) 16128 (22 partitions) 32256 (32 partitions) 40320 to 42432
524288 (512GB) 65536 (22 partitions) 32768 (32 partitions) 65536 (32 partitions) 32768 (32 partitions) 65536 (32 partitions) 81920 to 82460

      Note that these values are safe to use for MarkLogic 7 and above.

      For all the databases that ship with MarkLogic Server, the Huge Pages ranges on this table will cover the out-of-the box configuration. Note that adding more forests will cause the second value in the range to increase.

      From MarkLogic Server 9.0-7 and above

      In the 9.0-7 release and above (and all versions of MarkLogic 10), automatic cache sizing was introduced; this setting is usually recommended.

      Note: For RAM size greater than 256GB, group cache settings are configured the same as for 256GB with automatic cache sizing. These can be changed using manual cache sizing.

      Maximum group level cache settings

      Assuming a Server configured with 256GB RAM (and above), these are the maximum sizes for the three main group level caches and will utilise 180GB (184320MB) per host for the Group Level Caches:

      • Expanded Tree Cache - 73728 (72GB) (with 9 8GB partitions)
      • List Cache - 73728 (72GB) (with 9 8GB partitions)
      • Compressed Tree Cache - 36864 (36GB) (with 11 3 GB partitions)

We have found that configuring 4GB partitions for the Expanded Tree Cache and the List Cache generally works well in most cases; for this you would set the number of partitions to 18.

For the Compressed Tree Cache, the number of partitions can be set to 22.
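These sizes and partition counts can be set in the Admin UI (group configuration) or with the Admin API. The sketch below shows only the Expanded Tree Cache for brevity and assumes the group-level setters admin:group-set-expanded-tree-cache-size and admin:group-set-expanded-tree-cache-partitions; verify the calls against the Admin API documentation for your release.

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $group  := admin:group-get-id($config, "Default")
      let $config := admin:group-set-expanded-tree-cache-size($config, $group, 73728)    (: 72GB :)
      let $config := admin:group-set-expanded-tree-cache-partitions($config, $group, 18) (: 4GB each :)
      return admin:save-configuration($config)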

      Important note

The maximum number of configurable partitions is 32.

Each cache partition should be no more than 8192 MB.

      Introduction

Understanding the timeout and time-limit configuration options offered by MarkLogic is important when working with queries. Some options are configured at the group level and others at the app-server level. An extensive list of these options and what each is used for can be found in the documentation (links at the end), but in this article we will discuss two of the important timeout configuration options and how they work with each other: retry timeout and default time limit.

      Quick Overview

The retry-timeout is the time, in seconds, before MarkLogic Server stops retrying a request, whereas the default-time-limit is the default value for any request's time limit, when one is not otherwise specified.

      A deeper dive

To elaborate, the "retry timeout" (a group-level setting) is the total time the server will spend waiting to retry (not executing the request itself). So if a request fails in one millisecond with a retryable error (and in general a retry happens every 2 seconds), we end up retrying roughly 90 times if the retry-timeout is set to the default value of 180 (90 * 2 secs = 180). The "default-time-limit" (an app-server-level setting), on the other hand, applies to each attempt: on every retry the request time is reset to 0 and the attempt is allowed to run up to the default-time-limit before it times out. However, if the request fails with a retryable error, a retry happens after a wait of around 2 seconds, at which point the request time is again reset to 0 and the new attempt is again bounded by the default-time-limit.

      The above behavior is better explained in a sequence of events listed below:

      -> start request time

           -> request time is set to 0 (with a limit of "default time limit" value)

      -> request fails with retryable error

      -> wait some time t1 before retrying (which is usually 2sec, in general)

      -> retry

           -> request time is reset to 0

      -> request fails with retryable error

      -> wait some time t2 before retrying

      -> retry

           -> request time is reset to 0

      -> request fails with retryable error

      -> wait some time t3 before retrying

      -> retry

           -> request time is reset to 0

      -> request succeeds

      -> end request time

      where "retry-timeout" value is the limit for all retry times combined (t1 + t2 +t3) and "default-time-limit" value is the limit for each retry.

      For instance, if the retry-timeout is 180 secs and the default-time-limit is 120 secs, the request will still retry until the 180sec retry timeout is met because the 120 sec limit is for each retry and wouldn’t affect the retry timeout. However, a single retry (or the original request) will timeout at 120 secs.
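As a reference point, the default time limit is an app-server-level setting. The sketch below (assuming an HTTP app server named "my-app" in the Default group; both names are illustrative) sets it to 120 seconds using admin:appserver-set-default-time-limit. The group-level retry timeout can be changed in the Admin UI under the group's configuration (or with the corresponding group-level Admin API setter, where available).

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $appserver := admin:appserver-get-id($config,
          admin:group-get-id($config, "Default"), "my-app")
      return
        admin:save-configuration(
          admin:appserver-set-default-time-limit($config, $appserver, 120))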

      Further reading

      Introduction

      MarkLogic Server has a notion of groups, which are sets of similarly configured hosts within a cluster.

      Application servers (and their respective ports) are scoped to their parent group.

      Therefore, you need to make sure that the host and its exposed port to which you're trying to connect both exist in the group where the appropriate application server is defined. For example, if you attempt to connect to a host defined in a group made up of d-nodes, you'll only see application servers and ports defined in the d-nodes group. If the application server you actually want is in a different group (say, e-nodes), you'll get a connection error, instead.

      Questions

      Can I use any xdmp builtins to show which application servers are linked to particular groups?

      The code example below should help with this:
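(The original attachment is not reproduced here; the following is a minimal sketch using documented xdmp builtins that lists each group together with the application servers defined in it.)

      xquery version "1.0-ml";
      for $group in xdmp:groups()
      return
        <group name="{xdmp:group-name($group)}">
        {
          for $server in xdmp:group-servers($group)
          return <appserver>{xdmp:server-name($server)}</appserver>
        }
        </group>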

      Problem:

      The errors 'XDMP-MODNOTFOUND - Module not found' and 'XDMP-NOPROGRAM - Server unable to build program from request' may occur when the requested module does not exist or the user does not have the right permissions on the module.

      Solution:

When either of these errors is encountered, the first step is to check whether the requested XQuery/JavaScript module is actually present in the modules database. Make sure the document URI matches the 'root' of the relevant app server.

The 'Modules' field of the app-server configuration specifies the database in which the app server looks for application code (when it is not set to 'File-system'). When it is set to a specific database, only documents in that database whose URIs begin with the specified root directory are executable. For example, if the app server's 'root' is set to "/codebase/xquery/", then only documents whose URIs start with "/codebase/xquery/" are executable.

      If set to 'File-system' make sure the requested module exists in the location specified in the 'root' directory of the app-server. 

Using a 'File-system' location is common on single-node development systems but is not recommended in a clustered environment. To keep the deployment of code simple, it is recommended to use a modules database in a clustered production system.

Once you have made sure that the module does exist, the next step is to check whether the user has the right permissions to execute the module. More often than not, the error is caused by a permissions issue.

      (i) Check app-server privileges

The 'privilege' field in the app-server configuration, when set, specifies the execute privilege required to access the server. Only users who are assigned this privilege can access the server and the application code. The absence of this privilege may cause the XDMP-NOPROGRAM error.

      Make sure the user accessing the app-server has the specified privileges. This can be checked by using sec:user-privileges() (Should be run against the Security database).

      The documentation here - http://docs.marklogic.com/guide/admin/security#id_63953 contains more detailed information about privileges.

      (ii) Check permission on the requested module

The user trying to access the application code/modules is required to have the 'execute' permission on the module. Make sure all the XQuery documents have 'read' and 'execute' permissions for the user trying to access them. This can be verified by executing the following query against your modules database:

                       xdmp:document-get-permissions("/your-module")

This returns the list of permissions on the document - with the capability that each role has - in the format below:

                    <sec:permission xmlns:sec="http://marklogic.com/xdmp/security">
                    <sec:capability>execute</sec:capability>
                    <sec:role-id>4680733917602888045</sec:role-id>
                    </sec:permission>
                    <sec:permission xmlns:sec="http://marklogic.com/xdmp/security">
                    <sec:capability>read</sec:capability>
                    <sec:role-id>4680733917602888045</sec:role-id>
                    </sec:permission>

      You can then map the role-ids to their role names as below: (this should be done against the Security database)

                    import module namespace sec="http://marklogic.com/xdmp/security" at "/MarkLogic/security.xqy";
                    sec:get-role-names((4680733917602888045))

      If you see that the module does not have execute permission for the user, the required permissions can be added as below: (http://docs.marklogic.com/xdmp:document-add-permissions)

xdmp:document-add-permissions("/document/uri.xqy",
  (xdmp:permission("role-name", "read"),
   xdmp:permission("role-name", "execute")))


      Introduction

      Recent exploits in the TLS protocol, such as POODLE, FREAK, LogJam, and SLOTH, have rendered TLSv1.0, TLSv1.1 and SSLv3 largely obsolete.  Additionally, standards councils such as PCI (Payment Card Industry) and NIST (National Institute of Standards & Technology) are moving to disallow the use of these protocols.

This article describes the MarkLogic configuration changes needed to harden a MarkLogic HTTP Application Server so that only TLSv1.2 is used, and clients attempting to connect with TLSv1.1 or earlier protocols are rejected.

      Configuration

      The TLS protocol versions accepted and the Cipher suites selected are controlled by the specification list set in the "SSL Ciphers" field on the HTTP App Server Configuration panel:

      The format of the specification list follows the OpenSSL format described in the OpenSSL Cipher suite documentation and comprises one or more colon ":" separated ciphers strings which control which cipher suites are enabled or disabled. 

The default specification used by MarkLogic enables ALL ciphers except those considered to be of LOW encryption strength, and orders them by @STRENGTH:

      ALL:!LOW:@STRENGTH

While sufficient for many needs, the default setting still allows cipher negotiations that are no longer considered secure, as well as weak signature algorithms such as MD2 and MD5. The following cipher specification string enhances security by permitting only high-strength ciphers and disabling weak or vulnerable ones.

      EECDH+ECDSA+AESGCM:EECDH+aRSA+AESGCM:EECDH+ECDSA+SHA384:EECDH+ECDSA+SHA256:EECDH+aRSA+SHA384:EECDH+aRSA+SHA256:EECDH:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!SRP:!DSS:!EDH:!RC4

      For sites requiring even higher levels of security, restricting the ciphers available to a specific list can provide a more advanced level of control of the ciphers used. For example, the following cipher suite list restricts algorithms to those used by TLSv1.2 only. You should therefore disable all other TLS protocols as described below before using this setting in MarkLogic.

      DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256

      The following string restricts the algorithms available by only permitting TLSv1.2 ciphers using a 256bit key. However, while this increases security even further, it is at risk of being incompatible with many browsers and applications and should only be used after thorough testing.

      DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384

At this stage, while the MarkLogic HTTP Application Server is now using more robust security, it will still permit a client to connect using TLSv1.0 or TLSv1.1. To comply with PCI DSS 3.2 and other recommended security standards, compliant sites must stop using TLSv1.0 before 30th June 2018, while NIST SP 800-52 requires that sites use at least TLSv1.1, with a recommendation to use TLSv1.2 where possible.

      Note: Since this article was written, the MarkLogic server has added an administrator function to disable individual SSL and TLS protocol versions. If you are still running MarkLogic version 8.0-5 or earlier, you can continue to use the solution outlined below. Otherwise, users of MarkLogic 9 or later should use the new AppServer Set SSL Disabled Protocols function to control which SSL and TLS protocol versions are available. The following XQuery code, when run against the Security Database, will disable all but TLSv1.2 on MarkLogic 9 or later.

      xquery version "1.0-ml";

      import module namespace admin = "http://marklogic.com/xdmp/admin"
            at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $appServer := admin:appserver-get-id($config,
          admin:group-get-id($config, "Default"),"ssl-project-appserver")
      return
         admin:save-configuration(admin:appserver-set-ssl-disabled-protocols($config, $appServer, ("SSLv3","TLSv1","TLSv1_1")))

Run the following XQuery code to confirm that the weaker TLS protocols have been disabled:

      xquery version "1.0-ml";

      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";
      let $config := admin:get-configuration()
      let $appServer := admin:appserver-get-id($config,admin:group-get-id($config, "Default"), "mh-photos-test")
      return
        admin:appserver-get-ssl-disabled-protocols($config, $appServer)

      Output

      SSLv3
      TLSv1
      TLSv1_1

      Warning: Disabling all but TLSv1.2 and restricting available ciphers may break connectivity with applications and browsers configured to use SSLv3, TLSv1.0 or TLSv1.1. MarkLogic recommends that you test thoroughly in a lower QA environment before disabling any algorithms and protocols in a production environment.

      HTTP Strict-Transport-Security

      The HTTP Strict-Transport-Security response header (often abbreviated as HSTS) informs browsers that the site should only be accessed using HTTPS and that any future attempts to access it using HTTP should automatically be converted to HTTPS.

      Set the "enable hsts header" to True to enable HSTS for the AppServer when dictated by your Security requirements.

      TLSv1.2 and browser support (MarkLogic 8 only)

      For TLSv1.2, older browsers should be upgraded to current versions.

      These changes may require users accessing your application to upgrade older browsers such as Firefox < 27.0 or Internet Explorer < 11.0, as these versions do not support TLSv1.2 by default.

      The MarkLogic App Server utilizes OpenSSL, which does not explicitly support enabling or disabling a specific TLS protocol version. However, you effectively get the same outcome by disabling all cipher suites associated with a particular version.

      SSLv3, TLSv1.0 & TLSv1.1 share the same common ciphers, so adding "!SSLv3" and "!TLSv1.0 "to the cipher specification will cause all client connection attempts using any of these protocols to fail, including "TLSv1.1".

      EECDH+ECDSA+AESGCM:EECDH+aRSA+AESGCM:EECDH+ECDSA+SHA384:EECDH+ECDSA+SHA256:EECDH+aRSA+SHA384:EECDH+aRSA+SHA256:EECDH:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!SRP:!DSS:!EDH:!RC4:!SSLv3:!TLSv1.0

      Testing using the OpenSSL s_client utility shows that attempts to connect using TLSv1.0 fail with SSL alert 40, indicating no common cipher was available.

      openssl s_client -connect 192.168.99.100:8010 -debug -tls1
      CONNECTED(00000003)
      ..
      140735283961936:error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:s3_pkt.c:1472:SSL alert number 40
      140735283961936:error:1409E0E5:SSL routines:ssl3_write_bytes:ssl handshake failure:s3_pkt.c:656:

      While connecting using TLSv1.2 is successful.

      openssl s_client -connect 192.168.99.100:8010 -debug -tls1_2
      CONNECTED(00000003)
      ...
      ---
      New, TLSv1/SSLv3, Cipher is AES256-GCM-SHA384
      Server public key is 2048 bit
      Secure Renegotiation IS supported
      Compression: NONE
      Expansion: NONE
      No ALPN negotiated
      SSL-Session:
      Protocol : TLSv1.2
      Cipher : AES256-GCM-SHA384

      Further reading

      On MarkLogic Security Certification

      How does MarkLogic Server's high-availability work in AWS?

AWS provides fault tolerance within a geographic region through the use of Availability Zones (AZs), while MarkLogic provides that ability through Local Disk Failover (LDF). If you're using AWS, the best practice is to place each MarkLogic node/EC2 instance in a different Availability Zone within a single region, where a given data forest is in one AZ (AZ A) while its LDF forest is in a different AZ (AZ B). This way, should access to Availability Zone A be lost, the host in Availability Zone A will fail over to its LDF on the host in Availability Zone B, thereby ensuring high availability within your MarkLogic cluster.

      Further reading:

      Should failover be configured for the Security forest?

      A cluster is not functional without its Security database. Consequently, it’s important to ensure high-availability of the Security database’s forest by configuring failover for that forest.

      Further reading:

      Should my forests have more than one Local Disk Failover forest?

      High-availability through Local Disk Failover with one LDF forest is designed to allow the cluster to survive the failure of a single host. If you're using AWS, careful forest placement across AWS availability zones can provide high-availability even in the event of an entire availability zone going down. With rare exceptions, additional LDF forests are typically not worth the additional complexity and cost for the vast majority of MarkLogic deployments.

If you configure Local Disk Failover with one LDF, coupled with Database Replication and backups, you will have enough copies of your data to survive anything from the failure of a single host to the loss of an entire availability zone.

      Do I still have high-availability post failover? What happens to the data forest? How can I fail back my forests to the way they were?

      When a failover event occurs, the LDF forest takes over as the acting data forest and the configured data forest will assume the role of the acting LDF forest as soon as it is successfully restarted. At this point, as long as both forests are still available, the cluster continues to be high availability but with forests reversing their originally intended roles. To fail back the forests into the roles they were originally intended, you will need to wait until the acting data forest (the originally intended LDF) and acting LDF (the originally intended data forest) are synchronized, then manually restart the acting data forest/intended LDF. At that point, the acting LDF/intended data forest “fails back” to take over its original role of acting data forest, and the acting data forest/intended LDF will once again assume its original role of acting LDF. In short, failover is automatic, but failing back requires a manual restart of the acting data forest/intended LDF. When failing back, it's very important to wait until the forests are synchronized - if you fail back before the forests are fully synchronized, you'll lose any data in the acting data forests that's yet to be propagated back to the acting LDF/intended data forest.
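As a rough sketch of that manual step (the forest name is illustrative): check xdmp:forest-status for the acting data forest until the forests report a synchronized state, and only then restart it so the intended data forest resumes the acting-primary role.

      xquery version "1.0-ml";
      (: run only after xdmp:forest-status(xdmp:forest("replicaForest1"))
         shows the replica pair is synchronized :)
      xdmp:forest-restart(xdmp:forest("replicaForest1"))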

      Further reading:

      Where does the hostname come from?

      • If there is a MARKLOGIC_HOSTNAME environmental variable, it is used as the hostname
• If there is no environment variable configured, the gethostname() library function is called (instead of the gethostname() system call, since we use the GNU C Library - see notes here for more info), which internally calls the uname() function
  • The uname() function looks up and returns the nodename, which may or may not contain a '.' (you can also see this value by running the uname --nodename command in a terminal)
    • If the value returned by uname() contains a '.', we consider it a complete name and use it as the hostname
    • Otherwise, we look in resolv.conf for a domain/search entry, take the first entry, and append it to the uname() output (separated by a '.') to form the complete hostname
      • E.g.: <uname_output>.<resolv.conf_entry>
      • Note: the resolv.conf file can contain both a domain and a search entry; usually the domain entry takes priority over search

      Troubleshooting:

      If you experience a hostname mismatch or any hostname issue in general, you can check the following:

      • The following commands/functions are different ways to return the hostname (and you can verify if there is a mismatch)
        • Functions:
        • Commands:
          • hostname
• hostname -f  (returns the FQDN, including '.')
          • hostname -d  (lists all the domains)
      • Check the resolv.conf file (under /etc) to see if it contains the right hostname
  • If it does and the issue still persists, restarting MarkLogic Server may help, because if MarkLogic is getting the hostname from this file, it reads it at startup

Note: if you open a support ticket in this context, attaching the above information (the outputs of the commands/functions above and the contents of the resolv.conf file) will help speed up the investigation.

      Possible issues with hostname mismatch:

      Introduction: getting more information about the bugs fixed between releases

As a general recommendation, we encourage customers to keep the server up to date with patch releases.

      If you would like a list of some of the published bugs that were addressed between two releases of the server (for example: 5.0-3 and 5.0-4.1), you can perform the following steps:

      - Log into the support portal at http://help.marklogic.com
      - Click on the "Fixed bugs" icon to take you to the bugtrack list
      - Select 5.0-3 in the From: dropdown box
      - Select 5.0-4.1 in the To: dropdown box
      - Click 'Show' to generate an HTML table or View PDF to export the results in a PDF document

      Step one: login

      Provide your credentials and use the form on the left-hand side to log in to access the support portal

      Log into the support portal

      Step two: select the "Fixed bugs" link from the icons on the page

      Select 'Fixed Bugs' to go to the bugtrack list

      Step three: select the release 'range' from the two dropdown lists on the Fixed Bugs page

      Use the Show button to update the page or download the list in PDF format as required

      Select the versions from the 'From' and 'To' lists to generate the report

      Introduction

In Amazon Web Services, AMIs have unique IDs per region. There are many cases where you will want to use multiple regions (for example, maintaining two clusters in separate geographical regions). Below is an example of how to find the list of current AMIs.

      Log in to Amazon Web Services

      Example image showing the AWS Login Page

      Find your MarkLogic instance on Amazon AWS Marketplace

      Example image showing the MarkLogic 8 HVM in Amazon's Marketplace

      For example: https://aws.amazon.com/marketplace/pp/B00U36DS6Y

      Click continue

      Example Continue button

      View the table

      Choose the version of MarkLogic Server that you're planning to use from the version dropdown.

      Image of a table showing all AMI IDs available for this item in the AWS Marketplace

      You will see a table containing a list of all current regions and the corresponding AMI ID for our instances for each available region.

      Further reading

      Summary

      MarkLogic Server has several different features that can help manage data across multiple database instances. Those features differ from each other in several important ways - this article will focus on high-level distinctions and will provide pointers to other materials to help you decide which of these features could work best for your particular use case.

       Details

      Backup/Restore - database backup and restore operations in MarkLogic Server provide consistent database-level views of your data. Propagating data from one instance to another via backup/restore involves a MarkLogic administrator using a completed backup from the source instance as the restore archive on the destination instance. You can read more about Backup/Restore here: http://docs.marklogic.com/guide/admin/backup_restore.

Flexible Replication - can be used to maintain copies of data on multiple MarkLogic Servers. Unlike backup/restore (which relies on taking a consistent, database-level view of the data at a particular timestamp), Flexible Replication creates a copy of a document in another database and keeps that copy in sync (possibly with some time lag/latency) with the original in the course of normal operations. You can read more about Flexible Replication here: http://docs.marklogic.com/guide/flexrep/rep_intro. Do note that:

      • Flexible Replication is asynchronous. Asynchronous Replication refers to a configuration in which the Master does not wait for confirmation that the update has been received by the Replica before sending further updates.
      • Flexible Replication does not use the same transaction boundaries on the replica as on the master. For example, 10 documents might be inserted in a single transaction on a Flexible Replication master. Those 10 documents will eventually be inserted on a Flexible Replication replica, but there is no guarantee that the replica instance will also use a single transaction to do so.

Database Replication - is used to maintain copies of data on multiple MarkLogic Servers. Database Replication creates a copy of a document in another database and keeps that copy in sync (possibly with some time lag/latency) with the original in the course of normal operations. You can read more about Database Replication here: http://docs.marklogic.com/guide/database-replication/dbrep_intro. Note that:

      a. Database Replication is, like Flexible Replication, asynchronous.

b. In contrast to Flexible Replication, Database Replication operates by copying journal frames from the Master database and replaying the transactions described by those journal frames on the foreign Replica database.

      XA Transactions - MarkLogic Server can participate in distributed transactions by acting as a Resource Manager in an XA/JTA transaction. If there are multiple MarkLogic Server instances participating as XA resources in a given XA transaction, then it's possible to use that XA transaction as a synchronized means of replicating data across those multiple MarkLogic instances. You can read more about XA Transactions in MarkLogic Server here: http://docs.marklogic.com/guide/xcc/concepts#id_57048.

      Introduction

      Upgrading individual MarkLogic instances and clusters is generally very easy to do and in most cases requires very little downtime. In most cases, shutting down the MarkLogic instance on each host in turn, uninstalling the current release, installing the updated release and restarting each MarkLogic instance should be all you need to be concerned about...

      However, unanticipated problems do sometimes come to light and the purpose of this Knowledgebase article is to offer some practical advice as to the steps you can take to ensure the process goes as easily as possible - this is particularly important if you're planning an upgrade between major releases of the product.

      Prerequisites

      While the steps outlined under the process heading below offer practical advice as to what to do to ensure your data is safeguarded (by recommending that backups are taken prior to upgrading), another very useful step would be to ensure you have your current configuration files backed up.

      Each host in a MarkLogic cluster is configured using parameters which are stored in XML Documents that are available on each host. These are usually relatively small files and will zip up to a manageable size.

      If you cd to your "Data" directory (on Linux this is /var/opt/MarkLogic; on Windows this is C:\Program Files\MarkLogic\Data and on OS X this is /Users/{username}/Library/Application Support/MarkLogic), you should see several xml files (assignments, clusters, databases, groups, hosts, server).

      Whenever MarkLogic updates any of these files, it creates a backup using the same naming convention used for older ErrorLog files (_1, _2 etc). We recommend backing up all configuration files before following the steps under the next heading.

      Process

      1) Take a backup for each database in your cluster

      2) Turn reindexing off for each database in your cluster (a scripted example of steps 1 and 2 follows this list)

      3) Starting with the node hosting your Security and Schemas forests, uninstall the currently installed MarkLogic release on each host in your cluster, then install the latest maintenance release of that feature release (for example, if you're currently running version 10.0-2, you'll want to update to the latest available MarkLogic 10 maintenance release - at the time of this writing, 10.0-4).

      4) Start up the host in your cluster hosting your Security and Schemas forests, then the remaining hosts in the cluster.

      5) Access the Admin UI on the node hosting your Security and Schemas forests and accept the license agreement, either for just that host (Accept button) or for all of the hosts in the cluster (Accept for Cluster button). If you choose the Accept for Cluster button, a summary screen appears showing all of the hosts in the cluster. Click the Accept for Cluster button to confirm acceptance (all of the hosts must be started in order to accept for the cluster). If you accepted the license for just the one host in the previous step, you must go to the Admin Interface on each of the other hosts and accept the license for each host before that host can operate.

      6) If you're upgrading across feature releases, you may now repeat steps #3-5 until you reach the desired feature and maintenance release on your cluster (for example, if you're upgrading from MarkLogic 8 to MarkLogic 10, after installing 8.0-latest you'll repeat steps 3-5 for version 9.0-latest, and then again for 10.0-latest).

      7) After you've finished upgrading across all the relevant feature releases, re-enable reindexing for each database in your cluster.
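
      For steps 1 and 2, the backup and the reindexer setting can also be scripted from Query Console. The following is a minimal sketch only; the database name "Documents" and the backup path "/backups/Documents/" are placeholders for your own values, and each statement runs as its own transaction:

      xquery version "1.0-ml";
      (: Step 1: start a full backup of the database (placeholder path) :)
      xdmp:database-backup(
        xdmp:database-forests(xdmp:database("Documents")),
        "/backups/Documents/")
      ;
      xquery version "1.0-ml";
      (: Step 2: turn the reindexer off for the same database :)
      import module namespace admin = "http://marklogic.com/xdmp/admin"
        at "/MarkLogic/admin.xqy";
      let $config := admin:get-configuration()
      return admin:save-configuration(
        admin:database-set-reindexer-enable(
          $config, xdmp:database("Documents"), fn:false()))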

      For more details, please see the section "Upgrading a Cluster to a New Maintenance Release of MarkLogic Server" in the "Scalability, Availability, and Failover" guide.

      If you've got database replication in place across both a master and replica cluster, then be aware that:

      1) You do not need to break replication between the clusters

      2) You should plan to upgrade both the master cluster and replica cluster. If you upgrade just the master, connectivity between the two clusters will stop due to different XDQP versions. 

      3) If the Security database isn't replicated, then there shouldn't be anything special you need to do other than upgrade the two clusters.

      4) If the security database is replicated, do the following:

      • Upgrade the Replica cluster and run the upgrade scripts. This will update the Replica's Security database to indicate that it is current. It will also do any necessary configuration upgrades.
      • Upgrade the Master cluster and run the upgrade scripts. This will update the Master's Security database to indicate that it is current. It will also do any necessary configuration upgrades.

      For more information, see Updating Clusters Configured with Database Replication.

      Back-out Plan

      MarkLogic does not support restoring a backup made on a newer version of MarkLogic Server onto an older version of MarkLogic Server. Your Back-out plan will need to take this into consideration.

      See the section below for recommendations on how this should be handled.

      Further reading

      Backing out of your upgrade: steps to ensure you can downgrade in an emergency

      Product release notes

      The "Upgrade Support" section of the release notes.

      All known incompatibilities between releases

      The "Upgrading from previous releases" section of the documentation

      MarkLogic Support Fixed Bug List

      Introduction

      spell:suggest() and spell:suggest-detailed() aren't simply looking for character differences between the provided strings and the strings in your dictionaries - they're also factoring in differences in the resulting phonetics represented by these strings.

      Detail

      There is an undocumented option that can be passed along to increase the phonetic-distance threshold (which is 1, by default). For example, consider the following:

      xquery version "1.0-ml";

      spell:suggest-detailed(
        ('customDictionary.xml'),
        'acknowledgment',
        <options xmlns="http://marklogic.com/xdmp/spell">
          <phonetic-distance>2</phonetic-distance>
        </options>
      )

      =>

      <spell:suggestion original="acknowledgment"
          dictionary="customDictionary.xml"
          xmlns:xml="http://www.w3.org/XML/1998/namespace"
          xmlns:spell="http://marklogic.com/xdmp/spell">
        <spell:word distance="9" key-distance="2" word-distance="45"
            levenshtein-distance="1">acknowledgement</spell:word>
      </spell:suggestion>

      Note that the option "distance-threshold" corresponds to "distance" in the result, and "phonetic-distance" corresponds to "key-distance."

      Also note that increasing the phonetic-distance may cause spell:suggest() and spell:suggest-detailed() to use significantly more CPU. Metaphones are short keys, so a larger distance may match a very large fraction of the dictionary, which would then mean each of those matches would need to be checked in the distance algorithms.

      Background

      A database consists of one or more forests. A forest is a collection of documents (mostly XML trees, thus the name), implemented as a physical directory on disk. Each forest holds a set of documents and all their indexes. 

      When a new document is loaded into MarkLogic Server, the server puts this document in an in-memory stand and writes the action to an on-disk journal to maintain transactional integrity in case of system failure. After enough documents are loaded, the in-memory stand will fill up and be flushed to disk, written out as an on-disk stand. As more documents are loaded, they go into a new in-memory stand. At some point this in-memory stand fills up as well, and the in-memory stand gets written as yet another new on-disk stand.

      To read a single term list, MarkLogic must read the term list data from each individual stand and unify the results. To keep the number of stands to a manageable level where that unification isn't a performance concern, MarkLogic runs merges in the background. A merge takes some of the stands on disk and creates a new singular stand out of them, coalescing and optimizing the indexes and data, as well as removing any previously deleted fragments.

      Each forest has its own in-memory stand and set of on-disk stands. Loading and indexing content is a largely parallelizable activity, so splitting the loading effort across forests, and potentially across machines in a cluster, can help scale the ingestion work.

      Deletions and Multi-Version Concurrency Control (MVCC)

      What happens if you delete or change a document? If you delete a document, MarkLogic marks the document as deleted but does not immediately remove it from disk. The deleted document will be removed from query results based on its deletion markings, and the next merge of the stand holding the document will bypass the deleted document when writing the new stand. MarkLogic treats any changed document like a new document, and treats the old version like a deleted document.

      This approach is known in database circles as Multi-Version Concurrency Control (MVCC). In an MVCC system, changes are tracked with a timestamp number which increments for each transaction as the database changes. Each fragment gets its own creation-time (the timestamp at which it was created) and deletion-time (the timestamp at which it was marked as deleted, starting at infinity for fragments not yet deleted).

      For a request that doesn't modify data the system gets a performance boost by skipping the need for any URI locking. The query is viewed as running at a certain timestamp, and throughout its life it sees a consistent view of the database at that timestamp, even as other (update) requests continue forward and change the data.

      Updates and Deadlocks

      An update request, because it isn't read-only, has to use read/write locks to maintain system integrity while making changes. Read-locks block for write-locks; write-locks block for both read and write-locks. An update has to obtain a read-lock before reading a document and a write-lock before changing (adding, deleting, modifying) a document. Lock acquisition is ordered, first-come first-served, and locks are released automatically at the end of a request.

      In any lock-based system you have to worry about deadlocks, where two or more updates are stalled waiting on locks held by the other. In MarkLogic deadlocks are automatically detected with a background thread. When the deadlock happens on the same host in a cluster, the update farthest along (with the most locks) wins and the other update gets restarted. When it happens on different hosts, because lock count information isn't in the wire protocol, both updates start over. MarkLogic differentiates queries from updates using static analysis. Before running a request, it looks at the code to determine if it includes any calls to update functions. If so, it's an update. If not, it's a query. Even if at execution time the update doesn't actually invoke the updating function, it still runs as an update.

      For the most part, locking is not under the control of the user. The one exception is the xdmp:lock-for-update($uri) call, which requests a write-lock on a document URI without actually having to issue a write, and in fact without the URI even having to exist.
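
      For example, the following sketch (the URI is made up for illustration) takes the write-lock up front, which can be used to serialize updates even against documents that don't exist yet:

      xquery version "1.0-ml";
      (: Grab the write-lock before any write occurs; the URI is only a placeholder :)
      xdmp:lock-for-update("/reports/2024/summary.xml"),
      xdmp:document-insert("/reports/2024/summary.xml", <report/>)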

      When a request potentially touches millions of documents (such as sorting a large data set to find the most recent items), a query request that runs lock-free will outperform an update request that needs to acquire read-locks and write-locks. In some cases you can speed up the query work by isolating the update work to its own transactional context. This technique only works if the update doesn't have a dependency on the outer query, but that turns out to be a common case. For example, let's say you want to execute a content search and record the user's search string to the database for tracking purposes. The database update doesn't need to be in the same transactional context as the search itself, and would slow things down if it were. In this case it's better to run the search in one context (read-only and lock-free) and the update in a different context. See the xdmp:eval() and xdmp:invoke() functions for documentation on how to invoke a request from within another request and manage the transactional contexts between the two.
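
      As a rough illustration of the search-plus-logging example above, the logging update can be pushed into its own transaction with xdmp:eval so the outer request stays read-only. This is only a sketch; the log document URI and element names are invented for illustration:

      xquery version "1.0-ml";
      declare variable $q as xs:string := "maple syrup";

      (: Record the search string in a separate transaction so the outer
         request contains no update calls and therefore runs lock-free. :)
      xdmp:eval(
        'declare variable $q external;
         xdmp:document-insert(
           "/search-log/" || xdmp:random() || ".xml",
           <search-log><q>{$q}</q><when>{fn:current-dateTime()}</when></search-log>)',
        (xs:QName("q"), $q),
        <options xmlns="xdmp:eval">
          <isolation>different-transaction</isolation>
        </options>),

      (: The search itself runs at a single timestamp, with no locks :)
      cts:search(fn:doc(), cts:word-query($q))[1 to 10]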

      Document Lifecycle

      Let's track the lifecycle of a document from first load to deletion until the eventual removal from disk. A document load request acquires a write-lock for the target URI as part of the xdmp:document-load() function call. If any other request is already doing a write to the same URI, our load will block for it, and vice versa. At some point, when the full update request completes successfully (without any errors that would implicitly cause a rollback), the actual insertion work begins, processing the queue of update work orders. MarkLogic starts by parsing and indexing the document contents, converting the document from XML to a compressed binary fragment representation. The fragment gets added to the in-memory stand. At this point the fragment is considered a nascent fragment, a term you'll see sometimes on the administration console status pages. Being nascent means it exists in a stand but hasn't been fully committed. (On a technical level, nascent fragments have creation and deletion timestamps both set to infinity, so they can be managed by the system while not appearing in queries prematurely.) If you're doing a large transactional insert you'll accumulate a lot of nascent fragments while the documents are being processed. They stay nascent until they've been committed. Once the fragment is placed into the in-memory stand, the request is ready to commit. It obtains the next timestamp value, journals its intent to commit the transaction, and then makes the fragment available by setting the creation timestamp for the new fragment to the transaction's timestamp. At this point it's a durable transaction, replayable in event of server failure, and it's available to any new queries that run at this timestamp or later, as well as any updates from this point forward (even those in progress). As the request terminates, the write-lock gets released.

      Our document lives for a time in the in-memory stand, fully queryable and durable, until at some point the in-memory stand fills up and gets written to disk. Our document is now in an on-disk stand. Sometime later, based on merge algorithms, the on-disk stand will get merged with some other on-disk stands to produce a new on-disk stand. The fragment will be carried over, its tree data and indexes incorporated into the larger stand. This might happen several times.

      At some point a new request makes a change to the document, such as with an xdmp:node-replace() call. The request making the change first obtains a read-lock on the URI when it first accesses the document, then promotes the read-lock to a write-lock when executing the xdmp:node-replace() call. If another write-lock were already present on the URI from another executing update, the read-lock would have blocked until the other write-lock released. If another read-lock were already present, the lock promotion to a write-lock would have blocked. Assuming the update request finishes successfully, the work runs similar to before: parsing and indexing the document, writing it to the in-memory stand as a nascent fragment, acquiring a timestamp, journaling the work, and setting the creation timestamp to make the fragment live. Because it's an update, it has to mark the old fragment as deleted also, and does that by setting the deletion timestamp of the original fragment to the transaction timestamp. This combination effectively replaces the old fragment with the new. When the request concludes, it releases its locks. Our document is now deleted, replaced by the new version.

      The old fragment still exists on disk, of course. In fact, any query that was already in progress before the update incremented the timestamp, or any query doing time travel with an old timestamp, can still see it. Eventually the on-disk stand holding the fragment will be merged again, at which point the old fragment will be completely removed from the system. It won't be written into the new on-disk stand. That is, unless the administration "merge timestamp" was set to allow deep time travel. In that case it will live on, sticking around in case any new queries want to time travel to see old fragments.

      Summary

      When interacting with the MarkLogic Technical Support Team, there will be times when you will need to either submit or retrieve large data content. MarkLogic maintains an SFTP (SSH file transfer protocol) server for this purpose. This article describes how to interact with the SFTP server at sftp.support.marklogic.com. 

      Requirements

      Our SFTP service is a self-managed service and requires a public key to be uploaded to the support account profile for activation.  

      • Step 1. Generate an SSH key pair or re-use an existing one 
      • Step 2. Login to the MarkLogic Support portal https://help.marklogic.com and click on the "My Profile" link
      • Step 3. Scroll down to the "Public Keys" section and click on "Add Key."
      • Step 4. Copy and paste the content of your public key into the text field
      • Step 5. Update the profile by clicking on "Update."

      Our key upload accepts RSA, DSA and ECDSA public keys in OpenSSH, Putty, PKCS1 or PKCS8 format. The uploaded key will be converted to OpenSSH format automatically. After a public key has been uploaded, it will be used for any ticket created in our Support Portal to log in to the SFTP service. We recommend rotating the keys on a regular basis for security reasons.

      Connections use the default port 22. It is advised to consult your IT/Security department about any applicable security requirements.

      Customer Access

      The account details to login to our SFTP service will be provided in all email responses, or directly from the Support Portal after selecting an open ticket from the "My Ticket" list. The account details will be of the format "xyz-123-4567@sftp.support.marklogic.com" and are different for each ticket. In general, the creation of an SFTP account is fully automated in the backend and takes only a few minutes. No contact is required for the setup, but please reach out if there are any questions or problems.

      Sharing any data requires only a few steps.

      Logging In

      1. Open your preferred SFTP client.
      2. Provide the user / account details "xyz-123-4567@sftp.support.marklogic.com". Replace xyz-123-4567 with the ticket details provided in the email or from our Support Portal
      3. Verify the private key selection of your client
      4. Login or connect 

      You are now logged in to the MarkLogic SFTP site and in the ticket home directory, where all data will be shared for this ticket.

      Submitting Content to MarkLogic

      Uploading files doesn't require changing to any directory as they will be placed directly into the home folder.

      • To upload a file, use
        • drag and drop for UI-based clients
        • Put command for command line-based clients.
      • In case an upload becomes disconnected, it can also be resumed at any time by using the resume feature of your SFTP client or the "reput" SFTP command.
      • Listing and deleting files is supported in the same way
      • After files have been uploaded, our system will scan them, calculate the MD5 checksum, send a notification and add them to the ticket.

      Retrieving Content from MarkLogic

      Downloading files is similar to uploading; no change of directory is required.

      • In case MarkLogic Support uploads/places a file, a notification will be sent
      • To download a file use
        • drag and drop for UI-based clients
        • Get command for command line-based clients
      • In case the download is interrupted, it can be resumed at any time using the resume feature of your SFTP client or the "reget" SFTP command.

      Data life cycle

      Any data uploaded to the SFTP site will be available during the lifetime of an open ticket. After a ticket has been closed, the data will be deleted shortly after. It can, of course, be deleted at any time during the ticket work, either on request or by using the SFTP client. In case it is deleted manually, we appreciate getting notified in advance as it might still be needed for further investigations.

      Summary

      The following article explains the way in-memory caches are used by MarkLogic Server and how they can be utilized to improve query execution.

       

      Detail

      MarkLogic Server provides several caches that are used to improve performance during query execution. When a query executes for the first time, the Server will populate these caches to store term lists and data fragments in memory.

      MarkLogic Server keeps a lot of its configuration information in databases, and has a lot of caches to make it run faster, but those caches get populated the first time things are accessed. The server also uses book-keeping terms in the indexes to keep track of whether all documents have been indexed with the current settings. MarkLogic caches this information, but has to query the indexes on the first request to warm the cache.

      The in-memory cache in MarkLogic Server holds data that was recently added to the system and is still in an in-memory stand; that is, it holds data that has not yet been written to disk.

      For updates, if there is no in-memory stand on a forest when a new document is inserted, the server will create it. This stand is big enough for thousands of documents, but the cost of creating it will be seen in the time taken for the first document added to it.

       


      How will the in-memory cache help improve query execution

      When a query is executed, the in-memory data structures like range indexes and lexicons get pinned into RAM the first time they are used. The easiest way to speed things up is to "warm the caches" by running a small sample program that exercises the relevant functionality (for example, type-ahead queries) prior to starting production. You can also keep the server warm by doing a non-time-critical stub update at regular intervals (every 30 seconds to 1 minute). If the server is idle, this will serve to keep the caches and the in-memory stand warm. If the server is really busy, it only adds a small amount of extra work. Once this is done, the functionality will be fast for all users in all future sessions.
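
      As a rough sketch, a warm-up script can simply exercise the lexicons, range indexes, and term lists your application relies on. The element name below is a placeholder and assumes an element range index on "title" already exists:

      xquery version "1.0-ml";
      (: Touch a range index / lexicon so it is mapped into memory :)
      cts:element-values(xs:QName("title"), (), ("limit=10")),

      (: Touch the term lists and caches with a representative query :)
      cts:search(fn:doc(), cts:word-query("warmup"))[1 to 5]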

      Introduction

      MarkLogic does not recommend having more than one forest for the Security database.

      The Security database is typically fairly small and there is no reason to have more than one forest for the Security database. Having more than one Security forest causes additional complexity during failover events, server upgrades, and restarts. A functioning Security database is critical to the stability of a MarkLogic Cluster and it is easier to recover from a host failure if the Security database is configured with only a single forest and a single replica forest. 

      In terms of high availability and forest failover, one local disk failover forest should be configured. In terms of database replication, a replica forest in the replica cluster should be configured.

      If you have more than one Security forest:

      We have seen incidents where customers attached more than one Security forest, either intentionally or inadvertently (scripting bug or user error), and then ran into issues while detaching them.

      When the database rebalancer is enabled for the database (default setting) and when a new forest is attached, the database will automatically redistribute the content across all attached forests. Problems can then arise when security forests are detached without preserving their content. This is true for any database, but is problematic when dealing with the Security database. 

      When a Security database forest is detached without first retiring it (and verifying documents are moved out of it), some Security documents will be removed from the database. This may lead to users being locked out of the cluster or render the cluster unusable.  If this occurs on your MarkLogic cluster, please contact MarkLogic Support to help with the repair.

      Best Practice

      • Do not configure more than one forest for any system database, including the Security database.
      • If you have multiple forests in your Security database and need to come back in line with our one-forest recommendation (see the sketch after this list):
        • Retire the extra Security database forests;
        • Verify all extra forests are drained of content (zero documents / zero fragments);
        • Detach the extra forests.
      • Once your cluster is in line with our one forest recommendation, disable the rebalancer for the Security database.
      • Configure a single replica forest to achieve high availability.
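
      A minimal sketch of the retire / verify / detach sequence from Query Console follows. It assumes the extra forest is named "Security2" and that the Admin API functions admin:database-retire-forest and admin:database-detach-forest are available in your release; always verify the forest is empty before detaching, and contact MarkLogic Support if in doubt:

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin"
        at "/MarkLogic/admin.xqy";
      (: 1. Retire the extra forest so the rebalancer moves its content out :)
      let $config := admin:get-configuration()
      return admin:save-configuration(
        admin:database-retire-forest(
          $config, xdmp:database("Security"), xdmp:forest("Security2")))
      ;
      xquery version "1.0-ml";
      (: 2. Later, verify the retired forest is empty before detaching it :)
      xdmp:forest-counts(xdmp:forest("Security2"))
      ;
      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin"
        at "/MarkLogic/admin.xqy";
      (: 3. Detach the now-empty forest from the Security database :)
      let $config := admin:get-configuration()
      return admin:save-configuration(
        admin:database-detach-forest(
          $config, xdmp:database("Security"), xdmp:forest("Security2")))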

      Further reading

      Administering Security in MarkLogic

      Database Rebalancing in MarkLogic

      Restoring Security Database

      Security Database restore leading to lingering Certificate Template id in Config files

      As a general guideline, the number of range indexes in a MarkLogic database should be kept to about 100 or fewer. This is because:

      • In the interests of performance, MarkLogic Server indexes your content on ingest, then memory maps those indexes to serialized data structures on disk. Each of those memory maps requires some amount of RAM.
        • If you've got many thousands of indexes you may run into a situation where system monitoring is reporting you've got RAM to spare, but MarkLogic Server is reporting "SVC-MAPINI: Mapped file initialization error", in which case you're likely running up against Linux's default vm.max_map_count value.
        • Independent of SVC-MAPINI errors, the more range indexes you've configured, the longer it will take to perform forest operations.
      • If you find yourself configuring many hundreds or even thousands of range indexes, you should migrate your data modeling scheme to take advantage of Template Driven Extraction (TDE), which was specifically engineered to address this scenario (see the sketch after this list).
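
      As a rough sketch of the TDE approach referenced above (the element names, schema/view names, and template URI are all invented for illustration), a single template can expose values as SQL-style rows instead of requiring one range index per element:

      xquery version "1.0-ml";
      import module namespace tde = "http://marklogic.com/xdmp/tde"
        at "/MarkLogic/tde.xqy";

      let $template :=
        <template xmlns="http://marklogic.com/xdmp/tde">
          <context>/order</context>
          <rows>
            <row>
              <schema-name>sales</schema-name>
              <view-name>orders</view-name>
              <columns>
                <column>
                  <name>customer</name>
                  <scalar-type>string</scalar-type>
                  <val>customer</val>
                </column>
                <column>
                  <name>total</name>
                  <scalar-type>decimal</scalar-type>
                  <val>total</val>
                </column>
              </columns>
            </row>
          </rows>
        </template>
      return tde:template-insert("/templates/orders.xml", $template)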

      Additional Reading:

      Introduction

      This Knowledgebase article is a general guideline for backups using the journal archiving feature for both free space requirements and expected file sizes written to the archive journaling repository when archive journaling is enabled and active.

      The MarkLogic environment used here was an out-of-the-box version 9.x installation, with one change: adding a new directory specific to storing the archive journal backup files.

      It is assumed that the reader of this article already has a basic understanding of the role of Journal Archiving in the Backup and Restore feature of MarkLogic Server. See the references below for further details.

      How much free space is needed for the Archive Journal files in a backup?

      MarkLogic Server uses the size of the active forest to confirm whether the journal archive repository has enough free space to accommodate that forest. However, if additional forests already exist on the same volume, the Server's "free-space" calculation may be misleading, because those other forests are never used in the algorithm that calculates the free space available for the backup and/or archive journal repositories. Only one forest is used in the free-space calculation.

      In other words, if multiple forests exist on the same volume, there may not be enough free space available on that specific volume due to the additional forests, especially during a high rate of ingestion. If that is the case, then it is advised to provide enough free space on that volume to accommodate the sizes of all the forests. Required Free Space (approximate) = (Number of Forests) x (Size of Largest Forest).

      What can we expect to see in the journal archiving repository in terms of file sizes for specific ingestion types and sizes? That brings us to the next question.

      How is the Journal Archive repository filling up?

      1 MByte of raw XML data loaded into the server (as either a new document ingestion or a document update) will result in approximately 5 to 6 MBytes of data being written to the corresponding Journal Archive files.  Additionally, adding Range Indexes will contribute to a relatively small increase in consumed space.

      Ingesting/updating RDF data results in slightly less data being written to the journal archive files.

      In conclusion, for both new document ingestion and document updates, the typical expansion ratio of Journal Archive size to input file size is between 5 and 6, but it can be higher than that depending on the document structure and any added range indexes.

      References:

      Problem summary: Sometimes it is necessary to use a license key acquired from MarkLogic instead of the one that comes out of the box by subscribing to the AMIs on AWS. In such cases, below are the steps to follow to change the license key.

      Please note that if the cluster was created using an enterprise AMI (pay as you go), it is not possible to change the license key details on the instances manually.

      However, if the cluster was created using CloudFormation templates, we have the option of updating the stack. In order to change the license key, please perform the steps below:

      1. Modify the CloudFormation template by changing the AMI ID to a developer AMI ID, or to a custom AMI based on the developer AMI.
      2. Go to CloudFormation and update the stack with the new modified template; while updating the stack, provide the new license key details.
      3. Once the stack update is successful, terminate the existing nodes from the EC2 dashboard so that new nodes get created with the new developer AMI.
      4. Once the cluster is back to a running state, verify through the Admin UI that the new license key has been applied. If it has not, change the license details through the Admin UI so that the cluster uses your own license key.

       If the cluster was created using a Developer AMI, or a custom AMI based on the developer AMI, without using CloudFormation templates, you can follow the steps below to update the license on every node

      1. SSH into instance using the key that was used while creating the instance
      2. Stop the MarkLogic server on the node.
      3. Remove or take a backup of /var/local.mlcmd.conf
      4. Create a file named marklogic.conf under /etc with the below entries
        MARKLOGIC_LICENSEE="******"
        MARKLOGIC_LICENSE_KEY="**********"
      5. Complete the above steps on all the nodes.
      6. Start the Cluster by starting MarkLogic server on each node.
      7. Log into the MarkLogic Admin GUI, then navigate to the license key page for every host and click the "OK" button. You will observe the server restarting with the new license key.

       Please make sure you test the above in one of your lower environments before implementing it in production. Please feel free to get back to us if you have any questions.

      Introduction

      Content processing applications often require multi-step processing. Each step in the process performs a particular task or set of tasks. The Content Processing Framework in MarkLogic Server supports these types of multi-step conversion processes. Sometimes, during a document delete operation, the CPF action might fail with an 'XDMP-CONFLICTINGUPDATES' error, which can be seen in the document-properties file:

      Sample message:

      <error:format-string>XDMP-CONFLICTINGUPDATES: xdmp:document-set-property("FILE-NAME", <cpf:state xmlns:cpf="http://marklogic.com/cpf">http://marklogic.com/states/deleted</cpf:state>) -- Conflicting updates xdmp:document-set-property("FILE-NAME", /cpf:state) and xdmp:document-delete("FILE-NAME")</error:format-string>

      This error message indicates that an update statement (e.g., xdmp:document-set-property) is trying to update a document in a way that conflicts with another update (e.g., xdmp:document-delete) occurring in the same transaction.

       

      Detail

      Actions that want to delete the target URI need special handling, because MarkLogic CPF also wants to keep track of progress in the properties, and just calling document-delete [ xdmp:document-delete($cpf:document-uri) ] can't do that.

      Following are ways to achieve the expected behavior and get past the XDMP-CONFLICTINGUPDATES error:

      1) Perform a "soft delete" on the document and then let CPF take care of deleting the document. This can be done by setting the document status to "deleted" via the cpf:document-set-processing-status API function. Setting the document's processing status to "deleted" will tell CPF to clean up the document and not update the properties at the same time.

      cpf:document-set-processing-status( $uri-to-delete, "deleted" )

      Additional details can be found at: http://docs.marklogic.com/cpf:document-set-processing-status


      2) If you want to keep a record of the URI that is being deleted, you can delete the document's root node instead of the document itself. The CPF state will still be recorded in the document properties, even if the document content is gone.

      xdmp:node-delete(doc($uri-to-delete))

      Details at: http://docs.marklogic.com/xdmp:node-delete

      Problem Statement: AWS has updated the Lambda Python runtime version to python:3.9.v19 in the us-east regions. This fails to satisfy some dependencies that we package with our Managed Cluster Lambda code, so creation of the Managed ENI stack and the NodeManager stack fails. Stack creation works fine in other AWS regions (us-west-2, eu-central-1), as the Lambda runtime there still uses python:3.9.v18.

      Proposed Solution: 

      1. For newly created clusters that use custom templates with the ML Managed ENI and NodeManager templates as a reference, below is what needs to be changed.

      Managed ENI and NodeManager Template Reference: (The RuntimeManagementConfig block shown below needs to be added; the region "us-east-2" should be edited based on the region where the stack is created)

      Managed ENI

      ManagedEniFunction:
          Type: 'AWS::Lambda::Function'
          DependsOn: ManagedEniExecRole
          Properties:
            Code:
              S3Bucket: !Ref S3Bucket
              S3Key: !Join ['/', [!Ref S3Directory,'managed_eni.zip']]
            Handler: managedeni.handler
            Role: !GetAtt [ManagedEniExecRole, Arn]
            Runtime: python3.9
            RuntimeManagementConfig:
              RuntimeVersionArn: 'arn:aws:lambda:us-east-2::runtime:edb5a058bfa782cb9cedc6d534ac8b8c193bc28e9a9879d9f5ebaaf619cd0fc0'
              UpdateRuntimeOn: 'Manual'
            Timeout: '180'

      NodeManager

      NodeManagerFunction:
          Type: 'AWS::Lambda::Function'
          DependsOn: NodeManagerExecRole
          Properties:
            Code:
              S3Bucket: !Ref S3Bucket
              S3Key: !Join ['/', [!Ref S3Directory,'node_manager.zip']]
            Handler: nodemanager.handler
            Role: !GetAtt [NodeManagerExecRole, Arn]
            Runtime: python3.9
            RuntimeManagementConfig:
              RuntimeVersionArn: 'arn:aws:lambda:us-east-2::runtime:edb5a058bfa782cb9cedc6d534ac8b8c193bc28e9a9879d9f5ebaaf619cd0fc0'
              UpdateRuntimeOn: 'Manual'
            Timeout: '180'

      2. For newly created clusters that use the default Lambda templates offered by MarkLogic ("ml-managedeni.template" and "ml-nodemanager.template"), the MarkLogic team has already patched the templates. The patched templates apply from 10.0-9.2 to 10.0-9.5 and from 11.0.0 to 11.0.2. For any older MarkLogic 10 versions, customers need to raise a support ticket and we will address it.

      3. For customers who have an existing stack and perform upgrades on a regular basis, please follow the steps below on the existing ManagedENI and NodeManager Lambda functions manually, one time, before performing any upgrades.

      Look for the Managed ENI function in the AWS Lambda console in the region where the stack was deployed

      Under Runtime Settings → Edit runtime management configuration

      Select the Manual option and input the ARN of the previous runtime python:3.9.v18 (arn:aws:lambda:us-west-2::runtime:edb5a058bfa782cb9cedc6d534ac8b8c193bc28e9a9879d9f5ebaaf619cd0fc0). The region must be edited accordingly based on where your Lambda function is located.

      Repeat the same steps for the NodeManager Lambda function as well and save it before performing any upgrades.

      Introduction

      Sometimes, when a host is removed from a cluster in an improper manner -- e.g., by some means other than the Admin UI or Admin API -- a remote host can still try to communicate with its old cluster, but the cluster will recognize it as a "foreign IP" and will log a message like the one below:

      2014-12-16 00:00:20.228 Warning: XDQPServerConnection::init(10.0.80.7:7999-10.0.80.39:44247): SVC-SOCRECV: Socket receive error: wait 10.0.80.7:7999-10.0.80.39:44247: Timeout

      Explanation: 

      XDQP is the internal protocol that MarkLogic uses for internal communications amongst the hosts in a cluster, and it uses port 7999 by default. In this message, the local host 10.0.80.7 is receiving socket connections from the foreign host 10.0.80.39.

       

      Debugging Procedure, Step 1

      To find out if this message indicates a socket connection from an IP address that is not part of the cluster, the first place to look is in the hosts.xml files. If the IP address is not found in hosts.xml, then it is a foreign IP. In that case, the following steps will help to identify the processes that are listening on port 7999.
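
      One quick way to list the hosts that are legitimately part of the cluster is to run the following from Query Console; any address connecting on port 7999 that is not in this list is foreign to the cluster:

      xquery version "1.0-ml";
      (: List every host the cluster knows about :)
      for $h in xdmp:hosts()
      return xdmp:host-name($h)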

       

      Debugging Procedure, Step 2

      To find out who is listening on XDQP ports, try running the following command in a shell window on each host:

            $ sudo netstat -tulpn | grep 7999

      You should only see MarkLogic as a listener:

           tcp 0 0 0.0.0.0:7999 0.0.0.0:* LISTEN 1605/MarkLogic

      If you see any other process listening on 7999, you have found your culprit. Shut down those processes and the messages will go away.

       

      Debugging Procedure, Step 3

      If the issue persists, run tcpdump to trace packets to/from "foreign" hosts using the following command:

           tcpdump -n host {unrecognized IP}

      Shut down MarkLogic on those hosts. Also, shut down any other applications that are using port 7999.

       

      Debugging Procedure, Step 4

      If the cluster hosts are on AWS, you may also want to check your Elastic Load Balancer ports. This may be tricky, because instances will change IP addresses if they are rebooted, so work with AWS Support to help you find the AMI or load balancer instance that is pinging your cluster.

      In the case that the "foreign host" is an elastic load balancer, be sure to remove port 7999 from its rotation/scheduler. In addition, you should set the load balancer to use port 7997 for the heartbeat functionality.

      Introduction

      Sometimes, when a cluster is under heavy load, your cluster may show a lot of XDQP-TIMEOUT messages in the error log. Often, a subset of hosts in the cluster may become so busy that the forests they host get unmounted and remounted repeatedly. Depending on your database and group settings, the act of remounting a forest may be very time-consuming, due to the fact that all hosts in the cluster are forced to do the extra work of index detection.

      Forest Remounts

      Every time a forest remounts, the error log will show a lot of messages like these:

      2012-08-27 06:50:33.146 Debug: Detecting indexes for database my-schemas
      2012-08-27 06:50:33.146 Debug: Detecting indexes for database Triggers
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Last-Login: sln
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Triggers: sln
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Schemas: sln
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Modules: sln
      2012-08-27 06:50:35.373 Debug: Detected indexes for database Security: sln
      2012-08-27 06:50:35.485 Debug: Detected indexes for database my-modules: sln
      2012-08-27 06:50:35.773 Debug: Detected indexes for database App-Services: sln
      2012-08-27 06:50:35.773 Debug: Detected indexes for database Fab: sln
      2012-08-27 06:50:35.805 Debug: Detected indexes for database Documents: ss, fp

      ... and so on ...

      This can go on for several minutes and will cost you more down time than necessary, since you already know the indexes for each database.

      Improving the situation

      Here are some suggestions for improving this situation:

      1. Browse to Admin UI -> Databases -> my-database-name
      2. Set ‘index detection’ to ‘none’
      3. Set ‘expunge locks’ to ‘none’
      4. Click OK to make this change effective.

      Repeat steps 1-4 for all active databases.

      Now tweak the group settings to make the cluster less sensitive to an occasional busy host:

      1. Browse to Admin UI -> Groups -> E-Nodes
      2. Set ‘xdqp timeout’ to 30
      3. Set ‘host timeout’ to 90
      4. Click OK to make this change effective.

      The database-level changes tell the server to speed up cluster startup time when a server node is perceived to be offline. The group changes will cause the hosts on that group to be a little more forgiving before declaring a host to be offline, thus preventing forest unmounting when it's not really needed.
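
      The same group settings can also be scripted. The following is only a sketch and assumes the Admin API setters admin:group-set-xdqp-timeout and admin:group-set-host-timeout, and a group named "E-Nodes":

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin"
        at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $group  := admin:group-get-id($config, "E-Nodes")
      let $config := admin:group-set-xdqp-timeout($config, $group, 30)
      let $config := admin:group-set-host-timeout($config, $group, 90)
      return admin:save-configuration($config)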

      If after performing these changes, you find that you are still experiencing XDQP-TIMEOUT's, the next step is to contact MarkLogic Support for assistance. You should also alert your Development team, in case there is a stray query that is causing the data nodes to gather too many results.

      Related Reading

      XML Data Query Protocol (XDQP)

      Introduction

      Under normal operations, only a single user object is created for a user-name. However, when users are migrated from another security database and the recommended checks are not performed, duplicate user-names might be created.

      Resolution

      When there are duplicate user-names in the database, you may see the following message on the Admin UI or in the error logs:

      500: Internal Server Error
      XDMP-AS: (err:XPTY0004) get-element($col, "sec:user", "sec:user-name", $user-name, "SEC-USERDNE") -- Invalid coercion: (fn:doc("http://marklogic.com/xdmp/users/*******")/sec:user, fn:doc("http://marklogic.com/xdmp/users/*******")/sec:user) as element()?

       

      To fix duplicate user-names, the extra security object that is created needs to be removed. You can delete one of the extra security objects, which should have a URI similar to:


      http://marklogic.com/xdmp/users/******* where "*******" represents the user id.

       

      To resolve the issue, follow the below steps:

      1. Perform a backup of your Security database in case manual recovery is required.

      2. Login to the QConsole with admin credentials.

      3. Select "Security" database as the content-source

      4. Delete the duplicate security object by executing xdmp:document-delete($uri) with $uri set to the URI of the duplicate user document.
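
      To locate the duplicates before deleting anything, a query along the following lines (run against the Security database) will list every user-name that appears in more than one sec:user document, together with the URIs of the duplicate documents:

      xquery version "1.0-ml";
      declare namespace sec = "http://marklogic.com/xdmp/security";

      for $name in fn:distinct-values(fn:collection()/sec:user/sec:user-name)
      let $users := fn:collection()/sec:user[sec:user-name eq $name]
      where fn:count($users) gt 1
      return
        <duplicate user-name="{$name}">{
          for $u in $users
          return <uri>{ xdmp:node-uri($u) }</uri>
        }</duplicate>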

      Summary

      When configuring a server to add a foreign cluster you may encounter the following error:

      Forbidden
      Host does not match origin or inferred origin, or is otherwise untrusted.
      

      This error will typically occur when using MarkLogic Server versions prior to 10.0-6, in combination with Chrome versions newer than 84.x.

      Our recommendation to resolve this issue is to upgrade to MarkLogic Server 10.0-6 or newer. If that is not an option, then using a different browser, such as Mozilla Firefox, or downgrading to Chrome version 84.x may also resolve the error.

      Changes to Chrome

      Starting in version 85.x of Chrome, there was a change made to the default Referrer-Policy, which is what causes the error. The old default was no-referrer-when-downgrade, and the new value is strict-origin-when-cross-origin. When no policy is set, the browser's default setting is used. Websites are able to set their own policy, but it is common practice for websites to defer to the browser's default setting.

      A more detailed description can be found at developers.google.com

      Introduction

      For hosts that don't use a standard US locale (en_US) there are instances where some lower level calls will return data that cannot be parsed by MarkLogic Server. An example of this is shown with a host configured with a different locale when making a call to the Cluster Status page (cluster-status.xqy):

      lexval-exception.gif

      The problem

      The problem you have encountered is a known issue: MarkLogic Server uses a call to strtof() to parse the values as floats:

      http://linux.die.net/man/3/strtof

      Unfortunately, this uses a locale-specific decimal point. The issue in this environment is likely due to the Operating System using a numeric locale where the decimal point is a comma, rather than a period.

      Resolving the issue

      The workaround for this is as follows:

      1. Create a file called /etc/marklogic.conf (unless one already exists)

      2. Add the following line to /etc/marklogic.conf:

      export LC_NUMERIC=en_US.UTF-8

      After this is done, you can restart the MarkLogic process so the change is detected and try to access the cluster status again.

      Summary

      This Knowledgebase article outlines the necessary steps required in importing an existing (pre-signed) Certificate into MarkLogic Server and configuring a MarkLogic Application Server to utilize that certificate.

      Existing (Pre-signed) Certificate vs. Certificate Request Generated by MarkLogic

      MarkLogic will allow you to use an existing certificate or will allow you to generate a Certificate Request. The key difference between the two lies in who generates the public/private keys and the other fields in the certificate.

      For a Pre-Signed Certificate: In this instance, the keys already exist outside of MarkLogic Server, and a 3rd party tool would have populated the CN (Common Name) and other subject fields to generate the Certificate Request File (.csr) containing a public key.

      For a Certificate Request Generated by MarkLogic: In this instance, new keys are generated by MarkLogic Server (it does this while creating the new template), while CN and other fields are added by the MarkLogic Server Administrator (or user) through the web-based MarkLogic admin GUI during New Certificate Template creation.

      The section in MarkLogic's online documentation on Creating a Certificate Template covers the steps required to generate a certificate template from within MarkLogic Server: http://docs.marklogic.com/guide/security/SSL#id_35140

        

      Steps to Import Pre-Signed Certificate and Key into MarkLogic

      1) Create a Certificate Template 

      Create a new Certificate Template with the fields similar to your existing Pre-Signed Certificate

      For example, your current Certificate file - presigned.marklogic.com.crt

      [amistry@engrlab18-128-026 PreSignedCert]$ openssl x509 -in ML.pem -text 
      Certificate:
          Data:
              Version: 1 (0x0)
              Serial Number: 7 (0x7)
          Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=CA, L=San Carlos, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA
              Validity
                  Not Before: Nov 30 04:12:33 2015 GMT
                  Not After : Nov 29 04:12:33 2017 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=DemoLab Corporation, OU=Engineering, CN=presigned.engrlab.marklogic.com
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (1024 bit)
       
       
      For the above Certificate, we will create the Custom Template below in the Admin GUI -> Configure -> Security -> Certificate Templates -> Create tab.
      We will save our new template as - "DemoLab Corporation Template"
       
       
       Template.jpg

      Note - The above fields are placeholders only for the signed Certificate; MarkLogic mainly uses these fields to generate the Certificate Signing Request (.csr). For a Certificate request generated by a 3rd party tool, it does NOT matter whether the template fields match the final signed Certificate exactly.

      Once we have the Signed Certificate imported, the App Server will use the Signed Certificate, and the SSL Client will only see field values from the Signed Certificate (even if they are different from the Template Config page).

      2) Create an HTTPS App Server

      Please follow Procedures for Enabling SSL on App Servers except for the "Creating Certificate Template" part as we have created the Template to match our existing pre-signed Certificate. 

      3) Verify Pre-signed Certificate and Private Key file 

      Prior to installing a pre-signed certificate and private key the following verification should be performed to ensure that both certificate and key are valid and are in the correct format. 

      * Generate and display the certificate checksum using the OpenSSL utility

      [admin@sitea ~]# openssl x509 -noout -modulus -in cert.pem | openssl md5

      (stdin)= 2ddd2ca48ad2eb4eba082f5da3fd33ab

      * Generate and display the private key checksum

      [admin@siteaa ~]# openssl rsa -noout -modulus -in key.key | openssl md5

      (stdin)= 2ddd2ca48ad2eb4eba082f5da3fd33ab

      The checksums from both commands should be identical. If the values do not match, or if you are prompted for additional information such as the private key password, then the certificate and private key are not valid and should be corrected before proceeding.

      Note: Proceeding to the next step without verifying the certificate and the private key could lead to the MarkLogic server being made inaccessible. 

      Advisory: Private Keys with a key length of 1024 bits or less are now considered insecure. When generating a Private Key, you should ensure a key length of 2048 bits or higher is used.

      4) Install Pre-signed Certificate and Key file to Certificate Template using Query Console

      Since the Certificate was pre-signed, MarkLogic does not have the private key that goes along with that Pre-signed Certificate. We will install the Pre-signed Certificate and Key into MarkLogic using the below XQuery in Query Console.

      Note: The query must be run against the Security database.

      Please change the Certificate Template name and the Certificate/Key file locations in the below XQuery to reflect the values from your environment.

      xquery version "1.0-ml";
      import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";
      
      (: Update Template name for your environment :)
      let $templateid := pki:template-get-id(pki:get-template-by-name("TemplateName"))
      (: Path on the MarkLogic host that is readable by the MarkLogic server process (default daemon) :)
      (:   File suffix could also be .txt or other format :)
      let $path-to-cert := "/cert.pem"
      let $path-to-key := "/key.key"
      
      return
      pki:insert-host-certificate($templateid,
        xdmp:document-get($path-to-cert,
          <options xmlns="xdmp:document-get"><format>text</format></options>),
        xdmp:document-get($path-to-key,
          <options xmlns="xdmp:document-get"><format>text</format></options>)
      )
      

      The above will associate our pre-signed Certificate and Key with the Template created earlier, which is linked to the HTTPS App Server.

      Important note: pki:insert-trusted-certificates can also be used in place of pki:insert-host-certificate in the above example.

      Introduction

      This article discusses the effects of the incremental backup implementation on Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).

      Details

      With MarkLogic 8 you can have multiple daily incremental backups with minimal impact on database performance.

      Incremental backups complete more quickly than full backups, reducing the backup window. A smaller backup window enables more frequent backups, reducing the RPO of the database in case of disaster.

      However, RTO can be longer when using incremental backups compared to just full backups, because multiple backups must be restored to recover.

      There are two modes of operation when using incremental backups:

      Incremental since last full. Here, each incremental has to store all the data that has changed since the last full backup. Since a restore only has to go through a single incremental data set, the server is able to perform a faster restore.  However, each incremental data set is bigger and takes longer to complete than the previous data set because it stores all changes that were included in the previous incremental.

      Please note when doing “Incremental since last full”:-

      - Create a new incremental backup directory for each incremental backup

      - Call database-incremental-backup with incremental-dir set to the new incremental backup directory

       

      Incremental since last incremental.  In this case, a new incremental stores only changes since the last incremental, also known as delta backups. By storing only the changes since the last incremental, the incremental backup sets are smaller in size and are faster to complete.  However, a restore operation would have to go through multiple data sets.

      Please note when doing “Incremental since last incremental”:-

      - Create an incremental backup directory ONCE

      - Call database-incremental-backup with the same incremental backup directory (see the sketch below)
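
      As a minimal sketch of such a call (directory paths are placeholders, and the argument order follows the xdmp:database-incremental-backup signature documented for your release - please verify it against the documentation before use):

      xquery version "1.0-ml";
      (: Start an incremental backup of the Documents database.
         "/backups/Documents/" is the directory of the last full backup;
         the final argument is the incremental backup directory. :)
      xdmp:database-incremental-backup(
        xdmp:database-forests(xdmp:database("Documents")),
        "/backups/Documents/",
        fn:false(),  (: journal archiving :)
        "",          (: journal archive path :)
        30,          (: lag limit :)
        "/backups/Documents-incremental/")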

      See also the documentation on Incremental Backup.

       

       

      Topic: FAQ Link

      General Questions

      • General Questions: MarkLogic FAQs - General Questions
      • How do I work with MarkLogic Support?: How to work with MarkLogic Support FAQ
      • Customer Success: MarkLogic FAQs - Customer Success
      • Training and Community: MarkLogic FAQs - Training & Community
      • Product Support Matrix: Product Support/Compatibility Matrix
      • MarkLogic Server Fundamentals: MarkLogic Fundamentals
      • MarkLogic Support FAQ

      MarkLogic Server

      • MarkLogic Server: MarkLogic FAQs - MarkLogic Server
      • Data Ingestion: MarkLogic Content Pump (MLCP) FAQ
      • Upgrades: MarkLogic Upgrade FAQ
      • Common Error Messages: Common Error Messages FAQ
      • Database Replication: Database Replication/Disaster Recovery FAQ
      • Backup/Restore: MarkLogic Backup/Restore FAQ
      • Local Disk Failover: Local Disk Failover FAQ
      • Search: Search FAQ
      • Template Driven Extraction (TDE): Template Driven Extraction FAQ
      • Semantics: Semantics FAQ
      • Hadoop: Hadoop FAQ
      • Geospatial Double Precision: Geospatial Double Precision FAQ
      • Geospatial Region Search: Geospatial Region Search FAQ

      MarkLogic on Cloud

      • MarkLogic on Amazon Web Services (AWS): MarkLogic on AWS FAQ

      Data Hub

      • Data Hub Support FAQ: Data Hub Support FAQ
      • Data Hub: Data Hub FAQ
      • Data Hub Service: MarkLogic FAQs - Data Hub Service

      Indexing Best Practices

      MarkLogic Server indexes records (or documents/fragments) on ingest. When a database's index configuration is changed, the server will consequently reindex all matching records.

      Indexing and reindexing can be a CPU and I/O intensive operation. Reindexing creates a lot of new fragments, with the original fragments being marked for deletion. These deleted fragments will then need to be merged out. All of this activity can potentially affect query performance, especially in systems with under-provisioned hardware.

      Reindexing in Production

      If you need to add or modify an index on a production cluster, consider scheduling the reindex during a time when your cluster is less busy. If your database is too large to completely reindex during a single period of low usage, consider running the reindex over several periods of time. For example, if your low usage period is during a weekend, the process may look like:

      • Change your index configuration on a Friday night
      • Let the reindex run for most of the weekend
      • To pause the reindex, set the reindexer-enable field to 'false' for the database being reindexed. Be sure to allow sufficient time for the associated merging to complete before system load comes back.
      • If needed, reindexing can continue over the next weekend - the reindexer process will pick up where it left off before it was disabled.

      You can refer to https://help.marklogic.com/Knowledgebase/Article/View/18/15/how-reindexing-works-and-its-impact-on-performance for more details on invoking reindexing on production.
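
To pause and later resume the reindexer as described above, a minimal Admin API sketch (the database name "Documents" is a placeholder):

xquery version "1.0-ml";
(: Set reindexer enable to false for a database; set it back to fn:true() to resume. :)
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $config := admin:database-set-reindexer-enable(
                 $config, xdmp:database("Documents"), fn:false())
return admin:save-configuration($config)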

When you have Database Replication Configured

If you have to add or modify indexes on a database which has database replication configured, make sure the same changes are made on the Replica cluster as well. Starting with MarkLogic Server version 9.0-7, index data is also replicated from the Master to the Replica, but the server does not automatically check that both sides have the same index settings. Reindexing is disabled by default on a replica cluster. However, when the database replication configuration is removed (such as after a disaster), the replica database will reindex as necessary. So it is important that the Replica database index configuration matches the Master’s to avoid unnecessary reindexing.

      Further reading -

      Master and Replica Database Index Settings

      Database Replication - Indexing on Replica Explained

      Avoid Unused Range Indexes, Fields, and Path Indexes

      In addition to taking up extra disk space, Range, Field, and Path Indexes require extra work when it's time to reindex. Field and Path indexes may also require extra indexing passes.

      Avoid Using Namespaces to Implement Multi-Tenancy

      It's a common use case to want to create some kind of partition (or multiple partitions) between documents in a particular database. In such a scenario it's far better to 1) constrain the partitioning information to a particular element in a document (then include a clause over that element in your searches), than it is to 2) attempt to manage partitions via unique element namespaces corresponding to each partition. For example, given two documents in two different partitions, you'll want them to look like this:

      1a. <doc><partition>partition1</partition><name>Joe Smith</name></doc>

      1b. <doc><partition>partition2</partition><name>John Smith</name></doc>

      ...vs. something like this:

      2a. <doc xmlns:p="http://partition1"><p:name>Joe Smith</p:name></doc>

      2b. <doc xmlns:p="http://partition2"><p:name>John Smith</p:name></doc>

Why is #1 better? In terms of searching the data once it's indexed, there's actually not much of a difference - one could easily create searches to accommodate both approaches. The issue is how the indexing works in practice.

MarkLogic Server indexes all content on ingest. In scenario #2, every time a new partition is created, a new range element index needs to be defined in the Admin UI, which means your index settings have changed, which means the server now needs to reindex all of your content - not just the documents corresponding to the newly introduced partition. In contrast, for scenario #1, all that needs to be done is to ingest the documents corresponding to the new partition, which are then indexed just like all the other existing content.

There would be a need, however, to change the searches in scenario #1, as they would not yet include a clause to accommodate the new partition (for example: cts:element-value-query(xs:QName("partition"), "partition2")). The overall impact of adding a partition in scenario #1 is therefore changing the searches, which is far less intrusive than reindexing your entire database as required in scenario #2 - where, in addition to the database-wide reindex, the searches would also need to change.
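
To make the benefit of scenario #1 concrete, here is a small sketch of a search constrained to one partition via the partition element (element and value names follow the example above):

xquery version "1.0-ml";
(: Return documents in partition2 that mention "Smith"; the partition
   clause is just another query term, so no new index is needed. :)
cts:search(fn:collection(),
  cts:and-query((
    cts:element-value-query(xs:QName("partition"), "partition2"),
    cts:word-query("Smith"))))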

      Keep an Eye on I/O Throughput

      Reindexing can lead to heavy merge activity and may lead to disk I/O bottlenecks if not managed carefully. If you have a system that is available 24-7 with no downtime window, then you may need to throttle the reindexer in order to keep the disk I/O to a minimum. We suggest the following database settings for reindexing a system that must always remain in use:

      • reindexer-throttle = 3
      • large-size-threshold = 1048576

      You can also adjust the following group settings to help limit background I/O:

      • background-io-limit = 100

This will limit the background I/O for that group to 100 MB/sec per host across all hosts in that group. This should only be configured if merges are causing problems; it is a way of throttling back the I/O used by the merging process. This is a good starting point, and may be increased in increments of 50 if you find that your merges are progressing too slowly. Proceed with caution, as too low a background I/O limit can have negative performance or even catastrophic consequences.
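
As a hedged sketch of applying these settings programmatically (the database name "Documents" and group name "Default" are placeholders; check the Admin API function names against the documentation for your release):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $config := admin:database-set-reindexer-throttle(
                 $config, xdmp:database("Documents"), 3)
let $config := admin:database-set-large-size-threshold(
                 $config, xdmp:database("Documents"), 1048576)
let $config := admin:group-set-background-io-limit(
                 $config, admin:group-get-id($config, "Default"), 100)
return admin:save-configuration($config)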

      General Recommendations

In general, your indexing/reindexing and subsequent search experience will be better if you follow the practices above: schedule reindexing for periods of low usage, keep index settings consistent across replicated clusters, remove unused range, field, and path indexes, avoid namespace-based partitioning, and keep an eye on I/O throughput while the reindexer is running.

      Summary

The MarkLogic Admin GUI is a convenient place to deploy the normal Certificate infrastructure or to use the Temporary Certificate generated by MarkLogic. However, for certain advanced solutions/deployments we need XQuery-based admin operations to configure MarkLogic.

This knowledgebase article discusses the solution to deploy a SAN or wildcard Certificate in a 3-node (or larger) cluster.

       

      Certificate Types and MarkLogic Default Config

      Certificate Types

In general, when browsers connect to a server using HTTPS, they check to make sure your SSL Certificate matches the host name in the address bar. There are three ways for browsers to find a match:

a) The host name (in the address bar) exactly matches the Common Name in the certificate's Subject.

b) The host name matches a Wildcard Common Name. Please find an example at the end of this article.

c) The host name is listed in the Subject Alternative Name (SAN) field as part of the X509v3 extensions. Please find an example at the end of this article.

      The most common form of SSL name matching is for the SSL client to compare the server name it connected to with the Common Name (CN field) in the server's Certificate. It's a safe bet that all SSL clients will support exact common name matching.

MarkLogic allows this common scenario (a) to be configured from the Admin GUI; below, we discuss deploying Certificates featuring (b) and (c).

      Default Admin GUI based Configuration 

By default, MarkLogic generates a Temporary Certificate for all the nodes in the group for the current cluster when a Template is assigned to MarkLogic Server (the exception is when the Template assignment is done through XQuery).

The Temporary Certificate generated for each node has that node's hostname in its CN field - designed for the common Scenario (a).

There are two paths to install a CA-signed Certificate in MarkLogic:

1) Generate the Certificate request in MarkLogic, get it signed by the CA, and import it through the Admin GUI; or

2) Generate the Certificate request + Private Key outside of MarkLogic, get the Certificate request signed by the CA, and import the Signed Certificate + Private Key using an Admin script.

      Problem Scenario

In both of the above cases, while installing/importing the Signed Certificate, MarkLogic will look to replace the Temporary Certificate by comparing the CN field of the installed Certificate with the Temporary Certificate's CN field.

Now, if we have a Wildcard Certificate (b) or SAN Certificate (c), our Signed Certificate's CN field will never match the Temporary Certificate's CN field, hence MarkLogic will not remove the Temporary Certificates - MarkLogic will continue using the Temporary Certificates.

       

      Solution

After installing a SAN or wildcard Certificate, we may run into App Servers which still use the Temporary installed Certificate (which was not replaced while installing the SAN/wildcard Certificate).

Use the XQuery below against the Security database to remove all Temporary Certificates. The XQuery needs the URI lexicon to be enabled (enabled by default). [Please change the Certificate Template name in the XQuery below to reflect values from your environment.]

      xquery version "1.0-ml";
      
      import module namespace pki = "http://marklogic.com/xdmp/pki"  at "/MarkLogic/pki.xqy";
      import module namespace admin = "http://marklogic.com/xdmp/admin"  at "/MarkLogic/admin.xqy";
            
      
      let $hostIdList := let $config := admin:get-configuration()
                         return admin:get-host-ids($config)
                           
      for $hostid in $hostIdList
      return
  (: FQDN name matching Certificate CN field value :)
  let $fqdn := "TestDomain.com"
      
        (: Change to your Template Name string :)
        let $templateid := pki:template-get-id(pki:get-template-by-name("YourTemplateName"))
      
        for $i in cts:uris()
        where 
        (   (: locate Cert file with Public Key :)
            fn:doc($i)//pki:template-id=$templateid 
            and fn:doc($i)//pki:authority=fn:false()
      and fn:doc($i)//pki:host-name=$fqdn
        )
        return <h1> Cert File - {$i} .. inserting host-id {$hostid}
        {xdmp:node-insert-child(doc($i)/pki:certificate, <pki:host-id>{$hostid}</pki:host-id>)}
        {
            (: extract cert-id :)
            let $certid := fn:doc($i)//pki:certificate/pki:certificate-id
            for $j in cts:uris()
            where 
            (
                (: locate Cert file with Private key :)
                fn:doc($j)//pki:certificate-private-key/pki:template-id=$templateid 
                and fn:doc($j)//pki:certificate-private-key/pki:certificate-id=$certid
            )
            return <h2> Cert Key File - {$j}
            {xdmp:node-insert-child(doc($j)/pki:certificate-private-key,
              <pki:host-id>{$hostid}</pki:host-id>)}
            </h2>
        } </h1>
      

The above query will remove all Temporary Certificates (including the Template CA) and their private keys, leaving only the installed Certificate associated with the Template, forcing all nodes to use the installed Certificate.

       

      Example: SAN (Subject Alternative Name) Certificate

      For 3 node cluster (engrlab-128-101.engrlab.marklogic.com, engrlab-128-164.engrlab.marklogic.com, engrlab-128-130.engrlab.marklogic.com)

$ openssl x509 -in ML.pem -text -noout
      Certificate:
          Data:
              Version: 3 (0x2)
              Serial Number: 9 (0x9)
              Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=NY, L=NewYork, O=MarkLogic, OU=Engineering, CN=Support CA
              Validity
                  Not Before: Apr 20 19:50:51 2016 GMT
                  Not After : Jun  6 19:50:51 2018 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic, OU=Eng, CN=TestDomain.com
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                  RSA Public Key: (1024 bit)
                      Modulus (1024 bit):
                          00:97:8e:96:73:16:4a:cd:99:a8:6a:78:5e:cb:12:
                          5d:e5:36:42:d2:b8:52:51:53:6c:cf:ab:e4:c6:37:
                          2c:15:12:80:c1:1b:53:29:4c:52:76:84:80:1d:ee:
                          16:41:a6:31:c5:7b:0d:ca:d7:e5:da:d7:67:fe:80:
                          89:9f:0d:bc:46:4f:f0:7e:46:88:26:d5:a0:24:a6:
                          06:d1:fa:c0:c7:a2:f2:11:7f:5b:d5:8d:47:94:a8:
                          06:d9:46:8f:af:dd:31:d5:15:d2:7a:13:39:3e:81:
                          32:bd:5c:bd:62:9d:5a:98:1d:20:0e:30:d4:57:3f:
                          7f:89:e6:20:ae:88:4d:85:d7
                      Exponent: 65537 (0x10001)
              X509v3 extensions:
                  X509v3 Key Usage: 
                      Key Encipherment, Data Encipherment
                  X509v3 Extended Key Usage: 
                      TLS Web Server Authentication
                  X509v3 Subject Alternative Name: 
                      DNS:engrlab-128-101.engrlab.marklogic.com, DNS:engrlab-128-164.engrlab.marklogic.com, DNS:engrlab-128-130.engrlab.marklogic.com
          Signature Algorithm: sha1WithRSAEncryption
              52:68:6d:32:70:35:88:1b:70:df:3a:56:f6:8a:c9:a0:9d:5c:
              32:88:30:f4:cc:45:29:7d:b5:35:18:a0:9a:45:37:e9:22:d1:
              c5:50:1d:50:b8:20:87:60:9b:c1:d6:a8:0c:5a:f2:c0:68:8d:
              b9:5d:02:10:39:40:b3:e5:f6:ae:f3:90:31:57:4c:e0:7f:31:
              e2:79:e6:a8:c0:e6:3f:ea:c5:75:67:3e:cd:ea:88:5d:60:d6:
              01:59:3c:dc:e0:47:96:3b:59:4a:13:85:bb:87:70:d0:a2:6b:
              0f:d4:84:1d:d1:be:e8:a5:67:c3:e3:59:05:0d:5d:a5:86:e6:
              e4:9e

      Example: Wild-Card Certificate

      For 3 node cluster (engrlab-128-101.engrlab.marklogic.com, engrlab-128-164.engrlab.marklogic.com, engrlab-128-130.engrlab.marklogic.com). 

      $ openssl x509 -in ML-wildcard.pem -text -noout
      Certificate:
          Data:
              Version: 1 (0x0)
              Serial Number: 7 (0x7)
              Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=NY, L=NewYork, O=MarkLogic, OU=Engineering, CN=Support CA
              Validity
                  Not Before: Apr 24 17:36:09 2016 GMT
                  Not After : Jun 10 17:36:09 2018 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic Corporation, OU=Engineering Support, CN=*.engrlab.marklogic.com
       

      Introduction

      Okta provides secure identity management and single sign-on to any application, whether in the cloud, on-premises or on a mobile device.

The following describes the procedure required to integrate MarkLogic with Okta identity management and Microsoft Windows Active Directory using the Okta AD Agent.

      This document assumes that the users accessing MarkLogic are defined in the Windows Active Directory only and do not currently have Okta User Profiles defined.

      Authentication Flow

       The authentication flow in this scenario will be as follows:

      1. The user opens a Browser connection to the Site Single Sign-On Portal page.
      2. The user enters their Active Directory credentials
3. Okta verifies the user credentials using the Okta AD Agent
      4. If successful, the user is presented with a selection of applications they can sign-on to.
      5. The user selects the required application and Okta completes the sign-on using the stored user credentials.

      Requirements

      • MarkLogic Server version 8 or 9
      • Okta Admin account access
      • Okta AD Agent
      • Active Directory Server

      For the purpose of this document the following Active Directory user entry will be used as an example:

      # LDAPv3
      # base <dc=MarkLogic,dc=Local> with scope subtree
      # filter: (sAMAccountName=martin.warnes)
      # requesting: *
      #
      
      # Martin Warnes, Users, marklogic.local
      dn: CN=Martin Warnes,CN=Users,DC=marklogic,DC=local
      objectClass: top
      objectClass: person
      objectClass: organizationalPerson
      objectClass: user
      cn: Martin Warnes
      sn: Warnes
      givenName: Martin
      distinguishedName: CN=Martin Warnes,CN=Users,DC=marklogic,DC=local
      sAMAccountName: martin.warnes
      memberOf: CN=mladmins,CN=Users,DC=marklogic,DC=local
      sAMAccountType: 805306368
      userPrincipalName: martin.warnes@marklogic.local

      Notes

      1. By default, Okta uses the email address as the username, however, MarkLogic usernames cannot contain certain special characters such as the @ symbol so the sAMAccountName will be used to sign-on on to MarkLogic. This will be configured later during the Okta Application definition.
      2. One or more memberOf attributes should be assigned to the Active Directory user entry and these will be used to assign MarkLogic Roles without requiring the need to configure duplicate user entries in the MarkLogic security database.

      Step 1. Create a MarkLogic External Security definition

       An External Security definition is required to authenticate and authorize Okta users against a Microsoft Windows Active Directory server.

       Full details on configuring an external security definition can be found at:

       https://docs.marklogic.com/8.0/guide/security/external-auth

       You should ensure that both “authentication” and “authorization” are set to “ldap”, for details on the remaining settings you should consult your Active Directory administrator.

      Step 2. Assign Active Directory group membership to MarkLogic Roles

In order to assign the correct roles and permissions to Okta users, you will need to map Active Directory memberOf attributes to MarkLogic roles.

      In my example Active Directory user entry martin.warnes belongs to the following Group:

       memberOf: CN=mladmins,CN=Users,DC=marklogic,DC=local

      To ensure that all members of this Group are assigned MarkLogic Admin roles you simply need to add the memberOf attribute value as an external name in the admin role as below:
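
The Admin UI screenshot is not reproduced here. As a hedged sketch of the same mapping done through the Security API (run against the Security database with sufficient privileges; this assumes the sec:role-set-external-names function documented in security.xqy):

xquery version "1.0-ml";
(: Map the AD group DN to the MarkLogic "admin" role as an external name. :)
import module namespace sec = "http://marklogic.com/xdmp/security"
    at "/MarkLogic/security.xqy";

sec:role-set-external-names(
  "admin",
  ("CN=mladmins,CN=Users,DC=marklogic,DC=local"))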

      Step 3. Configure the MarkLogic AppServer

      For each App Server that you wish to integrate with Okta, you will need to set the “authentication” to “basic” and select the “external security” definition.

As HTTP Basic Authentication is considered insecure, it is highly recommended that you secure the AppServer connection using HTTPS by configuring and selecting an “SSL certificate template”.

       Further details on configuring SSL for AppServers can be found at:

       https://docs.marklogic.com/8.0/guide/admin/SSL

      Step 4. Install and Configure Okta AD Integration

      In order for Okta to authenticate your Active Directory users, you will first need to download and install the Okta AD Agent using the following instructions supplied by Okta

      https://support.okta.com/help/Documentation/Knowledge_Article/Install-and-Configure-the-Okta-Active-Directory-Agent-1689483166

       Once installed your Okta Administrator will be able to complete the AD Agent configuration to select which AD users to import into Okta.

      Step 5. Create Okta MarkLogic application

From the Okta Administrator select “Add Application”, search for the Basic Authentication template and click “Add”.

On the “General Settings” tab, enter the MarkLogic AppServer URL, making sure to use HTTP or HTTPS depending on whether you have chosen to secure the listening port using TLS.

       Check the “Browser plugin auto-submit” option.

On the Sign-On options panel select “Administrator sets username, password is the same as user’s Okta password”.

       For “Application username format” select “AD SAM Account name” from the drop-down selection.

Once the Okta application is created, you should assign the users permitted to access the application.

When assigning a user, you will be prompted to check the AD credentials; at this point you should just check that Okta has selected the correct "sAMAccountName" value. The password will not be modifiable.

      Repeat Step 5. for each AppServer you wish to access via the Okta SSO portal.

      Step 6. Sign-on to Okta SSO Portal

      All assigned MarkLogic applications should be shown:

      Selecting one of the MarkLogic applications should automatically log you in using your AD Credentials stored within Okta.

      Additional Reading

      Introduction

MarkLogic Server provides pre-commit and post-commit triggers, which listen for certain events to occur and then invoke a configured XQuery module after the event occurs. It is a common use case to create a common function in a library module which is shared among different trigger modules called by various triggers. This article shows an example of creating and using such a shared library module in a post-commit trigger.

      Example

      This example shows a simple post commit trigger that fires when a new document is created.

1. For this example, create a database 'minidb' and then set its triggers database to itself (minidb). Also, create another database 'minimodules' to store all modules.

2. Using Query Console, create a trigger using a trigger definition by evaluating the XQuery below against the triggers database (minidb):

      TriggersExample_Triggers_Definition.xqy
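
The attached trigger definition is not reproduced here. As a rough sketch (the trigger name, module path, and directory scope are assumptions based on the example output further below), a post-commit trigger that fires when a document is created under /mini/ might be defined like this:

xquery version "1.0-ml";
(: Run against the triggers database (minidb). Names and paths are placeholders. :)
import module namespace trgr = "http://marklogic.com/xdmp/triggers"
    at "/MarkLogic/triggers.xqy";

trgr:create-trigger(
  "mini-create-trigger",
  "Fires after a document is created under /mini/",
  trgr:trigger-data-event(
    trgr:directory-scope("/mini/", "infinity"),
    trgr:document-content("create"),
    trgr:post-commit()),
  trgr:trigger-module(xdmp:database("minimodules"), "/", "/triggers/log-create.xqy"),
  fn:true(),
  xdmp:default-permissions())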

3. Create a trigger module by running the XQuery below against the modules database:

      TriggerExample_Trigger_Module.xqy

      4. Insert a library module into the modules database (minimodules):

      TriggerExample_Library_Module.xqy
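
The attached modules are likewise not reproduced here. As a rough illustration of steps 3 and 4 (module URIs, namespaces, and function names are assumptions), the shared library module and the trigger module that imports it might look like this; note that the import path is resolved relative to the modules database root:

(: /lib/logger.xqy - shared library module stored in the modules database :)
xquery version "1.0-ml";
module namespace logger = "http://example.com/logger";

declare function logger:log($message as xs:string) as empty-sequence()
{
  xdmp:log($message)
};

(: /triggers/log-create.xqy - trigger module invoked by the trigger :)
xquery version "1.0-ml";
import module namespace trgr = "http://marklogic.com/xdmp/triggers"
    at "/MarkLogic/triggers.xqy";
import module namespace logger = "http://example.com/logger" at "/lib/logger.xqy";

declare variable $trgr:uri as xs:string external;

logger:log(fn:concat("*****Document ", $trgr:uri, " was created.*****"))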

      5. Now insert the sample document into the content database (minidb):

      TriggerExample_Simple_Insert.xqy

      6. Check the output in logs:

After a new document having its URI prefixed with "/mini" is inserted into the content database, the Task Server log file records the message below:

      2018-04-25 11:40:50.224 Info: *****Document with /mini root /mini/test-25-1-1.xml was created.*****2018-04-25T11:40:50+05:30

      NOTE: Module imports are relative to root.

      References:

      1. Creating and Managing Triggers With triggers.xqy - https://docs.marklogic.com/guide/app-dev/triggers

      Introduction

We are always looking for ways to understand and address performance issues within the product, and to that end we have added the following new diagnostic features.

      New Trace Events in MarkLogic Server

      Some new diagnostic trace events have been added to MarkLogic Server:

      • Background Time Statistics - Background thread period and further processing timings are added to xdmp:host-status() output if this trace event is set.
      • Journal Lag 30 - A forest will now log a warning message if a frame takes more than 30 seconds to journal.
        • Please note that this limit can be adjusted down by setting the Journal Lag # trace event (where # is {1, 2, 5 or 10} seconds).
• Canary Thread 10 - A new "canary thread" that does nothing but sleep for a second and check how long it was since it went to sleep.
        • It will log messages if the interval between sleeping has exceeded 10 seconds.
        • This can be adjusted down by setting the Canary Thread # trace event (where # is {1, 2, 5 or 10} seconds).
      • Canary Thread Histogram - Adding this trace event will cause MarkLogic to write to the ErrorLog a histogram of timings once every 10 minutes.
      • Forest Fast Query Lag 10 - By default, a forest will now warn if the fast query timestamp is lagging by more than 30 seconds.
        • This can be adjusted down by setting the Forest Fast Query Lag # (where # is {1, 2, 5, or 10} seconds).
        • Note that Warning level messages will be repeatedly logged at intervals while the lag limit is exceeded, with the time between logged messages doubling until it reaches 60 seconds.
        • There will be a final warning when the lag drops below the limit again as a way to bracket the period of lag.
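
The trace events above can be enabled from the Admin UI (under the group's Diagnostics settings) or via the Admin API. A minimal sketch, assuming the admin:group-add-trace-events and admin:group-trace-event functions of the Admin API and a group named "Default":

xquery version "1.0-ml";
(: Enable one of the trace events described above for the "Default" group. :)
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")
let $config := admin:group-add-trace-events(
                 $config, $group,
                 admin:group-trace-event("Canary Thread Histogram"))
return admin:save-configuration($config)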

Examples of some of the new statistics can be viewed in the Admin UI by going to the following URLs in a browser (replacing hostname with the name of a node in your cluster and replacing TheDatabase with the name of the database that you would like to monitor): http://hostname:8001/forest-journal-statistics.xqy?database=TheDatabase and http://hostname:8001/forest-insert-statistics.xqy?database=TheDatabase

You can clear the forest insert and journal statistics by adding clear=true to your request; for example: http://hostname:8001/forest-journal-statistics.xqy?database=TheDatabase&clear=true

      These changes now feature in the current releases of both MarkLogic 7 and MarkLogic 8 and are available for download from our developer website:

      Hints for interpreting new diagnostic pages

      Here's some further detail on what the numbers mean.

      First, a note about how bucketing is performed on these diagnostic pages:

      For each operation category (e.g. Timestamp Wait, Semaphore, Disk), the wait time will fall into a range of values, which need to be bucketed.

      The bucketing algorithm starts with 1000 buckets to cover the whole range, but then collapses them into a small set of buckets that cover the whole span of values. The algorithm aims to

      1. End up with a small number of buckets

2. Include extreme (outlier) values

      3. Spread out multiple values so that they are not too "bunched-up" and are therefore easier to interpret.

      Forest Journal Statistics (http://hostname:8001/forest-journal-statistics.xqy?database=TheDatabase)

When we journal a frame, there is a sequence of operations.

      1. Wait on a semaphore to get access to the journal.
      2. Write to the journal buffer (possibly waiting for I/O if exceeding the 512k buffer)
      3. Send the frame to replica forests
      4. Send the frame to journal archive/database replica forests
      5. Release the semaphore so other threads can access the journal
      6. Wait for everything above to complete, if needed.
        1. If it's a synchronous op (e.g. prepare, commit, fast query timestamp), we wait for disk I/O
        2. If there are replica forests, we wait for them to acknowledge that they have journaled and replayed.
        3. If the journal archive or database replica is lagged, wait for it to no longer be lagged.

We note the wall clock time before/after these various operations, so we can track how long they're taking.

      On the replica side, we also measure the "Journal Replay" time which would be inserting into the in-memory stand, committing, etc.

      Here's an example for a master and its replica.

      Forest F-1-1

      Timestamp Wait
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 280 99.64 280 99.64
      50..59 1 0.36 281 100.00
      Semaphore
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 816 100.00 816 100.00
      Disk
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 204 99.51 204 99.51
      10..19 1 0.49 205 100.00
      Local-Disk Replication
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 804 99.26 804 99.26
      10..119 6 0.74 810 100.00
      Journal Archive

      No Information

      Database Replication

      No Information

      Journal Total
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 810 99.26 810 99.26
      10..119 6 0.74 816 100.00
      Journal Replay

      No Information

      Forest F-1-1-R

      Timestamp Wait

      No Information

      Semaphore
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 811 100.00 811 100.00
      Disk
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 203 99.02 203 99.02
      10..59 2 0.98 205 100.00
      Local-Disk Replication

      No Information

      Journal Archive

      No Information

      Database Replication

      No Information

      Journal Total
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 809 99.75 809 99.75
      10..59 2 0.25 811 100.00
      Journal Replay
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 807 99.63 807 99.63
      10..119 3 0.37 810 100.00

      Forest Insert Statistics (http://hostname:8001/forest-insert-statistics.xqy?database=TheDatabase)

      When we're inserting a fragment into an in-memory stand, we also have a sequence of operations.

      1. Wait on a semaphore to get access to the in-memory stand.
2. Wait on the insert throttle (e.g. if there are too many stands)
      3. Wait for the stand's journal semaphore, to serialize with the previous insert if needed.
      4. Release the stand insert semaphore.
      5. Journal the insert.
      6. Release the stand journal semaphore.
      7. Start the checkpoint task if the stand is full.

      As with the journal statistics, we note the wall clock time between these operations so we can track how long they're taking.

      On the replica side, the behavior is similar, although the journal and insert are in reverse order (we journal before inserting into the in-memory stand). If it's a database replica forest, we also have to regenerate the index information (Filled IPD).

Here is an example for a master and its replica.

      Forest F-1-1

      Journal Throttle
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 606 100.00 606 100.00
      Insert Sem
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 604 99.67 604 99.67
      80..199 2 0.33 606 100.00
      Filled IPD

      No Information

      Stand Throttle
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 606 100.00 606 100.00
      Stand Insert
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 605 99.84 605 99.84
      100..109 1 0.17 606 100.00
      Journal Sem
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 604 99.67 604 99.67
      10..119 2 0.33 606 100.00
      Journal
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 603 99.50 603 99.50
      10..119 3 0.50 606 100.00
      Total
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 597 98.51 597 98.51
      10..19 6 0.99 603 99.50
      200..229 3 0.50 606 100.00

      Forest F-1-1-R

      Journal Throttle

      No Information

      Insert Sem
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 606 100.00 606 100.00
      Filled IPD

      No Information

      Stand Throttle
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 606 100.00 606 100.00
      Stand Insert
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 605 99.84 605 99.84
      110..119 1 0.17 606 100.00
      Journal Sem
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 606 100.00 606 100.00
      Journal

      No Information

      Total
Bucket (ms)   Count   %   Cumulative   Cumulative %
      0..9 605 99.84 605 99.84
      110..119 1 0.17 606 100.00

      Further reading

      To learn more about diagnostic trace events, please refer to our documentation and Knowledgebase articles and note that some trace events may only log information if logging is set to debug:

Data Hub Framework allows you to model your data according to business entities, and Template Driven Extraction (TDE) allows you to view these entities through a relational or a semantic lens. With Data Hub Framework (DHF), TDE templates are now created automatically so you can view your data as rows using SQL or the Optic API (see this video for more information). The Template Driven Extraction feature has been available in MarkLogic for a while, whereas the DHF-generated TDE feature came out in DHF 4.

Recently, we have received reports of a couple of issues with the DHF-generated TDE feature, and we are currently working on investigating and resolving those issues. Although this feature is fully functional for the most part, if you are seeing issues with your DHF-generated TDEs while our investigation is in progress, our recommendation is to treat the DHF-generated TDE as an example only and, based on it, create your own TDE in the meantime to handle the queries that you would like to run.

      Helpful resources:

      Summary

The jemalloc library is included with the MarkLogic Server install and is configured to be used by default; it is recommended because it has shown a performance boost over the default Linux malloc library.

There have been cases where, even though it is configured, the library is not used. This article gives possible ways to debug that.

      Diagnostics

      ErrorLog message on startup if jemalloc is not allocated:

      Warning: Memory allocator is not jemalloc; check /etc/sysconfig/MarkLogic

      Solutions

1) Make sure to use a superuser shell or sudo and run 'service MarkLogic restart'.

2) Verify that the jemalloc library is present in the install directory (i.e. /opt/MarkLogic/lib/libjemalloc.so.1).

3) Has the /etc/sysconfig/MarkLogic configuration file been modified from the default? Try setting the configuration file back to the default and restarting the server.

4) Confirm that /etc/sysconfig/MarkLogic contains the following lines:
      # preload jemalloc
      if [ -e $MARKLOGIC_INSTALL_DIR/lib/libjemalloc.so.1 ]; then
         export LD_PRELOAD=$MARKLOGIC_INSTALL_DIR/lib/libjemalloc.so.1
      fi

      Details

      For more information on the jemalloc library, please review the article provided by Facebook Engineering

      https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/

      Introduction

      This article compares JSON support in MarkLogic Server versions 6, 7, and 8, and the upgrade path for JSON in the database.

      How is native JSON different than the previous JSON support?

Previous versions of MarkLogic Server provided XQuery APIs that converted between JSON and XML. This translation is lossy in the general case, meaning developers were forced to make compromises on either or both ends of the transformation. Even though the transformation was implemented in C++, it still added significant overhead to ingestion. All of these issues go away with JSON as a native document format.
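
As a small illustration of the native format (URI and content are placeholders), a JSON document can be inserted directly, with no XML façade conversion:

xquery version "1.0-ml";
(: Native JSON insert in MarkLogic 8+. :)
xdmp:document-insert("/example/person.json",
  object-node { "name": "Joe Smith", "partition": "partition1" })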

      How do I upgrade my JSON façade data to native JSON?

      For applications that use the previous JSON translation façade (for example: through the Java or REST Client APIs), MarkLogic 8 comes with sample migration scripts to convert JSON stored as XML into native JSON.

      The migration script will upgrade a database’s content and configuration from the XML format that was used in MarkLogic 6 and 7 to represent data to native JSON, specifically converting documents in the http://marklogic.com/xdmp/json/basic namespace.
       
      If you are using the MarkLogic 7 JSON support, you will also need to migrate your code to use the native JSON support. The resulting application code is expected to be more efficient, but it will require application developers to make minor code changes to your application.
       
      See also:
       
      Version 8 JSON incompatibilities
       

      Introduction

      MarkLogic Server provides a couple of useful techniques for keeping values in memory or resolving values without having to scan for documents on-disk.

      Options

      There are a few options available:

1. cts:element-values performs a lexicon lookup, so it gets those values directly from the range indexes; you can pass the "map" option to get the call to return a map directly, as per the documentation, which may give you what you need without having to do any further work.

      See: http://docs.marklogic.com/cts:element-values
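
A minimal sketch (the element name "partition" is a placeholder and must have an element range index configured):

xquery version "1.0-ml";
(: Returns a map:map of value => frequency straight from the range index. :)
cts:element-values(xs:QName("partition"), (), "map")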

      2. Storing a map as a server field is a popular approach and is widely used for storing data that needs to be accessed routinely by queries.

      Bear in mind that there is a catch to this approach as the map is not available to all nodes in a cluster - it is only available to the node responsible for evaluating the original request, so if you're using this technique in a clustered environment, the results may not be what is expected.

Also note that if you're planning on storing a large number of maps in server fields on nodes in the cluster, it's important to make sure the hosts are provisioned with enough memory to accommodate these maps on top of group-level caches and memory for query allocation, stands, range indexes, document retrieval, and the like.

      See: http://docs.marklogic.com/map:map

      And: http://docs.marklogic.com/xdmp:set-server-field
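
A hedged sketch of the caching pattern (the field name "my-lookup" and its contents are placeholders; remember the field only lives on the host that evaluates the request):

xquery version "1.0-ml";
(: Return the cached map if present; otherwise build it and store it. :)
let $existing := xdmp:get-server-field("my-lookup")
return
  if (fn:exists($existing)) then $existing
  else
    let $m := map:map()
    let $_ := map:put($m, "key1", "value1")
    return xdmp:set-server-field("my-lookup", $m)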

      3. xdmp:set only allows you to set a value for the life of a single query but this technique can be useful in some circumstances - especially in situations where you're interested in keeping track of certain values throughout the processing of a module or a function within a module.

      See: http://docs.marklogic.com/xdmp:set

      4. If you have a situation where you have a large number of complex queries - particularly ones where lexicon lookups or calls to range indexes won't resolve the data you need and where lots of documents will need to be retrieved from disk, you should consider using registered queries.

      See: http://docs.marklogic.com/cts:registered-query

      Note that registered queries utilise the List Cache so, if you plan to adopt this method, we recommend careful testing to ensure your caches are sized sufficiently to suit the needs of your application.
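
A minimal sketch of the pattern (the element name and value are placeholders); if the server discards the registered query, it simply needs to be registered again:

xquery version "1.0-ml";
(: Register once, then reuse the returned id in subsequent searches. :)
let $id := cts:register(
             cts:element-value-query(xs:QName("partition"), "partition1"))
return cts:search(fn:collection(), cts:registered-query($id, "unfiltered"))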

      Summary

      This article explains how to kill Long Running Query and related timeout configurations.

      Problem Scenario

      At some point, we've all run into an inefficient long running query. What should we do if we don't want to wait for the query to complete? If we cancel the browser request, that would end the connection, but it wouldn't end the program invocation (called a "request") on the MarkLogic Server side. On the server side, that program invocation would continue to run until the execution is complete.

      Most of the time, this isn't really an issue. The server, of course, is multi-threaded, handling many concurrent transactions. We can just cancel the browser request, move on, and let the query finish when it finishes. However, sometimes it becomes necessary to free up server resources by killing the query and starting over. To do this, we need access to the Admin interface. 

      Sample Long running Query 

      Example only, please don't try this on any production machines!

      for $x in 1 to 1000000
      return collection()[1 + xdmp:random(1000)]
       
      This query is asking for 1,000,000 random documents, and will take a long time to execute. How can we cancel this query?

      How to Cancel/Kill the Query

      Go to the Administrative interface (at http://localhost:8001/ if you're running MarkLogic locally). At the top of the screen, you'll see a tab labeled "Status." Click that:

      screenshot1.jpg

      This will take you to the "System Status" screen. This page reveals status information about hosts, databases, forests, and app servers. The App Server section is what we're concerned with. Scanning down the "Queries" column, we see that the "Admin" server is processing a query (namely, the one that generated the page we see). Everything looks okay so far. But just below that, we see that the "App-Services" server is just over 3 minutes into processing a query. That's our slow one. Query Console runs on the "App-Services" app server, which explains why we see it there. Go ahead and click the "App-Services" link:

      screenshot2.jpg

      This takes us to the "App-Services" status page. So far, there's still no "cancel" button. One more click will reveal it ("show more"):

      screenshot3.jpg

      We can now see an individual entry for the currently running query. Here we see it's called "eval.xqy"; that's the query module that Query Console invokes when you submit a query. If you were running your own query module (instead of using Query Console), then you would see its name here instead. To cancel the query, click the "[cancel]" link:

      screenshot4.jpg

      One more click (on the confirmation page).

      screenshot5.jpg

      This takes us back to the status page, where we see MarkLogic Server is in the process of canceling our query:

      screenshot6.jpg

The above page will continue to say "cancelling..." until it is refreshed, even though the query has already been killed and no longer exists.

      A quick refresh of the above page shows that the query is no longer present.

      screenshot7.jpg

       

      What happens if you forget to cancel a query?

      MarkLogic will continue to execute the query until a time limit is reached, at which point the Server will cancel the query for you. For example, here's what Query Console eventually returns back if we don't bother to cancel the query:

      screenshot8.jpg

      How long is this time limit?

      This depends on your server configuration. We can actually set the timeout in the query itself, using the xdmp:set-request-time-limit() function, but even that will be limited by your server's "max time limit."

      For example, on the "Configure" tab of my "App-Services" app server, you can see that the "default time limit" is set to 10 minutes (600 seconds), and the longest any query can allow itself to run (by setting its own request time limit) is one hour (3600 seconds):

      screenshot9.jpg
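
As a small illustration of the function mentioned above (the 1800-second value is arbitrary), a query can raise its own limit up to, but not beyond, the app server's max time limit:

xquery version "1.0-ml";
(: Allow this request up to 30 minutes, provided the App Server's
   "max time limit" permits it. :)
xdmp:set-request-time-limit(1800),
"...the long-running work goes here..."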

       

      Update and delete operations can be performance intensive and have negative effects on search performance when done in a conventional way, where data is updated or deleted in-place. To avoid these performance impacts during update and delete operations, MarkLogic Server updates and deletes "lazily."

      In MarkLogic Server, when you delete a document, it is not removed from disk immediately as that document's fragments are instead marked as "obsolete." Marking a document as obsolete tags its fragments for later removal, and also hides its fragments from subsequent query results. Updates happen in a similar way, where instead of updating in-place, MarkLogic Server marks the old versions of the fragments in an old stand as "obsolete" for later deletion, while also creating new versions of those fragments in a new stand (initially an in-memory stand, which is eventually written down as a new on-disk stand).

      Eventually, merges occur to move any unchanged fragments from an old stand into a new stand. Old fragments marked obsolete are ultimately deleted after the merge creating the new stand finishes, where the old stands that were used as input into that merge are finally removed from disk. Merging is very important - this is the mechanism by which MarkLogic Server both frees up disk space and optimizes its on-disk data structures, as well as reduces the number of fragments evaluated during its queries and searches.

      Note that for a merge-min-ratio of n, you can expect up to 1/(n+1) of a stand to be deleted fragments before the stand is automatically merged.  See Overview of the Merge Policy Controls.

      While lazy deletion results in faster updates and deletes, be aware that residual impacts can be seen in terms of both disk space and query performance if merges are not done in a timely manner.

      Further reading:

      Multi-Version Concurrency Control
      How do updates work in MarkLogic Server?
      ML Performance: Understanding System Resources

      Introduction

MarkLogic Server can be configured so that users are authenticated using an external authentication protocol, such as Lightweight Directory Access Protocol (LDAP) or Kerberos. These external agents serve as centralized points of authentication or repositories for user information from which authorization decisions can be made. If, after following the configuration instructions in our documentation, the authentication does not work as expected, this article gives some additional debugging ideas.

      Details

The following areas should be checked when your LDAP authentication is not working as expected:

1. Verify that the cyrus-sasl-md5 library is installed on the MarkLogic Server node.

2. Run the following LDAP search command to check whether the LDAP server is properly set up.

ldapsearch -H ldap://{Your LDAP Server URI}:389 -x -s base

      a. Once you run the ldap search command, make sure digest-md5 is supported. 

      supportedSASLMechanisms: DIGEST-MD5

      b. Identify the correct LDAP Service name:

e.g. ldapServiceName: MLTEST1.LOCAL:dc1$@MLTEST1.LOCAL


      3. On Windows platforms, the services.keytab file is created using Active Directory Domain Services (AD DS) on a Windows server. If you are using Active Directory Domain Services (AD DS) on a computer that is running Windows Server 2008 or Windows Server 2008 R2, be sure that you have installed the hot fix described in http://support.microsoft.com/kb/975697.

      Introduction: the issue

MarkLogic performs nested lookups on the LDAP groups assigned to a user to determine which roles the user will be assigned. If the groups belong to multiple Active Directory domains within a federated Active Directory forest, then MarkLogic user authorization could fail with a subordinate Referral error, as seen below:

      2019-07-30 13:27:23.002 Notice: XDMP-LDAP: ldap_search_s failed on ldap server ldap://ad1.myhost.com:389: Referral (10)

      Cause

MarkLogic has been configured to connect to the Local Domain Controller LDAP ports 389 (LDAP) or 636 (LDAPS); however, a Local Domain Controller can only search domains to which it has access.

      Example

      A user is a member of the following groups which belong to two separate Active Directory domains, subA, and subC.

Using a Local Domain Controller for subA for external authorization would result in a login failure when attempting to perform the nested group lookup for the domain subC:

      member=CN=Group Onw,OU=OrgUnitAGroups,OU=OrgUnitA,DC=subA,DC=domain
      member=CN=Group Two,OU=OrgUnitAGroups,OU=OrgUnitA,DC=subA,DC=domain
      member=CN=Group Three,OU=OrgUnitCGroups,OU=OrgUnitC,DC=subC,DC=domain

      Solution

If you have multiple Active Directory Domains federated into an Active Directory forest, you should use the Global Catalog port 3268 (LDAP) or 3269 (LDAPS) to prevent failures when searching for group memberships that are defined in other domains.

      Optional workaround

A large number of nested groups can potentially lead to a decrease in login performance. If you do not need to rely on nested lookups to determine group membership for MarkLogic roles (i.e. all groups required are returned from the initial user search request), then you should consider setting the "ldap nested lookup" parameter to false in the External Security configuration.

      Doing this would also prevent subordinate domain searches and allow you to continue to use an Active Directory Domain Controller instead of switching to the Global Catalog.

      Further reading

      Summary

      A leap second, as defined by wikipedia is "a one-second adjustment that is occasionally applied to Coordinated Universal Time (UTC) in order to keep its time of day close to the mean solar time. Without such a correction, time reckoned by Earth's rotation drifts away from atomic time because of irregularities in the Earth's rate of rotation."  At the time of this writing, the next leap second to be inserted is on June 30, 2015 at 23:59:60 UTC.

For systems that use the Network Time Protocol (NTP) to synchronize the network time across all the hosts in their MarkLogic cluster, the MarkLogic Server software is not impacted by the leap second (i.e. we expect everything to work fine at the MarkLogic layer).

For systems where synchronization of the system clocks requires UTC time to be set backwards, anywhere time-dependent data is stored must be accounted for. In this case, we recommend that our customers implement NTP in their environment; otherwise, the application layer will need to handle discontinuous time.

      Transactional Consistency

      The algorithm that MarkLogic Server uses to maintain transactional consistency of data is not wall clock dependent and, as such, is not affected by the leap second.

      Network Time Protocol (NTP)

NTP works very hard not to make time go backwards: clock readings are constrained to always increase, so every reading advances the NTP clock. NTP adjusts things gradually by slowing down or speeding up the clock, and does not make discrete changes unless time is off by a lot. A second is not a lot; an hour is a lot. Regardless of the leap second, adjustments for computer clock drift can easily be more than a second and happen frequently.

      When Time Goes Backwards

      Without NTP and left on their own, computer clocks are really not that accurate. If synchronization of the system clocks on the hosts of a MarkLogic cluster require the clocks to be set backwards, then the application layer will need to account for and handle discontinuous date-time in their data. 

      Beginning with MarkLogic Server version 8,  the temporal feature was introduced.  If the system clock is adjusted backwards, there are conditions where temporal document inserts and updates will fail with an appropriate error code.  This is by design and expected.

      Our recommendation is to implement NTP on all hosts of a MarkLogic cluster to eliminate the need to handle discontinuous time at the application layer. 

      Further Reading

Redhat article on the Leap Second - https://access.redhat.com/articles/15145

Microsoft Support article on the Leap Second - http://support.microsoft.com/kb/909614

       

      Summary

MarkLogic Server implements security internally through query constraints. Lexicon search performance may be impacted by these security query constraints. If performed with admin credentials, lexicon searches will not be impacted by the security query constraints.

      Detail

      Query time grows proportionately with the number of matches from a given search across a set of documents (not the actual number of documents in your database). The presence of security constraints will contribute a significantly larger number of matches than if the same lexicon search was performed with admin credentials.  In order to minimize the number of matches (and therefore query time) for a given lexicon search, you'll want to amp your lexicon searches to an admin user.

      Summary

If you are not able to upgrade straight away, as recommended in the Optic security advisory, the following steps can be followed to disable the Optic query functionality.

      Note: This will disable the ability to run all Optic and SPARQL queries so this can only be done if applications do not rely on those features.

      Solution

      The Optic and SPARQL query engines can be disabled via a script, or via the administration user interface.

      In both cases the sem:sparql privilege will be removed from all the relevant roles.

      Scripted privilege removal

Run the script listed below to remove the sem:sparql privilege from all roles. The script removes the sem:sparql privilege from the four out-of-the-box roles, then prompts the user to remove the privilege from the custom roles, if any are found. Please make sure to take good note of the affected roles if you intend to re-enable the privilege after upgrading your deployment.

      xquery version "1.0-ml";
      import module namespace sec="http://marklogic.com/xdmp/security" at 
          "/MarkLogic/security.xqy";
      let $ootb-sem-sparql-roles:=  ("optic-reader-internal",
                                    "qconsole-internal",
                                    "rest-writer-internal",
                                    "rest-reader-internal")
      let $remove-privilege:="http://marklogic.com/xdmp/privileges/sem-sparql"
      return xdmp:invoke-function(function() {
          let $sem-sparql-priv:=sec:get-privilege($remove-privilege,"execute")  
          let $_ := if (fn:count($sem-sparql-priv) eq 0 ) then fn:error(xs:QName("PRIV-NOT-FOUND"),"sem-sparql privilege not found. Contact MarkLogic Support.") else ()
          let $_ := if (fn:count($sem-sparql-priv) gt 1 ) then fn:error(xs:QName("MULTIPLE-PRIVS-FOUND"),"Multiple sem-sparql privileges found. Contact MarkLogic Support.") else ()
          let $role-ids-having-sem-sparql:=$sem-sparql-priv/sec:role-ids/sec:role-id/xs:unsignedLong(.)
          let $role-names:=sec:get-role-names($role-ids-having-sem-sparql)/xs:string(.)
          let $ootb-roles-having-sem-sparql:=$role-names[. = $ootb-sem-sparql-roles]
          let $custom-roles-having-sem-sparql:=$role-names[fn:not(. = $ootb-sem-sparql-roles)]
          let $_ := if (fn:count($ootb-roles-having-sem-sparql) gt 0) then
                       xdmp:invoke-function(function() {
                         sec:privilege-remove-roles($remove-privilege,"execute",$ootb-roles-having-sem-sparql)
                        },map:map()=>map:with('update',"true"))
                    else ()
          return  if (fn:count($role-names) eq 0) then
                       "No roles have the sem:sparql privilege." 
                     else
                       ("Removed sem:sparql from the following MarkLogic Server out-of-the-box roles:",
                        if (fn:count($ootb-roles-having-sem-sparql) eq 0) then "No OOTB roles have sem:sparql" else $ootb-roles-having-sem-sparql,
                        "The following non OOTB roles have sem:sparql which should be removed manually:",
                        if (fn:count($custom-roles-having-sem-sparql) eq 0) then "No custom roles present having sem:sparql" else $custom-roles-having-sem-sparql)
      },map:map()=>map:with("database",xdmp:security-database())
       =>map:with("update","false"))

      Manual privilege removal

Alternatively, the sem:sparql privilege can be removed manually via the Admin UI. From the side menu, select Security > Execute Privileges. Scroll to the sem:sparql privilege, click on it, uncheck any roles that are selected, and click "OK". Please make sure to take good note of the affected roles if you intend to re-enable the privilege after upgrading your deployment.

      For MarkLogic Server v6.0, the absolute maximum number of MarkLogic Servers in a Cluster is 256, but the optimum is around 64.

      Summary

      MarkLogic recommends the default "ordered" option for Linux ext3 and ext4 file-systems.

File system administrators on Linux are tempted to use the data=writeback option to achieve higher throughput from their file system, but this comes with the side effects of potential data corruption and a data-security breach. This article explains both file system options with respect to MarkLogic Server.

      "data=ordered"

The Linux ext3 and ext4 file systems have a default data option of "ordered", which writes data to the main file system before committing metadata to the journal.

      https://www.kernel.org/doc/Documentation/filesystems/ext4.txt

      https://www.kernel.org/doc/Documentation/filesystems/ext3.txt

With the default data=ordered, both of these file systems go the extra mile to protect your files by writing the data associated with the metadata before the metadata is committed, thus assuring file-system integrity to the application layer - essential for MarkLogic Server data integrity.

      "data=writeback"

Other journaled file systems like XFS and JFS journal only metadata to the disk; to make ext3 and ext4 behave like XFS and other journaling file systems, an administrator could set 'data=writeback' in their mount options.

The 'data=writeback' mode does not preserve data ordering when writing to the disk, so commits to the journal may happen before the data is written to the file system. This method is faster because only the metadata is journaled, but it is not good at protecting data integrity in the face of a system failure.

If there is a crash between the time when metadata is committed to the journal and when data is written to disk, the post-recovery metadata can point to incomplete, partially written, or incorrect data on disk, which can lead to corrupt data files. Additionally, data that was supposed to be overwritten in the filesystem could be exposed to users - resulting in a security risk.

      Linus Torvalds comments on 'data=writeback'

      "it makes things much smoother, since now the actual data is no longer in the critical path for any journal writes, but anybody who thinks that's a solution is just incompetent.  We might as well go back to ext2 then. If your data gets written out long after the metadata hit the disk, you are going to hit all kinds of bad issues if the machine ever goes down."   - http://thread.gmane.org/gmane.linux.kernel/811167/focus=811654

       

      Introduction

      Here we discuss management of temporal documents.

      Details

      In MarkLogic, a temporal document is managed as a series of versioned documents in a protected collection. The ‘original’ document inserted into the database is kept and never changes. Updates to the document are inserted as new documents with different valid and system times. A delete of the document is also inserted as a new document.

      In this way, a temporal document always retains knowledge of when the information was known in the real world and when it was recorded in the database.

APIs

      By default the normal xdmp:* document functions (e.g., xdmp:document-insert) are not permitted on temporal documents.

      The temporal module (temporal:* functions; see Temporal API) contains the functions used to insert, delete, and manage temporal documents.

      All temporal updates and deletes create new documents and in normal operations this is exactly what will be desired.

      See also the documentation: Managing Temporal Documents.
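As a minimal sketch of the temporal API (assuming a temporal collection named "temporal-collection" has already been created with valid and system axes configured, and that the valid axis is based on the validStart/validEnd elements shown below - all hypothetical names):

xquery version "1.0-ml";
import module namespace temporal = "http://marklogic.com/xdmp/temporal"
    at "/MarkLogic/temporal.xqy";

(: insert (or update) a temporal document; a new versioned document is created each time :)
temporal:document-insert(
  "temporal-collection",
  "/orders/order-1.xml",
  <order>
    <id>1</id>
    <validStart>2017-01-01T00:00:00Z</validStart>
    <validEnd>9999-12-31T23:59:59Z</validEnd>
  </order>)

Running the same call again with changed content inserts a new version rather than overwriting the original document.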

      Updates and deletes outside the temporal functions

      Note: normal use of the temporal feature will not require this sort of operation.

      The function temporal:collection-set-options can be used with the updates-admin-override option to specify that users with the admin role can change or delete temporal documents using non-temporal functions, such as xdmp:document-insert and xdmp:document-delete.

For example, you may need to run a CoRB job or other administrative transform without updating the system dates on the documents; say, you want to change the values M/F to Male/Female.
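A minimal sketch of enabling this override (the temporal collection name "temporal-collection" is hypothetical):

xquery version "1.0-ml";
import module namespace temporal = "http://marklogic.com/xdmp/temporal"
    at "/MarkLogic/temporal.xqy";

(: allow users with the admin role to change or delete temporal documents
   using non-temporal functions such as xdmp:document-insert :)
temporal:collection-set-options("temporal-collection", ("updates-admin-override"))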

       

      Introduction

This article outlines different manual procedures for failing back after a failover event.

      What is failover?

Failover in MarkLogic Server provides high availability for data nodes in the event of a d-node or forest-level failure. With failover enabled and configured, a host can go offline or become unresponsive, and the MarkLogic Server cluster automatically and gracefully recovers from the outage, continuing to process queries without any immediate action needed by an administrator.

      MarkLogic offers support for two varieties of failover at the forest level, both of which provide a high-availability solution for data nodes.

      • Local-disk failover: Allows you to specify a forest on another host to serve as a replica forest which will take over in the event of the forest's host going offline. Multiple copies of the forest are kept on different nodes/filesystems in local-disk failover
      • Shared-disk failover: Allows you to specify alternate nodes within a cluster to host forests in the event of a forest's primary host going offline. A single copy of the forest is kept in shared-disk failover

      More information can be found at:

      How does failover work?

      The mechanism for how MarkLogic Server automatically fails over is described in our documentation at: How Failover Works

      When does failover occur?

Scenarios that trigger a forest to fail over are discussed in detail at:

      High level overview of failing back after a failover event

      If failover is configured, other hosts in the cluster automatically assume control of the forests (or replicas of the forests) of the failed host. However, when the failed host comes back up, the transfer of control back to their original host does not happen automatically. Manual intervention is required to failback. If you have a failed over forest and want to fail back, you'll need to:

      • Restart either the forest or the current host of that forest, if using shared-disk failover
      • Restart the acting data forest or restart the host of that forest, if using local-disk failover. You should only do this if the original primary forest is in the sync replicating state, which indicates that it is up-to-date and ready to take over. Updates written to an acting primary forest must be synchronized to acting replicas, else those updates will be lost after failing back. After restarting the acting data forest, the intended primary data forest will automatically open on the intended primary host.

      Make sure the primary host is safely back online before attempting to fail back the forest.

      You can read more about this procedure at: Reverting a Failed Over Forest Back to the Primary Host
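As a minimal sketch for the local-disk failover case (the forest names "content-f1" for the intended primary and "content-f1-R" for the acting primary replica are hypothetical), you can check the intended primary's state and then restart the acting forest:

xquery version "1.0-ml";

(: "content-f1" and "content-f1-R" are hypothetical forest names :)
let $state := xdmp:forest-status(xdmp:forest("content-f1"))//*:state/fn:string()
return
  if ($state eq "sync replicating") then
    (: safe to fail back - restart the acting (replica) forest :)
    xdmp:forest-restart(xdmp:forest("content-f1-R"))
  else
    fn:concat("Intended primary is in state '", $state, "' - not ready to fail back")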

Local disk failover procedure for attaching replicas directly to the database and clearing the intended primary forests' error states

      If your primary data forests are in an error state, you'll need to clear those errors before failing back. This will usually require unmounting the primary forest copy, then directly mounting the local disk failover forest copy (or "LDF") to the relevant database. That procedure looks like:

      1. Make sure to turn OFF the rebalancer/reindexer at the database level - you don't want to unintentionally move data across forests when manually altering your database's forest topology.
2. Break forest-level replication between forests (i.e. between the intended LDF replica (aka "acting primary") and the intended primary forest currently in an error state); steps 2 through 4 can also be scripted, as shown in the sketch after this list
      3. Detach the intended primary forest from database
      4. Attach the intended LDF replica (aka acting primary) forest directly to the database
      5. Make sure the database is online
      6. Delete the intended primary forest in error state
      7. Create a new forest with the same name as the now deleted intended primary forest
      8. Re-establish forest-level replication between the intended LDF replica (aka acting primary) forest and the newly created intended primary forest
      9. Let bulk replication repopulate the intended primary forest
      10. After bulk replication is finished, fail back as described above, so the intended primary forest is once again the acting primary forest, and the intended LDF replica is once again the acting LDF replica forest
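Steps 2 through 4 can be performed in the Admin UI or scripted via the Admin API. Below is a minimal, illustrative sketch only (the database name "Documents" and the forest names "content-f1"/"content-f1-R" are hypothetical):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config  := admin:get-configuration()
let $db      := xdmp:database("Documents")
let $primary := xdmp:forest("content-f1")      (: intended primary, in error state :)
let $replica := xdmp:forest("content-f1-R")    (: intended LDF replica, acting primary :)
(: step 2 - break forest-level replication :)
let $config  := admin:forest-remove-replica($config, $primary, $replica)
(: step 3 - detach the intended primary forest from the database :)
let $config  := admin:database-detach-forest($config, $db, $primary)
(: step 4 - attach the acting primary (LDF replica) forest directly to the database :)
let $config  := admin:database-attach-forest($config, $db, $replica)
return admin:save-configuration($config)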

      What is the procedure for failing forests back to the primary host in cases where the replicas are directly attached to the database?

      If intended LDF replicas are instead directly attached to the relevant database, forest or host restarts will not fail back correctly. Instead, you must rename the relevant forests:

      1. Forests that are currently attached to the database can be renamed - from their LDF replica naming scheme, to the desired primary forest naming scheme.
      2. Conversely, unattached primary forests can be renamed as LDF replicas, then configured as LDF replicas for the relevant database
      3. At this point, the server should detect that the current primary (which was previously the LDF replica) will have more recent data than the current LDF replica (which was previously the primary), which should then cause the server to populate the current LDF replica from the current primary

      What should be done in case of a disk failure?

      In the unlikely event a logical volume is lost, you'll want to restore from a copy of your data. That copy can take the form of:

      1. Local disk failover (LDF) replicas within the same cluster (assuming those copies are fully synchronized)
      2. Database Replication copies in your replication cluster (again, assuming those copies are fully synchronized)
      3. Backups, which might be missing updates made since the backup was taken

      You can restore from backups if you can afford to lose updates subsequent to that backup's timestamp and/or can re-apply whatever updates happened after the backup was taken.

If you would instead prefer not to lose updates, then use LDF replicas to sync back to replacement primary forests created on new volumes, failing back manually when done. In the event that data was moved across forests in some way after the backup was taken, it would be best to use LDF replicas instead, which avoids the possibility of database corruption in the form of duplicate URIs.

      Database Replication will allow you to maintain copies of forests on databases in multiple MarkLogic Server clusters. Once the replica database in the replica cluster is fully synchronized with its primary database, you may break replication between the two and then go on to use the replica cluster/database as the primary. Note: To enable Database Replication, a license key that includes Database Replication is required. You'll also need to ensure that all hosts are:

      1. Running the same maintenance release of MarkLogic Server
      2. Using the same Operating System
3. Correctly configured for Database Replication

      Takeaways

      • It's possible to have multiple copies of your data in a MarkLogic Server deployment
      • Under normal operations, these copies are synchronized with one another
      • Should failover events occur in a cluster, or catastrophic events occur to an entire cluster, you can shift traffic to the available previously synchronized copies
      • Failing back is a manual operation
        • Make sure to re-synchronize copies that were offline with online copies
        • Shifting previously offline copies to acting primary before re-synchronization may result in data loss, as offline forests can overwrite updates previously committed to LDF forests serving as acting primaries while the intended primary forests were offline

      Related materials:

      Introduction

When CPF is installed on a database, a number of new documents are created in the Triggers database associated with that database.

      This Knowledgebase article is designed to show you what CPF creates on install, in the event that you want to safely disable and remove it from your system.

      Getting started

      Below is a layout of all databases and their associated document counts with a clean install of MarkLogic 9.0-2:

Database ID          Database Name          Document Count
      8723423541597683063 App-Services 14
      12316032390759111212 Modules 0
      1695527226691932315 Fab 0
      11723073009075196192 Security 1526
      15818912922008798974 Triggers 0
      5212638700134402198 Documents 0
      4320540002505594119 Extensions 0
      9023394855382775954 Last-Login 0
      11598847197347642387 Schemas 0
      12603105430027950215 Meters 48

      Adding CPF

      After installing CPF on the Documents database (with conversion enabled), we now see:

Database ID          Database Name          Document Count
      8723423541597683063 App-Services 15
      12316032390759111212 Modules 0
      1695527226691932315 Fab 0
      11723073009075196192 Security 1526
      15818912922008798974 Triggers 39
      5212638700134402198 Documents 0
      4320540002505594119 Extensions 0
      9023394855382775954 Last-Login 0
      11598847197347642387 Schemas 0
      12603105430027950215 Meters 498

If we ignore Meters and App-Services, we can see that by default, a CPF install will create a number of documents in the Triggers database:

      /cpf/domains.css
      /cpf/pipelines.css
      http://marklogic.com/cpf/configuration/configuration.xml
      http://marklogic.com/cpf/domains/4361761515557042908.xml
      http://marklogic.com/cpf/pipelines/10451885084298751684.xml
      http://marklogic.com/cpf/pipelines/11486027894562997537.xml
      http://marklogic.com/cpf/pipelines/1182872541253698578.xml
      http://marklogic.com/cpf/pipelines/11925472395644624519.xml
      http://marklogic.com/cpf/pipelines/12665626287133680551.xml
      http://marklogic.com/cpf/pipelines/12977232154552215987.xml
      http://marklogic.com/cpf/pipelines/13371411038103584886.xml
      http://marklogic.com/cpf/pipelines/13468360248543629252.xml
      http://marklogic.com/cpf/pipelines/13721894103731640519.xml
      http://marklogic.com/cpf/pipelines/14473927355946353823.xml
      http://marklogic.com/cpf/pipelines/16071401642383641119.xml
      http://marklogic.com/cpf/pipelines/17008133204004114953.xml
      http://marklogic.com/cpf/pipelines/1707825679528566193.xml
      http://marklogic.com/cpf/pipelines/17486255598951175231.xml
      http://marklogic.com/cpf/pipelines/1789191734187967847.xml
      http://marklogic.com/cpf/pipelines/2145494300111008849.xml
      http://marklogic.com/cpf/pipelines/2272288885870389220.xml
      http://marklogic.com/cpf/pipelines/2585221667797881502.xml
      http://marklogic.com/cpf/pipelines/4684095308382280821.xml
      http://marklogic.com/cpf/pipelines/6055693256331806191.xml
      http://marklogic.com/cpf/pipelines/7250675434061295808.xml
      http://marklogic.com/cpf/pipelines/7354167915842037706.xml
      http://marklogic.com/cpf/pipelines/7492839190910743342.xml
      http://marklogic.com/cpf/pipelines/8329675320036351600.xml
      http://marklogic.com/cpf/pipelines/8537493622930387355.xml
      http://marklogic.com/cpf/pipelines/8877791654658876902.xml
      http://marklogic.com/cpf/pipelines/8988716724908642408.xml
      http://marklogic.com/cpf/pipelines/9432621469736814202.xml
      http://marklogic.com/xdmp/triggers/10905847201437369653
      http://marklogic.com/xdmp/triggers/11663386212502595308
      http://marklogic.com/xdmp/triggers/12471659507809075185
      http://marklogic.com/xdmp/triggers/15932603084768890631
      http://marklogic.com/xdmp/triggers/16817738273312375366
      http://marklogic.com/xdmp/triggers/17731123999892629453
      http://marklogic.com/xdmp/triggers/6779751200800194600
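A quick way to reproduce a listing like the one above is to run a simple query in Query Console with the Triggers database selected (a minimal sketch):

xquery version "1.0-ml";
(: list every document URI in the current (Triggers) database :)
for $doc in fn:collection()
order by xdmp:node-uri($doc)
return xdmp:node-uri($doc)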

      Files created by CPF

      http://marklogic.com/cpf/configuration

      One of these files is the CPF configuration.xml file

      http://marklogic.com/cpf/domains

      One of these documents describes the default domain which is created when CPF is installed:

      Default Documents
      http://marklogic.com/cpf/pipelines

      Of the 39 files created, we can see from the URI listing above that the majority (28) of these are prefaced with http://marklogic.com/cpf/pipelines. These files describe each of the standard conversion pipelines that ship with the server. These are:

      Alerting
      Alerting (spawn)
      Calais Entity Enrichment Sample
      Conversion Processing
      Conversion Processing (Basic)
      Data Harmony Enrichment Sample
      DocBook Conversion
      Document Filtering (Properties)
      Document Filtering (XHTML)
      Entity Enrichment
      Flexible Replication
      HTML Conversion
      Janya Entity Enrichment Sample
      MS Office Conversion
      Office OpenXML Extract
      PDF Conversion
      PDF Conversion (Image Batching)
      PDF Conversion (Page Layout with Reblocking)
      PDF Conversion (Page Layout, Image Batching)
      PDF Conversion (Page Layout)
      PDF Conversion (Paged Text, No Rendering)
      Schema Validation
      SRA NetOwl Entity Enrichment Sample
      Status Change Handling
      Temis Entity Enrichment Sample
      WordprocessingML Process
      XHTML Conversion Processing
      XInclude Processing
      http://marklogic.com/xdmp/triggers

      Seven of the files are triggers - all of which are namespaced with the cpf prefix:

      cpf:any-property Default Documents
      cpf:create Default Documents
      cpf:delete Default Documents
      cpf:restart
      cpf:state Default Documents
      cpf:status Default Documents
      cpf:update Default Documents

      Removing the core files created when CPF was initially installed will disable it from further functioning in your environment.

      Scripting the removal of default CPF components

This GitHub gist demonstrates a method for removing CPF configuration from a given database - in the example below, the "Triggers" database is specified:

      Introduction

      If you have an existing MarkLogic Server cluster running on EC2, there may be circumstances where you need to upgrade the existing AMI with the latest MarkLogic rpm available. You can also add a custom OS configuration.

      This article assumes that you have started your cluster using the CloudFormation templates with Managed Cluster feature provided by MarkLogic.

      Procedure
      To upgrade manually the MarkLogic AMI, follow these steps:

      1. Launch a new small MarkLogic instance from the AWS MarketPlace, based on the latest available image. For example, t2.small based on MarkLogic Developer 9 (BYOL). The instance should be launched only with the root OS EBS volume.
Note: If you are planning to leverage the PAYG (Pay As You Go) model, you must choose MarkLogic Essential Enterprise.
      a. Launch a MarkLogic instance from AWS MarketPlace, click Select and then click Continue:

      b. Choose instance type. For example, one of the smallest available, t2.small
      c. Configure instance details. For example, default VPC with a public IP for easy access
      d. Remove the second EBS data volume (/dev/sdf)
      e. Optional - Add Tags
      f. Configure Security Group - only SSH access is needed for the upgrade procedure
      g. Review and Launch
      Review step - AWS view:

      2. SSH into your new instance and switch the user to root in order to execute the commands in the following steps.

      $ sudo su -

      Note: As an option, you can also use "sudo ..." for each individual command.

      3. Stop MarkLogic and uninstall MarkLogic rpm:

      $ service MarkLogic stop
      $ rpm -e MarkLogic

      4. Update-patch the OS:

      $ yum -y update

      Note: If needed, restart the instance (For example: after a kernel upgrade/core-libraries).
      Note: If you would like to add more custom options/configuration/..., they should be done between steps 4 and 5.

      5. Install the new MarkLogic rpm
a. Upload the MarkLogic rpm to the instance (for example, via "scp" or S3)
      b. Install the rpm:

      $ yum install [<path_to_MarkLogic_RPM>]/[MarkLogic_RPM]

      Note: Do not start MarkLogic at any point of AMI's preparation.

      6. Double check to be sure that the following files and log traces do not exist. If they do, they must be deleted.

      $ rm -f /var/local/mlcmd.conf
      $ rm -f /var/tmp/mlcmd.trace
      $ rm -f /tmp/marklogic.host

      7. Remove artifacts
      Note: Performing the following actions will remove the ability to ssh back into the baseline image. New credentials are applied to the AMI when launched as an instance. If you need to add/change something, mount the root drive to another instance to make changes.

      $ rm -f /root/.ssh/authorized_keys
$ rm -f /home/ec2-user/.ssh/authorized_keys
      $ rm -f /home/ec2-user/.bash_history
      $ rm -rf /var/spool/mail/*
      $ rm -rf /tmp/userdata*
      $ rm -f [<path_to_MarkLogic_RPM>]/[MarkLogic_RPM]
      $ rm -f /root/.bash_history
      $ rm -rf /var/log/*
      $ sync

      8. Optional - Create an AMI from the stopped instance.[1] The AMI can be created at the end of step 7.

      $ init 0

      [1] For more information: https://docs.aws.amazon.com/toolkit-for-visual-studio/latest/user-guide/tkv-create-ami-from-instance.html

      At this point, your custom AMI should be ready and it can be used for your deployments. If you are using multiple AWS regions, you will have to copy the AMI as needed.
      Note: If you'd like to add more custom options/configuration/..., they should be done between steps 4 and 5.

      Additional references:
      [2] Upgrading the MarkLogic AMI - https://docs.marklogic.com/8.0/guide/ec2/managing#id_69624

      Summary

      Starting in MarkLogic Server version 10.0-7, XQuery FLWOR expressions that only use "let" will now stream results. Prior to 10.0-7, MarkLogic Server would have buffered results in memory. This change allows large result sets to be more easily streamed from XQuery modules.

      Impact

Due to this change, code that relies on the previous behavior of buffered results from FLWOR expressions with only a "let" may experience degraded performance if the results are iterated over multiple times. This is because once a streaming result has been exhausted, the query has to be rerun in order to iterate over it again.

      Best Practice

Regardless of this change, the best practices are:

• Treat all query calls as lazily-evaluated expressions and only iterate over them once.
• If the results need to be iterated over multiple times, wrap the search expression in xdmp:eager(), or iterate over the results once and assign the results to a new variable.

      Examples

      In 10.0-7 and prior versions, the following expression would be lazily-evaluated and run the search multiple times if the $results variable is iterated over multiple times.

      let $_ := xdmp:log("running search")
      let $results := cts:search(fn:collection(), cts:word-query("MarkLogic"))

This behavior has not changed in 10.0-7. However, prior to 10.0-7, the following expression would short-circuit the lazy evaluation and buffer all of the results in memory:

      let $results :=
          let $_ := xdmp:log("running search")
          return cts:search(fn:collection(), cts:word-query("MarkLogic"))

      In 10.0-7, this is now consistent with the other form of the expression above and returns an iterator. The search will be run multiple times if the $results variable is iterated over multiple times.

      To achieve the buffering behavior in 10.0-7 or later releases, you can wrap cts:search() inside xdmp:eager() as follows

      let $results :=
          let $_ := xdmp:log("running search")
          return xdmp:eager(cts:search(fn:collection(), cts:word-query("MarkLogic")))

      Diagnostics

The xdmp:streamable function was added in MarkLogic 10.0-7 to help determine whether a variable will stream or not.
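For example (a minimal sketch, reusing the search from the examples above):

xquery version "1.0-ml";
let $results := cts:search(fn:collection(), cts:word-query("MarkLogic"))
return xdmp:streamable($results)
(: expected to return true when $results will be streamed rather than buffered :)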

      Additional References

      For more information about lazy evaluation in MarkLogic, see the following resources

      Updates

      Tuesday, February 1, 2022 : Released Pega Connector 1.0.1 which uses MLCP 10.0-8.2 with forced dependencies to log4j 2.17.1.

Tuesday, January 25, 2022 : MarkLogic Server version 10.0-8.3 (CentOS 7.8 and 8) is now available on the Azure marketplace. 

      Monday, January 17, 2022 : MarkLogic Server 10.0-8.3 is now available on AWS marketplace;

      Monday, January 10, 2022 : MarkLogic Server 10.0-8.3 released with Log4j 2.17.1. (ref: CVE-2021-44832 ).

      Friday, January 7, 2022 : Fixed incorrect reference to log4j version included with MarkLogic 10.0-8.2 & 9.0-13.7.

      Wednesday, January 05, 2022 : Updated workaround to reference Log4j 2.17.1. (ref: CVE-2021-44832 ).

Tuesday, December 28, 2021 : Added an explicit note that for MarkLogic Server installations not on AWS, it is safe to remove the log4j files in the mlcmd/lib directory. 

      Saturday, December 25, 2021: MLCP update to resolve CVE-2019-17571 is now available for download;

      Friday, December 24, 2021: AWS & Azure Marketplace update;

      Wednesday, December 22, 2021: additional detail regarding SumoCollector files; AWS & Azure Marketplace update; & MLCP note regarding CVE-2019-17571.

      Monday, December 20, 2021: Updated workaround to reference Log4j 2.17.0.  (ref: CVE-2021-45105 )

      Friday, December 17, 2021: Updated for the availability of MarkLogic Server versions 10.0-8.2 and 9.0-13.7;

      Wednesday, December 15, 2021: Updated to include SumoLogic Controller reference for MarkLogic 10.0-6 through 10.0-7.3 on AWS;

      Tuesday, December 14, 2021: This article had been updated to account for the new guidance and remediation steps in CVE-2021-45046;

      "It was found that the fix to address CVE-2021-44228 in Apache Log4j 2.15.0 was incomplete in certain non-default configurations. This could allows attackers with control over Thread Context Map (MDC) input data when the logging configuration uses a non-default Pattern Layout with either a Context Lookup or a Thread Context Map pattern to craft malicious input data using a JNDI Lookup pattern resulting in a denial of service (DOS) attack. ..."

      Monday, December 13, 2021: Original article published.

      Subject

      Important MarkLogic Security update on Log4j Remote Code Execution Vulnerability (CVE-2021-44228)

      Summary

      A flaw in Log4j, a Java library for logging error messages in applications, is the most high-profile security vulnerability on the internet right now and comes with a severity score of 10 out of 10. At MarkLogic, we take security very seriously and have been proactive in responding to all kinds of security threats. Recently a serious security vulnerability in the Java-based logging package Log4j  was discovered. Log4j is developed by the Apache Foundation and is widely used by both enterprise apps and cloud services. The bug, now tracked as CVE-2021-44228 and dubbed Log4Shell or LogJam, is an unauthenticated RCE ( Remote Code Execution ) vulnerability allowing complete system takeover on systems with Log4j 2.0-beta9 up to 2.14.1. 

      As part of mitigation measures, Apache originally released Log4j 2.15.0 to address the maximum severity CVE-2021-44228 RCE vulnerability.  However, that solution was found to be incomplete (CVE-2021-45046) and Apache has since released Log4j 2.16.0. This vulnerability can be mitigated in prior releases (<2.16.0) by removing the JndiLookup class from the classpath.  Components/Products using the log4j library are advised to upgrade to the latest release ASAP seeing that attackers are already searching for exploitable targets.

      MarkLogic Server

      MarkLogic Server version 10.0-8.3 now includes Log4j 2.17.1. (ref: CVE-2021-44832 ).

MarkLogic Server versions 10.0-8.2 & 9.0-13.7 include log4j 2.16.0, replacing all previously included log4j modules affected by this vulnerability. 

      MarkLogic Server versions 10.0-8.3 & 9.0-13.7 are available for download from our developer site at https://developer.marklogic.com/products/marklogic-server

      MarkLogic Server versions 10.0-8.3 & 9.0-13.7 are  available on the AWS Marketplace.  

      MarkLogic Server versions 10.0-8.3 (CentOS 7.8 and 8) & 9.0-13.7 (CentOS 8) VMs are available in the Azure marketplace. 

      MarkLogic Server does not use log4j2 within the core server product. 

However, CVE-2021-44228 has been determined to impact the Managed Cluster System (MLCMD) in AWS.

      Note: log4j is included in the MarkLogic Server installation, but it is only used by MLCMD on AWS. For MarkLogic Server installations not on AWS, you can simply remove the log4j files in the mlcmd/lib directory (sudo rm /opt/MarkLogic/mlcmd/lib/log4j*).

      AWS Customers can use the following workaround to mitigate exposure to the CVE.

      Impacted versions

      The versions that are affected by the Log4Shell vulnerability are

      • 10.0-6.3 through 10.0-8.1 on AWS 
      • 9.0-13.4 through 9.0-13.6 on AWS 

      Earlier versions of MLCMD use a log4j version that is not affected by this vulnerability.

      How to check log4j version used by MarkLogic Managed Cluster System in AWS 

      1. Access the instance/VM via SSH.
2. Run the following command: ls /opt/MarkLogic/mlcmd/lib/ | grep "log4j"

If the log4j jar files returned are between version 2.0-beta9 and 2.14.1, then the system contains this vulnerability.

      An example response from a system containing the CVE:

      log4j-1.2-api-2.14.1.jar
      log4j-api-2.14.1.jar
      log4j-core-2.14.1.jar

      In the above case, the log4j dependencies are running version 2.14.1 which is affected.

      Workaround

      The following workaround can be executed on a running MarkLogic service, without stopping it.

      AWS

      1.  ssh into your EC2 instance, you must have sudo access in order to make the changes necessary for the fix.

      2.  Download and extract the Log4j 2.17.1 dependency from apache. 

      curl https://archive.apache.org/dist/logging/log4j/2.17.1/apache-log4j-2.17.1-bin.tar.gz --output log4j.tar.gz && tar -xf log4j.tar.gz

      • If your EC2 instance does not have outbound external internet access, download the dependency onto a machine that does, and then scp the file over to the relevant ec2 instance via a bastion host.

3. Move the relevant log4j dependencies to the /opt/MarkLogic/mlcmd/lib/ folder, i.e.:

      sudo mv ./apache-log4j-2.17.1-bin/log4j-core-2.17.1.jar /opt/MarkLogic/mlcmd/lib/
      sudo mv ./apache-log4j-2.17.1-bin/log4j-api-2.17.1.jar /opt/MarkLogic/mlcmd/lib/
      sudo mv ./apache-log4j-2.17.1-bin/log4j-1.2-api-2.17.1.jar /opt/MarkLogic/mlcmd/lib/

      4. Remove the old log4j dependencies

      sudo rm /opt/MarkLogic/mlcmd/lib/log4j-core-2.14.1.jar
      sudo rm /opt/MarkLogic/mlcmd/lib/log4j-1.2-api-2.14.1.jar
      sudo rm /opt/MarkLogic/mlcmd/lib/log4j-api-2.14.1.jar

      SumoLogic Collector

      AMIs for MarkLogic versions 10.0-6 through 10.0-7.3 were shipped with the SumoCollector libraries.  These libraries are not needed nor are they executed by MarkLogic Server. Starting with MarkLogic version 10.0-8, the SumoCollector libraries are no longer shipped with the MarkLogic AMIs.

It is safe to remove those libraries from all the instances that you have launched using any of the MarkLogic AMIs available in the Marketplace. You can remove the SumoCollector directory and all of its files under /opt.

Additionally, if you have created any clusters using the CloudFormation templates (managed cluster feature), we would suggest that you delete the SumoCollector directory under /opt if it exists. Once MarkLogic releases new AMIs, you can update the stack with the new AMI ID and perform a rolling restart of nodes so that the permanent fix is in place.

      Other Platforms

      For the impacted MarkLogic versions listed above running on platforms besides AWS, the log4j jars are included in the MarkLogic installation folder but are never used.  The steps listed in the workaround above can still be applied to these systems even though the systems themselves are not impacted.

      MarkLogic Java Client

      The MarkLogic Java Client API has neither a direct nor indirect dependency on log4j. The MarkLogic Java Client API  does use the industry-standard SLF4J abstract interface for logging. Any conformant logging library can provide the concrete implementation of the SLF4J interface. By default, MarkLogic uses the logback implementation of the SLF4J interface. The logback library doesn't have the vulnerability that exists in the log4j library. Customers who have chosen to override logback with log4j may have the vulnerability.  Such customers should either revert to the default logback library or follow the guidance provided by log4j to address the vulnerability: https://logging.apache.org/log4j/2.x/security.html

      MarkLogic Data Hub & Hub Central

MarkLogic Data Hub and Hub Central are not directly affected by the log4j vulnerability. Data Hub and Hub Central use Spring Boot, and Spring provides an option to switch its default logging to log4j, which Data Hub does not use.
The log4j-to-slf4j and log4j-api jars that are included in spring-boot-starter-logging cannot be exploited on their own. By default, MarkLogic Data Hub uses the logback implementation of the SLF4J interface.
The logback library doesn't have the vulnerability that exists in the log4j library. Please refer to: https://spring.io/blog/2021/12/10/log4j2-vulnerability-and-spring-boot 

      MarkLogic Data Hub Service

      For MarkLogic Data Hub Service customers, no action is needed at this time. All systems have been thoroughly scanned and patched with the recommended fixes wherever needed. 

      MarkLogic Content Pump (MLCP)

MarkLogic Content Pump 10.0-8.2 & 9.0-13.7 are now available for download from developer.marklogic.com and GitHub. This release resolves the CVE-2019-17571 vulnerability.

      MLCP versions 10.0-1 through 10.0-8.2 and versions prior to 9.0-13.6 used an older version of log4j-1.2.17 that is not affected by the primary vulnerability discussed in this article (CVE-2021-44228), but mlcp versions prior to 10.0-8.2 are affected by the critical vulnerability CVE-2019-17571.

      MLCP v10.0-8.2 & MLCP v9.0-13.7 specific workaround for CVE-2021-44832

      The following workaround can be executed on a host with mlcp

      1.  Download and extract the Log4j 2.17.1 dependency from apache. 

      curl https://archive.apache.org/dist/logging/log4j/2.17.1/apache-log4j-2.17.1-bin.tar.gz --output log4j.tar.gz && tar -xf log4j.tar.gz

2. Move the relevant log4j dependencies to the $MLCP_PATH/lib/ folder, i.e.:

      sudo mv ./apache-log4j-2.17.1-bin/log4j-core-2.17.1.jar $MLCP_PATH/lib/
      sudo mv ./apache-log4j-2.17.1-bin/log4j-api-2.17.1.jar $MLCP_PATH/lib/
      sudo mv ./apache-log4j-2.17.1-bin/log4j-1.2-api-2.17.1.jar $MLCP_PATH/lib/
      sudo mv ./apache-log4j-2.17.1-bin/log4j-jcl-2.17.1.jar $MLCP_PATH/lib/
      sudo mv ./apache-log4j-2.17.1-bin/log4j-slf4j-impl-2.17.1.jar $MLCP_PATH/lib/

3. Remove the old log4j dependencies

      sudo rm $MLCP_PATH/lib/log4j-core-2.17.0.jar
      sudo rm $MLCP_PATH/lib/log4j-1.2-api-2.17.0.jar
      sudo rm $MLCP_PATH/lib/log4j-api-2.17.0.jar
      sudo rm $MLCP_PATH/lib/log4j-jcl-2.17.0.jar
      sudo rm $MLCP_PATH/lib/log4j-slf4j-impl-2.17.0.jar

      Pega Connector

      The 1.0.0 Pega connector installer briefly runs MLCP 10.0-6.2 via gradle as part of the setup. MLCP 10.0-6.2 uses the old 1.2 log4j jar. The actual connector does not use log4j at runtime.  We have released Pega Connector 1.0.1 which uses MLCP 10.0-8.2 with forced dependencies to log4j 2.17.1.

      MarkLogic-supported client libraries, tools

      All other MarkLogic-supported client libraries, tools, and products are not affected by this security vulnerability.  

      Verified Not Affected

      The following MarkLogic Projects, Libraries and Tools have been verified by the MarkLogic Engineering team as not being affected by this vulnerability

      • Apache Spark Connector
      • AWS Glue Connector
      • Corb-2
      • Data Hub Central Community Edition
      • Data Hub QuickStart
      • Jena Client - Distro not affected, but some tests contain log4j;
      • Kafka Connector
• MLCP - uses an older version of log4j that is not affected by CVE-2021-44228, but it is affected by CVE-2019-17571. See notes above. 
      • ml-gradle
      • MuleSoft Connector - The MarkLogic Connector does not depend on log4j2, but it does leverage the MarkLogic Java Client API (see earlier comments); 
      • Nifi Connector
      • XCC

      MarkLogic Open Source and Community-owned projects

      If you are using one of the MarkLogic open-source projects which have a direct or transient dependency on Log4j 2 up to version 2.14.1 please either upgrade the Log4j to version 2.16.0 or implement the workaround in prior releases (<2.16.0) by removing the JndiLookup class from the classpath.  Please refer: https://logging.apache.org/log4j/2.x/security.html

      MarkLogic is dedicated to supporting our customers, partners, and developer community to ensure their safety. If you have a registered support account, feel free to contact support@marklogic.com with any additional questions.

      More information about the log4j vulnerability can be found at

      https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228 or

      https://logging.apache.org/log4j/2.x/security.html 

      https://www.cisa.gov/uscert/ncas/current-activity/2021/12/13/cisa-creates-webpage-apache-log4j-vulnerability-cve-2021-44228

      https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-45046 

      Introduction

      A powerful new feature was added to MarkLogic 8 - the ability to build applications around a declarative HTTP rewriter. You can read more about MarkLogic Server's HTTP rewriter and some of the new features it provides in our documentation.

      This article will cover some basic tips for debugging applications that make use of this feature.

      Validating your rewriter rules (Using XML Schema)

      The rewriter adheres to an XML Schema. At runtime the rewriter is not validated against this schema; this is by design so that potentially minor errors don't risk taking your application offline. As a best practice, we recommend validating your rewriters manually every time you make a change. In order to do this, you can use MarkLogic Server or any other tool that supports XML validation (the schema is standard XSD 1.0).  If you want to view the schema, it's copied to Config/rewriter.xsd when you install the product.

      In order to validate from within MarkLogic using XQuery you can simply execute:

      validate { fn:doc("/path/to/your/rewriter.xml") }

      The above will validate the XML if your rewriter rules are stored in a database. If you're using the filesystem, you can use xdmp:document-get instead.
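For example (the on-disk path shown is hypothetical):

validate { xdmp:document-get("/opt/my-app/rewriter.xml") }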

      Alternatively, you can copy / paste the XML body into Query Console and wrap it with a call to validate as below:

      validate { * Paste your rewriter rules here * }

      The above approach should work without any issue as long as there is no content in your rewriter XML that contains any XQuery reserved syntax.

General rewriter debugging and tracing

      For a simple "print" style debugging you can manually add trace statements at any point an eval rule is allowed. Like this:

      <trace event="customevent">data</trace>

      Then enable diagnostics (in your group settings) and add "customevent"; your custom trace will now show up in ErrorLog.txt whenever that endpoint is accessed. To read more on the use of trace events in your applications, refer to this Knowledgebase article

      There is error code handling:

      <error code="MYAPP-EXCEPTION" data1="value1" data2="... 

      You can also add ids - these will be traced out - which may aid debugging

      <match id="match-id-for-myregex" regex=".* ...

      Useful diagnostic trace events

      Note that additional trace events can generate a lot of data and may slow your application down, so make sure these do not get left on in a production-critical environment

      Below are some trace events you can use and a brief description of what each trace event does:

Rewriter Parser - Details of the parsing of the rewriter XML file
Rewriter Evaluator - Execution traces of rules as evaluated
Rewriter Evaluator Verbose - Additional (more verbose) tracing
Declarative Rewriter - Entry points into and out of the rewriter from the app server request handler
Rewriter Print Rules - After parsing and validation of the rewriter, a full dump of the internal data structures that resulted

      Additional points to note

      Use of the "Evaluator" traces will write to the ErrorLog.txt on every request.

      The "Parser" trace event will only occur once or upon updating your rewriter.

      Introduction

Prior to the 9.0-9 release, MarkLogic provided support for Oracle JDK 8. However, Oracle has announced the End of Public Updates for Java SE 8.

      What can we expect from MarkLogic?

      MarkLogic will support OpenJDK 9, OpenJDK 10 and OpenJDK 11 starting with MarkLogic Server 9.0-9 and associated products.

      These products/implementations include:

      From the 9.0-9 release onwards, we will no longer QA test our products with Oracle JDK.

We will support Amazon Corretto JDK as part of our Amazon offerings. Corretto meets the Java SE standard and is certified compliant by AWS using the Java Technical Compatibility Kit.

      The latest version of MarkLogic Server is available to download from:

      http://developer.marklogic.com/products

      JDK Requirements for Data Hub Framework (DHF) Users

      Requirements are discussed in further detail in the DHF documentation, however it's important to note that versions of DHF prior to the 5.2 release require Java 8.

      JDK Requirements for MarkLogic on AWS

      The mlcmd script supports startup operations and advanced use of the Managed Cluster features. The mlcmd script is installed as an executable script in /opt/MarkLogic/bin/mlcmd

In order to run any mlcmd command, the user must be logged into the host and running as root or with root privileges. The host must also have Java installed and the java command in the PATH, or JAVA_HOME set to the JRE or JDK home directory.

If the cluster is configured using any of the MarkLogic AMIs as-is, or uses a MarkLogic AMI to build custom AMIs or CloudFormation templates to create the cluster, mlcmd is required at MarkLogic Server startup, and therefore so is the JDK.

      Summary

The default configuration of MarkLogic Application Servers is not vulnerable to the FREAK SSL attack. 

      What is the FREAK SSL attack?

      Tuesday 2015/03/03 - Researchers of miTLS team (joint project between Inria and Microsoft Research) disclosed a new SSL/TLS vulnerability — the FREAK SSL attack (CVE-2015-0204). The vulnerability allows attackers to intercept HTTPS connections between vulnerable clients and servers and force them to use ‘export-grade’ cryptography, which can then be decrypted or altered.

      Read more about the FREAK SSL attack.

      Testing a webserver

      You can verify whether a webserver is attackable by the FREAK attack with this free SSL vulnerability checker.

      FIPS

      MarkLogic Server uses FIPS-capable OpenSSL to implement the Secure Sockets Layer (SSL v3) and Transport Layer Security (TLS v1) protocols. When you install MarkLogic Server, FIPS mode is enabled by default and SSL RSA keys are generated using secure FIPS 140-2 cryptography. This implementation disallows weak ciphers and uses only FIPS 140-2 approved cryptographic functions. Read more about OpenSSL FIPS mode in MarkLogic Server, and how to configure it.

      As long as FIPS mode was not explicitly disabled, MarkLogic Application Servers are not vulnerable to the FREAK SSL attack. 

      OpenSSL

Eliminating the vulnerability for all configurations requires an update to the OpenSSL library. MarkLogic Server continually updates the implementation version of the OpenSSL library, so every MarkLogic Server maintenance release published after the discovery of this vulnerability will include an OpenSSL version that is not vulnerable to the FREAK attack.

      Conclusion

As long as FIPS mode is enabled, which is the default configuration, MarkLogic Application Servers are not vulnerable to the FREAK SSL attack.

       

What are Backup/Restore best practices?
Please refer to our MarkLogic Support FAQ for more details.

Should we backup default databases?
Please refer to our MarkLogic Support FAQ for more details.

Should I be backing up my local disk failover forests?
Please refer to our Local Disk Failover FAQ for more details.

In terms of disaster recovery (DR) - how do I choose between backup/restore or replication?
Please refer to our Database Replication FAQ for more details.

      How many copies of data do we have if we enable failover, Backup/Restore, Database Replication?

      Your primary cluster has its data forests (1st copy) and likely local disk failover forests (2nd) for high availability. Your replica cluster likely has its own data forests (3rd) and local disk failover forests (4th) for more up-to-date disaster recovery copies. You can also take backups from either environment (now 5 copies) for a less up-to-date DR copy.

Please analyze these options and set them up according to your needs (you don't have to set up all of them, or have multiple replica forests or backup copies).

      On which environment should I take a backup? Primary or Replica cluster? 

In general, it's probably best to take a backup from whichever environment, primary or replica, can best accommodate the backup load (you are unlikely to need identical or near-identical backups from both).

       

      What does a MarkLogic Database Backup contain?

MarkLogic database backups are by default self-contained, and include the following:

      • The configuration files.
      • The Security database, including all of its forests.
      • The Schemas database, including all of its forests.
      • The Triggers database, including all of its forests.
      • All of the forests of the database you are backing up.

      Documentation:

      White Paper:

      What are the important points to note before performing Backups/Restore?

      Refer to the "Notes about Backup and Restore Operations" section in our documentation.

      Documentation:

      Will there be any interruption in running queries/updates while backup runs?

      Most of the time, when a backup is running, all queries and updates proceed as usual. MarkLogic simply copies stand data from the source directory to the backup target directory, file by file. Stands are read-only except for the small Timestamps file, so this bulk copy can proceed without needing to interrupt any requests. Only at the very end of the backup does MarkLogic have to halt incoming requests briefly to write out a fully consistent view for the backup, flushing everything from memory to disk.

      Documentation:

      White Paper:

      What is Flash Backup?

      In flash backup mode you need to quiesce all forests in a given database for long enough to allow you to make a file level backup of the forest data.

      White Paper:

      KB Article:

      What are the advantages of using MarkLogic backup over other options/methods?


      • Our Backup and Restore APIs use a timestamp to guarantee that a backup is consistent according to a given timestamp; during the course of the time the backup takes to run, the on-disk stands being backed up will be kept until the backup has completed and it will also allow new updates to continue to take place (advancing the database forest timestamps), so it's generally recommended as the safest strategy to use if you want to be able to restore from a crash.
      • Our Backup and Restore API also force a checkpoint with the forest Journal files and any in-memory transactions just before the backup starts, meaning that all transactions up to the point at which the backup started are guaranteed to be in the backup set.
• If you want to use backup methods other than what MarkLogic provides, you can explore that. But you need to make sure that there are no updates happening at that time. Forests should be completely quiesced first; you wouldn't need to stop MarkLogic Server to do this, but you would need to (at the very least) ensure the forests were placed into flash-backup mode (see the sketch below) - this would allow queries to take place but would not allow any transactions to make changes while the backup task ran.

      KB Article:
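A minimal sketch of placing a forest into flash-backup mode with the Admin API (the forest name "content-f1" is hypothetical; repeat for every forest in the database, and set the value back to "all" once the file-level backup completes):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

(: "content-f1" is a hypothetical forest name :)
let $config := admin:get-configuration()
let $config := admin:forest-set-updates-allowed($config, xdmp:forest("content-f1"), "flash-backup")
return admin:save-configuration($config)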

      Can we restore backups across feature releases of MarkLogic? 

Yes, you can restore from an older version to a newer version - but not vice versa.

      KB Articles:

      Can we restore backups across different OS platforms?

      No, MarkLogic backup files are platform specific and should only be restored onto the same platform. This is true for both database and forest backups.

      Documentation:

      KB Article:

      What is the role of Journals in relation to Backup and Restore?

      Refer to the Knowledgebase article for details.

      How does "point-in-time" recovery work with Journal Archiving?
      Refer to the documentation and Knowledgebase article for details.
      Do the journal archive files from a backup become invalid with the next backup?

      New journal archives are started when the next full backup is done. During the period of time that the new full backup is running, we archive journals to both the old and new location until we're sure the new full backup will complete successfully.

      Documentation:

      Do the archive files normally get deleted with a subsequent backup?

      They are typically deleted when the corresponding full backup is deleted.

      Documentation:

How much free space is needed for the Journal Archive files in a Backup? The size of the journal archive can be significantly larger (for example, 6x) and is totally dependent on how much data you are ingesting and how much time there is between backups.

      KB Article:

      Can you explain resource consumption during Backup/Restore? Full backup/restore operations are resource (I/O, CPU and Memory) intensive and should be scheduled during off-hours, when possible.

      Documentation:

      Is it possible to restore to a target database with different number of forests than the source database? Yes, use the "Forest topology changed" option while restoring.

      Documentation:

      KB Article:

      What is the recommended way to backup/restore multiple databases? Refer to our knowledgebase article for more details
How to configure database backup rotation? You can configure the maximum number of full backups to keep (this does not apply to incremental backups) by specifying a number for the "max backups" parameter. When you reach the specified maximum number of backups, the next backup will delete the oldest backup. Specify 0 to keep an unlimited number of backups. You can set this in the Admin UI or use the APIs to set this value.

      Documentation:

What are the best practices for spacing incremental backups? Incremental backups are more resource-intensive than full backups, as they need to query the data to find the changes between backups. You would need to monitor your system closely to ensure that the overhead of running so many incremental backups is not affecting your system performance, and that a subsequent backup does not start before the previous one has completed. Frequent incremental backups are not recommended; the general recommendation is to space them at least 6 hours apart.

      KB Article:

      Can you explain the directory structure for Incremental backups?

      If an incremental backup directory is specified, after the first incremental backup is done, the full backup can be archived to another location. The subsequent incremental backups do not need to examine the full backup.

      Once you restore an incremental backup, you can no longer use the previous full backup location for ongoing incremental backups. After the restore, you need to make a fresh full backup and use the full backup location for ongoing incremental backups. This means that after restore of an incremental backup, scheduled backups need to be updated to use the fresh full backup location.

      Documentation: 

      Why do Incremental backups take more time than Full backups?

Incremental backups would be expected to use more CPU and RAM, as they perform queries to determine what data has changed and needs to be backed up; full backups simply back up all available forest data and are more likely to be I/O constrained. If the system is memory or CPU constrained during the time an incremental backup is running (i.e., other processes or queries are running), then the incremental task takes lower priority and can possibly take longer to run than a full backup. Please also note that incremental backups are designed to minimize storage - not time.

Note that incremental backups could be fast when not much data has changed since the last incremental backup was taken, or when the system is otherwise idle. However, most of the time incremental backups are given lower priority, to consume the least amount of resources, which ultimately results in longer run times.

      Why use incremental backup when using journal archiving? Is this a recommended combination?

      Incremental backups are more compact than archived journals and are faster to restore.

      Incremental backup improves both restore time and also space requirements over journal archiving, but it's not an either/or decision - you can use both where appropriate.

      Restoring from incremental backup taken on a different cluster fails. What do I need to check?

      Every incremental backup will store a reference to the location of the previous incremental backup and the very first one will store a reference to the location of the full backup. These are stored in a file by the name BackupTag.txt. The restore job fetches the backup locations from this file, and if they still point to an older location, then incremental restore will fail in this scenario.

       

      KB Article:

Why is a MarkLogic Server backup slower than a file copy?

      Refer to our Knowledgebase article for more details

      Can you explain how Backup/Restore with encryption works?
      • If any forest in the backup has encryption enabled, then the entire backup will be encrypted.
      • As long as the current database being restored is encrypted, the restored database will also be encrypted.
• By default the MarkLogic embedded KMS is automatically included in a backup. If you set the backup option to exclude it and turn off the automatic inclusion of the keystore, you are responsible for saving the keystore (the embedded KMS) to a secure location.

      Documentation:

      How can I monitor MarkLogic Backup?
      • Check Database status page on the Admin UI
      • Use the MarkLogic API's
       

      KB Article:
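For example, a minimal sketch of starting a backup programmatically and checking its status (the database name "Documents" and the backup directory /backups/Documents are hypothetical):

xquery version "1.0-ml";

(: start a backup of all forests in the database and return the job status :)
let $job-id := xdmp:database-backup(
                 xdmp:database-forests(xdmp:database("Documents")),
                 "/backups/Documents")
return xdmp:database-backup-status($job-id)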

      Summary

MarkLogic 9 introduces certificate-based user authentication, which allows users to log into MarkLogic Server without being required to enter a user name/password. In previous versions, certificates were only utilized to restrict client access to MarkLogic Server with the Digest/Basic user authentication scheme. In addition to certificate-based user authentication using internal user and external name verification, MarkLogic 9 also permits authenticating and authorizing user certificates against an LDAP or Active Directory database to permit access based on MarkLogic roles and LDAP group membership. By using this method of authentication and authorization, a site is able to maintain all user access externally without the need to manage a separate set of users within the MarkLogic security database.


This document will expand on the concepts and configuration examples described in the associated "MarkLogic Certificate based User Authentication" knowledge base article and will show the additional steps required to configure MarkLogic to authorize a user certificate against an LDAP or Active Directory server. It is highly recommended that you make yourself familiar with the previous article, as it covers in more detail the steps required to set up the MarkLogic App Server to ensure that TLS Client Authentication is configured correctly to request and verify the certificates that may be presented by the user.

      Creating the External Security definition

      To authorize users presenting a certificate you should first create a new External Security definition selecting “Certificate” for authentication and LDAP for authorization.

       ExternalSecurity.png

      Next, configure the LDAP server entry.

      LDAPServer.png

      Notes:

• Unlike standard user authorization, when MarkLogic searches for the user certificate it uses a baseObject search with the full certificate distinguished name, rather than a sub-tree search off the “ldap base”. The MarkLogic UI currently requires an entry for the “ldap base”; even though it is not used, you will need to enter a dummy value to satisfy UI verification.
• When performing the LDAP search, MarkLogic will request the “ldap attribute” value to use when creating the temporary userid. Care should be taken when selecting this value to ensure that it is unique for all possible Certificate DNs that may be presented.
• Ensure that the “ldap default user” has the required permissions to search for the certificate within the LDAP or Active Directory server and return the required attributes.
• MarkLogic uses the “memberOf” and “member” attributes to return Group and Group-of-Group membership; if your LDAP or Active Directory server uses different attributes, such as “isMemberOf”, you can override them in the “memberOf” and “member” attribute fields. 

      Configuring the App Server

      Configure the App Server to use “certificate” authentication, set “Internal Security” to false and select the external security definition created above.

      AppServer1.png

Enable TLS Client Authentication and configure the SSL Client Certificate authorities that you will accept to sign the user certificates. Any certificate presented that is not signed by one of the specified CAs will be rejected.

      AppServer2.png 

      AppServer3.png

For more details on configuring the CA certificates required for certificate-based authentication, please refer to the knowledge base article "MarkLogic Certificate based User Authentication". 

      Configure MarkLogic Security Roles

      For each role specify one or more external names that match the “memberOf” attribute returned for the Certificate DN.

      Role.png
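This can also be scripted. A minimal sketch, run against the Security database, using the role and group DN from the examples in this article:

xquery version "1.0-ml";
import module namespace sec = "http://marklogic.com/xdmp/security"
    at "/MarkLogic/security.xqy";

(: associate the LDAP group DN with the "mladmin" role :)
sec:role-set-external-names("mladmin",
  ("cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local"))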
       
To confirm that users are being authorized to the MarkLogic AppServer correctly, connect using your browser or a command line tool such as “cURL”.

      MacPro-4505:~ $ curl -k --cert ./mluser1.p12:password https://localhost:8013
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
      <head>
      <title>Welcome to the MarkLogic Test page.</title>
      </head>
      <body><p>This application is running on MarkLogic Server version 9.0-1.1</p></body>

       
      Within the AppServer AccessLog, you should see a mapping for a new temporary userid to the expected role.

      External User(mluser1) is Mapped to Temp User(mluser1) with Role(s): mladmin
      ::1 - mluser1 [18/Jul/2017:16:07:05 +0100] "GET / HTTP/1.1" 200 347 - "curl/7.51.0"

      Troubleshooting

      If a user is not able to connect using their certificate, the first thing to check is if the Certificate Distinguished Name (DN) can be found in the LDAP or Active Directory database and if it contains the required userid and memberOf attributes.

      Using a tool such as OpenSSL, determine the correct Subject Certificate DN, e.g.

      MacPro-4505:~ $ openssl x509 -in mluser1.pem -text
      Certificate:
      Data:
      Version: 3 (0x2)
      Serial Number: 1497030421 (0x593adf15)
      Signature Algorithm: sha256WithRSAEncryption
      Issuer: CN=User Signing Authority, O=MarkLogic, OU=Support
      Validity
      Not Before: Jun 9 17:47:13 2017 GMT
      Not After : Jun 9 17:47:13 2018 GMT
      Subject: CN=mluser1, OU=Users, DC=MarkLogic, DC=Local
       
      Next, using an LDAP lookup tool such as “ldapsearch” (or "ldp.exe" on Microsoft Windows), perform a base object search for the Certificate DN, requesting the LDAP user and memberOf attributes (with the entries matching your LDAP External Security settings).

      If either the userid or memberOf attribute is missing, access will be denied.


      MacPro-4505:~ $ ldapsearch -H ldap://192.168.66.240:389 -x -D "cn=manager,dc=marklogic,dc=local" -W -s base -b "cn=mluser1,ou=Users,dc=MarkLogic,dc=Local" "memberOf" "uid"
      # extended LDIF
      #
      # LDAPv3
      # base <cn=mluser1,ou=Users,dc=MarkLogic,dc=Local> with scope baseObject
      # filter: (objectclass=*)
      # requesting: memberOf uid
      #
      # mluser1, Users, MarkLogic.Local
      dn: cn=mluser1,ou=Users,dc=MarkLogic,dc=Local
      uid: mluser1
      memberOf: cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local
      # search result
      search: 2
      result: 0 Success
       
      If MarkLogic is able to successfully locate the certificate and return the required attributes, then check whether the external names in the security role match (case-sensitively) the “memberOf” attribute returned by the LDAP search.

      The following XQuery can be used to show all the external names assigned to a specific role. 


      (: execute this against the security database :)
      xquery version "1.0-ml";
      import module namespace sec = "http://marklogic.com/xdmp/security"
          at "/MarkLogic/security.xqy";
      sec:role-get-external-names("mladmin")


      Result

      cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local


      If MarkLogic is still not able to authenticate users, it is very useful to use a packet capture tool such as Wireshark to check whether MarkLogic is able to contact the LDAP or Active Directory server and is receiving the expected successful admin bind and search for the Certificate DN.

      The following example trace shows a successful BIND using the LDAP Default user followed by a successful search for the Certificate DN.

      LDAPWireshark.png

      Further Reading

      Summary

      MarkLogic 9 introduces Certificate based User Authentication, which allows users to log into MarkLogic Server without being required to enter a user name/password. In previous versions, certificates were only utilized to restrict client access to MarkLogic Server with the Digest/Basic User Authentication scheme. Certificate based User Authentication can be configured using Internal User or External Name based user configurations.

      Certificate Authentication: Internal User vs External Name based Authentication:

      The difference between Internal User and External Name based authentication lies in whether the user named in the certificate CN field (demoUser1 in our example) exists in the MarkLogic Security Database (Internal User), or whether the user retrieved from the certificate Subject field (the whole Subject field as a DN) is mapped as an External Name value on an existing user.

      User Certificate Example:

      A few common steps and examples are listed below to add clarity. For our example setup, the certificate presented by the App Server user (demoUser1) will be as follows. 

      $ openssl x509 -in UserCert.pem -text -noout
      Certificate:
          Data:
              Version: 1 (0x0)
              Serial Number: 7 (0x7)
          Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=NY, L=New York, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic DemoCA
              Validity
                  Not Before: Jul 11 02:58:24 2017 GMT
                  Not After : Aug 27 02:58:24 2019 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic Corporation, OU=Engineering, CN=demoUser1
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (1024 bit)
                      Modulus:
                          .....................
                      Exponent: 65537 (0x10001)
          Signature Algorithm: sha1WithRSAEncryption

      CA Certificate (User Cert Signer) Import from Admin GUI

      In order to allow MarkLogic Server to accept the certificate presented by a user, the Certificate Authority (CA) used to sign the user certificate needs to be installed into MarkLogic Server. We can install the CA certificate (below) used to sign the demoUser1 certificate using the Admin GUI -> Configure -> Security -> Certificate Authority Import tab.

      $ openssl x509 -in CACert.pem -text -noout
      Certificate:
          Data:
              Version: 3 (0x2)
              Serial Number: 9774683164744115905 (0x87a6a68cc29066c1)
          Signature Algorithm: sha256WithRSAEncryption
              Issuer: C=US, ST=NY, L=New York, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic DemoCA
              Validity
                  Not Before: Jul 11 02:53:18 2017 GMT
                  Not After : Jul  6 02:53:18 2037 GMT
              Subject: C=US, ST=NY, L=New York, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic DemoCA
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (4096 bit)
                      Modulus:
                         ......................
                      Exponent: 65537 (0x10001)
              X509v3 extensions:
                  X509v3 Subject Key Identifier:
                      D9:45:B9:9A:DC:93:7B:DB:47:07:C6:96:63:57:13:A7:A8:F1:D0:C8
                  X509v3 Authority Key Identifier:
                      keyid:D9:45:B9:9A:DC:93:7B:DB:47:07:C6:96:63:57:13:A7:A8:F1:D0:C8
                  X509v3 Basic Constraints: critical
                      CA:TRUE
                  X509v3 Key Usage: critical
                      Digital Signature, Certificate Sign, CRL Sign
          Signature Algorithm: sha256WithRSAEncryption

      CA Certificate Import into MarkLogic from Query Console

      We can also import the above Certificate Authority with the XQuery call pki:insert-trusted-certificates to load the trusted CA into MarkLogic.  The sample Query Console code below demonstrates this process. 

      (Please ensure this query is executed against the Security database)
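      The following is a minimal sketch of such a query; the path /tmp/CACert.pem is an assumed location for the PEM file and should be adjusted for your environment.

      xquery version "1.0-ml";
      import module namespace pki = "http://marklogic.com/xdmp/pki"
          at "/MarkLogic/pki.xqy";
      (: read the PEM-encoded CA certificate from the host file system :)
      let $ca-cert := xdmp:document-get("/tmp/CACert.pem",
          <options xmlns="xdmp:document-get">
            <format>text</format>
          </options>)
      (: insert the CA into MarkLogic's list of trusted certificate authorities :)
      return pki:insert-trusted-certificates($ca-cert)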

      Certificate Template & Template CA import into Client (Browser/SSL Client)

      To enable an SSL App Server, we will either

      1) Create a Certificate Template to utilize a self-signed certificate,

      or, 2) Import a pre-signed certificate into MarkLogic.

      In both of the above cases, we will need to import the CA used to sign the certificate used by the MarkLogic SSL App Server into the client browser/SSL client.

      Importing a Self Signed Certificate Authority into Windows

      Once the template is created, we will link our template with our App Server to enable the SSL-based App Server.

      Certificate Authentication: CN as Internal User vs External Name based Internal User

      The difference between the above two lies in whether the certificate CN field user (demoUser1 in our example) exists in the MarkLogic Security Database as an internal user, versus whether the user retrieved from the certificate Subject field is mapped as an External Name to an existing user.

      1.) Certificate Authentication: Certificate CN field value as MarkLogic Security Database Internal User

      Steps to configure Certificate based User Authentication for our User demoUser1 as MarkLogic Internal User.

      a.) Create User "demoUser1" with necessary roles in MarkLogic Security (Internal User).

      DemoUser1_Internal_User.png

      b.) On the App Server page, set the authentication scheme to "certificate" and set "Internal Security" to "true". Also, unless you want some users to be authenticated as external users as well, you should leave the "External Security" object set to "none".

      AppServer_Authentication_Certificate.png

      c.) On the App Server, also select the CA that will be used to sign the client/user certificate as an accepted Certificate Authority (please see the CA Certificate section earlier for our example).

      ClientCert_CA.png

      Once configured, a browser with the user certificate (demoUser1) installed will be able to access the above App Server and log into MarkLogic as the internal user demoUser1. (Note: we will also need to assign the necessary roles to the internal user to access resources as needed.) 

      2.) Certificate Authentication: User Certificate Subject field value as External Name for Internal User

      Steps to configure Certificate based User Authentication for our User demoUser1 as MarkLogic External Name for Internal User "newUser1".

      a.) Create User "newUser1" with necessary roles in MarkLogic Security (Internal User), and Configure User Certificate Subject field as External Name to User.

      NewUser1_External_Name.png

      b.) Create an External Security object with Certificate based Authentication.

      External_Sec_Object.png

      c.) On the External Security object configuration itself, select the CA that will be used to sign the client/user certificate as an accepted Certificate Authority (please see the CA Certificate section earlier for our example).

      Please note: the configuration below is different from configuring the client CA on the App Server (which is what the Internal User case requires).

      External_Sec_ClientCert_CA.png

      d.) For External Name (Cert Subject field) based linkage to Internal User, App Server needs to point to our External Security Object.

      AppServer_ExternalSec_Link.png

      Question Answer Further Reading

      What is MLCP? MarkLogic Content Pump (MLCP) is an open-source, Java-based command-line tool to import, export and copy data to or from databases.

      Documentation:

      How do I install MLCP? Refer to our documentation and tutorial for this.

      What software is required for MLCP?

      • MarkLogic Server with XDBC App Server (MarkLogic 8 and later versions come with an XDBC App Server pre-configured on port 8000).
      • Oracle/Sun Java JRE 1.8 or later.

      Documentation:

      Can I connect to MLCP via Load Balancer?

      Yes. You can configure the MLCP tool to connect to a load balancer that sits in front of the MarkLogic Server cluster.

      Documentation:

      What are the permissions needed for MLCP operations?
      • 'admin' role or
      • The necessary permissions mentioned in the documentation, with additional privileges (e.g., read/update privileges on the database)

      Documentation:

      Does MLCP offer a way to export triples? MLCP currently doesn't offer a direct way to export triples. However, if you are okay with exporting them as XML files, you can do so by exporting those documents as files through MLCP, selecting them via a collection name (for managed triples, the graph name can be used as the collection name).

      KB articles:

      Can I configure MLCP to use SSL? Yes, please refer to our "Connecting to MarkLogic Using SSL" documentation for details.
      Can I configure Kerberos with MLCP? Yes. Please check Using MLCP With Kerberos for additional details.
      How do I ingest data in Data Hub Framework using MLCP? Check the "Ingest Using MLCP" section in our Data Hub Documentation for more details.
      Can we use MLCP to read from Amazon S3?

      There is currently no direct support between MLCP and Amazon S3.

      But you can consider using s3fs-fuse https://github.com/s3fs-fuse/s3fs-fuse to mount the S3 Bucket as a local filesystem and then use MLCP.
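      As an illustrative sketch only (the bucket name, mount point, credentials file, and connection details below are placeholders), mounting a bucket with s3fs and then importing from it might look like:

      $ s3fs my-bucket /mnt/s3 -o passwd_file=${HOME}/.passwd-s3fs
      $ mlcp.sh import -host localhost -port 8000 -username admin -password admin \
          -input_file_path /mnt/s3/data -input_file_type documents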

      Can I filter the data by column values while importing CSV via MLCP? Not in MLCP, but you can use other tools like CORB.

      Documentation:

      How do I debug/troubleshoot MLCP issues? Check our MLCP Troubleshooting documentation.
      Can I export large files in compressed format? Yes, use the -compress option with MLCP's export command, as shown in the example below.
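      A hedged example (host, port, credentials, and output path are placeholders for your own environment):

      $ mlcp.sh export -host localhost -port 8000 -username admin -password admin \
          -output_file_path /space/export -compress true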

      Documentation:

      What is the -fastload option and when should I use it? The -fastload option can significantly speed up ingestion during import and copy operations, but it can also cause problems if not used properly. Please check the documentation for tradeoffs and other considerations.

      Documentation:

      How does MLCP handle failover?
      Failover support in MLCP is only available when running against MarkLogic 9 or later. With older MarkLogic versions, the job will fail if MLCP is connected to a host that becomes unavailable.

      Documentation:

      Does MLCP support concurrent jobs? No, refer to our knowledge base article for details.

      What to consider when configuring the thread_count option for MLCP export?
      • By default the -thread_count is 4 (if -thread_count is not specified)
      • For best performance, you can configure this option to use the maximum number of threads supported by the App Server in the group (maximum number of server threads allowed on each host in the group * the number of hosts in the group); see the example after this list
        • E.g.: For a 3-node cluster, this number will be 96 (32*3) where:
          • 32 is the max number of threads allowed on each host
          • 3 is the number of hosts in the cluster
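      For example, sized to the 3-node arithmetic above (connection details and output path are placeholders, and the right value depends on your own group settings):

      $ mlcp.sh export -host localhost -port 8000 -username admin -password admin \
          -output_file_path /space/export -thread_count 96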

      KB Articles:

      Documentation:

      What are the differences between MLCP and CORB? Check this MarkLogic Stackoverflow discussion for more details.

       

      How do I handle white space in URIs/folders while loading data in MLCP? Check our "Handling Whitespace in URIs" blog for details.

       

      How can I use a delimiter in MLCP? (See the example after the links below.)

      Please check these links for details

      Creating Documents from Delimited Text Files

      Ingesting Delimited Text with MLCP

      Loading tab delimited files
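      As a rough sketch (the file path, delimiter, and URI column below are placeholders), a delimited-text import might look like:

      $ mlcp.sh import -host localhost -port 8000 -username admin -password admin \
          -input_file_path /data/example.csv -input_file_type delimited_text \
          -delimiter "," -uri_id id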

      Does MLCP support distributed (Hadoop) mode? No; MLCP's distributed mode has been deprecated since MarkLogic 10.0-4.

      How can I invoke MLCP via gradle task?

      Check the github "MarkLogic Content Pump (mlcp) and Gradle" documentation for details. 

      Introduction

      The performance of data extraction and ingestion using mlcp depends on multiple factors, including the hardware capacity of the client node running mlcp. This article focuses solely on how to adjust the mlcp -thread_count and -thread_count_per_split options for better import and export performance given your hardware and data set size.

      mlcp Import

      For mlcp import jobs, there are two options for tuning the threads: 

      1. -thread_count

      -thread_count is the number of threads to spawn for concurrent loading. The total number of threads, however, is controlled by either the newly calculated thread count or by -thread_count, if it is specified.

      2. -thread_count_per_split

      -thread_count_per_split is the maximum number of threads that can be assigned to each split. If you specify -thread_count_per_split, each input split will run with the specified number of threads.

      What if neither option is specified?

      Prior to 10.0-4.2, mlcp import uses a default thread count of 4 for concurrent loading.

      For mlcp versions 10.0-4.2 and later, a thread polling mechanism was introduced. During job initialization, mlcp polls the server to identify the maximum number of App Server or XDBC server threads on the port that handles mlcp requests, and then uses this number as the default thread count. 
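      For illustration (connection details and input path are placeholders), an import that sets both options explicitly might look like:

      $ mlcp.sh import -host localhost -port 8000 -username admin -password admin \
          -input_file_path /data/docs -input_file_type documents \
          -thread_count 32 -thread_count_per_split 4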

      mlcp Export

      For mlcp export jobs, the only option for thread tuning is -thread_count.

      What if thread_count is not specified?

      If it is not specified, the default thread count for concurrent exporting is 4.

      Recommendations

      For import: It is recommended to align the mlcp concurrent thread count with the maximum number of server threads allowed across all hosts (preferably all the E-nodes) in the group to achieve better performance. However, this may not help if your MarkLogic server is I/O bound: increasing the concurrency of writes will not necessarily improve performance. Because of the polling mechanism, the concurrency of the targeted App Server/XDBC server is already maxed out, so it is not recommended to run multiple mlcp jobs at the same time. 

      For export: It is reasonable practice to try out smaller thread counts such as 8, 16, 24, 32, 40 or 48 threads until the environment becomes I/O bound. Since mlcp exports content from multiple MarkLogic servers and writes to the local file system on a single node, performance is largely restricted by the I/O capability of the machine that runs the mlcp job. Further increasing the thread count may harm performance, since the client consumes data much more slowly than the server serves it. It may also result in long-running requests, which may time out (SVC-EXTIME exception) on the App Server/XDBC server, depending on the request timeout setting.

      Additional Resources

      For more information on MLCP troubleshooting, see the following resources.

      Summary

      MarkLogic may fail to start with an XDMP-ENCODING error: Initialization: XDMP-ENCODING: (err:XQST0087) Unsupported character encoding: ascii. This is caused by a mismatch between the Linux locale character set and the UTF-8 character set required by MarkLogic.

      Solutions

      There are two primary causes of this error. The first is using service instead of systemctl to start MarkLogic on some Linux distros. The second is related to the Linux language settings.

      Starting MarkLogic Service

      On an Azure MarkLogic VM, as well as on some more recent Linux distros, you must use systemctl, not service, to start MarkLogic. To start the service, use the following command:

      • sudo systemctl start MarkLogic

      Linux Language Settings

      This issue occurs when the Linux locale LANG setting is not set to a UTF-8 character set. It can be corrected by setting the locale (LANG, or LC_ALL) to "en_US.UTF-8". This should be done for the root user for default installations of MarkLogic. To change the system-wide locale settings, /etc/locale.conf needs to be modified; this can be done using the localectl command.

      • sudo localectl set-locale LANG=en_US.UTF-8

      If MarkLogic is configured to run as a non-root user, then the locale can be set in that user's environment using the $HOME/.i18n file. If the file does not exist, please create it and ensure it contains the following:

      • export LANG="en_US.UTF-8"

      If that does not resolve the issue in the user environment, then you may need to look at setting LC_CTYPE or LC_ALL for the locale (see the verification example after the list below).

      • LC_CTYPE will override the character set part of the LANG setting, but will not change other locale settings.
      • LC_ALL will override both LC_CTYPE and all locale configurations of the LANG setting.
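      To confirm the effective settings after making changes, you can run the locale command. The output below is only a sketch of what a correctly configured system might show; the exact list of variables will vary by distribution:

      $ locale
      LANG=en_US.UTF-8
      LC_CTYPE="en_US.UTF-8"
      LC_ALL=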

      References

      https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/ch-keyboard_configuration

      https://access.redhat.com/solutions/974273

      https://www.unix.com/man-page/centos/1/localectl/

      http://man7.org/linux/man-pages/man1/locale.1.html

      Summary
      Overlarge workloads, underprovisioned environments, or a combination of the two often result in false failovers - where MarkLogic Server will perceive an overloaded node as unavailable. Failover events redistribute the affected node’s traffic to the remaining nodes in the cluster. False failover events, unfortunately, redistribute an overloaded node’s workload to the likely similarly overloaded (and now even fewer number of) nodes remaining in the cluster. While it’s possible to mitigate this scenario in the short term by allowing more time for nodes to talk to one another, long term correction requires throttling of workloads, increasing the environment’s hardware provisioning, or a combination of the two.

      What does failover look like in MarkLogic Server?
      High availability systems require continuity within a cluster. MarkLogic Server delivers high availability by providing fault tolerance - if a node in a MarkLogic cluster fails, other nodes automatically pick up the workload so that the data stored in forests is always available. 

      More specifically, failover in MarkLogic Server is designed to address data node (“d-node”) or forest-level failure. D-node failures can include operating system crashes, MarkLogic Server restarts, power failures, or persistent system failures (hardware failures, for example). A forest-level failure is any disk I/O or other failure that results in an error state on the forest. 

      Failover in MarkLogic Server is "hot" in the sense that switchover occurs immediately to failover hosts already running in the same cluster, with no node restarts required. Failing back from a failover host to the primary host, however, needs to be done manually and requires a node restart.

      When a node is perceived as no longer communicating with the rest of the cluster, and a quorum of greater than 50% of the nodes in the cluster vote to remove the affected node, then a failover event will occur automatically. A node is defined to no longer be communicating with the rest of the cluster when that node fails to respond to cluster wide heartbeats within the defined host timeout.

      What does false failover look like in MarkLogic Server?
      False failover events in MarkLogic Server occur when a node is present and working, but so overloaded that it can no longer respond to cluster wide heartbeats within the specified host timeout. In other words, during false failover events the affected node is so busy that it is unable to communicate its status to the other nodes in the cluster, and consequently unable to prevent the other nodes from voting to remove it from the cluster.

      There can be many reasons for a busy node or cluster. One often overlooked reason is the infrastructure, especially when virtualization is involved: virtualization lets you get more out of your resources by allowing VMs to share them, under the assumption that not all systems will need their assigned resources at the same time. However, if multiple VMs are under load at once, they can outstrip the available physical resources because more than 100% of the resources have been assigned to the VMs, causing what is called "resource starvation".

      What should I do about false failover events in MarkLogic Server?
      Recall that a node is voted out when it can no longer respond to the rest of the cluster within the specified host timeout. It might be possible to mitigate false failovers in the short term by temporarily increasing the environment’s XDQP and host timeouts. Larger timeouts would give all the nodes in the cluster more time to respond to clusterwide heartbeats, which under heavy load should decrease the frequency of false failover events. That said - DO NOT get in the habit of simply increasing your timeouts to larger and larger values. Increasing timeout to avoid false failovers is, at best, a temporary/short term tactic.

      Long term correction of false failover events requires better alignment between your system's workloads and its hardware provisioning. You could, for example, reduce the workload, or spread the same workload over more time, or increase your system’s hardware provisioning. All of these tactics would free up the affected nodes to respond to the clusterwide heartbeat in a more timely manner, thereby avoiding false failover events. You can read more about aligning your workloads and hardware footprint at:

      1. MarkLogic Performance: Understanding System Resources
      2. Performance Issues in MarkLogic Server: what they look like - and what you should do about them

      Further reading:

      MarkLogic Server is optimized for query performance - if you're coming from a relational database background, you might be surprised by how much storage and storage bandwidth might be used. To better understand this behavior, it's important to recall the following:

      Speed over storage savings - While it makes sense to minimize storage footprint from a storage utilization perspective, MarkLogic Server trades space for time to take advantage of rapidly falling storage prices.

      Lazy Deletes - To better prioritize query performance, in MarkLogic Server record deletions happen in the form of "lazy deletes" where the record (or "document") is first marked as "obsolete" and consequently hidden from query results. The work of actually deleting any one record is deferred for a later time, when multiple obsolete documents can be removed and your remaining data optimized all at the same time and in bulk during a merge operation.

      Index on ingest - MarkLogic Server indexes documents as they're ingested. If your data model and index configuration is where it needs to be, that means your data is ready to be queried as soon as it's in a MarkLogic Server database. If your index configuration isn't quite where you want it, however, revising it means reindexing your entire database, creating lots of obsolete documents and resulting in potentially multiple large merge operations. This is why it's always better in MarkLogic Server to optimize your index settings in smaller environments before propagating those index settings to your bigger environments, and why you'll want to do fewer, bigger index configuration changes instead of many small index configuration changes. Each index configuration change - regardless of size - will trigger a reindex, so you'll want to minimize the number of reindexes you need to perform instead of the minimizing the number of changes in any one reindex.

        In addition to reindexing, other aspects of MarkLogic Server that take up significant storage bandwidth include:

        • Rebalancing - which redistributes your data across your database
        • Local disk failover/database replication - both make copies of your data, and those copies need their own resources
        • Backup/restore - backup is making a copy of your data, and restore is effectively a mass update of your data
        • Mass updates of existing documents - Because of the way updates are performed in MarkLogic Server (read more), updating a large number of existing records will create a large number of obsolete documents, and consequently result in lots of large merge operations. To help reduce performance overhead, and if you have no need to preserve attributes of your existing data (read more), you might want to consider simply reloading data into an empty database, instead (which would result in avoiding the creation of obsolete documents and consequent merges)

        References:

        Understanding System Resources
        Understanding MarkLogic Minimum Disk Space Requirements
        MarkLogic - Lazy Deletes
        Mass Updates - "node-replace" vs "document-insert"

        Introduction

        A MarkLogic cluster is a group of inter-connected individual machines (often called “nodes” or “hosts”) that work together to perform computationally intensive tasks. Clustering offers scalability and high-availability by avoiding single-points of failure. This knowledgebase article contains tips and best practices around clustering, especially in the context of scaling out.

        How many nodes should I have in a cluster?

        If you need high-availability, there should be a minimum of three nodes in a cluster to satisfy quorum requirements.

        Anything special about deploying on AWS?

        Quorum requirements hold true even in a cloud environment where you have Availability Zones (or AZs). In addition to possible node failure, you can also defend against possible AZ failure by splitting your d-Nodes and e-Nodes evenly across three availability zones.

        Load distribution after failover events

        If a d-node experiences a failover event, the remaining d-nodes pick up its workload so that the data stored in its forests remains available.

        Failover forest topology is an important factor in both high-availability and load-distribution within a cluster. Consider the example below of a 3-node cluster where each node has two data forests (dfs) and two local disk-failover forests (ldfs):

        • Case 1: In the event of a fail over, if both dfs (df1.1 and df1.2) from node1 fail over to node2, the load on node2 would double (100% to 200%, where node2 would now be responsible for its own two forests - df2.1 and df2.2 - as well as the additional two forests from node1 - ldf1.1 and ldf1.2)
        • Case 2: In the event of a fail over, if we instead set up the replica forests in such a way that when node1 goes down, df1.1 would fail over to node2 and df1.2 would fail over to node3, then the load increase would be reduced per node. Instead of one node going from 100% to 200% load, two nodes would instead go from 100% to 150%, where node2 is now responsible for its two original forests - df2.1 and df2.2, plus one of node1's failover forests (ldf1.1), and node3 would also now be responsible for its two original forests - df3.1 and df3.2, plus one of node1's failover forests (ldf1.2)

        Growing or scaling out your cluster

        If you need to fold in additional capacity to your cluster, try to add nodes in "rings of three." Each ring of three can have its own independent failover topology, where nodes 1, 2, and 3 will fail over to each other as described above, and nodes 4, 5, and 6 will fail over to each other separate from the original ring of three. This results in minimal configuration changes for any nodes already in your cluster when adding capacity.

        Important related takeaways

        • In addition to the standard MarkLogic Server clustering requirements, you'll also want to pay special attention to the hardware specification of individual nodes
          • Although the hardware specification doesn’t have to be exactly the same across all nodes, it is highly recommended that all d-nodes be of the same specification because cluster performance will ultimately be limited by the slowest d-node in the system
          • You can read more about the effect of slow d-nodes in a cluster in the "Check the Slowest D-Node" section of our "Performance Testing
            With MarkLogic" whitepaper
        • Automatic fail-back after a failover event is not supported in MarkLogic due to the risks of unintentional overwrites, which could potentially result in accidental data loss. Should a failover event occur, human intervention is typically required to manually fail-back. You can read more about the considerations involved in failing a forest back in the following knowledgebase article: Should I flip failed over forests back to their respective masters? What are the risks if I leave them?

         

        Further reading

        Error What does it mean? References

        XDMP-BACKDIRINUSE

        This error may sometimes be encountered when:

        • A restore is attempted while a backup task is running
        • Another process has the backup directory locked

        XDMP-BACKDIRSPACE

        Seen when:

        • The disk containing the backup directory runs out of space
        • There's a bad disk configuration
        • The backup destination disk is unmounted

        XDMP-CANCELED

        Indicates that an operation such as a merge, backup or query was explicitly canceled. This can occur:

        • Through the Admin Interface
        • By calling an explicit cancellation function, such as xdmp:request-cancel()
        • When a client breaks the network socket connection to the server while a query is running 

        XDMP-CLOCKSKEW

        MarkLogic Server expects the system clocks to be synchronized across all the nodes in a cluster, as well as between Primary and Replica clusters. The acceptable level of clock skew (or drift) between hosts is less than 0.5 seconds; values greater than 30 seconds will trigger XDMP-CLOCKSKEW errors and could impact cluster availability.

        XDMP-CONFLICTINGUPDATES

        Indicates that an update statement attempted to perform an update to a document that conflicts with other updates occurring in the same statement. For example (a runnable sketch of the last case follows this list):

        • A single update transaction that attempts to update a node, then attempts to add a child element to that node in the same transaction
        • A single update transaction that attempts to insert a document and then attempts to insert a node to that document
        • A single update transaction that attempts to insert a document at the same URI twice
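        As a minimal sketch of the last case, the following query raises XDMP-CONFLICTINGUPDATES because both inserts target the same (arbitrary) URI within a single statement:

        xquery version "1.0-ml";
        (: two inserts against the same URI in one statement conflict with each other :)
        xdmp:document-insert("/example/conflict.xml", <first/>),
        xdmp:document-insert("/example/conflict.xml", <second/>)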

        XDMP-DBDUPURI

        Indicates that the same URI occurred in multiple forests of the same database. Under normal operating conditions, duplicate URIs are not allowed to occur, but there are ways that programmers and administrators can bypass the server safeguards

        XDMP-DEADLOCK

        Indicates that MarkLogic Server detected a deadlock. Depending on whether the error is frequent or infrequent or whether it occurs as a ‘debug’ level or ‘notice’ level message, you need to take appropriate corrective action to avoid the deadlock

        XDMP-EXPNTREECACHEFULL

        Indicates that MarkLogic has run out of room in the expanded tree cache during query evaluation, and that consequently it cannot continue evaluating the complete query

        XDMP-EXTIME

        Indicates that a query or other operation exceeded its processing time limit. This can be caused by:

        • Inefficient queries
        • Inadequate processing limit
        • Resource bottlenecks

        XDMP-INMM*FULL

        Indicates that in-memory storage is full, resulting in the forest stands being written out to disk. These are informational only and are not errors as MarkLogic Server is working as expected. However, if these messages consistently appear more frequently than once per minute, increasing the corresponding 'in-memory' settings in the affected database may be appropriate.

        XDMP-LISTCACHEFULL
        • MarkLogic Server uses its list cache to hold search term lists in memory
        • If you're attempting to execute a particularly non-selective or inefficient query, your query will fail due to the size of the search term lists exceeding the allocated list cache

        XDMP-MODNOTFOUND/
        XDMP-NOPROGRAM

         Both errors indicate that the requested module does not exist or the user does not have the right permissions on the module

        MarkLogic on AWS

        What are 504 timeout errors, and how do I resolve them?

        504 timeout errors indicate that the load balancer may be closing the connection before the server responds to the request. To avoid these, make sure the idle timeout setting is long enough for responses to be received from the MarkLogic server.

        SVC-AWSCRED

        • The error SVC-AWSCRED indicates that either no AWS security credentials are configured or there is an issue recognizing the IAM role
        • If you are using a non-managed instance using a custom AMI:
          • Add the below to your /etc/marklogic.conf file (create one if not present)
            • export MARKLOGIC_EC2_HOST=1
              export MARKLOGIC_MANAGED_NODE=0
        • If you are using a managed AMI and want to use an IAM role:
          • Make sure you have below entries in mlcmd.conf which is available under /var/local/
            • MARKLOGIC_EC2_HOST=1
              MARKLOGIC_AWS_ROLE="ROLE"

        Question Answer Further reading
        How many replicas (Database Replication) should each of my primary databases have?
        • One replica for each primary database
        • Multiple replicas are not typically worth the additional administrative complexity or resource provisioning

        KB Articles:

        Do my primary and replica clusters have to be the same spec?
        • Slow replicas will throttle the performance of your primary cluster.
        • Therefore, your replica should be provisioned to avoid primary throttling - which typically means a similar hardware specification

         KB Articles:

        Documentation:

        What do I do about a lagging primary?

        You can either:

        • Speed up the replica by reducing traffic to the replica, or adding hardware resources to the replica - or both
        • Pause replication - be aware you'll no longer have a synchronized DR copy until replication is re-enabled

        KB Articles:

        Documentation:

        In terms of disaster recovery (DR) - how do I choose between backup/recovery or replication?
        • Database Replication
          • Best if you need a more synchronized copy of your data
          • Needs a bigger hardware footprint
          • Can result in primary throttling if under-provisioned or under heavy load
        • Backup/Restore
          • Best if you are not sufficiently provisioned for a more synchronized DR copy, as seen with database replication
          • Results in a more unsynchronized snapshot copy of your data

        KB Articles:

        Documentation:

        Can I do multi-primary replication i.e., have primary databases on both the clusters on a pair of coupled clusters?
        • Database replication is intended for disaster recovery (DR) & redundancy
        • For DR purposes, the recommended configuration is a dedicated primary cluster for all primary databases and a dedicated DR cluster for all replicas of those primary databases
        • Because of administrative complexity and compromised DR functionality, multi-primary DR configurations are not recommended

        Documentation:

        Should I replicate the auxiliary databases?
        • Always replicate your Security database when setting up Database replication

        • Separate security databases on both primary and replica clusters are not recommended due to administrative complexity

        • Avoid replicating the App-Services database

        Documentation:

        Can my primary and replica both write to the same shared storage?

        Avoid writing both the primary and the replica to the same shared storage, since doing so results in a single point of failure, thereby defeating the purpose of DR

        KB Articles:

        How many bootstrap hosts should my cluster have?

        Only mark the hosts that hold your security forests (and their local disk failover copies) as bootstrap hosts, to avoid unnecessary connections between the primary and replica clusters.

        Documentation

        How do I upgrade replicated environments?

        Replica first. If the Security database is replicated, then the replica cluster must be upgraded before the master.

        Documentation

        How do I divert traffic away from my primary to replica cluster?
        • Disable database replication for the database on the replica cluster.
        • Make the replica cluster/database the master.
        • Rolling Back to the Non-Blocking Timestamp on the new master

        Documentation

        Question Answer Further reading
        How do I get help from MarkLogic Support?
        • MarkLogic Support services are for registered customer contacts with current license and maintenance entitlements
        • It is VERY IMPORTANT to register your support contacts before you need them. Creating accounts and verifying entitlements is not how you want to spend time when you need immediate assistance
        • Once registered, you can open a support case via the URLs, email addresses, and phone numbers listed in the "HOW TO CONTACT US" section of our linked Support Handbook

        MARKLOGIC CUSTOMER SUPPORT HANDBOOK

        What information should I pass along in my support case?

        For every support case, please send along:

        • A summary of the problem
        • A status-only support dump of the affected cluster
        • ErrorLog.txt files from all the nodes in the cluster

        For issues centered on performance problems, please also send along:

        • Monitoring History or Meters data
        • If requested by MarkLogic Support, perf/pstack output
        • If requested by MarkLogic Support, output from diagnostic trace events

        Do note that MarkLogic's Telemetry feature, if enabled, will send much of this information in the background

        KB Articles:

          How do I generate a Support Dump?

          Via the Administration Interface (aka the "Admin Interface" or "Admin UI")

            Creating a support request

            Where are the MarkLogic Server Error logs? What log level setting should I use?
            • MarkLogic Server error logs are stored in the Logs directory under the MarkLogic Server data directory for your platform. For example, on Linux, they're at /var/opt/MarkLogic/Logs/ErrorLog.txt
            • It is good practice to run in production with the Debug file log level to get a more detailed record of operations.

            Documentation

            KB Articles:

              How do I debug performance issues in MarkLogic?
              • Meters data should be provided for analysis of any performance related issue
              • If the monitoring history suggests the issue is caused by an inefficient or incorrect query, output from the following calls can help to determine where the problem areas are within a given query:

              Documentation

              KB Articles:

              When should I provide pstack or perf output? How do I generate pstack or perf output?

              pstack and perf allow a view of what individual threads are doing in a running MarkLogic Server process. MarkLogic Support will request pstack or perf output, if necessary

              KB Articles:

              When should I provide trace event output? How do I generate trace event output?

              Trace events are useful when more diagnostic information is needed than is typically available in the standard MarkLogic or Operating System log files

              How to use diagnostic trace events

              What is Telemetry? 

              Telemetry, when enabled, collects, encrypts, packages, and sends diagnostic and system-level usage information about MarkLogic clusters so that the MarkLogic Support team has access - in advance - to the typically requested collateral

              Documentation:

              KB Articles:

              How do I upload files to MarkLogic Support?
              • You can attach any of these files to the Support ticket.
              • Alternatively, you can also use our FTP server

              How do you use the MarkLogic FTP server?

              Question Answer Further reading
              When does failover occur?
              • Failover occurs when a quorum of nodes votes a node out of a cluster
              • Voting depends on timely cluster heartbeats between its nodes. If a node isn't communicating with other nodes in the cluster, it gets voted out and its forests are failed over

              KB Articles:

              Documentation:

              What nodes participate in quorum?

              All nodes in the cluster configuration count towards quorum, irrespective of the:

              • group they belong to
              • type of node it might be (E-node, D-node, E/D-node, etc.)
              • state of the node (online/offline)
              • forest, database or group configurations

              KB Articles:

              Documentation:

              My cluster saw a failover event - does fail back happen automatically?
              • Failing back is a manual operation
              • Automatic fail-back is not supported due to the risks of unintentional overwrites and accidental data loss

               

               KB Articles:

              Documentation:

              How should I distribute my forests across the nodes in my cluster?
              • Here is an example forest topology on a typical 3-node cluster:
                • Node1: df1.1, df1.2, ldf2.1, ldf3.1
                • Node2: df2.1, df2.2, ldf1.1, ldf3.2
                • Node3: df3.1, df3.2, ldf1.2, ldf2.2
              • Distributing local disk failover forests (LDFs) evenly on the other two nodes splits the load on each surviving node to just 150% of normal should a failover event occur
              • When scaling out the cluster, try adding nodes in "rings of three" to keep the above load distribution intact with minimal config changes within each ring (for example, nodes 4, 5, and 6 would mirror the data and LDF forest distribution seen in the ring made up of nodes 1, 2, and 3)

              KB Articles:

              How many Local Disk Failover forests (LDFs) should each of my primary forests have?
              • One LDF for each primary forest, hosted on a different machine than the one which hosts the primary forest
              • More than one LDF per primary is not recommended due to unnecessary increases in administrative complexity and hardware resource requirements

              KB Articles:

              Should I be backing up my local disk failover forests?
              • Local disk failover forests (LDFs) should be included in your backup if you expect backups to be taken in a failed over state. Note that this will typically double the size of your backup since you're including both data and LDF forests
              • If you're not in a failed over state, or you manually fail back before a backup starts, then you can reduce the size of your backups by only backing up your data forests

              KB Articles:

              Documentation:

              MarkLogic IT Security Advisory

              Following disclosure of a cyber-incident that affected one of our AWS servers due to CVE-2022-1388 (https://support.f5.com/csp/article/K23605346), the MarkLogic Security team immediately investigated to assess any impact.  Based on our assessment, no customer information and none of our internal networks were impacted by this incident.  The AWS server in question was immediately patched and taken offline for further forensic review. The impacted server was a redundant system that has no access to our internal networks, is only used for Domain Name Services (DNS), and does not contain customer information.  All other similar servers had previously been fully patched and are also under forensic review as a precaution.  Before we bring the impacted server back online, MarkLogic will complete our forensics investigation and perform a full rebuild of the instance. 

              Any updates or changes will be posted on this website for future reference.

              Last updated: Monday, May 16, 2022  

              MarkLogic Linux Tuned Profile

              Summary

              The tuned tuning service can change operating system settings to improve performance for certain workloads. Different tuned profiles are available and choosing the profile that best fits your use case simplifies configuration management and system administration. You can also write your own profiles, or extend the existing profiles if further customization is needed. The tuned-adm command allows users to switch between different profiles.

              RedHat Performance and Tuning Guide: tuned and tuned-adm

              • tuned-adm list will list the available profiles
              • tuned-adm active will list the active profile

              Creating a MarkLogic Tuned Profile

              Using the throughput-performance profile, we can create a custom tuned profile for MarkLogic Server. First create the directory for the MarkLogic profile:

              sudo mkdir /usr/lib/tuned/MarkLogic/
              

              Next, create the tuned.conf file in that directory (/usr/lib/tuned/MarkLogic/tuned.conf); it includes the throughput-performance profile, along with our recommended configuration:

              #
              # tuned configuration
              #
              
              [main]
              summary=Optimize for MarkLogic Server on Bare Metal
              include=throughput-performance
              
              [sysctl]
              vm.swappiness = 1
              vm.dirty_ratio = 40
              vm.dirty_background_ratio=1
              
              [vm]
              transparent_hugepages=never
              

              Activating the MarkLogic Tuned Profile

              Now, when we run tuned-adm list, it should show us the default profiles, as well as our new MarkLogic profile:

              $ tuned-adm list
              Available profiles:
              - MarkLogic                   - Optimize for MarkLogic Server
              - balanced                    - General non-specialized tuned profile
              - desktop                     - Optimize for the desktop use-case
              - hpc-compute                 - Optimize for HPC compute workloads
              - latency-performance         - Optimize for deterministic performance at the cost of increased power consumption
              - network-latency             - Optimize for deterministic performance at the cost of increased power consumption, focused on low latency network performance
              - network-throughput          - Optimize for streaming network throughput, generally only necessary on older CPUs or 40G+ networks
              - powersave                   - Optimize for low power consumption
              - throughput-performance      - Broadly applicable tuning that provides excellent performance across a variety of common server workloads
              - virtual-guest               - Optimize for running inside a virtual guest
              - virtual-host                - Optimize for running KVM guests
              Current active profile: virtual-guest
              

              Now we can make MarkLogic the active profile:

              $ sudo tuned-adm profile MarkLogic
              

              And then check the active profile:

              $ tuned-adm active
              Current active profile: MarkLogic
              

              Disabling the Tuned Daemon

              The tuned daemon does have some overhead, and so MarkLogic recommends that it be disabled. When the daemon is disabled, tuned will only apply the profile settings and then exit. Update the /etc/tuned/tuned-main.conf and set the following value:

              daemon = 0
              

              References

              Introduction

              There is a lot of useful information in MarkLogic Server's documentation surrounding many of the new features of MarkLogic 9 - including the new SQL implementation, improvements made to the ODBC driver, and the new system for generating SQL "view" templates for your data. This article attempts to pull it all together by showing all the steps needed to create a successful connection and to verify that everything is set up correctly and works as expected.

              This guide presents a step-by-step walk through covering the installation of all the necessary components, the configuration of the ODBC driver and the loading of data into MarkLogic in order to create a Template View that will allow a SQL query to be rendered.

              Prerequisites

              We're starting with a clean install of Red Hat Enterprise Linux 7:

              $ uname -a
              Linux engrlab-128-084.engrlab.marklogic.com 3.10.0-327.4.5.el7.x86_64 #1 SMP Thu Jan 21 04:10:29 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

              In this example, I'm using yum to manage the additional dependencies (openssl-libs and unixODBC) required for the MarkLogic ODBC driver:

              $ sudo yum install openssl-libs
              Package 1:openssl-libs-1.0.2k-8.el7.x86_64 already installed and latest version
              Nothing to do
              
              $ sudo yum install unixODBC
              Package unixODBC-2.3.1-11.el7.x86_64 already installed and latest version
              Nothing to do
              

              If you want to use the latest version of unixODBC (2.3.4 at the time of writing), you can get it using cURL by running curl -O ftp://ftp.unixodbc.org/pub/unixODBC/unixODBC-2.3.4.tar.gz

              $ curl -O ftp://ftp.unixodbc.org/pub/unixODBC/unixODBC-2.3.4.tar.gz
                % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                               Dload  Upload   Total   Spent    Left  Speed
              100 1787k  100 1787k    0     0   235k      0  0:00:07  0:00:07 --:--:--  371k

              Please note - as per the documentation, this method requires unixODBC to be compiled from source, so additional build dependencies may need to be installed.
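              A typical from-source build sequence is sketched below; consult the unixODBC documentation for the authoritative steps and any prerequisites (such as a C compiler) on your system:

              $ tar -xzf unixODBC-2.3.4.tar.gz
              $ cd unixODBC-2.3.4
              $ ./configure
              $ make
              $ sudo make install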

              This article assumes that you have downloaded the ODBC driver for MarkLogic Server and the MarkLogic 9 install binary and have those available on your machine:

              $ ll
              total 310112
              -r--r--r-- 1 support support 316795526 Nov 16 04:19 MarkLogic-9.0-3.x86_64.rpm
              -r--r--r-- 1 support support    754596 Nov 16 04:18 mlsqlodbc-1.3-3.x86_64.rpm
              
              Getting started: installing and configuring MarkLogic 9 with an ODBC Server

              We will start by installing and starting MarkLogic 9:

              $ sudo rpm -i MarkLogic-9.0-3.x86_64.rpm
              $ sudo service MarkLogic start
              Starting MarkLogic:                                        [  OK  ]

              From there, we can point our browser at http://host:8001 and walk through the initial MarkLogic install process:

              As soon as the install process has been completed and you have created an Administrator user for MarkLogic Server, we're ready to create an ODBC Application Server.

              To do this, go to Configure > Groups > Default > App Servers and select the Create ODBC tab:

              Next we're going to make the minimal configuration necessary by entering the required fields - the odbc server name, the Application Server module directory root and the port.

              In this example we will configure the Application Server using the following values:

              odbc server name: ml-odbc
              root: /
              port: 5432

              After this is done, confirm that the Application Server has been created by going to Configure > Groups > Default > App Servers and ensure that you can see the ODBC Server listed and configured on port 5432 as per the image below:

              Getting started: Setting up the MarkLogic ODBC Driver

              Use RPM to install the ODBC driver:

              $ sudo rpm -i mlsqlodbc-1.3-3.x86_64.rpm
              odbcinst: Driver installed. Usage count increased to 1.
                  Target directory is /etc

              Configure the base template as instructed in the installation guide:

              $ odbcinst -i -s -f /opt/MarkLogic/templates/mlsql.template

              Getting started: ensure unixODBC is configured

              To ensure the unixODBC commandline client is configured, you can run isql -h to bring up the help options:

              $ isql -h
              
              **********************************************
              * unixODBC - isql                            *
              **********************************************
              * Syntax                                     *
              *                                            *
              *      isql DSN [UID [PWD]] [options]        *
              *                                            *
              * Options                                    *
              *                                            *
              * -b         batch.(no prompting etc)        *
              * -dx        delimit columns with x          *
              * -x0xXX     delimit columns with XX, where  *
              *            x is in hex, ie 0x09 is tab     *
              * -w         wrap results in an HTML table   *
              * -c         column names on first row.      *
              *            (only used when -d)             *
              * -mn        limit column display width to n *
              * -v         verbose.                        *
              * -lx        set locale to x                 *
              * -q         wrap char fields in dquotes     *
              * -3         Use ODBC 3 calls                *
              * -n         Use new line processing         *
              * -e         Use SQLExecDirect not Prepare   *
              * -k         Use SQLDriverConnect            *
              * --version  version                         *
              *                                            *
              * Commands                                   *
              *                                            *
              * help - list tables                         *
              * help table - list columns in table         *
              * help help - list all help options          *
              *                                            *
              * Examples                                   *
              *                                            *
              *      isql WebDB MyID MyPWD -w < My.sql     *
              *                                            *
              *      Each line in My.sql must contain      *
              *      exactly 1 SQL command except for the  *
              *      last line which must be blank (unless *
              *      -n option specified).                 *
              *                                            *
              * Please visit;                              *
              *                                            *
              *      http://www.unixodbc.org               *
              *      nick@lurcher.org                      *
              *      pharvey@codebydesign.com              *
              **********************************************

              If you're not seeing the above message, it's possible that another application on your system is overriding this command. For this configuration, the isql command is found at /usr/bin/isql:

              $ which isql
              /usr/bin/isql

              Getting started: initial connection test

              If you're happy that isql is correctly installed, we're ready to test the connection using isql -v:

              $ isql -v MarkLogicSQL admin admin
              +---------------------------------------+
              | Connected!                            |
              |                                       |
              | sql-statement                         |
              | help [tablename]                      |
              | quit                                  |
              |                                       |
              +---------------------------------------+
              SQL>

              Let's confirm that it's really working by loading some data into MarkLogic and creating an SQL view around that data.

              Loading sample data into MarkLogic

              To load data, we're going to use Query Console to insert the same sample data that is created in the Quick Start Documentation:

              To access Query Console, point your browser at http://host:8000 and make note of the following:

              Ensure the database is set to Documents (or at least, matches the database specified by your ODBC Application Server) and ensure that the Query Type is set to JavaScript

              When these are both set correctly, run the code to generate sample data (note that this data is taken from the quick start guide and reproduced here for convenience):

              After that has run, you should see a null response back from the query:

              To confirm that the data was loaded successfully, you can use the Explore button. You should see that 22 employee documents (rows) are now in the database:

              Create the template view

              Now that the documents are loaded, a tabular view for that data needs to be created.

              Ensure the database is (still) set to Documents (or at least, matches the database specified by your ODBC Application Server) and ensure that the Query Type is now set to XQuery

              As soon as this is set, you can run the code below to generate the template view (note that this data is taken from the quick start guide and reproduced here for convenience):
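
              The exact template from the quick start guide is not reproduced here. The following is a minimal, hypothetical XQuery sketch that inserts a TDE template exposing an employees view, assuming each sample document has a top-level employee element or property containing EmployeeID, FirstName and LastName; adjust the context and columns to match the actual sample data:

              xquery version "1.0-ml";
              import module namespace tde = "http://marklogic.com/xdmp/tde" at "/MarkLogic/tde.xqy";

              (: hypothetical template: exposes an "employees" SQL view over the sample documents :)
              let $template :=
                <template xmlns="http://marklogic.com/xdmp/tde">
                  <context>/employee</context>
                  <rows>
                    <row>
                      <schema-name>main</schema-name>
                      <view-name>employees</view-name>
                      <columns>
                        <column><name>EmployeeID</name><scalar-type>int</scalar-type><val>EmployeeID</val></column>
                        <column><name>FirstName</name><scalar-type>string</scalar-type><val>FirstName</val></column>
                        <column><name>LastName</name><scalar-type>string</scalar-type><val>LastName</val></column>
                      </columns>
                    </row>
                  </rows>
                </template>
              (: tde:template-insert stores the template in the Schemas database associated with the current database :)
              return tde:template-insert("/templates/employees.xml", $template)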

              To confirm this was loaded, Query Console should report that an empty sequence was returned.

              Test the template using a SQL Query

              The database should remain set to Documents and ensure that the Query Type is now set to SQL:

              Then you can run the following SQL Query:

              SELECT * FROM employees

              If everything has worked correctly, Query Console should render a view of the table in response to your query:

              Test the SQL Query via the ODBC Driver

              All that remains now is to go back to the shell and test the same connection over ODBC.

              To do this, we're going to use the isql command again and run the same request there:

              $ isql -v MarkLogicSQL admin admin
              +---------------------------------------+
              | Connected!                            |
              |                                       |
              | sql-statement                         |
              | help [tablename]                      |
              | quit                                  |
              |                                       |
              +---------------------------------------+
              SQL> select * from employees
              <<< RESPONSE CUT >>>
              SQLRowCount returns 7
              7 rows fetched
              

              Further reading

              Question / Answer / Further Reading

              How do I stand up a MarkLogic instance on AWS?

              Launching a MarkLogic AMI via Cloud Formation templates (CFTs) is the best way to stand up MarkLogic instances on AWS as it helps you make use of the Managed Cluster feature, which is designed for easy and reliable cloud deployment.

              You can also run MarkLogic without the Managed Cluster feature (with or without MarkLogic AMIs) - but it is not recommended due to the additional administrative complexity.

              Documentation:

              KB Article:

              Video Tutorial:

              What deployment or provisioning tools are supported?

              MarkLogic supports Cloud Formation Templates

              While not officially supported, we do have customers using tools like Terraform, Ansible and Packer

              KB Articles:

              What are the recommended instance types for MarkLogic deployments?

              • Unfortunately, there is no one single instance type that works for all MarkLogic deployments
              • Do note, however, that MarkLogic deployments generally have higher memory and storage I/O bandwidth requirements than legacy RDBMS deployments - so you'll likely want to start with Memory Optimized, Storage Optimized, or General Purpose instance types
              • The best instance type for your deployment will depend on your application code, workload, networking / system / cluster configurations, storage options, cloud architecture, etc. (not to mention the fact that AWS itself changes quickly and often)
              • We recommend doing extensive testing in lower environments before using a specific instance type in production
              • MarkLogic AMIs will not run on micro instances

              Documentation:

              Can we use Nitro instances?

              • One of the features of AWS Nitro System instances is that they allow multiple EBS volumes to be attached to the instance in any order.
              • Unfortunately, this behavior doesn't work reliably with MarkLogic's Cloud Formation Templates
              • While not recommended due to the additional administrative complexity, if you were to use multiple EBS volumes per node, you should set up additional monitoring to ensure that the hosts rejoin the cluster correctly and that the multiple volumes are mounted correctly after an EC2 node termination

              Does MarkLogic support AWS Graviton instances?

              MarkLogic does not currently support ARM based processors, so AWS Graviton instances are also not supported

              Documentation:

              What is our recommendation around volume management? 

              • Use large EBS volumes as opposed to multiple smaller ones
                • Larger EBS volumes (gp2) have faster IO, as described in the Amazon EBS volume types documentation
                • You have to keep enough spare capacity on each EBS volume to allow for merges
                • The recommendation is to have one large EBS data volume per node - while it’s possible to have multiple volumes per instance, we’ve found that’s not typically worth the additional administrative complexity
              • When resizing, adopt a vertical scaling approach (so growing into a single bigger EBS volume vs. adding multiple smaller volumes per node)
              • Note that S3 storage is eventually consistent, therefore S3 can only be used for backups or read-only forests in MarkLogic Server (otherwise you risk the possibility of data loss)

              Documentation:

              KB Article:

              How do I change the size of EBS volumes attached to MarkLogic AWS EC2 instances?

              In general, the best strategy is to follow Amazon user guides and best practices on how to increase storage size or make any other kind of system change.

              Specific to the MarkLogic deployments stood up via the Cloud Formation templates provided by MarkLogic, the following approaches ensure a safe operation:

              • The recommended approach is to shut down the cluster, do the resize using snapshots and restart the cluster
              • You could also use multiple volumes and rebalance, if you wish to avoid downtime

              It’s important to remember that if your cluster has grown enough to need more disk space, it will likely need additional resources, as well - such as CPU, RAM, storage and network bandwidth, etc.

              KB Article:

              What is the typical architecture for ensuring high availability (HA) and disaster recovery (DR) for MarkLogic on AWS?

               

              • Use high availability to protect against availability zone failure, and disaster recovery to protect against region failure
              • For high-availability:
                • Within a cluster, spread your nodes across three different availability zones within a single region, then use local disk failover to have copies of your data in each availability zone
              • For disaster recovery:
                • Place two different clusters in two different regions, then use database replication to have copies of your database in both regions
                • Be aware that cross region traffic is expensive - it may be more cost effective to have both primary and replica clusters in the same region, but keep in mind that you’re then vulnerable to a failure of that region

              Documentation:

              Can I run MarkLogic Server in just two Availability Zones?

              • The best practice is to distribute your nodes across three different Availability Zones (AZs) within a single region
              • If a region has only two AZs, you can't spread your nodes across enough zones to survive an AZ failure - so consider placing all your nodes in a single AZ to save on inter-zone networking costs
              • Note that two AZs are not a supported configuration for MarkLogic Cloud Formation Templates

              KB Articles:

               

              For my AWS security group, what ports do I need if I’m using a Cloud Formation Template?

              MarkLogic Server needs the same ports open as what you’d configure in an on-premise deployment.

              KB Articles:

               

              Best Practices for resizing a MarkLogic cluster on AWS

              • Resizing a MarkLogic Cluster on AWS can be done vertically or horizontally, similar to how it's done with on-premise deployments
              • Vertical scaling - changes the type of instances
                • You can change the instance type by using the update stack feature
                • Make sure you hibernate the cluster before and restart the cluster after the procedure
              • Horizontal scaling - changes the number of instances
                • Use the update stack feature  by changing the NodesPerZone setting on the CFT
                • Alternatively, use the auto-scaling groups
              • Similarly, data capacity can be resized in two different ways:
                • Resizing using AWS snapshots
                • Resizing using MarkLogic’s rebalancing feature
              • While vertical scaling is significantly easier on AWS than in on-premise deployments, note that MarkLogic requires at least some degree of horizontal scaling, as high availability (HA) requires at least three nodes in a cluster
              • Whether you are scaling nodes or data capacity, horizontally or vertically, it is recommended to:
                • Test your scale out procedure thoroughly before implementing
                • Take full backups of your data before making changes to your cluster

              Documentation:

              How do I upgrade MarkLogic on AWS?

              • In general, it’s important to understand that in on-premise deployments, you keep your machines, but change/upgrade the MarkLogic binary. In contrast, in AWS you keep your data/configuration, and instead change to a new instance with the new binary
              • If you have a MarkLogic AMI launched via a Cloud Formation Template (CFT):
                • If you want to upgrade MarkLogic alone, you must update the AMI IDs in your original CFT as you cannot upgrade your CFT to a different version.
                • If you want to upgrade both the MarkLogic and the CFT versions, you would instead set up a new cluster, then move your data and configuration to the new template, then after thorough testing - switch to the new cluster
              • If you have a custom AMI:
                • You’ll need to perform a manual upgrade or update your custom MarkLogic AWS AMI

              Documentation:

              KB Articles:

              Which load balancers are used for MarkLogic deployments on AWS?

              Classic Load Balancer - Used with MarkLogic Server 9.x, and 10.x until 10.0-6.1. Also used for single availability zone deployments

              Application Load Balancer - Used starting with 10.0-6.2 when the deployment spans the recommended configuration of multiple availability zones

              Network Load Balancer - Needed for ODBC connections

              Documentation:

              How do I monitor EC2 instances, EBS volumes etc?

               

              Documentation:

              How do I secure my Admin password for AWS Deployment?

              • It is not secure to store the MarkLogic admin password in the marklogic.conf file
              • Use a secure S3 bucket in combination with an IAM Role that grants read-only access to the EC2 instances in the cluster

               

              Documentation:

              KB Articles:

              How do we push data from MarkLogic to AWS SQS queue?

              There are no direct functions for sending messages to AWS SQS, but it should be possible to use the xdmp:http-post function, as detailed below:

              https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-making-api-requests.html#structure-post-request
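
              As a rough, hypothetical sketch of what such a call could look like in XQuery: the queue URL and message body below are placeholders, and a real request must also be authenticated (for example with AWS Signature Version 4) as described in the AWS documentation above.

              xquery version "1.0-ml";
              (: hypothetical sketch only: posts a SendMessage request to an SQS queue URL;
                 the AWS authentication/signing headers are omitted here :)
              let $queue-url := "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"
              let $body := "Action=SendMessage&amp;MessageBody=" || xdmp:url-encode('{"orderId": 42}')
              return
                xdmp:http-post(
                  $queue-url,
                  <options xmlns="xdmp:http">
                    <headers>
                      <content-type>application/x-www-form-urlencoded</content-type>
                    </headers>
                  </options>,
                  text { $body }
                )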

               

              What do we do about the SVC-AWSCRED error?

                Refer to the KB article: MarkLogic Fundamentals FAQ - Common Error Messages

                What are 504 Timeout errors? How do I resolve them?

                Refer to the KB article: MarkLogic Fundamentals FAQ - Common Error Messages

                Introduction

                This article details changes to the upgrade procedures for MarkLogic 9 AMIs.

                MarkLogic 9 now supports 1-click deployment in AWS Marketplace. This is in addition to the existing options of manually launching an AMI and launching MarkLogic clusters via CloudFormation templates. In order to make 1-click launch possible, our AMIs have a pre-configured data volume (device on /dev/sdf). The updated CloudFormation templates account for the pre-configured data volume. This change also requires a different approach to our documented upgrade process.

                Details

                As per MarkLogic EC2 Guide, the main goal of the upgrade is to update AMI IDs in CloudFormation in order to upgrade all instances in the stack. There is now an additional step to handle the blank data volume that is pre-configured on MarkLogic AMIs.

                Always backup your data before attempting any upgrade procedures!

                Scenario 1:  You are using unmodified CF templates that were published by MarkLogic on http://developer.marklogic.com/products/cloud/aws starting from version 8.0-3.

                1. Update your CloudFormation stack with the latest template as there were no breaking changes since 8.0-3. The current templates for MarkLogic 9 include new AWS regions, new AMI IDs, and code to remove blank data volume that is bundled with current AMIs.
                2. In the EC2 Dashboard, stop one instance at a time and wait for it to be replaced with a new one.
                3. For a rolling upgrade (and as a good practice), terminate the other nodes one by one, starting with the node that hosts the Security database. They will come up and reconnect without any UI interaction.
                4. Go to port 8001 on any new instance, where an upgrade prompt should be displayed.
                5. Click OK and wait for the upgrade to complete on the instance.

                Scenario 2: You made some changes to MarkLogic templates or you are using custom templates.

                1. Download current templates from http://developer.marklogic.com/products/cloud/aws.
                2. Locate the AMI IDs by searching for "AWSRegionArch2AMI" block in the template.
                  "AWSRegionArch2AMI": {
                        "us-east-1": {
                          "HVM": "ami-54a8652e"
                        },
                        "us-east-2": {
                          "HVM": "ami-2ab29f4f"
                        }, ...
                3. Locate AMI IDs in the old template and replace them with the ones from the new template. 
                4. Locate "BlockDeviceMappings" section in the new template that was downloaded in step 1. This block of code was added to remove blank volume that is part of the new 1-click AMIs.
                5. Update the old template to include "BlockDeviceMappings" as a property of LaunchConfig. There will be one or three LaunchConfig blocks depending on the template used. These can be located by searching for "AWS::AutoScaling::LaunchConfiguration". Here is an example of the new property under LaunchConfig.
                  "LaunchConfig":
                  {
                    "Type":"AWS::AutoScaling::LaunchConfiguration",
                  "Properties":
                  {
                  "BlockDeviceMappings":
                  [{
                  "DeviceName":"/dev/sdf",
                  "NoDevice":true,
                  "Ebs": {}
                  }],
                  ...
                6. Once all the changes are saved, update your stack with the updated CloudFormation template. Make sure the stack update is complete.
                7. In the EC2 Dashboard, terminate nodes one by one starting with the node that has Security database. New nodes will come up after a couple of minutes and reconnect without any UI interaction.
                8. Wait for all nodes to be up and in green state.
                9. Go to port 8001 on any new instance, where an upgrade prompt should be displayed.
                10. Click OK and wait for the upgrade to complete on the instance.

                Scenario 3: You have instances that were brought up directly from MarkLogic AMI. For each MarkLogic instance in your cluster, do the following:

                1. Terminate the instance.
                2. Launch a new instance from the upgraded AMI.
                3. Detach the blank volume that is mounted on /dev/sdf (it should be 10GB in size).
                4. Attach the EBS data volume associated with the original instance.

                More details on how to update a CloudFormation stack can be found at http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks.html

                Question / Answer / Further Reading

                What is MarkLogic's Built-In search feature?

                • MarkLogic is a database with a built-in search engine, providing a single platform to load data from different silos and search/query across all of that data
                • It uses an "Ask Anything" Universal Index where data is indexed as soon as it is loaded - so you can immediately begin asking questions of your data
                • You want built-in search in your database because it:
                  • Removes the need for a bolt-on search engine for full-text searches, unlike other databases
                  • Enables you to immediately search/discover any new data loaded into MarkLogic, while also keeping track of your data as you harmonize it
                  • Can be leveraged when building apps (both transactional and analytical) that require powerful queries to be run efficiently, as well as when you want to build Google-like search features into your application

                Documentation:

                What features are available with MarkLogic search?

                MarkLogic includes rich full-text search features. All of the search features are implemented as extension functions available in XQuery, and most of them are also available through the REST and Java interfaces. This section provides a brief overview of some of the main search features in MarkLogic and includes the following parts:

                • High Performance Full Text Search
                • Search APIs
                • Support for Multiple Query Styles
                • Full XPath Search Support in XQuery
                • Lexicon and Range Index-Based APIs
                • Alerting API and Built-Ins
                • Semantic Searches
                • Template Driven Extraction (TDE)
                • Where to Find Additional Search Information

                Documentation:

                KB Article:

                What are the various search APIs provided by MarkLogic?

                MarkLogic provides search features through a set of layered APIs.

                • The built-in, core, full-text search foundations are the XQuery cts:* and JavaScript cts.* APIs
                • The XQuery search:*, JavaScript jsearch.*, and REST APIs above this foundation provide a higher level of abstraction that enable rapid development of search applications.
                  • E.g.: The XQuery search:* API is built using cts:* features such as cts:search, cts:word-query, and cts:element-value-query.
                • On top of the REST API are the Java and Node.js Client APIs, which give users familiar with those interfaces access to the MarkLogic search features

                This diagram illustrates the layering of the Java, Node.js, REST, XQuery (search and cts), and JavaScript APIs.
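
                As a small illustration (not taken from the documentation), the same word search can be expressed at two of these layers:

                xquery version "1.0-ml";
                import module namespace search = "http://marklogic.com/appservices/search"
                    at "/MarkLogic/appservices/search/search.xqy";

                (
                  (: low-level cts API: returns the matching document nodes themselves :)
                  cts:search(fn:doc(), cts:word-query("marklogic"))[1 to 3],
                  (: higher-level Search API: returns a search:response with snippets, facets, and pagination :)
                  search:search("marklogic")
                )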

                Documentation:

                What happens if you decide to change your index settings after loading content?

                The index settings are designed to apply to an entire database and MarkLogic Server indexes records (or documents/fragments) on ingestion based on these settings. If you change any index settings on a database in which documents are already loaded:

                • If the “reindexer” setting on the database is enabled, reindexing happens automatically
                • Otherwise, one should force reindex through the “reindex” option on the database “configure” page or by reloading the data

                Since the reindexer operation is resource intensive, on a production cluster, consider scheduling the reindex during a time when your cluster is less busy.

                Additionally, as reindexing is resource intensive, you'll be best served by testing any index changes on a subset of your data (as reindexing subsets will be faster and use fewer resources), then only promoting those index changes to your full dataset once you're sure those index settings are the ones you'll want going forward.
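
                As an alternative to the database "configure" page, the reindexer setting can also be toggled through the Admin API. A minimal sketch, assuming the database is named Documents:

                xquery version "1.0-ml";
                import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

                let $config := admin:get-configuration()
                let $db := xdmp:database("Documents")
                return (
                  (: current reindexer setting for the database :)
                  admin:database-get-reindexer-enable($config, $db),
                  (: enable the reindexer and save the configuration :)
                  admin:save-configuration(admin:database-set-reindexer-enable($config, $db, fn:true()))
                )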

                Documentation:

                KB Article:

                What is the role of language baseline setting? What are the differences between legacy and ML9 settings?

                The language baseline configuration is for tokenization and stemming language support. The legacy language baseline setting is specified to allow MarkLogic to continue to use the older (MarkLogic 8 and prior versions) stemming and tokenization language support, whereas the ML9 setting would specify that the newer MarkLogic libraries (introduced in MarkLogic 9) are used.

                • If you upgrade to MarkLogic 9 or later from an earlier version of MarkLogic, your installation will continue to use the legacy stemming and tokenization libraries as the language baseline.
                • Any fresh installation of MarkLogic will use the new libraries. If necessary, you can change the baseline configuration using admin:cluster-set-language-baseline.

                Note: In most cases, stemming and tokenization will be more precise in MarkLogic 9 and later.
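
                A minimal sketch of checking and changing the baseline via the Admin API, assuming the setting values are "legacy" and "ml9" (note this is a cluster-wide change):

                xquery version "1.0-ml";
                import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

                let $config := admin:get-configuration()
                return (
                  (: current cluster-wide language baseline :)
                  admin:cluster-get-language-baseline($config),
                  (: switch to the MarkLogic 9 libraries and save the configuration :)
                  admin:save-configuration(admin:cluster-set-language-baseline($config, "ml9"))
                )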

                Documentation:

                What is the difference between unfiltered vs filtered searches?

                In a typical search:

                • MarkLogic Server will first do index resolution from the D-Nodes - which results in unfiltered search results. Note that unfiltered index resolution is fast but may include false-positive results
                • As a second step, the Server will then do filtering of those unfiltered search results on the E-Nodes to remove false positives from the above result set - which results in filtered search results. In contrast to unfiltered searches, filtered searches are slower but more accurate

                While searches are filtered by default, it is often also possible to explicitly perform a search unfiltered. In general, if search speed, scale, and accuracy are priorities for your application, you'll want to pay attention to your schemas and data models so unfiltered searches return accurate results without the need for the slower filtering step.
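
                For illustration, a simple sketch comparing the two modes with cts:search in XQuery (the search term is arbitrary; cts:search accepts "filtered" and "unfiltered" options):

                xquery version "1.0-ml";
                let $query := cts:word-query("marklogic")
                return (
                  (: default: results are filtered on the E-node to remove false positives :)
                  fn:count(cts:search(fn:doc(), $query)),
                  (: unfiltered: results come straight from index resolution and may include false positives :)
                  fn:count(cts:search(fn:doc(), $query, "unfiltered"))
                )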

                Documentation:

                KB Articles:

                Is filtering during a search bad?

                Filtering isn’t necessarily bad but:

                • It is still an extra step of processing and therefore not performant at scale
                • A bad data model often makes things even worse, because it will typically require unnecessary retrieval of large amounts of unneeded information during index resolution - all of which then has to be filtered on the E-nodes

                To avoid performance issues with respect to filtering, try:

                • Adding additional indexes
                • Improving your data model to more easily index/search without filtering
                • Structuring documents and configuring indexes to maximize both query accuracy and speed through unfiltered index resolution alone

                Documentation:

                KB Articles:

                What is the difference between cts.search vs jsearch?

                • cts.search() runs filtered by default.
                • JSearch runs unfiltered by default.
                  • JSearch can enable filtering by chaining the filter() method when building the query: http://docs.marklogic.com/DocumentsSearch.filter

                Note: Filtering is not performant at scale, so the better approach is to tune your data model and indexes such that filtering is not necessary.

                Documentation:

                What is the difference between Stemmed Searches vs Unstemmed (word) searches?


                Stemmed

                • Controls whether searches return relevance ranked results by matching word stems. A word stem is the part of a word that is common to all of its inflected variants. For example, in English, "run" is the stem of "run", "runs", "ran", and "running".
                • A stemmed search returns more matching results than the exact words specified in the query: it finds the same terms as an unstemmed search, plus terms that derive from the same meaning and part of speech as the search term. For example, a stemmed search for run returns results containing run, running, runs, and ran.
                • Stemmed search indexes take up less disk space than the word search (unstemmed) indexes.

                Unstemmed

                • Enables MarkLogic Server to return relevance ranked results which match exact words in text elements.
                • Unstemmed searches return exact word-only matches.

                You have to decide based on your application requirements if the cost of creating extra indexes is worthwhile for your application, and whether you can fulfill the same requirements without some of the indexes.

                Documentation:

                What is the difference between fn:count and xdmp:estimate?

                In general, if fast accurate counts are important to your application, you’ll want to use xdmp:estimate with a data model that will allow for accurate counts directly from the indexes


                fn:count

                • Provided by XQuery as a general-purpose function
                • Processes the answer by inspecting data directly, causing heavy I/O load
                • Counts the actual number of items in the sequence
                • fn:count is accurate
                • The general-purpose nature of fn:count makes it difficult to optimize

                xdmp:estimate

                • Provided by MarkLogic Server as an efficient way to approximate fn:count
                • Computes its answer directly from indexes
                • Returns the number of matching fragments
                • xdmp:estimate is fast
                • Puts the decision to optimize counting through the use of indexes in the hands of the developer
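
                A quick sketch of the difference (the search term is arbitrary; the two values agree only when the query can be resolved accurately from the indexes alone):

                xquery version "1.0-ml";
                let $query := cts:word-query("marklogic")
                return (
                  (: counts matching fragments directly from the indexes - fast :)
                  xdmp:estimate(cts:search(fn:doc(), $query)),
                  (: counts the actual items, retrieving and inspecting them - accurate but expensive :)
                  fn:count(cts:search(fn:doc(), $query))
                )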

                Documentation:

                KB Article:

                How do data models affect Search?

                Some data model designs pull lots of unnecessary data from the indexes with every query. That means your application will:

                • Need to do a lot of filtering on the E-nodes
                • Use more CPU cycles on the E-nodes to do that filtering
                • Pull lots of position information from the indexes even with filtering disabled - which means using lots of CPU on the E-nodes to evaluate which positions are correct (and unlike filtering, position processing can't be toggled on/off)
                • Be more likely to hit CACHEFULL errors, since more data is being retrieved

                How you represent your data heavily informs the speed, accuracy, and ease of construction of your queries. If your application needs to perform and/or scale, its data model is the first and most important thing on which to focus

                Documentation:

                KB Articles:

                How do I optimize my application’s queries?

                There are several things to consider when looking at query performance:

                • How fast does performance need to be for your application?
                • What indexes are defined for the database?
                • Is your code written in the most efficient way possible?
                • Can range indexes and lexicons speed up your queries?
                • Are your server parameters set appropriately for your system?
                • Is your system sufficiently large for your needs?
                • Access patterns and resource requirements differ for analytic workloads

                Here is a checklist for optimizing query performance:

                • Is your query running in “Accidental” update mode?
                • Are you running cts:search unfiltered?
                • Profile your code
                • Use indexes when appropriate
                • Optimize cts:search using indexes
                • Tuning queries with query-meters and query-trace

                Documentation:

                Blog:

                KB Article:

                How to ensure wildcard searches are fast?

                The following database settings can affect the performance and accuracy of wildcard searches:

                • word lexicons
                • element, element attribute, and field word lexicons. (Use an element word lexicon for a JSON property).
                • three character searches, two character searches, or one character searches. You do not need one or two character searches if three character searches is enabled.
                • three character word positions
                • trailing wildcard searches, trailing wildcard word positions, fast element trailing wildcard searches
                • fast element character searches

                The three character searches index combined with a word lexicon provides the best performance for most wildcard queries, and the fast element character searches index is useful when you submit element queries. One and two character searches indexes are only used if you submit wildcard searches that try to match only one or two characters and you do not have the combination of a word lexicon and the three character searches index. Because one and two character searches generally return a large number of matches and result in much larger index storage footprints, they are usually not worth the subsequent disk space and load time trade-offs for most applications.
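
                For example, a trailing wildcard search expressed with the cts API (the search term is arbitrary):

                xquery version "1.0-ml";
                (: with trailing wildcard searches (or three character searches plus a word lexicon)
                   enabled, this resolves efficiently from the indexes :)
                cts:search(fn:doc(), cts:word-query("run*", "wildcarded"))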

                Lastly, consider using query plans to help optimize your queries. You can learn more about query optimization by consulting our Query Performance and Tuning Guide

                Documentation:

                Blog:

                What are the factors that affect relevance score calculations?

                The score is a number that is calculated based on:

                • Statistical information, including the number of documents in the database
                • The frequency with which the search terms appear in the database
                • The frequency with which the search term appears in the document

                The relevance of a returned search item is determined based on its score compared with other scores in the result set, where items with higher scores are deemed to be more relevant to the search.

                By default, search results are returned in relevance order, so changing the scores can change the order in which search results are returned.
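
                You can inspect the scores that drive this ordering with cts:score; a small sketch (the search term is arbitrary):

                xquery version "1.0-ml";
                (: results come back in relevance order; cts:score exposes the score of each one :)
                for $result in cts:search(fn:doc(), cts:word-query("marklogic"))[1 to 5]
                return xdmp:node-uri($result) || " : " || cts:score($result)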

                Documentation:

                KB Article:

                How do I restrict my searches to only parts of my documents (or exclude parts of my documents from searches altogether)?

                MarkLogic Server has multiple ways to include/exclude parts of documents from searches.

                At the highest level, you can apply these restrictions globally by including/excluding elements in word queries. Alternatively (and preferably), you can also define specific fields, which are a mechanism designed to restrict searches to specifically targeted elements within your documents.

                KB Article:

                How do I specify that the match must be restricted to the top level attributes of my JSON document?

                You can configure fields in the database settings that are used with the cts:field-word-query, cts:field-words, and cts:field-word-match APIs, as well as with the field lexicon APIs in order to fetch the desired results. 

                You can create a field for each top-level JSON property you want to match with indexes. In the field specification you should use a path expression /property-name for the top-level property "property-name". Then use field queries to match the top level property.

                Depending on your use-case, this could be an expensive operation due to the indexes involved resulting in slower document loads and larger database files.
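
                As an illustrative sketch, assuming a path field named "top-title" (a hypothetical name) has been configured on the database with the path /title, a field query restricted to that top-level property would look like this:

                xquery version "1.0-ml";
                (: "top-title" is a hypothetical field covering only the top-level "title" property :)
                cts:search(fn:doc(), cts:field-word-query("top-title", "report"))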

                Documentation:

                How to resolve "Searches not enabled" error?

                Make sure proper indexes are in place and there are no reindexing-related errors

                Documentation:



                Introduction: the decimal type

                In order to be compliant with the XQuery specification and to satisfy the needs of customers working with financial data, MarkLogic Server implements a decimal type, available in XQuery and server-side JavaScript.

                The decimal type has been implemented for very specific requirements: decimals have about a dozen more bits of precision than doubles, but they take up more memory and arithmetic operations over them are much slower.

                Use the double where possible

                Unless you have a specific requirement to use the decimal data type, in most cases it's better and faster to use the double data type to represent large numbers.
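
                One way to see the trade-off in Query Console: numeric literals with a decimal point are xs:decimal in XQuery, and they avoid the binary rounding that doubles show. A small sketch:

                xquery version "1.0-ml";
                (
                  0.1 + 0.2,                           (: xs:decimal arithmetic: exactly 0.3 :)
                  xs:double("0.1") + xs:double("0.2")  (: double arithmetic: 0.30000000000000004 :)
                )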

                Specific details about the decimal data type

                If you still want or need to use a decimal data type below are its limitations and details on how exactly it is implemented in MarkLogic Server:

                o   Precision

                • How many decimal digits of precision does it have?

                The MarkLogic implementation of xs:decimal representation is designed to meet the XQuery specification requirements to provide at least 18 decimal digits of precision. In practice, up to 19 decimal digits can be represented with full fidelity.

                • If it is a binary number, how many binary digits of precision does it have?

                 A decimal number is represented inside MarkLogic with 64 binary bits of digits and an additional 64 bits of sign and a scale (specifies where the decimal point is).

                • What are the exact upper and lower bounds of its precision?

                -18446744073709551615 to 18446744073709551615 

                Any operation producing a number outside this range will result in an XDMP-DECOVRFLW (decimal overflow) error.

                o   Scale

                • Does it have a fixed scale or floating scale?

                It has a floating scale.

                • What are the limitations on the scale?

                -20 to 0

                So the smallest non-zero magnitude you can represent is 1 * (10^-20) and the largest is 18446744073709551615

                • Is the scale binary or decimal?

                Decimal

                • How many decimal digits can it scale?

                20

                • How many binary digits can it scale?

                N/A

                • What is the smallest number it can represent and the largest?

                smallest: -18446744073709551615, i.e. -(2^64 - 1)
                closest to zero: 1*(10^-20)
                largest: 18446744073709551615, i.e. (2^64 - 1)

                • Are all integers safe or does it have a limited safe range for integers?

                It can represent 64 bit unsigned integers with full fidelity.

                 

                o   Limitations

                • Does it have binary rounding errors?

                The division algorithm on Linux in particular does convert to an 80-bit binary floating point representation to calculate reciprocals - which can result in binary rounding errors. Other arithmetic algorithms work solely in base 10.

                • What numeric errors can it throw and when?

                Overflow: Number is too big or small to represent
                Underflow: Number is too close to zero to represent
                Loss of precision: The result has too many digits of precision (essentially the 64bit digits value has overflowed)

                • Can it represent floating point values, such as NaN, -Infinity, +Infinity, etc.?

                 No

                o   Implementation

                • How is the DECIMAL data type implemented?

                It has a representation with 64 bits of digits, a sign, and a base 10 negative exponent (fixed to range from -20 to 0). So the value is calculated like this:

                sign * digits * (10 ^ -exponent)

                • How many bytes does it consume?

                On disk, for example in triple indexes, it's not a fixed size as it uses integer compression. At maximum, the decimal scalar type consumes 16 bytes per value: eight bytes of digits, four bytes of sign, and four bytes of scale. It is not space efficient but it keeps the digits aligned on eight-byte boundaries.

                Summary

                A database or forest backup in MarkLogic Server may be significantly slower than just performing a file copy (cp in Linux).  Why is this so?

                Details

                Using cp on very large files on a large-memory Linux system can produce huge amounts of dirty pages, which can saturate i/o channels for minutes in order to flush data to the disk. cp also doesn't wait for the data to be written before returning. As a result, cp is very unfriendly to other applications running on the same system.

                When MarkLogic Server performs a backup, it works hard not to saturate any subsystem or resource. MarkLogic takes care that the number of dirty pages at any one time is never very large, and it keeps the i/o queues short so that any concurrent database queries and updates are not significantly impacted by the backup. Finishing the backup in the fastest possible time is not the priority. 

                Can I make it go faster?

                Yes, there is a diagnostic trace event “Unthrottle Backup” that turns off throttling in MarkLogic. However, even with throttling turned off, MarkLogic will still work to keep the number of dirty pages low.

                The diagnostic trace event can be enabled from the MarkLogic Server Admin UI by navigating to Configure > Groups > {group-name} > Diagnostic, setting trace events activated to true, adding "Unthrottle Backup" (without quotes) to the trace events list, and pressing "ok".

                Introduction

                MarkLogic automatically provides 

                • ANSI REPEATABLE READ level of isolation for update transactions, and 
                • Serializable isolation for read-only (query) transactions.

                MarkLogic can be made to provide ANSI SERIALIZABLE isolation for update transactions, but doing so requires developers to manage their own predicate locks.

                Isolation Levels - Background

                There are many possible levels of isolation, and many different taxonomies of isolation levels. The most common taxonomy (familiar to those with a RDBMS background) is the one defined by ANSI SQL, which defines four levels of isolation based on read phenomena that are possible at each level. ANSI has a definition for each phenomenon, but these definitions are open to interpretation. Broad interpretation results in more rigorous criteria for each isolation level (and therefore better isolation at each level), whereas strict interpretation results in less rigorous isolation at each level. Here I’ll use a shorthand notation to describe these phenomena, and will use the broad rather than the strict interpretation. The notation specifies the operation, the transaction performing the operation, and the item or domain on which the operation is performed. Operations in my notation are:

                • Write (w)
                • Read (r)
                • Commit (c)
                • Abort/rollback (a)

                An example of this shorthand: w1[x] means transaction1 writes to item x.

                Now the phenomena:

                • A dirty read happens when a transaction T2 reads an item that is being written by concurrently running transaction T1. In other words: w1[x]…r2[x]…((c1 or a1) and (c2 or a2) in any order). This phenomenon could lead to an anomaly in the case where T1 later aborts, and T2 has then read a value that never existed in the database.
                •  A non-repeatable read happens when a transaction T2 writes an item that was read by a transaction T1 prior to T1 completing. In other words: r1[x]…w2[x]…((c1 or a1) and (c2 or a2) in any order). Non-repeatable reads don’t produce the same anomalies as dirty reads, but can produce errors in cases where T1 relies on the value of x not changing between statements in a multi-statement transaction (e.g. reading and then updating a bank account balance).
                • A phantom read happens when a transaction T1 retrieves a set of data items matching some search condition and concurrently running transaction T2 makes a change that modifies the set of items that match that condition. In other words: (r1[P] and w2[x in P] in any order)…((c1 or a1) and (c2 or a2) in any order), where P is a set of results. Phantom reads are usually less serious than dirty or non-repeatable reads because it generally doesn’t matter if item x in P is written before or after T1 finishes unless T1 is itself explicitly reading x. And in this case the phenomenon would no longer be a phantom, but would instead be a dirty or non-repeatable read per the definitions above. That said, there are some cases where phantom reads are important.

                 The isolation levels ANSI defines are based on which of these three phenomena are possible at that isolation level. They are:

                • READ UNCOMMITTED – all three phenomena are possible at this isolation level.
                • READ COMMITTED – Dirty reads are not possible, but non-repeatable and phantom reads are.
                • REPEATABLE READ – Dirty and non-repeatable reads are not possible, but phantom reads are.
                • SERIALIZABLE – None of the three phenomena are possible at this isolation level.

                Note that as defined above, ANSI SERIALIZABLE is not sufficient for transactions to be truly serializable (in the sense that running them concurrently and running them in series would in all cases produce the same result), so SERIALIZABLE is an unfortunate choice of names for this isolation level, but that’s what ANSI called it.

                Update Transaction Locks

                Typically, a DBMS will avoid dirty and non-repeatable reads by taking locks on records (called item locks). Locks are either shared locks (which can be held by more than one transaction) or exclusive locks (which can be held by only one transaction at a time). In most DBMSes (including MarkLogic), locks taken when reading an item are shared and locks taken when writing an item are exclusive.

                MarkLogic prevents dirty and non-repeatable reads in update transactions by taking item locks on items that are being read or written during a transaction and releasing those locks only on completion of the transaction (post-commit or post-abort). When a transaction needs to lock an item on which another transaction has an exclusive lock, that transaction waits until either the lock is released or the transaction times out. Deadlock detection prevents cases where two transactions are waiting on each other for exclusive locks. In this case one of the transactions will abort and restart.

                In addition, MarkLogic prevents some types of phantom reads by taking item locks on the set of items in a search result. This prevents phantom reads involving T2 removing an item in a set that T1 previously searched, but does not prevent phantom reads involving T2 inserting an item in a set that T1 previously searched, or those involving T2 searching for items and seeing a deletion caused by T1.

                Avoiding All Phantom Reads

                To avoid all phantom reads via locking, it is necessary to take locks not just on items that currently match the search criteria, but also on all items that could match the search criteria, whether they currently exist in the database or not. Such locks are called predicate locks. Because you can search for pretty-much anything in MarkLogic, guaranteeing a predicate lock for arbitrary searches would require locking the entire database. From a concurrency and throughput perspective, this is obviously not desirable. MarkLogic therefore leaves the decision to take predicate locks and the scope of those locks in the hands of application developers. Because the predicate domain can frequently be narrowed down with some application-specific knowledge, this provides the best balance between isolation and concurrency. To take a predicate lock, you lock a synthetic URI representing the predicate domain in every transaction that reads from or writes to that domain. You can take shared locks on a synthetic URI via fn:doc(URI). Exclusive locks are taken via xdmp:lock-for-update(URI).
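
                A minimal sketch of the pattern (the synthetic URI and the domain it represents are hypothetical; every transaction that reads from or writes to the domain must use the same URI):

                xquery version "1.0-ml";
                (: both lock styles are shown for illustration; a given transaction would normally
                   take only the one it needs on the shared synthetic URI :)

                (: shared (read) lock on the predicate domain - the document does not need to exist :)
                fn:doc("/predicate-locks/orders-for-customer-42"),

                (: exclusive (write) lock, for transactions that modify the domain :)
                xdmp:lock-for-update("/predicate-locks/orders-for-customer-42"),

                (: ... followed by the transaction's normal reads and writes within that domain :)
                xdmp:document-insert("/orders/customer-42/order-1001.xml", <order customer="42"/>)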

                Note that predicate locks should only be taken in situations where phantom reads are intolerable. If your application can get by with REPEATABLE READ isolation, you should not take predicate locks, because any additional locking results in additional serialization and will impact performance.

                Summary

                To summarize, MarkLogic automatically provides ANSI REPEATABLE READ level of isolation for update transactions and true serializable isolation for read-only (query) transactions. MarkLogic can be made to provide ANSI SERIALIZABLE isolation for update transactions, but doing so requires developers to manage their own predicate locks.

                Introduction

                Rosetta 2 is a seamless, very efficient emulator designed to bridge the transition between Intel and Apple Silicon processors (e.g. M1[x]). The first time you launch a Mac app on an Apple Silicon computer, you might be asked to install the Rosetta component to open it. 
                Currently, when installing the MarkLogic Server DMG (pkg) on Apple Silicon macOS, you will be blocked by the following error:
                "MarkLogic Server ([version]) can't be installed on this computer.
                MarkLogic Server requires an Intel processor."
                The error above is caused by a macOS system call that MarkLogic uses to verify that it is running on an Intel processor. This legacy check was required when Apple was transitioning from PowerPC to Intel CPUs (announced in June 2005, Rosetta 1 emulation). MarkLogic Server has never been available for PowerPC-based Apple computers. In order to install MarkLogic's Intel package on Apple Silicon, the legacy check has to be removed from the installation script.

                Procedure

                *1. Open a Terminal [0] and install Rosetta2 emulation software.

                $ softwareupdate --install-rosetta

                Note: For additional information, please check the official Apple Rosetta 2 article. [1]
                [1] https://support.apple.com/en-us/HT211861  
                * Step not required if Rosetta 2 is already installed for other Intel-based applications.

                2. Download any ML server DMG from the ML-DMC website [2]
                [2] https://developer.marklogic.com/products/marklogic-server  

                3. Mount the DMG and copy the install package to a writable temporary location in the local filesystem

                $ cp -R /Volumes/MarkLogic/ /Users/[your_user_name]/tmp

                4. In a Terminal window, edit Contents/Resources/InstallationCheck in a text editor (e.g. vim or nano)

                $ vim /Users/[your_username]/tmp/MarkLogic-[downloaded_package_version].pkg/Contents/Resources/InstallationCheck 

                Note: As an alternative, in the GUI-Finder, right-click and "Show Package Contents”. Navigate to “Contents/Resources/“, and edit the file “InstallationCheck” with a GUI text editor.

                5. Delete or comment out the block below (lines 46-52 of the script) in "InstallationCheck":

                 46 echo "Checking for Intel CPU"
                 47 if [[ $CPU_TYPE != "7" ]] ;
                 48    then
                 49    echo "MarkLogic Server requires a CPU with an Intel instruction set."
                 50    exit 114;     # displays message 18
                 51 fi
                 52 echo "$CPU_NAME is an Intel CPU."

                Save the file and back out of the folder.

                6. Install the MarkLogic package from the GUI Finder or CLI as intended. [3]
                [3] https://docs.marklogic.com/guide/installation/procedures#id_28962 

                Conclusions
                • The procedure in this knowledge base article allows you to install MarkLogic Server on macOS Rosetta 2 - Apple Silicon M1 / M[x].
                • macOS is supported for development only. Conversion (Office and PDF) and entity enrichment are not available on macOS. [4]
                • The legacy installation check is removed starting with the MarkLogic 10.0-10+ release.
                • Even with the legacy check removed, the Rosetta 2 emulation software will still be required until an official native M1 / M[x] MarkLogic Server package is available.

                References
                [0] https://support.apple.com/guide/terminal/open-or-quit-terminal-apd5265185d-f365-44cb-8b09-71a064a42125/ 
                [1] https://support.apple.com/en-us/HT211861  

                [2] https://developer.marklogic.com/products/marklogic-server  
                [3] https://docs.marklogic.com/guide/installation/procedures#id_28962 
                [4] https://docs.marklogic.com/guide/installation/intro#id_63469 

                Summary

                Text is stored in MarkLogic Server in Unicode NFC normalized form.

                Discussion

                In MarkLogic Server, all text is converted into Unicode NFC normalized form before tokenization and storage. 

                Unicode considers NFC-compatible characters to be essentially equivalent. See the Unicode normalization FAQ and Conformance Requirements in the Unicode Standard.

                Example

                For example, consider the NFC equivalence of the codepoints U+2126 (OHM SIGN) and U+03A9 (GREEK CAPITAL LETTER OMEGA). This is shown for the U+2126 entry in the Unicode code chart for the U2100 block.

                You can see the effects of normalization alone, and during tokenization, by running the following in MarkLogic Server's Query Console:

                xquery version "1.0-ml";
                (: equivalence of Ω forms :)
                let $s := fn:codepoints-to-string (xdmp:hex-to-integer ('2126'))
                let $token := cts:tokenize ($s)
                return (
                    'original: '||xdmp:integer-to-hex (fn:string-to-codepoints ($s)),
                    'normalized: '||xdmp:integer-to-hex (fn:string-to-codepoints (fn:normalize-unicode ($s, 'NFC'))),
                    'tokenized: '||xdmp:describe ($token, (), ())
                )
                

                The results show the original value, the normalized value, and the resulting token:

                original: 2126
                normalized: 3a9
                tokenized: cts:word("&#x03a9;")
                

                Abstract

                In MarkLogic Server version 9, the default tokenization and stemming code has been changed for all languages (except English tokenization). Some tokenization and stemming behavior will change between MarkLogic 8 and MarkLogic 9. We expect that, in most cases, results will be better in MarkLogic 9.

                Information is given for managing this change in the Release Notes at Default Stemming and Tokenization Libraries Changed for Most Languages, and for further related features at New Stemming and Tokenization.

                In-depth discussion is provided below for those interested in details.

                General Comments on Incompatibilities

                General implications of tokenization incompatibilities

                If you do not reindex, old content may no longer match the same searches, even for unstemmed searches.

                General tokenization incompatibilities

                There are some edge-case changes in the handling of apostrophes in some languages; in general this is not a problem, but specific words may now include an apostrophe in a token, or break at one, where they previously did not.

                Tokenization is generally faster for all languages except English and Norwegian (which use the same tokenization as before).

                General implications of stemming incompatibilities

                Where there is only one stem, and it is now different: old data will not match stemmed searches without reindexing, even for the same word.

                Where the new stems are more precise: content that used to match a query may not match any more, even with reindexing.

                Where there are new stems, but the primary stem is unchanged: content that used to not match a query may now match it with advanced stemming or above. With basic stemming there should be no change.

                Where the decompounding is different, but the concatenation of the components is the same:  Under decompounding, content may match a query when it used to not match, or may not match a query when it used to match, when the query or content involves something with one of the old/new components. Matching under advanced or basic stemming would be generally the same.

                General stemming incompatibilities

                • MarkLogic now has general algorithms backing up explicit stemming dictionaries.  Words not found in the default dictionaries will sometimes be stemmed when they previously were not.
                • Diminutives/augmentatives are not usually stemmed to base form.
                • Comparatives/superlatives are not usually stemmed to base form.
                • There are differences in the exact stems for pronoun case variants.
                • Stemming is more precise and restricted by common usage. For example, if the past participle of a verb is not usually used as an adjective, then the past participle will not be included as an alternative stem. Similarly, plural forms that only have technical or obscure usages might not stem to the singular form.
                • Past participles will typically include the past participle as an alternative stem.
                • The preferred order of stems is not always the same: this will affect search under basic stemming.
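
                To check how a specific word stems under the new libraries, you can run cts:stem (available in MarkLogic 9) in Query Console; the sample words below are arbitrary examples, and comparing the output against the behavior of your MarkLogic 8 queries is a quick way to spot the incompatibilities described above.

                xquery version "1.0-ml";
                (: list the stems MarkLogic 9 assigns to a word in a given language;
                   the sample words are arbitrary examples :)
                'en: ' || fn:string-join(cts:stem("mice", "en"), ", "),
                'de: ' || fn:string-join(cts:stem("Häuser", "de"), ", ")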

                Reindexing

                It is advisable to reindex to be sure there are no incompatibilities. Where the data in the forests (tokens or stems) does not match the current behavior, reindexing is recommended. This will have to be a forced reindex or a reload of specific documents containing the offending data. For many languages this can be avoided if queries do not touch on specific cases. For certain languages (see below) the incompatibility is great enough that it is essential to reindex.
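
                As a minimal sketch of the "reload specific documents" approach, the following re-saves every document in a collection so it is re-tokenized and re-stemmed with the MarkLogic 9 libraries. It assumes the URI lexicon is enabled; the collection name is hypothetical, and for very large document sets the work should be batched. A full forced reindex is instead triggered from the database configuration.

                xquery version "1.0-ml";
                (: re-save selected documents so they are re-tokenized and re-stemmed;
                   permissions, collections and quality are carried over explicitly :)
                for $uri in cts:uris((), (), cts:collection-query("japanese-content"))
                return xdmp:document-insert(
                  $uri, fn:doc($uri),
                  xdmp:document-get-permissions($uri),
                  xdmp:document-get-collections($uri),
                  xdmp:document-get-quality($uri))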

                Language Notes

                Below we give some specific information and recommendations for various languages.

                Arabic

                stemming

                The Arabic dictionaries are much larger than before. Implications:  (1) better precision, but (2) slower stemming.

                Chinese (Simplified)

                tokenization

                Tokenization is broadly incompatible.

                The new tokenizer uses a corpus-based language model.  Better precision can be expected.

                recommendation

                Reindex all Chinese (simplified).

                Chinese (Traditional)

                tokenization

                Tokenization is broadly incompatible.

                The new tokenizer uses a corpus-based language model.  Better precision can be expected.

                recommendation

                Reindex all Chinese (traditional).

                Danish

                tokenization

                This language now has algorithmic stemming, and may have slight tokenization differences around certain edge cases.

                recommendation

                Reindex all Danish content if you are using stemming.

                Dutch

                stemming

                There will be much more decompounding in general, but MarkLogic will not decompound certain known lexical items (e.g., "bastaardwoorden").

                recommendation

                Reindex Dutch if you want to query with decompounding.

                English

                stemming

                Words with British spelling variants may include the British variant as an additional stem, although the first stem will still be the US variant.

                Stemming produces more alternative stems. Implications are (1) stemming is slightly slower and (2) index sizes are slightly larger (with advanced stemming).

                Finnish

                tokenization

                This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.

                recommendation

                Reindex all content in this language if you are using stemming.

                French

                See general comments above.

                German

                stemming

                Decompounding now applies to more than just pure noun combinations. For example, it applies to "noun plus adjectives" compound terms. Decompounding is more aggressive, which can result in identification of more false compounds. Implications: (1) stemming is slower, (2) decompounding takes more space, and (3) for compound terms, search gives better recall, with some loss of precision.

                recommendation

                Reindex all German.

                Hungarian

                tokenization

                This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.

                recommendation

                Reindex all content in this language if you are using stemming.

                Italian

                See general comments above.

                Japanese

                tokenization

                Tokenization is broadly incompatible.

                The tokenizer provides internal flags that the stemmer requires.  This means that (1) tokenization is incompatible for all words at the storage level due to the extra information and (2) if you install a custom tokenizer for Japanese, you must also install a custom stemmer.

                stemming

                Stemming is broadly incompatible.

                recommendation

                Reindex all Japanese content.

                Korean

                stemming

                Particles (e.g., 이다) are dropped from stems; they used to be treated as components for decompounding.

                There is different stemming of various honorific verb forms.

                North Korean variants are not in the dictionary, though they may be handled by the algorithmic stemmer.

                recommendation

                Reindex Korean unless you use decompounding.

                Norwegian (Bokmal)

                stemming

                Previously, hardly any decompounding was in evidence; now it is pervasive.

                Implications: (1) stemming is slower, (2) decompounding takes more space, and (3) search gives better recall, with some loss of precision, at least where it comes to compounds.

                recommendation

                Reindex Bokmal if you want to query with decompounding.

                Norwegian (Nynorsk)

                stemming

                Previously hardly any decompounding was in evidence; now it is pervasive.

                Implications: (1) stemming is slower, (2) decompounding takes more space, and (3) search gives better recall, with some loss of precision, at least where it comes to compounds.

                recommendation

                Reindex Nynorsk if you want to query with decompounding.

                Norwegian (generic 'no')

                stemming

                Previously 'no' was treated as an unsupported language; now it is treated as both Bokmal and Nynorsk: for a word present in both dialects, all stem variants from both will be present.

                recommendation

                Do not use 'no' unless you really must; reindex if you want to query it.

                Persian

                See general comments above.

                Portuguese

                stemming

                More precision with respect to feminine variants (e.g., ator vs atriz).

                Romanian

                tokenization

                This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.

                recommendation

                Reindex all content in this language if you are using stemming.

                Russian

                stemming

                Inflectional variants of cardinal or ordinal numbers are no longer stemmed to a base form.

                Inflectional variants of proper nouns may stem together due to the backing algorithm, but it will be via affix-stripping, not to the nominal form.

                Stems for many verb forms used to be the perfective form; they are now the simple infinitive.

                Stems used to drop ё but now preserve it.

                recommendation

                Reindex all Russian.

                Spanish

                See general comments above.

                Swedish

                stemming

                Previously hardly any decompounding was in evidence; now it is pervasive.

                Implications: (1) stemming is slower, (2) decompounding takes more space, and (3) search gives better recall, with some loss of precision, at least where it comes to compounds.

                recommendation

                Reindex Swedish if you want to query with decompounding.

                Tamil

                tokenization

                This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.

                recommendation

                Reindex all content in this language if you are using stemming.

                Turkish

                tokenization

                This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.

                recommendation

                Reindex all content in this language if you are using stemming.

                What is MarkLogic Data Hub?

                MarkLogic’s Data Hub increases data integration agility, in contrast to time consuming upfront data modeling and ETL. Grouping all of an entity’s data into one consolidated record with that data’s context and history, a MarkLogic Data Hub provides a 360° view of data across silos. You can ingest your data from various sources into the Data Hub, standardize your data - then more easily consume that data in downstream applications. For more details, please see our Data Hub documentation.

                Note: Prior to version 5.x, Data Hub was known as the Data Hub Framework (DHF)

                Takeaways:

                • In contrast to previous versions, Data Hub 5 is largely configuration-based. Upgrading to Data Hub 5 will require either:
                  • Conversion of legacy flows from the code-based approach of previous versions to the configuration-based format of Data Hub 5
                  • Executing your legacy flows with the “hubRunLegacyFlow” Gradle task
                • It’s very important to verify the “Version Support” information on the Data Hub GitHub README.md before installing or upgrading to any major Data Hub release

                Pre-requisites:

                One of the pre-requisites for installing Data Hub is to check for the supported/compatible MarkLogic Server version. For details, see our version compatibility matrix. Other pre-requisites can be seen here.

                New installations of Data Hub

                We always recommend installing the latest Data Hub version compatible with your current MarkLogic Server version. For example:

                • If a customer is running MarkLogic Server 9.0-7, they should install the most recent compatible Data Hub version (5.0.2), even if previous Data Hub versions (such as 5.0.1, 5.0.0, 4.x and 3.x) also work with server version 9.0-7.

                • Similarly, if a customer is running 9.0-6, the recommended Data Hub version would be 4.3.1 instead of previous versions 4.0.0, 4.1.x, 4.2.x and 3.x.

                Note: A specific MarkLogic server version can be compatible with multiple Data Hub versions and vice versa, which allows independent upgrades of either Data Hub or MarkLogic Server.

                 

                Upgrading from a previous version

                1. To determine your upgrade path, first find your current Data Hub version in the “Can upgrade from” column in the version compatibility matrix.
                2. While Data Hub should generally work with future server versions, it’s always best to run the latest Data Hub version that's also explicitly listed as compatible with your installed MarkLogic Server version.
                3. If required, make sure to upgrade your MarkLogic Server version to be compatible with your desired Data Hub version. You can upgrade MarkLogic Server and Data Hub independently of each other as long as you are running a version of MarkLogic Server that is compatible with the Data Hub version you plan to install. If you are running an older version of MarkLogic Server, then you must upgrade MarkLogic Server first, before upgrading Data Hub.

                Note: Data Hub is not designed to be 'backwards' compatible with any version before the MarkLogic Server version listed with the release. For example, you can’t use Data Hub 3.0.0 on 9.0-4 – you’ll need to either downgrade to Data Hub 2.0.6 while staying on MarkLogic Server 9.0-4, or alternatively upgrade MarkLogic Server to version 9.0-5 while staying on Data Hub 3.0.0.

                • Example 1 - Scenario where you DO NOT NEED to upgrade MarkLogic Server:

                         

                • Current Data Hub version: 4.0.0
                • Target Data Hub version: 4.1.x
                • ML server version: 9.0-9
                • The “Can upgrade from” value for the target version shows 2.x, which means you need to be on at least Data Hub 2.x. Since the current Data Hub version is 4.0.0, this requirement has been met.
                • Unless there is a strong reason for choosing 4.1.x, we highly recommend upgrading to the latest 4.x version compatible with MarkLogic Server 9.0-9 - which in this example is 4.3.2. Consequently, the recommended upgrade path here becomes 4.0.0-->4.3.2 instead of 4.0.0-->4.1.x.
                • Since 9.0-9 is supported by the recommended Data Hub version 4.3.2, there is no need to upgrade MarkLogic Server.
                • Hence, the recommended path is Data Hub 4.0.0-->4.3.2

                 

                • Example 2 - Scenario where you NEED to upgrade MarkLogic Server:

                           

                • Current Data Hub version: 3.0.0
                • Target Data Hub version: 5.0.2
                • ML server version: 9.0-6
                • The “Can upgrade from” value for the target version shows Data Hub version 4.3.1, which means you need to be on at least 4.3.x (4.3.1 or 4.3.2 depending on your MarkLogic Server version). Since the current Data Hub version 3.0.0 doesn’t satisfy this requirement, the upgrade path after this step becomes Data Hub 3.0.0-->4.3.x
                • As per the matrix, the latest compatible Data Hub version for 9.0-6 is 4.3.1, so the path becomes 3.0.0-->4.3.1
                • From the matrix, the minimum supported MarkLogic Server version for 5.0.2 is 9.0-7, so you will have to upgrade your MarkLogic Server version before upgrading your Data Hub version to 5.0.2.
                • Because 9.0-7 is supported by all 3 versions under consideration (3.0.0, 4.3.1 and 5.0.2), the recommended path can be either:
                  1. Upgrade Data Hub from 3.0.0 to 4.3.1-->upgrade MarkLogic Server to at least 9.0-7-->upgrade Data Hub to 5.0.2
                  2. Upgrade MarkLogic Server to at least 9.0-7-->upgrade Data Hub from 3.0.0 to 4.3.1-->upgrade Data Hub to 5.0.2
                • Recall that Data Hub 5 moved to a configuration-based approach from previous versions’ code-based approach. Upgrading to Data Hub 5 from a previous major version will require either:
                  • Conversion of legacy flows from the code-based approach of previous versions to the configuration-based format of Data Hub 5
                  • Executing your legacy flows with the “hubRunLegacyFlow” Gradle task

                Links for Reference:

                https://docs.marklogic.com/datahub/upgrade.html

                Question

                Answer

                Further Reading

                What are the maximum and minimum number of nodes a MarkLogic Cluster can have?

                Minimum: 1 node (3 nodes if you want high availability)

                Optimum: ~64 nodes

                Maximum: 256 nodes

                KB Articles:

                Documentation:

                Are all nodes created equal in MarkLogic?

                In MarkLogic, how a node is configured, provisioned, and scaled depends on the type of that node and what roles it might serve:

                • A single node can act as an e-node, d-node, or both ("e/d-node")
                • With respect to high availability/failover, any one node serves as both primary host (for its assigned data forests) and failover host (for its assigned failover forests)
                • With respect to disaster recovery/replication, nodes can serve as either hosts for primary data forests in the primary cluster, or as hosts for replica forests in the replica cluster
                • Bootstrap hosts are used to establish an initial connection to foreign clusters during database replication. Only the nodes hosting your security forests (both primary security forests as well as their local disk failover copies) need to be bootstrap hosts

                KB Articles:

                Documentation:

                Can I have nodes with mixed specifications within a cluster?

                • Queries in MarkLogic Server use every node in the cluster
                • Fast nodes will wait for slow nodes - especially slow d-nodes
                • Therefore, all nodes - especially all d-nodes - should be of the same hardware specification

                KB Articles:

                Documentation:

                Does MarkLogic support Horizontal Scaling or Vertical Scaling?

                • Both horizontal (more nodes) and vertical scaling (bigger nodes) are possible with MarkLogic Server
                • Do note that high availability (HA) in MarkLogic Server requires at least some degree of horizontal scaling with a minimum of three nodes in a cluster
                • Given the choice between one big node and three smaller nodes, most deployments would be better off with three smaller nodes to take advantage of HA

                Documentation:

                 

                I'm confused about high availability (HA) vs. disaster recovery (DR) - How does MarkLogic do HA?  - How does MarkLogic do DR?

                • High Availability (HA) in MarkLogic Server involves automatic forest failover, which maintains database availability in the face of host failure. Failing back is a manual operation
                • Disaster Recovery (DR) in MarkLogic Server involves a separate copy - with smaller data deltas (database replication) or larger (backup/restore). Switching to and back from DR copies are both manual operations

                Documentation:

                How many forests can a MarkLogic cluster have?

                • There is a design limit of 1024 forests (including Local Disk Failover forests)
                • If you need more than 1024 forests, look into super-clusters and super-databases

                KB Articles

                Documentation:

                How to calculate the I/O bandwidth on a ML node?

                • The I/O bandwidth of a node can be calculated with the following formula (a worked sketch follows this list):
                  • (# of forests per node * I/O bandwidth per forest)
                • If your node has 10 TB of disk capacity:
                  • # of forests per node: (disk space / max forest size)
                    • Disk space: 10 TB
                    • Recommended max forest size in MarkLogic: 512 GB
                    • Recommended # of forests for this node: 20 (disk space / forest size)
                  • I/O bandwidth per forest: 20 MB/sec read, 20 MB/sec write
                  • Total I/O bandwidth: 20 * 20 MB/sec (# of forests * I/O per forest)
                • So, if your disk capacity is 10 TB, the I/O bandwidth will be:
                  • 400 MB/sec read, 400 MB/sec write
                • Similarly, if your disk capacity is 20 TB, the I/O bandwidth will be:
                  • 800 MB/sec read, 800 MB/sec write
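
                The same arithmetic can be scripted; this is a minimal sketch in which the capacity and per-forest bandwidth figures are the example values from the list above, not measured numbers.

                xquery version "1.0-ml";
                (: worked example of the bandwidth formula above :)
                let $disk-capacity-gb     := 10 * 1024   (: 10 TB node :)
                let $max-forest-size-gb   := 512         (: recommended maximum forest size :)
                let $bandwidth-per-forest := 20          (: MB/sec, read and write, per forest :)
                let $forest-count := $disk-capacity-gb idiv $max-forest-size-gb
                return
                  "forests: " || $forest-count ||
                  ", read: "  || $forest-count * $bandwidth-per-forest || " MB/sec" ||
                  ", write: " || $forest-count * $bandwidth-per-forest || " MB/sec"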

                KB Articles:

                What is the maximum size for a forest in MarkLogic?

                • The rule-of-thumb maximum size for a forest is 512GB
                • It's almost always better to have more small forests instead of one very large forest
                • It's important to keep in mind that forests have hard maximums for:
                  • Number of stands
                  • Number of fragments

                KB Articles:

                Documentation:

                How many documents per forest/database?

                While MarkLogic Server does not have a practical or effective limit on the number of documents in a forest or database, you'll want to watch out for:

                • Size of forests - as bigger forests require more time and computational resources to maintain
                • Maximum number of stands per forest (64) is a hard stop and difficult to unwind - so it's important that your database is merging often enough to stay well under that limit. Most deployments don't come close to this maximum unless they're underprovisioned and therefore merging too slowly or too infrequently
                • Maximum number of fragments per stand (on the order of tens or hundreds of millions). Most deployments typically scale horizontally to more forests (and therefore more stands) well before needing to worry about the number of fragments in a specific stand

                KB Articles:

                Documentation:

                How should I configure my default databases (like security)?

                • The recommended number of local disk failover (LDF) forests for default databases is one for each primary forest
                • For example - each default database (including security) should have one data forest and one LDF forest
                • More LDF copies are not recommended as they're almost never worth the additional administrative complexity and dedicated hardware resources

                KB Articles:

                What is the recommended record or document size?

                100 KB +/- two orders of magnitude (1 KB - 10 MB)

                KB Articles:

                What is the recommended number of range indexes for a database?

                • On the order of 100 or so
                • If you need many more, revise your data model to take advantage of Template Driven Extraction (TDE)

                KB Articles

                Documentation

                Does it help to do concurrent MLCP jobs in terms of performance?

                • Each MLCP job, starting in version 10.0-4.2, uses the maximum number of threads available on the server as the default thread count
                • Since a single job already uses all the available threads, concurrent MLCP jobs won't be helpful in terms of performance

                KB Articles:

                Documentation:

                Should we backup default databases?

                • We recommend regular backups for the Security database
                • If actively used, regular backups are recommended for Schemas, Modules, Triggers and other default databases

                KB Articles:

                Backup/restore best practices?

                • Backups can be CPU/RAM intensive
                • Incremental backups minimize storage, not necessarily time
                • Unless your cluster is over-provisioned compared to most, concurrent backup jobs are not recommended
                • The "Include Replica" setting allows for backup if failed over - but also doubles your backup footprint in terms of storage
                • The "Max Backups" setting is applicable only for full backups

                KB Articles:

                Documentation:

                Do we need to mirror configuration between primary and replica databases? If so, how do we do it?
                • Yes - primary and replica databases should have mirrored configurations. If the replica database's configuration is different, query results from the replica database will also be different

                • Configurations can be mirrored with Configuration Manager (deprecated in 10.0-3), or mlgradle/Configuration Management API (CMA)

                KB Articles:

                What to consider when configuring the thread_count option for MLCP export?
                • By default, -thread_count is 4 (when not explicitly specified)
                • For best performance, you can configure this option to use the maximum number of threads supported by the app server in the group (maximum number of server threads allowed on each host in the group * the number of hosts in the group)
                  • E.g.: For a 3-node cluster, this number will be 96 (32*3) where:
                    • 32 is the max number of threads allowed on each host
                    • 3 is the number of hosts in the cluster

                Note: If -thread_count is configured to use the maximum number of server threads, running concurrent jobs is strongly discouraged

                KB Articles:

                Documentation:

                Summary

                In addition to supporting multiple languages, MarkLogic Server also supports the ISO codes listed below for representing the names of these languages.

                 

                MarkLogic supported ISO codes

                MarkLogic supports the following ISO codes for the representation of language names:
                1. ISO 639-1
                2. ISO 639-2/T, and
                3. ISO 639-2/B

                Further, note that:
                a. MarkLogic uses the 2-letter ISO 639-1 codes, including zh's zh_Hant variant, and
                b. MarkLogic uses the 3-letter ISO 639-2 codes. For a more complete list of ISO 639-2 codes, see http://www.loc.gov/standards/iso639-2/php/code_list.php


                Note that MarkLogic only supports the languages listed below (see http://docs.marklogic.com/guide/search-dev/languages#id_64343):
                English
                French
                Italian
                German
                Russian
                Spanish
                Arabic
                Chinese (Simplified and Traditional)
                Korean
                Persian (Farsi)
                Dutch
                Japanese
                Portuguese
                Norwegian (Nynorsk and Bokmål)
                Swedish

                 

                Suggestion

                The function cdict:get-languages() can be used to get ISO Codes for all supported languages. Here is an example of the usage:

                  xquery version "1.0-ml";
                  import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary" 
                		  at "/MarkLogic/custom-dictionary.xqy";
                
                  cdict:get-languages()
                
                  ==> ("en", "ja", "zh", "zh_Hant")

                 

                Summary

                There are many different kinds of locks present in MarkLogic Server.

                Transaction locks are obtained when MarkLogic Server detects the potential of a transaction to change the database, at which point the server considers it to be an update transaction. Once a lock is acquired, it is held until the transaction ends. Transaction locks are set by MarkLogic Server either explicitly or implicitly depending on the configured commit mode. Because it's very common to see poorly performing application code written against MarkLogic Server due to unintentional locking, the two concepts of transaction type and commit mode have been combined into a single, simpler control - transaction mode.

                MarkLogic Server also has the notion of document and directory locks. Unlike transaction locks, document and directory locks must be set explicitly and are persistent in the database - they are not tied to a transaction. Document locks also apply to temporal documents. Any version of a temporal document can be locked in the same way as a regular document.

                Cache partition locks are used by threads that make changes to a cache. A thread needs to acquire a write lock for both the relevant cache and cache partition before it makes the change.

                Transaction Locks and Commit Mode vs. Transaction Mode

                Transaction lock types are associated with transaction types. Query type transactions do not use locks; they obtain a consistent view of the data from its state at a particular timestamp. Update type transactions have the potential to change the database and therefore require locks on documents to ensure transactional integrity. 

                So - if an update transaction type is run in explicit commit mode, then locks are acquired for all statements in an update transaction -  whether or not those statements perform updates. Once a lock is acquired, it is held until the transaction ends. If an update transaction type is run in auto commit mode, by default MarkLogic Server detects the transaction type through static analysis of the first statement in that transaction. If the server detects the potential for updates during static analysis, then the transaction is considered an update transaction - which results in a write lock being acquired.

                In multi-statement transactions, if an update transaction type is run in explicit commit mode, then the transaction is an update transaction and locks are acquired for all statements in an update transaction - even if no update occurs. In auto commit mode MarkLogic Server determines the transaction type through static analysis of the first statement. If in auto commit mode, and the first statement is a query, and an update occurs later in that transaction, MarkLogic Server will throw an exception. In multi-statement transactions, the transaction ends only when it is explicitly committed or rolled back. Failure to explicitly commit or roll back a multi-statement transaction might retain locks until the transaction times out or reaches the end of the session - at which point the transaction rolls back.

                Best practices:

                1) Avoid unnecessary transaction locks or holding on to transaction locks for too long. For single-statement transactions, do not explicitly set the transaction type to update if running a query. For multi-statement transactions, always explicitly commit or rollback the relevant transaction to free transaction locks as soon as possible.

                2) It's very common for users to write code that unintentionally takes write locks. One of the best ways to avoid unintentional locks is to use transaction mode instead of transaction types/commit modes. Transaction mode combines transaction type and commit mode into a single configurable value (a minimal sketch follows this list). You can read more about transaction mode in our documentation at: Transaction Mode Overview.

                3) Be aware that when setting transaction mode, the xdmp:commit and xdmp:update XQuery prolog options affect only the next transaction created after their declaration; they do not affect an entire session. Use xdmp:set-transaction-mode or xdmp.setTransactionMode if you need to change the transaction mode settings at the session level.
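
                The following minimal sketch, modeled on the multi-statement examples in the Transactions documentation, runs a multi-statement update transaction and commits explicitly so the write locks are released as soon as the work is done. The document URI is hypothetical.

                xquery version "1.0-ml";
                declare option xdmp:transaction-mode "update";
                (: first statement of a multi-statement update transaction :)
                xdmp:document-insert("/example/mst.xml", <data/>)
                ;
                xquery version "1.0-ml";
                (: commit explicitly so the transaction's write locks are released now,
                   rather than when the transaction times out or the session ends :)
                xdmp:commit()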

                Document and Directory Locks

                Document and directory locks are not tied to a transaction. They must be explicitly set, and they are stored as lock documents in a MarkLogic Server database. A lock can last for a specified time period or persist until it is explicitly unlocked.

                Each document and directory can have a lock, which can be used as part of an application's update strategy. MarkLogic Server gives clients the flexibility to set up a locking policy that suits their environment. For example, if only one user is allowed to update a specific database object, you can make the lock "exclusive"; in contrast, if multiple users may update the same database object, you can make the lock "shared".

                Unlike transaction locks, document and directory locks are persistent in the database and are consequently searchable.   
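
                A minimal sketch of explicit document locking follows; the URI, owner string, and scope are illustrative only.

                xquery version "1.0-ml";
                (: take an exclusive lock on one document ("0" = this document only);
                   with no timeout argument the lock persists until explicitly released :)
                xdmp:lock-acquire("/example/contract.xml", "exclusive", "0", "jsmith")

                A matching xdmp:lock-release("/example/contract.xml") call removes the lock once the application is done with it.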

                Temporal Document Locks

                A temporal collection contains bi-temporal or uni-temporal documents. Each version of a temporal document can be locked in the same way as a regular, non-temporal document.

                Cache and Cache Partition Locks

                If a thread attempts to make a change to a database cache, it needs to acquire a write lock for the relevant cache and cache partition. This cache or cache partition write lock serializes write access, which keeps data in the relevant cache or cache partition thread-safe. While cache and cache partition locks are short-lived, be aware that with a single cache partition, all of the threads needing to change that cache must serialize through a single cache partition write lock. With multiple cache partitions, multiple write locks can be acquired (one lock per partition), which allows multiple threads to make concurrent cache partition updates.

                References and Additional Reading:

                1) Understanding Transactions in MarkLogic Server

                2) Cache Partitions

                3) Document and Directory Locks

                4) Understanding Locking in MarkLogic Server Using Examples

                5) Understanding XDMP-DEADLOCK

                6) Understanding the Lock Trace Diagnostic Trace Event

                7) How MarkLogic Server Supports ACID Transactions

                Question

                Answer

                Further Reading

                How can we download the latest version of MarkLogic server? 

                The latest available versions are published on our Support portal, and they are available for download from the Developer portal

                Documentation:

                How can we download older versions of MarkLogic Server in case it's required?

                Please request the specific version you need in a support ticket and we’ll try to provide it to you

                Where can we find the EOL (end of life) information for a MarkLogic version? 

                You can visit MarkLogic Support page for the updates on end of life for different versions

                Note: we typically give our customers 12 months notice before the EOL date

                Where can I see the list of bug fixes between two versions or MarkLogic Server?

                You can view all of the bug fixes between MarkLogic Server versions by using our Fixed Bugs reporting tool, available on our Support site

                Link:

                What is MarkLogic's recommended procedure for performing major version upgrades (for instance, MarkLogic 9 to MarkLogic 10)?

                It is good practice to upgrade to the latest release of whichever version you are currently running (e.g. MarkLogic 9.x to 9.latest) and then to the desired major release version (e.g. MarkLogic 10.x)

                Documentation:

                What are the best practices to be followed while planning/performing upgrades? 

                • Refer to our Release Notes, New Features, Known Incompatibilities and Fixed Bugs report
                • Perform thorough testing of any operational procedures on non-production systems
                • Run application sanity checks in lower environments against the MarkLogic version to which you’re upgrading before upgrading your production environment. This may require making application code changes to resolve any unforeseen issues (see our documentation on “Known Incompatibilities”)
                • Ensure you have relevant backups in case you need to rebuild after a failed upgrade

                Documentation:

                KB Articles:

                How do I upgrade from one version of MarkLogic Server to another? 

                Please refer to our knowledge base article for details. You can check our documentation for additional details. 

                Documentation:

                KB Article:

                Can we downgrade or rollback upgrades to previously installed versions?

                No, version downgrades are NOT supported. Ensure you have sufficient backups of your system's MarkLogic configuration and data in the event you need to rebuild your environment

                Documentation:

                KB Articles:

                What is the recommended back out plan in case of emergency upgrade failure? 

                Although it is not expected that you will ever need to back out a version upgrade of MarkLogic Server, it is always prudent to have contingency plans in place. This knowledgebase article includes the preparation steps needed to back out of an upgrade.

                KB Articles:

                Back-out Plan sections in

                Is there a particular order we need to follow while upgrading a multi node cluster? 

                The forests for the Security and Schemas databases must be on the same host, and that host must be the first host you upgrade when upgrading a cluster. 

                Documentation:

                KB Article:

                What is the difference between conventional and rolling upgrades? 

                Conventional Upgrades:

                A cluster can be upgraded with a minimal amount of transactional downtime.

                Rolling Upgrades

                The goal in performing a rolling upgrade is to have zero downtime of your server availability or transactional data. 

                Note: if a cluster’s high availability (HA) scheme is designed to only allow for one host to fail, that cluster becomes vulnerable to failure when a host is taken offline for a rolling upgrade. To maintain HA while a host is offline for a rolling upgrade, your cluster’s HA scheme must be designed to allow two hosts to fail simultaneously

                Documentation:

                What are the important points to note before performing rolling upgrades?

                Please refer to our documentation for more details. 

                Documentation:

                How do I roll back a partial upgrade while performing Rolling Upgrades

                If the upgrade is not yet committed, you can reinstall the previously used version of MarkLogic on the affected nodes

                Documentation:

                Are there API's available to perform and manage Rolling upgrades?

                Yes, there are API's to perform and manage rolling upgrades. Please refer to our documentation. 

                Documentation:

                How do I upgrade replicated environments? 

                Replica first! If your Security database is replicated then your replica cluster must be upgraded before your primary cluster. This is also mentioned in our Database Replication FAQ.

                Documentation:

                KB Article:

                What is the procedure for performing OS patches/upgrades?

                There are two choices, each with their own advantages and disadvantages:

                • Creating brand new nodes with desired OS version and setting them up from scratch (Steps are listed in this KB)
                  • Advantage: You can switch back to your existing cluster if there’s something wrong with your new cluster
                  • Disadvantage: You’ll need a backup cluster until you’re satisfied with the upgraded environment
                • Upgrading OS on the existing nodes
                  • Advantage: You won’t need two environments as you’ll be upgrading in place
                  • Disadvantage: Larger maintenance window, since your existing nodes will be unavailable during the upgrade/patching process

                KB Article:

                What order should the client libraries (such as ODBC Driver, MLCP/XCC, Node.js Client API etc) be upgraded?

                While the order of client library upgrades doesn't matter, it’s important that the client library versions you’re using are compatible with your MarkLogic Server installation

                Documentation:

                Can we perform in-place upgrades or do we need to commission new servers? Which among the two is recommended?

                Both approaches can work. However, it’s important to understand that in on-premise deployments, you’d generally stay on the same machines and change/upgrade the MarkLogic binary. In contrast, in AWS and other virtualized environments, you’d generally keep your data/configuration and instead attach them to a new machine instance, which is itself running the more recent binary

                When is the Security database updated?

                Security database upgrades happen if the security-version in clusters.xml is different after the upgrade. In between minor release upgrades, there is typically no upgrade of the Security database. You can also check the status in the 'Upgrade' tab in the Admin UI. 

                What is the procedure for transferring data between MarkLogic clusters?

                There are multiple options - these include:

                • Database Backup and Restore
                • Database Replication
                • Tools like MLCP

                Documentation:

                KB Articles:

                What is the procedure for copying configuration between clusters?

                Configurations can be copied with Configuration Manager (deprecated in 10.0-3), or mlgradle/Configuration Management API (CMA) 

                Documentation:

                KB Articles:

                How do I upgrade MarkLogic on AWS?

                Please refer to our MarkLogic on AWS FAQ for more details

                What are the best practices for performing Data Hub upgrades?

                Please refer to our Data Hub Framework FAQ for more details

                Updates are a key aspect of data manipulation in MarkLogic Server, and can sometimes be performance intensive, especially if performed in bulk. Therefore one should take time to consider exactly how your application will perform updates. Moreover, a given document often is associated with data other than its content, such as attributes, permissions, collections, quality, and metadata - all of these attributes can be affected by a chosen update method.

                MarkLogic Server offers various methods to update a document, but there are two major ways to do it, in general:

                • node-replace - Replaces a node in an existing document
                • document-insert - Inserts an entirely new document into the database or replaces the content of an existing document based on whether or not a document with a specified URI already exists.

                Although there is no material performance difference between node-replace and document-insert, node-replace is generally the better choice for updates because it preserves document attributes such as permissions, collections, quality and metadata. By contrast, document-insert replaces all of those attributes along with the content of the document unless they are explicitly looked up and attached to the insert call.

                Note: Using ‘node-replace’ is the authoritative way of updating documents among all the node-level update functions
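
                In practice the two approaches look like the following minimal sketch; the URI and element names are hypothetical, and only one of the two should be run, since both update the same document.

                xquery version "1.0-ml";
                (: node-replace: swap a single node, keeping permissions, collections,
                   quality and metadata as they are :)
                xdmp:node-replace(
                  fn:doc("/example/order.xml")/order/status,
                  <status>shipped</status>)

                (: document-insert: replaces the whole document, so any attributes to keep
                   must be fetched and passed back explicitly, e.g.

                   xdmp:document-insert(
                     "/example/order.xml",
                     <order><status>shipped</status></order>,
                     xdmp:document-get-permissions("/example/order.xml"),
                     xdmp:document-get-collections("/example/order.xml"))
                :)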

                Mass-updates:

                For updating a small set of documents where it is important to preserve all attributes of a document, ‘node-replace’ would be a better choice as it saves the overhead of finding the existing attributes by yourself. On the other hand, if query performance holds a higher priority over preserving the existing attributes of a document, ‘document-insert’ would likely be a better choice as it is faster when used without querying for the attributes. There is, however, no significant difference between the two if used in a similar fashion.

                With the release of MarkLogic Server versions 8.0-8 and 9.0-4, detailing memory use broken out by major areas is periodically recorded to the error log. These diagnostic messages can be useful for quickly identifying memory resource consumption at a glance and aid in determining where to investigate memory-related issues.

                Error Log Message and Description of Details

                At one hour intervals, an Info level log message will be written to the server error log in the following format:

                Info: Memory 46% phys=255137 size=136452(53%) rss=20426(8%) huge=97490(38%) anon=1284(0%) swap=1(0%) file=37323(14%) forest=49883(19%) cache=81920(32%) registry=1(0%)

                The error log entry contains memory-related figures for non-zero statistics: raw figures are in megabytes; percentages are relative to the amount of physical memory reported by the operating system. Except for phys, all values are for the MarkLogic Server process alone.  The figures include

                Memory: percentage of physical memory consumed by the MarkLogic Server process
                phys: size of physical memory in the machine
                size: total process memory for the MarkLogic process; basically huge+anon+swap+file on Linux.  This includes memory-mapped files, even if they are not currently in physical memory.
                swap: swap consumed by the MarkLogic Server process
                rss: Resident Set Size reported by the operating system
                anon: anonymous mapped memory used by the MarkLogic Server
                file: total amount of RAM for memory-mapped data files used by the MarkLogic Server - the MarkLogic Server executable itself, for example, is memory-mapped by the operating system, but is not included in this figure
                forest: forest-related memory allocated by the MarkLogic Server process
                cache: user-configured cache memory (list cache, expanded tree cache, etc.) consumed by the MarkLogic Server process
                registry: memory consumed by registered queries
                huge: huge page memory reserved by the operating system
                join: memory consumed by joins for active running queries within the MarkLogic Server process
                unclosed: unclosed memory, signifying memory consumed by unclosed or obsolete stands still held by the MarkLogic Server process

                In addition to reporting once an hour, the Info level error log entry is written whenever the amount of main memory used by MarkLogic Server changes by more than five percent from one check to the next. MarkLogic Server will check the raw metering data obtained from the operating system once per minute. If metering is disabled, the check will not occur and no log entries will be made.

                With the release of MarkLogic Server versions 8.0-8 and 9.0-5, this same information will be available in the output from the function xdmp:host-status().

                <host-status xmlns="http://marklogic.com/xdmp/status/host">
                . . .
                <memory-process-size>246162</memory-process-size>
                <memory-process-rss>27412</memory-process-rss>
                <memory-process-anon>54208</memory-process-anon>
                <memory-process-rss-hwm>73706</memory-process-rss-hwm>
                <memory-process-swap-size>0</memory-process-swap-size>
                <memory-system-pagein-rate>0</memory-system-pagein-rate>
                <memory-system-pageout-rate>14.6835</memory-system-pageout-rate>
                <memory-system-swapin-rate>0</memory-system-swapin-rate>
                <memory-system-swapout-rate>0</memory-system-swapout-rate>
                <memory-size>147456</memory-size>
                <memory-file-size>279</memory-file-size>
                <memory-forest-size>1791</memory-forest-size>
                <memory-unclosed-size>0</memory-unclosed-size>
                <memory-cache-size>40960</memory-cache-size>
                <memory-registry-size>1</memory-registry-size>
                . . .
                </host-status>
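
                If you want just the memory figures programmatically, a minimal sketch like the following pulls a few of the elements shown above out of xdmp:host-status() for the current host (values are in MB):

                xquery version "1.0-ml";
                declare namespace hs = "http://marklogic.com/xdmp/status/host";
                (: report selected memory figures for this host :)
                let $status := xdmp:host-status(xdmp:host())
                return (
                  "process size: " || $status/hs:memory-process-size,
                  "rss: "          || $status/hs:memory-process-rss,
                  "forest: "       || $status/hs:memory-forest-size,
                  "cache: "        || $status/hs:memory-cache-size
                )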


                Additionally, with the release of MarkLogic Server 8.0-9.3 and 9.0-7, Warning-level log messages will be reported when the host may be low on memory.  The messages will indicate the areas involved, for example:

                Warning: Memory low: forest+cache=97%phys

                Warning: Memory low: huge+anon+swap+file=128%phys

                The messages are reported if the total memory used by the mentioned areas is greater than 90% of physical memory (phys). As best practice for most use cases, the total of the areas should not be more than around 80% of physical memory, and should be even less if you are using the host for query processing.

                Both forest and file include memory-mapped files; for example, range indexes. Since the OS manages the paging in/out of the files, it knows and reports the actual RAM in use; MarkLogic reports the amount of RAM needed if all the mapped files were in memory at once. That is why MarkLogic can even report more than 100% of RAM in use: if all the memory-mapped files were required at once, the machine would be out of memory.

                Data Encryption Scenario: An encrypted file cannot be memory-mapped and is instead decrypted and read into anon memory. Since the file that is decrypted in memory is not file-backed it cannot be paged out. Therefore, even though encrypted files do not require more memory than unencrypted files, they become memory-resident and require physical memory to be allocated when they are read.

                If the hosts are encountering these warnings, memory use should be monitored closely.

                Remedial action to support memory requirements might include:

                • Adding more physical memory to each of the hosts;
                • Adding additional hosts to the cluster to spread the data across;
                • Adding additional forests to any under-utilized hosts.

                Other action might include:

                • Archiving/dropping any older forest data that is no longer used;
                • Reviewing the group level cache settings to ensure they are not set too high, as they make up the cache part of the total. For reference, default (and recommended) group level cache settings based on common RAM configurations may be found in our Group Level Cache Settings based on RAM Knowledge base article.

                Summary

                This enhancement to MarkLogic Server allows for easy periodic monitoring of memory consumption over time, and records it in a summary fashion in the same place as other data pertaining to the operation of a running node in a cluster. Since all these figures have at their source raw Meters data, more in-depth investigation should start with the Meters history. However, having this information available at a glance can aid in identifying whether memory-related resources need to be explored when investigating performance, scale, or other like issues during testing or operation.

                Additional Reading

                Knowledgebase: RAMblings - Opinions on Scaling Memory in MarkLogic Server 

                Introduction

                The MarkLogic Monitoring History feature allows you to capture and view critical performance data from your cluster. By default, this performance data is stored in the Meters database. This article explains how you can plan for the additional disk space required for the Meters database.

                Meters Database Disk Usage

                Just like any other database, the Meters database is made up of forests, which in turn are made up of stands that reside physically on disk. Because the Meters database is used by Monitoring History to store critical performance data for your cluster, the amount of information it holds can grow significantly as the number of hosts, forests, databases, etc. increases - hence the need to plan and manage the disk space required by the Meters database.

                Recommendation

                The Meters database stores critical performance data for your cluster. The size of this data is proportional to the number of hosts, app servers, forests, databases, etc. Typically, the raw retention settings have the largest impact on size.

                MarkLogic's recommendation for a new install is to start with the default settings and monitor usage over the first two weeks of an install. The performance history charts, constrained to just show the Meters database, will show an increasing storage utilization over the first week, then leveling off for the second week. This would give you a decent idea of space utilization going forward.

                You can then adjust the number of days of raw measurements that are retained.

                You can also add additional forests to spread the Meters database over more hosts if needed.
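
                To supplement the performance history charts, a rough on-disk size for the Meters database can also be computed directly. This is a hedged sketch that sums the disk-size figures (reported in MB) from the stand status of each of the database's forests.

                xquery version "1.0-ml";
                declare namespace fs = "http://marklogic.com/xdmp/status/forest";
                (: approximate total on-disk size (MB) of the Meters database by summing
                   the disk-size reported for each stand in each of its forests :)
                fn:sum(
                  for $forest in xdmp:database-forests(xdmp:database("Meters"))
                  return fn:sum(xdmp:forest-status($forest)//fs:disk-size)
                )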

                Monitoring History

                The Monitoring History feature allows you to capture and view critical performance data from your cluster. Monitoring History capture is enabled at the group level. Once the performance data has been collected, you can view the data in the Monitoring History page.

                By default, the performance data is stored in the Meters database. A consolidated Meters database that captures performance metrics from multiple groups can be configured, if there is more than one group in the cluster.

                Monitoring History Data Retention Policy

                How long the performance data should be kept in the Meters database before it is deleted can be configured with the data retention policy. (http://docs.marklogic.com/guide/monitoring/history#id_80656)

                If it is observed that meters data is not being cleared according to the retention policy, the first place to check would be the range indexes configured for the Meters database.

                Range indexes and the Meters Database

                The Meters database is configured with a set of range indexes which, if missing or not configured correctly, can prevent the Meters database from being cleaned up according to the configured retention policy.

                Range indexes may be missing or misconfigured in either of the following scenarios:

                •  if the cluster was upgraded from a version of ML before 7.0 and the upgrade had some issues
                •  if the indexes were manually created (when using another database for meters data instead of the default Meters database)

                The size of the meters database can grow significantly as the cluster grows, so it is important that the meters database is cleared per the retention policy.

                The required indexes (as of 8.0-5 and 7.0-6) are attached as an ML Configuration Manager package(http://docs.marklogic.com/guide/admin/config_manager#id_38038). Once these are added, the Meters database will reindex and the older data should be deleted.
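                To compare what is already configured against the indexes in the attached package, a sketch like the following lists the element range indexes currently defined on the Meters database:

                xquery version "1.0-ml";
                import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

                let $config := admin:get-configuration()
                let $db := admin:database-get-id($config, "Meters")
                (: list the element range indexes currently defined on the Meters database :)
                return admin:database-get-range-element-indexes($config, $db)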

                Note that deletion of data older than the retention period occurs no sooner than the retention period itself; data older than the retention period may still be kept for an unspecified amount of time.

                Related documentation

                http://docs.marklogic.com/guide/monitoring

                https://help.marklogic.com/Knowledgebase/Article/View/259/0/metering-database-disk-space-requirements

                SUMMARY:

                Prior to MarkLogic 4.1-5, role-ids were randomly generated. MarkLogic Server now uses a hash algorithm that ensures roles created with the same name are assigned the same role-id. Attempting to migrate data from a forest created prior to MarkLogic 4.1-5 to a newer installation can therefore result in a "role not defined" error. To work around this issue, you will need to create a new role with the role-id defined in the legacy system.

                Procedure:

                This process creates a new role with the same role-id from your legacy installation and assigns this old role to your new role with the correct name.

                Step 1: You will need to find the role-id of the legacy role. This will need to be run against the security DB on the legacy server. 

                <code>

                xquery version "1.0-ml";
                import module namespace sec="http://marklogic.com/xdmp/security" at
                "/MarkLogic/security.xqy";

                let $role-name := "Enter Role Name Here"

                return
                /sec:role[./sec:role-name=$role-name]/sec:role-id/text()

                </code>


                Step 2: In the new environment, store the attached module to the following location on the host containing the security DB.

                /opt/MarkLogic/Modules/role-edit/create-master-role.xqy

                Step 3: Ensure that you have created the role on the new cluster.

                Step 4: Run the following code against the new cluster's security DB. This will create a new role with the legacy role-id. Be sure to enter the role name, description, and role-id from Step 1.

                <code>
                xquery version "1.0-ml";
                import module namespace cmr="http://role-edit.com/create-master-role" at
                "/role-edit/create-master-role.xqy";

                let $role-name := "ENTER ROLE NAME"
                let $role-description := "ENTER ROLE DESCRIPTION"
                let $legacy-role-id := 11658627418524087702 (: Replace this with the Role ID from Step 1:)

                let $legacy-role := fn:concat($role-name,"-legacy")
                let $legacy-role-create := cmr:create-role-with-id($legacy-role, $role-description, (), (), (), $legacy-role-id)

                return
                fn:concat("Inserted role named ",$legacy-role," with id of ",$legacy-role-id)

                </code>


                Step 5: Run the following code against the new cluster's security database to assign the legacy role to the new role.

                <code>
                xquery version "1.0-ml";
                import module namespace sec="http://marklogic.com/xdmp/security" at
                "/MarkLogic/security.xqy";

                let $role-name := "ENTER ROLE NAME"
                let $legacy-role := fn:concat($role-name,"-legacy")

                return
                (
                sec:role-set-roles($role-name, ($legacy-role)),
                "Assigned ",$legacy-role," role to ",$role-name," role"
                )

                </code>

                 

                You should now have a new role named [your-role]-legacy.  This legacy role will contain the role-id from your legacy installation and will be assigned to [your-role] on the new installation.  Legacy documents in your DB will now have the same rights they had in the legacy system.

                At some point, you may need to migrate your application's data and configuration from an existing cluster (source cluster) to a new cluster (destination cluster).

                This migration can be done in several ways:

                • Backup and restore approach - where you take a full backup of your source cluster and perform a restore on the destination cluster
                • Hybrid-cluster approach -  where you add new nodes to the source cluster, rebalance/distribute forest data across all the nodes, retire the old forests to let the data completely move over to the new nodes, and finally remove the old nodes
                • Backup/Restore and Database Replication approach -  where you create a new destination cluster, perform a full backup of the source cluster, restore the resulting backup on the destination cluster, and set up database replication between the source and destination clusters (with the source cluster as the primary). Once both clusters are in sync, you'd then switch over to the destination cluster, making it the new primary

                Among the three approaches listed above, we recommend the backup/restore and database replication approach because this approach:

                • Is safer as your source cluster remains in a known good state.
                • Is easier because if something goes wrong, you can return to your known good state by simply disabling database replication 
                • Supports disaster recovery by providing an active copy of your source cluster
                • Gives you enough time to build trust in the destination cluster before switching over to it from the source cluster to complete the migration

                No matter which one of the above approaches you take, we highly recommend testing your chosen migration process first in your lower environments before implementing this in your production environment.

                Introduction

                Those familiar with versions of MarkLogic Server prior to MarkLogic 7 may have heard the 3X disk space rule mentioned. At the time of writing, references to it can be found in the MarkLogic 5 documentation and the MarkLogic 6 documentation.

                The Monitoring Metrics of Interest section in the Monitoring MarkLogic Guide refers to the 3X rule in a preparatory question on disk allocation for a database:

                • Is there enough disk space for forest data and merges? Merges require at least twice as much free disk space as used by the forest data (3X rule). If a merge runs out of disk space, it will fail.

                For anyone reading the requirements guidelines for MarkLogic 7 (and above), you may have noticed a section that suggests you should plan to ensure disk space is available for:

                • 1.5 times the disk space of the total forest size. Specifically, each forest on a filesystem requires its filesystem to have at least 1.5 times the forest size in disk space (or, for each forest less than 32GB, 3 times the forest size). This translates to 1.5 times the disk space of the source content after it is loaded.

                  For example, if you plan on loading content that will result in a 100 GB database, reserve at least 150GB of disk space. The disk space reserve is required for merges.

                This Knowledgebase article will cover both requirements and offer some further guidance as to how to plan and size your databases and - crucially - how you can take advantage of the newer 1.5X rule.

                3X

                The original logic behind the allocation of 3X disk space was to provide ample space to allow for a situation where a database is fully reindexed. The allocation would be in thirds according to the following measures:

                1. Your Data
                2. Space for reindexing
                3. Space for merges

                The 3X disk provision rule was offered as a very general (and very safe for production) rule to cover the most extreme example where your data gets reindexed in its entirety and then merges have to take place on top of that.

                ... but why 3X?

                To understand this, we need to briefly explore what happens when a document is updated in MarkLogic Server.

                As an update is made to a document - and the same rule applies when index changes cause a document to be rewritten - the transaction takes place at a given timestamp (a given point in time). At that point, the original fragment is marked as deleted and a new fragment is written to an in-memory stand. Eventually, the in-memory stand is written to disk.

                For a period of time - especially at times where a MarkLogic instance/cluster is busy performing a large number of updates - it's likely that there will be occasions where two versions of the same fragment exist in different stands on disk; one stand will contain the fragment now marked as deleted and the other stand will contain the newly written fragment - which will be used by any subsequent queries running at later timestamps.

                ... so that covers 2X - what about the other third?

                When a merge takes place, merge candidate stands are identified and a new stand is created. As the candidate stands are read through, the active fragments are copied over to the new stand.

                At the point where the merge takes place, the new stand coexists with the older stands because - like updates and reindexing - queries will still need to run against the candidate stands; the timestamp only moves on to the data in the new stand once the process has completed in its entirety.

                While all of this is taking place, other updates could be taking place to documents in other stands and the same rules apply to those fragments too.

                So the 3X rule provides a true safeguard; allowing for a situation where forest sizes are likely to swell way above and beyond the size of the data they contain, to accommodate the fragments marked deleted for queries at earlier timestamps and to accommodate the additional headroom required by a merge of some very large stands.

                1.5X

                Some changes were made in MarkLogic 7 which effectively reduce the footprint of your data on-disk. With some careful planning, you can take advantage of the lower sizing rule.

                While the documentation still acknowledges the 3X rule (which is still true if you're performing an upgrade directly from MarkLogic 6 or earlier without making any other configuration changes), a new default configuration has been introduced for databases created under MarkLogic 7: the merge max size.

                What does the merge max size do?

                This setting enforces an upper limit of 32GB on the size of an individual stand.

                With previous versions of the product, the expectation would be for the contents of a forest to merge down to one large stand. That is: given a quiesced database, on full completion of a merge, all content (all active fragments) should be in a single stand.

                For databases on MarkLogic 7 (and later), you can now expect to see more stands - each with a maximum size of 32GB.

                This means you should expect to see your data in more stands than you would have done on prior versions of the product, but it also means that you can lower the amount of disk space you need due to this size restriction.

                From MarkLogic 7 onwards - with the merge max size correctly set - the largest amount of space a single merge operation should require would be 64GB.

                ... but why 1.5X?

                If we return to this line in the documentation:

                • For example, if you plan on loading content that will result in a 100 GB database, reserve at least 150GB of disk space. The disk space reserve is required for merges.

                Given that we now have an upper limit on the size of a stand (32GB), as two smaller stands are being merged to create the new, larger stand and given the space required by other concurrent operations that may be taking place in other stands, a space limit of 1.5X should now cover any merges (and subsequent updates to documents).

                For further understanding of the 1.5X rule, read our knowledgebase article 'Explanation of the 1.5X Disk Space Requirement'.

                How do I find out whether my database is configured for this new merge max size?

                If you're on the admin interface at http://[yourhostname]:8001

                Go to: Configure > Databases > [Your Database Name] > Merge Policy

                On the right-hand panel, you should see the merge max size; the default should now be 32768
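                The same check can also be scripted. As a rough sketch using the Admin API (the database name "Documents" is a placeholder for your own database):

                xquery version "1.0-ml";
                import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

                let $config := admin:get-configuration()
                let $db := admin:database-get-id($config, "Documents")
                (: returns the configured merge max size in MB; 32768 = 32GB, 0 = no limit :)
                return admin:database-get-merge-max-size($config, $db)

                If the value returned is 0 (no limit), it can be changed with admin:database-set-merge-max-size followed by admin:save-configuration.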

                Important caveats

                MarkLogic 7 is designed to allow you to work with more stands. While you should still be concerned when you see a system with a very large number of small stands, this slightly different rule requires a shift in thinking, and it has implications in particular when you start to think about applying the 1.5x disk space rule in your environment.

                In releases prior to MarkLogic 7, the expectation (over time) was that all data in a forest would ultimately attempt to get merged into a single stand.

                In MarkLogic 7, at least with the default setting of the merge-max-size (to 32768 - 32GB), it is understood that a reasonably large forest would now be divided into a number of 32GB stands.

                If you are strictly following this rule for all reasonably large forests on your system, then the 1.5x rule can safely be used operationally in a production environment. However, reliance on the rule requires careful management when migrating an existing system, as running out of disk space can have catastrophic consequences for a live system.

                For very small forests, the 1.5X rule does not apply.  Due to the 32GB stand size overhead, your forests need to be sufficiently large in order to benefit from the 1.5X rule.

                You should treat the 1.5x rule as an absolute minimum requirement for disk space for a given database. If you are going to use it, we would recommend having a strategy in place for allocating more space until you are confident that the cluster can run safely within the lower (1.5x) boundaries.

                I'm upgrading from an earlier version of MarkLogic to MarkLogic 7 - I have changed the merge max size to 32768. Can I reclaim the disk space?

                It's important to note that the 1.5x guidelines will only work if your forests all contain stands that have the new maximum size of 32GB. If your forests still contain larger stands, you'll need to break these down before you can consider reclaiming disk space. 

                ... Breaking Large Stands Down

                If your forests contain stands larger than 32 GB, you will want to break these stands down in order to take advantage of the lower disk space requirements.

                Different techniques can be followed to break the stands and reclaim disk space:

                1. Re-ingesting the content of the forests with large stands - When documents are re-ingested in a forest, the old fragments will be marked as deleted and the new fragment will be written to a new stand. Once there are sufficient deleted fragments, the large stands will be merged down into smaller stands.
                2. Perform re-indexing – A Forced re-index will update every fragment in the database, effectively re-loading the content - the original fragments will be marked as deleted and the new fragments will be written to a new stand. Once there are sufficient deleted fragments, the large stands will be merged down into smaller stands.  
                3. Forest rebalancing - Rebalance active fragments from existing forests and retire the old forests with merge max size configured. This merges out the deleted fragments in the old stands, while the active fragments are maintained in smaller stands in the other, rebalanced forests.

                Conclusion

                The major points for the 1.5X rule:

                • The estimated 1.5X disk space utilization is only true for databases where merge-max-size is correctly set and for forests that are sufficiently large. For databases created in MarkLogic Server v7 or later, the default merge-max-size is 32768 (32GB).
                • If you're upgrading from earlier releases, you would need to make sure you set this value as part of your upgrade process.
                  • After upgrading from a version previous to MarkLogic 7, you will have to take explicit steps to decrease the size of any pre-existing large stands. 

                 

                Summary

                New and updated mimetypes were added for MarkLogic 8.  If your MarkLogic Server instance has customized mimetypes, the upgrade to MarkLogic Server v8.0-1 will not update the mimetypes table. 

                Details

                MarkLogic 8 includes the following new mimetype values:

                Name                                   Extension          Format
                application/json                       json               json
                application/rdf+json                   rj                 json
                application/sparql-results+json        srj                json
                application/xml                        xml xsd xvs sch    xml
                text/json                                                 json
                text/xml                                                  xml
                application/vnd.marklogic-javascript   sjs                text
                application/vnd.marklogic-ruleset      rules              text

                If you upgraded to 8.0 from a previous version of MarkLogic Server and you have ever customized your mimetypes (for example, using the MIME Types Configuration page of the Admin Interface), the upgrade will not automatically add the new mimetypes to your configuration. If you have not customized any mimetypes, the new mimetypes will be added automatically during the upgrade. You can check whether these mimetypes are configured by going to the Mimetypes page of the Admin Interface and verifying that the mimetypes in the table above exist. If they exist, there is nothing you need to do.

                Effect

                Not having these mimetypes may lead to application level failures - for example, running JavaScript code via Query Console will fail.

                Resolving Manually

                If you do not have the above mimetypes after upgrading to 8.0, you can manually add the mimetypes to your configuration using the Admin Interface. To manually add the configuration, perform the following

                1. Open the Admin Interface in a browser (for example, open http://localhost:8001).
                2. Navigate to the Mimetypes page, near the bottom of the tree menu.
                3. Click the Create tab.
                4. Enter the name, the extension, and the format for the mimetype (see the table above).
                5. Click OK.
                6. Repeat the preceding steps for each mimetype in the above table.

                Please be aware that updating the mimetype table results in a MarkLogic Server restart.  You will want to execute this procedure when MarkLogic Server is idle or during a maintenance window.

                Resolve by Script

                Alternatively, if you do not have the above mimetypes after upgrading to 8.0, you can add the mimetypes to your configuration by executing the following script in Query Console:

                xquery version "1.0-ml";

                import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";
                declare namespace mt = "http://marklogic.com/xdmp/mimetypes";

                let $config := admin:get-configuration()
                let $all-mimetypes := admin:mimetypes-get($config) (: existing mimetypes defined :)
                let $new-mimetypes := (admin:mimetype("application/json", "json", "json"),
                    admin:mimetype("application/rdf+json", "rj", "json"),
                    admin:mimetype("application/sparql-results+json", "srj", "json"),
                    admin:mimetype("application/xml", "xml xsd xvs sch", "xml"),
                    admin:mimetype("text/json", "", "json"),
                    admin:mimetype("text/xml", "", "xml"),
                    admin:mimetype("application/vnd.marklogic-javascript", "sjs", "text"),
                    admin:mimetype("application/vnd.marklogic-ruleset", "rules", "text"))
                (: remove intersection to avoid conflicts :)
                let $delete-mimetypes :=
                    for $mimetype in $all-mimetypes
                    return if ($mimetype//mt:name/data() = $new-mimetypes//mt:name/data()) then $mimetype else ()
                let $config := admin:mimetypes-delete($config, $delete-mimetypes)
                (: save new mimetype definitions :)
                return admin:save-configuration( admin:mimetypes-add( $config, $new-mimetypes))
                (: executing this query will result in a restart of MarkLogic Server :)

                Please be aware that updating the mimetype table results in a MarkLogic Server restart.  You will want to execute this script when MarkLogic Server is idle or during a maintenance window.

                Fixes

                At the time of this writing, it is expected that the upgrade scripts will be improved in a maintenance release of MarkLogic Server so that these updates occur automatically.

                Introduction

                In this article, we discuss use of xdmp:cache-status in monitoring cache status, and explain the values returned.

                Details

                Note that this is a relatively expensive operation, so it’s not something to run every minute, but it may be valuable to run it occasionally for information on current cache usage.

                Output format

                The values returned by xdmp:cache-status are per host, defaulting to the current host. It takes an optional host-id to allow you to gather values from a specific host in the cluster.
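                For example, the following returns the status for the current host and then for every host in the cluster:

                xquery version "1.0-ml";

                (: cache status for the host evaluating the query :)
                xdmp:cache-status(),

                (: cache status for every host in the cluster :)
                for $host in xdmp:hosts()
                return xdmp:cache-status($host)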

                The output of xdmp:cache-status will look something like this:

                <cache-status xmlns="http://marklogic.com/xdmp/status/cache">
                  <host-id>18349804367231394552</host-id>
                  <host-name>macpro-2113.local</host-name>
                  <compressed-tree-cache-partitions>
                    <compressed-tree-cache-partition>
                      <partition-size>512</partition-size>
                      <partition-table>0.2</partition-table>
                      <partition-used>0.8</partition-used>
                      <partition-free>99.2</partition-free>
                      <partition-overhead>0</partition-overhead>
                    </compressed-tree-cache-partition>
                  </compressed-tree-cache-partitions>
                  <expanded-tree-cache-partitions>
                    <expanded-tree-cache-partition>
                      <partition-size>1024</partition-size>
                      <partition-table>0.7</partition-table>
                      <partition-busy>0</partition-busy>
                      <partition-used>30.4</partition-used>
                      <partition-free>69.6</partition-free>
                      <partition-overhead>0</partition-overhead>
                    </expanded-tree-cache-partition>
                  </expanded-tree-cache-partitions>
                  <list-cache-partitions>
                    <list-cache-partition>
                      <partition-size>1024</partition-size>
                      <partition-table>0.2</partition-table>
                      <partition-busy>0</partition-busy>
                      <partition-used>0</partition-used>
                      <partition-free>100</partition-free>
                      <partition-overhead>0</partition-overhead>
                    </list-cache-partition>
                  </list-cache-partitions>
                  <triple-cache-partitions>
                    <triple-cache-partition>
                      <partition-size>1024</partition-size>
                      <partition-busy>0</partition-busy>
                      <partition-used>0</partition-used>
                      <partition-free>100</partition-free>
                    </triple-cache-partition>
                  </triple-cache-partitions>
                  <triple-value-cache-partitions>
                    <triple-value-cache-partition>
                      <partition-size>512</partition-size>
                      <partition-busy>0</partition-busy>
                      <partition-used>0</partition-used>
                      <partition-free>100</partition-free>
                    </triple-value-cache-partition>
                  </triple-value-cache-partitions>
                </cache-status>
                

                Values

                cache-status contains information for each partition of the caches:

                • The list cache holds search term lists in memory and helps optimize XPath expressions and text searches.
                • The compressed tree cache holds compressed XML tree data in memory. The data is cached in memory in the same compressed format that is stored on disk.
                • The expanded tree cache holds the uncompressed XML data in memory (in its expanded format).
                • The triple cache holds triple data.
                • The triple value cache holds triple values.

                The following are descriptions of the values returned:

                • partition-size: The size of a cache partition, in MB.
                • partition-table: The percentage of the table for a cache partition that is currently used. The table is a data structure with a fixed overhead per cache entry, used for cache administration. It fixes the number of entries that can be resident in the cache. If the partition table is full, something will need to be removed before another entry can be added to the cache.
                • partition-busy: The percentage of the space in a cache partition that is currently used and cannot be freed.
                • partition-used: The percentage of the space in a cache partition that is currently used.
                • partition-free: The percentage of the space in a cache partition that is currently free.
                • partition-overhead: The percentage of the space in a cache partition that is currently overhead.

                When do I get errors?

                You will get a cache-full error when nothing can be removed from the cache to make room for a new entry.

                The "partition-busy" value is the most useful indicator of getting a cache-full error. It tells you what percent of the cache partition is locked down and cannot be freed to make room for a new entry. 

                 

                MarkLogic DHS

                MarkLogic Data Hub Service (DHS) provides the fastest and most cost-effective way for enterprises to integrate, store, harmonize, analyze, and secure mission-critical data in the cloud. Because it is a managed service, not all of the monitoring options available when using MarkLogic Server with the Data Hub Framework are available in DHS.

                Management endpoint

                DHS on AWS uses port 8002 for the management endpoint.

                Dashboard and History

                The Monitoring Dashboard and Monitoring History can also be accessed on the management port.

                The Monitoring Dashboard provides task-based views of MarkLogic Server performance metrics in real time.

                The Monitoring History feature allows you to view critical performance data collected from your cluster.

                Database Status

                Return status information for the named database:

                Database Metrics

                Retrieve historical monitoring data about the databases in the cluster

                Retrieve historical monitoring data about the named databases:

                Server Logs

                Available Log Files

                List the logs available on the server

                Retrieving, Filtering, and Formatting Logs

                The Log files can be retrieved with text, json, or xml formatting. The files can be retrieved in whole, or they can be filtered using any combination of:

                • Start time (start)
                • End time (end)
                • Regular expression/s (regex)

                Retrieve data-hub-STAGING app server error log with text, json, or xml formatting

                Retrieve Server ErrorLog entries for a specific time range with xml formatting

                Retrieve Server ErrorLog entries matching the patterns SVC or XDMP. The regex OR condition, |, is URL encoded (%7C) between the two patterns.

                MarkLogic recommends that the Security database only have 1 primary forest.  Having more than one primary forest for the Security database can cause failover issues when doing upgrades and restarts.  The Security database should have a single primary forest, and one replica forest to support High Availability.

                More details available in the knowledge base article How many forests should my Security database have?

                Refer to our documentation for Configuring the Security and Auxiliary Databases to Use Failover Forests

                Summary

                When restarting very large forests, some customers have noted that it may take a while for them to mount. While the forests are mounting, the database is unable to come online, thus impacting the availability of your main site. This article shows you how to change a few database settings to improve forest-mounting time.

                When encountering delays with forest mounting time after restarts, we usually recommend the following settings:

                format-compatibility set to the latest format
                expunge-locks set to none
                index-detection set to none

                Additionally, some customers might be able to spread out the work of memory mapping forest indexes by setting preload-mapped-data to false - though it should be noted that instead of the necessary time being taken during the mounting of the forest, memory-mapped file data will be loaded on demand through page faults as the server accesses it.
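                As a sketch of how these settings could be applied through the Admin API for a hypothetical database named "my-database" (verify the exact function names and permitted values in the Admin API documentation for your release before relying on this):

                xquery version "1.0-ml";
                import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

                let $config := admin:get-configuration()
                let $db := admin:database-get-id($config, "my-database")
                (: pin the on-disk format instead of automatic detection; use the value appropriate to your release :)
                let $config := admin:database-set-format-compatibility($config, $db, "5.0")
                (: skip lock expunging and index compatibility detection at startup :)
                let $config := admin:database-set-expunge-locks($config, $db, "none")
                let $config := admin:database-set-index-detection($config, $db, "none")
                (: optionally defer memory-mapping of index data until first access :)
                let $config := admin:database-set-preload-mapped-data($config, $db, fn:false())
                return admin:save-configuration($config)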

                While the above settings should help with forest mounting time, in general, their effects can be situationally dependent. You can read more about each of these settings in our documentation here: http://docs.marklogic.com/admin-help/database. In particular:


                1) Regarding format compatibility: "The automatic detection occurs during database startup and after any database configuration changes, and can take some time and system resources for very large forests and for very large clusters. The default value of automatic is recommended for most installations." While automatic is recommended in most cases, you should try changing this setting if you're seeing long forest mount times.

                2) Regarding expunge-locks: "Setting this to none is only recommended to speed cluster startup time for extremely large clusters. The default setting of automatic, which cleans up the locks as they expire, is recommended for most installations."

                3) Regarding index-detection: "This detection occurs during database startup and after any database configuration changes, and can take some time and system resources for very large forests and for very large clusters. Setting this to none also causes queries to use the current database index settings, even if some settings have not completed reindexing. The default value of automatic is recommended for most installations."

                It may also be worth considering why forests are taking a long time to mount. If your data size has grown significantly over the lifetime of the affected database, it might be the case that your forests are now overly large, in which case a better approach might be to instead distribute the data across more forests.

                Introduction
                 
                The MarkLogic Java Client API's DatabaseClient instance represents a database connection sharable across threads. The connection is stateless, except that authentication is performed the first time a client interacts with the database via a Document Manager, Query Manager, or other manager. For instance, you may instantiate a DatabaseClient as follows:
                 
                // Create the database client

                DatabaseClient client = DatabaseClientFactory.newClient(host, port,
                                                          user, password, authType);

                And release it as follows:
                // release the client
                client.release();

                Details on DatabaseClient Usage

                To use the Java Client API efficiently, it helps to know a little bit about what goes on behind the scenes.

                You specify the enode or load balancer host when you create a database client object.  Internally, the database client object instantiates an Apache HttpClient object to communicate with the host.

                The internal Apache HttpClient object creates a connection pool for the host.  The connection pool makes it possible to reuse a single persistent HTTP connection for many requests, typically improving performance.

                Setting up the connection pool has a cost, however.

                As a result, we strongly recommend that applications create one database client for each unique combination of host, database, and user.  Applications should share the database client across threads.  In addition, applications should keep a reference to the database client for the entire life of the application interaction with that host.


                For instance, a servlet might create the database client during initialization and release the database client during destruction. The same servlet may also use two separate database client instances with different permissions, one for read-only users and one with read/write permissions for editors. In the latter case, both client instances are used throughout the life of the servlet and destroyed during servlet destruction.

                MarkLogic 11, MarkLogic 10, and MarkLogic 9 all use the Network Security Services (NSS) libraries (nss, nss-sysinit, nss-tools), which have known vulnerabilities.

                Library        Version              CVE              Severity
                nss            3.79.0-4.el7_9 rpm   CVE-2023-0767    High
                nss-sysinit    3.79.0-4.el7_9 rpm   CVE-2023-0767    High
                nss-tools      3.79.0-4.el7_9 rpm   CVE-2023-0767    High

                At this time, there is no impact for MarkLogic Server from these vulnerabilities as MarkLogic Server does not exercise or use the code that has the identified vulnerability. 

                Summary

                Clock synchronization plays a critical part in the operation of a MarkLogic Cluster.

                MarkLogic Server expects the system clocks to be synchronized across all the nodes in a cluster, as well as between Primary and Replica clusters. The acceptable level of clock skew (or drift) between hosts is less than 0.5 seconds, and values greater than 30 seconds will trigger XDMP-CLOCKSKEW errors, and could impact cluster availability.

                Tools

                Network Time Protocol (NTP) is the recommended solution for maintaining system clock synchronization.  NTP services can be provided by public (internet) servers, private servers, network devices, peer servers and more.

                NTP Basics

                NTP uses a daemon process (ntpd) that runs on the host. The ntpd process periodically wakes up, polls the configured NTP servers to get the current time, and then adjusts the local system clock as necessary. Time can be adjusted in two ways: by immediately stepping to the correct time, or by slowly speeding up or slowing down the system clock as necessary until it has reached the correct time. The frequency at which ntpd wakes up, called the polling interval, can be adjusted based on the level of accuracy needed, anywhere from roughly 1 to 17 minutes. NTP uses a hierarchy of server levels called strata. Each stratum synchronizes with the layer above it, and provides synchronization to the layer below it.

                Public NTP Reference Servers

                There are many public NTP reference servers available for time synchronization.  It's important to note that the most common public NTP reference server addresses are for a pool of servers, so hosts synchronizing against them may end up using different physical servers.  Additionally, the level of polling recommended for cluster synchronization is usually higher, and excessive polling could result in the reference server throttling or blocking traffic from your systems.

                Stand Alone Cluster

                For a cluster that is not replicated or connected to another cluster in some way, the primary concern is that all the hosts in the cluster be in sync with each other, rather than being accurate to UTC.

                Primary/Replica Clusters

                Clusters that act as either Primary or Replicas need to be synchronized with each other for replication to work correctly.  This usually means that the hosts in both clusters should reference the same NTP servers.

                NTP Configuration

                Time Synchronization Configuration Files

                It is common to have multiple servers referenced in the chronyd configuration file (/etc/chrony.conf) or the ntpd configuration file (/etc/ntp.conf). NTP may not choose the server based on the order in the file. Because of this, hosts could synchronize with different reference servers, introducing differences in the system clocks between the hosts in the cluster. Most organizations already have devices in their infrastructure that can act as NTP servers, as many network devices are capable of acting as NTP servers, as are Windows Primary Domain Controllers. These devices can use default polling intervals, which avoids excessive polling against public servers.

                Once you have identified your NTP server, you can configure the NTP daemon on the cluster hosts. We suggest using a single reference server for all the cluster hosts, then adding all the hosts in the cluster as peers of the current node. We also suggest adding an entry for the local host as its own server, assigning it a high stratum number so it is only used as a last resort. Using peers allows the cluster hosts to negotiate and elect a host to act as the reference server, providing redundancy in case the reference server is unavailable.

                Common Configuration Options

                The burst option sends a burst of 8 packets each time the server is polled, to increase the average quality of the time offset statistics. Using it against a public NTP server is considered abuse.

                The iburst option sends a burst of 8 packets at initial synchronization, which is designed to speed up the initial synchronization at startup. Using it against a public NTP server is considered aggressive.

                The minpoll and maxpoll settings are specified as powers of two, in seconds: a setting of 4 means 2^4 = 16 seconds, so setting both minpoll and maxpoll to 4 will cause the host to check the time approximately every 16 seconds.

                Time Synchronization with chronyd

                The following is a sample chrony.conf file:

                # Primary NTP Source

                server *.*.*.200 burst iburst minpoll 4 maxpoll 4

                # Allow peering as a backup to the primary time servers

                peer mlHost01 burst iburst minpoll 4 maxpoll 4
                peer mlHost02 burst iburst minpoll 4 maxpoll 4
                peer mlHost03 burst iburst minpoll 4 maxpoll 4

                # Serve time even if not synchronized to a time source (for peering)
                local stratum 10

                # Allow other hosts on subnet to get time from this host (for peering)
                # Can also be specified by individual IP
                # https://chrony.tuxfamily.org/manual.html#allow-directive
                allow *.*.*.0

                # By default chrony will not step the clock after the initial few time checks.
                # Changing the makestep option allows the clock to be stepped if its offset is larger than .5 seconds.
                makestep 0.5 -1

                The other settings (driftfile, rtsync, log) can be left as is, and the new settings will take effect after the chronyd service is restarted.

                Time Synchronization with ntpd

                The following is a sample ntp.conf file:

                #The current host has an ip of 10.10.0.1
                server ntpserver burst iburst minpoll 4 maxpoll 4
                 
                #All of the cluster hosts are peered with each other.
                peer mlHost01 burst iburst minpoll 4 maxpoll 4
                peer mlHost02 burst iburst minpoll 4 maxpoll 4
                peer mlHost03 burst iburst minpoll 4 maxpoll 4
                 
                #Add the local host so the peered servers can negotiate
                # and choose a host to act as the reference server
                server 10.10.0.1
                fudge 10.10.0.1 stratum 10

                The fudge setting is used to alter the stratum of the server from the default of 0.

                Choosing Between NTP Daemons

                Red Hat states that chrony is the preferred NTP daemon, and should be used when possible.

                Chrony should be preferred for all systems except for the systems that are managed or monitored by tools that do not support chrony, or the systems that have a hardware reference clock which cannot be used with chrony.

                As always, system configuration changes should always be tested and validated prior to putting them into production use.

                References

                Summary

                On March 1, 2016, a vulnerability in OpenSSL named DROWN, a man-in-the-middle attack whose name stands for "Decrypting RSA with Obsolete and Weakened eNcryption", was announced. All MarkLogic Server versions 5.0 and later are *not* affected by this vulnerability.

                Advisory

                The Advisory reported by OpenSSL.org states

                CVE-2016-0800 (OpenSSL advisory)  [High severity] 1st March 2016: 

                A cross-protocol attack was discovered that could lead to decryption of TLS sessions by using a server supporting SSLv2 and EXPORT cipher suites as a Bleichenbacher RSA padding oracle. Note that traffic between clients and non-vulnerable servers can be decrypted provided another server supporting SSLv2 and EXPORT ciphers (even with a different protocol such as SMTP, IMAP or POP) shares the RSA keys of the non-vulnerable server. This vulnerability is known as DROWN (CVE-2016-0800). Recovering one session key requires the attacker to perform approximately 2^50 computation, as well as thousands of connections to the affected server. A more efficient variant of the DROWN attack exists against unpatched OpenSSL servers using versions that predate 1.0.2a, 1.0.1m, 1.0.0r and 0.9.8zf released on 19/Mar/2015 (see CVE-2016-0703 below). Users can avoid this issue by disabling the SSLv2 protocol in all their SSL/TLS servers, if they've not done so already. Disabling all SSLv2 ciphers is also sufficient, provided the patches for CVE-2015-3197 (fixed in OpenSSL 1.0.1r and 1.0.2f) have been deployed. Servers that have not disabled the SSLv2 protocol, and are not patched for CVE-2015-3197 are vulnerable to DROWN even if all SSLv2 ciphers are nominally disabled, because malicious clients can force the use of SSLv2 with EXPORT ciphers. OpenSSL 1.0.2g and 1.0.1s deploy the following mitigation against DROWN: SSLv2 is now by default disabled at build-time. Builds that are not configured with "enable-ssl2" will not support SSLv2. Even if "enable-ssl2" is used, users who want to negotiate SSLv2 via the version-flexible SSLv23_method() will need to explicitly call either of: SSL_CTX_clear_options(ctx, SSL_OP_NO_SSLv2); or SSL_clear_options(ssl, SSL_OP_NO_SSLv2); as appropriate. Even if either of those is used, or the application explicitly uses the version-specific SSLv2_method() or its client or server variants, SSLv2 ciphers vulnerable to exhaustive search key recovery have been removed. Specifically, the SSLv2 40-bit EXPORT ciphers, and SSLv2 56-bit DES are no longer available. In addition, weak ciphers in SSLv3 and up are now disabled in default builds of OpenSSL. Builds that are not configured with "enable-weak-ssl-ciphers" will not provide any "EXPORT" or "LOW" strength ciphers. Reported by Nimrod Aviram and Sebastian Schinzel.

                Fixed in OpenSSL 1.0.1s (Affected 1.0.1r, 1.0.1q, 1.0.1p, 1.0.1o, 1.0.1n, 1.0.1m, 1.0.1l, 1.0.1k, 1.0.1j, 1.0.1i, 1.0.1h, 1.0.1g, 1.0.1f, 1.0.1e, 1.0.1d, 1.0.1c, 1.0.1b, 1.0.1a, 1.0.1)

                Fixed in OpenSSL 1.0.2g (Affected 1.0.2f, 1.0.2e, 1.0.2d, 1.0.2c, 1.0.2b, 1.0.2a, 1.0.2)

                MarkLogic Server Details

                MarkLogic Server disallows SSLv2 and disallows weak ciphers in all supported versions. As a result, MarkLogic Server is not affected by this vulnerability.

                Whenever MarkLogic releases a new version of MarkLogic Server, OpenSSL versions are reviewed and updated. 

                 

                OpenSSL.org released a blog post announcement regarding the OpenSSL vulnerabilities CVE-2022-3786 (“X.509 Email Address Variable Length Buffer Overflow”) and CVE-2022-3602 (“X.509 Email Address 4-byte Buffer Overflow”) along with the OpenSSL 3.0.7 release. The vulnerability has been downgraded from CRITICAL to HIGH.

                Since the MarkLogic family of products has not yet switched to OpenSSL 3.x, MarkLogic Server is NOT impacted by this vulnerability.

                For more details, please refer to https://www.openssl.org/blog/blog/2022/11/01/email-address-overflows/

                Original Article Content

                (published 10/28/2022)

                The OpenSSL project team announced the forthcoming release of OpenSSL version 3.0.7, expected to be available on Tuesday 1st November 2022 between 1300-1700 UTC.

                At the time of this writing, MarkLogic understands that this release addresses a CRITICAL vulnerability that was discovered in the OpenSSL 3.x version stream and that it does not affect earlier versions of OpenSSL. Since the MarkLogic family of products has not yet switched to OpenSSL 3.x, we believe MarkLogic Server is NOT impacted by this critical vulnerability.

                OpenSSL.org is expected to provide details regarding the vulnerability itself and potential exploits on Tuesday, November 1st, along with the patch release.  Since this is a CRITICAL level vulnerability they are giving advance notice so organizations can be ready to act as soon as the patch is released.  If you want to receive announcements directly from OpenSSL.org, you can register here.

                Note: The Ops Director feature has been deprecated as of September 30, 2020 and support ended on November 14, 2021.

                Introduction

                Ops Director enables you to monitor MarkLogic clusters ranging from a single node to large multi-node deployments. A single Ops Director server can monitor multiple clusters. Ops Director provides a unified browser-based interface for easy access and navigation.

                Ops Director presents a consolidated view of your MarkLogic infrastructure, to streamline monitoring and troubleshooting of clusters with alerting, performance, and log data. Ops Director provides enterprise-grade security of your cluster configuration and performance data with robust role-based access control and information security powered by MarkLogic Server.

                Problems installing Ops Director 2.0.0, 2.0.1 & 2.0.1-1

                Check gradle.properties

                To successfully install Ops Director, the value for mlhost in gradle.properties must be a hostname, and that hostname must match the name of one of the hosts in the cluster. You cannot use localhost to install Ops Director, nor can you use a host name other than one that is listed as a host in the cluster, as this affects the use of certificates for authentication to the OpsDirectorSystem application server.

                Check for App-Services

                Ops Director can sometimes encounter errors when attempting to install in groups other than Default. To successfully install, the Ops Director installer needs to be able to connect to the App-Services application server on port 8000 in the group where Ops Director is being installed.  There are two ways to work around this issue:

                • Create a copy of the App-Services app server in the new group, then install Ops Director
                  • Be aware this allows QConsole access in the new group, for users with appropriate privileges. 
                  • If you wish to prevent QConsole access in that group, the App-Services application server should be deleted after Ops Director has been installed.
                • Install Ops Director in the Default group, then move the host to the new group, and create the OpsDirector app servers in the new group.
                  • Be aware this allows Ops Director access to remain in the Default group.
                  • If you wish to prevent Ops Director access in the Default group, the Ops Director application servers should be deleted from the Default group.
                    • To do this you must also copy the scheduled tasks associated with Ops Director over to the new group, and delete the scheduled tasks from the old group

                See the attached Workspace OpsDirCopyAppServers.xml which has scripts to do the following:

                • Copy and/or remove the App-Services app server
                • Copy and/or remove the OpsDirectorSystem/OpsDirectorApplication/SecureManage app servers
                • Copy and/or remove the scheduled tasks associated with the Ops Director application.

                Also note that Ops Director will install forests on all hosts in the cluster, regardless of group assignments.

                Managing a Cluster

                Check DNS Settings

                When setting up a managed host, it's important to note that the hosts in both the Ops Director cluster, and the cluster being managed must be able to resolve hostnames via DNS.  Modifying the /etc/hosts file is not sufficient.

                Check Ops Director Scheduled Tasks

                When setting up a managed host, you may encounter a XDMP-DEADLOCK error, or have an issue seeing the data for a managed cluster.  If this occurs do the following:

                • Un-manage the affected cluster.  If there are any issues un-managing the cluster, use the procedures in this KB under the Problems with Un-managing Clusters to un-manage the cluster
                • Disable the scheduled tasks associated with Ops Director
                  • /common/tasks/info.xqy
                  • /common/tasks/running.xqy
                  • /common/tasks/expire.xqy
                  • /common/tasks/health.xqy
                • Manage the cluster again
                • Enable the scheduled tasks that were disabled

                Verify Necessary Ports are Open

                Assuming the default installation ports are in use, verify the following access:

                • 8003 Inbound TCP on the Managed Cluster, accessed by the Ops Director Cluster.
                • 8008 Inbound TCP on the Ops Director Cluster, accessed by the Ops Director Users.
                • 8009 Inbound TCP on the Ops Director Cluster, accessed by the Managed Cluster

                Upgrading Ops Director

                When upgrading to a new version of Ops Director, it may be necessary to uninstall the previous version. To do that, you must un-manage any clusters being managed by Ops Director prior to uninstalling the application.

                Un-managing Clusters

                The first step in uninstalling Ops Director is to remove any clusters from being managed from Ops Director.  This is done via the Admin UI on a host in the managed cluster, as detailed in the Ops Director Guide: Disconnecting a Managed Cluster from Ops Director

                Uninstalling Ops Director 2.0.0 & 2.0.1

                These versions of Ops Director use the ml-gradle plugin for deployment.  To uninstall these versions, you will also use gradle, as detailed in the Ops Director Guide: Removing Ops Director 2.0.0 and 2.0.1

                Uninstalling Ops Director 1.1 or Earlier

                If you are using the 1.1 version that was installed via the Admin UI, then it can be uninstalled via the Admin UI as detailed in the Ops Director Guide: Removing Ops Director 1.1 or Earlier

                Problems with Uninstalling Ops Director

                Occasionally an Ops Director installation may partially fail, due to misconfiguration, or missing dependencies.  Issues can also occur that prevent the standard removal methods from working correctly.  In these cases, Ops Director can be removed manually using the attached QConsole Workspace, OpsDirRemove.xml.  The instructions for running the scripts are contained in the first tab of the workspace.

                Problems with Un-managing Clusters

                Occasionally, disconnecting a managed cluster from Ops Director may partially fail.  If this occurs, you can use the attached QConsole Workspace, OpsDirUnmanage.xml.  The instructions for running the scripts are contained in the first tab of the workspace.

                Further Reading

                Installing, Uninstalling, and Configuring Ops Director

                Monitoring MarkLogic with Ops Director

                Introduction

                MarkLogic offers many different ways to access your data. The best interface to use is ultimately determined by your use case. The table below is taken from MarkLogic University's on demand training course "Using the Optic API." That course runs only 12 minutes, and the table below appears at the 9:50 mark.

                When to use which API

                OPTIC
                Data Shape: Multi-model
                Output: Rows, Documents, Any structure
                Strengths: Combines aspects of each query mechanism
                Sample query: Review liability terms for every Race with >1000 Runners that offered a "cash prize"

                SEARCH
                Data Shape: Documents
                Output: Documents, parts of documents (snippet, highlight)
                Strengths: Discovery, relevance, fuzzy text matching/stemming
                Sample query: What is the best holiday race for me to enter (nearest, highest rated, most relevant)?

                SQL
                Data Shape: Relational lens
                Output: Rows, values
                Strengths: Joins, aggregates, summarizing large data sets, exact matches
                Sample query: Which Races had the most evenly balanced Runners based on gender?

                SPARQL
                Data Shape: Semantic data
                Output: Solutions
                Strengths: Relating entities, linking facts, inferring relationships
                Sample query: Find Runners who ran Races in Europe

                Summary

Performance of MarkLogic Server query evaluation can be impacted by the user running the query and the roles that user inherits.

Impact of the Number of Roles Inherited by a User on Query Evaluation

When application users are assigned the necessary application roles, security evaluation for each user comes into play. By design, query performance is inversely proportional to the number of roles inherited by the user executing the query: with each new role a user inherits, queries run by that user take a little longer to evaluate against the security schema.

Question: How does the number of roles inherited by a user increase query evaluation time?

                For each role that a user has, MarkLogic Server adds an index term to every query the user executes.

                For example, if a user inherits ten roles, MarkLogic Server adds ten terms to every query the user executes; One hundred roles adds one hundred terms to every query; One thousand roles adds one thousand terms to every query that specific user runs.

If your testing shows that the performance of queries with hundreds of terms is acceptable, then having a user inherit hundreds of roles may also be acceptable. However, if a query with hundreds of terms is too slow, then a user inheriting hundreds of roles will also be too slow.

Question: Does a large number of roles defined for different users, but not all inherited by a single user, have an impact on query performance?

You can have thousands of roles defined without your query performance being affected by security evaluation overhead, as long as those roles are not all inherited by the same user. It is only when those roles are all inherited by a single user that they increase the security evaluation overhead for queries run by that particular user.

                Query performance is not correlated with the total number of roles, but there is performance degradation with the number of roles per user. MarkLogic can easily handle tens of thousands of total roles, but cannot easily handle more than tens of roles per user.

                Recommendation:

It is unlikely that a user who inherits thousands of roles will see acceptable performance for queries run as that user. Unless it is absolutely necessary, and the role evaluation performance overhead has been considered and tested, we recommend against assigning thousands of roles to a single user.

                Further Reading

                Summary

This article briefly looks at the performance implications of ad hoc queries versus passing external variables to a query in a module.

                Details

Programmatically, you can achieve similar results by dynamically generating ad hoc queries on the client as you can by defining your queries in modules and passing in external variable values as necessary.

                Dynamically generating ad hoc queries on the client side results in each of your queries being compiled and linked with library modules before they can be evaluated - for every query you submit. In contrast, queries in modules only experience that performance overhead the first time they're invoked.

                While it's possible to submit queries to MarkLogic Server in any number of ways, in terms of performance, it's far better to define your queries in modules, passing in external variable values as necessary.
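As a minimal sketch of the module approach (the module path /queries/find-by-id.xqy and the element name are illustrative, not taken from any particular application), the module declares an external variable and the caller supplies its value at invocation time:

xquery version "1.0-ml";
(: /queries/find-by-id.xqy -- hypothetical module; compiled and cached on first use :)
declare variable $id as xs:string external;
cts:search(fn:doc(), cts:element-value-query(xs:QName("id"), $id))

xquery version "1.0-ml";
(: caller: only the variable value changes between requests, so the module stays cached :)
xdmp:invoke("/queries/find-by-id.xqy", (xs:QName("id"), "7635940284725382398"))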

                Summary

MarkLogic does not enforce a programmatic upper limit on how many indexes you *can* have. This leaves open the question of how many range indexes should be used in your application. The answer is that you should have as many as the application requires, but with the caveat that there are some infrastructure limits that should be taken into account. For instance:

1. More Memory-Mapped File Handles (file descriptors)

The OS limits how many file handles a given process can have open at any point in time. This limit therefore affects how many range index files, and hence how many range indexes, a given MarkLogic process can have. However, you can configure higher file handle limits on most platforms (ulimit, vm.max_map_count).

                2. More RAM requirement 

The in-memory footprint of a node involves in-memory structures such as the in-memory list cache, in-memory tree cache, in-memory range indexes, the in-memory reverse index (if reverse-query is enabled), and the in-memory triple index (if triple-positions is enabled); multiply those by the total number of forests, plus a buffer.

A large number of range indexes can result in a huge index expansion in memory use. Also, the values mentioned above are in addition to the memory that MarkLogic Server requires to maintain its HTTP servers, perform merges, reindex and rebalance, and carry out operations such as processing queries.

Tip: Memory consumption can be reduced by configuring a database to optimize range indexes for minimum memory usage (memory-size); the default is configured for maximum performance (facet-time).

                UI : Admin UI > Databases > {database-name} > Configure > range index optimize [facet-time or memory-size]

                API : admin:database-set-range-index-optimize 
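A minimal sketch of switching this setting via the Admin API (the database name "test" is illustrative):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";
(: change range index optimization from the default (facet-time) to memory-size :)
let $config := admin:get-configuration()
let $config := admin:database-set-range-index-optimize($config, xdmp:database("test"), "memory-size")
return admin:save-configuration($config)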

                3. Longer Merge Times (Bigger stands due to Large index expansion)

A large number of range indexes ends up expanding the data in forests. For a given host size and number of hosts, larger stand sizes in a forest will make range index queries faster; however, they will also make merge times slower. If we want both queries and merges to be fast with a large number of range indexes, we will need to scale out the number of physical hosts.

                4. More CPU, Disk & IO requirement 

Merges are IO-intensive processes; this, combined with frequent updates/loads, could result in CPU as well as IO bottlenecks.

                5. Longer Forest Mount times

In general, each configured range index with data takes two memory-mapped files per stand.

                A typical busy host has on the order of 10 forests, each forest with on the order of 10 stands; So a typical busy host has on the order of 100 stands.

                Now for 100 stands -

• With 100 range indexes, we have on the order of 10,000 files to open and map when the server starts up.
• With 1,000 range indexes, we have on the order of 100,000 files to open and map when the server starts up.
• With 10,000 range indexes, we have on the order of 1,000,000 files to open and map when the server starts up.

As we increase our range indexes, at some point the server will take an unreasonably long time to start up (unless we throw equivalent processing power at it).

The amount of time one is willing to wait for the server to start up is not a hard limit; the question should be "what is 'reasonable' behavior for server start-up, in the eyes of the server administrator, given the current hardware?"

                Conclusion

Range indexes on the order of a thousand start affecting performance if they are not managed properly and if the considerations above are not accounted for. In most scenarios the solution to the problem is not about "how many indexes can we configure", but rather "how many indexes do we need".

MarkLogic considers a configured range index count on the order of 100 to be a “reasonable” limit, because it results in “reasonable” behavior of the Server.

                Tips for Best Performance for Solutions with lots of Range Indexes

Before launching your application, review the number of range indexes and work to 1) remove ones that are not being used, and 2) consolidate any range indexes that are mutually redundant. This will help you get under the recommended limit of roughly 100 range indexes.

On systems that already have a large number of range indexes (say 100+), merging multiple stands may become a performance issue, so you will need to think about easing the query and merge load. Here are some strategies for easing the load on your system:

1. Increase merge-max-size from 32768 to 49152 on your database (see the sketch after this list). This will create larger stands and will lower the number of merges that need to be performed.
2. The configuration setting "preload mapped data" (default false) will, when left as false, speed up merging of forest stands. Bear in mind that this comes at the cost of slower query performance immediately after a forest mounts.
3. If your system begins to slow down due to merging activity, you can spread the load by adding more hosts and forests to your cluster. The smaller forests and stands will merge and load faster when there are more CPU cores and IO bandwidth to service them.
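A minimal sketch of the first strategy via the Admin API (the database name "test" is illustrative; the value is in MB):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";
(: raise merge-max-size from the 32 GB default (32768 MB) to 48 GB (49152 MB) :)
let $config := admin:get-configuration()
let $config := admin:database-set-merge-max-size($config, xdmp:database("test"), 49152)
return admin:save-configuration($config)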

                Further Reading

                Performance implications of updating Module and Schema databases

This article briefly looks at the performance implications of adding or modifying modules or schemas in live (production) databases.

                Details

                When XQuery modules or schemas are referenced for the first time after upload, they are parsed and then cached in memory so that subsequent access is faster.

                When a module is added or updated, the modules cache is invalidated and every module (for all Modules databases within the cluster) will need to be parsed again before they can be evaluated by MarkLogic Server.

                Special consideration should be made when updating modules or schemas in a production environment as reparsing can impact the performance of MarkLogic server for the duration that the cache is being rebuilt.

                MarkLogic was designed with the assumption that modules and schemas are rarely updated. As such, the recommendation is that updates to modules or schemas in production environments is made during periods of low activity or out of hours.

                Further reading

                Overview

                Performance issues in MarkLogic Server typically involve either 1) unnecessary waiting on locks or 2) overlarge workloads. The goal of this knowledgebase article is to give a high level overview of both of these classes of performance issue, as well as some guidelines in terms of what they look like - and what you should do about them.

                Waiting on Locks

                We often see customer applications waiting on unnecessary read or write locks. 

                What does waiting on read or write locks look like? You can see read or write lock activity in our Monitoring History dashboard at port 8002 in the Lock Rate, Lock Wait Load, Lock Hold Load, and Deadlock Wait Load displays. This scenario will typically present with low resource utilization, but spikes in the read/write lock displays and high request latency.

                What should you do when faced with unnecessary read or write locks? Remediation of this scenario pretty much always goes through optimization of either request code, data model, or both. Additional hardware resources will not help in this case because there is no hardware resource bound present. You can learn more about data model optimizations through MarkLogic University's On-Demand courses, in particular XML and JSON Data Modeling Best Practices and Impact of Normalization: Lessons Learned

                Relevant Knowledgebase articles:

                1. Understanding XDMP Deadlock
                2. How Do Updates Work in MarkLogic Server?
                3. Fast vs Strict Locking
                4. Read Only Queries Run at a Timestamp & Update Transactions use Locks
                5. Performance Theory: Tales From MarkLogic Support

                Overlarge Workloads

                Overlarge workloads typically take two forms: a. too many concurrent workloads or b. work intensive individual requests

                Too Many Concurrent Workloads

                With regard to too many concurrent workloads - we often see clusters exhibit poor performance when subjected to many more workloads than the cluster can reasonably handle. In this scenario, any individual workload could be fine - but when the total amount of work over many, many concurrently running workloads is large, the end result is often the oversubscription of the underlying resources.

                What does too many concurrent workloads look like? You can see this scenario in our Monitoring History at port 8002, in the Disk I/O, CPU, Memory Footprint, App Server Request Rate, App Server Latency, or Task Server Queue Size displays. This scenario will typically present with spikes in both App Server Latency and App Server Request Rate, and correlated maximum level plateaus in one or more of the aforementioned hardware resource utilization charts.

                What should you do when faced with too many concurrent workloads? Remediation of this scenario pretty much always involves the addition of more rate-limiting hardware resource(s). This assumes, of course, that request code and/or data model are both already fully optimized. If either could be further optimized, then it might be possible to enable a higher request count given the same amount of resources - see the "Work Intensive Individual Requests" section, below. Rarely, in circumstances where traffic spikes are unpredictable - but likely - we’ve seen customers incorporate load shedding or traffic management techniques in their application architectures. For example, when request times pass a certain threshold, traffic is then routed through a less resource hungry code path.

                Note that concurrent workloads entail both request workload and maintenance activities such as merging or reindexing. If your cluster is not able to serve both requests and maintenance activities, then the remediation tactics are the same as listed above: you either need to a. add more rate-limiting hardware resource(s) to serve both, or b. you need to incorporate load shedding or traffic management techniques like restricting maintenance activities to periods where the necessary resources are indeed available.

                Relevant Knowledgebase articles:

                1. When submitting lots of parallel queries, some subset of those queries take much longer - why?
                2. How reindexing works, and its impact on performance
                3. MarkLogic Server I/O Requirements Guide
                4. Sizing E-nodes
                5. Performance Theory: Tales From MarkLogic Support

Work Intensive Individual Requests

                With regard to work intensive individual requests - we often see clusters exhibit poor performance when individual requests attempt to do too much work. Too much work can entail an unoptimized query, but it can also be seen when an otherwise optimized query attempts to work over a dataset that has grown past its original hardware specification.

                What do work intensive requests look like? You can see this scenario in our Monitoring History at port 8002, in the Disk I/O, CPU, Memory Footprint, App Server Request Rate, App Server Latency, or Task Server Queue Size displays. This scenario will typically present with spikes in one or more system resources (Disk I/O, CPU, Memory Footprint) and App Server Latency. In contrast to the "Too Many Concurrent Requests" scenario App Server Request Rate should not exhibit a spike.

What should you do when faced with work intensive requests? As in the case of too many concurrent requests, it's sometimes possible for customers to address this situation with additional hardware resources. However, remediation in this scenario more typically involves finding additional efficiencies via code or data model optimizations. Code optimizations can be made with the use of xdmp:plan() and xdmp:query-trace(). You can learn more about data model optimizations through MarkLogic University's On-Demand courses, in particular XML and JSON Data Modeling Best Practices and Impact of Normalization: Lessons Learned. If the increase in work is rooted in data growth, it's also possible to reduce the amount of data. Customers pursuing this route will typically do periodic data purges or use features like Tiered Storage.

                Relevant Knowledgebase articles:

                1. Gathering information to troubleshoot long-running queries
                2. Fast searches: resolving from the indexes vs. filtering
                3. What do I do about XDMP-LISTCACHEFULL errors?
                4. Resolving XDMP-EXPNTREECACHEFULL errors
                5. When should I look into query or data model tuning?
                6. Performance Theory: Tales From MarkLogic Support

                Additional Resources

                1. Monitoring MarkLogic Guide
                2. Query Performance and Tuning Guide
                3. Performance: Understanding System Resources

                 

                ATTENTION

This knowledgebase article dates from 2014 - which is a long time ago in terms of available hardware and MarkLogic development. While some of the fundamental principles in the article below still apply, you'll find more recent specific guidance in this "Performance Testing with MarkLogic" whitepaper.


                Performance Theory: Tales From MarkLogic Support

This article is a snapshot of the talk that Jason Hunter and Franklin Salonga gave at MarkLogic World 2014, also titled “Performance Theory: Tales From The MarkLogic Support Desk.” Jason Hunter is Chief Architect and Frank Salonga is Lead Engineer at MarkLogic.

                MarkLogic is extremely well-designed, and from the ground up it’s built for speed, yet many of our support cases have to do with performance. Often that’s because people are following historical conventions that no longer apply. Today, there are big-memory systems using a 64-bit address space with lots of CPU cores, holding disks that are insanely fast (but that haven’t grown in speed as much as they have in size*), hooked together by high-speed bandwidth. MarkLogic lives natively in this new reality, and that changes the guidelines you want to follow for finding optimal performance in your database.

                The Top 10 (Actually 16) Tips

                The following is a list of top 16 tips to realize optimal performance when using MarkLogic, all based on some of the common problems encountered by our customers:

                1. Buy Enough Iron
                MarkLogic is optimized for server-grade systems, those just to the left of the hockey-stick price jump. Today (April 2014) that means 16 cores, 128-256 Gigs of RAM, 8-20 TB of disk, 2 disk controllers.

                2. Aim for 100KB docs +/- 2 Orders of Magnitude
                MarkLogic’s internal algorithms are optimized for documents around 100 KB (remember, in MarkLogic, each document should be one unit of query and should be seen more like relational rows than tables). You can go down to 1 KB but below that the memory/disk/lock overhead per document starts to be troublesome. And, you can go up to 10 MB but above that line the time to read it off disk starts to be noticeable.

                3. Avoid Fragmentation
                Just avoid it, but if you must, then understand the tradeoffs.  See also Search and Fragmentation.

                4. Think of MarkLogic Like an Only Child
                It’s not a bug to use 100 percent of the CPU—that’s a feature. MarkLogic assumes you want maximum performance given available resources. If you’re using shared resources (a SAN, a virtual machine) you may want to impose restrictions that limit what MarkLogic can use.

                5. Six Forests, Six Replicas
                Every use case is different, but in general deployments of MarkLogic 7 are proving optimal with 6 forests on each computer and (if doing High Availability) 6 replicas.

                6. Earlier Indexing is Better Indexing
                Adding an index after loading requires touching every document with data relating to that index. Turning off an index is instant, but no space will be reclaimed until the re-index occurs. A little thought into index settings before loading will save you time.

                7. Filtering: Your Friend or Foe
                Indexes isolate candidate documents, then filtering verifies the hits. Filtering lets you get accurate results even without accurate indexes (e.g., a case sensitive query without the case sensitive index). So, watch out, as filtering can hide bad index settings! If you really trust the indexes, you can use “unfiltered.” It is best to perfect your index settings in a small test environment, then apply them to production.

                8. Use Meaningful Markup If You Can
                If you can use meaningful markup (where the tags describe the content they hold) you get both prettier XML and XML that’s easier to write indexes against.

                9. Don’t Try to Outsmart Merging
                Contact support if you plan to change any of the advanced merge settings (max size, min size, min ratio, timeout periods). You shouldn’t usually tweak these. If you’re thinking about merge settings, you’re probably underprovisioned (See Recommendation #1).

                10. Big Reads Go In Queries, Not Updates
                Hurrah! Using MVCC for transaction processing means lock-free reads. But, to be a “read” your module can’t include any update calls. This is determined by static analysis in advance, so even if the update call isn’t made, it still changes your behavior. Locks are cheap but they’re not free, and any big search to find the top 10 results will lock the full result set during the sort. Whenever possible, do update calls in a separate nested transaction context using xdmp:invoke() with an option specifying “different-transaction”.
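A minimal sketch of that pattern (the update module path /audit/log-hit.xqy is hypothetical): the calling module stays a lock-free read because the update runs in its own transaction.

xquery version "1.0-ml";
(: run the update in a separate transaction so this module is statically a query :)
xdmp:invoke(
  "/audit/log-hit.xqy",
  (),
  <options xmlns="xdmp:eval">
    <isolation>different-transaction</isolation>
  </options>
)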

                11. Taste Test
                Load a bit of data early, so you can get an idea about rates, sizes, and loads. Different index settings will affect performance and sizes. Test at a few sizes because some things scale linearly, some logarithmically.

                12. Measure
                Measure before. Measure after. Measure at all levels. When you know what’s normal, you can isolate when something goes different. MarkLogic 7 can internally capture “Monitoring History” to a Meters database. There are also tools such as Cacti, Ganglia, Nagios, Graphite, and others.

                13. Keep a Staging Box
                A staging box (or cluster) means you can measure changes in isolation (new application code, new indexes, new data models, MarkLogic upgrades, etc.). If you’re running on a cluster, then stage on a cluster (because you’ll see the effects of distribution, like net traffic and 2-phase commits). With AWS it’s easier than ever to “spin up” a cluster to test something.

                14. Adjust as Needed
                You need to be measuring so you know what is normal and then know what you should adjust. So, what can you adjust?

                • Code: Adjusting your code often provides the biggest bang
                • Memory sizes: The defaults assume a combo E-node/D-node server
                • Indexes: Best in advance, maybe during tasting. Or, try on staging
                • Cluster size and forest distribution: This is much easier in MarkLogic 7

                15. Follow Our Advice on Swap Space
                Our release notes tell you:

                • Windows: 2x the physical memory
                • Linux: 1x the physical memory (minus any huge pages), or 32GB, whichever is lower
                • Solaris: 1x-2x the physical memory

                MarkLogic doesn’t intend to leverage swap space! But, for an OS to give memory to MarkLogic, it wants the swap space to exist. Remember, disk is 100x cheaper than RAM, and this helps us use the RAM.

                16. Don’t Forget New Features
MarkLogic has plenty of features that help with performance, including MLCP, tiered storage, and semantics. With the MLCP fast-load option, you can perform forest assignments on the client and insert directly into that forest. It’s a sharp tool, but don’t use it if you’re changing forest topology or assignment policies. With tiered storage, you can use HDFS as cheap mass storage of data that doesn’t need high performance. Remember, you can “partition” data (e.g. based on dates) and let it age to slower disks. With semantics, you have a whole new way to model your data, which in many cases can produce easier-to-optimize queries.

                That’s it! With these pro tips, you should be able to handle the most common performance issues. 

                *With regard to storage, as you add capacity, it is critical that you add throughput in order to maintain a fast system (http://tylermuth.wordpress.com/2011/11/02/a-little-hard-drive-history-and-the-big-data-problem/)

                Summary

This is a procedure to assist with maintenance activities that may require the MarkLogic service to be shut down for a period of time, or an OS reboot, while minimizing unavailability. It is assumed that High Availability (HA) is configured using local disk failover and that all primary forests have a replica forest configured.

                NOTE: Security and App-Services databases must also be configured for HA.

When a host in a MarkLogic cluster becomes unavailable, the host is not fully disconnected from the cluster until the configured host timeout (default 30 seconds) expires. If a primary forest resides on that host, the database and any application that references it will be unavailable from the time the host becomes unavailable until all replica forests assume the role of acting primary.

If the host unavailability is planned, then you can take steps to minimize the database and application unavailability. This article describes such a procedure.

                Planning

                When a host from the MarkLogic cluster is taken offline, all the remaining hosts must assume the workload previously performed by that host. For this reason, we recommend:

                • Scheduling server maintenance during low usage periods.
                • Evenly distributing a host's replica forests across the other nodes in the cluster so that the extra workload is evenly distributed when that host is unavailable.
• Minimizing the number of hosts removed for maintenance at any one time.

                If performing maintenance on more than one host at a time:

                • Define a maintenance group of hosts containing primary forests that have their local disk replica forests on hosts not in the maintenance group.
• All required forests must have replica forests defined. This includes all content forests, security database forests, and forests for all linked schema databases.

Maintenance groups should be sized so that the remaining available hosts represent a reasonable portion of compute, memory, and IO resources and can absorb the extra workload required during the maintenance period.

                Step 0: Verify all replica forests are synchronized

Before initiating this procedure, verify that all replica forests are in sync with their primary forests by checking that the forest status of each replica is in the “sync replicating” state.

This can be achieved using the MarkLogic Server administrative function xdmp:forest-status or the Management API GET /manage/v2/forests/{id|name}?view=status endpoint.
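As a minimal sketch (the database name "test" is illustrative), the state of every forest in the database, including replicas, can be listed as follows; up-to-date replicas report "sync replicating":

xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";
(: list each forest in the database, including replicas, with its current state :)
for $id in xdmp:database-forests(xdmp:database("test"), fn:true())
let $status := xdmp:forest-status($id)
return fn:concat($status/fs:forest-name, " : ", $status/fs:state)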

                Step 1: Shutdown the host via REST API, forcing an immediate failover

                Make a call to the /manage/v2/hosts/{id|name} (POST) endpoint, setting failover to true.

curl --anyauth --user user:password -X POST -i \
  --data "state=shutdown&failover=true" \
  -H "Content-type: application/x-www-form-urlencoded" \
  http://localhost:8002/manage/v2/hosts/my-host?format=JSON

Using this endpoint with the failover parameter tells the cluster to use fast failover, which immediately fails the primary forests managed by that host over to their replicas, instead of waiting 30 seconds for the host to time out.

                Step 2: Verify failover succeeded

                Wait until all of the replica forests take over – configured replica forests are now the acting primary forests and in the “open” state, while the configured primary forest is now disabled. You can manually monitor forest status in the Admin UI by refreshing the Forest status display. Once all forests have assumed their new roles, the database will be online.

                This step can also be achieved using the methods identified in Step 0.

                Step 3: Verify forests are synchronized

Once maintenance has been completed and all hosts are back online, some of the replica forests may still be acting as primaries. Verify that all acting replicas are in sync with the acting primary forests by checking the forest status and confirming that the acting replicas are in the "sync replicating" state.

                This step can also be achieved using the methods identified in Step 0.

                Step 4: Force configured primary forests to resume acting primary forest role

                In order to force the configured primary forests to assume the role of acting primary forests, restart the configured replica / acting primary forests together. Restarting all forests together will help minimize outage impact.

This step can also be achieved using the MarkLogic Server administrative function xdmp:forest-restart or the Management API POST /manage/v2/forests/{id|name} endpoint.
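A minimal sketch using xdmp:forest-restart (the forest names "test-1-R" and "test-2-R" are hypothetical):

xquery version "1.0-ml";
(: restart the acting primary (configured replica) forests together so the
   configured primary forests resume the acting primary role :)
xdmp:forest-restart((xdmp:forest("test-1-R"), xdmp:forest("test-2-R")))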

                Further Reading

                Introduction

                Rolling upgrades are used to upgrade a large cluster with many hosts to a newer version of MarkLogic Server without incurring any downtime in availability or interruption of transactions. 
                This article acts as a supplement to our Rolling upgrade documentation. It discusses the preconditions and assumptions that our feature documentation makes for a successful no-downtime Rolling Upgrade. It also makes a few suggestions to plan the overall approach.

                Assumptions:

                There are a few basic assumptions that our Rolling upgrade documentation makes, and they include ALL of the following:

                1. as suggested in the feature documentation, 'The security database and the schemas database must be on the same host, and that host should be the first host you upgrade when upgrading a cluster.'
2. fast failover works, and it should complete on the order of seconds. Use xdmp:shutdown with the failover flag set to true.
                3. the MarkLogic node, taken down as part of Rolling Upgrade, does not have any in-flight transactions
                4. The load balancer should have all the requests drained for the node going down and it should not send any more requests to that node.
                5. all failed transactions will automatically retry and should succeed as soon as fast failover is complete.

                Suggestions and approaches:

1. To avoid breaking open network connections while taking a node down for service during a rolling upgrade, you must drain requests from the load balancer before you shut down a particular MarkLogic node. Redirecting all new requests to the remaining available nodes means that connections are not lost. Most modern load balancers (such as F5) can perform this kind of operation, so include these steps in your overall plan, as they may need to be triggered manually. That way, while a rolling upgrade is underway, your load balancer accepts incoming requests and routes them to healthy instances only.

2. While taking down a node, for faster failover, use xdmp:shutdown with the failover flag set to true. In the case of a REST management call for the node, use the 'failover=true' parameter for faster failover.

3. Take into account that it takes time to remount the forests (not just the security forests).

4. It is important to distribute replica forests evenly in your cluster so that when a cluster node is down, its forests fail over with an even load across the remaining nodes. Ensure that no particular node is overloaded, which would slow the overall process, and keep a close watch on the logs for any slowness.

                5. When possible, always plan a maintenance window for the upgrades.

                Summary:

                To plan a successful Rolling upgrade without downtime, keep in mind your whole stack, consider your whole approach, prepare your cluster in advance, and test the approach well.

                References:

                1. Rolling Upgrade Process
                2. Important Points to Note Before Performing Rolling Upgrades

                Introduction

Administrators can achieve very fine granularity on restores when incremental backups are used in conjunction with journal archiving.

                Details

Journal archiving can enable a restore to be performed to any timestamp since the last incremental backup. For example, when using daily incremental backups in conjunction with 24-hour journal archive retention, a restore can be made to any point in the previous 24 hours.

                This capability enables administrators to go back to the exact point in time before a user error caused bad data to be ingested into the database, minimizing any data loss on the restore. Although this is a very powerful capability, the entire operation to perform a restore is simplified. Administrators can execute a simple operation as the server restores the backup set and replays the journal starting from the timestamp given by the admin.

                For further information, see the documentation Restoring from an Incremental Backup with Journal Archiving.

                Summary

There are index settings that may be problematic if your documents contain encoded binary data (such as Base64 encoded binary). This article identifies a couple of these index settings and explains the potential pitfall.

                Details

When word lexicons or string range indexes are enabled, each stand in the database's forests will contain a file called the 'atom data' file. The contents of this file include all of the relevant unique tokens, which could be all the unique tokens in the stand. If your documents contain encoded binary data, all of the encoded binary may be replicated as atom data and stored in the atom data file.

                Pitfall: There is an undocumented limit on the size of the atom data file of 4GB.  If this limit is exceeded for the content of a forest, then stand merges will begin to fail with the error

                    "XDMP-FORESTERR: Error in merge of forest forest-nameSVC-MAPBIG: Mapped file too large to map: NNN bytes: '\path\Forests\forest-name\stand-id\AtomData'"

                Workarounds

There are a few options that you can pursue to get around this problem:

1. Do not include encoded binary data in your documents. An alternative is to store the binary content separately, using MarkLogic Server's support for binary documents, and to include a reference to the binary document in the original.

2. If word lexicons are required, and the encoded binary data is limited to a finite number of elements in your documents, then you can create word query exclusions for those elements. In the MarkLogic Server Admin UI, word query element exclusions can be configured by navigating to Configure -> Databases -> {database-name} -> Word Query -> Exclude tab.

                3. If a string range index is defined on an element that contains encoded binary, then you can either remove the string range index or change the document data model so that the element containing the encoded binary is not shared with an element that requires a string range index. 

                 

                 

                Product Alert - Optic security advisory

                Summary 

The MarkLogic team recently discovered a vulnerability in two Optic query language operators introduced in MarkLogic Server 10.0-6. If you are using MarkLogic Server 10.0-6 or newer, you will need to deploy the corresponding patched version.
                As part of our responsible disclosure approach, we are sharing details and remediation steps with our customers who are under active maintenance.
                Customers should upgrade to a patched version of MarkLogic Server as soon as possible.
                Customers unable to upgrade should harden their environment to help protect the query and eval endpoints.

                Who is Affected?

                You are affected if you use MarkLogic Server 10.0-6 or newer AND use RBAC (with or without Compartment Security) or QBAC to restrict read operations AND if one or more of the following is true:
                • You are using op.fromSearch() or op.fromSearchDoc() in your Optic queries
                • The v1/rows endpoint is exposed to allow execution of arbitrary Optic queries directly or via the client libraries.
                • The v1/eval endpoint or an XCC server is exposed, and users have been granted the privilege to execute arbitrary code against the server directly or via the client libraries.
                • Non-admin users have been granted access to Query Console.

                You are not affected if you are using earlier versions of MarkLogic or if you do not rely on role-based or query-based access control (for example if all your data is public) or if you rely on RBAC and QBAC solely to control write operations (inserting, deleting, or updating documents in a database).

                Timeline and Next Steps

                Customers should prioritize upgrade over all other forms of remediation.
                Once you establish that you are affected by verifying your configuration, please log a ticket with our technical support by visiting https://help.marklogic.com/Tickets/Submit/ and specify the exact version of MarkLogic Server you are running so the team can direct you to the appropriate remediation procedure. 
                The following patched MarkLogic Server releases are available:
• 10.0-6.6
• 10.0-7.4
• 10.0-8.5
• 10.0-9.7
• 10.0-10.2
• 11.0.3
If you are unable to upgrade, our technical support team can guide you through disabling the corresponding features.

                Impact and Remediation for DHS customers 

AWS Data Hub Service (DHS) instances are impacted. The MarkLogic CloudOps support team will open a ticket for affected customers to arrange any planned downtime and the steps to remediate.
DHS customers do not have any action to take for the services hosted by MarkLogic CloudOps, unless they also run their own on-prem clusters, in which case they should follow the outlined process for the non-DHS environment ("Impact on MarkLogic Server").

                Upgrade Resources:

                 Q&A

                Q:  What is "RBAC (with or without Compartment Security)"?  

                A:  This is the usual MarkLogic security where read permissions on documents are used to control access.

                Q:  What is in the patch releases?

                A:  Changes between versions can be checked as usual at help.marklogic.com .

                Q: Where can I download above patch releases?

A:  Click on the latest binary for the MarkLogic version and operating system that you want to download. For example, click 'MarkLogic Server x64 (AMD64, Intel EM64T) 64-bit Linux RPM' for the Linux binary.

                Supported product platform and software version compatible configurations can be found at https://developer.marklogic.com/products/support-matrix

                Introduction

Looking at the MarkLogic Admin UI, you may have noticed that the status page for a given database displays the last backup date and time for that database. We have been asked in the past how this gets computed so the same check can be performed using your own code. This Knowledgebase article shows examples that utilise XQuery to get this information and explores the possibility of retrieving it using the MarkLogic ReST API.

                XQuery: How does the code work?

                The simple answer is in the forest status for each of the forests in the database (note these values only appear if you have created a backup already).  For the sake of these examples, let's say I have a database (called "test") which contains 12 forests (test-1 to test-12).  I can get the backup status using a call to our ReST API:

                http://localhost:8002/manage/LATEST/forests/test-1?view=status&format=html

                In the results returned, you should see something like:

                last-backup : 2016-02-12T12:30:39.916Z datetime
                last-incr-backup : 2016-02-12T12:37:29.085Z datetime
                

                In generating that status page in the MarkLogic Admin UI code, we create an aggregate - a database doesn't contain documents in MarkLogic, it contains forests and those forests contain documents.

Continuing the example above (with a database called "test" containing 12 forests), I can run a query along the following lines (a minimal sketch):
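xquery version "1.0-ml";
(: a sketch: get the status of every forest in the "test" database and
   return just the forest names :)
declare namespace fs = "http://marklogic.com/xdmp/status/forest";
xdmp:forest-status(xdmp:database-forests(xdmp:database("test")))/fs:forest-name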

                This will return the forest status(es) for all forests in the database "test" and return the forest names using XPath, so in my case, I would see:

                <forest-name xmlns="http://marklogic.com/xdmp/status/forest">test-1</forest-name>
                [...]
                <forest-name xmlns="http://marklogic.com/xdmp/status/forest">test-12</forest-name>
                

The MarkLogic Admin UI interrogates each forest in turn for that database and finds the metrics for the last backup. To put that into context, we can run a query along the following lines (again a minimal sketch):
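xquery version "1.0-ml";
(: a sketch: return the last-backup element from each forest's status :)
declare namespace fs = "http://marklogic.com/xdmp/status/forest";
xdmp:forest-status(xdmp:database-forests(xdmp:database("test")))/fs:last-backup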

                This gives us:

                <last-backup xmlns="http://marklogic.com/xdmp/status/forest">2016-02-12T12:30:39.946Z</last-backup>
                [...]
                <last-backup xmlns="http://marklogic.com/xdmp/status/forest">2016-02-12T12:30:39.925Z</last-backup>
                

The code (or the status report) doesn't want values for all 12 forests; it just wants the time the last forest completed the backup (because that's the real time the backup completed), so our code wraps the values in a call to fn:max, along the lines of this sketch:
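xquery version "1.0-ml";
(: a sketch: take the most recent last-backup value across all forests :)
declare namespace fs = "http://marklogic.com/xdmp/status/forest";
fn:max(
  for $b in xdmp:forest-status(xdmp:database-forests(xdmp:database("test")))/fs:last-backup
  return xs:dateTime($b)
)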

                Which gives us the max value (as these are all xs:dateTimes, it's finding the most recent date), which in the case of this example is:

                2016-02-12T12:30:39.993Z

The same is true for the last incremental backup (note that all we're changing here is the XPath to get to the correct element):
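xquery version "1.0-ml";
(: a sketch: the same status query, pointed at last-incr-backup instead :)
declare namespace fs = "http://marklogic.com/xdmp/status/forest";
xdmp:forest-status(xdmp:database-forests(xdmp:database("test")))/fs:last-incr-backup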

So we can get the max value for this by getting the most recent time across all forests (again a sketch):
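xquery version "1.0-ml";
(: a sketch: the most recent incremental backup time across all forests :)
declare namespace fs = "http://marklogic.com/xdmp/status/forest";
fn:max(
  for $b in xdmp:forest-status(xdmp:database-forests(xdmp:database("test")))/fs:last-incr-backup
  return xs:dateTime($b)
)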

                This would give us 2016-02-12T12:37:29.161Z

                Using the ReST API

                The ReST API does allow you to get this information but you'd need to jump through a few hoops to get to it:

                The ReST API status for a given database would give you the names of all the forests attached to that database:

                http://localhost:8002/manage/LATEST/databases/test

                And from there you could GET the information for all of those forests:

                http://localhost:8002/manage/LATEST/forests/test-1?view=status&format=html
                [...]
                http://localhost:8002/manage/LATEST/forests/test-12?view=status&format=html

Once you'd got all those values, you could calculate the max values for them - but at this point, I think it would make more sense to write a custom endpoint that returns this information, something like the following sketch (the module name and output element names are illustrative):
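xquery version "1.0-ml";
(: a sketch of a custom endpoint module (e.g. saved under an app server's modules
   root -- the name is hypothetical): it takes a "db" request field and returns the
   most recent full and incremental backup times across that database's forests :)
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

let $db := xdmp:get-request-field("db")
let $statuses := xdmp:forest-status(xdmp:database-forests(xdmp:database($db)))
return
  <backup-status database="{$db}">
    <last-backup>{fn:max(for $b in $statuses/fs:last-backup return xs:dateTime($b))}</last-backup>
    <last-incr-backup>{fn:max(for $b in $statuses/fs:last-incr-backup return xs:dateTime($b))}</last-incr-backup>
  </backup-status>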

                Where you could make a call to that module to get the aggregates (e.g.):

                http://[server]:[port]/[modulename.xqy]?db=test

This would return the aggregated backup status for whichever database name is passed in as the parameter.

                Introduction

In this Knowledgebase article, we will discuss a technique that allows you to scope queries so that matches must occur within a given parent element.

                Details

                cts:element-query

                Consider a containment scenario where you have an XML document structured in this way:

<rootElement>
  <id>7635940284725382398</id>
  <parentElement>
    <childElement1>valuea</childElement1>
    <childElement2>false</childElement2>
  </parentElement>
  <parentElement>
    <childElement1>valuea</childElement1>
    <childElement2>truthy</childElement2>
  </parentElement>
  <parentElement>
    <childElement1>valueb</childElement1>
    <childElement2>true</childElement2>
  </parentElement>
  <childElement1>valuec</childElement1>
</rootElement>

And you want to find the document where a parentElement has a childElement1 with a value of 'valuec'.

                A search like

                cts:search (/,
                    cts:element-value-query(xs:QName('childElement1'), 'valuec', 'exact')
                )

                will give you the above document, but doesn't consider where the childElement1 value is. This isn't what you want. Search queries perform matching per fragment, so there is no constraint that childElement1 be in any particular spot in the fragment.

                Wrapping a cts:element-query around a subquery will constrain the subquery to exist within an instance of the named element. Therefore,

                cts:search (/,
                    cts:element-query (
                        xs:QName ('parentElement'),
                        cts:element-value-query(xs:QName('childElement1'), 'valuec', 'exact')
                    )
                )

                will not return the above document since there is no childElement1 with a value of 'valuec' inside a parentElement.

                This applies to more-complicated subqueries too. For example, looking for a document that has a childElement1 with a value of 'valuea' AND a childElement2 with a value of 'true' as

                cts:search (/, 
                    cts:and-query ((
                        cts:element-value-query(xs:QName('childElement1'), 'valuea', 'exact'),
                        cts:element-value-query(xs:QName('childElement2'), 'true', 'exact')
                    ))
                )

                will return the above document. But you may want these two child element-values both inside the same parentElement. This can be accomplished with

                cts:search (/, 
                    cts:element-query (
                        xs:QName ('parentElement'),
                        cts:and-query ((
                            cts:element-value-query(xs:QName('childElement1'), 'valuea', 'exact'),
                            cts:element-value-query(xs:QName('childElement2'), 'true', 'exact')
                        ))
                    )
                )

                This should give you expected results, as it won't return the above document since the two child element-value queries do not match inside the same parentElement instance.

                Filtering and indexes

                Investigating a bit further, if you run the query with xdmp:query-meters you will see (depending on your database settings) 

                    <qm:filter-hits>0</qm:filter-hits>
                    <qm:filter-misses>1</qm:filter-misses>

                What is happening is that the query can only determine from the current indexes that there is a fragment with a parentElement, and a childElement1 with a value of 'valuea', and a childElement2 with a value of 'true'. Then, after retrieving the document and filtering, it finds that the document is not a complete match and so does not return it (thus filter-misses = 1).

                (To learn more about filtering, refer to Understanding the Search Process section in our Query Performance and Tuning Guide.)

                At scale you may find this filtering slow, or the query may hit Expanded Tree Cache limits if it retrieves many false positives to filter through.

                If you have the correct positions enabled, the indexes can resolve this query without retrieving the document and filtering. In this case, after setting both

                element-word-positions

                and

                element-value-positions

                to true on the database and reindexing, xdmp:query-meters now shows

                <qm:filter-hits>0</qm:filter-hits>
                <qm:filter-misses>0</qm:filter-misses>

                (To track element-value-queries inside element-queries you need element-word-positions and element-value-positions enabled. The former is for element-query and the latter is for element-value-query.)
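A minimal sketch of enabling both settings through the Admin API (the database name "test" is illustrative; a reindex will be triggered):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";
(: enable the position indexes needed to resolve element-query over element-value-query :)
let $config := admin:get-configuration()
let $db := xdmp:database("test")
let $config := admin:database-set-element-word-positions($config, $db, fn:true())
let $config := admin:database-set-element-value-positions($config, $db, fn:true())
return admin:save-configuration($config)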

                Now this query can be run without filtering. However, if you have a lot of relationship instances in a document, the calculations using positions can become quite expensive to compute.

                Position details

Further details: Empty-element positions are problematic. Positions are word positions, and the position of an element runs from the word position of the first word after the element starts to the word position of the first word after the element ends. Positions of attributes are the positions of their element. If everything is an empty element, you have no words, everything has the same position, and so positions cannot discriminate between elements.

                Reindexing

                Note that if you change these settings you will need to reindex your database, and the usual tradeoffs apply (larger indexes and slower indexing). Please see the following for guidance on adding an index and reindexing in general:

                See also:

                Reindexing impact
                Adding an index in production

                Summary

This article explains why you may encounter a Cross-Site Request Forgery (CSRF) error (SECURITY-BADREQUEST) when using MarkLogic Server's Query Console application, and how the issue can be resolved.

                Details

Since the 8.0-6 release of MarkLogic Server, the security of Query Console has been increased. Every time you load the application in the browser, there is a handshake between the browser and the server, generating a secure CSRF token for the logged-in user. This pairs the client with the server, allowing for secure communication. If another person logs into Query Console as the same user, their browser will perform another handshake, generating a new token and storing it on the server for that user. The user who was previously paired with the server will now have the wrong token and will see the CSRF error when performing any action in the app that makes a request to the server, until they refresh.

                MarkLogic is implementing the industry standard recommendation for CSRF. At this time, there is no option to disable this security feature.

                Best Practice

                Best practice would be to create a new user on MarkLogic Server for each person using the system. The "qconsole-user" role is enough to use the Query Console application. If they must be administrators, you can give them the "admin" role, but note that with this special role, the user will have the authority to perform any activity in MarkLogic Server, including adding or deleting users, adding or deleting documents, changing passwords, and so on.

                Further Reading

                Summary

                The Admin user bypasses all Security settings (roles/privileges) in the Security database and bypasses the document permissions in user databases. Any benchmark load test should be performed using a "real world" user account (i.e. not with the Admin user). 

                Treat Admin as a super user.

                MarkLogic treats the Admin user as a super user. When an Admin user executes a query, the query is not evaluated against any Security database settings and it bypasses all document permission checks (i.e. read, write, update) and Query privileges.

When comparing the performance of a query run by the Admin user versus a non-Admin user, the non-Admin user's queries may show longer execution times, depending on how many roles the user inherits, the size of the security database, and the nature of the query. You may not notice a difference for isolated single query executions, but under a large load the difference may be noticeable.

Question: What is the expected performance difference between the Admin user and a non-Admin user?

Each non-Admin user is different and will likely inherit a different number of roles and have different permissions on documents. Hence the security evaluation overhead differs per user and should be tested and benchmarked in your specific environment.

                Recommendation:

Non-Admin application users and roles should be part of any query development process, which will give a good measure of the performance impact of the security schema from the initial phase.

                Further Reading

                Summary

                There is a limit to the number of registered queries held in the forest registry.  If your application does not account for that fact, you may get unexpected results. 

                Where is it?

If a specific registered query is not found, then a cts:search operation with an invalid cts:registered-query throws an XDMP-UNREGISTERED exception. The XDMP-UNREGISTERED error occurs when a query could not be found in a forest's query registry. If a query that had been previously registered cannot be found, it may have been discarded automatically. In the most recent versions of MarkLogic Server at the time of this writing, the forest query registry only contains up to about 48,000 of the most recently used registered queries. If you register more than that, the least recently used ones are discarded.

                Recommendation

                To avoid registered queries being dropped, it’s a good idea to unregister queries when you know they aren’t needed any more.
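As a minimal sketch (the word query is illustrative), a registered query can be created, used unfiltered, and then deregistered once it is no longer needed:

xquery version "1.0-ml";
(: register a query, search with it, then free its slot in the forest query registry :)
let $id := cts:register(cts:word-query("marklogic"))
let $count := fn:count(cts:search(fn:doc(), cts:registered-query($id, "unfiltered")))
return (
  $count,
  cts:deregister($id)
)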

                This not-too-technical article covers a number of questions about MarkLogic Server and its use of memory:

                • How MarkLogic uses memory;
                • Why you might need more memory;
                • When you might need more memory;
                • How you can add more memory.

                Let’s say you have an existing MarkLogic environment that’s running acceptably well.  You have made sure that it does not abuse the infrastructure on which it’s running.  It meets your SLA (maybe expressed as something like “99% of searches return within 2 seconds, with availability of 99.99%”).   Several things about your applications have helped achieve this success:

                As such, your application’s performance is largely determined by the number of disk accesses required to satisfy any given query.  Most of the processing involved is related to our major data structures:

                Fulfilling a query can involve tens, hundreds or even thousands of accesses to these data structures, which reside on disk in files within stand directories.   (The triple world especially tends to exhibit the greatest variability and computational burden.)

                Of course, MarkLogic is designed so that the great majority of these accesses do not need to access the on-disk structures.  Instead, the server caches termlists, range indexes, triples, etc. which are kept in RAM in the following places:

                • termlists are cached in the List Cache, which is allocated at startup time (according to values found in config files) and managed by MarkLogic Server.  When a termlist is needed, the cache is first consulted to see whether the termlist in question is present.   If so, no disk access is required.  Otherwise, the termlist is read from disk involving files in the stand such as ListIndex and ListData.
                • range indexes are held in memory-mapped areas of RAM and managed by the operating system’s virtual memory management system.  MarkLogic allocates the space for the in-memory version of the range index, causes the file to be loaded in (either on-demand or via pre-load option), and thereafter treats it as an in-memory array structure.  Any re-reading of previously paged-out data is performed transparently by the OS.  Needless to say, this last activity slows down operation of the server and should be kept to a minimum. 

One key notion to keep in mind is that the in-memory operations (the “hit” cases above) cost about a microsecond or so of computation.  The go-to-disk penalty (the “miss” cases) costs at least one disk access, which takes a handful of milliseconds, plus even more computation than a hit case.  That is a slowdown on the order of 10,000 times.

                Nonetheless, you are running acceptably.  Your business is succeeding and growing.  However, there are a number of forces stealthily working against your enterprise continuing in this happy state. 

                • Your database is getting larger (more and perhaps larger documents).
                • More users are accessing your applications.
                • Your applications are gaining new or expanded capabilities.
                • Your software is being updated on a regular basis.
                • You are thinking about new operational procedures (e.g. encryption).

                In the best of all worlds, you have been measuring your system diligently and can sense when your response time is starting to degrade.  In the worst of all worlds, you perform some kind of operational / application / server / operating system upgrade and performance falls off a cliff.

                Let’s look under the hood and see how pressure is building on your infrastructure.  Specifically, let’s look at consumption of memory and effectiveness of the key caching structures in the server.

                Recall that the response time of a MarkLogic application is driven predominantly by how many disk operations are needed to complete a query.  This, in turn, is driven by how many termlist and range index requests are initiated by the application through MarkLogic Server and how many of those do not “hit” in the List Cache and in-memory Range Indexes.  Each one of those “misses” generates disk activity, as well as a significant amount of additional computation.

                All the forces listed above contribute to decreasing cache efficiency, in large part because they all use more RAM.  A fixed size cache can hold only a fraction of the on-disk structure that it attempts to optimize.  If the on-disk size keeps growing (a good thing, right?) then the existing cache will be less effective at satisfying requests.  If more users are accessing the system, they will ask in total for a wider range of data.  As applications are enriched, new on-disk structures will be needed (additional range indexes, additional index types, etc.)  And when did any software upgrade use LESS memory?

There’s a caching concept from the early days of modern computing (the Sixties, before many of you were born) called “folding ratio”.  You take the total size of a data structure and divide it by the size of the “cache” that sits in front of it.  This yields a dimensionless number that serves as a rough indicator of cache efficiency (and lets you track changes to it).  A way to compute this for your environment is to take the total on-disk size of your database and divide it by the total amount of RAM in your cluster.  Let’s say each of your nodes has 128GB of RAM and 10 disks of about 1TB each that are about half full.  So the folding ratio of each node in this configuration (MarkLogic's shared-nothing approach lets us consider each node individually) is currently (10 x 1TB x 50%) / 128GB, or about 40 to 1.
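The arithmetic is simple; here is a trivial sketch using the hypothetical figures above:

xquery version "1.0-ml";
(: Hypothetical node: 10 x 1TB disks at about 50% full, 128GB of RAM :)
let $on-disk-gb := 10 * 1024 * 0.5
let $ram-gb := 128
return $on-disk-gb div $ram-gb  (: => 40, i.e. a folding ratio of roughly 40:1 :)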

                This number by itself is neither good nor bad.  It’s just a way to track changes in load.  As the ratio gets larger, cache hit ratio will decrease (or, more to the point, the cache miss ratio will increase) and response time will grow.   Remember, the difference between a hit ratio of 98% versus a hit ratio of 92% (both seem pretty good, you say) is a factor of four in resulting disk accesses!  That’s because one is a 2% miss ratio and the other is an 8% miss ratio.

                Consider the guidelines that MarkLogic provides regarding provisioning: 2 VCPUs and 8GB RAM to support a primary forest that is being updated and queried.  The maximum recommended size of a single forest is about 400 GB, so the folding ratio of such a forest is 400GB / 8GB or about 50 to 1.  This suggests that the configuration outlined a couple of paragraphs back is at about 80% of capacity.  It would be time to think about growing RAM before too long.  What will happen if you delay?

                Since MarkLogic is a shared-nothing architecture, the caches on any given node will behave independently from those on the other nodes.  Each node will therefore exhibit its own measure of cache efficiency.  Since a distributed system operates at the speed of its slowest component, it is likely that the node with the most misses will govern the response time of the cluster as a whole.

                At some point, response time degradation will become noticeable and it will become time to remedy the situation.  The miss ratios on your List Cache and your page-in rate for your Range Indexes will grow to the point at which your SLA might no longer be met. 

                Many installations get surprised by the rapidity of this degradation.  But recall, the various forces mentioned above are all happening in parallel, and their effect is compounding.  The load on your caches will grow more than linearly over time.  So be vigilant and measure, measure, and measure!

                In the best of all possible worlds, you have a test system that mirrors your production environment that exhibits this behavior in advance of production.  One approach is to experiment with reducing the memory on the test system by, say, configuring VMs for a given memory size (and adjusting huge pages and cache sizes proportionately) to see where things degrade unacceptably.  You could measure:

                • Response time: where does it degrade by 2x, say?
                • List cache miss ratio: at what point does it double, say?
• Page-in rate: at what point does it increase by 2x, say?

                When you find the memory size at which things degraded unacceptably, use that to project the largest folding ratio that your workload can tolerate.  Or you can be a bit clever and do the same additional calculations for ListCache and Anonymous memory:

• Compute the sum of the sizes of all ListIndex and ListData files in all stands and divide by the size of the List Cache.  This gives the folding ratio of the termlist world for this host (a sketch follows this list).
                • Similarly, compute the sum of the sizes of all RangeIndex files and divide by the size of anonymous memory.  This gives the folding ratio for the range index world on this host.  This is where encryption can bite you.  At least for a period of time, both the encrypted and the un-encrypted versions of a range index must be present in memory.  This effectively doubles your folding ratio and can send you over the edge in a hurry.  [Note: depending on your application, there may be additional in-memory derivatives of range indexes built to optimize for facets, sorting of results, … all taking up additional RAM.]
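A rough sketch of the first calculation for a single forest is shown below.  It is an assumption-laden illustration: the forest path and the 32GB List Cache size are placeholders, and the dir namespace and element names (dir:entry, dir:pathname, dir:filename, dir:content-length) returned by xdmp:filesystem-directory should be verified against your version's documentation:

xquery version "1.0-ml";
declare namespace dir = "http://marklogic.com/xdmp/directory";

(: Placeholders -- substitute your forest's data directory and configured List Cache size (MB) :)
let $forest-dir := "/var/opt/MarkLogic/Forests/Documents"
let $list-cache-mb := 32768

(: Sum the ListIndex and ListData file sizes across the forest's stand directories :)
let $list-mb :=
  fn:sum(
    for $stand in xdmp:filesystem-directory($forest-dir)/dir:entry[dir:type eq "directory"]
    for $file in xdmp:filesystem-directory(fn:string($stand/dir:pathname))/dir:entry[dir:filename = ("ListIndex", "ListData")]
    return xs:unsignedLong($file/dir:content-length)
  ) div (1024 * 1024)

return $list-mb div $list-cache-mb  (: folding ratio of the termlist world for this forest :)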

                [To be fair, on occasion a resource other than RAM can become oversubscribed (beyond the scope of this discussion):

                • IOPs and I/O bandwidth (both at the host and storage level);
                • Disk capacity (too full leads to slowness on some storage devices, or to inability to merge);
                • Wide-area network bandwidth / latency / consistency (causes DR to push back and stall primary);
                • CPU saturation (this is rare for traditional search-style applications, but showing up more in the world of SQL, SPARQL and Optic, often accompanied by memory pressure due to very large Join Tables.  Check your query plans!);
                • Intra-cluster network bandwidth (both at host and switch/backbone layer, also rare)].

                Alternatively, you may know you need to add RAM because you have an emergency on your hands: you observe that MarkLogic is issuing Low Memory warnings, you have evidence of heavy swap usage, your performance is often abysmal, and/or the operating system’s OOM (out of memory) killer is often taking down your MarkLogic instance.  It is important to pay attention to the warnings that MarkLogic issues, above and beyond any that come from the OS. 

                You need to tune your queries so as to avoid bad practices (see the discussion in the beginning of this article) that waste memory and other resources, and almost certainly add RAM to your installation.   The tuning exercise can be labor-intensive and time-consuming; it is often best to throw lots of RAM at the problem to get past the emergency at hand.

                So, how to add more RAM to your cluster?  There are three distinct techniques:

                • Scale vertically:  Just add more RAM to the hosts you already have.
                • Scale horizontally:  Add more nodes to your cluster and re-distribute the data
                • Scale functionally:  Convert your existing e/d-nodes into d-nodes and add new e-nodes

                Each of these options has its pros and cons.   Various considerations:

                • Granularity:   Say you want to increase RAM by 20%.  Is there an option to do just this?
                • Scope:  Do you upgrade all nodes?  Upgrade some nodes?   Add additional nodes?
                • Cost:  Will there be unanticipated costs beyond just adding RAM (or nodes)?
                • Operational impact:  What downtime is needed?  Will you need to re-balance?
                • Timeliness: How can you get back to acceptable operation as quickly as possible?

                Option 1: Scale Vertically

                On the surface, this is the simplest way to go.  Adding more RAM to each node requires upgrading all nodes.  If you already have separate e- and d-nodes, then it is likely that just the d-nodes should get the increased RAM.

                In an on-prem (or, more properly, non-cloud) environment this is a bunch of procurement and IT work.  In the worst case, your RAM is already maxed out so scaling vertically is not an option.

                In a cloud deployment, the cloud provider dictates what options you have.  Adding RAM may drag along additional CPUs to all nodes also, which requires added MarkLogic licenses as well as larger payment to the cloud provider.  The increased RAM tends to come in big chunks (only 1.5x or 2x options).  It’s generally not easy to get just the 20% more RAM (say) that you want.  But this may be premature cost optimization; it may be best just to add heaps of RAM, stabilize the situation, and then scale RAM back as feasible.  Once you are past the emergency, you should begin to implement longer-term strategies.

                This approach also does not add any network bandwidth, storage bandwidth and capacity in most cases, and runs the small risk of just moving the bottleneck away from RAM and onto something else.

                Option 2: Scale Horizontally

                This approach adds whole nodes to the existing complex.  It has the net effect of adding RAM, CPU, bandwidth and capacity.   It requires added licenses, and payment to the cloud provider (or a capital procurement if on-prem).  The granularity of expansion can be controlled; if you have an existing cluster of (2n+1) nodes, the smallest increment that makes sense in an HA context is 2 more nodes (to preserve quorum determination) giving (2n+3) nodes.  In order to make use of the RAM in the new nodes, rebalancing will be required.  When the rebalancing is complete, the new RAM will be utilized.

                This option tends to be optimal in terms of granularity, especially in already larger clusters.  To add 20% of aggregate RAM to a 25-node cluster, you would add 6 nodes to make a 31-node cluster (maintaining the odd number of nodes for HA).  You would be adding 24%, which is better than having to add 50% if you had to scale all 25 nodes by 50% because that was what your cloud provider offered.

                Option 3: Scale Functionally

Scaling functionally means adding new nodes to the cluster as e-nodes and reconfiguring the existing e/d-nodes to be d-nodes.  This frees up RAM on the d-side (specifically by dramatically reducing the need for Expanded Tree Cache and memory for query evaluation), which goes towards restoring a good folding ratio.  Recent experience suggests about 15% of RAM can be reclaimed in this manner.

More licenses are again required, plus installation and admin work to reconfigure the cluster.  You need to make sure that the network can handle the increase in XDQP traffic from e-nodes to d-nodes, but this is not typically a problem.  The resulting cluster tends to run more predictably.  One of our largest production clusters typically runs its d-nodes at nearly 95% memory usage, as reported by MarkLogic as the first number in an error log line.  It can get away with running so full because it is a classical search application whose d-node RAM usage does not fluctuate much.  Memory usage on e-nodes is a different story, especially when the application uses SQL or Optic.  In such a situation, on-demand allocation of large Join Tables can cause an abrupt increase in memory usage.  That’s why our advice for combined e/d-nodes is to run below 80% to allow headroom for query processing.

                Thereafter, the two groups of nodes can be scaled independently depending on how the workload evolves.

                Here are a few key takeaways from this discussion:

                • Measure performance when it is acceptable, not just when it is poor.
                • Do whatever it takes to stabilize in an emergency situation.
                • Correlate metrics with acceptable / marginal performance to determine a usable folding ratio.
                • If you have to make a guess, try to achieve no worse than a 50:1 ratio and go from there.
                • Measure and project the growth rate of your database.
                • Figure out how much RAM needs to be added to accommodate projected growth.
                • Test this hypothesis if you can on your performance cluster.

                Range indexes and invalid values

We will discuss range index type casting and the behavior determined by the invalid-values setting.

                Casting values

                We can cast a string to an unsignedLong as
                xs:unsignedLong('4235234')
                and the return is 4235234 as an unsignedLong.  However, if we try
                xs:unsignedLong('4235234x')
                it returns an error
                XDMP-CAST: (err:FORG0001) xs:unsignedLong("4235234x") -- Invalid cast: "4235234x" cast as xs:unsignedLong
                Similarly,
                xs:unsignedLong('')
                returns an error
                XDMP-CAST: (err:FORG0001) xs:unsignedLong("") -- Invalid cast: "" cast as xs:unsignedLong
                This same situation can arise when a document contains invalid values.  The invalid-values setting on the range index determines what happens in the case of a value that can't be cast to the type of the range index.

                Range indexes---values and types

                Understanding Range Indexes discusses range indexes in general, and Defining Element Range Indexes discusses typed values.
                Regarding the invalid-values parameter of a range index:
                In the invalid values field, choose whether to allow insertion of documents that contain elements or JSON properties on which range index is configured, but the value of those elements cannot be coerced to the index data type. You can choose either ignore or reject. By default, the server rejects insertion of such documents. However, if you choose ignore, these documents can be inserted. This setting does not change the behavior of queries on invalid values after documents are inserted into the database. Performing an operation on an invalid value at query time can still result in an error.

                Behavior with invalid values

                Create a range index

                First, create a range index of type unsignedLong on the id element in the Document database:
                import module namespace admin = "http://marklogic.com/xdmp/admin"
                    at "/MarkLogic/admin.xqy";
                let $config := admin:get-configuration()
                let $dbid := xdmp:database('Documents')
                let $rangespec := admin:database-range-element-index('unsignedLong', '', 'id', (), fn:false())
                return
                     admin:save-configuration (admin:database-add-range-element-index($config, $dbid, $rangespec))

                Insert a document with a valid id value

                We can insert a document with a valid value:
                xdmp:document-insert ('test.xml', <doc><id>4235234</id></doc>)
                Now if we check the values in the index as
                cts:values (cts:element-reference (xs:QName ('id')))
we get the value 4235234 with type unsignedLong.  We can search for the document with that value as
                cts:search (/, cts:element-range-query (xs:QName ('id'), '=', 4235234), 'filtered')
                and the document is correctly returned.

Insert a document with an invalid id value

                With the range index still set to reject invalid values, we can try to insert a document with a bad value
                xdmp:document-insert ('test.xml', <doc><id>4235234x</id></doc>)
                That gives an error as expected:
                XDMP-RANGEINDEX: xdmp:eval("xquery version &quot;1.0-ml&quot;;&#10;xdmp:document-insert ('te...", (), <options xmlns="xdmp:eval"><database>16363513930830498097</database>...</options>) -- Range index error: unsignedLong fn:doc("test.xml")/doc/id: XDMP-LEXVAL: Invalid lexical value "4235234x"

                and the document is not inserted.

                Setting invalid-values to ignore and inserting an invalid value

                Now we use the Admin UI to set the invalid-values setting on the range index to ignore.  Inserting a document with a bad value as
                xdmp:document-insert ('test.xml', <doc><id>4235234x</id></doc>)
                now succeeds.  But remember, as mentioned above, "... if you choose ignore, these documents can be inserted. This setting does not change the behavior of queries on invalid values after documents are inserted into the database. Performing an operation on an invalid value at query time can still result in an error."
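If you prefer scripting this to using the Admin UI, below is a minimal sketch that recreates the index with invalid-values set to ignore.  It assumes the optional sixth invalid-values argument of admin:database-range-element-index is available in your server version; changing the setting this way means removing the old index specification and adding a new one:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $dbid := xdmp:database('Documents')
(: The existing index (created with the default, 'reject') and its replacement ('ignore') :)
let $old := admin:database-range-element-index('unsignedLong', '', 'id', (), fn:false(), 'reject')
let $new := admin:database-range-element-index('unsignedLong', '', 'id', (), fn:false(), 'ignore')
let $config := admin:database-delete-range-element-index($config, $dbid, $old)
let $config := admin:database-add-range-element-index($config, $dbid, $new)
return admin:save-configuration($config)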

                Values.  Checking the values in the index 

                cts:values (cts:element-reference (xs:QName ('id')))
                does not return anything.
                Unfiltered search.  Searching unfiltered for a value of 7 as
                cts:search (/, cts:element-range-query (xs:QName ('id'), '=', xs:unsignedLong (7)), 'unfiltered')
returns our document (<doc><id>4235234x</id></doc>).  This is a false positive.  When you insert a document with an invalid value, that document is returned for any search that uses the index.
                Filtered search.  We can search filtered for a value of 7 to see if the false positive can be removed from the results:
                cts:search (/, cts:element-range-query (xs:QName ('id'), '=', xs:unsignedLong (7)), 'filtered')
                throws an error 

                XDMP-CAST: (err:FORG0001) cts:search(fn:collection(), cts:element-range-query(fn:QName("","id"), "=", xs:unsignedLong("7")), "filtered") -- Invalid cast: xs:untypedAtomic("4235234x") cast as xs:unsignedLong

                That's because when the document is used in filtering, the invalid value is cast to match the query and it throws an error as in the earlier cast test.

                Adding a new index and reindexing

                If you have documents already in the database, and add an index, the reindexer will automatically reindex the documents.

If there are invalid values for one of your indexes, then the reindexer will still reindex the document, but it will issue a Debug-level message about the problem:

                2023-06-26 16:44:28.646 Debug: IndexerEnv::putRangeIndex: XDMP-RANGEINDEX: Range index error: unsignedLong fn:doc("/test.xml")/doc/id: XDMP-LEXVAL: Invalid lexical value "4235234x"

The reindexer will not reject or delete the document.  You can use the URI given in the message to find the document and correct the issue.

                Finding documents with invalid values

Since documents with invalid values are always returned by searches that use the index, you can find them by doing an and-query of two searches that are normally mutually exclusive.  For the document with the invalid value,

                cts:uris ((), (),
                    cts:and-query ((
                        cts:element-range-query (xs:QName ('id'), '=', 7),
                        cts:element-range-query (xs:QName ('id'), '=', 8)
                    ))
                )

                returns /test.xml.
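Once the URI is known, the invalid value can be corrected in place.  A minimal sketch, assuming the intended value was 4235234:

xdmp:node-replace(fn:doc('/test.xml')/doc/id, <id>4235234</id>)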

                Summary

                If your index settings have a very large number of range indexes specified (on the order of thousands or even tens of thousands), you may find your MarkLogic Server instance returning a message saying that it "Cannot allocate memory" - even when your OS monitoring metrics indicate that there appears to be plenty of unused RAM.

                XDMP-FORESTERR: Error in startup of forest: SVC-MAPINI: Mapped file initialization error: mmap: Cannot allocate memory

                Detail

                The issue is not how much memory a system has, but how it's being used. In the interests of performance, MarkLogic Server indexes your content upon ingestion to the system, then memory maps those indexes to serialized data structures on disk. While it's true that each of those memory maps requires some amount of RAM, if you've got thousands of indexes and system monitoring is reporting RAM to spare, then you might be running up against Linux's default vm.max_map_count value.

While it's possible to get past this issue by simply increasing the vm.max_map_count limit, you should seriously consider moving your index usage towards Template Driven Extraction (TDE), because 1) it's likely the current indexing scheme could be replaced by one that uses far fewer indexes, and 2) when your configuration exceeds on the order of 100 or so range indexes, you'll likely need to take special care to size and manage your topology so that you don't run out of system resources, as well as potentially make configuration changes to the Linux kernel on the d-nodes to which the relevant forests are assigned.

                ---

                Additional Reading:

                • "The index behind TDE views is the triple index, not range indexes. We heard that people hit a memory limit when they created many columns (and therefore many range indexes) in the same database – the limit was hundreds, and people wanted to create tens of thousands. Now there is no effective limit to the number of columns you can create in a database." (https://developer.marklogic.com/learn/tde-faq/)
                • "If you find yourself planning to make thousands (or even hundreds) of range indexes, it’s probably worth stepping back and rethinking about how the data will be represented." (https://www.marklogic.com/blog/10000-range-indexes/)

                Introduction

Seeing "stand limit" messages in your logs frequently? This article explains what the message means for your application and what actions you should take.

                 

What are stands, and how can their number increase?

A stand holds a subset of the forest data and exists as a physical subdirectory under the forest directory. This directory contains a set of compressed binary files with names like TreeData, IndexData, Frequencies, Qualities, and such. This is where the actual compressed XML data (in TreeData) and indexes (in IndexData) can be found.

                At any given time, a forest can have multiple stands. To keep the number of stands to a manageable level MarkLogic runs merges in the background. A merge takes some of the stands on disk and creates a new singular stand out of them, coalescing and optimizing the indexes and data, as well as removing any previously deleted fragments.

MarkLogic Server has a fixed limit for the maximum number of stands (64). When that limit is reached you will no longer be able to update your system. MarkLogic automatically manages merges, so it is unlikely that you will reach this limit, but there are a few configurations under user control that may impact merges and lead to this issue, for example:

1.) You can manage merges using Merge Policy Controls. For example, setting a low merge max size stops merges beyond the configured size, so the overall number of stands keeps growing.

2.) A low background-io-limit value means less I/O is available for background tasks such as merges. This can slow the merge rate, so the number of stands may grow.

3.) Low in-memory settings may not keep up with an aggressive data load. For example, if you are bulk loading large documents and have a low in-memory tree size, then stands may accumulate and reach the hard limit.

                 

What can you do to keep the number of stands within a manageable limit?

While MarkLogic automatically manages merges to keep the number of stands at a manageable level, it adds a Warning entry to the logs when it sees the number of stands growing alarmingly, for example: Warning: Forest XXXXX is at 92% of stand limit

If you see such messages in your logs, you should take action: reaching the hard limit of 64 means you will no longer be able to update your system.
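To see how close each forest is to the limit, here is a minimal sketch that counts the on-disk stands per forest of a hypothetical 'Documents' database; it assumes xdmp:forest-status reports one stand element per on-disk stand, which is worth verifying against your version:

xquery version "1.0-ml";
for $forest in xdmp:database-forests(xdmp:database("Documents"))
let $stands := fn:count(xdmp:forest-status($forest)//*:stand)
order by $stands descending
return xdmp:forest-name($forest) || ": " || $stands || " stands"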

                Here's what you can check and do to lower the number of stands.

1.) If you have configured merge policy controls, check whether they actually match your application usage, and change the settings as needed. For instance:

• There should be no merge blackouts during ingestion, or at any time there is heavy updating of your content.

• Beginning with MarkLogic 7, the server is able to manage merges with less free space required on your drives (1.5 times the size of your content). This is accomplished by setting the merge max size to 32768 (32GB). Although this does create more stands, that is acceptable on newer systems, since the server is able to use extra CPU cores in parallel.

2.) If you have configured background-io-limit, check whether it is sufficient for your application usage. If needed, increase the value so that merges can make use of more I/O (see the sketch after this list). You should only use this setting on systems that have limited disk I/O. In general, start by setting it to 200, and if the disk I/O still seems overwhelmed, lower it to 150, and so on. A setting of 100 may be too low for systems that are doing ingestion, since the merge process needs to be able to keep up with stand creation.

3.) If you are performing bulk loads, check whether the in-memory settings are sufficient and can be increased. If needed, increase them so that in-memory stands (and, as a result, on-disk stands) accommodate more data, thereby decreasing the number of stands. If you do grow the in-memory caches, make sure to grow the database journal sizes by a corresponding amount; this ensures that a single large transaction will be able to fit in the journals.
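For reference, here is a minimal sketch of adjusting the merge max size (item 1) and background-io-limit (item 2) through the Admin API. The database name 'Documents', the 'Default' group, and the specific values are placeholders, and the setter names should be verified against your version's Admin API documentation:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $dbid := xdmp:database("Documents")
let $group := admin:group-get-id($config, "Default")
(: Cap merge sizes at 32GB (value in MB) :)
let $config := admin:database-set-merge-max-size($config, $dbid, 32768)
(: Allow background tasks such as merges more I/O (value in MB/sec per host) :)
let $config := admin:group-set-background-io-limit($config, $group, 200)
return admin:save-configuration($config)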

                 

                Conclusion 

If you decide to control MarkLogic's merge process, you should monitor the system for any adverse effects and take action accordingly. MarkLogic Server continuously assesses the state of each database, and the default merge settings together with the dynamic nature of merges will keep the database tuned optimally at all times. So if you are unsure, let MarkLogic handle the merges for you!

                Introduction

This article presents the steps to create a read-only access user and a full-access user for a WebDAV server.

                Details

                For read-only WebDAV access you can connect to WebDAV using the credentials of a user who does not have the rights to insert/update documents. This can be accomplished by creating a user and assigning roles to them through steps given below.

                1. If one does not already exist, create a WebDAV server (Instructions available in the MarkLogic Server Administrators Guide)

                • leave default user to "nobody", and 
                • leave required privilege empty

                2. Create a role - for the purpose of these instructions, call the new role "Read_only_Access" 

• After you have entered a name for the new role (Read_only_Access), refresh the page and scroll to the "Default Permissions" section near the end of the page. The default permissions section allows you to assign a capability to a particular role. In this case, select the "Read_only_Access" role from the role drop-down as well as the "read" capability.

                3. Create a user and grant that user the "Read_only_Access" role.

                4. Create another role - for the purpose of these instructions, call the new role "Write_only_Access"

• After you have entered a name for the new role (Write_only_Access), refresh the page and scroll to the "Default Permissions" section near the end of the page. The default permissions section allows you to assign a capability to a particular role. In this case, select the "Write_only_Access" role from the role drop-down as well as the "read", "insert", "execute" and "update" capabilities.

                5. Create another user and grant that user the "Write_only_Access" role.

6. Set permissions on the "/" directory so the "Read_only_Access" / "Write_only_Access" roles can view / make changes respectively.  This can also be accomplished in code:

                   xdmp:document-add-permissions("/",xdmp:permission("Read_only_Access","read"))

                  xdmp:document-add-permissions("/",xdmp:permission("Write_only_Access",("read", "insert","execute","update"))

7. When you connect with a WebDAV client, both users will be able to view the root "/" directory, but neither can create files or folders there. To allow that, you will need to create a URI privilege for the "/" URI and add the "Write_only_Access" role to it.

                Now the "Read_only" user can read those documents, and the "Write_only" user can both read and update the documents.

                Existing Documents

While the users just created will have the expected access to new documents, for documents that already exist in the database you will need to add the read permission to those documents. This can be accomplished with xdmp:document-add-permissions().

                For example:
                    xdmp:document-add-permissions("/example.xml", xdmp:permission("Read_only_Access", "read"))

                MarkLogic Documentation

For more details on how to manage security, please refer to the Security Administration section of our Administrator's Guide.

                 

                 

                 

                 

                Overview

                Update transactions run with readers/writers locks, obtaining locks as needed for documents accessed in the transaction. Because update transactions only obtain locks as needed, update statements always see the latest version of a document. The view is still consistent for any given document from the time the document is locked. Once a document is locked, any update statements in other transactions wait for the lock to be released before updating the document.

                Read only query transactions run at a particular system timestamp, instead of acquiring locks, and have a read-consistent view of the database. That is, the query transaction runs at a point in time where all documents are in a consistent state.

                The system timestamp is a number maintained by MarkLogic Server that increases every time a change or a set of changes occurs in any of the databases in a system (including configuration changes from any host in a cluster). Each fragment stored in a database has system timestamps associated with it to determine the range of timestamps during which the fragment is valid.
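For illustration, a read-only query can report the system timestamp at which it is running:

xquery version "1.0-ml";
(: In a read-only query this returns the timestamp the query is running at;
   in an update transaction it returns the empty sequence :)
xdmp:request-timestamp()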

On a clustered system where there are multiple hosts, the timestamps need to be coordinated across all hosts. MarkLogic Server does this by passing the timestamp in every message communicated between hosts of the cluster, including the heartbeat message. Typically, the message carries two important pieces of information:

                • The origin host id
                • The precise time on the host at the time that heartbeat took place

                In addition to the heartbeat information, the "Label" file for each forest in the database is written as changes are made. The Label file also contains timestamp information; this is what each host uses to ascertain the current "view" of the data at a given moment in time. This technique is what allows queries to be executed at a 'point in time' to give insight into the data within a forest at that moment.

                You can learn more about transactions in MarkLogic Server by reading the Understanding Transactions in MarkLogic Server section of the MarkLogic Server Application Developers Guide.

                The distribute timestamps option on Application Server can specify how the latest timestamp is distributed after updates. This affects performance of updates and the timeliness of read-after-write query results from other hosts in the group.

When set to fast, updates return as quickly as possible.  No special timestamp notification messages are broadcast to other hosts.  Instead, timestamps are distributed to other hosts when any other message is sent.  The maximum amount of time that could pass before other hosts see the update timestamp is one second, because a heartbeat message is sent to other hosts every second.

                When set to strict, updates immediately broadcast timestamp notification messages to every other host in the group. Updates do not return until their timestamp has been distributed. This ensures timeliness of read-after-write query results from other hosts in the group.

                When set to cluster, updates immediately broadcast timestamp notification messages to every other host in the cluster. Updates do not return until their timestamp has been distributed. This ensures timeliness of read-after-write query results from any host in the cluster, so requests made to any app server on any host in the cluster will see immediately consistent results.

                The default value for "distribute timestamps" option is fast. The remainder of this article is applicable when fast mode is used.

                Read after Write in Fast Mode

We will look at the different scenarios for the case where a read occurs in a transaction immediately following an update transaction.

                • If the read transaction is executed against an application server on the same node of the cluster (or any node that participated in the update) then the read will execute at a timestamp equal to or greater than the time that the update occurred.
                • If the read is executed in the context of an update transaction, then, by acquiring locks, the view of the documents will be the latest version of the documents.
                • If the read is executed in a query transaction, then the query will execute at the latest timestamp that the host on which it was executed is aware of. Although this will always produce a transactionally consistent view of the database, it may not return the latest updates. The remainder of this article addresses this case.

Consider an XCC program that performs the following steps:

                • Instantiates two XCC ContentSource Objects - each connecting to a different host in the cluster.
                • Establishes a short loop (which runs the enclosed steps 10 times)
                  • Creates a unique UUID which is used as a URI for the Document
  • Establishes a session with the first host in the cluster and performs the following:
                    • Gets the timestamp (session.getCurrentServerPointInTime()) and writes it out to the console / stdout
    • Inserts a simple, single element (<ok/>) as a document-node into a given database
                    • Gets the timestamp again and writes it out to the console / stdout
                  • The session with the first host is then closed. A new session is established with the second host and the following steps are performed:
                    • Gets the timestamp at the start of the session and writes it out to the console / stdout
                    • An attempt is made to retrieve the document which was just inserted
                  • On success the second session will be closed.
  • If the document could not be read successfully, an immediate retry attempt follows thereafter, which will result in a successful retrieval.

                Running this test will yield one of two results for each iteration of the loop:

                Query Transaction at Timestamp that includes Update

Most of the time, you will find that the timestamps are in lockstep between the two hosts - note that there is no difference between the timestamp reported by getCurrentServerPointInTime() after the document has been inserted on the first host and the timestamp observed on the second host just before the attempt is made to retrieve the document.

                ----------------- START OF INSERT / READ CYCLE (1) -----------------
                First host timestamp before document is inserted: 	13673327800295300
                First host timestamp after document is inserted: 	13673328229180040
                Second host timestamp before document is read: 	13673328229180040
                ------------------ END OF INSERT / READ CYCLE (1) ------------------

                However, you may also see this:

                ----------------- START OF INSERT / READ CYCLE (10) -----------------
                First host timestamp before document is inserted: 	13673328311216780
                First host timestamp after document is inserted: 	13673328322546380
                Second host timestamp before document is read: 	13673328311216780
                ------------------ END OF INSERT / READ CYCLE (10) ------------------

Note that on this run the timestamps are out of sync; at the point where getCurrentServerPointInTime() is called on the second connection, its timestamp still corresponds to the point just before the document was inserted.

                Yet this also returns results that include the updates; in the interval between the timestamp being written to the console and the construction and submission of the newAdhocQuery(), the document has become available and was successfully retrieved during the read process.

                The path with an immediate retry

                Now let's explore what happens when the read only query transaction runs at a point in time that does not include the updates:

                ----------------- START OF INSERT / READ CYCLE (2) -----------------
                First host timestamp before document is inserted: 	13673328229180040
                First host timestamp after document is inserted: 	13673328240679460
                Second host timestamp before document is read: 		13673328229180040
                WARNING: Immediate read failed; performing an immediate retry
                Second host timestamp for read retry: 		13673328240679460
                Result Sequence below:
                <?xml version="1.0" encoding="UTF-8"?>
                <ok/>
                ------------------ END OF INSERT / READ CYCLE (2) ------------------

                Note that on this occasion, we see an outcome that starts much like the previous example; the timestamps mismatch and we see that we've hit the point in the code where our validation of the response fails.

                Also note that the timestamp at the point where the retry takes place is now back in step; from this, we can see that the document should be available even before the retry request is executed. Under these conditions, the response (the result) is also written to stdout so we can be sure the document was available on this attempt.

                Multi Version Concurrency Control

In order to guarantee that the "holistic" view of the data is current and available in a read only query transaction across each host in the cluster, two things need to take place:

                • All forests need to be up-to-date and all pending transactions need to be committed.
                • Each host must be in complete agreement as to the 'last known good' (safest) timestamp from which the query can be allowed to take place.

In all situations, to ensure a complete (and reliable) view of the data, the read only query transaction must take place at the lowest known timestamp across the cluster.

With every message between nodes, the latest timestamp information is communicated across the cluster. The first "failed" attempt to read the document necessitates communication between the hosts, and in doing so it propagates a new "agreed" timestamp across every node in the cluster.

It is because of this that the retry will always work: at the point where the immediate read-after-write fails, the timestamp changes are propagated, and the new timestamp provides a consistent point at which the retry query can run. This is why a single retry is always sufficient.

                Context

                This KB article talks specifically about how the Rebalancer interacts with database replication, and how to solve the issues that may arise if not configured correctly.

                For a general discussion on how rebalancing works in MarkLogic, refer to this article and the server documentation.

                Rebalancing and Database Replication

When database replication is configured for a database, rebalancing is disabled by default on the Replica database and no rebalancing will occur until the database replication configuration is deleted. While replication remains configured, the forest-to-forest mapping between the Master and the Replica is preserved.

                Note that the Replica databases must have at least as many forests as the Master database. Otherwise, not all of the data on the Master database will be replicated.

                It is important to make sure that the assignment policy on the Replica is the same as the Master - so that in a DR situation, when the Replica takes over as the Primary, rebalancing is not triggered.

                Forest order mismatch can cause Rebalancing

                Forest order is the order in which forests are attached to the database. When the document assignment policy is set to 'Segment', 'Legacy' or 'Bucket', it is required that the Replica database configuration should have the same forest order as the Master to ensure rebalancing does not occur if or when replication is deconfigured.

                If there is a difference in forest orders between the Master and the Replica, a Warning level message is logged on the Replica, which looks like this:

                2015-10-21 13:34:59.359 Warning: forest order mismatch: local forest Test_12 is at position 15 
                while foreign master forest 2108358988113530610 (cluster=8893136914265436826) is at position 12

In this state, when database replication is deleted between the clusters, the database on the Replica cluster will start to rebalance right away, and it could take a variable amount of time depending on how many documents need to be rebalanced.

                Fixing the forest order:

On clusters with database replication enabled and both Master and Replica databases in sync (document counts match and all primary forests on the Replica database are in the 'open replica' state), the following steps help remove the mismatch and make the forest order the same on both the Master and the Replica:

                i. Make sure that both Master and Replica databases have the same rebalancer assignment policy.

                ii. Disable rebalancer and reindexer, if you have them enabled on both clusters for the database in question.

                iii. Obtain the forest order from the Master cluster - below is the query for an example database:

                xquery version "1.0-ml";

                (: Returns a list of forests in order for a given database :)

                import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

                let $config := admin:get-configuration()
                let $dbid := admin:database-get-id($config, "content-db-master")

                return admin:database-get-attached-forests($config,$dbid) ! xdmp:forest-name(.)

                Example output for this query is

                content-forest-2, content-forest-1, content-forest-3

                iv. On the Replica cluster, reorder the forests according to the order returned on the Master from step iii:

                xquery version "1.0-ml";

                import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";
                let $config := admin:get-configuration()
                let $dbid := admin:database-get-id($config, "content-db-replica")
                let $forest-names-in-order := (
                "content-forest-2",
                "content-forest-1",
                "content-forest-3"
                )

                let $forest-ids := $forest-names-in-order ! xdmp:forest (.)
                let $config := admin:database-reorder-forests($config, $dbid, $forest-ids)
                return (
                'reordering to: ' || fn:string-join ($forest-names-in-order, ', '),
                admin:save-configuration($config)
                )

                v. Re-enable rebalancer and reindexer on both clusters, if you had them enabled previously.

vi. Verify that the Warning messages on the Replica cluster no longer appear (these messages are logged once every hour).

                Further Reading:

                Database Rebalancing

                Understanding what work the rebalancer will do

                Using the rebalancer to move the content in one forest to another location

                Checking database replication status

                On a MarkLogic 7 cluster or a MarkLogic 8 cluster that was previously upgraded from MarkLogic Server version 6, reindexing of the triple index does not always get triggered when the triple index is turned off. Reindexing is performed after turning off an index in order to reclaim space that the index was using.

                The workaround is to force a manual reindexing.

                Summary

                When used as a file system, GFS needs to be tuned for optimal performance with MarkLogic Server.

                Recommendations

                Specifically, we recommend tuning the demote_secs and statfs_fast parameters. The demote_secs parameter determines the amount of time GFS will wait before demoting a lock on a file that is not in use. (GFS uses a time-based locking system.) One of the ways that MarkLogic Server makes queries go fast is its use of memory mapped index files. When index files are stored on a GFS filesystem, locks on these memory-mapped files are demoted purely on the basis of demote_secs, regardless of use. This is because they are not accessed using a method that keeps the lock active -- the server interacts with the memory map, not direct access to the on-disk file.

When a GFS lock is demoted, pages from the memory-mapped index files are removed from cache. When the server makes another request of the memory-mapped file, GFS must acquire another lock and the requested page(s) from the on-disk file must be read back into cache. The lock reacquisition process, as well as the I/O needed to load data from disk into cache, may cause noticeable performance degradation.

Starting with MarkLogic Server 4.0-4, MarkLogic introduced an optimization for GFS. From that maintenance release forward, MarkLogic gets the status of its memory-mapped files every hour, which results in the retention of the GFS locks on those files so that they do not get demoted. Therefore, it is important that demote_secs is equal to or greater than one hour. It is also recommended that the tuning parameter statfs_fast is set to "1" (true), which makes statfs on GFS faster.

                Using gfs_tool, you should be able to set the demote_secs and statfs_fast parameters to the following values:

                demote_secs 3600

                statfs_fast 1

While we're discussing tuning a Linux filesystem, the following Linux tuning tips are also worth noting:

                • Use the deadline elevator (aka I/O scheduler), rather than cfq, on all hosts in the cluster. This has been added to our installation requirements for RHEL. With RHEL-4, this requires the elevator=deadline option at boot time. With RHEL-5, this can be changed at any time via /sys/block/*/queue/scheduler
                • If you are running on a VM slice, then no-op I/O scheduler is recommended.
                • Set the following kernel tuning parameters:

                Edit /etc/sysctl.conf:

                vm.swappiness = 0

                vm.dirty_background_ratio=1

                vm.dirty_ratio=40

Run sudo sysctl -p to apply these changes.

                • It is very important to have at least one journal per host that will mount the filesystem. If the number of hosts exceeds the number of journals, performance will suffer. It is, unfortunately, impossible to add more journals without rebuilding the entire filesystem, so be sure to set journals up for each host during your initial build.

                 

                Working with RedHat

Should you run into GFS-related problems, running the following script will provide all the information that you need in order to work with the Red Hat Support Team:


                mkdir /tmp/debugfs

                mount -t debugfs none /tmp/debugfs

                mkdir /tmp/$(hostname)-hangdata

                cp -rf /tmp/debugfs/dlm/ /tmp/$(hostname)-hangdata

                cp -rf /tmp/debugfs/gfs2/ /tmp/$(hostname)-hangdata

                echo 1 > /proc/sys/kernel/sysrq 

                echo 't' > /proc/sysrq-trigger 

                sleep 60

                cp /var/log/messages /tmp/$(hostname)-hangdata/

                clustat > /tmp/$(hostname)-hangdata/clustat.out

cman_tool services > /tmp/$(hostname)-hangdata/cman_tool-services.out

                mount -l > /tmp/$(hostname)-hangdata/mount-l.out

                ps aux > /tmp/$(hostname)-hangdata/ps-aux.out

                tar cjvf /tmp/$(hostname)-hangdata.tar.bz /tmp/$(hostname)-hangdata/

                umount /tmp/debugfs/

                rm -rf /tmp/debugfs

                rm -rf /tmp/$(hostname)-hangdata

                Introduction

                MarkLogic is supported on XFS filesystem. The minimum system requirements can be found here:

                https://developer.marklogic.com/products/marklogic-server/requirements-9.0

                The default mount options will generally give good performance, assuming the underlying hardware is capable enough in terms of IO performance and durability of writes, but if you can test your system adequately, you can consider different mount options.

The values provided here are just general recommendations; if you wish to fine-tune your storage performance, you need to ensure that you do adequate testing both with MarkLogic and with low-level tools such as fio:

                http://freecode.com/projects/fio

                1. I/O Schedulers

                Unless you have a directly connected single HDD or SSD, noop is usually the best choice, see here for more details:

                https://help.marklogic.com/Knowledgebase/Article/View/8/0/notes-on-io-schedulers

                2. XFS Mount options

relatime - The default atime behaviour is relatime, which has almost no overhead compared to noatime but still maintains sane atime values. All Linux filesystems use this as the default now (since around kernel 2.6.30), but XFS has used relatime-like behaviour since 2006, so no one should really need to use noatime on XFS for performance reasons.

attr2 - This option enables an "opportunistic" improvement to be made in the way inline extended attributes are stored on disk. It is the default and should be kept as such in most scenarios.

inode64 - In short, this allows XFS to create inodes anywhere rather than worrying about backwards compatibility, which should result in better scalability. See here for more information: https://access.redhat.com/solutions/67091

sunit=x,swidth=y - XFS allows you to specify RAID settings. This enables the file system to optimize its read and write access for RAID alignment, e.g. by committing data as complete stripe sets for maximum throughput. These RAID optimizations can significantly improve performance, but only if your partition is properly aligned, or if you avoid misalignment by creating the XFS filesystem on a device without partitions.

                largeio, swalloc - these are intended to further optimize streaming performance on RAID storage. You need to do your own testing.

isize=512 - XFS allows inlining of data into inodes to avoid the need for additional blocks and the corresponding expensive extra disk seeks for directories. In order to use this efficiently, the inode size should be increased to 512 bytes or larger.

allocsize=131072k (or larger) - XFS can be tuned to a fixed allocation size for optimal streaming write throughput. This setting can have a significant impact on interim space usage on systems with many parallel write and create operations.

                As with any advice of this nature, we strongly advise that you always do your own testing to ensure that options you choose are stable and reliable for your workload.

                Summary

                The XDMP-LABELBADMAGIC error appears when attempting to mount a forest with a corrupted or zero length Label file.  This article identifies a potential cause and provides the steps required to work around this issue.

                Details

                The XDMP-LABELBADMAGIC error is often seen on systems where the server was running out of disk space.  If there is no space for MarkLogic Server to write the forest's Label file, a zero length Label file may result. The side effect of that would be the XDMP-LABELBADMAGIC error.

                Below is an example showing how this error might appear in ErrorLog.txt when the Triggers forest has a zero length Label file.

                2013-03-21 13:02:11.835 Alert: XDMP-FORESTERR: Error in mount of forest Triggers: XDMP-LABELBADMAGIC: Bad forest label magic number: 0x0 instead of 0x1020304

                2013-03-21 13:02:11.835 Error: NullAssignment::localMount: XDMP-LABELBADMAGIC: Bad forest label magic number: 0x0 instead of 0x1020304

                In order to recover from this error, you will need to manually remove the bad Label file.  Removing the Label file will force MarkLogic Server to recreate the file and will allow the forest to be mounted.

                Steps for recovery:

                1. Make sure MarkLogic Server is shutdown on the affected host.

                2. Remove the Label file for the forest displaying the error

                a. In Linux the default location is "/var/opt/MarkLogic/Forests/[Forest-Name]/Label"

                b. In Windows the default location is "c:\Program Files\MarkLogic\Data\Forests\[Forest-Name]\Label"

                3. Restart MarkLogic Server.
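On Linux, the recovery steps above might look like the following (a sketch; substitute your actual forest name and data directory):

$ sudo service MarkLogic stop
$ sudo rm /var/opt/MarkLogic/Forests/[Forest-Name]/Label
$ sudo service MarkLogic start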

                Introduction

In some situations an existing cluster node needs to be replaced. There are multiple reasons for this, such as hardware failure or a planned hardware upgrade.

                In this Knowledgebase article we will outline the steps necessary to replace the node by reusing the existing cluster configuration without registering it again.

                Important notes:

                • The replacement node must have the same architecture as all other nodes of the cluster (e.g., Windows, Linux, Solaris). The CPUs must also have the same number of bits (e.g., 64, 32).
                • The replacement node must have the same (or higher) count of CPU cores
                • The replacement node must have the same (or higher) allocated disk space and mount points as the old node
• The replacement node must have the same hostname as the old node, unless the node is an AWS EC2 instance using MARKLOGIC_EC2=1 (the default when using MarkLogic AMIs)

                Preparation steps for re-joining a node into the cluster

                • Install and configure the operating system
  • make sure the mount points match the old setup
  • if the previous storage is healthy, it can be reused (forests located on it will be mounted)
                • For any non-MarkLogic data (such as XQuery modules, Deployment scripts etc.) required to run on this node, ensure these are manually zipped and copied over as part of the staging process
                • Copy over MarkLogic configuration files (/var/opt/MarkLogic/*.xml) from a backup of the old node
                  • If xdqp ssl enabled is set to true, change the setting to false.  If you can’t do this through the Admin UI, you can manually update the value of xdqp-ssl-enabled to false.
  • To re-enable SSL for XDQP connections once the node has rejoined the cluster, you will need to regenerate the replacement host's certificate. Follow the instructions in the Regenerating an XDQP Host Certificate section of this article.

                Downloading MarkLogic for the New Host

MarkLogic Server, and the optional MarkLogic Converters and Filters, can be downloaded from the MarkLogic Developer Community; the download pages provide the option of downloading via either HTTPS or curl.

If the exact version you are running is not available, you may still be able to download it by getting the download link for the closest current version (8, 9, or 10) and editing the minor version number in the link.

                So if you need 10.0-1, and the current available version is 10.0-2, when you choose the Download via Curl option, you will get a download link that looks like this:

                https://developer.marklogic.com/download/binaries/10.0/MarkLogic-10.0-2-amd64.msi?t=SomeHashValue/1&email=myemail%40mycompany.com

                Update the URL with the minor release version you need:

                https://developer.marklogic.com/download/binaries/10.0/MarkLogic-10.0-1-amd64.msi?t=SomeHashValue/1&email=myemail%40mycompany.com

                If you are unable to get the version you need this way, then contact MarkLogic Support.

                Rejoining the Replacement Node to the Cluster

                There are two methods to rejoin a host into the cluster, depending on the availability of configuration files.

                1. Using an older set of configuration files from the node being replaced
                2. Creating a new set of configuration files from another node in the cluster

                Method 1: Rejoining the Cluster With Existing Configuration Files

This procedure can only be performed if existing configuration files from /var/opt/MarkLogic/*.xml are available from the lost/old node; otherwise it will fail and can cause a lot of problems.

                • Perform a standard MarkLogic server installation on the new target node
                  • $ rpm -Uvh /path/to/MarkLogic-<version>.x86_64.rpm or yum install /path/to/MarkLogic-<version>.x86_64.rpm
                  • $ rpm -Uvh /path/to/MarkLogicConverters-<version>.x86_64.rpm or yum install /path/to/MarkLogicConverters-<version>.x86_64.rpm (optional)
  • Verify local configuration settings in /etc/marklogic.conf (optional)
                  • Do not start MarkLogic server
                • Create a new data directory
  • $ mkdir /var/opt/MarkLogic (default location; might already exist if this is a separate mount point)
                  • Verify ownership of the data directory, daemon.daemon by default.
                    • To fix: $ chown -R daemon:daemon /var/opt/MarkLogic
                • Copy an existing set of configuration files into the data directory
                  • $ cp /path/to/old/config/*.xml /var/opt/MarkLogic
                  • Verify ownership of the configuration files, daemon.daemon by default.
                    • To fix: $ chown daemon:daemon /var/opt/MarkLogic/*.xml
                • Perform a last sanity check
                  • Hostname must be the same as the old node, except for AWS EC2 nodes as mentioned above
                  • Verify firewall or Security Group rules are correct
                  • Verify mount points, file ownership and permissions are correct
                • Start MarkLogic
                  • $ service MarkLogic start
                • Monitor the startup process

                After starting the node it will reuse the existing configuration settings and assume the identity of the missing node. 

                Method 2: Rejoining the Cluster With Configuration Files From Another Node

This procedure is required if no older set of configuration files is available (for example, if no backup was made of /var/opt/MarkLogic/*.xml). It requires manual editing of a configuration file.

                • Perform a standard MarkLogic server installation on the new target node
                  • $ rpm -Uvh /path/to/MarkLogic-<version>.x86_64.rpm or yum install /path/to/MarkLogic-<version>.x86_64.rpm
                  • $ rpm -Uvh /path/to/MarkLogicConverters-<version>.x86_64.rpm or yum install /path/to/MarkLogicConverters-<version>.x86_64.rpm (optional)
                  • Verify local configuration settings in /etc/marklogic.conf (optional)
                • Start MarkLogic, and perform a normal server setup as a single node. DO NOT join the cluster now.
                  • $ service MarkLogic start
                  • Perform a basic setup
                  • DO NOT join the host to the cluster!
                • Stop MarkLogic, and move current configuration files in /var/opt/MarkLogic to a new location
  • $ service MarkLogic stop
  • $ mv /var/opt/MarkLogic/*.xml /some/place
• Copy a set of configuration files over from one of the other nodes
                  • $ scp <othernode>:/var/opt/MarkLogic/*.xml /var/opt/MarkLogic
                  • Verify ownership of the data directory, daemon.daemon by default.
                    • To fix: $ chown -R daemon:daemon /var/opt/MarkLogic
• Make note of the <host-id> for the node being recreated in hosts.xml
                  • $ grep -B1 hostname /var/opt/MarkLogic/hosts.xml
• Edit /var/opt/MarkLogic/server.xml **Note: This step is critically important to ensure correct operation of the cluster.
                  • Use a UTF-8 safe editor like nano or vi
  • Update <host-id> with the value found in /var/opt/MarkLogic/hosts.xml
                  • Update <license-key> value if necessary.
                  • Update <licensee> value if necessary.
                  • Save the changes
                • Perform a last sanity check
                  • <host-id> must match the <host> defined in hosts.xml.
                    • Important: host will not start if these values do not match 
                  • Hostname must be the same as the old node, unless the node is an AWS EC2 instance using the configuration option MARKLOGIC_EC2=1, which is the default when using the MarkLogic provided AMIs.
                  • Firewall or Security Group rules are correct
                  • Mount points, ownership and permissions are correct
                • Start MarkLogic and monitor the startup process

                As emphasized in the procedures, it is very important to update server.xml and change the <host-id> to match the value defined in hosts.xml and apply the correct license information. Without these changes the node may not start up, may confuse the other nodes, or it may exhibit unexpected behavior.

                Wrapping Up

                For both methods, the startup process is the same. MarkLogic will use the configuration files to rejoin the cluster. Forests that no longer exist will automatically be recreated. Existing forests that have been mounted or copied to the correct location, will be mounted like before. Forests configured for local disk failover will automatically start synching with the online forests.  If configured, replication will start replicating the forests after the node is started. The forests can also be restored from backup, in case there is no local disk failover, or replication configured.

Regenerating an XDQP Host Certificate

                The first step in the process is to check the Certificate to see whether it is valid or not.  If you replaced your node using method 1, the certificate is likely to be valid.  If you replaced your node using method 2, then the certificate is likely to be invalid.

Log into a terminal on the newly replaced host, and extract the private key from /var/opt/MarkLogic/server.xml and the host's certificate from /var/opt/MarkLogic/hosts.xml:

                • $ cp /var/opt/MarkLogic/server.xml /tmp/server.key
                • Edit /tmp/server.key to remove all XML formatting
                  • File should start with "-----BEGIN PRIVATE KEY-----"
                  • File should end with "-----END PRIVATE KEY-----"

Now extract the certificate for the new host from /var/opt/MarkLogic/hosts.xml.

                • $ grep -A25 my-host.name /var/opt/MarkLogic/hosts.xml > /tmp/server.crt
                • Remove all the data from the file, except the certificate for the new host
                  • File should start with "-----BEGIN CERTIFICATE-----"
                  • File should end with "-----END CERTIFICATE-----"

Once you have the private key and the certificate, you can compare the md5 signatures of the files using openssl to see if they match.

                • $ openssl rsa -in /tmp/server.key -noout -modulus | openssl md5; openssl x509 -in /tmp/server.crt -noout -modulus | openssl md5

                If the values match, STOP HERE.  The certificate is valid and does not need to be regenerated. If the values do not match, then the certificate needs to be regenerated.

                Make note of the <host-id> from /var/opt/MarkLogic/server.xml.  This will be used to populate the value for the Common Name (CN) when the certificate is generated.

                • $ grep -B1 hostname /var/opt/MarkLogic/hosts.xml

Create the new self-signed certificate using the server's private key. Typically these are set to 10 years (3650 days) by default when MarkLogic first runs, but you can choose another value if needed. Use the <host-id> from the previous step as the CN.

                • $ sudo openssl req -key /tmp/server.key -new -x509 -days 3650 -out /tmp/new-server.crt -subj "/CN=[server-id-number]"

                Compare the MD5 Checksums with openssl, this time they should match:

                • $ openssl rsa -in /tmp/server.key -noout -modulus | openssl md5; openssl x509 -in /tmp/new-server.crt -noout -modulus | openssl md5

Make a copy of hosts.xml in which to replace the certificate; also note the host-id for use in a later step.

                • $ cp -p /var/opt/MarkLogic/hosts.xml /tmp/hosts.xml

                Edit /tmp/hosts.xml and replace the old certificate for the host with the new certificate.  Find the entry with the correct <host-id> and replace the <ssl-certificate> field with the new certificate in /tmp/new-server.crt

                Replace the existing hosts.xml with our updated copy

                • $ cp -p /tmp/hosts.xml /var/opt/MarkLogic/hosts.xml

                Restart MarkLogic on the node.  This can be done from any host in the cluster, using the Admin Interface, the REST Management API endpoint, or Query Console.

• Admin Interface: In the left tree menu, click on Configure > Hosts > [Hostname], then select the Status tab and click Restart
                • REST Management API: $ curl --anyauth --user password:password -X POST -i --data "state=restart" -H "Content-type: application/x-www-form-urlencoded" http://localhost:8002/manage/v2/hosts/[host-name]
                • Query Console: xdmp:restart((xdmp:host("engrlab-129-179.engrlab.marklogic.com")), "To reload hosts.xml after certificate update")

                Verify the changes to hosts.xml have propagated to all hosts in the cluster.  Check that the hosts.xml is now the same for the hosts in the cluster.  One way of doing this is comparing md5 checksums.

                • $ md5sum /var/opt/MarkLogic/hosts.xml

                You should now be able to set xdqp ssl enabled to true in the group configurations.  Check the cluster status page in the Administrative Interface to ensure all the hosts have reconnected successfully, or review the ErrorLog files to ensure there are no SVC-SOCACC errors in the log.

                Additional Notes

This article explains how to directly replace a node in a cluster by using the same host name. Another way is to add a new node to the cluster and transfer the forests, which is explained in the knowledge base article "Replacing a D-Node with local disk failover".

                Some of these steps may differ, such as operating system calls or file system locations. On a different OS, the specific commands will need to be adjusted to match the environment.

                Related Reading

                Replacing a failed MarkLogic node in a cluster: a step by step walkthrough

                Stemming:

                MarkLogic Server supports stemming in English and other languages. If stemmed searches are enabled in the database configuration, MarkLogic Server automatically searches for words that come from the same stem of the word specified in the query, not just the exact string specified in the query. A stemmed search for a word finds the exact same terms as well as terms that derive from the same meaning and part of speech as the search term.

For example, in a stemmed search, a query for 'running' will match 'running', 'run' and 'ran' as they all stem to 'run'. The query is actually stemmed before being resolved, so queries for both 'running' and 'ran' are actually performed as queries for 'run', and they return similar results.

                 

                Relevance score for stemmed searches:

                 

                Search results in MarkLogic Server return in relevance order; that is, the result that is most relevant to the cts:query expression in the search is the first item in the search return sequence, and the least relevant is the last. (Documentation at http://docs.marklogic.com/guide/search-dev/relevance#chapter gives detailed information of how relevance score is computed).

However, when using stemmed searches, the original query term and its stemmed matches are both ranked equally. That is, a higher relevance score is not given to the exact match of the word.

                 

                For example, consider the following 3 documents:

                 

                run.xml

                <root>

                  <id>001</id>

                  <text>run out of time</text>

                </root>

                 

                running.xml

                <root>

                  <id>002</id>

                  <text>running out of time</text>

                </root>

                 

                ran.xml

                <root>

                  <id>003</id>

                  <text>ran out of time</text>

                </root>

                 

                The below search query for "running" returns all 3 documents ranked equally.

                 

let $query := cts:word-query("running")

for $hit in cts:search(doc(), $query, "relevance-trace")
return element hit {
  attribute score { cts:score($hit) },
  xdmp:node-uri($hit)
}

                 

                ==>

                 

                <hit score="2048">run.xml</hit>

                <hit score="2048">running.xml</hit>

                <hit score="2048">ran.xml</hit>

This behavior is desirable in most search applications. However, to give a higher score to the original query term, so that it comes up first in the search results, stemmed and unstemmed word queries should be combined in an or-query.

let $query :=
  cts:or-query((
    cts:word-query("running", "stemmed"),
    cts:word-query("running", "unstemmed")))

for $hit in cts:search(doc(), $query)
return element hit {
  attribute score { cts:score($hit) },
  xdmp:node-uri($hit)
}

                 

                ==>

                 

                <hit score="11264">running.xml</hit>

                <hit score="1024">run.xml</hit>

                <hit score="1024">ran.xml</hit>

Note that for the above cts:or-query, the 'word searches' option must be enabled on the database; otherwise the query returns an XDMP-WORDSEARCH error.
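If the 'word searches' option is not already enabled, it can be turned on from the database configuration page in the Admin UI, or scripted with the Admin API. A minimal sketch, assuming a database named "Documents" (an illustrative name); note that enabling this setting will cause the database to reindex:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

(: enable the 'word searches' index setting on the assumed "Documents" database :)
admin:save-configuration(
  admin:database-set-word-searches(
    admin:get-configuration(), xdmp:database("Documents"), fn:true()))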

                Introduction

MarkLogic Server offers Fast Data Directories and Large Data Directories to allow customers to better utilize their available infrastructure. This allows an organization to offload large objects to cheaper storage, or to improve performance by placing portions of a forest on SSDs. These directories are defined at the forest level, usually when the forest is created.

                Removing Fast or Large Data Directories

                There are two primary methods to remove these directories from a forest.

                • Rebalance to a new forest
                • Backup/Restore to a new forest

                Rebalancing to a New Forest

This method takes advantage of the rebalancing mechanism in the server to move data from the forest with the Fast/Large Data Directories. New forests can be defined as part of this process, but it is not required. The advantage of this method is that it does not require any downtime. The primary disadvantage is that it can increase the IO and CPU load on the servers as the data is moved between forests, and can result in data being moved more than once. If needed, these issues can be mitigated by adjusting the rebalancer priority and merge settings.

                Backup/Restore to a new forest

                This method allows a simple 1 for 1 swap of a forest with a Fast/Large Data Directory to one without these directories.  The advantage of this method is that, depending on the size of the forest, it can be completed faster than rebalancing.  There are a couple of disadvantages to this method.  The first is that the forest being replaced needs to be in read only mode when the backup is taken, until the restore is complete to the new forest.  The second is that it does require some downtime when switching between the old and new forests.  These issues can be mitigated with some careful planning.

                Procedures for Using Rebalance

                • Create the new forest/s
• Attach the new forest/s to the database AND retire the existing forest/s (a scripted example of this step follows this list)
                  • This will cause the database to rebalance, and move the data from the old forest/s to the new forest/s.
                • Detach the old forest/s from the database once the forest/s no longer have active documents or active fragments.
                • Delete the old forest/s
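The attach-and-retire step referenced above can also be scripted with the Admin API. A minimal sketch, assuming a database named "Documents", a new forest "new-forest-1", and an old forest "old-forest-1" (all names are illustrative):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

(: attach the new forest and retire the old one so the rebalancer drains it :)
let $config := admin:get-configuration()
let $db := xdmp:database("Documents")
let $config := admin:database-attach-forest($config, $db, xdmp:forest("new-forest-1"))
let $config := admin:database-retire-forest($config, $db, xdmp:forest("old-forest-1"))
return admin:save-configuration($config)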

                Procedures for Using Backup/Restore

                • Put the forest/s in read only mode and perform a forest level backup
                  • Database level backups can be used, but the whole database will need to be in read only mode when the backups are started.
                • Create a new forest.  Do not attach it to the database yet.
                • Restore the backup to the new forest/s
                • Verify the old forest/s and new forest/s have the same active document and active fragment count.
                • Detach the old forest/s and attach the new forest/s
                • Delete the old forest/s

                References

                Removing Hosts From a MarkLogic Cluster Minimizing Downtime

                This is a procedure to remove hosts from a MarkLogic cluster while minimizing unavailability. It is assumed that High availability is configured using local disk failover and all primary forests have at least one replica forest configured.

                Typically when a host is removed from a cluster, it will trigger a restart of the cluster to apply the new configuration. In some environments, this could be burdensome if it is a large cluster and a number of hosts are being removed.

                Planning

                For hosts to be removed from the cluster, they must meet the following criteria:

                • The host must not be a bootstrap host for database replication.
                • The host must not have any forests configured.
• The host must not serve as a failover host for shared-disk failover, must not host any forests for the cluster, and must not be coupled to any other cluster as a foreign host.

                We also recommend scheduling cluster maintenance during low usage periods.

                Removing more than one host

                Hosts can only be removed one at a time, but you can remove multiple hosts without restarting the cluster by using the Management API DELETE /admin/v1/host-config, with the remote-host parameter.

curl --anyauth --user user:password -X DELETE -i "http://cluster-host:8001/admin/v1/host-config?remote-host=departing-host"

We recommend that the hosts being removed be offline, either by shutting down the MarkLogic service or by shutting down the host at the OS level, but this can also be done while a host is online.

                State of removed hosts

When a host is removed using the remote-host parameter, the host information is removed from the cluster, but the host itself still retains the cluster configuration and will continue to attempt to connect to the cluster unsuccessfully. This means you may see errors in the logs until the MarkLogic service on the removed host is shut down.

                If the removed host will be added to another cluster, the existing MarkLogic configuration will need to be reset. This can be done by stopping the MarkLogic service, removing the contents of /var/opt/MarkLogic, and starting the MarkLogic service. The removed host will now be in an uninitialized state, and can be added to a new cluster.

                Wrapping Up

Once all the hosts have been removed from the cluster, we recommend performing a restart of the cluster to ensure the configuration change has been fully committed.


                Introduction

Using MarkLogic Server's Admin UI, it is possible to modify the name of a single host via Configure -> Hosts -> select the host in question, then update the name and click OK.

However, if you want to change the hostnames across the cluster, we recommend that you follow the steps below:

                1) Renaming hosts in a cluster

                • Add the new hostnames to the DNS or /etc/hosts on all hosts.
                • Make sure all new hostnames can be resolved from the nodes in the cluster.
• Rename all the hosts, for example through the Admin UI or the REST Management API (see the example after this list).
                • Host/cluster should come up if the DNS entries have been set up correctly.
                • Remove old host names.
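For reference, a single host can be renamed through the REST Management API with a call along the following lines (a sketch; the host names, credentials, and port are assumptions to adjust for your environment):

$ curl --anyauth --user admin:password -X PUT -H "Content-type: application/json" \
    -d '{"host-name": "new-hostname.example.com"}' \
    http://localhost:8002/manage/v2/hosts/old-hostname.example.com/properties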

                2) Once the hostnames are updated, we recommend you verify the items below that may be affected by hostname changes:

                • Application Servers
                • PKI Certificates
                • Database replication
                • Flexible replication
                • Application code

                Introduction

                In a multiple node cluster with local disk failover configured, there may be a need to replace a server with new hardware. This article explains how to do that while preserving the failover configuration.

                Sample configuration

                Consider a 3-node cluster with local disk failover for database Test, and the forest assignment for the hosts looks like this:  (all forests ending with 'p' are primary and those ending with 'r' are replica)

                Host A Host B Host C
                forest a-1p forest b-3p forest c-5p
                forest a-2p forest b-4p forest c-6p
                forest a-3r forest b-1r forest c-2r
                forest a-6r forest b-5r forest c-4r

                With this configuration under normal operations, each host will have the two primary forests "open" and the replica forests "sync replicating".

                Failover Example

In the event of a node failure of, say, Host B, the primary forests on Host B will fail over to Hosts A & C as expected. The forests a-3r and c-4r are now "open" and acting as master forests.

When Host B comes back online, the replica forests a-3r and c-4r will continue as acting masters, and forests b-3p & b-4p on Host B will now act as replicas; this state will persist until another failover event occurs or the forests are manually restarted.

                Replacing a Host 

                In the case where a node in the cluster needs to be physically replaced with another node, it is important to preserve the original master-replica configuration of the forests, so that there is no performance burden on a single node hosting all the primary forests.

                Example: replacing Host-B with a new Host-D

The steps listed below show how to replace a node (old Host-B with new Host-D) without affecting the failover configuration:

                1. Shut down Host B and make sure forest failover successful - Forests c-4r & a-3r are "open" (acting masters).
                2. Add Host D as a node to the cluster; 
                3. Create new replica forests (d-1r and d-5r) on Host D and make them replicas of the corresponding primary forests on Host A & C. 
                4. Create new primary forests 'd-3p' and 'd-4p' on Host D  (These will replace b-3p and b-4p); 
                5. Break replication between a-1p and b-1r, and between c-5p and b-5r by updating the forest configuration for the primary forests.
6. Take a forest-level backup of the failed-over forests ('forest a-3r' and 'forest c-4r'); an XQuery sketch of steps 6 and 7 follows this list.
                7. Restore the backups from step 6 to the new primary forests 'forest d-3p' and 'forest d-4p' on Host D
                8. Attach forests 'forest d-3p' and 'forest d-4p' to the database and make forests 'forest a-3r' and 'forest c-4r'  their replicas.
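As a rough sketch of steps 6 and 7 using XQuery (the forest names come from this example, the backup paths are assumptions, and the database should be quiesced as described under Additional Notes below; confirm the backups have completed before restoring):

(: step 6: back up the acting-master replica forests :)
xdmp:forest-backup(xdmp:forest("a-3r"), "/backups/a-3r"),
xdmp:forest-backup(xdmp:forest("c-4r"), "/backups/c-4r")

(: step 7: once the backups complete, restore into the new primary forests on Host D :)
(: xdmp:forest-restore(xdmp:forest("d-3p"), "/backups/a-3r"),
   xdmp:forest-restore(xdmp:forest("d-4p"), "/backups/c-4r") :)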

                This will replace Host B with Host D, as well as preserve the previously existing primary-replica configuration among the hosts.

Host A Host D Host C
forest a-1p forest d-3p forest c-5p
forest a-2p forest d-4p forest c-6p
forest a-3r forest d-1r forest c-2r
forest a-6r forest d-5r forest c-4r

                Additional Notes

It is important to make sure that the database is quiesced before taking the forest backups. The idea is to disallow ingestion/updates on the database: one technique is to make all of its forests 'read-only' (see http://docs.marklogic.com/guide/admin/forests#id_72520) during the process and revert once complete.

                Note: This example assumes a distributed master-replica configuration of a 3-node cluster. However, the same procedure works with other configurations with some careful attention to the number of forests on each host and breaking replication between the right set of hosts.


                Introduction

MarkLogic uses two different kinds of certificates for different purposes: 1) host certificates, and 2) App Server certificates.

Host certificates are used for XDQP communication between the nodes of a cluster. The certificate and its private key are stored locally on each node (in hosts.xml and server.xml respectively). While most other configuration files are common and have the same content on all nodes, each node's host certificate is specific to that node, so every node has a different certificate.

App Server certificates are configured through a Certificate Template and are used by an Application Server running on a specific port; these SSL certificates secure the connection between the server and its clients.

There is also a third kind of certificate, used mainly for communication between a cluster and a foreign cluster, which is stored in clusters.xml. All nodes in a cluster have a common clusters.xml file and hence a common certificate for communicating with the foreign cluster. These cluster certificates can be replaced using the 'admin:cluster-set-xdqp-ssl-certificate' API.

                Issue

A MarkLogic installation comes with host certificates that are valid for 10 years. MarkLogic currently does not provide an Admin GUI based way to replace a host certificate, so when a host certificate expires after 10 years, XDQP SSL communication between the nodes fails.

                Solution

An administrator can generate a new host certificate by running a query in Query Console and then replacing the certificate and key in the configuration files on the individual nodes.

The objective is to generate a new public and private key pair for a node. The private key is stored in server.xml and the public key (certificate) is stored in hosts.xml. The private key stored in server.xml is specific to that node, whereas hosts.xml, which holds the public keys, stores the certificate information for all the hosts in the cluster.

The administrator generates the certificate for a host by running a query with the host-id of the host whose certificate has expired, which produces both the private and public key for that specific host. The private key is copied into server.xml on that node, whereas the public key needs to be updated in hosts.xml on all nodes in the cluster for that particular host-id.

                Steps to generate host certificate and private key file.

1. If a certificate needs to be generated for Node A, replace <host-id of your node> with the host-id of Node A. This query will generate the public and private key for Node A.

                xquery version "1.0-ml";
                import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";
                declare namespace x509="http://marklogic.com/xdmp/x509";
                
                let $host-id as xs:unsignedLong := <host-id of your node>
                
                let $keys as xs:string* := 
                  xdmp:rsa-generate(
                    <options xmlns="ssl:options">
                      <key-length>2048</key-length>
                    </options>)
                
                let $cert :=
                  xdmp:x509-certificate-generate(
                    element x509:cert {
                      element x509:version {2},
                      element x509:serialNumber {pki:integer-to-hex(xdmp:random())},
                      element x509:issuer {
                        element x509:commonName {$host-id}
                      },
                      element x509:validity {
                        element x509:notBefore {fn:current-dateTime()},
                        element x509:notAfter {fn:current-dateTime() + xs:dayTimeDuration("P3650D")}
                      },
                      element x509:subject {
                        element x509:commonName {$host-id}
                      },
                      element x509:publicKey {$keys[2]},
                      element x509:v3ext {
                        element x509:basicConstraints {
                          attribute critical {"false"},
                          "CA:FALSE"
                        }
                      }
                    },
                    $keys[1]
                  )
                
                return ($cert, $keys)
                

                2. Stop the service on all nodes.

3. Copy the generated private key to server.xml of Node A (default location: /var/opt/MarkLogic/server.xml) on that specific node and save it under the <ssl-private-key> tag. server.xml is a node-specific file.

4. Copy the public key (certificate) that was generated to hosts.xml (default location: /var/opt/MarkLogic/hosts.xml) and save it under <host><ssl-certificate> for the specific host-id. hosts.xml is a cluster-wide common file, so the administrator will need to copy the updated hosts.xml file to all nodes in the cluster when changing the certificate for Node A.

5. Repeat the above steps for every node requiring a new certificate.

                6. Start MarkLogic service for all nodes and enable XDQP SSL.


                Introduction and Pre-requisites

MarkLogic provides and manages a PKCS #11 secured wallet, which can be used as the KMS (keystore) for encryption at rest. When MarkLogic Server starts for the first time, it prompts you to configure the wallet password. This article describes how to reset the wallet password if you forget the one that was set at the time of the initial launch.

If encryption at rest is enabled for any databases, you will first need to decrypt all of the encrypted data; otherwise you will lose access to it.

To disable encryption at the cluster level, change the cluster's Data Encryption setting from 'force' to 'default-off' under the Keystore tab of the cluster. For all databases that have encryption enabled, change them to disable encryption. You will also need to disable log encryption if it is enabled. Once this change is complete, all the databases will need to be reindexed, which decrypts the data. Make sure all the databases are decrypted and reindexed before resetting the password.

                Steps to  reset the wallet password:

                1. Stop MarkLogic server on all hosts

                2. On all of the nodes,

                move the following files/directories to a secure location in case they need to be restored

                /var/opt/MarkLogic/keystore*.xml

                /var/opt/MarkLogic/kms

                Please make sure you have backup of the above.
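For example, on each node this might look like the following (the backup location is an assumption; adjust to a secure path of your choosing):

$ mkdir -p /secure-backup/MarkLogic-keystore
$ mv /var/opt/MarkLogic/keystore*.xml /secure-backup/MarkLogic-keystore/
$ mv /var/opt/MarkLogic/kms /secure-backup/MarkLogic-keystore/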

3. Once those files are moved, copy the new/clean bootstrap keystore.xml from the MarkLogic install directory on all the nodes

                cp /opt/MarkLogic/Config/keystore.xml /var/opt/MarkLogic/

                4. Make sure step 2 and 3 are performed on all the nodes and then start MarkLogic server on all nodes.

5. Reset your wallet password from the Cluster -> Keystore -> password change page; refer to https://docs.marklogic.com/guide/security/encryption#id_61056

                Note: In the place of current password, you can provide any random password or even leave it blank.

                Once complete, your wallet password should be set to the new value. Then you can configure your encryption at rest for data again.

(NOTE: AS WE ARE CHANGING THE ENCRYPTION CONFIGURATION AND RESETTING WALLET PASSWORDS, IT IS HIGHLY RECOMMENDED THAT YOU HAVE A PROPER BACKUP OF YOUR DATA AND CONFIGURATION. Please try the above steps in a lower environment before implementing them in production.)

                Summary

A socket bind error will occur when more than one MarkLogic Server instance is running simultaneously on the same host. Two simultaneous instances of MarkLogic Server might occur if a MarkLogic Server process did not shut down gracefully while a new one was spawned.

                Example error messages seen in the MarkLogic Server ErrorLog.txt file:

                Critical: Server::updateConfigServers: SVC-SOCBIND: Socket bind error: bind 0.0.0.0:8000: Address already in use
                Critical: Server::updateConfigServers: SVC-SOCBIND: Socket bind error: bind 0.0.0.0:8001: Address already in use
                Critical: Server::updateConfigServers: SVC-SOCBIND: Socket bind error: bind 0.0.0.0:8002: Address already in use

                It is dangerous for two instances of the server to be running simultaneously on the same host. Both instances will attempt to operate from the same server configuration files and on the same forest data files. The behavior is unpredictable and, in the worst case, it might lead to inconsistent data.

                Mitigation

                If you suspect that there are multiple MarkLogic Server instances running at the same time on the same host, you should follow these steps:

                1. To get a list of MarkLogic processes running, execute

                ps -ef | grep -i mark

Under normal circumstances, it will return two processes - a watchdog process running as root and the main MarkLogic Server process. For example, the ps command should return something like:

                root 1766 1 0 Apr03 ? 00:00:00 /opt/MarkLogic/bin/MarkLogic
                daemon 1767 1766 0 Apr03 ? 04:00:24 /opt/MarkLogic/bin/MarkLogic

                2. Run the above command on all hosts in your MarkLogic cluster. If you discover more than the expected 2 processes on any single host, then 
                    -  Shutdown MarkLogic on the node and verify that no MarkLogic processes are running.
                    -  If there are still MarkLogic processes running, kill the processes by executing

                     kill -9 <pid>

                where <pid> is the process id discovered while executing the ps command.

                    -  If that still does not clear the errant MarkLogic process, reboot the host machine.

                3. Once there are no more MarkLogic Server processes running, restart MarkLogic Server.

                 

                Backwards Compatibility

                Newer versions of MarkLogic will support backups taken from older versions of the software.  This restore may cause a reindex of the data in order to upgrade the database to the current feature release version.  Information on backing-up/restoring can be found in the following documentation:

                Database Level Backups: Backing Up and Restoring a Database

                Forest Level Backups and Restores: Making Backups of a Forest, Restoring a Forest

                Upgrade compatibility: Upgrades and Database Compatibility

                Downgrading

                MarkLogic does not support downgrading to an older version.  Therefore, backups that were taken on a newer version of MarkLogic will not be compatible with older versions of MarkLogic.  For more details please see MarkLogic Server Version Downgrades are Not Supported.

                Backup and Restore Across OS Versions

                Notes about Backup and Restore Operations

                • The backup files are platform specific--backups on a given platform should only be restored onto the same platform. This is true for both database and forest backups.

                Platform is used to indicate OS Families, e.g. Windows, Linux and MacOS. MarkLogic supports backup and restore operations across OS version changes, e.g. from RHEL 6 to RHEL 7, but not across OS changes such as Windows to Linux.

                Introduction

When a database backup taken on Cluster A is restored (using incremental backup) on Cluster B, it sometimes fails with the following message on the admin screen:

                The database restore has failed. Please check the server logs for details.

A quick look at the logs will show an error indicating that the backup directory does not exist, even though the backup was copied from Cluster A to Cluster B:

                Error: TaskManager::runTask: XDMP-FORESTRESTOREFAILED: Restore failed for forest Documents: SVC-DIROPEN: Directory open error: opendir '/tmp/backup/20180827-1607002170310/20180827/1609002389230/Forests/Documents': No such file or directory

                Error: 1-forest database restore from /space/backup/20180827-1607002170310, jobid=472666486696782942 failed: XDMP-FORESTRESTOREFAILED: Restore failed for forest Documents: SVC-DIROPEN: Directory open error: opendir '/tmp/backup/20180827-1607002170310/20180827/1609002389230/Forests/Documents': No such file or directory

                This happens when the backup directory structure is different between the clusters. For example, on Cluster A, the backup directory exists under /tmp/backup.

                When copying the backup for restore on Cluster B, it was copied to /space/backup.

                Even though the backup directory was moved to a different location, per the error logs, the restore job is looking to find it in the old location (/tmp/backup) and fails as it does not find it.

                Resolution

Every incremental backup stores a reference to the location of the previous incremental backup, and the very first one stores a reference to the location of the full backup. These references are stored in a file named BackupTag.txt. It is from here that the restore job fetches the backup locations, and if they still point to an older location, the incremental restore will fail.

To get past this, BackupTag.txt, which is located under incremental-backup-directory/incremental-backup/Forests/forest-name/, should be edited so that the BasePath parameter reflects the current backup directory.

For example, on Cluster B, the BasePath in BackupTag.txt (located under /space/backup/20180827-1607002170310/20180827/1609002389230/Forests/Documents) should be changed from

BasePath /tmp/backup/20180827-1607002170310

to

BasePath /space/backup/20180827-1607002170310

                This should be done on every incremental backup in the directory.
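One way to apply this change to every BackupTag.txt under the copied backup directory is with a small shell command; the paths below reuse the example above and should be adjusted to your environment (and the files backed up) before running:

$ find /space/backup/20180827-1607002170310 -name BackupTag.txt \
    -exec sed -i 's|^BasePath /tmp/backup|BasePath /space/backup|' {} +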

                Note that the example presented in this article does not specify a separate location for incremental backups.

                Further Reading

                Backup Directory Structure

                Notes about Backup and Restore Operations

                Incremental Backup

                Incremental Backup - RTO and RPO Considerations

                Disaster Recovery

                 

                Summary

                When performing a Security database backup on one cluster and restoring on another cluster, there are precautionary measures to be taken. 

                Details

                Since MarkLogic Server version 4.1-5,  the internal user IDs are derived from the hash of the user name when the user object is created. Thus, two user objects created on two different Security databases should have the same user ID if they are created with the same name. This makes it possible to restore a Security database from one environment to another.

                However, we strongly recommend checking for the below conditions before restore in order to avoid any serious damage to the Security database. 

                • Ensure that both the environments are running the same MarkLogic Server versions and are on the same Operating System.
                • Verify that no Users, Roles or Amps have been added to the new cluster, that are not also present in the original cluster. Restoration of the Security database is a complete replacement, and any intentional differences in the two clusters will be lost.   Any applications using obsolete roles might become inaccessible.

Although the user IDs are derived from the hash of the username, the IDs can be different in some cases:

• If there already was an existing user object with that ID when the new user was created (i.e., a hash collision)
                • The username was changed on an existing user object.

                Review all the above conditions before restoring the Security database.

                Note: It is recommended that a backup of the security database from the new cluster is created and saved before performing the restore of a Security database from a different cluster.

                Restoring from a different server version

When restoring the Security database from a backup made on an older version of MarkLogic Server to a newer version, a manual upgrade of the Security database is also required after the restore. Without this additional step, there is a mismatch between the server version and the Security database version, and some features will not work as expected (for example, issues with reindexing, query results, etc.).

A Security database upgrade can be done by navigating to the Admin UI -> 'Support' tab and clicking the 'Upgrade' button in the bottom right corner.

                Note that MarkLogic does not support restoring a backup made on a newer version of MarkLogic Server onto an older version of MarkLogic Server.

                Restoring Security Database with different Certificate template content

If your App Server is associated with a certificate template, and the Security database you intend to restore contains a different template, then to avoid a lingering template ID we recommend removing the App Server's template association (disabling SSL) prior to restoring the Security database. Please read: Security Database restore leading to lingering Certificate Template id in Config files.


                Introduction

                While launching the CloudFormation Templates to create a managed cluster on AWS, the variables MARKLOGIC_ADMIN_USERNAME and MARKLOGIC_ADMIN_PASSWORD need to be provided as part of the AMI user data and these values are used to create the initial admin MarkLogic user.

This user creation is needed for the initial cluster setup process and for when a node restarts and rejoins the cluster. The password that is provided when launching the template is not exported to the MarkLogic process and is not stored anywhere on the AMI.

If you need to provide an administrator password, it is not recommended practice to place a clear-text password in /etc/marklogic.conf.

                Alternatives

A best practice is to use a secure S3 bucket with encryption configured for storage and data transmission, in combination with an IAM role assigned to the EC2 instances in the cluster to access the S3 bucket. This approach is discussed in our documentation, and the aim of this Knowledgebase article is to cover the approach in further detail.

                We can use AWS CLI as suggested below to securely retrieve the password from an object stored in an S3 bucket and then pass that into /etc/marklogic.conf file as the MARKLOGIC_ADMIN_PASSWORD variable.

                Solution

                We recommend storing the MarkLogic admin password in an object (e.g. a text file) in a secured/encrypted S3 bucket which can only be retrieved by an authorized user who has access to the specific S3 bucket.

                As a pre-requisite, create a file (For example: password.txt) with the required value for MARKLOGIC_ADMIN_PASSWORD and place it in a secure s3 bucket (for example: a bucket named "mlpassword")
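For example, the password file can be created and copied into the bucket with server-side encryption using the AWS CLI (the bucket name "mlpassword" matches the example above; the password value is a placeholder):

$ echo 'your-admin-password' > password.txt
$ aws s3 cp password.txt s3://mlpassword/password.txt --sse AES256
$ rm password.txt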

                To modify the CloudFormation Template

                1. Locate the Launch configurations in the template

                2. Within LaunchConfig1, add the following line at the beginning

                  #!/bin/bash

                3. Add the following at the end of the launch configuration block

                           - >

                       echo 'export MARKLOGIC_ADMIN_PASSWORD=$(aws s3 --region us-west-2 cp s3://mlpassword/password.txt -)' >

                       /etc/marklogic.conf # create marklogic.conf

4. Delete the entries referring to MARKLOGIC_ADMIN_PASSWORD:

                       - MARKLOGIC_ADMIN_PASSWORD=
                       - !Ref AdminPass
                       - |+

5. After modifying the LaunchConfig, it would look like the example below:

                LaunchConfig1:
                    Type: 'AWS::AutoScaling::LaunchConfiguration'
                    DependsOn:
                      - InstanceSecurityGroup
                    Properties:
                      BlockDeviceMappings:
                        - DeviceName: /dev/xvda
                          Ebs:
                            VolumeSize: 40
                        - DeviceName: /dev/sdf
                          NoDevice: true
                          Ebs: {}
                      KeyName: !Ref KeyName
                      ImageId: !If [EssentialEnterprise, !FindInMap [LicenseRegion2AMI,!Ref 'AWS::Region',"Enterprise"], !FindInMap [LicenseRegion2AMI, !Ref 'AWS::Region', "BYOL"]]
                      UserData: !Base64
                        'Fn::Join':
                          - ''
                          - - |
                              #!/bin/bash
                          - - MARKLOGIC_CLUSTER_NAME=
                            - !Ref MarkLogicDDBTable
                            - |+

                            - MARKLOGIC_EBS_VOLUME=
                            - !Ref MarklogicVolume1
                            - ',:'
                            - !Ref VolumeSize
                            - '::'
                            - !Ref VolumeType
                            - |
                              ::,*
                            - |
                              MARKLOGIC_NODE_NAME=NodeA#
                            - MARKLOGIC_ADMIN_USERNAME=
                            - !Ref AdminUser
                            - |+

                            - |
                              MARKLOGIC_CLUSTER_MASTER=1
                            - MARKLOGIC_LICENSEE=
                            - !Ref Licensee
                            - |+

                            - MARKLOGIC_LICENSE_KEY=
                            - !Ref LicenseKey
                            - |+

                            - MARKLOGIC_LOG_SNS=
                            - !Ref LogSNS
                            - |+

                            - MARKLOGIC_AWS_SWAP_SIZE=
                            - 32
                            - |+

                            - >
                              echo 'export MARKLOGIC_ADMIN_PASSWORD=$(aws s3 --region us-west-2 cp s3://mlpassword/password.txt -)' >
                              /etc/marklogic.conf # create marklogic.conf

                            - !If
                              - UseVolumeEncryption
                              - !Join
                                - ''
                                - - 'MARKLOGIC_EBS_KEY='
                                  - !If
                                    - HasCustomEBSKey
                                    - !Ref VolumeEncryptionKey
                                    - 'default'
                              - ''

                      SecurityGroups:
                        - !Ref InstanceSecurityGroup
                      InstanceType: !Ref InstanceType
                      IamInstanceProfile: !Ref IAMRole
                      SpotPrice: !If
                        - UseSpot
                        - !Ref SpotPrice
                        - !Ref 'AWS::NoValue'
                    Metadata:
                      'AWS::CloudFormation::Designer':
                        id: 2efb8cfb-df53-401d-8ff2-34af0dd25993

6. Repeat steps 2, 3, and 4 for all the other LaunchConfig groups, then save the template and launch the stack.

                With this, there is no need to provide the Admin Password while launching the stack using Cloud formation templates.

**Please make sure that the IAM role that you are assigning has access to the S3 bucket where the password file is stored.

                NOTE: The Cloud formation templates are created in YAML - be cautious when editing as YAML is whitespace sensitive.

                Summary

                If MarkLogic Server is installed on an Amazon Elastic Compute Cloud (EC2) instance and you execute queries in the MarkLogic Query Console, it is possible that the queries will be silently cancelled. Long running queries may time out because of an AWS attached Load Balancer.

Details

The Amazon Elastic Load Balancer (ELB) performs health checks on running instances using protocols, timeouts, etc. By default, the ELB terminates a connection if it is idle for more than 60 seconds. A connection is considered idle when no action or event (i.e., a read or write) is performed on it. Consequently, when a query runs for more than 60 seconds, the load balancer considers the connection idle and terminates it. When the ELB terminates a Query Console connection, no message appears in the display. Instead, an "XDMP-CANCELLED" message is logged to the MarkLogic ErrorLog.txt file. An XDMP-CANCELLED message indicates that the query was cancelled either explicitly or as a result of a system event.

Removing the load balancer from your EC2 instance is one solution to enable long running Query Console queries on an Amazon EC2 instance.

                 

                [ref: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/ts-elb-healthcheck.html]

Summary

When attempting to access files stored in an Amazon AWS S3 bucket using MarkLogic, an SVC-S3SOCERR error is raised.

                Cause

Under some conditions when installing MarkLogic Server, such as an unattended scripted installation, some required CA root certificates may be missed, resulting in this error.

Prior to MarkLogic 9.0-5, there was an error in the CA certificate installation process whereby some certificates were incorrectly flagged as disabled and therefore not installed.

                Resolution

                Upgrade to the latest version of MarkLogic to ensure all required certificates are installed as well as any recent and updated CA Root certificates.

In the interim, the missing AWS root certificates can be downloaded from the Amazon Trust Repository and installed manually using the MarkLogic Admin UI (Configure -> Security -> Certificate Authorities).
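As an alternative to the Admin UI, a downloaded root certificate can also be installed with a short script. The sketch below assumes the PEM file has been downloaded to /tmp/AmazonRootCA1.pem (a placeholder path) and that the query is evaluated against the Security database:

xquery version "1.0-ml";
import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";

(: load the PEM file as text and insert it as a trusted CA certificate :)
pki:insert-trusted-certificates(
  xdmp:document-get("/tmp/AmazonRootCA1.pem",
    <options xmlns="xdmp:document-get"><format>text</format></options>))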


                Introduction

If you have forest level failover configured on your MarkLogic cluster, in the event that a single host in the cluster loses contact with the other hosts, the forests will fail over to the backup set of forests: the replica forests.

                What should I do in the event of a failover?

Failover shifts the responsibility for a given set of forests over to other hosts in the cluster. If the failing host "loses" control of its forests, control is not automatically given back when the master becomes available again; failing the forests back has to be done manually.

To fail a forest back (to "flip" control back to the master), if both the replica and master forests are in sync with each other, all that's needed is to restart the replica forest. This can be done using the Admin UI (Configure > Forests > Forest Name > Status > Restart), or with XQuery (xdmp:forest-restart):

                https://docs.marklogic.com/xdmp:forest-restart

                flip-forests.xqy
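(The original attachment is not reproduced here; what follows is a minimal, hypothetical sketch of what such a task could look like. It assumes the usual local-disk failover states: a failed-over configured master reports "sync replicating" while its acting-master replica reports "open".)

(: flip-forests.xqy - sample sketch only :)
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

let $config := admin:get-configuration()
for $master in xdmp:forests()
for $replica in admin:forest-get-replicas($config, $master)
let $master-state  := xdmp:forest-status($master)/fs:state
let $replica-state := xdmp:forest-status($replica)/fs:state
(: a failed-over configured master is "sync replicating" while its
   configured replica is "open" (acting master); restarting the
   acting master flips control back to the configured master :)
where $master-state eq "sync replicating" and $replica-state eq "open"
return (
  xdmp:log(fn:concat("Failing back ", xdmp:forest-name($master),
                     " by restarting acting master ", xdmp:forest-name($replica))),
  xdmp:forest-restart($replica)
)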

The above code is intended as a sample of something that could be run as a scheduled task to automatically check for failed-over forests and flip them back where possible.

It's also worth noting that automatic fail-back may not be something you want to do; in many cases, a failover event is a warning of a problem that needs to be investigated (for example, a disk error). If you plan to manage failing back forests automatically, make sure you are also monitoring the ErrorLogs for evidence of failover events so you know they are happening.


                Summary

                This article explores fragmentation policy decisions for a MarkLogic database, and how search results may be influenced by your fragmentation settings.

                Discussion

                Fragments versus Documents

                Consider the below example.

                1) Load 20 test documents in your database by running

                let $doc := <test>{
                for $i in 1 to 20 return <node>foo {$i}</node>
                }</test>
                for $i in 1 to 20
                return xdmp:document-insert ('/'||$i||'.xml', $doc)

                Each of the 20 documents will have a structure like so:

                <test>
                    <node>foo 1</node>
                    <node>foo 2</node>
                           .
                           .
                           .
                    <node>foo 20</node>
                </test>
                

                2) Observe the database status: 20 documents and 20 fragments.

3) Create a fragment root on 'node' and allow the database to reindex (an Admin API sketch for this step follows these numbered steps).

                4) Observe the database status: 20 documents and 420 fragments. There are now 400 extra fragments for the 'node' elements.
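For step 3, the fragment root can be set through the Admin UI or with the Admin API. The following is a minimal sketch, assuming a database named "Documents" and a 'node' element in no namespace:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

(: add a fragment root on the 'node' element and save the configuration;
   the database will then reindex :)
let $config := admin:get-configuration()
let $config := admin:database-add-fragment-root(
                 $config,
                 xdmp:database("Documents"),
                 admin:database-fragment-root("", "node"))
return admin:save-configuration($config)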

                We will use the data with fragmentation in the examples below.


                Fragments and cts:search counts

                Searches in MarkLogic work against fragments (not documents). In fact, MarkLogic indexes, retrieves, and stores everything as fragments.

                While the terms fragments and documents are often used interchangeably, all the search-related operations happen at fragment level. Without any fragmentation policy defined, one fragment is the same as one document. However, with a fragmentation policy defined (e.g., a fragment root), the picture changes. Every fragment acts as its own self-contained unit and is the unit of indexing. A term list doesn't truly reference documents; it references fragments. The filtering and retrieval process doesn't actually load documents; it loads fragments. This means a single document can be split internally into multiple fragments but they are accessed by a single URI for the document.

                Since the indexes only work at the fragment level, operations that work at the level of indexing can only know about fragments.

                Thus, xdmp:estimate returns the number of matching fragments:

                xdmp:estimate (cts:search (/, 'foo')) (: returns 400 :)

                while fn:count counts the actual number of items in the returned sequence:

                fn:count (cts:search (/, 'foo')) (: returns 20 :)


                Fragments and search:search counts

                When using search:search, "... the total attribute is an estimate, based on the index resolution of the query, and it is not filtered for accuracy." This can be seen since


                import module namespace search = "http://marklogic.com/appservices/search" at "/MarkLogic/appservices/search/search.xqy";
                search:search("foo",
                <options xmlns="http://marklogic.com/appservices/search">
                <transform-results apply="empty-snippet"/>
                </options>
                )

                returns

                <search:response snippet-format="empty-snippet" total="400" start="1" page-length="10" xmlns:search="http://marklogic.com/appservices/search">
                <search:result index="1" uri="/3.xml" path="fn:doc(&quot;/3.xml&quot;)" score="2048" confidence="0.09590387" fitness="1">
                <search:snippet/>
                </search:result>
                <search:result index="2" uri="/5.xml" path="fn:doc(&quot;/5.xml&quot;)" score="2048" confidence="0.09590387" fitness="1">
                <search:snippet/>
                </search:result>
                .
                .
                .
                <search:result index="10" uri="/2.xml" path="fn:doc(&quot;/2.xml&quot;)" score="2048" confidence="0.09590387" fitness="1">
                <search:snippet/>
                </search:result>


                Notice that the total attribute gives the estimate of the results, starting from the first result in the page, similar to the xdmp:estimate result above, and is based on unfiltered index (fragment-level) information. Thus the value of 400 is returned.

                When using search:search:

                • Each result in the report provided by the Search API reflects a document -- not a fragment. That is, the units in the Search API are documents. For instance, the report above has 10 results/documents.
                • Search has to estimate the number of result documents based on the indexes.
                • Indexes are based on fragments and not documents.
                • If no filtering is required to produce an accurate result set and if each fragment is a separate document, the document estimate based on the indexes will be accurate.
                • If filtering is required or if documents aggregate multiple matching fragments, the estimate will be inaccurate. The only way to get an accurate document total in these cases would be to retrieve each document, which would not scale.

                Fragmentation and relevance

                Fragmentation also has an effect on relevance.  See Fragments.


                Should I use fragmentation?

Fragmentation can be useful at times, but generally it should not be used unless you are sure you need it and understand all the tradeoffs. As an alternative, you can break your documents into smaller subdocuments. In general, the Search API is designed to work better without fragmentation in play.

                What is DLS?

                The Document Library Service (DLS) enables you to create and maintain versions of managed documents in MarkLogic Server. Access to managed documents is controlled using a check-out/check-in model. You must first check out a managed document before you can perform any update operations on the document. A checked out document can only be updated by the user who checked it out; another user cannot update the document until it is checked back in and then checked out by the other user. 
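For example, a basic check-out and check-in around your updates might look like the following sketch (the document URI and annotation are placeholders):

(: checkout.xqy - lock the managed document for this user :)
xquery version "1.0-ml";
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
dls:document-checkout(
  "/reports/q1.xml",
  fn:true(),
  "editing Q1 figures")

(: checkin.xqy - run after the updates are complete :)
xquery version "1.0-ml";
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
dls:document-checkin(
  "/reports/q1.xml",
  fn:true())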

                Searching across latest version of managed documents

To track document changes, you can store versions of a document by defining a retention policy in DLS. However, it is often the latest version of the document that most people are interested in. MarkLogic provides a function, dls:documents-query, which helps you access the latest versions of the managed documents in the database. There are situations where there is a performance overhead in using this function: when the database has millions of managed documents, you may see some overhead in accessing all of the latest versions. This is an intrinsic issue caused by the large number of files and the need to join across properties.

                How can one improve the search performance?

A simple workaround is to add your latest versions to a collection (say, "latest"). Instead of the dls:documents-query API, you can then use a collection query on this "latest" collection. Below are two approaches you can use: the first approach applies to new changes (inserts/updates), while the second approach should be used to modify the already existing managed documents in the database.

                1.) To add new inserts/updates to "latest" collection

                Below are two files, manage.xqy, and update.xqy that can be used for new inserts/updates.

                In manage.xqy, we do an insert and manage, and manipulate the collections such that the numbered document has the "historic" collection and the latest document has the "latest" collection. You have to use xdmp:document-add-collections() and xdmp:document-remove-collections() when doing the insert and manage because it's not really managed until after the transaction is done.

                In update.xqy, we do the checkout-update-checkin with the "historic" collection (so that we don't inherit the "latest" collection from the latest document), and then add "latest" and remove "historic" from the latest document. 

                (: manage.xqy :)
                xquery version "1.0-ml";
                import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
                dls:document-insert-and-manage(
                  "/stuff.xml",
                  fn:false(),
                  <test>one</test>,
                  "created",
                  (xdmp:permission("dls-user", "read"),
                   xdmp:permission("dls-user", "update")),
                  "historic"),
                xdmp:document-add-collections(
                  "/stuff.xml",
                  "latest"),
                xdmp:document-remove-collections(
                  "/stuff.xml",  "historic")

                (: update.xqy :)
                xquery version "1.0-ml";
                import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
                dls:document-checkout-update-checkin(
                  "/stuff.xml",
                  <test>three</test>,
                  "three",
                  fn:true(),
                  (),
                  ("historic")),
                dls:document-add-collections(
                  "/stuff.xml",
                  "latest"),
                dls:document-remove-collections(
                  "/stuff.xml",
                  "historic")

                2.) To add the already existing managed documents to the "latest" collection

To add the latest versions of documents already existing in your database to the "latest" collection, you can run the following in batches.

                xquery version "1.0-ml";
                import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
                declare variable $start external ;
                declare variable $end   external ;
                for $uri in cts:search(fn:collection(), dls:documents-query())[$start to $end]/document-uri(.) 
                return xdmp:document-add-collections($uri, ("latest"))
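Assuming the module above is saved under your App Server's modules root as, say, /add-latest.xqy (the path is just an example), each batch can then be run by supplying the external variables at invocation time:

xquery version "1.0-ml";
(: process the first 1,000 latest-version documents :)
xdmp:invoke(
  "/add-latest.xqy",
  (xs:QName("start"), 1,
   xs:QName("end"), 1000))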

This way you can segregate the historical and latest versions of the managed documents and then, instead of using dls:documents-query, use the "latest" collection to search across the latest versions of the managed documents.
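For example, a search restricted to the latest versions might then look like the following sketch (the word query on "foo" is only an illustration):

xquery version "1.0-ml";
cts:search(
  fn:collection(),
  cts:and-query((
    cts:collection-query("latest"),
    cts:word-query("foo"))))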

Note: Although this workaround may work when you want to search across the latest versions of managed documents, it does not cover all cases. dls:documents-query is used internally by many dls.xqy functions, so not all functionality will be improved.

                Summary

This knowledgebase article discusses various aspects of the vulnerability found in the glibc library (CVE-2015-7547) with respect to MarkLogic Server.

Please note: we do not expect any changes to be required at the MarkLogic application software level to protect against this vulnerability, but we highly recommend that affected Linux OS platforms (those using an affected library version) get the latest patch to protect against exposure.

                 

                1) MarkLogic Dependency 

Application-layer software like MarkLogic relies on the underlying operating system for various operations, most critically memory management. On the Linux platform, glibc is the primary library package providing memory-management capabilities to the application layer.

The MarkLogic package installation depends upon the availability of the glibc library from the OS layer (checking the MarkLogic rpm for dependencies):

                $ rpm -qpR MarkLogic-8.0-4.2.x86_64.rpm 
                lsb 
                gdb 
                libc.so.6(GLIBC_2.11)(64bit) 
                libgcc_s.so.1()(64bit) 
                libstdc++.so.6()(64bit) 
                libc.so.6(GLIBC_2.11) 
                cyrus-sasl 
                /bin/sh 
                /bin/sh 
                rpmlib(PayloadFilesHavePrefix) <= 4.0-1
                rpmlib(CompressedFileNames) <= 3.0.4-1
                rpmlib(PayloadIsXz) <= 5.2-1

After installation, the dynamic libraries loaded by the MarkLogic binary on a test platform:

                $ pwd
                /opt/MarkLogic/bin

                $ ldd MarkLogic | grep libc.so
                libc.so.6 => /lib64/libc.so.6 (0x000000316aa00000)

                $ ls -al /lib/libc.so.6 
                lrwxrwxrwx. 1 root root 12 Oct 28 2014 /lib/libc.so.6 -> libc-2.12.so 

                 

                2) glibc library Vulnerability (CVE-2015-7547)

The code that causes the vulnerability was introduced in May 2008 as part of glibc 2.9, and is only present in glibc's copy of libresolv, which has enhancements to carry out parallel A and AAAA queries. Therefore, only programs using glibc's copy of the code have this problem.

                Please read further at - https://sourceware.org/ml/libc-alpha/2016-02/msg00416.html

                 

                3) Patch for Red Hat Enterprise Linux 6 & 7 

                This issue does not affect the versions of glibc as shipped with Red Hat Enterprise Linux 3, 4 and 5.
For Red Hat Enterprise Linux versions 6 & 7, Red Hat has made the latest packages with the fix available as of 02/16/2016 (see the URL below):
                https://access.redhat.com/security/cve/cve-2015-7547

                 

                Related Reading

                GHOST: glibc vulnerability (CVE-2015-0235) - https://access.redhat.com/articles/1332213

                US-CERT: https://www.us-cert.gov/ncas/current-activity/2016/02/17/GNU-glibc-Vulnerability

                Introduction

MarkLogic stores certificate files in the Security database. All user-created security files are stored along with their template ID in the Security database.

For example, a newly installed signed certificate will be stored at a URI such as http://marklogic.com/xdmp/pki/certificates/160051481396114827.xml, and it will contain the template ID value (<pki:template-id>13176215136521847243</pki:template-id>).

A reference to the template ID is also stored in the groups.xml configuration file for the App Server when a certificate template is attached to a specific App Server.

The template ID is the only configuration value with a two-way reference: one copy is stored in the groups.xml configuration file, and the other is inside the certificate document in the Security database.

                Problem Statement

When the Security database is restored, it replaces the existing certificate files in the Security database along with the references to the old template IDs. If a template ID is still referenced by any App Server (for example, an SSL-enabled App Server whose certificate template was never detached prior to the Security database restore), then the groups.xml file will still reference a template ID that no longer exists.

In that scenario, users will receive an HTTP 500 Internal Server Error:

                500: Internal Server Error ADMIN-BADCERTTEMPLATE: (err:FOER0000) '18321675798544961903' is not a valid certificate template id In /MarkLogic/admin.xqy on line 15197 In validate-certificate-template-id("18321675798544961903", <xs:element name="ssl-certificate-template" type="ssl-certificate..." .../>) $value = "18321675798544961903" $typ = <xs:element name="ssl-certificate-template" type="ssl-certificate..." .../> $id = xs:unsignedLong("18321675798544961903") $template = ()

                How to avoid the situation from occurring?

The best approach is to remove all App Server-to-template ID associations, by going through each App Server, before any Security database restore. Once the Security database restore is done, the App Servers can be associated with the appropriate templates from the restored Security database to re-enable SSL.

                How to recover? 

The workaround is to stop the MarkLogic service and remove the template ID from the configuration files as well. The groups.xml configuration file is located in the /var/opt/MarkLogic directory, and the lingering template ID can be found under the App Server's <ssl-certificate-template> element, which needs to be reset.

Please follow the steps below to replace groups.xml across the cluster.

1. Stop the cluster: stop the MarkLogic service on each host, starting with the bootstrap host and then all the other hosts (for example, as the root user run "$ /sbin/service MarkLogic stop").
2. Go to groups.xml, located in the /var/opt/MarkLogic folder. You can first back up the existing groups.xml (for example, copy it to /tmp/groups.xml).
3. Set the template to zero for all matching <ssl-certificate-template> lines:

         <ssl-certificate-template>0</ssl-certificate-template>

4. Restart MarkLogic: restart the service, starting with the bootstrap host.
5. Re-enable SSL on the App Servers through the Admin UI (or Admin API) using the available templates.

In the latest versions of MarkLogic, a Warning message is logged about a missing certificate template ID in the configuration file. However, further work is still in progress to avoid the issue from occurring altogether, which requires some redesign.

                Related MarkLogic Documentation

                Configuring SSL on App Servers

                Restoring Security Database

                Introduction

                MarkLogic's semantic stack is both powerful and feature rich. This knowledgebase article is a curated collection of links to all our best semantic stack related materials.

                Data models

                All of the features in MarkLogic’s semantic stack (SPARQL, SQL, TDE, Optic API, etc.) are based on semantic triples. Be aware that triples in MarkLogic Server can be either unmanaged triples contained in a document (aka “embedded” triples) or managed triples, which are not embedded in any document. Most use cases would be best served by using unmanaged or embedded triples. You can learn more about data models in the MarkLogic semantic stack and the benefits of the embedded triples approach at:

                When to use which semantic interface?

                See our knowledgebase article - Optic, Search, SQL, or SPARQL - when should I use which interface?

                Common issues

                When people run into trouble with MarkLogic’s semantic stack, it’s almost always related to performance. The very best way to improve your performance is to constrain the amount of work and scope of your requests - but there are other common best practices as well. You can find those best practices in our Best Practices for Using MarkLogic Semantics at Scale knowledgebase article.

                Can’t miss debugging tools

                See our knowledgebase article - What debugging tools are available for Optic, SQL, or SPARQL code in MarkLogic Server?

                Maintenance tips

                Your initial TDE views are unlikely to be your final TDE views - you'll likely need to change or update them as your database and application requirements mature. You can find best practices around updating TDE views at our knowledgebase article - Updating a TDE View 

                Tips for users of BI Tools

                For critical tips and best practices around how to best use MarkLogic Server with your favorite BI Tools, please consult our knowledgebase article - Best Practices for Using BI Tools with MarkLogic

                Introduction

                This article discusses the capabilities of JavaScript and XQuery, and the use of JSON and XML, in MarkLogic Server, and when to use one vs the other.

                Details

                Can I do everything in JavaScript that I can do in XQuery? And vice-versa?

                Yes, eventually. Server-side JavaScript builds upon the same C++ foundation that the XQuery runtime uses. MarkLogic 8.0-1 provides bindings for just about every one of the hundreds of built-ins. In addition, it provides wrappers to allow JavaScript developers to work with JSON instead of XML for options parameters and return values. In the very few places where XQuery functionality is not available in JavaScript you can always drop into XQuery with xdmp.xqueryEval(...).

                When should I use XQuery vs JavaScript? XML vs JSON? When shouldn’t I use one or the other?

                This decision will likely depend on skills and aspirations of your development team more than the actual capabilities of XML vs JSON or XQuery vs JavaScript. You should also consider the type of data that you’re managing. If you receive the data in XML, it might be more straightforward to keep the data in its original format, even if you’re accessing it from JavaScript.

                JSON

                JSON is best for representing data structures and object serialization. It maps closely to the data structures in many programming languages. If your application communicates directly with a modern browser app, it’s likely that you’ll need to consume and produce JSON.

                XML

                XML is ideal for mark-up and human text. XML provides built-in semantics for declaring human language (xml:lang) that MarkLogic uses to provide language-specific indexing. XML also supports mixed content (e.g., text with intermingled mark-up), allowing you to "embed" structures into the flow of text.

                Triples

                Triples are best for representing atomic facts and relationships. MarkLogic indexes triples embedded in either XML or JSON documents, for example to capture metadata within a document.

                JavaScript

                JavaScript is the most natural language to work with JSON data. However, MarkLogic’s JavaScript environment also provides tools for working with XML. NodeBuilder provides a pure JavaScript interface for constructing XML nodes.

                XQuery

                XQuery can also work with JSON. MarkLogic 8 extends the XQuery and XPath Data Model (XDM) with new JSON node tests: object-node(), array-node(), number-node(), boolean-node(), and null-node(). One implication of this is that you can use XPath on JSON nodes just like you would with XML. XML nodes also implement a DOM interface for traversal and read-only access.
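As a small illustration of these node tests (the sample JSON values are arbitrary):

xquery version "1.0-ml";
let $obj := object-node { "name": "example", "count": 42, "ok": fn:true() }
return (
  $obj/name,             (: the "name" property - a text node :)
  $obj/number-node(),    (: matches the 42 value :)
  $obj/boolean-node()    (: matches the true value :)
)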

                Summary

                If you’re working with data that is already XML or you need to model rich text and mark-up, an XML-centric workflow is the best choice. If you’re working with JSON, for example, coming from the browser, or you need to model typed data structures, JSON is probably your best choice.

                 

                 

                Introduction

This article discusses how JavaScript is implemented in MarkLogic Server, and how modules can be reused.

                Is Node.js embedded in the server?

                MarkLogic 8 embeds Google's V8 JavaScript engine, just like Node.js does, but not Node.js itself. Both environments use JavaScript and share the core set of types, functions, and objects that are defined in the language. However, they provide completely different contexts.

                Can I reuse code written for Node in Server-Side JavaScript?

Not all JavaScript that runs in the browser will work in Node.js; similarly, not all JavaScript that runs in Node.js will work in MarkLogic. JavaScript that doesn’t depend on the specific environment is portable between MarkLogic, Node.js, and even the browser.

                For example, the utility lodash library can run in any environment because it only depends on features of JavaScript, not the particular environment in which it’s running.

                Conversely, Node’s HTTP library is not available in MarkLogic because that library is particular to JavaScript running in Node.js, not built-in to the language. (To get the body of an HTTP request in MarkLogic, for example, you’d use the xdmp.getRequestBody() function, part of MarkLogic’s built-in HTTP server library.) If you’re looking to use Node with MarkLogic, we provide a full-featured, open-source client API.

                Will you allow npm modules on MarkLogic?

                JavaScript libraries that don’t depend on Node.js should work just fine, but you cannot use npm directly today to manage server-side JavaScript modules in MarkLogic. (This is something that we’re looking at for a future release.)

                To use external JavaScript libraries in MarkLogic, you need to copy the source to a directory under an app server’s modules root and point to them with a require() invocation in the importing module.

                What can you import?

                JavaScript modules

                Server-side JavaScript in MarkLogic implements a module system similar to CommonJS. A library module exports its public types, variables, and functions. A main module requires a library module, binding the exported types, variables, and functions to local “namespace” global variables. The syntax is very similar to the way Node.js manages modules. One key difference is that modules are only scoped for a single request and do not maintain state beyond that request. In Node, if you change the state of a module export, that change is reflected globally for the life of the application. In MarkLogic, it’s possible to change the state of a library module, but that state will only exist in the scope of a single request.

                For example:

                // *********************************************
                // X.sjs

                module.exports.blah = function() {
                    return "Not Math.random";
                }

                // *********************************************
                // B.sjs

                var x = require("X.sjs");

                function bTest() {
                    return x.blah === Math.random;
                }

                module.exports.test = bTest;

                // *********************************************
                // A.sjs

                var x = require("X.sjs");
                var b = require("B.sjs");

                x.blah = Math.random;

                b.test();

                // *********************************************
                // A-prime.sjs

                var x = require("X.sjs");
                var b = require("B.sjs");

                b.test();

                Invoking A.sjs returns true, but subsequently invoking A-prime.sjs still returns false.


                XQuery modules

                MarkLogic also allows server-side JavaScript modules to import library modules written in XQuery and call the exported variables and functions as if they were JavaScript.

                 

                Introduction

                The notion of "flipping" back control (from failed-over replica forest back to the master forest) has been covered in previous Knowledgebase articles:

                https://help.marklogic.com/Knowledgebase/Article/View/427/0/scripting-failover-flipping-replica-forests-back-to-their-masters-using-xquery

                In this Knowledgebase article, we will discuss the pros and cons of leaving failed over forests as they are.  Should control be returned to the master forests after a failover event?

                Best Practices

                Can it be considered good practice to leave forests in their failed-over state?

                As long as the original configured master shows that it is in sync replicating state in the database status page, you know it's still ready to take over in the event that the configured replica (acting master) fails at a later time; this means that High Availability is still preserved across the cluster in spite of a prior failover event having taken place.

                In summary, the main reasons to fail back the forests to their initial configured state are as follows:

                • Your operating state will match your configured state, which could avoid surprises if you make assumptions based on configuration or naming of forests (e.g. someone somewhere may assume that forest-001-r is a replica forest and not check whether it is currently acting master due to a failover event that took place some time in the past). This is especially important if your team does not maintain a runbook for your MarkLogic cluster.
                  • Additionally, if you restart your cluster in a failed-over state, the configured masters will take over again, so your running state will be different before and after a restart, which could complicate diagnosis of any problems you may have involving the restart (e.g. if the restart was in response to a problem, or if a problem surfaces after restart)
                • Both master and replica forests can process updates, although only master forests can process queries.  Presumably you sized your cluster and distributed your forests to spread the load; if you're in a failed over state, then the load is likely to be uneven across hosts in your cluster and you probably want to get back to that even load by failing those forests back to their respective masters.
• There are likely to be implications for backup/restore if you have an unusual distribution of master and acting-master (replica) forests, which could cause further work for you. These issues are covered in separate Knowledgebase articles.

                Conclusion

In the event of a forest failover, as long as your previous master forests are in their (expected) sync replicating state, the risk of leaving the forests in a failed-over state is minimal; any disturbance that takes the acting master forest offline (such as a forest restart) will cause failover to happen again, so you still continue to have High Availability.

                However, forest failover can be indicative of a larger symptom: a particular host that appears to be encountering issues for any number of possible reasons.  Keeping track of when forests fail over for a given host can be a useful first line of enquiry into a system that is showing early warning signs of a problem.

                From the perspective of system management, flipping failed-over forests back to their respective masters could be considered as part of an ongoing approach to managing and maintaining general cluster health.  

In the event of a failover, if the failover details are logged and the forests are failed back to their respective masters, subsequent failover events become more apparent at a glance: it's easy to quickly review the status tab of a given database to confirm that all the master forests are in their open state (with their replica forests all sync replicating).

Adopting a policy of logging what happened and resolving the issue by failing the forests back makes managing a failover an event that gets triaged; in the longer run, it will make future events easier to spot and could potentially provide data that gives you advance warning of an inherent issue with a given host in your cluster.

                SUMMARY

Some MarkLogic Server sites are installed in a 1Gb network environment. At some point, your cluster growth may require an upgrade to 10Gb Ethernet. Here are some hints for knowing when to migrate up to 10Gb Ethernet, as well as some ways to work around bandwidth limits prior to making the move.

                General Approach

                A good way to check if you need more network bandwidth is to monitor the network packet retransmission rate on each host.  To do this, use the "sar -n EDEV 5" shell command. [For best results, make sure you have an updated version of sar]

                Sample results:

                # sar -n EDEV 5 3
...
10:41:44 AM     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
10:41:49 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
10:41:49 AM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

10:41:49 AM     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
10:41:54 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
10:41:54 AM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

10:41:54 AM     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
10:41:59 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
10:41:59 AM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

Average:        IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:         eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00


                Explanation of terms:

FIELD       DESCRIPTION
                IFACE LAN interface
                rxerr/s Bad packets received per second
                txerr/s Bad packets transmitted per second
                coll/s Collisions per second
                rxdrop/s Received packets dropped per second because buffers were full
                txdrop/s Transmitted packets dropped per second because buffers were full
                txcarr/s Carrier errors per second while transmitting packets
                rxfram/s Frame alignment errors on received packets per second
                rxfifo/s FIFO overrun errors per second on received packets
                txfifo/s FIFO overrun errors per second on transmitted packets

If the values of txerr/s and txcarr/s are non-zero, that means that packets sent by this host are being dropped on the network and the host needs to retransmit them. By default, a host will wait 200ms for an acknowledgment packet before retransmitting. This delay is significant for MarkLogic Server and will factor into overall cluster performance. You can use this as an indicator that it's time to upgrade (or debug) your network.

                Other Considerations

10 gigabit Ethernet requires special cables. These cables are expensive and easy to damage; if a cable is bent improperly, you will not get 10Gb throughput out of it. Be sure to work with your IT department to ensure that everything is installed per the manufacturer's specification. Once installed, double-check that you are actually getting 10Gb from the installed network.

Another option is to use bonded Ethernet to increase network bandwidth from 1Gb to 2Gb or 4Gb prior to jumping to 10Gb. A description of bonded Ethernet is beyond the scope of this article, but your IT department should be familiar with it and able to help you set it up.

                 

                The purpose of High Availability (HA) in MarkLogic Server (through Local Disk Failover) and Disaster Recovery (DR, through Database Replication) is to make sure you have multiple copies of the same data available so that if one copy is lost, you will have other copies upon which to fall back. 

                Consider the example of a cluster with Local Disk Failover (LDF) and Database Replication (DR) configured:

                • You have 5 copies of data available for each data forest:
                  • Primary cluster: 2 copies (1 primary forest + 1 LDF forest)
                  • Replica cluster: 2 copies (1 primary forest + 1 LDF forest)
                  • Backup: 1 copy
                • Specifically, on the primary cluster, forest A (which is a primary data forest - 1st copy) has a local disk failover forest (2nd copy) for high availability. The replica cluster will have its own data forest (3rd copy) for disaster recovery and a local disk failover forest (4th copy) for HA in your DR environment. Backups represent a 5th copy of your forest.

                While it is important to make use of features like LDF and DR to achieve high-availability/disaster-recovery through having multiple copies of data, it might be more important to consider how all these copies are stored. In the above example, even though there are 2 copies of data available in the primary cluster (3 copies if you include the backup), if all of these copies are written to the same storage and that single storage environment fails, all of those copies are lost. The best practice in terms of both HA and DR is to ensure that each of your forest's copies lives on its own dedicated storage.

                Introduction

The performance and resource consumption of E-nodes is determined by the kind of queries executed in addition to the distribution and amount of data. For example, if there are 4 forests in the cluster and the query is asking for only the top-10 results, then the E-node would receive a total of 4 x 10 results in order to determine the top-10 among these 40. If there are 8 forests, then the E-node would have to sort through 8 x 10 results.

                Performance Test for Sizing E-Nodes:

                To size E-nodes, it’s best to determine first how much workload a single E-node can handle, and then scale up accordingly.

                Set up your performance test so it is at scale and so that it only talks to a single E-node. Start the Application Server settings with something like

                • threads = 32
                • backlog = 512
                • keep alive = 0

                Crank up the number of threads for the test from low to high, and observe the amount of resources being used on the E-node (cpu, memory, network). Measure both response time and throughput during these tests.

                • When the number of threads are low, you should be getting the best response time. This is what the end user would experience when the site is not busy.
                • When the number of threads are high, you will see longer response time, but you should be getting more throughput.

                As you increase the number of threads, you will eventually run out of resources on the E-node - most likely memory. The idea is to identify the number of active threads when the system's memory is exceeded, because that is the maximum number of threads that your E-node can handle.

Additional Tuning of E-nodes

                Thrashing

• If you notice thrashing before MarkLogic is able to reach a memory-consumption equilibrium, you will need to continue decreasing the thread count so that the RAM-per-thread ratio is near the 'pmap total memory' per thread.
                • The backlog setting can be used to queue up requests w/o chewing up significant resources.
                • Adjusting backlog along with some of the timeout settings might give a reasonable user experience comparable to, or even better than, what you may see with high thread counts. 

                As you continue to decrease the thread count and make other adjustments, the mean time to failure will likely increase until the settings are such that equilibrium is reached before all the memory resources are consumed - at which time we do not expect to see any additional memory failures.

                Swap, RAM & Cache for E-nodes

                • Make sure that the E-nodes have swap space equal to the size of RAM (if the node has less than 32GB of RAM) or 32 GB (if the node has 32GB or more of RAM)
                • For E-nodes, you can minimize the List Cache and Compressed Tree Cache  - set to 1GB each - in your group level configurations.
                • Your Expanded Tree Cache (group level parameter) should be at least equal to 1/8 of RAM, but you can further increase the Expanded Tree Cache so that all three caches (List, Compressed, Expanded) in combination are up to 1/3 of RAM.
• Another important group configuration parameter is Expanded Tree Cache Partitions. A good starting point is 2-3 GB per partition, but it should not be more than 12 GB per partition. The greater the number of partitions, the greater the capacity for handling concurrent query loads. (A configuration sketch for these group settings follows this list.)
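As a rough illustration of these group settings, here is a sketch using the Admin API; the sizes (in MB) and the "Default" group name are placeholders and should be derived from the RAM guidance above for your own E-node group:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

(: illustrative values only: minimal List and Compressed Tree caches,
   a larger Expanded Tree Cache, and 2 GB partitions :)
let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")
let $config := admin:group-set-list-cache-size($config, $group, 1024)
let $config := admin:group-set-compressed-tree-cache-size($config, $group, 1024)
let $config := admin:group-set-expanded-tree-cache-size($config, $group, 16384)
let $config := admin:group-set-expanded-tree-cache-partitions($config, $group, 8)
return admin:save-configuration($config)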

                Growing your Cluster

                As your application, data and usage changes over time, it is important to periodically revisit your cluster sizings and re-run your performance tests.

                 

                Introduction

This article is intended to address the impact of AWS' deprecation of Python 3.6 (a Lambda runtime dependency) and the Classic Load Balancer (CLB) on MarkLogic CloudFormation Templates (CFTs).

                Background

                AWS announced deprecation of Python 3.6 and Classic Load Balancer (CLB).

                For Python 3.6, please refer to 'Runtime end of support dates'.
                For Classic Load Balancer, please refer to 'Migrate Classic Load Balancer'.

MarkLogic 10 CFTs provided prior to 10.0-9.2 are impacted by the Python 3.6 deprecation because MarkLogic uses custom Lambdas. CFTs prior to 10.0-9.2 are also impacted by the CLB deprecation, since the MarkLogic single-host deployment uses a CLB.

                Solutions

                1. Upgrade to latest MarkLogic CFT templates:

Starting with the 10.0-9.2 release, the MarkLogic CFT uses Python 3.9 and no longer uses a CLB for single-host deployments.

The fully-qualified domain name (FQDN) of the node is based on the internal IP address from the persistent, reusable ENI. In a single-host cluster without a CLB, the FQDN for the node is listed in the stack outputs as the endpoint for accessing the Admin UI, for example http://ip-10.x.x.x.ap-southeast-2.compute.internal:8001.

For a single-host cluster in a private subnet, a client residing on the public Internet will not be able to connect to the host directly. Your AWS administrator will need to set up a bastion host (jump box) or a reverse proxy, which acts as an addressable middle tier to route traffic to the MarkLogic host. Alternatively, your administrator can assign an Elastic IP to the single host, which makes it publicly accessible.

                2. Running with MarkLogic prior to 10.0-9.2

                2.1: Modify MarkLogic's most current CFT.

You can use the latest version of the MarkLogic CFT, and then change the MarkLogic AMI version inside that CFT to refer to a specific prior version of the MarkLogic AMI.

                2.2: Customized CFT (derived from MarkLogic CFT but with specific modification).

                You can modify your copy of template to upgrade to Python 3.9 and remove the use of CLB.

a) To apply the Python changes: refer to the custom Lambda templates (ml-managedeni.template, ml-nodemanager.template), search for "python3.6", and replace it with "python3.9".

                Format to build the URL: https://marklogic-db-template-releases.s3.<<AWS region>>.amazonaws.com/<<ml-version>>/ml-nodemanager.template

                Download v10.0-7.1 custom lambda templates for upgrade using below links:

                https://marklogic-db-template-releases.s3.us-west-2.amazonaws.com/10.0-7.1/ml-managedeni.template

                https://marklogic-db-template-releases.s3.us-west-2.amazonaws.com/10.0-7.1/ml-nodemanager.template

                 

                After the changes are done, the modified templates should be uploaded to the s3 bucket. Also, the 'TemplateURL' should be updated in the main CFTs (mlcluster-vpc.template, mlcluster.template) under 'Resources' -> ManagedEniStack, 'Resources' -> NodeMgrLambdaStack.

                 

                b) To remove the CLB changes: Please refer to the latest CFT version (mlcluster-vpc.template, mlcluster.template) and compare/modify the templates accordingly.

c) To upgrade the Python version of an existing stack without redeployment: navigate to the AWS Lambda console (Lambda -> Functions -> ActivityManager-Dev -> Edit runtime setting) and update the runtime to use "Python 3.9".

The AWS deprecation does not impact an already deployed stack, since the Lambda functions are created during service creation (and only deleted when the service is terminated). Similarly, updating the cluster capacity does not impact an existing deployed stack.

                MarkLogic Cloud Services (DHS)

                The issue is already addressed by the MarkLogic Cloud Services team with an upgrade of underlying dependency to "Python 3.9".

                MarkLogic 9

Please note that this knowledgebase article refers to MarkLogic 10 CloudFormation Template changes alone. For MarkLogic 9 CloudFormation templates, work on recommended solutions is still in progress.

                References

                1. MarkLogic 10.0-9.2 Release Notes Addendum
                2. Latest MarkLogic CFT

                The recommended way to run MarkLogic on AWS is to use the "managed" Cloud Formation template provided by MarkLogic:

                https://developer.marklogic.com/products/cloud/aws

                The documentation for it is here:

                https://docs.marklogic.com/guide/ec2/CloudFormation

                By default, the MarkLogic nodes are hidden in Private Subnets of a VPC and the only way to access them from the Internet is via the Elastic Load Balancer.

This is optimal, as it distributes the load and shields the nodes from common attack vectors.

However, for some types of maintenance it may be useful, or even necessary, to SSH directly into individual MarkLogic nodes.

                Examples where this is necessary:

                1. Configuring Huge Pages size so that it is correct for the instance size/amount of RAM: https://help.marklogic.com/Knowledgebase/Article/View/420/0/group-level-cache-settings-based-on-ram

                2. Manual MarkLogic upgrade where a new AMI is not yet available (for example for emergency hotfix): https://help.marklogic.com/Knowledgebase/Article/View/561/0/manual-upgrade-for-marklogic-aws-ami

                 


                To enable SSH access to MarkLogic nodes you need to:

                I. Create an intermediate EC2 host, commonly known as 'bastion' or 'jump' host.

                II. Put it in the correct VPC and correct (public) subnet and ensure that it has public / Internet-facing IP address

                III. Adjust security settings so that SSH connections to bastion host as well SSH connection from bastion to MarkLogic nodes are allowed and launch the bastion instance.

                IV. Additionally, you will need to configure SSH key forwarding or a similar solution so that you don't need to store your private key on the bastion host.

                I. Creating the EC2 instance in AWS Console:

                1. The EC2 instance needs to be in the same region as the MarkLogic Cluster so the starting console URL will be something like this (depending on the region and your account):

                https://eu-west-1.console.aws.amazon.com/ec2/home?region=eu-west-1#LaunchInstanceWizard:

                2. The instance OS can be any Linux of your choice and the default Amazon Linux 2 AMI is fine for this. For most scenarios the jump host does not need to be powerful so any OS that is free tier eligible is recommended:


3. Choose the instance size. For most scenarios (including SSH for admin access), the free tier t2.micro is the most cost-effective instance.


                4. Don't launch the instance just yet - go to Step 3 of the Launch Wizard ("Step 3: Configure Instance Details").

                II. Put the bastion host in the correct VPC and subnet and configure public IP:

                The crucial steps here are:

                1. Choose the same VPC that your cluster is in. You can find the correct VPC by reviewing the resources under the Cloud Formation template section of the AWS console or by checking the details of the MarkLogic EC2 nodes.

                2. Choose the correct subnet - you should navigate to the VPC section of the AWS Console, and see which of the subnets of the MarkLogic Cluster has an Internet Gateway in its route table.

                3. Ensure that "Auto-assign Public IP" setting is set to "enable" - this will automatically configure a number of AWS settings so that you won't have to assign Elastic IP, routing etc. manually.

4. Ensure that you have sufficient IAM permissions to be able to create the EC2 instance and update security rules (to allow SSH traffic).


                III. Configure security settings so that SSH connections are allowed and launch:

                1. Go to "Step 6: Configure Security Group" of the AWS Launch Wizard. By default, AWS will suggest creating "launch" security group that opens SSH incoming to any IP address. You can adjust as necessary to allow only a certain IP address range, for example.


                Additionally, you may need to review the security group setting for your MarkLogic cluster so that SSH connections from bastion host are allowed.

                2.Go to "Step 7: Review Instance Launch" and press "Launch". At this step you need to choose a correct SSH key pair for the region or create a new one. You will need this SSH key to connect to the bastion host.


                3. Once the EC2 instance launches, review its details to find out the public IP address.


IV. Configure SSH key forwarding so that you don't have to permanently store your private SSH key on the bastion host. Please review your options and alternatives (for example, using ProxyCommand), as key forwarding temporarily stores the private key on the bastion host, so anyone with root access to the bastion host could hijack your MarkLogic private key (when logged in at the same time as you).

                1. Add the private key, to SSH agent:

                ssh-add -K myPrivateKey.pem

                2. Test the connection (with SSH agent forwarding) to the bastion host using:

                ssh -A ec2-user@<bastion-IP-address>

3. Once you're connected, ssh from the bastion to a MarkLogic node:

                ssh ec2-user@<MarkLogic-instance-IP-address or DNS-entry>


                For strictly AWS infrastructure issues (VPC, subnets, security groups) please contact AWS support. For any MarkLogic related issues please contact MarkLogic support via:

                help.marklogic.com

                Introduction

This article discusses why MarkLogic Server should be started with root privileges.

                Details

It is possible to install MarkLogic Server in a directory that does not require root privileges.

                There's also a section in our Installation Guide (Configuring MarkLogic Server on UNIX Systems to Run as a Non-daemon User) that talks at some length about how to run MarkLogic Server as a user other than daemon on UNIX systems. While that will allow you to configure permissions for non-root and non-daemon users in terms of file ownership and actual runtime, you'll still want to be the root user to start and stop the server.

                It is possible to start MarkLogic without su privileges, but this is strongly discouraged.

The parent (root) MarkLogic process is simply a restarter process. It exists only to wait for the non-root process to exit; if the non-root process exits abnormally for some reason, the root process will fork and exec another non-root process. The root process runs no XQuery scripts, opens no sockets, and accesses no database files.

We strongly recommend starting MarkLogic as root and letting it switch to the non-root user on its own. When the server initializes, if it is running as root it makes some privileged kernel calls to configure sockets, memory, and threads. For example, it allocates huge pages if any are available, increases the number of file descriptors it can use, binds any configured low-numbered socket ports, and requests the capability to run some of its threads at high priority. MarkLogic Server will function if it isn’t started as root, but it will not perform as well.

                You can work around the root-user requirements for starting/stopping (and even installation/uninstallation) by creating wrapper scripts that call the appropriate script (startup, shutdown, etc.), providing sudo privileges to just the wrapper.  This helps to control and debug execution.

                Further reading

                Knowledgebase - Pitfalls Running Marklogic Process as Non-root User 

                Summary

When attempting to send email from MarkLogic (whether from Ops Director, Query Console, or another application), you might encounter one of the following errors in your MarkLogic Server Error Log, or in the Query Console results pane.

                • Error sending mail: STARTTLS: 502 5.5.1 Error: command not implemented
                • Error sending mail: STARTTLS: 554 5.7.3 Unable to initialize security subsystem

This article will help explain what these errors mean, as well as provide some ways to resolve them.

                What these Errors Mean

These errors indicate that MarkLogic is attempting to send an SMTPS email through the relay, and the relay either does not support SMTPS, or SMTPS has not been configured correctly.

                Resolving the Error

One possible cause of this error is the smtp relay setting for MarkLogic Server being set to localhost.  The error can be resolved by using the Admin Interface to update the smtp relay setting with the organizational SMTP host or relay.  That setting can be found under Configure --> Groups --> [GroupName]: Configure tab, then search for 'smtp relay'.
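The same change can also be scripted. Below is a minimal sketch using the Admin API; smtp.example.org is a placeholder for your organizational relay, and admin:group-set-smtp-relay is assumed to be available in your MarkLogic version:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: point the Default group at the organizational SMTP relay :)
let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")
let $config := admin:group-set-smtp-relay($config, $group, "smtp.example.org")
return admin:save-configuration($config)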

                If this error occurs when testing the Email configuration for Ops Director, you can configure Ops Director to use SMTP instead of SMTPS by ensuring the Login and Password fields are blank.  These fields can be found under Console Settings --> Email Configuration in the Ops Director application. (Note: The Ops Director feature has been deprecated with MarkLogic 10.0-5.)

                Alternatively, install/configure an SMTP server with SMTPS support.

                Related Reading

                https://en.wikipedia.org/wiki/SMTPS

                https://www.f5.com/services/resources/deployment-guides/smtp-servers-big-ip-v114-ltm-afm

                Summary

Quorum is used to decide whether to evict or keep a node in a cluster, but is quorum also required when starting a cluster?

                What is Quorum?

                Each node in a cluster communicates with all of the other nodes in the cluster at periodic intervals. This periodic communication, known as a heartbeat, circulates key information about host status and availability between the nodes in a cluster. The cluster uses the heartbeat to determine if a node in the cluster is unavailable. This determination is based on a vote from each node in the cluster, based on each node's view of the current state of the cluster. To vote a node out of the cluster, there must be a quorum of nodes voting to remove a node. A quorum occurs if more than 50% of the total number of nodes in the cluster (including any nodes that are down) vote the same way.

                Depending on cluster configuration, this quorum may or may not be required even during startup of a cluster.

On a cluster without forest-level failover configured, no quorum is required to bring up the Admin UI. If you bring up the host serving the Security (and Schemas and Modules) databases, then you can access the Admin UI.

On a cluster with shared disk failover configured, no quorum is required to bring up the Admin UI. If you bring up the host serving the Security (and Schemas and Modules) databases, then you can access the Admin UI.

On a cluster with local disk failover configured, a quorum is required prior to starting operations (e.g. accessing the Admin UI). If you do not have quorum, then the MarkLogic administrator will have to perform some intervention to bring up the required number of hosts. In the case of a power outage, it is expected that all hosts will be powered up simultaneously. The server is designed to handle this well, so there is no need to serialize server startup; in fact we would prefer a simultaneous startup of all hosts in a cluster. If there is any reason for wanting to serialize server startup (such as not wanting to overwhelm the SAN), this is OK too, just be aware that normal cluster operation will start at the point where you have a quorum.

                What type of nodes count towards Quorum?

All nodes in the cluster count towards quorum, irrespective of the group they belong to, the type of node (E-node, D-node, E/D-node, etc.), or whether the node is up or down. Also, quorum is determined at the cluster level, so forest, database, and group configurations are irrelevant when considering which nodes participate in quorum.

                Why do we need to achieve Quorum of more than 50%?  Understanding network partitioning, or the "split brain" problem

                For failover to occur, you must have a quorum of participant nodes (defined as "n/2 + 1"). This is what protects you against any risk of network partitioning; if a node can't communicate with more than half the hosts in a cluster, it will be unable to tell whether it's on the losing side of a network partition.  
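For example, the following query computes how many hosts must remain in contact for the current cluster to keep quorum:

xquery version "1.0-ml";
(: quorum is n/2 + 1, using integer division :)
let $total-hosts := fn:count(xdmp:hosts())
return ($total-hosts idiv 2) + 1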

                If you were to try to put N hosts in one data center and N hosts in another data center, neither one would be able to determine that it is the surviving data center in the event of a network problem. If you were to try to create a cluster that spans multiple data centers, you'd want at least one more machine in a 3rd location that the two data centers would use to break the tie.

                Read more on network partitioning at: https://en.wikipedia.org/wiki/Split-brain_(computing)

                Introduction

                Stemming is handled differently between a word-query and value-query; a value-query only indexes using basic stemming.

                Discussion

                A word may have more than one stem. For example,

                cts:stem ('placing')

                returns

                place
                placing

                To see how this works with a word-query we can use xdmp:plan. Running

                xdmp:plan (cts:search (/, cts:word-query ('placing')))

                on a database with basic stemming returns

                <qry:final-plan>
                <qry:and-query>
                <qry:term-query weight="1">
                <qry:key>17061320528361807541</qry:key>
                <qry:annotation>word("placing")</qry:annotation>
                </qry:term-query>
                </qry:and-query>
                </qry:final-plan>

                Since basic stemming uses only the first/shortest stem, this is searching just for the stem 'place'.

                Searching with

                cts:search (/, cts:word-query ('placing'))

                will match 'a place of my own' ('placing' and 'place' both stem to 'place') but not 'new placings' ('placings' stems to just 'placing').

                However, on a database with advanced stemming the plan is

                <qry:final-plan>
                <qry:and-query>
                <qry:or-two-queries>
                <qry:term-query weight="1">
                <qry:key>17061320528361807541</qry:key>
                <qry:annotation>word("placing")</qry:annotation>
                </qry:term-query>
                <qry:term-query weight="1">
                <qry:key>17769756368104569500</qry:key>
                <qry:annotation>word("placing")</qry:annotation>
                </qry:term-query>
                </qry:or-two-queries>
                </qry:and-query>
                </qry:final-plan>

                Here you can see that there are two term queries OR-ed together (note the two different key values). The result is that the same cts:word-query('placing') now also matches 'new placings' because it queries using both stems for 'placing' ('place' and 'placing') and so matches the stemmed version of 'placings' ('placing').

                However, a search with

                cts:element-value-query(xs:QName('title'), 'new placing')

                returns

                <qry:final-plan>
                <qry:and-query>
                <qry:term-query weight="1">
                <qry:key>10377808623468699463</qry:key>
                <qry:annotation>element(title,value("new","placing"))</qry:annotation>
                </qry:term-query>
                </qry:and-query>
                </qry:final-plan>

                whether the database has basic or advanced stemming, showing that multiple stems are not used.

The reason for this is that MarkLogic will only do basic stemming when indexing the keys for a value, so there is a single key for the value. If MarkLogic Server were designed to support multiple stems for values (which it does not), the indexes would expand dramatically and indexing, merging, and querying would slow down. Consider if each word had two stems: there would be 2^N keys for a value of N words, and the size would grow exponentially with additional stems.

                More information on value-queries is available at Understanding Search: value queries.

                 

                Summary

                When an SSL certificate is expired or out of date, it is necessary to renew the SSL certificates applied to a MarkLogic application server.   

                The following general steps are required to apply an SSL certificate.  

                1. Create a certificate request for a server in MarkLogic
                2. Download certificate request and send it to certificate authority
                3. Import signed certificate into MarkLogic

                Detailed Steps

                Before proceeding, please note that you don't need to create a new template to renew an expired certificate as the existing template will work.

1. Creating a certificate request - A fresh CSR can be generated from the MarkLogic Admin UI by navigating to Security -> Certificate Templates -> click [your_template] -> click the Request tab -> select the radio button applicable to an expired/out-of-date certificate. For additional information, refer to the Generating and Downloading Certificate Requests section of our Security Guide.

                2. Download and send to certificate authority - The certificate template status page will display the newly generated request. You can download it and send it to your certificate authority for signing.

                3. Import signed certificate into MarkLogic - After receiving the signed certificate back from the certificate authority, you can import it from our Admin UI by navigating to Security-> Certificate Templates -> click [your_template] -> Import tab.  For additional information, refer to the Importing a Signed Certificate into MarkLogic Server section of our Security Guide

                4. Verify - To verify whether the certificate has been renewed, please look at the summary of your certificate authority. The newly added certificate should appear in certificate authority. Detailed instructions for this are available at Viewing Trusted Certificate Authorities

If you are not able to view the certificate authority, then you may need to add the certificate as if it were a new CA. This can happen if there was a change in the CA certificate chain.

• Click on the certificate template name and then import the certificate. The CA should already be listed (as it was already there and only the certificate expired). However, if there has been a change in certificate authority, you will need to import it: navigate in the Admin UI to Configure -> Security -> Certificate Authorities --> click on the Import tab. This is equivalent to adding a new CA certificate into MarkLogic, and the CA certificate name will then appear in the list.

                 

                 

                 

                What is Sticky session and why is it needed?

                For client applications that use multi-statement transactions and interact with a MarkLogic Server cluster through a load balancer, it is possible for requests from your application to MarkLogic Server to be routed to different hosts, even within the same session.

                This has no effect on most interactions with MarkLogic Server, but operations that are part of the same multi-statement transaction need to be routed to the same host within your MarkLogic cluster. This consistent routing through a load balancer is called session affinity.

                How to configure Sticky Session?

                Most load balancers provide a mechanism that supports session affinity. This usually takes the form of a session cookie that originates on the load balancer. The client acquires the cookie from the load balancer, and passes it on any requests that belong to the session. The exact steps required to configure a load balancer to generate session cookies depends on the load balancer. Consult your load balancer documentation for details.

For applications using the Java Client API, configure the HostId cookie in your requests to preserve session affinity.

For XCC applications, configure the SessionID cookie to preserve session affinity.

Configure these cookies in your load balancer's session affinity configuration for the specific port on which your application communicates with the MarkLogic cluster.

                Summary

                When running MarkLogic on AWS without using the Managed Cluster feature, a hostname warning may occur under certain circumstances.

                Customer Managed Clusters

Customers who manage their own clusters in AWS use the /etc/marklogic.conf file to disable the MarkLogic Managed Cluster feature. This is done by setting MARKLOGIC_EC2_HOST=0 to disable all of MarkLogic's EC2 enhancements, or by setting MARKLOGIC_MANAGED_CLUSTER=0 to disable only the Managed Cluster feature. This should be done prior to starting MarkLogic for the first time on the host.

                AWS Configuration Variables

                SVC-SOCHN Errors

When MarkLogic is started before /etc/marklogic.conf is put in place, it populates the /var/local/mlcmd.conf file with some default values, including a MARKLOGIC_HOSTNAME value based on the current instance hostname. If this volume is later used on a new instance, it's possible to end up with a MARKLOGIC_HOSTNAME value that no longer resolves. This will result in the following error:

                2020-04-16 15:15:36.468 Warning: A valid hostname is required for proper functioning of MarkLogic Server: SVC-SOCHN: Socket hostname error: getaddrinfo ip-10-10-0-15.mydomain.com: Name or service not known

                The issue does not impact the functioning of the cluster.

                Resolving the Issue

                After verifying that /etc/marklogic.conf has the correct entries, remove the /var/local/mlcmd.conf file, and restart the MarkLogic service on the host.

                Further Reading

                Getting Started with MarkLogic Server on AWS

                Summary

Some disk-related errors seen on MarkLogic Server running on the Linux platform, such as SVC-MAPINI, can sometimes be attributed to background services attempting to read or monitor MarkLogic data files.

                SVC-MAPINI Errors

                In some cases when background services are attempting to access MarkLogic data files, you may encounter an error similar to the following:

                SVC-MAPINI: Mapped file initialization error: open '/var/opt/MarkLogic/Forests/my-forest-02/0000145a/Timestamps': Operation not permitted
                

                The most common cause of this issue is Anti-Virus software.

                Resolution

To avoid file access conflicts, MarkLogic recommends that all MarkLogic data files (typically under /var/opt/MarkLogic/) be excluded from access by any background services, including AV software. As a general rule, ONLY MarkLogic Server should be maintaining MarkLogic Server data files. If those directories MUST be scanned, then MarkLogic should be shut down, or the forests fully quiesced, to prevent issues.

                Further Reading

                Introduction

                Locking can have serious performance implications in unoptimized application code. This knowledgebase article will list various tactics to determine if locking is the main driver for performance degradation of your application.

                If you would like to better understand locking in MarkLogic, please refer to the materials given below:

                Tactic 1:

                In MarkLogic 8 and earlier, enable the "Lock Trace" diagnostic trace event. The following KB article explains this strategy in detail: Understanding the "Lock Trace" diagnostic trace event

                Diagnostic trace events can be particularly useful in situations where you need access to more internal diagnostic information than is available in the standard MarkLogic ErrorLog.txt files or in the operating system logs. The host/cluster can be configured to output trace events, at which point the server will write information to the ErrorLog.txt files every time the diagnostic event is encountered.

                Please note that you should not be running normally with the “Lock Trace” trace event enabled. It is intended to be turned on for brief periods to analyze locking issues, then disabled.

                Tactic 2:

                From MarkLogic 9, instead of using a trace event to diagnose locking issues, you can use built-in functions.

                To find transactions that are holding and waiting for locks, run three nested loops:
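A rough sketch of this approach is shown below; it assumes the xdmp:transaction-locks() built-in (available from MarkLogic 9 onward), and the exact code in the article referenced above may differ:

xquery version "1.0-ml";
(: for each host, list every open transaction and the locks it holds
   or is waiting for :)
for $host in xdmp:hosts()
for $txn in xdmp:host-status($host)//*:transactions/*:transaction/*:transaction-id
return
  <transaction-locks host="{xdmp:host-name($host)}" txn="{$txn}">{
    xdmp:transaction-locks($host, xs:unsignedLong($txn))
  }</transaction-locks>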

                To find the request status for a host and transaction, run two nested loops:
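One way to sketch this is shown below; the host name and transaction id are placeholders taken from the output of the previous query, and the sketch assumes the request's transaction id is exposed in the xdmp:server-status() output:

xquery version "1.0-ml";
let $host := xdmp:host("node1.example.com")   (: placeholder host name :)
let $txn  := "10646418908453671582"           (: placeholder transaction id :)
for $server in xdmp:servers()
for $request in xdmp:server-status($host, $server)//*:request-status
where fn:string($request/*:transaction-id) eq $txn
return $request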

                Which tactic is preferable?

                In general, trace events cause MarkLogic to incur the overhead of logging additional information, which may slow things down when turned on for extended periods or when trace events are especially verbose (as in the case of "Lock Trace"). The problem of verbose trace events is particularly acute in busy systems, where a large number of operations are in-flight at the same time, making it difficult to spot meaningful locking patterns. Consequently, tactic 2's use of built-in functions is much preferred, if you're working with MarkLogic 9 or later.

                Related knowledgebase articles

                KB - Gathering information to troubleshoot long-running queries

                Introduction

This article will outline a general strategy for distributing a specific task across every node in a cluster.

                There are situations where you would like to execute queries against a number of hosts in a cluster - one such example would be to break a query down so it only operates on the forests on that particular node. Using the patterns described in this article, you will be able to build a mechanism to do just that.

                The problem

Wouldn't it be useful if you could pass options into xdmp:spawn() to allow the execution of code on a specific host in a cluster?

                While this has been filed as an RFE (2763) for consideration in a future release of the product, there are a few options open to you.

                From the top down

                1. Gather information about each host in your cluster

For this you can use a call to xdmp:hosts(). This will give you a sequence of host IDs, each corresponding to a node in your cluster. From there, you can get each host's name with xdmp:host-name().
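A minimal snippet demonstrating this, using only xdmp:hosts() and xdmp:host-name():

xquery version "1.0-ml";
(: list the name of every host in the cluster :)
for $host-id in xdmp:hosts()
return xdmp:host-name($host-id)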

                2. Create a call to an http endpoint on each host in a cluster

                We can build on the steps outlined in the first part to generate a list of URIs - each mapping to an endpoint (which would be serviced by a corresponding XQuery module to perform a particular task on that host). In the example below, we're using fn:concat() to generate the links for each host and then issuing a call to xdmp:document-get() to hit the same application server endpoint on each host.
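A rough sketch of this step is shown below. The port (8020) and module path (/run-local-task.xqy) are hypothetical, and for simplicity the sketch uses xdmp:http-get() to make the HTTP call (authentication options can be added as required by the target app server):

xquery version "1.0-ml";
let $port := 8020
for $host-id in xdmp:hosts()
let $endpoint := fn:concat("http://", xdmp:host-name($host-id), ":", $port, "/run-local-task.xqy")
return xdmp:http-get($endpoint)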

                3. Isolate forests for a given host

While the above technique might be useful for some purposes, you can gain further precision by building a query which operates exclusively on the forests managed by a given node; using the technique above, this variation allows you to "pre-screen" a database's forests so that you only operate against the forests on that host.
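A sketch of this idea, using a placeholder word query against the current database:

xquery version "1.0-ml";
(: find the forests of the current database that live on this host,
   then restrict the search to just those forests :)
let $local-forests :=
  for $forest in xdmp:database-forests(xdmp:database())
  where xdmp:forest-host($forest) eq xdmp:host()
  return $forest
return
  cts:search(fn:collection(), cts:word-query("example"), (), (), $local-forests)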

                Summary

                This KB article has introduced some fairly simple patterns to allow you to programmatically direct requests to a particular host in a cluster. It also demonstrates a technique for preparing queries to operate at individual forest level.

                Such techniques can be useful for performing administrative tasks on an individual host, auditing the contents of an individual forest (or group of forests) and allow for even more flexibility when you consider bulk processing tools such as CoRB and XQSync - both of which allow you to select documents based on a custom query (which could be restricted by passing in a sequence of one or more forest ids).

                Additionally, as you have the ability to target a specific host in executing a task, you could also use the above techniques to write out a specific properties file to a writable partition on your system (such as /tmp) using a call to xdmp:save().

                Introduction

                This article discusses version differences of temporal documents in MarkLogic Server.

                Details

MarkLogic Server 8 does not store the "difference" between versions of temporal documents.  Each version of a temporal document is a full document.

                At the time of this writing, MarkLogic Server does not provide any differencing tools to support diff/delta between versions of temporal documents.  It is possible to use tools external to MarkLogic Server to determine document differences.

                 

                 

                Introduction

                Interoperation of Temporal support with other MarkLogic features.

                Features that support Temporal collections

MarkLogic’s Temporal feature is built into the server and is supported by many of MarkLogic’s other major features: the Search API, Semantics, Tiered Storage, and Flexible Replication. Temporal queries can be written in either JavaScript or XQuery.

                Collections

                How are collections used to implement Temporal documents?

                Temporality is defined on a protected collection, known as a temporal collection. When a document is inserted into a temporal collection, a URI collection is created for that document. Additionally, the latest version of each document will reside in a latest collection.
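As an illustration, assuming a temporal collection named "kool" whose valid axis has already been configured over the valid-start and valid-end elements used below (all names here are placeholders):

xquery version "1.0-ml";
import module namespace temporal = "http://marklogic.com/xdmp/temporal"
  at "/MarkLogic/temporal.xqy";

temporal:document-insert("kool", "/orders/koolorder.xml",
  <order>
    <valid-start>2015-01-01T00:00:00</valid-start>
    <valid-end>9999-12-31T23:59:59</valid-end>
    <item>widget</item>
  </order>)
;
(: the document should now be in the temporal collection, a collection
   named after its URI, and the "latest" collection :)
xdmp:document-get-collections("/orders/koolorder.xml")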

                Why are collections used to group all revisions of a particular document vs storing it in the properties?

                This was done to avoid unnecessary fragmentation, enhance performance, and make best use of existing infrastructure.

                Does the Temporal implementation use the collection lexicon or just collections?

                It uses only collections. The collection lexicon can be turned on and utilized for applications.

                Won’t Temporal collections also be in the collection lexicon if the lexicon is enabled?

                Yes.

See also Temporal, URI, and Latest Collections.

                Timezones

                The Temporal axes are based on standard MarkLogic dateTime range indexes.

                All timezone information is handled in the standard way, as for any other dateTime range index in MarkLogic.

                DLS (Library Services API)

                Temporal and DLS are aimed at solving different sorts of problems, so do not replace each other. They will coexist.

                Tiered Storage

                Temporal documents can be leveraged with our Tiered Storage capabilities.

The typical use case is where companies need to store years of historical information for various regulatory and analytical purposes:

                Compliance. Either internal or external auditing can occur (up to seven years based on Dodd-Frank Legislation). This data can be deployed on commodity hardware at lower cost, and can be remounted when needed.

                Analytics. Many years of historical information can be cheaply stored on commodity hardware to allow data scientists to perform analysis for future patterns and backtesting against previous assumptions.

                JSON/JavaScript

                Temporal documents work with XML/XQuery as well as JSON/JavaScript.

                Java/search/REST/Node API

                Temporal is supported by all of our existing server-side APIs.

                MLCP

You can specify a Temporal collection with the -temporal_collection option in MLCP.

                Normal document management APIs (xdmp:*)

                By default this is not allowed and an error will be returned.  Normally the temporal:* API should be used.  However, for more information, see also Managing and Updating Temporal Documents.

                Triples

                MarkLogic supports non-managed triples in a Temporal document.

                Introduction

                How do you find all versions of a temporal document?

                Details

                In MarkLogic Server, a temporal document is managed as a series of versioned documents in a protected temporal collection. In addition, each temporal document added creates another collection based on its URI, and all versions of the document will be in that collection.

                For example, if you have stored a temporal document at URI /orders/koolorder.xml then you can find all the versions of that document by using a collection query as

                    cts:search (/, cts:collection-query ('/orders/koolorder.xml'))

                and the uris of all the versions of the document as

                    cts:uris ((), (), cts:collection-query ('/orders/koolorder.xml'))

                Introduction

                Allen and ISO operators are comparison operators that can be used in temporal queries.

                Details

                Both operator sets are used to represent relations between two intervals.  ISO operators are more general and usually can be represented by a combination of Allen operators.  For example: iso_succeeds = aln_met_by || aln_after.

                Period Comparison Operators are discussed in more detail in Searching Temporal Documents.

                 

                Introduction

For terms stored in the index, the positions list tracks where they appear within the document. Positions are used to resolve queries where the distance between terms matters (for example, near queries where a term can appear n words away from another term or phrase within a given element or set of search criteria). There are a number of index options involving positions of document terms. When these indexes are enabled, MarkLogic records positions in a positions list for each term in the universal index. When positions lists get large, MarkLogic may take a long time to load them from disk. To minimize the impact of large positions lists, MarkLogic imposes a maximum size for these lists per term.

                MarkLogic 7 and above

Each stand in a forest maintains its own index and its own positions lists. The smaller the stands, the less likely you are to encounter the maximum positions-list size for a term, as smaller stands generally result in smaller term lists. A maximum stand size was introduced in MarkLogic 7; by default, it restricts the size of individual stands to 32768 MB (32 GB).

If you are running into the warning message "Termlist will discard positions at 256MB", you may need to manage your data and forests to ensure the index sizes remain manageable.

There is a positions-list-max-size configuration parameter (default 256MB, maximum 512MB) beyond which a term's positions list is considered too large and unwieldy. For example, a 512MB term list would take over 25 seconds to load from disk at 20MB/sec, so increasing this value from the default may be a quick fix to a potential performance problem, but it is probably not the optimal change to make.

In MarkLogic 7 and MarkLogic 8, the default maximum stand size restricts individual stands to 32768 MB (32 GB). With this setting, we expect it to be less likely that new customers will run into the large positions list problem; for a 32GB stand, a single 256MB term list would take almost 1% of all the disk space used by that stand, which is unlikely.

                Scenario: Understanding what messages to look for

                In the ErrorLog file, you may notice messages appearing at the "Info" level which look like:

                2016-04-13 03:02:17.951 Info: Termlist for X in Y is 151 MB; will discard positions at 256 MB

This message is just letting you know that the term list is getting large and the limit is getting near for a particular stand ("Y"). Positions lists are managed per stand in each of your forests, so for each stand a maximum size is allowed (the default is 256MB). If a positions list starts to exceed this maximum size, MarkLogic will discard positions for that term in that stand.

                Settings

                The value is set at the database level: 

                Configure -> Databases -> [Database] -> positions list max size

This value can be increased by changing this database-level setting, although we do not recommend exceeding 512MB. The main reason is performance: larger positions lists take longer to load, so there is a performance implication to this setting.

                Newer releases of MarkLogic Server set the maximum size of a given stand for new databases to a default size of 32GB (32768). This setting is governed by the Merge Policy for the database:

                Configure -> Databases ->[Database] -> Merge Policy -> merge max size

As each stand maintains its own positions lists, one way to ensure that you don't hit the maximum size is to keep your on-disk stands smaller; however, having more stands has a performance implication, as any query needs to traverse all stands in order to compute the result set of fragments that answer the query.

                In our example message above, the positions list is 151MB which is still a decent way off from the upper default per-stand limit of 256MB - so no immediate concern. 

                Steps to resolving the issue

                If it becomes necessary, in order to keep yourself on the best side of the positions-list-max-size limit, you have two choices:

1. You can increase the positions list max size to make these lists larger (remember that the recommended upper limit is 512MB); this is a single configuration change made at the database level.

2. You can decrease the merge max size to ensure each of your on-disk stands is smaller.

Either approach will have a performance impact, so our suggestion is that neither setting be changed unless you really need to.

                Given the above example, if the largest value you are seeing is 151MB, there is still a decent amount of overhead for all the stands in this database.

If you start to see the value getting closer to 256MB, the fastest resolution is to increase the positions list max size to ensure that positions are not discarded, and then to think about managing the maximum size of on-disk stands for your database.
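Both settings can also be changed programmatically. The sketch below assumes the Admin API setters admin:database-set-positions-list-max-size and admin:database-set-merge-max-size; the database name and values are illustrative only:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db     := xdmp:database("Documents")
(: positions list max size, in MB :)
let $config := admin:database-set-positions-list-max-size($config, $db, 512)
(: merge max size, in MB; smaller stands mean smaller per-stand positions lists :)
let $config := admin:database-set-merge-max-size($config, $db, 16384)
return admin:save-configuration($config)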

                ErrorLog message escalation: understanding the risks

These messages can be safely ignored as long as they only ever appear at Info level. However, they are designed to escalate in severity so that you can watch for warning signs; the severity level will escalate twice:

                1. When you reach 2/3 of the discard threshold, these messages will appear at Notice level in the ErrorLogs.
                2. When you reach 3/4 of the discard threshold, these messages will appear at Warning level in the ErrorLogs.

                At the very least, keeping tabs on when these messages start to appear at Notice level should give you plenty of advance warning.

Monitoring for Warning level messages should also catch this issue before it becomes critical and starts to impact search results.

                Further reading

                Introduction

Binary documents often have various associated metadata. For example, an image may have metadata like a timestamp of when and where it was taken, and so on. MarkLogic Server offers the ability to extract this metadata from binary documents (e.g. images, MS Office and Adobe PDF files) using XQuery built-in functions and conversion pipelines that use third-party software.

This article gives details about the security vulnerabilities reported for text extraction and the MarkLogic releases containing the resolution.



                Details

MarkLogic Server's built-in function xdmp:document-filter allows you to extract metadata and text from binary documents as XHTML. Additionally, the server's xdmp:pdf-convert() function and the Content Processing Framework (CPF) help convert HTML, Adobe PDF and Microsoft Office documents to XML.
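For example, running xdmp:document-filter against a binary document already in the database (the URI below is a placeholder) returns an XHTML document whose meta elements carry the extracted metadata:

xquery version "1.0-ml";
xdmp:document-filter(fn:doc("/docs/sample.pdf"))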

However, these mechanisms rely on third-party software, namely Iceni's "Argus PDF converter" and Lexmark's "Perceptive Document Filters", to extract text and metadata from a wide variety of document formats.

Recently, both Iceni and Lexmark have issued security alerts for vulnerabilities in these products and have incorporated fixes into their most recent releases. They have published the following CVEs:

                For Iceni:

                • CVE-2016-8333 and CVE-2016-8335
                  • An exploitable stack-based buffer overflow vulnerability

                The latest version of Iceni (v6.6.5) patches the security issues listed above.

                For Lexmark:

                • CVE-2016-5646
                  • An exploitable heap overflow vulnerability exists in the Compound Binary Format (CBFF) parser functionality of the Lexmark Perceptive Document Filters Library.
                • CVE-2016-4336
                  • An exploitable out of bounds write vulnerability exists in the Bzip2 parsing of the Perceptive Document Filters
                • CVE-2016-4335
                  • An exploitable buffer overflow vulnerability exists in the XLS parsing of the Perceptive Document Filters conversion functionality

These are considered vulnerabilities of "High" severity, based on CVSS base scores in excess of 7.0. A carefully crafted PDF, CBFF, Bzip2, or XLS file could be used to cause a buffer overflow which can result in arbitrary code execution.

                The latest version of Lexmark Isys (v11.3) patches the security issues listed above.

                 

                Resolution

                MarkLogic has issued an update which includes these fixes.

The latest releases of MarkLogic Server version 7 (7.0-6.8) and version 8 (8.0-6), which incorporate the latest fixes for Iceni and Lexmark Isys, are available for download from our Community website.



                References

                • For more information on the Lexmark security issues, see

                http://support.lexmark.com/index?page=content&id=TE811&modifiedDate=08/26/16&userlocale=EN_US&locale=en

                • Further details on Iceni issues can be found at:

                https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2016-8333

                https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2016-8335

                 

                 

                Summary

A query that deletes a large collection using xdmp:collection-delete may time out. Although xdmp:collection-delete is efficient by itself, if individual fragments in the collection are locked by another user query, xdmp:collection-delete will wait for those locks to be released and may time out.

                Example:

Generally xdmp:collection-delete is fast, but its performance can be affected by other processes taking and holding locks.

                To give a really simple (and contrived) illustration of this:

1. Create 250,000 sample documents in the collection used below:

                for $i at $pos in 1 to 250000 return xdmp:document-insert("/"||$pos||".xml", element test {xdmp:random()}, 
                <options xmlns="xdmp:document-insert"> 
                 <collections>
                  <collection>/my/collection</collection>
                 </collections>
                </options>)
                

2. In one Query Console tab, run the query below to delete a document and hold its write lock:

                xquery version "1.0-ml";
                xdmp:document-delete("/1.xml"),
                xdmp:sleep(500000)
                

                3. Run the xdmp:collection-delete in a separate Query Console tab.

The delete in step 2 holds a write lock and then calls xdmp:sleep, which has to run to completion before that transaction releases the lock, so xdmp:collection-delete cannot proceed and will time out.

This is a simple example that creates a lock by sleeping, simulating another user query that could be holding a lock.

                You can also enable the Lock Trace diagnostic trace event to identify write locks.

                Solution

1. Instead of trying to delete an entire collection in a single transaction, you can delete documents from the collection in batches, spawning the individual deletes with a limit (e.g. 5000 URIs at a time):

                xquery version "1.0-ml";
                
                for $i in cts:uris((), ("limit=5000"), cts:collection-query("/my/collection"))
                return xdmp:spawn-function(function() { xdmp:document-delete($i) })
                

In the query above, the URI lexicon is used to return a subset of the collection's URIs (5000 URIs in this example). A call to xdmp:spawn-function for each of those URIs spawns a separate "task" to delete that document (by URI).

Depending on the total number of documents in the collection and the batch of 5000 URIs, you may end up creating too many tasks in the Task Server queue. The Task Server has a set queue size, which can be increased if and when necessary, but you should be able to create tasks to delete 100,000 documents in one go.

It is also important to note that a document can be part of a large number of collections, so you can use collections (and cts:collection-query) to scope searches.

2. This solution deletes a batch of documents in each task, as opposed to the solution above which deletes an individual document per task. The code below achieves a batch delete of documents in a collection instead of spawning an individual delete per document.

xquery version "1.0-ml";

let $batch_size := 10
let $coll_name := "/my/collection"
let $uris := cts:uris((), (), cts:collection-query($coll_name))
let $total_uris := fn:count($uris)
let $total_batches := xs:unsignedLong(math:ceil($total_uris div $batch_size)) - 1
for $batch_index in (0 to $total_batches)
return
  xdmp:spawn-function(function() {
    fn:subsequence($uris, ($batch_index * $batch_size) + 1, $batch_size)
      ! xdmp:document-delete(.)
  })
                

                Timezone information and MarkLogic

                Summary

                This article discusses the effect of the implicit timezone on date/time values as indexed and retrieved.

                Discussion

                Timezone information and indexes

Values are stored in the index effectively in UTC, without any timezone information. When indexed, the value is adjusted to UTC, either from the explicit timezone in the data or from the host's implicit timezone, and then the timezone is forgotten. The index data does not retain information about the source timezone.

                When queried, values from the index are adjusted to the timezone specified in the query, or to the host's implicit timezone if none is specified.

                Therefore, dates and times in the implicit timezone do what would be expected in calculations, unless you have a particular reason for actually knowing the offset from UTC.


                Implicit timezone

                The definition of an implicit timezone is given at https://www.w3.org/TR/xpath20/#dt-timezone.

                The MarkLogic host implicit timezone comes into play when the document is indexed and when values are returned from the indexes.

                fn:implicit-timezone() can be used to show the implicit timezone for a host.


                Changing implicit timezone

If you change the implicit timezone without reindexing, then the implicit timezone at indexing time differs from the implicit timezone at query time, so values indexed without an explicit timezone are effectively "wrong": they were adjusted to UTC using a different implicit timezone than the one now in effect at query time.

                If you specify a timezone for the data when it is indexed and when it is queried, the implicit timezone will not be a factor.


                Examples

First we create a dateTime element range index on the element <dt>, then insert a document without timezone information:

                xdmp:document-insert ('/test.xml', <doc><dt>2018-01-01T12:00:00</dt></doc>)

                Using a server located in New York (timezone now -05:00), retrieving the value from the index via

                cts:element-values (xs:QName ('dt'), ())

                gives

                2018-01-01T12:00:00

                showing that the implicit timezone works as described above. To see the value stored in the index (as adjusted to UTC) you can specify the timezone on the value retrieved:

                cts:element-values (xs:QName ('dt'), (), 'timezone=+00:00')

                returns

                2018-01-01T17:00:00Z

                so 2018-01-01T17:00:00 is the value coming from the index.

When the implicit timezone is -5 hours, the call without a timezone returns 12:00. However, if the implicit timezone changes, then the value returned for the query without a timezone also changes, even though the value stored in the index has not changed.

                Introduction

XQuery modules can be imported from other XQuery modules in MarkLogic Server. This article describes how modules are resolved in MarkLogic when they are imported in XQuery.

                Details

                How modules are imported in code

Modules can be imported using two approaches:

-- by providing a relative path

                import module namespace m = "http://example.edu/example" at "example.xqy";

-- or by providing an absolute path

                import module namespace m = "http://example.edu/example" at "/example.xqy";

                 

                How MarkLogic resolves the path and loads the module

If a path starts with a slash, it is a non-relative path and MarkLogic takes it as is. If it doesn't, it is a relative path, and it is first resolved relative to the URI of the current module to obtain a non-relative path.
                 
With that path in hand, MarkLogic always starts by looking in the Modules directory. This is a security measure: we want to make sure that the modules MarkLogic ships with are the ones chosen. In general, users should NOT be putting their own modules there. Doing so creates issues on upgrade, and if they open up permissions on the directory to ease deployment, it creates a security hole.
                 
Then, depending on whether the app server is configured to use a modules database or the filesystem, the non-relative path is interpreted relative to the app server root, either on the filesystem or in the modules database.

                 

                Debugging module path issue

To debug module resolution, you can enable the module caching trace events, which show how MarkLogic resolves the paths. Enter "module" as the name of the event in Diagnostics > Events and you should see a list of module caching events to add. These will give you the working details of how module resolution is happening, and should provide enough information to resolve the issue.

Be aware that diagnostic traces can fill up your ErrorLog.txt file very quickly, so be sure to turn them off as soon as you no longer need them.

                 

                Performance Hints

1. Be sure that your code does not rely on dynamically-created modules. Although these may be convenient at times, they will hurt overall performance: every time a module changes, the internal modules cache is invalidated and must be re-loaded from scratch.

2. If you are noticing a lot of XDMP-DEADLOCK messages in your log, be sure your modules are not mixing update statements into what should be a read-only query. The XQuery parser looks for updates anywhere in the module stack -- including imports -- and if it finds one, it assumes that any URI gathered by the query might potentially be updated. Thus, if the query matches 10 URIs, it will put a write lock on all of them, and if it matches 100,000 URIs, it will lock all of those as well, and performance will suffer. To prevent this, isolate updates in their own transactions via xdmp:eval() or xdmp:spawn(), as sketched below.
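A sketch of this pattern is shown below; /audit-insert.xqy is a hypothetical module in the modules database that performs the update, and it runs in its own transaction:

xquery version "1.0-ml";
(: the main module stays read-only; the update runs as a separate task :)
let $hits := cts:search(fn:collection(), cts:word-query("example"))[1 to 10]
return (
  $hits,
  xdmp:spawn("/audit-insert.xqy", (xs:QName("uri-count"), fn:count($hits)))
)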

                 

                 

                Summary

AWS and Azure have announced the imminent disabling of TLS 1.0 and 1.1 for access to their respective key management systems (KMS). Action must be taken now to ensure your MarkLogic clusters can continue operating as normal beyond the end of June 2023.

                Microsoft Azure:

Microsoft Azure KMS stopped supporting TLS versions lower than TLS v1.2 as of 06/12/23. Similarly, Azure Active Directory (Azure AD) will soon stop supporting the following Transport Layer Security (TLS) protocols and ciphers:

                  • TLS 1.1
                  • TLS 1.0
                  • 3DES cipher suite (TLS_RSA_WITH_3DES_EDE_CBC_SHA)

                Amazon AWS:

Amazon AWS plans to stop all software end clients using TLS 1.0/1.1 by 06/28/23, requiring all client connections to use TLS v1.2 at a minimum.

                Am I affected? 

                This technical advisory is only relevant to customers using MarkLogic 9, 10 or 11 in conjunction with the AWS KMS or the Azure Key Vault. 

To establish whether this is the case, navigate to the Keystore tab of your cluster configuration in the MarkLogic Admin UI and check whether encryption is turned on, the KMS type is set to external, and the "host name" is either an Azure Key Vault host or an AWS KMS host, as illustrated below.

Admin UI configuration for an external KMS: navigate to Admin UI -> Configure -> Clusters -> 'Keystore' tab.

                • Azure Key Vault

                • AWS KMS

                  • Query Console query to determine KMS config
                xquery version "1.0-ml"; 
                import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy"; 
                
                let $config := admin:get-configuration() 
                return admin:cluster-get-keystore-kms-type($config)
                
                ==>internal
                

                Impact on MarkLogic Server

A MarkLogic Server configured with an external KMS (AWS KMS or Azure Key Vault) that cannot connect using TLS v1.2 will fail to connect to the KMS server, and as a result encrypted databases will go offline. If the Security database is encrypted and the KMS server cannot be reached, the Admin UI will not be accessible to perform any corrective admin action, resulting in a lockout situation.

Administrators will see error messages like the one below in the ErrorLog.

                XDMP-FORESTERR: Error in mount of forest AW-modules-1: XDMP-AZUREKEYVAULTERR 400 Bad Request code=NotSupported message="The caller is using an older TLS version for authentication to Key Vault.TLS 1.0 is no longer accepted by KeyVault Service.

MarkLogic Server connections to AWS S3 buckets will be impacted by connection failures as well, and the resulting backup and restore operations will fail.

                Remediation and course of action

                Once you establish that you are affected by verifying your configuration, please log a ticket with our technical support by visiting https://help.marklogic.com/Tickets/Submit/ and specify the exact version of MarkLogic Server you are running so the team can direct you to the appropriate remediation procedure. 

All customers using an external KMS for encryption at rest in AWS or Azure environments must upgrade to a patched release.

                Note: Mitigation is greatly simplified if the cluster is operational during the remediation process. DO NOT WAIT FOR THE ISSUE TO MANIFEST ITSELF BEFORE TAKING ACTION.

                Impact and Remediation for DHS customers

                AWS Data Hub Service (DHS) instances with service versions >= 3.0 are impacted due to their default usage of external AWS KMS endpoints for encryption at rest and S3 encryption.

The MarkLogic CloudOps support team will open a ticket for affected customers, arranging for any planned downtime and the steps to remedy the issue.

DHS customers have no action to take for the services hosted by MarkLogic CloudOps, unless they also run their own on-prem clusters, in which case they should follow the process outlined above for non-DHS environments ("Impact on MarkLogic Server").

                Upgrade Resources

                Other References

                Summary

                There are a number of options for transferring data between MarkLogic Server clusters. The best option for your particular circumstances will depend on your use case.

                Details

                Database Backup and Restore

                To transfer the data between two independent clusters, you may use a database backup and restore procedure, taking advantage of MarkLogic Server's facility to make a consistent backup of a database.
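For example, a full database backup can be initiated programmatically (the path below is a placeholder and must exist on every host that serves a forest of the database); the same operations are also available through the Admin UI and the REST Management API:

xquery version "1.0-ml";
xdmp:database-backup(
  xdmp:database-forests(xdmp:database("Documents")),
  "/backups/Documents")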

                Note: the backup directory path that you use must exist on all hosts that serve any forests in the database. The directory you specify can be an operating system mounted directory path, it can be an HDFS path, or it can be an S3 path. Further information on using HDFS and S3 storage with MarkLogic is available in our documentation:

                Further information regarding backup and restore may be found in our documentation and Knowledgebase:

                Database Replication

                Database Replication is another method you might choose to use to transfer content between environments. Database Replication will allow you to maintain copies of forests on databases in multiple MarkLogic Server clusters. Once the replica database in the replica cluster is fully synchronized with its master, you may break replication between the two and then go on to use the replica cluster/database as the master.

                Note: to enable Database Replication, a license key that includes Database Replication is required. You would also need to ensure that all hosts are: running the same maintenance release of MarkLogic Server; using the same type of Operating System; and Database Replication is correctly configured.

Also note that before MarkLogic Server version 9.0-7, indexing information was not replicated over the network between the Master and Replica databases; it was instead regenerated by the Replica database.

Starting with MarkLogic Server version 9.0-7, index data is also replicated from the Master to the Replica, but the server does not automatically check whether both sides have the same index settings. The following Knowledgebase article contains further information on this:

                Further details on Database Replication and how it can be configured, may be found in our documentation:

                MarkLogic Content Pump (mlcp)

                Depending on your specific requirements, you may also like to make use of the MarkLogic Content Pump (mlcp), which is a command line tool for getting data out of and into a MarkLogic Server database. Using mlcp, you can export documents and metadata from a database, import documents and metadata to a database, or copy documents and metadata from one database to another.

If required, you may use mlcp to extract a consistent database snapshot, forcing all documents to be read from the database at a consistent point in time (using the -snapshot option with mlcp export).

Note: the version of mlcp you use should be the same as the most recent version of MarkLogic Server involved in the transfer.

                Also note that mlcp should not be run on a host that is currently running MarkLogic Server, as the Server assumes it has the entire machine available to it, including the CPU and disk I/O capacity.

                Further information regarding mlcp is available in our documentation:

                Further Information

                Related Knowledgebase articles that you may also find useful:

                Problem Statement

You have an application running on a particular cluster (the source cluster, devcluster) and you wish to port that application to a new cluster (the target cluster, testcluster). Porting the application can be divided into two tasks: configuring the target cluster and copying the code and data. This article is only about porting the configuration.

                In an ideal world, the application is managed in an "infrastructure as code" manner: all of the configuration information about that cluster is codified in scripts and payloads stored in version control and able to be "replayed" at will. (One way to assure that this is the case is to configure testing for the application in a CI environment that begins by using the deployment scripts to configure the cluster.)

                But in the real world, it's all too common for some amount of "tinkering" to have been performed in the Admin UI or via ad hoc calls to the Rest Management API (RMA). And even if that hasn't happened, it's not generally possible to be certain that's the case, so you still have to worry that it might have happened.

                Migrating the application

                The central theme in doing this "by hand" is that RMA payloads are re-playable. That is, the payload you GET for the properties of a resource is the same as the payload that you PUT to update the properties of that resource.

                If you were going to migrate an application by hand, you'd proceed along these lines.

                Determine what needs to be migrated

                An application consists (more or less by definition) of one or more application servers. Application servers have databases associated with them (those databases may have additional database associations). Databases have forests.

                A sufficiently complex application might have application servers divided into different groups of hosts.

Applications may also have users (for example, each application server has a default user; often, but not always, "nobody").

                Users, in turn, have roles, and roles may have roles and privileges. Code may have amps that use privileges.

                That covers most of the bases, but beware that apps can have additional configuration that should be reviewed: security artifacts (certificates, external securities, protected paths or collections, etc.), mime types, etc.

                Get Source Configuration

                Using RMA, you can get the properties of all of these resources:

                • Application servers

                  Hypothetically, the App-Services application server.

                curl --anyauth -u admin:admin \
                   http://localhost:8002/manage/v2/servers/App-Services/properties?group-id=Default
                
                • Groups

                  Hypothetically, the Default group.

                curl --anyauth -u admin:admin \
                   http://localhost:8002/manage/v2/groups/Default/properties
                
                • Databases

                  Hypothetically, the Documents database.

                curl --anyauth -u admin:admin \
                   http://localhost:8002/manage/v2/databases/Documents/properties
                
                • Users

                  Hypothetically, the ndw user.

                curl --anyauth -u admin:admin \
                   http://localhost:8002/manage/v2/users/ndw/properties
                
                • Roles

                  Hypothetically, the app-admin role.

                curl --anyauth -u admin:admin \
                   http://localhost:8002/manage/v2/roles/app-admin/properties
                
                • Privileges

                  Hypothetically, the app-writer execute privilege.

                curl --anyauth -u admin:admin \
                   "http://localhost:8002/manage/v2/privileges/app-writer/properties?kind=execute"
                

                And the create-document URI privilege.

                curl --anyauth -u admin:admin \
                   "http://localhost:8002/manage/v2/privileges/create-document/properties?kind=uri"
                
                • Amps

                  Hypothetically, my-amped-function in /foo.xqy in the Modules
                  database using the namespace http://example.com/.

                curl --anyauth -u admin:admin \
                   "http://localhost:8002/manage/v2/amps/my-amped-function/properties?modules-database=Modules&document-uri=/foo.xqy&namespace=http://example.com"
                

                Create Target Configuration

                Some of the properties of a MarkLogic resource may be references to other resources. For example, an application server refers to databases and a role can refer to a privilege. Consequently, if you just attempt to POST all of the property payloads, you may not succeed. The references can, in fact, be circular so that no sequence will succeed.

                The easiest way to get around this problem is to simply create all of the resources using minimal configurations: Create the forests (make sure you put them on the right hosts and configure them appropriately). Create the databases, application servers, roles, and privileges. Create the amps. If you need to create other resources (security artifacts, mime types, etc.) create those.

                Finally, PUT the property payloads you collected from the source cluster onto the target cluster. This will update the properties of each application server, database, etc. to be the same as the source cluster.
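
                For example, here is a minimal XQuery sketch of such a PUT (the target host name, the admin:admin credentials, and the saved payload file are illustrative assumptions; the same request can equally be issued with curl):

                xquery version "1.0-ml";
                (: PUT a previously saved properties payload for the Documents database onto the target cluster :)
                let $payload := xdmp:document-get("/tmp/Documents-properties.xml")  (: hypothetical saved payload :)
                return
                  xdmp:http-put(
                    "http://target-host:8002/manage/v2/databases/Documents/properties",
                    <options xmlns="xdmp:http">
                      <authentication method="digest">
                        <username>admin</username>
                        <password>admin</password>
                      </authentication>
                      <headers>
                        <content-type>application/xml</content-type>
                      </headers>
                    </options>,
                    $payload)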

                Related Reading

                MarkLogic Documentation - Scripting Cluster Management

                MarkLogic Knowledgebase - Transferring data between MarkLogic Server clusters

                MarkLogic Knowledgebase - Best Practices for exporting and importing data in bulk

                MarkLogic Knowledgebase - Deployment and Continuous Integration Tools

                Summary:

                MarkLogic allows the use of SSL certificates in PEM Format when securing application servers. A certificate in PEM format is the Base64-encoding of the DER-encoding of the certificate structure values.

                This article explains some common issues seen when importing certificates, as well as methods to troubleshoot problems.

                Importing a certificate into MarkLogic:

                The general procedure for creating and importing a certificate into MarkLogic can be found in the docs here:  http://docs.marklogic.com/guide/security/SSL#id_42684

                For a certificate to be successfully imported, the public key of the signed certificate must match a public key contained in the Certificate Template.  MarkLogic will create a new public/private key pair for each Certificate Request that is generated within a Certificate Template.

                Troubleshooting:

                Verify Certificate in PEM format

                If you are having an issue where MarkLogic is not accepting the signed certificate, you should first verify that your certificate is in PEM format.  If this is not the case, you can use openssl to convert your format to PEM.  Below are examples of how to convert between various formats using openssl.

                Convert a DER file to PEM: $ openssl x509 -inform der -in certificate.cer -out certificate.pem
                Convert a P7B file to PEM: $ openssl pkcs7 -print_certs -in certificate.p7b -out certificate.cer
                Convert a PKCS#12 file to PEM: $ openssl pkcs12 -in keyStore.pfx -out keyStore.pem -nodes

                PKI-NOREQ Error

                After downloading the certificate request and sending the CSR file to IT/CA to be signed, an administrator could accidentally click on Certificate Request again, generating a new certificate request (and overwriting the previous certificate request file). This will result in a certificate import error for any certificate that matches the first/initial certificate request.

                If you are still experiencing issues when attempting to import a signed certificate and receive PKI-NOREQ, you should ensure that the public keys for the certificate request and signed certificate match.  This public key should also match with the key contained in the certificate template.

                Use the following commands to extract the public key from the certificate request and signed certificate.

                Certificate Request: $ openssl req -in request.csr -pubkey
                Signed Certificate: $ openssl x509 -in certificate.crt -pubkey

                Alternatively, you can compare the modulus hash (a compact string) to confirm whether the certificate you are trying to import matches the private key stored in the template.

                Certificate Request: $ openssl req -noout -modulus -in request.csr | openssl md5
                Signed Certificate: $ openssl x509 -noout -modulus -in certificate.crt | openssl md5

                Extracting Keys for the Certificate Request

                To obtain the public key from the certificate request, you can use the following XQuery script.  Note that this script needs to be run against the Security database by a user with admin rights.  The output of this script will also display private key information; if you need to provide the output to Support, please remove all data in the <pki:private-key> elements.

                xquery version "1.0-ml";
                import module namespace pki = "http://marklogic.com/xdmp/pki"
                  at "/MarkLogic/pki.xqy";

                (: look up the certificate template by name :)
                let $template-id := pki:template-get-id(pki:get-template-by-name("INSERT-TEMPLATE-NAME"))

                (: return every Security database document that references this template,
                   including the certificate requests and their public keys :)
                return
                  cts:search(fn:doc(),
                    cts:element-value-query(xs:QName("pki:template-id"), fn:string($template-id), "exact"))
                

                The output of this script will contain various <pki:public-key> elements.  One of these public keys needs to match with the public key contained in your signed certificate.

                Further Reading

                Summary 

                This article is intended to help investigate certain Kerberos external authentication issues. Because Kerberos authentication typically requires significant IT involvement, below are a few areas we recommend investigating before involving IT in Kerberos troubleshooting.

                Keytab file location and permission

                MarkLogic Server requires a keytab file with the specific name "services.keytab" at the specified location within the MarkLogic Data directory.

                Note: The Permissions on the keytab must not be World or Group readable.

                [Location] $ pwd
                /var/opt/MarkLogic
                [Permission & Owner] $ ls -alt services.keytab
                -rw------- 1 daemon daemon 86 May  4 09:51 services.keytab

                Sample krb5.conf Configuration file 

                The Kerberos configuration file is essential to the Kerberos handshake; below is a sample configuration file for reference.

                $ cat /etc/krb5.conf
                [logging]
                default = FILE:/var/log/krb5libs.log
                kdc = FILE:/var/log/krb5kdc.log
                admin_server = FILE:/var/log/kadmind.log
                 
                [libdefaults]
                default_realm = MLTEST1.LOCAL
                dns_lookup_realm = true
                dns_lookup_kdc = false
                ticket_lifetime = 24h
                renew_lifetime = 7d
                forwardable = true
                 
                [realms]
                MLTEST1.LOCAL = {
                   kdc = srv-202-1-vm1.colo.marklogic.com
                   admin_server = srv-202-1-vm1.colo.marklogic.com
                }
                [domain_realm]
                .marklogic.com = MLTEST1.LOCAL
                marklogic.com = MLTEST1.LOCAL

                 

                Configuring Client Browser to utilize Kerberos authentication 

                Most web browsers are not enabled by default to use Kerberos authentication with a web server. Making sure the browser is properly configured to perform the Kerberos handshake eliminates one more suspect during troubleshooting. The following Microsoft article details browser configuration with respect to Kerberos:

                https://docs.microsoft.com/en-us/troubleshoot/iis/troubleshoot-kerberos-failures-ie

                Browser Login Dialog Username

                When a web browser attempts to connect to a Kerberos-enabled web server, the browser will prompt the user with a login dialog box. The Kerberos handshake expects the user to provide the complete domain/realm along with the username during login.

                Example - UserName : "test1@MLTEST1.LOCAL"

                Case Sensitivity of Kerberos

                The Kerberos username as well as the domain/realm are case sensitive and should match the domain/realm configured in the krb5.conf file. Not having the correct case on the complete username (including the realm) can lead to errors with limited debugging information.

                MarkLogic Trace Events

                You can enable the Kerberos trace event as shown below and then run a Kerberos login test again so that the ErrorLog captures the trace events, which can provide more information on the Kerberos handshake between MarkLogic and the Kerberos server.


                Add the "Kerberos GSS Negotiate" trace event in the Admin UI by navigating to -> Configure -> Groups -> {group-name} -> Diagnostics -> trace events activated = true; then Add "Kerberos GSS Negotiate"; press the “ok” button.  

                A list of other potential issues and troubleshooting techniques (a well-compiled third-party source):

                https://technet.microsoft.com/en-us/library/bb463167.aspx 

                 

                 

                Introduction

                With the introduction of Certificate Based Authentication in MarkLogic 9, users can now log into MarkLogic Server without entering username/password credentials.

                Configuring a MarkLogic AppServer to support TLS Client Certificate Authentication is a little more complex than simple SSL Server based authentication and it may not always be apparent why connections are not working once configuration is completed.

                This Knowledgebase article demonstrates some simple debugging techniques that should help to track down and identify issues encountered with certificate based authentication where things are not working as expected.

                What is the difference between Client and Server based authentication?

                Before starting down the path of troubleshooting it's worth ensuring that we understand what the differences are between TLS Server based authentication and TLS Client Authentication:

                With a standard HTTPS connection to a TLS-enabled Application Server, MarkLogic server will send a copy of its X509 Certificate to the client who will then verify the certificate against a list of known Trusted Root certificates installed within the browser or a Java KeyStore for a Java based application (such as MLCP).

                When TLS Client Authentication is enabled in MarkLogic for Certificate Based authentication, as well as sending a certificate to the client, MarkLogic Server will request that the client sends a certificate back to the server.

                The certificate returned by the client is then used to determine which Internal or External user is used within MarkLogic.

                How does the client know which certificate to send?

                A web browser can often have multiple client certificates installed, so how does it know which certificate to present to MarkLogic Server?

                The certificate(s) that the Client can use are controlled by MarkLogic Server's application server settings.  Using the Admin GUI on port 8001, during configuration for Certificate Based authentication, you can specify that a client certificate is required (Configure > Groups > [Your Group Name] > App Servers > [Your App Server Name] > ssl require client certificate : true) and you can also select one or more Certificate Authorities under the ssl client certificate authorities section.

                Only Client certificates issued by one of these authorities will be permitted.

                If a browser has multiple Client certificates issued by one of the selected Certificate Authorities, the user will be prompted to select the appropriate client certificate to use.

                Note: In this case it is important to select the certificate that has been issued to you for use with MarkLogic.

                To verify that you have a valid certificate, you can use a local system tool such as Keychain Access (on Mac OS X) to check that the Issuer Name details of your client certificate match those of the Certificate Authorities configured in the MarkLogic Application Server settings, as per the example above.

                Alternatively, if you have a PEM representation of your user certificate you can use the OpenSSL utility to display the Issuer information, e.g.

                $ openssl x509 -in user1.pem -issuer -noout
                issuer= /O=MarkLogic/OU=Support/CN=RootCA

                Verifying the TLS Handshake

                The first stage of MarkLogic Certificate Based authentication requires a successful TLS Handshake to take place between the Client and MarkLogic Server.

                If the TLS Handshake fails at any stage, the session will be rejected.

                Recommendation: While it is not required, it is highly recommended that you have a working HTTPS Application Server configuration (using basic authentication) before enabling certificate based authentication.

                This will ensure you have a valid TLS Server configuration before you enable TLS Client Authentication and should reduce the amount of troubleshooting required.

                The easiest way to view what is taking place during the TLS Handshake is at the TCP packet level using a tool such as Wireshark which has built-in support for decoding the TLS protocol.

                If Wireshark can easily be installed on a client machine, it can be configured to capture TCP traffic to/from the MarkLogic App Server port; for example, a capture can be set up with a filter for all traffic on port 8010.

                If it is not possible to install Wireshark on the Client machine, the same information can be captured on the MarkLogic Server using the tcpdump utility.  From there you can create a pcap file on a given port (in this example: 8010), by running the following command:

                tcpdump -i any -s 0 -w certauth.pcap port 8010

                Once you have run tcpdump long enough to have captured the failing transaction, you can attach the resulting pcap file to a MarkLogic support ticket for further analysis.

                If you are able to view the packet trace in Wireshark, you will first need to locate and select the packet where MarkLogic Server sends the Certificate Request.

                In the Frame details panel, you can drill down to the list of Distinguished Names and check that there is an entry for the Certificate Authority configured in MarkLogic.

                Having ensured that MarkLogic Server is sending a request for the correct client certificate, you should locate the subsequent Certificate response being sent by the client.

                In the Frame details panel, the first thing to check is whether a certificate was actually returned by the client. If no certificate was found by the client that satisfied the MarkLogic Server request, then a zero-length certificate (Certificates Length: 0) is returned.

                If a Client sends a Zero Certificate the session will be terminated immediately with a TLS Handshake Failure.

                In this case you should check that you have correctly installed a client certificate that was issued by the Certificate Authorities configured in MarkLogic for this Application Server.

                If a valid certificate was found by the client, you will see the necessary information within the Frame details panel in Wireshark.

                If the session is terminated at this point with a TLS Handshake Failure the most likely cause is that MarkLogic Server was unable to verify the client certificate to a valid chain of root certificates.

                This will typically occur if the Issuing Certificate Authority configured in MarkLogic is part of a chain of Root certificates, often referred to as an Intermediate CA certificate. In this case you should check that all CA certificates in the chain have been installed to the Trusted Store in the MarkLogic Security database.

                If no TLS Handshake Failure occurs, you should see a pair of Encrypted Handshake Message packets, which indicate that secure encryption has been enabled by both the client and the server and the TLS Handshake has successfully completed.

                I still get a 401 Unauthorized error

                Having first established that a successful TLS Client Authentication has taken place, if you still get a 401 Unauthorized error then the likely cause is that MarkLogic has not been able to successfully map the supplied client certificate to either an internal or external user.

                The first check that MarkLogic Server will make is to look for an Internal User that matches the Common Name (CN) in the supplied Client certificate.

                Having checked the userid specified in the certificate Common Name (for example, CN=user1), check that a corresponding internal MarkLogic userid exists in your Security database; in the Admin GUI in MarkLogic check for a matching user name (Configure > Security > Users > [your user name]).

                Alternatively, if no internal user matches, MarkLogic will attempt to use the full Subject Distinguished Name in the Client certificate to map to an external security name within a previously defined MarkLogic user.

                In this scenario, first check that you have a valid External Security definition configured to perform certificate based authentication (Configure > Security > External Security).

                Assign the External Security definition to the Application Server to map the external security name (Configure > Groups > [Your Group Name] > App Servers > [Your App Server Name] > external securities)

                Finally check that Internal MarkLogic User has an External Name that matches the Client Certificate Subject Distinguished Name (DN) (Configure > Security > Users > [your user name])

                Note: The ordering of the Subject DN in the External name is critical and should follow the highest to lowest level precedence, e.g.

                O=MarkLogic,OU=App Users, CN=user1

                And not

                CN=user1,OU=App Users,O=MarkLogic

                If you are unsure you can use the OpenSSL command below to list the Subject DN in the expected order:

                $ openssl x509 -in user1.pem -subject -noout
                subject= /O=MarkLogic/OU=App Users/CN=user1
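
                As a sketch, the external name can also be set programmatically with the Security API, run against the Security database (the user name "user1" and the DN below are taken from the examples above):

                xquery version "1.0-ml";
                import module namespace sec = "http://marklogic.com/xdmp/security" at "/MarkLogic/security.xqy";

                (: set the external name of user1 to the client certificate Subject DN, highest to lowest precedence :)
                sec:user-set-external-names("user1", "O=MarkLogic,OU=App Users,CN=user1")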

                Further reading

                Summary

                File and semaphore errors (such as SVC-FILREM & SVC-SEMPOST) seen on MarkLogic Servers running on the Microsoft Windows platform can sometimes be attributed to Windows file system handling of MarkLogic Data files.

                This article covers a number of possible sources and how to troubleshoot.

                Windows File System and background services

                Systems running Microsoft Windows Servers are often running Virus Scanners, Corporate Security tools and non-MarkLogic backup programs (for example: Windows Shadow copy). To avoid file access conflicts, MarkLogic recommends that all MarkLogic data files be excluded from access by any background services. As a general rule, *only* MarkLogic Server should be maintaining MarkLogic Server data files.

                Troubleshooting Suggestions

                If MarkLogic Server is reporting file system or semaphore errors, here are some troubleshooting suggestions: 

                1. Make sure that MarkLogic Server is running under an account with adequate file-system permissions.  MarkLogic's recommendation is that MarkLogic Server run under the SYSTEM account (the default).

                2. If Anti-virus software is installed, configure it so that it excludes all MarkLogic Server Data files from being scanned;

                3. If a shadow backup client is installed (for example: Volume Shadow Copy in Windows XP), configure it so that it excludes all MarkLogic Server data files from being backed up. The recommendation is that backups of MarkLogic data files be managed by the MarkLogic Server backup features.  Also note that a backup of MarkLogic data files cannot be restored unless the MarkLogic Server database was quiesced for the duration of the backup.

                4. Use Process Explorer to check for other processes touching MarkLogic file handles.

                http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

                You can open procexp.exe, select View -> Show Lower Pane, and then select View -> Lower Pane View -> Handles.

                You can then watch for other suspect processes touching MarkLogic-related file handles in the lower pane.

                5. Run chkdsk on Windows to check for bad sectors or other disk issues.

                6. Monitor Disk I/O usage to check for spikes in I/O usage/performance during the time that file errors are being reported by MarkLogic Server.

                7. Check the Windows System Log for fsync or file system (remote file-system) related error log entries.

                • Windows -> Control Panel -> Administrative Tools
                • Launch Event Viewer
                • Expand "Windows Logs", select "Application" -> Save All Events As a text file
                • Expand "Windows Logs", select "System" -> Save All Events As a text file



                Summary

                There are situations where the XDMP-BACKDIRSPACE error occurs while backing up a database. This article explains how this condition can occur and describes a number of strategies to troubleshoot and to determine root cause.

                Under normal operating conditions, when there is enough disk space to complete a backup, MarkLogic Server is not expected to report the XDMP-BACKDIRSPACE error.  Most likely, this error is a result of a bad disk configuration, of the disk unmounting, or simply of insufficient disk space.

                We will begin by exploring methods to narrow down which server has the disk issue and then list some things to look into in order to identify the cause.

                How administrators can narrow down the particular node in the cluster

                The XDMP-BACKDIRSPACE error indicates that a host or hosts in a cluster does not have sufficient space to complete a backup operation. Because MarkLogic Server implements a shared-nothing architecture, a database backup operation results in each host that hosts forests for the database attempting to back up its forests at the specified path as seen by that host. If any of those forests fails to be backed up because of insufficient disk space, the entire database backup operation will fail with an XDMP-BACKDIRSPACE error.

                If the backup was executed from the Admin UI, the error will be reported in the Admin UI. However, the host where the error actually occurred is not reported.

                To identify the node that is reporting insufficient disk space, you need to look at the MarkLogic Server ErrorLogs of all hosts that mount forests for that database.  The XDMP-BACKDIRSPACE error will be logged in the ErrorLog of the host where it occurred. Once the problem node is identified, the checks below can help confirm whether there is sufficient disk space.
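
                As a starting point, a query along the following lines can report the free device space seen by each forest of the affected database (a sketch; the "Documents" database name is an assumption, and device-space is reported in MB by xdmp:forest-status):

                xquery version "1.0-ml";
                (: report host, forest and free device space for every forest of the database :)
                for $forest in xdmp:database-forests(xdmp:database("Documents"))
                let $status := xdmp:forest-status($forest)
                return fn:concat(
                  xdmp:host-name($status/*:host-id), " : ",
                  $status/*:forest-name, " : ",
                  $status/*:device-space, " MB free")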

                Things to look at on the troubled node

                1)  Free i-nodes and free disk space

                The server could have enough free disk space, but if Linux has reached its configured i-node limit, the server will appear to be out of disk space.  The "df -hi" command can tell you whether you have free i-nodes. If you are i-node constrained, configure more i-nodes.

                2)  Disk mount Errors

                There may be network problems resulting in the remote disk unmounting frequently. Look for disk-mounting-related errors in /var/log/messages (Linux) or the System Log (Windows).

                It is also possible that you are using non-standard or unreliable mount options. Different remote file systems have different mount option recommendations. Verify that your mount options are sufficient for the workload.

                3) Is your host running on a VM?

                Many virtual machine environments provide memory and disk to the guest OS as needed.  This type of configuration is a source of problems for many resource-intensive applications. In general, you need to configure your VM host to have fixed/pre-assigned memory, disk, CPU and network resources.

                4) Configuration comparison among nodes

                If there is no apparent free disk space issue, you should compare disk configurations between the problem node and the other nodes in the cluster using the "fdisk -l", "cat /etc/fstab" and "mount" commands.

                5) Corrupt sector/block?

                Check disk health.  chkdsk (Windows) or fsck (Linux) can be used to check the disk for bad sectors and blocks.

                6) Disk I/O hardware?

                Check disk I/O. You can use the Windows system monitoring tools, or on Linux you can use "iostat" or "dstat", or inspect the platform's /proc disk stat files and sar files, to check disk I/O health.

                7) Privilege issue?

                If MarkLogic Server is running as a non-root user, check the file-system privileges of all mounted drives accessed by the MarkLogic Server process.

                Summary

                Sometimes, following a manual merge, a number of deleted fragments -- usually a small number -- are left behind after the merge completes. In a system that is undergoing steady updates, one will observe that the number of deleted fragments goes up and down, but never goes down to zero.

                 

                Options

                There are a couple of approaches to resolve this issue:

                  1.  If you have access to the Query Console, you should run xdmp:merge() with an explicit timestamp (e.g. the return value of xdmp:request-timestamp()); see the sketch after this list. This will cause the server to discard all deleted fragments.

                  2.  If you do not have access to the Query Console, just wait an hour and do the merge again from the Admin GUI.
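
                A minimal sketch of option 1, assuming it is run as a query (not update) statement so that xdmp:request-timestamp() returns a value:

                xquery version "1.0-ml";
                (: merge, discarding all fragments deleted up to the current request timestamp :)
                xdmp:merge(
                  <options xmlns="xdmp:merge">
                    <merge-timestamp>{xdmp:request-timestamp()}</merge-timestamp>
                  </options>
                )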

                 

                Explanation

                The hour window was added to avoid XDMP-OLDSTAMP errors that had cropped up in some of our internal stress testing, most commonly for replica databases, but also causing transaction retries for non-replica databases.

                We've done some tuning of the change since then (e.g. not holding on to the last hour of deleted fragments after a reindex), and we may do some further tuning so this is less surprising to people.

                 

                Note

                The explanation above is for new MarkLogic 7 installations. In the case of an upgrade from a release prior to MarkLogic 7, this solution might not work, as it requires a different approach to split single big stands into 32GB stands. Please read more in the following knowledgebase article: Migrating to MarkLogic 7 and understanding the 1.5x disk rule (rather than 3x).

                Introduction

                In more recent versions of MarkLogic Server, "slow background" error log messages were added to note and help diagnose slowness.

                Details

                For "Slow background" messages, the system is timing how long it took to do some named background activity. These activities should not take long and the "slow background" message is an indicator of starvation. The activity can be slow because:

                • it is waiting on a mutex or semaphore held by some other slow thread;
                • the operating system is stalling it, possibly because it is thrashing because of low memory.

                Looking at the "slow background" messages in isolation is not sufficient to understand the reason - we just know a lot of time passed since the last time we read the time of day clock. To understand the actual cause, additional evidence will need to be gathered from the time of the incident. 

                Notes: 

                • In general, we do not time how long it takes to acquire a mutex or semaphore as reading the clock is usually more expensive than getting a mutex or semaphore.
                • We do not time things that usually take about a microsecond.
                • We do time things that usually take about a millisecond.

                Related Articles

                Knowledgebase: Understanding Slow Infrastructure Notifications

                Knowledgebase: Understanding slow 'journal frame' entries in the ErrorLog

                Knowledgebase: Hung Messages in the ErrorLog

                Summary

                The MarkLogic Server monitoring dashboard provides a way to monitor disk usage, which is a key monitoring metric. Comparing the disk usage shown on the monitoring dashboard with the disk space on the filesystem (for example, using df -h) reveals a difference between the two. This article discusses these differences and the reasons behind them.

                 

                Details

                To understand how to use Monitoring dashboard Disk Usage, see our documentation at https://docs.marklogic.com/guide/monitoring/dashboard#id_60621

                If you add all the disk usage metrics (Fast Data, Large Data, Forest Data, Forest Reserve, Free) and compare the total with the space on your disk (using df -h or other commands), you will see a difference between the two values.

                This difference exists mainly for two reasons:
                1. The monitoring history dashboard displays disk space usage (in MB and GB) excluding forest journal sizes.
                2. On Linux, by default, around 5% of the filesystem is reserved to prevent serious problems when the filesystem fills up, and for the filesystem's own purposes, for example keeping backups of its internal data structures.

                 

                An example

                Consider the example below for a host running RHEL 7 with 100 GB of disk space on the filesystem, for one database and one forest.

                Disk usage as shown by Monitoring dashboard:
                Free                   92.46 GB      98.17%
                Forest Reserve      1.14 GB       1.21%
                Forest Data          0.57 GB        0.60%
                Large Data           0.02 GB        0.02%

                The total from the Monitoring dashboard is around 94.19 GB. When we add the size of the Journals (around 1 GB in this case) and the OS reserve space (5%), the total comes out to be 100 GB, which is the total capacity of the disk in this example.

                 

                On the other hand, consider disk usage as shown by df -h command for filesystem:

                Filesystem                    Size Used Avail Use% Mounted on
                /dev/mapper/Data1-Vol1 99G 2.1G 92G    3%   /myspace

                Adding the 5% default OS reserve for Linux gives us the total size for this filesystem, which is more than 99 GB, i.e., approximately 100 GB.

                Items of Note

                • The Dashboard:Disk Space uses KB/MB/GB, which means 1 KB = 1000 B, not KiB/MiB/GiB where 1 KiB = 1024 B.
                • The actual disk usage for forests (including Journal sizes) can be confirmed by checking the output of below command from the file system:
                  • du --si -h /MarkLogic_Data/Forests/*
                    • -h flag is for human readable format
                    • --si flag is for using KB/MB/GB instead of the default KiB/MiB/GiB

                Conclusion

                The difference between the metrics on the Monitoring dashboard and the disk usage reported for the filesystem exists because monitoring history does not include journal size and OS reserve space in the report.

                 

                Useful Links:

                https://docs.marklogic.com/guide/monitoring/dashboard#id_60621

                http://serverfault.com/questions/315181/df-says-disk-is-full-but-it-is-not

                http://www.walkernews.net/2011/01/22/why-the-linux-df-command-shows-lesser-free-disk-space/

                Introduction

                On a typical online transactional project it’s not uncommon to discover at the end of the project, when running at scale, that simple tasks unexpectedly take much longer than expected. You’re surprised because your team knows how to avoid writing ‘bad’ queries that retrieve lots of data from disk, and when you run the relevant requests through a profiler they seem to run efficiently. What’s going on?

                It’s at this point that you may well start having conversations about locking. Although you may have been told, or read, about how locking works in MarkLogic at the start of the project, you pushed it to the back of your mind as there were lots of other things to think about. Now it has your attention.

                Suddenly you can see that locking is something you need to know about and given it seems to be causing problems it becomes something to be avoided. You may well start going through invoke or eval related contortions to avoid it, which in turn may create a fresh set of problems, which in turn give rise to workarounds and so on, leading to a crisis of confidence and paranoia concerning the platform itself.

                The extremes of not really understanding and ignoring locking at the outset, and later overcompensating can be overcome by having a sound understanding at the start.

                First of all, it’s worth taking the time to read the relevant section in the documentation.

                From a performance point of view the following key points are worth emphasizing.

                • Queries run lock free
                • Update statements require exclusive write locks on documents they insert or update
                • Update statements only require non-exclusive read locks on documents they read but do not update

                You will only get contention between requests if they are both update requests and at least one of those requests updates a document that is either read or updated by another running request.

                The converse of that statement is that you will not get contention if the documents you are updating are not being read or updated by other concurrently running requests. If you bear this principle in mind you should be able to build an application that runs just as well at scale as it does on a laptop.

                Example: Locking without contention

                Examples are instructive. We base ours in the ‘Documents’ database, although any database will do.

                First clear your database. Then add a single document:

                    xdmp:document-insert("/for-read-lock.xml",element root{});

                We’ll use this to show that locking is fine so long as there’s no lock contention.

                In a Query Console window add this code:
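
                (A minimal sketch; any update statement that writes /thread-1-output.xml, reads /for-read-lock.xml and then sleeps for 20 seconds will do.)

                xquery version "1.0-ml";
                (: takes an exclusive write lock on /thread-1-output.xml and a read lock on
                   /for-read-lock.xml, then holds the transaction open for 20 seconds :)
                xdmp:document-insert("/thread-1-output.xml", element root { "thread 1" }),
                fn:doc("/for-read-lock.xml"),
                xdmp:sleep(20000),
                element root { "Thread 1 finished", element elapsed-time { xdmp:elapsed-time() } }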

                This will update /thread-1-output.xml (requiring an exclusive write lock), and read /for-read-lock.xml, requiring a non-exclusive read lock. We deliberately hold the transaction open with a sleep statement for 20 seconds so we can see the effects of locking if they occur.

                In a second Query Console window add:
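
                (Again a minimal sketch, mirroring the description below.)

                xquery version "1.0-ml";
                (: writes /thread-2-output.xml and reads /for-read-lock.xml; no document it
                   touches is being updated by the first block, so there is no contention :)
                xdmp:document-insert("/thread-2-output.xml", element root { "thread 2" }),
                fn:doc("/for-read-lock.xml"),
                element root { "Thread 2 finished", element elapsed-time { xdmp:elapsed-time() } }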

                This will update /thread-2-output.xml (requiring an exclusive write lock) and read /for-read-lock.xml, again requiring a non-exclusive read lock.

                Now run the first block in the first window, and as quickly as you can, run the second block in the second window at the same time. You will see the second block returns almost immediately with something like:

                <?xml version="1.0" encoding="UTF-8"?>
                <root>Thread 2 finished<elapsed-time>PT0.0001122S</elapsed-time></root>

                The elapsed time shows the update returned almost instantly. However, the first update will not return for around 20 seconds. The point of this is that although they’re both updates, and they are both reading the same document, /for-read-lock.xml, there is no contention. If there were, the second update would have to wait until the first update completed, and would therefore also take 20 seconds to complete.

                Example : Locking with contention

                Now we do the same thing, but using a different second thread.
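
                (A sketch of the contending second block; this time it reads the document that the first block is updating.)

                xquery version "1.0-ml";
                (: reads /thread-1-output.xml, which the first block holds an exclusive write
                   lock on, and writes /thread-2-output.xml :)
                fn:doc("/thread-1-output.xml"),
                xdmp:document-insert("/thread-2-output.xml", element root { "thread 2" }),
                element root { "Thread 2 finished", element elapsed-time { xdmp:elapsed-time() } }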

                Here the server will take a read lock on /thread-1-output.xml and will require an exclusive write lock on /thread-2-output.xml. However here we will have contention – thread 2 is trying to read something that’s being updated elsewhere.

                If we again run the first block in the first window, and quickly run the second block in the second window, the second block will this time take around 20 seconds to complete:

                <?xml version="1.0" encoding="UTF-8"?>
                <root>Thread 2 finished<elapsed-time>PT20.3110441S</elapsed-time></root>

                The elapsed time shows it took 20 seconds to complete. This is because thread two blocks, waiting for read access on the exclusively locked /thread-1-output.xml.

                Using xdmp:transaction-locks to identify blocking locks

                Now, by inspection, we can see in the code above that there is contention on /thread-1-output.xml. Sometimes the contention can be less clear. In version 9, MarkLogic introduced xdmp:transaction-locks, which can help in troubleshooting these problems. It requires a host ID and a transaction ID as arguments. Add to this a small amount of XQuery and we can quickly use it to get more insight into locking problems.

                As before, we run thread 1 and the ‘bad’ thread 2, followed by (in another window):
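
                (A minimal sketch; the element names are taken from the host status structure in the http://marklogic.com/xdmp/status/host namespace.)

                xquery version "1.0-ml";
                declare namespace hs = "http://marklogic.com/xdmp/status/host";

                (: list the locks held by every running transaction on this host, longest running first :)
                let $host := xdmp:host()
                for $txn in xdmp:host-status($host)//hs:transaction
                order by xs:dateTime($txn/hs:start-time) ascending
                return (
                  fn:concat("Server : ", xdmp:server-name($txn/hs:server-id)),
                  fn:concat("Transaction : ", $txn/hs:transaction-id),
                  fn:concat("Started at : ", $txn/hs:start-time),
                  xdmp:transaction-locks($host, $txn/hs:transaction-id)
                )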

                This iterates over all running transactions to show us our locks, sorting the longest running to the top.  My output is:

                Server : App-Services
                Transaction : 6499076740337829206
                Started at : 2018-02-02T10:39:59.1019897Z
                <transaction-locks xmlns="http://marklogic.com/xdmp/status/host">
                  <read>/for-read-lock.xml</read>
                  <write>/thread-1-output.xml</write>
                </transaction-locks>

                Server : App-Services
                Transaction : 10283815115034583414
                Started at : 2018-02-02T10:40:02.4272576Z
                <transaction-locks xmlns="http://marklogic.com/xdmp/status/host">
                  <read>/thread-1-output.xml</read>
                  <waiting>/thread-1-output.xml</waiting>
                </transaction-locks>

                The item <waiting>/thread-1-output.xml</waiting> in the second section shows I have a thread blocking on /thread-1-output.xml. Knowing this will aid me in diagnosing the source of my locking problem. Note that the problem could have been more subtle – perhaps I was reading all documents in a collection with thread one updating one document, and thread two another.

                Conclusion

                Whatever your requirements, with a little planning, it should be possible to avoid locking problems creating unexpected performance issues. Should you run into problems however, diagnostic tools should help you identify where the difficulties are. Finally, locks are ultimately a good thing, as without them we would not be able to write consistent and predictable applications. Understanding them allows you to benefit from their use, while avoiding unnecessary side effects.

                Introduction

                The disk space requirements section of the MarkLogic Installation Guide states that, to simplify the calculation, starting from MarkLogic 8 (and continuing in MarkLogic 9) the minimum disk space requirement is 1.5 times the total forest size for sufficiently large forests. Previously, the requirement was to maintain disk space that is 3x the forest data size.

                In MarkLogic 8, we introduced (and continued in MarkLogic 9):

                1. The merge max size configuration parameter.
                  • With merge-max-size set to 32 GB (32 GB was the default for MarkLogic 7 and 8; the recommended and default value for MarkLogic 9 is 48 GB), a "sufficiently large forest" is defined as a forest of 128 GB or larger.  That is, a fully merged forest with no deleted fragments results in a forest that is at least 128 GB.  For a forest of this size, the disk space required is 192 GB (1.5 x 128 GB);
                  • For smaller forests (or forests that do not set the merge-max-size), roughly 3x disk space requirement still applies, due to the merge size requirements.
                2. Searches across stands now use multiple threads to improve speed.

                Stated most simply, the minimum disk space requirement for a forest is the greater of 192 GB or 1.5x the forest data size.

                This article explains the calculation of the minimum disk space requirement, but please keep in mind that sufficient disk space beyond the bare minimum requirement should be available in order to handle influx of data into your system for at least the amount of time it takes to provision more capacity.  

                Assumptions

                Before we dive into how the minimum disk space requirement is calculated, let's briefly discuss some of the conditions that need to be met to make this calculation achievable:

                1. Assumption: "deleted" documents can be removed from a forest during a forest merge.
                  • There are database configuration settings that prevent deleted documents from being removed.  For example
                    • if you set a value for the database merge timestamp configuration, the forests will keep deleted document fragments for that period of time;
                    • If you set the database retain until backup setting, deleted fragments will not be removed until a full backup or an incremental backup is completed. 
                  • There are circumstances where MarkLogic keeps a merge window (typically one hour) for deleted fragments which may result in larger forests during that time.
                2. Assumption: Merges are always allowed to occur
                  • The database configuration allows for merge blackout periods.  During times of a merge blackout, deleted fragments are not removed.
                3. Assumption: Long running operations do not occur during times of heavy document inserts or updates.
                  • Long running operations may require obsolete stands within a forest to hang around until the operation is complete.
                  • A database backup can be a long running operation.  It is recommended to schedule backups during times of low document update activity.
                4. Assumption: If HA (High Availability) or DR (Disaster Recovery) solutions configured,  then there is sufficient network bandwidth and sufficient system stability for HA (forest replication) and DR (Database Replication) to stay in sync with minimum lag. 
                  • Storage requirements can increase significantly if HA and DR are configured; for example, if replication is paused, all of the un-shipped changes need to be retained on the master, so this can mean 2x to 4x the indexed data size.
                5. Assumption: MarkLogic Database restores to an Active database are not required.
                  • For database restores where data needs to be restored to an active database, you will need at least 2x indexed data size + 64 GB per forest. You can avoid the 2x requirement if you can clear the forest/database before restoring. 

                Details

                Once all of the assumptions are met, let's look at how we can calculate the minimum disk space required, taking into account:

                • deleted fragments - i.e. forest size with maximum number of deleted fragments before a merge is kicked off
                • in-flight merging
                • concurrent document update (or reindexing) during a merge

                This can be expressed as: 

                    minimum-disk-space-required = forest-max-size + merge-space + concurrent-update

                Forest Size

                The actual forest size varies when considering deleted fragments. The amount of variance depends on the merge-min-ratio setting (for example, a merge ratio of 2 can result in a forest with 1/3 of its fragments deleted).

                    forest-max-size = (1 + 1/merge-min-ratio) * minimum-forest-size

                Where minimum-forest-size is calculated by a fully merged forest with no deleted fragments. But even the minimum-forest-size can vary based on the raw document content size and index setting. 

                    minimum-forest-size = document-size * expected-index-expansion

                Index expansion varies by index settings and document content. We have seen index expansion from 0.75X to 5X.  The only way to estimate this value is by experimentation – with a sufficiently large representative sample data set so that stand overhead is insignificant. 

                When calculating disk space required, we need to account for the maximum possible size (i.e. forest size includes index expansion of documents and maximum deleted fragments.).

                    forest-max-size = (1 + 1/merge-min-ratio) * (document-size * expected-index-expansion)

                (Note: although stated as a maximum forest size, this value may still be lower than the actual size if the merge assumptions are not met.)

                Merge Space

                During merging, there is a point in time where the old stands and the new stand coexist on disk: the write must succeed before the old stands can be removed. There may be multiple merges occurring, but with the merge-max-size configuration, MarkLogic Server ensures that merges never require more than 1.33x the merge-max-size (the old stand is already taken into account in the forest size calculations):

                    1.33 * merge-max-size

                Concurrent Update

                There is a time lag between when merging begins and when it ends, and you need space for the documents that can be updated or reindexed during that time (a reindex is equivalent to a delete plus an insert):

                     (merge-max-size / merge-rate) * update-rate

                If you make the simplifying assumption that concurrent updates (or reindexing) happen 50% more slowly than merging (that is, the update rate is two-thirds of the merge rate), then this just becomes

                    0.66 * merge-max-size

                Putting it Together

                So putting it all together in terms of forest-max-size

                    minimum-disk-space-required = forest-max-size + 2 * merge-max-size

                Remember that forest-max-size is a function of document content size, index expansion and retained deleted fragments (merge-min-ratio) per the equation presented earlier.
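
                For example (an illustrative calculation only, using assumed values): with 100 GB of source documents, an index expansion factor of 2, a merge-min-ratio of 2 and a merge-max-size of 32 GB:

                    forest-max-size = (1 + 1/2) * (100 GB * 2) = 300 GB
                    minimum-disk-space-required = 300 GB + 2 * 32 GB = 364 GB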

                Caveats

                Again, per our assumptions, there are conditions where the calculated minimum disk space requirement may not be sufficient.

                • If deleted documents are configured to be retained across merge
                • If merge blackout periods configured
                • If long running operations occur during times of document updates / inserts.
                • If HA or DR configured and replication lag occurs.
                • If database restores are required.
                • If additional new content is loaded into the system, then the size of those additional documents needs to be included in the calculations.

                Out of Space

                What happens if MarkLogic Server does not have enough disk space?

                The most likely outcome is that merges will begin to fail and you will see an XDMP-MERGESPACE error in the error log.  It is also possible that forests will go offline.  If a forest goes offline, the database will also be offline, halting all access to the database.  When this happens, you will need to take manual corrective action to either free up some disk space or add more.

                Summary

                The minimum disk space requirement is forest-max-size + 2 * merge-max-size.   But there are many conditions, including Merge policy configuration, the one hour merge window, and long running operations that can cause deleted fragments and obsolete stands to be retained, resulting in larger than expected forest sizes and greater than expected disk space utilization. High Availability, Disaster Recovery, and Database backup / restore solutions will also require additional disk space to be available.

                It is always a good idea to give your system enough head room to avoid application or database outages and monitor your disk usage continuously to understand your trends in order to predict when your disk space allocation will be insufficient.

                Related articles:

                Recovering from low disk space

                Migrating to MarkLogic and understanding the 1.5x requirement

                 

                Value queries

                Summary

                Here we summarize some characteristics of value queries and compare them to other approaches.

                Discussion

                Characteristics

                Punctuation and space tokens are not indexed as words in the universal index. Therefore, word-queries involving whitespace or punctuation will not make use of whitespace or punctuation in index resolution, regardless of space or punctuation sensitivity.

                Punctuation and space tokens are also not generally indexed as words in the universal index in value queries either. However, as a special exception there are terms in the universal index for "exact" value queries ("exact" is shorthand for "case-sensitive", "diacritic-sensitive", "punctuation-sensitive", "whitespace-sensitive", "unstemmed", and "unwildcarded"). "exact" value queries should be resolvable properly from the index, but only if you have fast-case-sensitive-searches and fast-diacritic-sensitive-searches enabled in the database.

                For field-word or field-value queries you can modify what counts as punctuation or whitespace via tokenizer overrides. This can turn what would have been a phrase into a single word.

                Outside of the special case given for exact value queries, all queries involving space or punctuation are phrase queries. Word and value search is not string matching.

                Space insensitive and punctuation insensitive do not mean tokenization insensitive. "foo-bar" will not match "foobar" as a value query or a word query, regardless of your punctuation sensitivity. Word and value search is not string matching.

                Stemming is handled differently between a word-query and value-query; a value-query only indexes using basic stemming.

                String range queries are about string matching. Whether there is a match depends on the collation, but there is no tokenization and no stemming happening.

                Exact matches

                If you want to do exact queries you can

                • Enable fast-case-sensitive-searches and fast-diacritic-sensitive-searches on your database and run them as value queries.

                or

                • Create a field with custom overrides for the significant punctuation or whitespace and run them as field word or field value queries.

                or

                • Create a string range index with the appropriate collation (codepoint, most likely) and run them as string range equality queries.

                Looking deeper

                As with all queries, xdmp:plan can be helpful: it will show you the questions asked of the indexes. If there is information from a query that is not reflected in the plan, that is a case where there might be false positives from index resolution (i.e., in an unfiltered search).

                For example, the plan for cts:search(/, cts:element-value-query(xs:QName("x"), "value-1", "exact")) should include the hyphen if you do have fast-case-sensitive-searches and fast-diacritic-sensitive-searches enabled in the database.
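
                For instance, a minimal sketch of inspecting that plan in Query Console (the element name x and the value are taken from the example above):

                xquery version "1.0-ml";
                (: show the index terms the query will ask for :)
                xdmp:plan(
                  cts:search(/,
                    cts:element-value-query(xs:QName("x"), "value-1", "exact"))
                )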

                JSON

                For purposes of indexing, a JSON property (name-value pair) is roughly equivalent to an XML element.  See the following for more details:

                    Creating Indexes and Lexicons Over JSON Documents

                    How Field Queries Differ Between JSON and XML

                References

                Introduction

                Slow journal frame log entries will be logged at Warning level in your ErrorLog file and will mention something like this:

                .....journal frame took 28158 ms to journal...

                Examples

                2016-11-17 18:38:28.476 Warning: forest Documents journal frame took 28152 ms to journal (sem=0 disk=28152 ja=0 dbrep=0 ld=0): {{fsn=121519836, chksum=0xd79a4bd0, words=33}, op=commit, time=1479425880, mfor=18383617934651757356, mtim=14445621353792290, mfsn=121519625, fmcl=16964678471847070106, fmf=18383617934651757356, fmt=14445621353792290, fmfsn=121519625, sk=10604213488372914348, pfo=116961308}
                
                2016-11-17 18:38:28.482 Warning: forest Documents journal frame took 26308 ms to journal (sem=0 disk=26308 ja=0 dbrep=0 ld=0): {{fsn=113883463, chksum=0x10b1bd40, words=23}, op=fastQueryTimestamp, time=1479425882, mfor=959797732298875593, mtim=14701896887337160, mfsn=113882912, fmcl=16964678471847070106, fmf=959797732298875593, fmt=14701896887337160, fmfsn=113882912, sk=4596785426549375418, pfo=54687472}
                
                2016-11-17 18:38:28.482 Warning: forest Documents journal frame took 28155 ms to journal (sem=0 disk=28155 ja=0 dbrep=0 ld=0):{{fsn=121740077, chksum=0xfd950360, words=31}, op=prepare, time=1479425880, mfor=10258363344370988969, mtim=14784083780681960, mfsn=121740077,fmcl=16964678471847070106, fmf=10258363344370988969, fmt=14784083780681960, fmfsn=121740077, sk=12062047643091825183, pfo=14672600}

                Understanding the messages in further detail

                These messages give you further hints on what is causing the delay. In most cases, you would probably want to involve the MarkLogic Support team in diagnosing the root cause of the problem, although the table below should help with further interpretation of these messages:

Item - Description
sem - time waiting on a semaphore
disk - time waiting on disk
ja - time waiting if the journal archive is lagged
dbrep - time waiting if DR replication is lagged
ld - time waiting to replicate the journal frame to an HA replica
fsn - frame sequence number
chksum - frame checksum
words - length of the frame, in words
op - the type of frame
time - UNIX time
mfor - ID of the master forest (if a replica)
mtim - when the master became master
mfsn - master forest fsn
fmcl - foreign master cluster id
fmf - foreign master forest id
fmt - when the foreign master became HA master
fmfsn - foreign master fsn
sk - sequence key (frame unique id)
pfo - previous frame offset

                Further reading / related articles

                MarkLogic Server Logging: Slow Infrastructure Notifications

                Introduction

                As of MarkLogic Server versions 8.0-6.8 and 9.0-3, MarkLogic Server has added logging to note and help diagnose slowness.

                Details

MarkLogic Server makes extensive use of the file system and network resources of each host within the cluster. Because the performance of MarkLogic Server is dependent on the performance of the file systems and network, information is written to the server logs when interacting with these devices becomes slow. Effective with the release of MarkLogic Server versions 8.0-6.8/9.0-3 and beyond, the server will log an error message when an XDQP network protocol send packet, a file system journal write, a file system label write, or a mapped file sync takes longer than one second to complete. For operations where the size of the data involved is known, write or send performance slower than 1 MB per second will also be logged; size is taken into consideration so that a large write or send does not generate a log entry merely as a result of its size. Additionally, logging of these messages is contingent on the slow operation being less than half the average rate for the particular operation: the slowness has to be somewhat sudden for the slowdown to be of note. The average rate is derived from figures that can be found for forests in xdmp:forest-status(), and from connection-based figures found in xdmp:host-status().

                Log messages will be written to the server log file ErrorLog.txt. In order to avoid the log being unnecessarily noisy, the following criteria are considered:

                • No message for any individual file or network connection will be reported more than once a minute

                • The message is logged at Notice level if the duration of the operation is less than two seconds

                • The message is logged at Warning level if the duration of the operation is greater than two seconds

• If a message has been logged at the Notice level, the one minute period during which no message will be logged is honored even if the same target device crosses a Warning level threshold during that period

                Some control of the frequency of the logging messages is available through trace events (see How to use diagnostic trace events):

                • The trace event 'No Slow Warnings' completely turns off this logging

                • The trace event 'Fewer Slow Warnings' doubles all the time limits listed above to decrease the number of log entries

                • The trace event 'More Slow Warnings' halves all the time limits listed above to increase the number of log entries

                • The trace event 'No Slow Warning Rate Threshold' removes the 1 MB/sec threshold above which a delay must take place before logging a message

                • The trace event 'No Slow Warning Interval' removes the one minute interval restriction between log messages for any individual file or network connection

                Examples of some of the log messages:

                2017-07-06 12:10:33.553 Warning: Slow fsync /var/opt/MarkLogic/Forests/doc-stress-F4/Label, 2.044 sec
                2017-07-06 12:11:50.868 Notice: Slow open /var/opt/MarkLogic/Forests/doc-stress-F10/00000183/Label, 1.659 sec

                2017-07-06 12:11:59.734 Warning: Slow utime /var/opt/MarkLogic/Forests/doc-stress-F10/Label, 2.006 sec

                2017-07-06 12:11:59.735 Warning: Slow fsync /var/opt/MarkLogic/Forests/doc-stress- F10/00000182/Obsolete, 2.278 sec
                2017-07-06 12:12:00.924 Notice: Slow utime /var/opt/MarkLogic/Forests/doc-stress-F19/Label, 1.061 sec

                2017-07-06 11:07:49.261 Warning: Slow sync_file_range /var/opt/MarkLogic/Forests/doc-stress-R12/Journals/Journal-20170706-175428-494356-14993636680966550-16521532899203342486-9000200, 512 KB in 35.05 sec
                2017-07-06 11:07:49.262 Warning: Slow fsync /var/opt/MarkLogic/Forests/doc-stress-R6/Label, 35.7 sec

                2017-07-06 00:34:32.645 Notice: Slow send 172.18.19.140:17800-172.18.19.207:7999, 282.8 KB in 1.353 sec; check host rh7-intel64-80-7.example.com

                While every system is different, some of the example messages above are for operations that should complete in milliseconds, so a warning of taking two seconds will often be slow by several orders of magnitude. For example, in replication configurations, fsync is performed against the Label file of each forest on each node once per second, so a slow fsync timing of two seconds is significantly out of line with expectations.

                Monitoring the Logs

                Slow infrastructure messages can and should be monitored as part of normal cluster management and maintenance. There is no exhaustive list that will sum up every possible resource or connection that can receive these messages, and additional entries may be added over time. Monitoring scripts or rules should take the following conditions into consideration:

                • Most of the messages will appear in the ErrorLog.txt file

• An occasional message can appear in a specific port number error log (e.g., 8000_ErrorLog.txt) when direct connect operations such as mlcp are used

                • The fourth token in the line will be Slow

                • The message level can be any of Info, Notice, or Warning

                • The name of the resource will follow in the fifth token

• There will almost always be a duration, listed in seconds

                • There will usually be a rate listed in the scale most appropriate to the particular instance (B, K, MB, etc.)

                • There will always be a diagnostic pointing to the device/operation that is being deemed slow

                • There will sometimes be a diagnostic pointing to another host where the source of the problem may lie

                These guidelines should aid in producing monitoring scripts or rules for rule-based log monitoring applications.
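As an illustration of those guidelines, the following minimal XQuery sketch applies the token-based checks to a single sample line (taken from the examples above, not from a live system); a real monitoring rule would apply the same idea to each line of the log:

xquery version "1.0-ml";
(: tokenize a sample ErrorLog line and report it only if the fourth token is "Slow" :)
let $line := "2017-07-06 12:10:33.553 Warning: Slow fsync /var/opt/MarkLogic/Forests/doc-stress-F4/Label, 2.044 sec"
let $tokens := fn:tokenize($line, "\s+")
where $tokens[4] eq "Slow"
return fn:concat(
  "level=", fn:replace($tokens[3], ":$", ""),
  " detail=", fn:string-join(fn:subsequence($tokens, 5), " "))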

                Introduction

In the past, we have heard concerns from customers with regard to exactly how our scoring works, including a couple of examples raised on Stack Overflow.

As this area of the product has been a source of confusion in the past, the goal of this Knowledgebase article is to collate a few additional resources on MarkLogic's scoring algorithm into one place and, in doing so, offer some additional pointers on ways to make search scoring (hopefully) less opaque to our users.

                Understanding relevance scoring

                The default relevance scoring mechanism in MarkLogic for a call to cts:search will be logtfidf (Term Frequency / Inverse Document Frequency). From our documentation:

                The logtfidf method of relevance calculation is the default relevance calculation, and it is the option score-logtfidf of cts:search. The logtfidf method takes into account term frequency (how often a term occurs in a single fragment) and document frequency (in how many documents does the term occur) when calculating the score.

                See: http://docs.marklogic.com/guide/search-dev/relevance#id_66768

                This can lead to an assumption that MarkLogic Server uses the following algorithm to define its relevance scoring (log is natural logarithm, base e):

                log(1/term frequency) * log(1/document frequency)

However, this is an over-simplified view of how the scoring really works. As described in the documentation at https://docs.marklogic.com/guide/search-dev/relevance#id_74166, the logtfidf method (the default scoring method) uses the following formula to calculate relevance:

                log(term frequency) * (inverse document frequency)

MarkLogic calculates its scores using scaled, stepped integer arithmetic. If you look at the database status page for a given database, you may notice that one of the configuration options is called "tf normalization"; by default, this is set to scaled-log.

In practice this means that, particularly for small data sets and small documents, you may not see much difference in how scores are computed by the server.

                Our documentation describes the effect that tf normalization would have on scoring:

The scoring methods that take into account term frequency (score-logtfidf and score-logtf) will, by default, normalize the term frequency (how many search term matches there are for a document) based on the size of the document. The idea of this normalization is to take into account how frequently a term occurs in the document, relative to the other documents in the database. You can think of this as the density of terms in a document, as opposed to simply the frequency of the terms. The term frequency normalization makes a document that has, for example, 10 occurrences of the word "dog" in a 10,000,000 word document have a lower relevance than a document that has 10 occurrences of the word "dog" in a 100-word document. With the default term frequency normalization of scaled-log, the smaller document would have a higher score (and therefore be more relevant to the search), because it has a greater 'term density' of the word "dog". For most search applications, this behavior is desirable.

                Source: https://docs.marklogic.com/guide/search-dev/relevance#id_40969

                Example: putting it to the test

                Consider the following XQuery example:
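The original example code is not reproduced here, but the following hypothetical sketch creates five documents consistent with the term densities listed below; the exact scores you get will depend on your database settings:

xquery version "1.0-ml";
(: hypothetical documents: /doc1.xml has 1 "fun" in 14 words, /doc2.xml 3 in 7,
   /doc3.xml 1 in 3, /doc4.xml 1 in 1, /doc5.xml 2 in 6 :)
xdmp:document-insert("/doc1.xml",
  <doc>fun w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11 w12 w13</doc>),
xdmp:document-insert("/doc2.xml", <doc>fun fun fun w1 w2 w3 w4</doc>),
xdmp:document-insert("/doc3.xml", <doc>fun w1 w2</doc>),
xdmp:document-insert("/doc4.xml", <doc>fun</doc>),
xdmp:document-insert("/doc5.xml", <doc>fun fun w1 w2 w3 w4</doc>)
;
xquery version "1.0-ml";
(: separate transaction: list each matching document with its score, highest first :)
for $d in cts:search(fn:doc(), cts:word-query("fun"))
order by cts:score($d) descending
return fn:concat(xdmp:node-uri($d), " ", cts:score($d))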

                If you view the example documents above, it could be said that the respective densities of the word "fun" (as it appears within each given document) are:

/doc1.xml - 1/14
/doc2.xml - 3/7
/doc3.xml - 1/3
/doc4.xml - 1/1
/doc5.xml - 1/3

                If you were to run this code, you could therefore expect the documents to be ordered as follows (when ordered by relevancy):

                1. /doc4.xml
                2. /doc2.xml
                3. /doc3.xml and /doc5.xml (tied)
                4. /doc1.xml

                Instead what you see from the search:search output is:

                URI Score
                /doc2.xml 3072
                /doc5.xml 2816
                /doc4.xml 2048
                /doc1.xml 2048
                /doc3.xml 2048

This result suggests that the formula being used is not density but the raw count of the term in question (for example, we see that the term "fun" occurs 3 times in /doc2.xml, 2 times in /doc5.xml, and once each in /doc1.xml, /doc3.xml, and /doc4.xml).

                What is really happening?

Here the relevance-trace option is useful for seeing what is really happening in the score calculations:

                for $x in cts:search(fn:doc(), "fun", "relevance-trace")
                return cts:relevance-info($x)

Running the above will give you a little more detail as to how MarkLogic Server is deriving the score. The formatted output for the first result looks like this:

                Notes on the Inverse Document Frequency calculation

                Term Frequency concerns itself with how often a term appears in the document

                Inverse document frequency divides that by the fraction of the documents in which the term occurs.

                The IDF portion of the equation attempts to deal with the relative importance of terms. Additionally, it:

                  • Only matters when there are multiple terms in a query
                  • Depends on statistics across an entire specific collection
  • Means that testing on small collections may give misleading answers

                Further reading

                Introduction

A common problem is that code which otherwise worked fine in a development environment is deployed to production and suddenly begins to show signs of performance degradation. There could be many reasons for this, but a common cause to check for is whether any queries are unnecessarily running in update mode. This Knowledgebase article covers a strategy for diagnosing a situation where other operations are being held back whilst waiting for updates to complete.

                Write Locks in MarkLogic: why they can cause performance issues

                A transaction obtains a write lock when it wants to update a document. Read locks held by other transactions block write locks, as do other write locks. Write locks are also known as exclusive locks because they block all other transactions from accessing a document. Only one transaction can hold a write lock on a document at a time.

                See: https://developer.marklogic.com/blog/how-marklogic-supports-acid-transactions
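One quick way to confirm whether a given statement is being treated as a read-only query or as an update is to check the request timestamp; a minimal sketch:

xquery version "1.0-ml";
(: returns the timestamp the statement runs at if it is a read-only query;
   returns the empty sequence if the statement is treated as an update,
   in which case it will acquire locks instead of running lock-free :)
xdmp:request-timestamp()

Note that the update/query decision is made statically, so calling any update built-in anywhere in a module (even in a branch that never executes) is enough for the whole statement to be classified as an update.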

                Enabling the Lock Trace diagnostic trace event to identify write locks

                MarkLogic Server contains a useful trace event that will generate content in your ErrorLog.txt that can provide an indication as to whether the root cause of the performance issue is due to write locks.

                You can configure trace events at the group level using the admin GUI on port 8001:

                1. Go to Configure > Groups > [Group Name] > Diagnostics

                2. Ensure that trace events activated is set to true and enter Lock Trace into the input field as demonstrated below:
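As an alternative to the Admin GUI, the trace event can also be enabled through the Admin API; a minimal sketch, assuming the group is named "Default" (and that 'trace events activated' is already set to true, as in step 2 above):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
(: add the Lock Trace diagnostic trace event to the Default group;
   remember to remove it again once diagnostics are complete :)
let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")
return admin:save-configuration(
  admin:group-add-trace-event($config, $group,
    admin:group-trace-event("Lock Trace")))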

                Interpreting the Log entries

                When you experience the performance issue, look in the Error Log for output from the Lock Trace event; you should see something like this:

2015-04-28 10:02:05.068 Info: [Event:id=Lock Trace] forest=Documents uri=#2105214633188652281 waiting=3970525320932039109 holding=300259500568275128
                The items listed in the log output can be interpreted as follows:
uri - the URI key
waiting - the id of the query that is waiting for the lock to be released
holding - the id of the query that is holding the lock

                These will be written to ErrorLog.txt file if the Lock Trace diagnostic trace event is enabled.

                Disable when done: 

                Since the Lock Trace event is fairly verbose, we recommend that you remove this trace event or disable trace events when you complete your testing or diagnostics. 

                Further reading

                Introduction

                This article aims to provide a basic understanding of the advanced options for the MarkLogic Windows ODBC driver.

                Description of advanced options

                The main advanced options are:

                Use Declare/Fetch

Turn on the Use Declare/Fetch option to use memory more efficiently.
If set to true, the driver automatically uses declare cursor/fetch to handle SELECT statements. This is a great advantage, especially if you are only interested in reading and not updating. This option results in the driver buffering a certain number of rows at a time, making it more memory efficient.
If set to false, cursors will not be used and the driver will retrieve the entire result set. This is inefficient for large tables and may exhaust Windows memory/resources.

Cache Size

Cache size determines how many rows to fetch at a time.

When using cursors, this is the row size of the tuple cache (the default is 100 rows). This option is only meaningful when using cursors (Use Declare/Fetch must be on).
It should be tested in your environment to find a value that balances performance and memory usage.
A cache size of 1024 is a reasonable starting point; however, the optimal value should be determined based on performance and memory usage in your environment.

                Data Type Options

This option affects how some data types are mapped:

Text and Unknowns are mapped as LongVarChar.
Bools as Char: booleans are mapped to SQL_CHAR; otherwise they are mapped to SQL_BIT. This option should be turned on unless you need to map booleans to SQL_BIT.

                ReadOnly

Check this box to make the data source read-only.

                Show System Tables

Check this box to access system tables from a BI tool or Microsoft Access.

                Updatable Cursors

Check this box to enable updatable cursor emulation in the driver. We recommend keeping it unchecked.

                True is -1

Represents TRUE as -1 for compatibility with some applications. We recommend keeping it unchecked.

                Int8 As

Defines what data type int8 columns are reported as. Keep the default (numeric).
                        

                Level of rollback on errors

Specifies what to roll back should an error occur.

                • Nop(0): Don't roll back anything; let the application handle the error.
                • Transaction(1): Roll back the entire transaction. Keep this on.
                • Statement(2): Roll back the statement.
                Connect Settings

The driver sends these commands to the backend upon a successful connection. It sends these settings AFTER it sends the driver "Connect Settings". Use a semi-colon (;) to separate commands. This can handle any query, even one that returns results; however, the results will be thrown away. We recommend keeping this blank.

                The 'Global setting' dialog box

This dialog allows one to specify pre-connection/default logging options. The logging settings have an impact on performance and should only be turned on during debugging.

                CommLog (C:\psqlodbc_xxxx.log - Communications log) Logs communications to/from the backend. Recommended for debugging problems.
                MyLog (C:\mylog_xxxx.log - Detailed debug output) Logs debug messages. Recommended for debugging problems with the driver.
                MSDTCLog (C:\pgdtclog\mylog_xxxx.log - MSDTC debug output) Logs MSDTC debug messages. Recommended for debugging problems with the MSDTC.
Folder for log outputs: specifies the folder where the log files above are written and allows adjustment of write permissions.

                Manage DSN Dialog Box

This dialog allows one to select the ODBC driver to use with the connection. Note that this may not work with third-party drivers; you should use the MarkLogicSQL driver.






                Introduction

MarkLogic Server sends log messages to both the operating system log and the MarkLogic Server file log. The server log file is maintained as a simple text file. You may view/access the log files from the 'Log' tab on the main page of the Admin UI.

Each file is stored in the MarkLogic Server data directory for your platform. The default location for MarkLogic ErrorLog files is /var/opt/MarkLogic/Logs on Linux and C:\Program Files\MarkLogic\Data\Logs on Windows. The server gives you the option of specifying the minimum log level for messages sent to the log file. Based on application requirements, the default level can be changed by navigating to Admin UI -> Configure -> Groups -> {Select Group} -> 'file log level' to log additional details in ErrorLog.txt.
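If you prefer to script this setting, a minimal sketch using the Admin API (assuming the group is named "Default" and the desired level is debug):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
(: set the minimum level of messages written to ErrorLog.txt for the Default group :)
let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")
return admin:save-configuration(
  admin:group-set-file-log-level($config, $group, "debug"))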

                This article gives information about how to interpret the logs with any index setting changes made on a database or when rebalancer activity is triggered.

                 

                Details

                1) Database Merging activity

Database merges are a way of self-tuning the performance of the system, and MarkLogic Server continuously assesses the state of each database to see if it would benefit from self-tuning through a merge. As part of merging, multiple stands are combined for performance reasons, disk space is reclaimed, and indexes and lexicons are combined and re-optimized based on their new size. Since merges can be resource intensive (both disk I/O and CPU), we recommend controlling when merges do and do not occur, as well as watching for unusual merge activity.

                The Server provides multiple options to review the merging activity –

                a. Monitoring History Dashboard

                MarkLogic Server’s Monitoring History feature gives information about the merging activity over a period of time along with Merge read/write rates indicating the average of reading or writing merge data from disk.

                b. The Database Status page

                The Database Status page lists the merge state, which indicates if a merge is going on, shows the size of the merge, and estimates how long it will take the merge to complete.

                 

                c. MarkLogic Server ErrorLog.txt

MarkLogic Server logs INFO level messages to the ErrorLog.txt file whenever a merge begins, completes, or is canceled. Additionally, there are other log messages that are logged at more detailed logging levels during a merge. When there are any database index setting changes, reindexing activity will start shortly afterwards and you will be able to see "Config" and "Debug" level log messages as shown below:

                2015-06-22 10:34:50.503 Config: Wrote /var/opt/MarkLogic/databases.xml
                2015-06-22 10:34:50.503 Config: Loading /var/opt/MarkLogic/databases.xml

                2015-06-22 10:34:55.550 Debug: Detecting indexes for database Modules
                2015-06-22 10:34:55.550 Debug: Detecting indexes for database Documents

                During a merge, the merge rates are reported. The rate reported in the Merging status is the merge rate of all merges on the forest, averaged over the last few seconds. Low merge rates are usually an indication of not getting enough IO throughput to handle the load.

                2015-03-12 12:56:06.633 Info: Merged 15 MB in 65 sec at 0 MB/sec to /opt/marklogic-forests/Forests/...
                2015-03-12 12:56:18.083 Info: Merged 42 MB in 77 sec at 1 MB/sec to /opt/marklogic-forests/Forests/...
                2015-03-12 12:56:27.754 Info: Merged 49 MB in 87 sec at 1 MB/sec to /opt/marklogic-forests/Forests/...

This Knowledge Base article on Server I/O requirements will be helpful when provisioning the I/O capacity of your MarkLogic installation.

Also, if there is continuous ingestion, the database will merge continuously and merging never ends. As a result, the logs will not be able to give the total reindexing time. However, the Admin UI will give you an estimate, based on a search estimate, because it is designed to be a fast report.

                 

2) Database rebalance activity

                A database rebalancer consists of two parts: an assignment policy for data insert and rebalancing and a rebalancer for data movement. The rebalancer runs on each forest and consults the database's assignment policy to determine which documents do not 'belong to' this forest and then pushes them to the correct forests. In addition to the rebalancer periodically rebalancing the database, the following events trigger the rebalancer process:

                • Any configuration changes to the database, such as adding a new forest or retiring an existing forest.
                • Upon completion of a restore operation on the database.
                • Upon completion of a backup operation on the database.

                The Server provides multiple options to review the rebalancer activity –

                a. Database Status page –

                When the rebalancer is enabled on the database, you can check the state of the rebalancer, along with an estimated completion time, on the Database Status page.

                 

If the rebalancer is disabled, the Show Rebalance button on the Database Status page will give the number of fragments that are pending rebalancing.

                b. MarkLogic Server Logs -

When rebalancer activity starts on a database, "Debug" level messages are logged that look like the following:

                2015-06-22 11:29:48.390 Debug: Rebalanced 10000 fragments in 7 sec at 1379 fragments/sec on forest Documents with the bucket policy.
                2015-06-22 11:29:53.448 Debug: Rebalanced 10000 fragments in 5 sec at 1977 fragments/sec on forest Documents with the bucket policy.

                Additionally, you may also use the 'Rebalancer State' trace event that gives status on the rebalancer running for all the forests:

                2015-06-22 11:46:10.514 Info: [Event:id=Rebalancer State] The rebalancer on Documents is starting
                2015-06-22 11:46:10.516 Info: [Event:id=Rebalancer State] The rebalancer on Documents starts running

3) Database reindex activity

                a. Database Status page –

                Reindexing work that is incomplete or pending (if the reindexer is disabled) can be checked in the Admin UI, at the database status page, similar to rebalancing.

                b. MarkLogic Server Logs -

                When reindexing, Debug-level messages will note progress in the reindexing, while a final Info-level message gives a total of the work done:

                2023-05-15 17:07:06.057 Debug: Reindexed range-indexes 10000 fragments in 4 sec at 2391 fragments/sec on forest Documents-f1
                2023-05-15 17:07:10.113 Debug: Reindexed range-indexes 10184 fragments in 4 sec at 2511 fragments/sec on forest Documents-f1
                2023-05-15 17:07:11.014 Info: Reindexed range-indexes 22001 fragments in 9 sec at 2407 fragments/sec on forest Documents-f1

                c. Reindexer Preview with xdmp:forest-counts -

You can also check the incomplete/pending work with a call to xdmp:forest-counts, for example:

                xdmp:forest-counts(xdmp:database-forests(xdmp:database("Documents")), (),("preview-reindexer"))

                Introduction

                In early 2015, a significant security vulnerability was found in the glibc package.  Glibc is an implementation of the standard C library and is a core part of all our currently supported Linux distributions. A code audit was performed by the Qualys research group and the following security advisory was made available:

                https://www.qualys.com/research/security-advisories/GHOST-CVE-2015-0235.txt

                What does the GHOST vulnerability do?

It is called the GHOST vulnerability because it can be triggered by the GetHOST functions. A blog post released by Qualys describes the vulnerability as:

                "a buffer overflow in the __nss_hostname_digits_dots() function of glibc. This bug can be triggered both locally and remotely via all the gethostbyname*() functions. Applications have access to the DNS resolver primarily through the gethostbyname*() set of functions. These functions convert a hostname into an IP address."

                https://community.qualys.com/blogs/laws-of-vulnerabilities/2015/01/27/the-ghost-vulnerability

                What do I need to do to guard against this?

We recommend starting by briefly reading the following articles to understand the changes that have been made. If you manually patch your systems, make sure you update your glibc library so that the vulnerability is patched:

                https://access.redhat.com/security/cve/CVE-2015-0235

                https://rhn.redhat.com/errata/RHSA-2015-0099.html

                If you're using another Linux distribution, start by looking at the references linked on this page and if you're in any doubt, please contact your vendor directly for advice:

                https://community.qualys.com/blogs/laws-of-vulnerabilities/2015/01/27/the-ghost-vulnerability

What is MarkLogic doing about this?

                We are well aware of the issue and advise that all customers always keep their systems up-to-date in order to guard against this and other similar vulnerabilities.

                The performance of MarkLogic will not be impacted by the patched glibc library, so updating as per the instructions provided by your vendor is recommended.

In addition, we are adding an additional layer of security into the product to shield unpatched systems from this vulnerability. This patch is available immediately for users who have already upgraded to MarkLogic 8. MarkLogic 6 and 7 have also been patched, and the next available releases (6.0-6 and 7.0-5 at the time of writing) will guard against this vulnerability.

                For patched releases of the product, if anyone attempts to exploit the vulnerability, the server will terminate the query and throw an exception.

                If you run MarkLogic 8 on an unpatched system, you will see the following message when you start MarkLogic on the host:

                YYYY-MM-DD HH:MM:SS.sss Warning: Guarding against detected Linux glibc GHOST vulnerability

                MarkLogic 8 is available for download at:
                http://developer.marklogic.com/products

                Introduction

This KnowledgeBase article covers the use of journal files by MarkLogic Server. The concept of journaling is covered in many places in our documentation; there are sections on Backing Up Databases with Journal Archiving and Restoring a Database from a Backup.

                The aim of this article is to augment the online documentation and to offer a deeper look into what goes into the journal files. It will also cover how MarkLogic Server uses journals to maintain data. It is structured in the form of questions and answers based on real customer questions we have received in the past.

                How does MarkLogic maintain journals? How long are journals kept by MarkLogic before they are deleted?

                At a high level, transactions work like this:

                As soon as any transaction takes place, the first thing that happens is the entry is written to the journal file for the corresponding forest(s) involved in that transaction.

                The journal file contains enough information for the document (also known as a fragment) to be written to disk. It's important to note that for the sake of brevity, no information pertaining to indexes is ever found in the journal; all necessary index information is generated later in the transaction by MarkLogic Server.

                An in-memory stand is created for that document. The in-memory stand contains the data as it would be within the forest, including indexes based on the current index settings from the time at which the transaction took place. At some point in time, in-memory stands are flushed to disk to become on-disk stands in the forest and a checkpoint is made against the journal to record the forest state at that point in time.

                At the time of this writing, journal sizes are set to 1024 MB by default and MarkLogic will maintain up to two of these files (per forest) at any given time. A journal file will be discarded when all data pertaining to transactions has successfully been checkpointed to disk as on-disk stands.

                Note that only data that requires indexing is held in journal files; binary files are managed outside journals (aside from retaining a pointer to the location of the binary file).

                If we take a backup with journal archiving enabled will that backup contain all the journal files since the creation of the database?

                No, journal frames are archived going forward from the backup.

                This is discussed at Backing Up Databases with Journal Archiving where it says:

                The backup/restore operations with journal archiving enabled provide a point-in-time recovery option that enables you to restore database changes to a specific point in time between full backups with the input of a wall clock time. When journal archiving is enabled, journal frames are written to backup directories by near synchronously streaming frames from the current active journal of each forest.

                Note: any backup taken without journal archiving enabled will not contain journals.

                The following KnowledgeBase article may offer some further insight into how a restore would work:

                https://help.marklogic.com/Knowledgebase/Article/View/68/0/how-does-point-in-time-recovery-work-with-journal-archiving

                Does MarkLogic offer anything that allows for incremental (partial) backups?

Yes, starting with version 8. You may want to start by looking at the incremental backup feature in our documentation.

                I want to understand transactions in more detail, can you recommend a place to start?

                The following KnowledgeBase article covers transaction timestamps, how transactions are managed and the difference between read queries and updates:

                https://help.marklogic.com/Knowledgebase/Article/View/102/0/read-only-queries-run-at-a-timestamp--update-transactions-use-locks

                Can I use the same directory for all the databases on the cluster?

You should not use the same directory for backups of more than one database. Each database should use a different backup directory.

                When you create a scheduled backup in the Admin UI the backup path field specifies the value as:

                The backup directory pathname for this database. Each database should use a different backup directory.

                Are there any impacts when restoring with Security, Schemas when using journal archiving?

                From the documentation: https://docs.marklogic.com/guide/admin/backup_restore#id_51602

                "If journal archiving is enabled, you cannot include auxiliary forests, as they should have their own separate backups."

                As a general rule, we would recommend making the Security, Schemas and Modules backups last; perform all the big content database backups first, then at the end do the smaller backups of the auxiliary databases.

In the event of a disaster, you can recreate the cluster, restore the most recent Security, Schemas, and Modules backups, and then anything else you restore thereafter will have everything it needs already in place on the host / cluster.

                Introduction

                Here we will describe how to see in detail the effect of running the rebalancer, before it is enabled.

                Admin UI

                The Admin UI lets you check the state of the rebalancer, along with an estimated completion time if running, on the Database Status page. See Checking the Rebalancer Status.

                Previewing the rebalancer

                xdmp:forest-counts

                For a more detailed look at the rebalancing, including how fragments will move between the forests, call xdmp:forest-counts with the 'preview-rebalancer' option.

                For example, to get all the values for the forests in the Documents database, running

                 xdmp:database-forests (xdmp:database ('Documents')) ! xdmp:forest-counts (., (), ('preview-rebalancer'))

                will return a <forest-counts xmlns="http://marklogic.com/xdmp/status/forest"> element for each forest, including information on how many fragments will be moved and their destinations.

                HTML table output

                The XQuery code below can be run in the query console to output an HTML table with the rebalancing stats from xdmp:forest-counts. Just change the value of $DATABASE from 'Documents' to the name of the database you wish to check:
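The original code is not reproduced here, but the following minimal sketch along the same lines dumps every statistic that xdmp:forest-counts reports for each forest into a simple HTML table:

xquery version "1.0-ml";
let $DATABASE := 'Documents'
return
  <table>
    <tr><th>Forest</th><th>Statistic</th><th>Value</th></tr>
    {
      for $fid in xdmp:database-forests(xdmp:database($DATABASE))
      let $counts := xdmp:forest-counts($fid, (), ('preview-rebalancer'))
      for $stat in $counts//*[fn:not(*)]  (: leaf elements only :)
      return
        <tr>
          <td>{ xdmp:forest-name($fid) }</td>
          <td>{ fn:local-name($stat) }</td>
          <td>{ fn:string($stat) }</td>
        </tr>
    }
  </table>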

                Uses

                In database replication with the bucket and legacy policies, the replica-cluster forest order should match the master-cluster forest order, to avoid rebalancing if/when replication is broken. See Database Replication and Rebalancing, replication and forest reordering.

                The above technique can be used to check if the replica database will rebalance after replication is broken.

                This check can also be used after administrative work on forests to see if there is any resultant rebalancing.

                References

                What is TDE?

                Template Driven Extraction (TDE) allows you to:

                • Define a relational lens over your document data, so you can query your data using SQL or the Optic API.
                • Use templates to define a semantic lens, specifying which values from a document make up triples in the triple index.
                • Generate rows and triples from ingested documents based on predefined templates that describe the following:
                  • The input data to match
                  • The data transformations that apply to the matched data
                  • The final data projections that are translated into indexed data.
                • Access the data in your documents in several ways, without changing the documents themselves.

                Importance of user roles:

                • All operations on Template documents are controlled by:
                  • http://marklogic.com/xdmp/tde collection
                    • A protected collection that contains TDE template documents
                  • tde-admin role
                    • Required to access the TDE protected collection
                  • tde-view role
                    • Required to view documents in the TDE protected collection
                • Insertion/Deployment:
                  • A Deployment needs a tde-admin role

                More about TDE here.

                Inserting templates/views: Tutorial

                Updating a View:

                If you want to create a view to support more than one context or scope, you can update the corresponding view declaration. This can be done either by updating the existing template with the updated view declaration or by creating a new template with the updated view declaration using the same view name.

                Before understanding how updating a view is done, one must be familiar with the concept of a 'viewLayout'. A view declaration comes with a viewLayout option which can be set to one of these two options: 'identical' or 'sparse'.

                • Identical: A viewLayout is identical by default unless explicitly set to sparse. When a viewLayout is identical, it expects all the templates created in the future using the same view name to be consistent with the view declaration of the existing view
                • Sparse: A sparse viewLayout allows you to update an existing view with a new declaration with added/removed columns
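For illustration, here is a minimal, hypothetical template sketch that declares a view with a sparse viewLayout and a nullable column (the schema, view, and column names are invented for this example, and inserting it requires the tde-admin role):

xquery version "1.0-ml";
import module namespace tde = "http://marklogic.com/xdmp/tde"
    at "/MarkLogic/tde.xqy";
let $template :=
  <template xmlns="http://marklogic.com/xdmp/tde">
    <context>/book</context>
    <rows>
      <row>
        <schema-name>Publications</schema-name>
        <view-name>books</view-name>
        <view-layout>sparse</view-layout>
        <columns>
          <column>
            <name>title</name>
            <scalar-type>string</scalar-type>
            <val>title</val>
            <nullable>true</nullable>
          </column>
        </columns>
      </row>
    </rows>
  </template>
(: a later template that reuses the Publications.books view name could add
   further nullable columns, as described above :)
return tde:template-insert("/templates/books.xml", $template)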

                While in most cases it is not possible to update a view if the viewLayout is identical, this is not true if a view is associated with only one template. In this specific scenario, you can indeed update an identical viewLayout without error.

                In contrast, when a view is shared between multiple templates, the ability to update it either by re-inserting an existing template associated with that view or by creating a new template depends on whether that viewLayout is identical or sparse.

                • When the viewLayout is shared between multiple templates and identical:
                  • Updating the view is not allowed and results in the error Invalid TDE template: TDE-INCONSISTENTVIEW
  • This is because the new/updated template's view declaration is expected to be consistent with the existing one
                • When the viewLayout is shared between multiple templates and sparse:
                  • Updating the view is allowed provided it satisfies the following conditions:
                    • All 'not-nullable' columns from the existing view must be a part of the updated view
                    • The 'nullable' field of all the new columns being added in the updated view declaration must be set to 'true' - this is effectively indicating that these columns are optional

                More information about Creating Views from Multiple Templates here

                Best Practices:

                Given the complexity involved in the process of updating views, it is recommended to follow these best practices when creating views:

                • It is very important to understand the goal of creating a view. How will it be used now? How is it expected to be used in the future?
                • Make sure to set the viewLayout to the correct option at creation as this ultimately informs whether or not a view can support more than one context or scope in the future.
                • If one wishes to opt for a sparse viewLayout, it is very important to set the 'nullable' field to the correct option when declaring columns. It is not possible to remove columns after they have been created unless they're 'nullable'.
• If you have created a view that you would later like to alter or edit, and that view falls under a scenario where alterations or edits are no longer allowed, you will need to delete and then recreate the relevant view in order to make further changes.

                Introduction

This KB article is for customers who want to upgrade their DHS (Data Hub Service) Data Hub version from Data Hub 5.1.0 (or earlier) to Data Hub 5.2.x+ on AWS.

                Note: This process only applies for requests to MarkLogic Support to upgrade the Data Hub version on a DHS AWS service.

                Details

Customers who want to upgrade their DHS Data Hub version from Data Hub 5.1.0 (or earlier) to Data Hub 5.2.x in DHS AWS should be aware of the following.

                The user can still upgrade to Data Hub 5.2.x but with the following caveats:

Old DHS Role -> DH 5.2 Role(s)
Flow Developer -> data-hub-developer
Flow Operator -> data-hub-operator, data-hub-monitor
Endpoint Developer -> data-hub-developer
Endpoint User -> data-hub-operator
Service Security Admin -> data-hub-security-admin, data-hub-admin, pii-reader

                  To determine which Data Hub version customers can upgrade to, see Version Compatibility in the DHS AWS documentation.
                  - AWS https://docs.marklogic.com/cloudservices/aws/refs/version-compatibility.html

                  Summary

Internally, MarkLogic Server maps URIs to hash values. Hash values are just numbers. For internal operations, numbers are easier to process and are more performant than strings. We refer to the URI hash as a URI Key.

                  Details

                  Where would I see a URI key?

Sometimes, URI Keys will appear in the MarkLogic Error Logs. For example, the MarkLogic Lock Manager manages document locks. Internally, the lock manager in each forest doesn't deal with URIs; it only deals with URI keys. When logging messages, the lock manager helpfully tries to turn the URI key into a URI to be more human readable. It does that by looking up and retrieving the document URI matching that URI key. If the reporting forest doesn't have a document matching the URI key, it reports the URI key instead of a URI.

                  For example, if the 'Lock Trace' trace event is enabled, you may see events logged that look like either of the following lines:

                  2015-03-18 01:53:17.576 Info: [Event:id=Lock Trace] forest=content-f1 uri=/cache/151516917/state.xml waiting=11744114292967458924 holding=15120765280191786041

                  2015-03-18 01:53:17.576 Info: [Event:id=Lock Trace] forest=content-f1 uri=#7734249069814007397 waiting=11744114292967458924 holding=15120765280191786041

The first line shows a URI (/cache/151516917/state.xml), and the second gives instead a URI key (7734249069814007397). When a URI key is reported as in this example, one of the following two will be true:

                  • The reporting action may be restricted to a single forest and the referenced document for the URI (key) may be in a different forest; or
• The document may not exist at all. An example where this might occur is when a lock is acquired by an update before the document is actually inserted; xdmp:lock-for-update can lock URIs that aren't in the database without ever creating a document.

                  How can I find a URI key for a URI?

You can turn a URI ($uri) into a URI key using the following XQuery code:

xdmp:add64(xdmp:mul64(xdmp:hash64($uri),5),xdmp:hash64("uri()"))

                  You may want to generate the URI key in order to scan an Error Log file for reference to that key.

                  How can I find the URI or document for a URI key?

                  You can check the entire database by using cts:uris with a cts:term-query and that URI key.  As an example, the following XQuery code

                  xquery version '1.0-ml';
                  let $uri := '/foo.xml'
                  let $uri-key := xdmp:add64(xdmp:mul64(xdmp:hash64($uri),5),xdmp:hash64("uri()"))
                  return cts:uris ((), (), cts:term-query ($uri-key))

                  returns /foo.xml

                   

                  Summary

This article describes the errors thrown when decoding URLs, and how to detect invalid characters to avoid those errors.

                  Details

                  When decoding certain URLs using xdmp:url-decode(), it is possible that certain characters will cause one of two errors to be thrown. 

                  1. XDMP-UTF8SEQ is thrown if the percent-encoded bytes do not form a valid UTF-8 octet sequence. A good description of UTF-8 can be found at: https://en.wikipedia.org/wiki/UTF-8 
                  2. XDMP-CODEPOINT is thrown if the UTF-8 octet sequence specifies a Unicode codepoint invalid for XML.

                  The specification for the Uniform Resource Identifier (URI): Generic Syntax can be found here: https://tools.ietf.org/html/rfc3986. In particular, the following section explains why certain characters are invalid: "Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters."

                  The code below can be used to detect invalid characters.  Make sure to remove any invalid characters prior to URL decoding.

                  (codepoint <= 0x8) ||
                  (codepoint >= 0xb && codepoint <= 0xc) ||
                  (codepoint > 0xd && codepoint < 0x20) ||
                  (codepoint >= 0xd800 && codepoint < 0xe000) ||
                  (codepoint > 0xfffd && codepoint < 0x10000) ||
                  (codepoint >= 0x110000)
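When it is not practical to pre-screen the input, the errors above can also be caught at decode time; a minimal sketch:

xquery version "1.0-ml";
declare namespace err = "http://marklogic.com/xdmp/error";
(: attempt to decode, returning the empty sequence when the encoded URL contains
   an invalid UTF-8 sequence or a codepoint that is invalid for XML :)
declare function local:safe-url-decode($encoded as xs:string) as xs:string?
{
  try {
    xdmp:url-decode($encoded)
  } catch ($e) {
    if ($e/err:code = ("XDMP-UTF8SEQ", "XDMP-CODEPOINT"))
    then ()
    else xdmp:rethrow()
  }
};
local:safe-url-decode("%C3%A9")  (: returns "é" :)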

                  Introduction

                  There are two ways of leveraging SSDs that can be used independently or simultaneously.

                  Fast Data Directory

                  In the forest configuration for each forest, you can configure a Fast Data Directory. The Fast Data Directory is designed for fast filesystems such as SSDs with built-in disk controllers. The Fast Data Directory stores the forest journals and as many stands as will fit onto the filesystem; if the forest never grows beyond the size of the Fast Data Directory, then the entire forest will be stored in that directory. If there are multiple forests on the same host that point to the same Fast Data Directory, MarkLogic Server divides the space equally between the different forests.

                  See Disk Storage.
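If you are scripting forest creation, a minimal sketch follows (assuming the admin:forest-create signature that accepts large and fast data directory arguments; the forest name and the /ssd mount point are illustrative only):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
(: create a forest on this host whose journals and early stands live on the SSD mount :)
let $config := admin:get-configuration()
let $config := admin:forest-create(
                 $config,
                 "content-f1",      (: forest name :)
                 xdmp:host(),       (: host to create the forest on :)
                 (),                (: default data directory :)
                 (),                (: no separate large data directory :)
                 "/ssd/MarkLogic")  (: fast data directory :)
return admin:save-configuration($config)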

                  Tiered Storage (licensed feature)

                  MarkLogic Server allows you to manage your data at different tiers of storage and computation environments, with the top-most tier providing the fastest access to your most-critical data and the lowest tier providing the slowest access to your least-critical data. As data ages and becomes less updated and queried, it can be migrated to less expensive and more densely packed storage devices to make room for newer, more frequently accessed and updated data.

                  See Tiered Storage.

                   

                  Introduction

                  MarkLogic AppServer supports several authentication methods such as Basic, Digest, Kerberos, and in more recent years, SAML and Certificate-based authentication.

The two most common authentication methods are Basic and Digest authentication, and the choice of which to use has often come down to security considerations; Basic Authentication uses a simple Base64 encoding to place the userid and password in an HTTP Authorization header.

                  Unfortunately, the encoding process is not secure and it is a minor procedure to decode the header to reveal the actual userid and password.

To guard against the weaknesses of the Basic Authentication encoding process, Digest Authentication was developed to make use of newer cryptographic hashing techniques and nonce values to prevent replay attacks, and this became the authentication choice for most MarkLogic users wishing to secure access to their AppServers.

                  However, over time Digest Authentication has not been without problems; many of the security options specified within the Digest Authentication protocol are considered optional and, therefore, subject to implementation and compatibility differences between software products.

                  MarkLogic Server: Hardening Digest Authentication

                  With security in mind, MarkLogic made a number of changes to harden the Digest Authentication protocol used within the server from the 9.0-3 release and above; specifically, MarkLogic server now verifies the nonce, checks that the nonce is not being used as part of a replay attack, and verifies that the URI in the Authorization header is the same as the originating request URI.

                  Unfortunately, this has caused issues with some client software and libraries, which were not themselves securely implementing the Digest protocol. Most issues can usually be addressed by ensuring you are running a current version of your client software; however, there are occasions where problems can still occur, such as being repeatedly prompted to enter your userid and password.

                  Considerations for introducing HTTPS / TLS

                  If you are affected by Digest Authentication issues with MarkLogic, it is worth considering switching back to Basic Authentication with the addition of securing the connection between the client and MarkLogic AppServer using TLS, i.e., Basic Authentication over HTTPS.

                  This will mitigate against the one original reason for using Digest over Basic Authentication in that it will prevent the Basic Authorization header from being disclosed and thus prevent decoding from taking place. In addition, unlike Digest Authentication, Basic Authentication over HTTPS not only protects the user credentials but also encrypts the payload being transmitted, which in itself may also contain sensitive information.

If you are still not convinced to convert from Digest Authentication to Basic Authentication over HTTPS, it is also worth considering that the Digest Authentication protocol is probably nowhere near as secure today as you might think, to the point that Digest-MD5 (on which most of HTTP Digest Authentication is based) is now considered obsolete and deprecated due to a high number of security deficiencies that make it vulnerable to attack, as detailed in RFC-6331.

From RFC-6331, sections 6, 7, and 8:

                  6. DIGEST-MD5 outer hash (the value of the "response" directive) does not protect the whole authentication exchange, which makes the mechanism vulnerable to "man-in-the-middle" (MITM) attacks, such as modification of the list of supported qops or ciphers.

                  7. The following features are missing from DIGEST-MD5, making it insecure or unsuitable for use in protocols:

                  A. Channel bindings [RFC5056].

                  B. Hash agility (i.e., no easy way to replace the MD5 hash function with another one).

                  C. Support for SASLPrep [RFC4013] or any other type of Unicode character normalization of usernames and passwords. The original DIGEST-MD5 document predates SASLPrep and does not recommend any Unicode character normalization.

                  8. The cryptographic primitives in DIGEST-MD5 are not up to today's standards, in particular:

                  A. The MD5 hash is sufficiently weak to make a brute force attack on DIGEST-MD5 easy with common hardware [RFC6151].

                  B. The RC4 algorithm is prone to attack when used as the security layer without discarding the initial key stream output [RFC6229].

                  C. The DES cipher for the security layer is considered insecure due to its small key space [RFC3766].

                  Note that most of the problems listed above are already present in the HTTP Digest authentication mechanism.

                  References

                  Tiered Storage

                  MarkLogic Server allows you to manage your data at different tiers of storage and computation environments, with the top-most tier providing the fastest access to your most critical data and the lowest tier providing the slowest access to your least critical data.

                  MarkLogic Server tiered storage manages data in partitions. Each partition consists of a group of database forests that share the same name prefix and the same partition range

                  The range of a partition defines the scope of element or attribute values for the documents to be stored in the partition. This element or attribute is called the partition key. The partition key is based on a range index, collection lexicon, or field set on the database. The partition key is set on the database and the partition range is set on the partition, so there can be several partitions in a database with different ranges. 

                  MarkLogic Server documentation covers a detailed example on how to use range index as the partition key for tiered storage. 

                  This article provides a generic and simple example of using a collection lexicon as the partition key.

                  Collection Lexicon with Tiered Storage

Consider a database 'test-db' with 4 forests that are grouped into 2 partitions. The following are the necessary configuration requirements to set up this database for tiered storage. These settings can be configured on the Admin UI database configuration page (Admin UI -> Databases -> {database-name}):

- set 'rebalancer enable' to true
- set 'Locking' to strict
- set 'Rebalancer Assignment Policy' to range
- set 'Collection lexicon' to true

Under the assignment policy, choose 'Collection Lexicon' as the 'Range index type'. This sets the partition key to the collection lexicon.

                  Partitions are based on forest naming conventions. A forest's partition name prefix and the rest of the forest name are separated by a dash (-). For our example, consider the following forest names and the partitions they will be grouped into:

                  tier1-forest1
                  tier1-forest2

                  tier2-forest1
                  tier2-forest2


As indicated by the forest names, all forests with the same prefix are grouped under one partition. So, in this case, forests with the prefix tier1 are grouped under the first partition and forests with the prefix tier2 are grouped under the second partition.

                  Note that all of the forests in a database configured for tiered storage must be part of a partition.

Which partition ingested data is placed in is determined by the defined partition range. All the forests in one partition share a common range. These ranges are defined on the forest configuration page (Admin UI -> Forests -> {forest-name} -> range).

For this example, since we are using the collection lexicon as the partition key, consider the following ranges for the two partitions:

                  Tier1
                  lower bound - accounts
                  upper bound - files

                  Tier2
                  lower bound - journals
                  upper bound - magazines

Alternatively, partitions can be created using the REST Management API or the XQuery/JavaScript APIs.

Once this is done, if a document is ingested with, for example, the collection "books", it will be placed into one of the forests in the Tier1 partition.
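
For example, the following sketch (run in Query Console against the example database; the URI and element name are illustrative) inserts a document into the "books" collection and then counts "books" documents per forest, confirming that they land in the tier1 forests:

xquery version "1.0-ml";
(: insert a sample document into the "books" collection,
   which falls inside the Tier1 range (accounts - files) :)
xdmp:document-insert("/books/example.xml", <book/>,
  xdmp:default-permissions(), "books")
;
xquery version "1.0-ml";
(: count "books" documents (strictly, fragments) in each forest of the current database :)
for $f in xdmp:database-forests(xdmp:database())
return fn:concat(xdmp:forest-name($f), ": ",
  xdmp:estimate(cts:search(fn:doc(), cts:collection-query("books"), (), (), $f)))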

However, there is one caveat with using the collection lexicon for tiered storage - it works well only if the documents have a single collection. If a document has more than one collection, the assignment can become effectively random and the document can be placed in any of the partitions.

Also, it is best practice to create a 'default partition' (a partition without a range), so that any documents which fall outside the defined ranges will go to the default partition. In the absence of a default partition, the assignment can again be effectively random.

                  Related Documentation

                  MarkLogic Administrator's Guide: Tiered Storage

                  Introduction

                  Database replication can be used to replicate the contents of a database spanning a cluster to forests on a similarly configured cluster.  The most common use of database replication is to keep two identical MarkLogic clusters in sync with each other.

Database replication can also be thought of as a way to quickly and effectively make a copy of all your forest data for a given database. For example, it can be used to copy the contents of a database from a master cluster to a single (foreign) node.

                  In this Knowledgebase article, we will walk through the process of configuring database replication to safely replicate the contents of a live database (spanning a 3-node cluster) onto a single MarkLogic instance. Such a process could be used if you need to - for example - create a development environment that contains real application data.

                  Caution

Before you follow these steps, please take note of the following points:

                  1. Enabling replication - and copying all your forest data over to another host - will have some overhead on network traffic and additional I/O overhead on each of the hosts in the master cluster. Please ensure that your system is able to cope with the additional overhead before attempting the work outlined in this article.

                  2. After the replication process has completed, the target forests will switch from the status of async replicating to sync replicating. If the I/O capacity of the replica cannot keep up with the master, it could affect performance on the Primary cluster by forcing the lag limit to be observed; this is explained in detail in our documentation under the section on replication lag: Database Replication Guide - Replication Lag

                  As a result of the increased workload that replication will place on your Primary Cluster, please ensure that you have enough resources and - if necessary - arrange for the majority of the work to be completed at a time when traffic on the primary cluster is low.

                  Scenario

As a prerequisite, we are starting with a 3-node cluster that contains a specific database (in the context of this example, the database is called "application").  The database contains 12 forests; 6 of these are master forests (2 on each of the 3 nodes in the cluster) and 6 of these are replicas of the 6 masters, which are used for forest-level failover.

                  In order to take our "backup" of this database, we will need to copy the contents of each of the six forests and we're going to use database replication to copy their contents over to a single host that has been prepared for this task.

                  Listed below are the steps required to perform this task (step-by-step):

                  1. Review the content of the Master database

                  This is the master "application" database (6 forests, 6 replicas on a 3-node cluster) - as you can see there are almost 19 million documents stored within this database (over 6 primary forests) and the primary forests are identifiable by following the naming convention of using the database name as a prefix and a sequential number (in this case, application-01 to application-06):

                  2. Review the configuration of the single destination host

                  This is the single node host that we want to replicate to - here we have the same database (called "application") and a matching number of forests (6 forests) with matching names (application-01 to application-06).  The forests are attached but the database currently remains empty:

                  3. Ensure that Database Replication is not currently configured for this database on the primary (3-node) cluster

                  On the master, we're going to select Configure > Databases > application > Database Replication and confirm that Database Replication is not currently set up:

                  4. Set up the new host as a "foreign cluster"; the target system to copy all the data over from the master

                  On the master (Configure > Databases > application > Database Replication), select the Configure tab and use the "Select one here" link; this allows us to set up our foreign cluster (which is another way to describe our single host that we're going to replicate our data to):

                  5. Add the host details for the foreign cluster

                  Here we're entering the host name for the single host, so the master 3-node cluster can establish contact with it and start the process of configuring database replication.

                  6. Accept the remaining defaults by clicking on 'ok'

                  Accept all the default settings - for this walkthrough we're following the simplest path to configuring database replication (from one set of forests whose names match a corresponding set of forests on another host)

                  7. Confirm that your hostname is now listed as a configured foreign cluster

                  Confirm that the foreign cluster is now configured - if everything worked out, you should see something like this on-screen:

                  8. Set up Database replication on the master cluster

                  Now that we have the foreign cluster configured, we can now set up the database to replicate the data over.

                  On the master, go back to Configure > Databases > application > Database Replication, click on the Configure tab and ensure that the foreign cluster is now correctly identified by the master; if it is, click 'ok' for the next part of the setup process:

                  9. Choose the default "Connect By Name" strategy.

                  In this example, the forest names are identical, so the Replicas can be matched up by name.   MarkLogic has identified that the forest names match so it's generated a table to show the mapping that database replication will be using.

Allow MarkLogic to Connect By Name to the replica forests and click ok:

                  10. Check the database status of the foreign cluster

                  Confirm that replication is now taking place on the single host (the application database on the foreign cluster host); if it's worked, all forests will be listed with a state of syncing replica and you should see the number of documents starting to increase:
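
If you prefer to check from Query Console rather than the Admin UI, a minimal sketch along these lines (run on the replica host, using the forest names from this example) reports the state of each forest; during the initial copy the state is typically a replicating state such as "async replicating", changing to "sync replicating" once the replica has caught up:

xquery version "1.0-ml";
(: report the current state of each replica forest :)
for $name in ("application-01", "application-02", "application-03",
              "application-04", "application-05", "application-06")
return fn:concat($name, ": ",
  fn:string((xdmp:forest-status(xdmp:forest($name))//*:state)[1]))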

                  Further reading

                  Introduction

                  This Knowledgebase article demonstrates how you can use the KeyStore Explorer tools to generate a CA Root Certificate and end-user certificates for use with MarkLogic Server (for Application Servers which are SSL enabled) and for SSL based client authentication within your applications.

                  KeyStore Explorer can be downloaded from http://keystore-explorer.org/

                  Getting Started

Start KeyStore Explorer and select Create a new KeyStore or, if you already have a keystore, use Open an existing KeyStore

                  For the KeyStore type select JKS

                  Generating a Root Certificate Authority

                  The first step is to create a valid Root Certificate Authority that will be used to sign all end-user or intermediate CA certificates

                  Right-click within the KeyStore workspace to open the context menu and select the Generate Key Pair option from the menu

                  Select RSA as the Algorithm and select a Key Size (typically 2048)

                  After clicking on OK, most of the certificate details will already be pre-populated but you can change the Signature Algorithm, Validity and Serial Number as required.

                  Click on the Edit Name button

                  Complete the Certificate Subject details as necessary (in the example above, we're providing a Common Name, an Organization Unit and an Organization Name), then click OK to save these details. You will see these are now listed under the Name field for the certificate.

                  Click on the Add Extensions button

                  For a Certificate Authority the Basic Constraints and Key Usage extensions are required.

                  Click the Green + button

                  Select the Key Usage Extension

                  Select the Certificate Signing and CRL Sign attributes. With these selected, click OK

                  Click the Green + button again and this time, select the Basic Constraints Extension

                  Check the Subject is a CA box and click OK

                  Verify that both the Key Usage and the Basic Constraints Certificate Extensions are now listed and click OK

                  Click OK to complete the Root CA certificate generation

                  Assign an Alias to the newly created key

                  Enter a password to protect the private key

                  At this point the Root CA Certificate has been created

                  Importing the Root Certificate Authority into MarkLogic

                  Before you can import the Root Certificate into MarkLogic you will first need to export it from the KeyStore Explorer tool in the correct format.

                  Right click on the Root CA entry in the KeyStore and select Export -> Export Certificate Chain

Select X.509 as the Export Format and check the PEM checkbox. If you have only a single Root CA certificate, select Head Only; otherwise select Entire Chain

                  Specify the filename for the exported file; in the example we are using /tmp/rootca.cer (this filename and path will be used later in this article to insert the trusted certificate into MarkLogic Server).

                  Click Export to save the Root CA certificate to a file

                  And click OK to dismiss the confirmation prompt

                  From the Query Console run the following xquery code against the Security Database:
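
A minimal sketch of such a query, using the /tmp/rootca.cer file exported in the previous step (xdmp:document-get reads the file from the filesystem of the host you are connected to):

xquery version "1.0-ml";
import module namespace pki = "http://marklogic.com/xdmp/pki"
  at "/MarkLogic/pki.xqy";

pki:insert-trusted-certificates(
  xdmp:document-get("/tmp/rootca.cer",
    <options xmlns="xdmp:document-get">
      <format>text</format>
    </options>))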

You should see an xs:unsignedLong returned by the call to pki:insert-trusted-certificates if the certificate has been inserted successfully.

                  You can check the Certificate Authorities details in the MarkLogic Admin UI on port 8001 (Configure > Security > Certificate Authorities) to ensure the Root CA certificate was added; the certificate will be listed under the Organization Name that was specified when you created the certificate

                  Using the Root Certificate Authority to Sign End-User Certificates

                  To use Certificate based authentication within MarkLogic you will need to generate and sign certificates using a Root CA certificate such as the one generated by following the steps above.

                  In Keystore Explorer, right click on the Root CA Certificate that you will be using for signing the user certificate and select Sign > Sign New Key Pair

                  As with the root certificate, the user certificate should use the same RSA Algorithm

                  As with the Root Certificate most attributes are pre-populated and can be left with the configured settings.

                  Much like the Root CA, the Name needs to be completed for a basic user certificate.

                  Enter Name details as required:

                  Fill in the Common Name, Organization Unit and Organization Name fields and click OK

Click OK to generate the user certificate

                  Enter the Alias and click OK

                  Specify a password for protecting the Private Key and click OK to generate the keypair

                  Click OK to dismiss the confirmation prompt

                  You should now see both a root certificate (rootca) and a user certificate (user1)

                  Exporting the User certificate for Certificate Based Authentication

                  There are a number of different application methods that may use certificate based authentication with MarkLogic, such as web browser access, MLCP, DHF and XCC applications.

Java-based applications can use the keystore file generated by the KeyStore Explorer tool via the javax.net.ssl.keyStore system properties

                  Accessing MarkLogic using a web browser requires the Certificate and Private Key to be imported into the web browser using the PKCS#12 format.

                  The following steps show how to export the User certificate and key into the correct format for importing to a web browser.

                  Select the user certificate in the KeyStore, right-click and select Export > Export Key Pair

                  Enter a password for the PKCS#12 file, specify a filename and path (in this example, we're using /tmp/user1_rootca_.p12) and click Export

                  Click OK to dismiss the confirmation prompt


                  You will now be able to import the PKCS#12 file into your web browser.

                  Using openssl to create separate Certificate and Private Key files from a keypair

Some applications may require that separate Certificate and Private Key files are specified. In this case, the easiest way to do this is to export a PKCS#12 file as described above and use the OpenSSL tool to split out the separate Certificate and Private Key components.

                  The example below outputs the private key and certificates from a PKCS#12 keypair using the openssl tool and these can be used to create the necessary files using a text editor:

                  $ openssl pkcs12 -in /tmp/user1_rootca_.p12
                  Enter Import Password:
                  MAC verified OK
                  Bag Attributes
                      friendlyName: user1 (rootca)
                      localKeyID: 54 69 6D 65 20 31 35 33 34 32 37 33 38 30 37 31 30 33
                  Key Attributes: 
                  Enter PEM pass phrase:
                  Verifying - Enter PEM pass phrase:
                  -----BEGIN ENCRYPTED PRIVATE KEY-----

                  Further reading

                  Introduction: MarkLogic's shared nothing architecture

                  Each node in your cluster maintains:

                  • Its own set of group level caches;
                  • A stack of application servers; 
                  • All configuration files to allow it to understand the entire topology of the cluster.

                  If you execute a query on a given host in that cluster, that host will use its own resources (CPU, RAM) to run that query.

                  Server fields

                  In addition to this you can also store data in server fields.

                  This could be very useful if you want to "pre-compute" some data that may be required again but which has an up-front cost to create (for example: creating a large number of lookups that may load large numbers of documents from disk).

                  Maps are excellent for fast lookup and retrieval and can allow you to use MarkLogic to store intermediary data; this can be especially useful when you're working on a query that has to work through a lot of steps and may need to resolve some pieces of information multiple times throughout the lifecycle of the report.

                  Caveats

                  However, if you're planning on using server-side fields, there are some important points worth noting:

• Server-side fields exist only on the host evaluating the query
• Anything stored in a map will not survive a restart of the MarkLogic process.

                  Here's a simple example of how a MarkLogic host can be used to store data in server-side maps:

                  Generate some test data

                  The following example demonstrates the creation of 1,000,000 documents. These will be loaded into 20 separate groups.
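
A minimal sketch of such a load (with illustrative names, and a smaller document count so it can run in a single transaction) might look like this:

xquery version "1.0-ml";
(: create sample documents spread evenly across 20 groups;
   a full 1,000,000-document load would normally be batched,
   for example with xdmp:spawn, rather than run in one transaction :)
for $i in 1 to 10000
let $group := ($i mod 20) + 1
return xdmp:document-insert(
  fn:concat("/test-data/group-", $group, "/doc-", $i, ".xml"),
  <doc><group>{$group}</group><id>{$i}</id></doc>)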

                  Maps and Server fields

                  For each of those groups we can retrieve a given group directly from a map stored in RAM on the server.
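
A minimal sketch of this pattern, assuming the sample documents created above: the first call for a group reads the documents from disk and caches them in a map held in a server field; later calls on the same host (and app server) return the cached entry from the map.

xquery version "1.0-ml";
(: look up a group's documents, caching the result in a server field :)
declare function local:docs-for-group($group as xs:integer)
{
  let $cache := (xdmp:get-server-field("group-cache"), map:map())[1]
  let $key := fn:string($group)
  return
    if (map:contains($cache, $key))
    then map:get($cache, $key)            (: served from the in-memory map :)
    else
      let $docs := /doc[group = $group]   (: loaded from disk on first call :)
      let $_ := map:put($cache, $key, $docs)
      let $_ := xdmp:set-server-field("group-cache", $cache)
      return $docs
};

local:docs-for-group(7)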

                  In the above example, each subsequent retrieval will return the data directly from the map after it has been fully populated and stored in the server field.

                  Further reading:

                  Introduction

                  Can MarkLogic Server be used behind a caching proxy such as Squid?

                  We present some general pointers and tips for setting up Squid, and we provide an example of how you can use MarkLogic Server to retrieve data through Squid. This example has been tested with Squid 2 (on RHEL/CentOS 5) and Squid 3 (on RHEL/CentOS 6 and 7).  

                  Installing squid

                  A simple approach is to use the yum package manager to install squid:

                  sudo yum -y install squid

                  Configure squid

                  Create the squid cache directory:

                  squid -z

                  Modify the config file

                  Line 921 of /etc/squid/squid.conf (in Squid 2.6)

                  http_port 3128 transparent

                  note the "transparent" after the http_port

                  Start squid

                  As root or with elevated rights using sudo:

                  service squid start

                  Tail the log

                  tail -f /var/log/squid/access.log

                  Troubleshooting: ensure it's not an issue with SELinux or the firewall

                  If there are issues with the response timing out, be sure to check your firewall rules and your SELinux configuration as these could be blocking the request.

                  Sample XQuery code

                  The code below demonstrates a call being made through squid out to http://www.marklogic.com:
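
A minimal sketch of such a call (with squid configured as a transparent proxy on the same host, a plain outbound request is routed through it and appears in /var/log/squid/access.log):

xquery version "1.0-ml";
xdmp:http-get("http://www.marklogic.com",
  <options xmlns="xdmp:http">
    <timeout>30</timeout>
  </options>)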

                  Http Response

You should see an HTTP 200 response, showing the HTTP request was routed through squid:

                  <response xmlns="xdmp:http">
                    <code>200</code>
                    <message>OK</message>
                    [ ... ]
                  </response>
                  

                  Introduction

mlstat is a command-line tool that monitors various aspects of MarkLogic Server performance. It runs on the MarkLogic node itself and is modeled on classic Unix tools like vmstat and mpstat. It is designed to be always on, running in the background and redirecting its output to a file. It can tag each line of output with an epoch or timestamp so the data can be correlated with an event.

                  Note: this command-line tool has been replaced by the Monitoring History functionality in MarkLogic 7 (and subsequent releases of the product). As such it is no longer under active development by MarkLogic.

                  You can learn more about the new Monitoring History features by following this link: http://docs.marklogic.com/guide/monitoring/history

                  Design

                  mlstat is a bash script that calls other tools at regular intervals, compares the data with its previous sample and normalizes it on a per second basis. The tools it uses are:

• xquery script stats.xqy to get docs inserted, forests, stands, active_merges, merge_read_bytes, merge_write_bytes, journal_write_bytes, journal_write_rate, save_write_rate, save_write_bytes, in-memory MB, and on-disk size
                  • xquery script http-server-status.xqy to get stats about a HTTP or xdbc server
                  • xquery get-hosts.xqy to get a list of hosts
                  • iostat to get disk and cpu performance data
                  • vmstat to get runnable and blocked processes, swap in/out and context switch performance data
                  • pmap to get memory sizes for anon and mapped files
• /proc to get memory sizes for the MarkLogic process
                  • ifconfig to calculate network bandwidth
                  • The MarkLogic log for interesting events such as saves and merges

                  Assumptions

                  • mlstat currently only runs on Linux machines
• mlstat assumes that iostat, pmap and vmstat are available on the system, i.e. the sysstat package has been installed
• mlstat assumes that iostat, vmstat and ifconfig are in the user's $PATH
• mlstat assumes that the xquery files stats.xqy, get-hosts.xqy and http-server-status.xqy have been installed in the MarkLogic/Admin directory. If not, mlstat will exit.
• To display database statistics, MarkLogic must be running

                  Options

                  Use the -h flag for options

                  Database stats
                  -d <database>		Database to monitor
                  -j			Journal stats
                  -s			Save Stats
                  -m			Merge stats
                  -a			In-memory and disk sizes
                  -g			Docs Ingested, Deletes, Re-indexes, stands
                  -q			Query mb
                  -c			Forest cache stats
                  -v			Verbose cache stats
                  -I			ML view of I/O
                  -B			Backup and Restore stats
                  -R			Replication Send and Receive stats
                  -l <file location>	Location of log for scraping
                  -b <filename>		Dump log events to a separate file
                  -L			Dump log events to stdout
                  -o <http-server>	Http server stats
                  -x <xdbc-server>	xdbc server stats
                  -r               	Dump stats for Replica forests not regular System stats
                  -y			Linux stats - cpu, runnables, blocked,swap
                  -n <network name>	Network interface to monitor
                  -k <disk name>		Stripe or disk to monitor
                  -A			Aggregate all the disk stats into 1 number
                  -M			Dump memory stats from pmap of the MarkLogic process (requires root)
-S			Dump memory stats from /proc of the MarkLogic process
Control
                  -U <name>		ML User name other than admin
                  -P <passwd>		ML Passwd other than admin
-f			Dump stats in comma separated list for csv
                  -e			Include Epoch per line
                  -t			Include Timestamp per line
                  -i <interval>		Set interval, default 10
                  -p <count>		Number of samples to take
                  -H <hostname>		Dumps stats for just one host in cluster
                  -N <hostname>		Run mlstat on this node, default is localhost
                  -C <comment>		Prepend this comment to each line
                  -X			Suppress headers
                  

                  Running mlstat

                  The only required flag is -d if you are tracking one of the database statistics. The -d parameter specifies which database to extract performance data from.

However, no flags means no data; there is no default set of data.

By default, mlstat prints on a 10-second interval. Use the -i flag to change this. mlstat measures the actual interval taken and uses this value for all calculated rate stats.

It is recommended to add a timestamp to each line of mlstat output, making it possible to plot results later and pinpoint performance issues (use the -e, -t or -D flags).

Due to the potential size of the Error Log, checking the log is not enabled by default. However, if you specify the -s (saves) or -m (merges) flags, the ErrorLog file will be scraped to get save and merge counts on this node.

Like other tools, mlstat dumps a header every 10 samples; to suppress this header, specify the -X flag.

Also like other tools, you can restrict the number of samples to collect with the -p flag.

                  mlstat can be run on multiple nodes at the same time. In this mode it is highly recommended to use -H <node name> to collect the data for that particular node.

                  Generating CSV output from mlstat

                  mlstat with the -f option produces a comma delimited csv file that can be used to generate graphs in Excel or other tools.

                  Sections of mlstat output

                  • If the -e flag is specified mlstat will print the Linux epoch at the start of every line. This is extremely useful for plotting data
                  • Specifying -t will convert this epoch to timestamp from the Linux date command
                  • Specifying -D will emit both a date a time, handy for tests that run over a number of days
                  • By using the -b or -L flags Save and Merge events written to the MarkLogic log can be printed by mlstat
  • Use the -b option to redirect this output to a file
  • Use -L to print this output to standard out
                  • If the ErrorLog is being written to a different location use -l to indicate this to mlstat
• If the -m or -s flags are used, the log file will be scraped to get counts of merges and saves respectively
• By specifying -H, just the ML stats for the specified node will be printed. It is important to use the fully qualified node name, e.g. foobar.example.com, as defined in the cluster, or the data cannot be extracted
• The -j option simply prints the MB/s of journal files written to disk. By default this is for all nodes; -H specifies a particular node
• The -s option prints the number of saves of in-memory stands to disk and the MB/s of Save data for the cluster. Again, by default this is for all forests; use -H for a single node
                  • The -m option prints the number of Merges currently active (A-Mergs), completed in the last period (C-Mergs) and the MBs per second Reads and Writes for merges across the entire cluster (note the Merge-rMB/s usually does not equal Disk I/O and a good percentage of the reads will be satisfied by the Linux filesystem cache)
• The -a option dumps the size of the in-memory stands and the current size of the stands on disk. Again, if -H is used then only the space for that node is displayed
                  • The -g flag dumps Docs ingested, Deleted and Re-indexed per second and current Stands in the database. The stand count will include both in-memory stands and on-disk stands
                  • The -q flag dumps the MB per second read from disk for queries. This is an approximation of query processing load
                  • With the -c flag you can dump the hit rates of the List cache (LC) and Compressed Tree Cache (CTC)
                  • The -I flag gives a view of I/O from inside the MarkLogic Server
• The -B flag measures the MB/s for backup and restore. It dumps the 512KB Read and Write ops per second for both backup and restore. It also dumps MarkLogic's internal measurement of load for these operations. Finally, based on the CPU time spent on the operation, it calculates latency
                  • The -R flag dumps the send and receive KB per second for database replication. Note this does not represent network traffic for local disk replication
• With -o or -x, stats from an HTTP or XDBC server can be dumped. By default the query rate, current count of outstanding requests, number of outstanding update requests, active threads in the server and the Extended Tree Cache (ETC) hit rate are dumped. Note we add the name of the HTTP/xdbc server to the heading of each field
• Adding the -v flag dumps statistics for the other caches (fs program cache, db program cache, env program cache, fs main module sequence cache, db main module sequence cache, fs library module cache, db library module cache). These caches do not tend to be an issue and are included for completeness.
• The -y flag dumps the breakdown of CPU time spent, runnable and blocked processes, swap in and out, and context switches per second for this node. The CPU breakdown shows percent user, nice, system, idle, iowait and steal. The nice percentage is an indication of how much CPU is being spent on merges
                  • By specifying -k <disk-name> mlstat will dump the I/O statistics on this node. The device can be a stripe such as md0 or individual disks such as /dev/xvdl. (Users can specify multiple disks using the | character. Note the | must be escaped on Linux so the command would be -k xvdl\|xvdm)
                  • If multiple disks are specified then the -A flag can be included to aggregate the data from all these disks.
                  • By specifying -n mlstat will dump the Network Bandwidth in Kbits per second
• The -M flag uses the Linux utility pmap to determine how much memory has been allocated to Anon and Memory mapped files in the MarkLogic process. For each it has two fields, memory in MB requested and the current Resident Set Size (RSS) of the allocation. The RSS indicates how much memory Linux has actually assigned.
• On some Linux systems, notably Red Hat, you need root permission to use pmap on the MarkLogic daemon process. As an alternative, most of the same data can be acquired via /proc. The -S flag uses /proc to collect RSS and process size information

                  Example usage

                  ./mlstat -d Documents -g
                  Monitor the ingest rate and stand count for the Documents database
                  ./mlstat -d Documents -g -t
                  Monitor the ingest rate and stand count for the Documents database with a timestamp on each line
                  ./mlstat -d Documents -g -t -j -m -s
                  Monitor the ingest rate, Journal MB/s, Merge read and write MB/s, Save MB/s and stand count for the Documents database with a timestamp on each line
                  ./mlstat -d Documents -g -t -j -m -s -y
                  Monitor the ingest rate, Journal MB/s, Merge read and write MB/s, Save MB/s and stand count for the Documents database. Add the cpu stats for this node. With a timestamp on each line
                  ./mlstat -d Documents -g -t -j -m -s -y -i 60
Monitor the ingest rate, Journal MB/s, Merge read and write MB/s, Save MB/s and stand count for the Documents database. Add the cpu stats for this node. With a timestamp on each line and set the interval to every 60 seconds
                  ./mlstat -d Documents -g -t -j -m -s -y -i 60 -n eth0
                  Monitor the ingest rate, Journal MB/s, Merge read and write MB/s, Save MB/s and stand count for the Documents database. Monitor the cpu stats and network eth0 for this node. With a timestamp on each line and set the interval to every 60 seconds
                  ./mlstat -d Documents -B
                  Monitor the Backup and Restore I/O for the Documents database
                  ./mlstat -d Documents -R
                  Monitor the Database replication network traffic for the Documents database
                  ./mlstat -d Documents -I
Dump MarkLogic's view of I/O for the Documents database
                  ./mlstat -k xvdl\|xvdm
                  Monitor two disks, xvdl and xvdm on this server
                  ./mlstat -k xvdl\|xvdm -A
                  Monitor two disks, xvdl and xvdm on this server but accumulate their stats
                  ./mlstat -M
                  Monitor the memory usage of the MarkLogic daemon using pmap
                  ./mlstat -S
                  Monitor the memory usage of the MarkLogic daemon using /proc
                  ./mlstat -d Documents -x 9000-xcc
                  Monitor the XDBC server 9000-xcc on the Documents database
                  ./mlstat -d Documents -x 9000-xcc -v
                  Monitor the XDBC server 9000-xcc on the Documents database and add its cache hit rates

                  Download

                  You can download all the required files for mlstat in the zip file (mlstat.zip) attached to this KnowledgeBase article (see below)

                  Introduction

MarkLogic uses the LDAP "memberOf" attribute to determine group membership for authorising access to specific security roles; however, by default the "memberOf" attribute is not enabled in an OpenLDAP server. This article will show how to enable the "memberOf" attribute in an OpenLDAP server so that MarkLogic can successfully determine group membership and authorise access to a role.

                  Configuring OpenLDAP to support the "memberOf" attribute

1. Create an LDIF file with the following contents

                  memberOf.ldif

                  dn: cn=module,cn=config
                  cn: module
                  objectClass: olcModuleList
                  objectclass: top
                  olcModuleLoad: memberof.la
                  olcModulePath: /usr/lib64/openldap

                  dn: olcOverlay=memberof,olcDatabase={2}bdb,cn=config
                  objectclass: olcconfig
                  objectclass: olcMemberOf
                  objectclass: olcoverlayconfig
                  objectclass: top
                  olcoverlay: memberof

Check that the database name assigned to "olcDatabase" is the same on your system, as different Linux distributions may use other names, e.g. hdb instead of bdb.

                  /etc/openldap/slapd.d/cn=config

drwxr-x--- 2 ldap ldap 40 Nov 28 18:13 olcDatabase={2}bdb

Check that the OpenLDAP library name and path (olcModuleLoad, olcModulePath) are valid; as with the database name, this can vary with different Linux distributions.

                  2.  Issue the following command to add "memberOf" support and restart OpenLDAP

                  ldapadd -Y EXTERNAL -H ldapi:/// -f /tmp/memberof.ldif

If ldapi:/// is not active on the system, add the following parameter to the OpenLDAP configuration in "/etc/sysconfig/ldap" and restart OpenLDAP first.

                  SLAPD_LDAPI=yes

3. You should now be able to add users to LDAP and assign them as members of the required LDAP groups; OpenLDAP will then add the "memberOf" attribute to their LDAP entries, e.g.


                  dn: uid=appadmin,ou=Users,dc=MarkLogic,dc=Local
                  objectClass: top
                  objectClass: person
                  objectClass: organizationalPerson
                  objectClass: inetOrgPerson
                  cn: appadmin
                  sn: MarkLogic App Admin
                  uid: appadmin
                  userPassword:: cGFzc3dvcmQ=

                  dn: cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local
                  objectClass: top
                  objectClass: groupOfNames
                  cn: AppAdmin
                  member: uid=appadmin,ou=Users,dc=MarkLogic,dc=Local

                  Note

OpenLDAP adds "memberOf" to the user's LDAP entry as an operational attribute; as such, it will not be visible when using a normal "ldapsearch" command, e.g.

                  ldapsearch   -x -h localhost -D "cn=Manager,dc=MarkLogic,dc=local" -W -b "uid=appadmin,ou=Users,dc=MarkLogic,dc=Local"

                  In order to display the operational attributes add an additional "+" parameter to the end of the search command, e.g.

                  [admin@kerberos tmp]# ldapsearch -x -h localhost -D "cn=Manager,dc=MarkLogic,dc=local" -W -b "uid=appadmin,ou=Users,dc=MarkLogic,dc=Local" +
                  Enter LDAP Password:
                  # extended LDIF
                  #
                  # LDAPv3
                  # base <uid=appadmin,ou=Users,dc=MarkLogic,dc=Local> with scope subtree
                  # filter: (objectclass=*)
                  # requesting: +
                  #

                  # appadmin, Users, MarkLogic.Local
                  dn: uid=appadmin,ou=Users,dc=MarkLogic,dc=Local
                  structuralObjectClass: inetOrgPerson
                  entryUUID: 299e5620-49e2-1036-8750-e5471d51d42c
                  creatorsName: cn=Manager,dc=MarkLogic,dc=Local
                  createTimestamp: 20161128181352Z
                  entryCSN: 20161128181352.705449Z#000000#000#000000
                  modifyTimestamp: 20161128181352Z
                  memberOf: cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local
                  modifiersName: cn=Manager,dc=MarkLogic,dc=Local
                  entryDN: uid=appadmin,ou=Users,dc=MarkLogic,dc=Local
                  subschemaSubentry: cn=Subschema
                  hasSubordinates: FALSE

                  Additional reading

                  Assigning external LDAP group membership to a Role.

                  OpenLDAP memberOf overlay manual

                  Abstract

                  perf is a tool that can help analyze the performance of a process on Linux.

                  Discussion

                  Often, pstack is recommended as a way of understanding the behavior of MarkLogic Server. pstack works by pausing the MarkLogic process and printing out an execution stack trace; this effectively gives a point-in-time view of what the process is doing. When repeated over time, this gives a time-sliced view of the process.

                  Another tool for understanding performance is perf. perf works by sampling the process, rather than pausing it, and so is more lightweight than pstack, and it gives a statistical view of the process execution.

                  A typical use would be recording the activity for a minute or so when the CPU is high.

                  Installation

                  To install on RHEL/CentOS:

                  yum install perf

                  Use

                  To record and report the data (must be run as root/sudo):

                   cd /tmp

                  then

                   perf record -F 99 -ag -- sleep 60

                  This will take about one minute and write a file called perf.data in your working directory. Then, create a report as follows:

                   perf report --header -I --stdio > `hostname -s`.`date -Is`.report 

This reads perf.data and creates a readable report in a file named <hostname>.<date>.report, which can be submitted for analysis.  Please also run

                   perf script > `hostname -s`.`date -Is`.script

and submit the resulting <hostname>.<date>.script file.  The file may be large but should compress well.

                  A quick way to find what calls are taking the most time is to run

                   perf top

                  Note: All the steps outlined above need to be executed on the same MarkLogic host

                  See also

                  Introduction

                  In this Knowledgebase article, we will show you how the rebalancer can be used to migrate data from a filesystem mount to another separate mount.

This scenario could occur, for example, if you created some forests without specifying a data directory and a new volume was later added.

                  Understanding Forest configuration

                  It's worth noting that you can't modify a forest's data directory location after the forest has been created, so if a forest is created and you later realise that the data directory path was incorrect, the fastest course of action to remedy the issue is to delete the current configuration and to create a new forest with the correct (and updated) configuration.

                  Scenario: migrating all forest content from one location to a new location

                  In this scenario, we are going to make use of MarkLogic's rebalancer to get it to migrate the data held in two forests onto two other forests. In this scenario, we are working on the premise that there is a database (called MyDatabase) which contains two forests: these forests happen to have the default data directory specified.

                  We want to migrate all the data into two new forests on a different mount point. In this example, I'm demonstrating this through the use of a different directory, but the principle remains the same.

                  We will run through this scenario step-by-step to show you how to migrate data from one location to another.

                  1. Identify the database that contains the data that you want to move

                  In this situation, we're using an example database called MyDatabase

                  2. Ensure that the rebalancer is enabled for this database

                  In the admin GUI on port 8001 we are going to go to Configure > Databases > MyDatabase and then we're going to scroll down the options until we see the one for "rebalancer enable". This needs to be set to true.

                  3. Make a note of the current forests for that database

                  In the admin GUI on port 8001 we are going to go to Configure > Databases > MyDatabase and then we're going to go to the "Status" tab:

                  Note that we have two forests listed: Forest-1 and Forest-2. These are the forests that will be getting retired and removed from service.

                  4. Create your new forests on the new mountpoint

                  In this case, we have created two new forests: NewForest-1 and NewForest-2 and in both cases, we've specified a new filesystem location.  In the example below I've called this C:\MarkLogicData to demonstrate the process:

                  Note that you can go to this view in the admin GUI by going to Configure > Forests and by looking at the content in the Summary tab.

                  Also note that at this stage, these forests have been created but they're not listed as being attached to any database; this is indicated by the blank entry in the dropdown menus next to the forest names.
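
If you prefer to script this step, a minimal sketch using the Admin API (with the forest names and data directory from this example) creates the two new forests on the new mount point:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: create the two new forests on the new data directory :)
let $config := admin:get-configuration()
let $config := admin:forest-create($config, "NewForest-1", xdmp:host(), "C:\MarkLogicData")
let $config := admin:forest-create($config, "NewForest-2", xdmp:host(), "C:\MarkLogicData")
return admin:save-configuration($config)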

                  5. Review the current forest configuration of your database

                  You can see the status of the current configuration in the admin GUI on port 8001 by selecting Configure > Databases > MyDatabase > Forests

Note that you should see your two current forests (Forest-1 and Forest-2) listed as "attached" and immediately below that you should see that there are two forests that are not yet attached to any database (NewForest-1 and NewForest-2):

                  6. Retire the current forests and attach the new forests to the database

                  Ensure that the original forests (Forest-1 and Forest-2) are now set to "retired" using the checkboxes and ensure the new forests (NewForest-1 and NewForest-2) are now attached to this database so the rebalancer can migrate all the data from the original forests to the newly added forests:

                  7. Confirm that the rebalancer is now operational on the database status page

                  In the admin GUI on port 8001 go back to Configure > Databases > MyDatabase and then look at the Status tab:

                  Note that you should see information listed under the heading "Rebalancing State"; this should give you an indication on how long MarkLogic Server expects the operation to take and how many fragments need to be migrated out.

You should also see 4 forests listed; the original forests should now show fewer documents than before and a number of deleted fragments, whereas you should see the document counts on the newly added forests beginning to increase.

                  8. Confirm that the original forests are now empty 

                  When the process is complete, the Rebalancing State should read "Not rebalancing" and you should see that your original 2 forests now list 0 documents:  

                  At this stage, we can see that the rebalancing work is done and the retired forests are now safe to remove from the system.
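
To double-check from Query Console, a minimal sketch (run against MyDatabase, with the forest names from this example) counts the documents - strictly, fragments - held in each forest; the retired forests should report 0:

xquery version "1.0-ml";
(: count per forest; cts:and-query(()) matches everything :)
for $name in ("Forest-1", "Forest-2", "NewForest-1", "NewForest-2")
return fn:concat($name, ": ",
  xdmp:estimate(cts:search(fn:doc(), cts:and-query(()), (), (), xdmp:forest($name))))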

                  9. Detach the original forests from your database

                  In the admin GUI on port 8001, go back to the Forest Configuration page for the database (MyDatabase in this example) by selecting Configure > Databases > MyDatabase > Forests

                  You can now uncheck the attached checkboxes for both of the original (now retired) forests.  Save the changes with the "ok" button:

                  10. Confirm that your database only lists the new forests on the status page

Note that we should only see the two new forests (NewForest-1 and NewForest-2) listed this time:

                  Introduction

Special care may need to be taken when loading documents into MarkLogic Server where the document URI contains one or more special characters.  In this article, we will walk through a scenario where exceptions are thrown if such a URI is not handled properly, and then discuss how to handle such URIs. The article makes use of built-in functions (and the encode method of the java.net.URLEncoder class), showcasing their usage through a couple of samples created with XCC/J.

                  Relationship between URI and URL

                  A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. The most common form of URI is the Uniform Resource Locator (URL).

                  A URL is a URI that, in addition to identifying a web resource, specifies the means of acting upon or obtaining the representation, specifying both its primary access mechanism and network location. For example, the URL 'http://example.org/wiki/Main_Page' refers to a resource identified as /wiki/Main_Page whose representation, in the form of HTML and related code, is obtainable via HyperText Transfer Protocol (http) from a network host whose domain name is example.org

                  While it is possible to load documents into MarkLogic Server, where the document URI contains special characters not encoded, it is recommended to follow best practices by URL encoding document URIs as it will help you design robust applications, free from the side effects caused by such special characters in other areas of your application stack.

                  Importance of URL encoding

                  URL encoding is often required to convert special characters (such as "/", "&", "#", ...), because special characters: 

                  1. have special meaning in some contexts; or
2. are not valid characters for a URL; or
                  3. could be altered during transfer. 

                  For instance, the "#" character needs to be encoded because it has a special meaning of that of an html anchor. The <space> character needs to be encoded because it is not a valid URL character. Also, some characters, such as "~" might not transport properly across the internet.

                  Consider the example where a parameter is supplied in a URL and parameter value has a special character in it, such as,

                  • Parameter is "movie1" and its value is "Fast & Furious"

The parameter may be submitted via a URL such as "http://www.awebsite.com/encodingurls/submitmoviename.html?movie1=Fast & Furious". In this example, the space and & characters need to be handled specially; otherwise the URL may not be interpreted properly - for example, the associated GET request may fail.

These characters can be encoded as follows:
                        Space as '%20' or '+'
                        '&' as '%26'

                  And thus the URL, after encoding, would look like 'http://www.awebsite.com/encodingurls/submitmoviename.html?movie1=Fast+%26+Furious'.
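
Within MarkLogic itself, the built-in xdmp:url-encode function performs this kind of encoding; a minimal sketch (the input string is illustrative):

xquery version "1.0-ml";
(: returns the value in URL-encoded form; the ampersand becomes %26.
   &amp; is the XQuery escape for a literal & inside a string literal :)
xdmp:url-encode("Fast &amp; Furious")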

                   What is URL encoding?

URL Encoding is the process of converting a string into a valid URL format. Valid URL format means that the URL contains only "alpha | digit | safe | extra | escape" characters. For URL specifications, there are various established standards, including the W3C standards listed below:

                  1. http://www.w3.org/Addressing/URL/url-spec.html
                  2. http://www.w3.org/International/francois.yergeau.html 

                  Safe and unsafe characters

                  Based on Web Standards, the following quick reference chart explains which characters are “safe” and which characters should be encoded in URLs. 

• Safe characters: alphanumerics [0-9a-zA-Z], the special characters $-_.+!*'(), and reserved characters used for their reserved purposes (e.g., a question mark used to denote a query string). Encoding required: NO
• ASCII control characters: the ISO-8859-1 (ISO-Latin) character ranges 00-1F hex (0-31 decimal) and 7F (127 decimal). Encoding required: YES
• Non-ASCII characters: the entire “top half” of the ISO-Latin set, 80-FF hex (128-255 decimal). Encoding required: YES
• Reserved characters: $ & + , / : ; = ? @ (not including blank space). Encoding required: YES*
• Unsafe characters: the blank/empty space and " < > # % { } | \ ^ ~ [ ] `. Encoding required: YES

                   * Note: Reserved characters only need encoding when not used for their defined, reserved purposes.

For complete details on these character classifications, please see RFC 1738.

                  Walkthrough of an example Scenario using XCC/J

                  Let's take a look at a sample created to connect to MarkLogic Server using the XCC/J connector. 

We will start with a case where a special character in a document URI is not handled properly while loading the document into MarkLogic Server. Next, we will resolve the issue by using URL encoding.

                  Consider the following code:

In the code, we create a new AdhocQuery that calls xdmp:document-insert, passing in the URI containing the special character. The request is submitted inside a try-catch block to handle any exception thrown while submitting it.

Running this code produces the exception below:

                  Full adHocQuery being executed: xdmp:document-insert("&.xml", <test/>)
                  com.marklogic.xcc.exceptions.XQueryException: XDMP-ENTITYREF: (err:XPST0003) Invalid entity reference ".xml"
                  [Session: user=[user], cb={default} [ContentSource: user=admin, cb={none} [provider: address=localhost/127.0.0.1:8000, pool=1/64]]]
                  [Client: XCC/8.0-1, Server: XDBC/8.0-1.1]
                  in /eval, on line 1
                  expr:

Notice that the '&' character is not present in the exception trace because '&' is a special character and was not handled properly. To resolve this issue, we can use the encode method of the java.net.URLEncoder class to encode these characters. Now consider the example below:

As you can see in the example, we handle the URI containing the special character by encoding it first:
                  String badUri = "&.xml"; 
                  String goodUri = URLEncoder.encode(badUri, "UTF-8");

Running this code will successfully load the document with the encoded URI %26.xml.

                  Another example for scenario using curl

In this example, we are using curl to load a simple XML document with a URI containing a special character (ム). The scenario is similar to the one above, but this time we are using curl to load the document into MarkLogic.

                  Consider the following curl command:

                  curl --anyauth --user username:password -X PUT -T ./test.xml -i -H "Content-type: application/xml" http://localhost:8000/v1/documents?uri=/%e3%83%a0.xml 

                  Here are the contents of test.xml:  <test><sample>test 1</sample></test>

Running the above curl command to load a simple XML document with a URI containing the special character (ム) fails with "400 Bad Request":

                  {"errorResponse":{"statusCode":400, "status":"Bad Request", "messageCode":"REST-INVALIDPARAM", "message":"REST-INVALIDPARAM: (err:FOER0000) Invalid parameter: invalid uri: /πâá.xml"}}

                  To resolve this issue, we can use the --data-urlencode option provided by curl to encode data.

                  Now consider below example,

curl --anyauth --user username:password -X PUT -T ./test.xml -i -H "Content-type: application/xml" http://localhost:8000/v1/documents --data-urlencode uri=/%e3%83%a0.xml -G

--data-urlencode is used to encode the uri parameter, and -G appends that data to the request URL

Running this command will successfully load the document with the encoded URI /%e3%83%a0.xml

                  Conclusion

                  While it is possible to load documents into MarkLogic Server, where the document URI contains special characters not encoded, it is recommended to follow best practices by URL encoding document URIs as it will help you design robust applications, free from the side effects caused by such special characters in other areas of your application stack.

                  References

                  I. http://www.permadi.com/tutorial/urlEncoding/

                  II. http://perishablepress.com/stop-using-unsafe-characters-in-urls/

                  III. http://www.ietf.org/rfc/rfc3986.txt RFC3986 on URI

                  IV. http://www.ietf.org/rfc/rfc1738.txt RFC1738 on URL

                  V. http://developer.marklogic.com/products/xcc

                  VI. http://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html

                  VII. http://en.wikipedia.org/wiki/Uniform_resource_identifier#The_relationship_between_URIs.2C_URLs.2C_and_URNs

                  VIII. https://ec.haxx.se/http-post.html

                   

                  Introduction

                  A document uniform resource identifier (URI) is a string of characters used to identify a document stored in MarkLogic Server. This article describes which characters MarkLogic 8 supports in a document URI.

                  ASCII

                  MarkLogic 8 allows all printable ASCII characters (decimal range 32-126) to be used in a document URI.

                  List of allowed special characters within ASCII range

                  <space> ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~

                  Please note that the ASCII space character (decimal 32) can be used; however, it should not appear as the first or last character of a URI.

                  Other Character Sets

                  MarkLogic Server supports UTF-8 encoding. Apart from the ASCII characters mentioned above, any valid UTF-8 character can be used within a document URI in MarkLogic Server.

                  Examples include: decimal range 384-591, representing Latin Extended-A; and decimal range 880-1023, representing Greek and Coptic.
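
                  As a minimal illustrative sketch (the URI and element name are made up), such a character can be used in a document URI directly:

                  xdmp:document-insert("/Ωμέγα.xml", <example/>)
                  (: once the transaction commits, fn:doc("/Ωμέγα.xml") returns the document :)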

                  External Considerations

                  A few interfaces (such as XCC/J) and datatypes may place additional restrictions on the characters allowed in a MarkLogic document URI. For example, the xs:anyURI datatype restricts the use of & (decimal 38) and < (decimal 60). Consider the following scenario.

                  A schema is loaded into the database and validation is applied before an XML document is inserted into the database.

                  A query that attempts to insert a document whose URI contains a special character such as & fails with the error listed below:

                  [1.0-ml] XDMP-DOCENTITYREF: xdmp:unquote("<?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?>&#10;<...") -- Invalid entity reference "." at line 2

                   

                  To resolve this issue, the xdmp:url-encode function can be used. For example:

                  let $n := 38   (: codepoint of the special character to encode, e.g. 38 for '&' :)
                  let $node := xdmp:unquote(fn:concat(
                    '<?xml version="1.0" encoding="UTF-8"?>',
                    '<tns:simpleuri xmlns:tns="http://www.example.org/uri" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.org/uri uri.xsd">',
                    xdmp:url-encode(fn:codepoints-to-string($n)),
                    '.org</tns:simpleuri>'))
                  return $node

                  The MarkLogic knowledge base article "Using URL encoding to handle special characters in a document URI" explains the recommended approach for safely handling special characters (URL encoding). A document URI containing special characters, as described in that article, should be encoded before it is inserted into MarkLogic 8.

                  Summary

                  While it is possible to load documents into MarkLogic Server with document URIs containing unencoded special characters, it is recommended to follow best practice and URL encode document URIs. Doing so helps you design robust applications that are free from the side effects such special characters can cause in other areas of your application stack.

                  Additional References

                  ISO/IEC 8859-1

                  W3Schools: HTML Unicode (UTF-8) Reference

                   

                  Introduction

                  MarkLogic Server is a highly scalable, high performance Enterprise NoSQL database platform. Configuring a MarkLogic cluster to run as virtual machines follows tuning best practices associated with highly distributed, high performance database applications. Avoiding resource contention and oversubscription is critical for maintaining the performance of the cluster. The objective of this guide is to provide a set of recommendations for configuring virtual machines running MarkLogic for optimal performance. This guide is organized into sections for each computing resource, and provides a recommendation along with the rationale for that particular recommendation. The contents of this guide are intended for best practice recommendations and are not meant as a replacement for tuning resources based on specific usage profiles. Additionally, several of these recommendations trade off performance for flexibility in the virtualized environment.

                  General

                  Recommendation: Use the latest version of Virtual Hardware

                  The latest version of Virtual Hardware provides performance enhancements and higher resource maximums than older Virtual Hardware versions. Be aware that you may have to update the host, cluster, or data center. For example, ESXi 7.0 introduces virtual hardware version 17, but VMs imported or migrated from older versions may not be automatically upgraded.

                  Recommendation: Use paravirtualized device drivers in the guest operating system

                  Paravirtualized hardware provides advanced queuing and processing off-loading features to maximize Virtual Machine performance. Additionally, paravirtualized drivers batch interrupts and requests to the physical hardware, which provides optimal performance for resource-intensive operations.

                  Recommendation: Keep VMware Tools up to date on guest operating systems

                  VMware Tools provides guest OS drivers for paravirtual devices that optimize the interaction with VMkernel and offload potentially processor-intensive tasks such as packet segmentation.

                  Recommendation: Disable VMWare Daemon Time Synchronization of the Virtual Machine

                  By default, the VMware daemon synchronizes the Guest OS clock to the Host OS (hypervisor) once per minute, which may interfere with ntpd or chronyd settings. Through the vSphere Admin UI, you can disable time synchronization between the Guest OS and Host OS in the virtual machine settings.

                  VMWare Docs: Configuring Virtual Machine Options

                  Recommendation: Disable Time Synchronization during VMWare operations

                  Even when daemon time synchronization is disabled, time synchronization will still occur during some VMware operations, such as Guest OS boots/reboots and resuming a virtual machine, among others. Disabling VMware clock synchronization completely requires editing the .vmx file for the virtual machine to set several synchronization properties to false. Details can be found in the following VMware blog:

                  VMWare Blog: Completely Disable Time Synchronization for your VM

                  Recommendation: Use the noop scheduler for VMWare instances rather than deadline

                  The NOOP scheduler is a simple FIFO queue and uses the minimal amount of CPU/instructions per I/O to accomplish the basic merging and sorting functionality to complete the I/O.

                  Red Hat KB: IO Scheduler Recommendations for Virtualized Hosts

                  Recommendation: Remove any unused virtual hardware devices

                  Each virtual hardware device (Floppy disks, CD/DVD drives, COM/LPT ports) assigned to a VM requires interrupts on the physical CPU; reducing the number of unnecessary interrupts reduces the overhead associated with a VM.

                  Processor                                                                                                     

                  Socket and Core Allocation

                  Recommendation: Only allocate enough vCPUs for the expected server load, keeping in mind the general recommendation is to maintain two cores per forest.

                  Rationale: Context switching between physical CPUs for instruction execution on virtual CPUs creates overhead in the hypervisor.

                  Recommendation: Avoid oversubscription of physical CPU resources on hosts with MarkLogic Virtual Machines. Ensure proper accounting for hypervisor overhead, including interrupt operations, storage network operations, and monitoring overhead, when allocating vCPUs.

                  Rationale: Oversubscription of physical CPUs can cause contention for processor-intensive operations in MarkLogic. Proper accounting ensures adequate CPU resources are available for both the hypervisor and any MarkLogic Virtual Machines.

                   

                  Memory                                                                                                      

                  General

                  Recommendation: Set up memory reservations for MarkLogic Virtual Machines.

                  Rationale: Memory reservations guarantee the availability of Virtual Machine memory when leveraging advanced vSphere functions such as Dynamic Resource Scheduling. Creating a reservation reduces the likelihood that MarkLogic Virtual Machines will be vMotioned to an ESX host with insufficient memory resources.

                  Recommendation: Avoid combining MarkLogic Virtual Machines with other types of Virtual Machines.

                  Rationale: Overcommitting memory on a single ESX host can result in swapping, causing significant performance overhead. Additionally, memory optimization techniques in the hypervisor, such as Transparent Page Sharing, rely on Virtual Machines running the same operating systems and processes.

                  Swapping Optimizations

                  Recommendation: Configure VM swap space to leverage host cache when available.

                  Rationale: During swapping, leveraging the ESXi host's local SSD for swap will likely be substantially faster than using shared storage. This is unavailable when running a Fault Tolerant VM or using vMotion, but will provide a performance improvement for VMs in an HA cluster.

                  Huge / Large Pages

                  Recommendation: Configure Huge Pages in the guest operating system for Virtual Machines.

                  Rationale: Configuring Huge Pages in the guest operating system for a Virtual Machine prioritizes swapping of other memory first.

                  Recommendation: Disable Transparent Huge Pages in Linux kernels.

                  Rationale: The transparent Huge Page implementation in the Linux kernel includes functionality that provides compaction. Compaction operations are system level processes that are resource intensive, potentially causing resource starvation to the MarkLogic process. Using static Huge Pages is the preferred memory configuration for several high performance database platforms including MarkLogic Server.

                   

                  Disk                                                                                                                        

                  General

                  Recommendation: Use Storage IO Control (SIOC) to prioritize MarkLogic VM disk IO.

                  Rationale: Several operations within MarkLogic require prioritized, consistent access to disk IO. Implementing a SIOC policy will help guarantee consistent performance when multiple VMs contend for disk access over shared links.

                  Recommendation: When possible, store VMDKs with MarkLogic forests on separate aggregates and LUNs.

                  Rationale: Storing data on separate aggregates and LUNs will reduce disk seek latency when IO intensive operations are taking place – for instance multiple hosts merging simultaneously.

                  Disk Provisioning

                  Recommendation: Use Thick Provisioning for MarkLogic data devices

                  Rationale: Thick provisioning prevents oversubscription of disk resources.  This will also prevent any issues where the storage appliance does not automatically reclaim free space, which can cause writes to a LUN to fail.

                  NetAPP Data ONTAP Discussion on Free Space with VMWare

                  SCSI Adapter Configuration

                  Recommendation: Allocate a SCSI adapter for guest operating system files and database storage independently. Additionally, add a storage adapter per tier of storage being used when configuring MarkLogic (i.e., an isolated adapter with a virtual disk for fast data directory).

                  Rationale: Leveraging two SCSI adapters provides additional queuing capacity for high IOPS virtual machines. Isolating IO also allows tuning of data sources to meet specific application demands.

                  Recommendation: Use paravirtualized SCSI controllers in Virtual Machines.

                  Rationale: Paravirtualized SCSI controllers reduce management overhead associated with operation queuing.

                  Virtual Disks versus Raw Device Mappings

                  Recommendation: Use Virtual Disks rather than Raw Device Mappings.

                  Rationale: VMFS provides optimized block alignment for virtual machines. Ensuring that MarkLogic VMs are placed on VMFS volumes with sufficient IO throughput and dedicated physical storage reduces management complexity while optimizing performance.

                  Multipathing

                  Recommendation: Use round robin multipathing for iSCSI, FCoE, and Fibre Channel LUNs.

                  Rationale: Multipathing allows the usage of multiple storage network links; using round robin ensures that all available paths will be used, reducing the possibility of storage network saturation.

                  vSphere Flash Read Cache

                  Recommendation: Enabling vSphere Flash Read Cache can enhance database performance. When possible, a Cache Size of 20% of the total database size should be configured.

                  Rationale: vSphere Flash Read Cache provides read caching for commonly accessed blocks. MarkLogic can take advantage of localized read cache for many operations including term list index resolution. Additionally, offloading read requests from the backing storage array reduces contention for write operations.

                   

                  Network                                                                                                          

                  General

                  Recommendation: Use a dedicated physical NIC for MarkLogic cluster communications and a separate NIC for application communications. If multiple NICs are unavailable, use separate VLANs for cluster and application communication.

                  Rationale: Separating communications ensures optimal bandwidth is available for cluster communications while spreading the networking workload across multiple CPUs.

                  Recommendation: Use dedicated physical NICs for vMotion traffic on ESXi hosts running MarkLogic. If additional physical NICs are unavailable, move vMotion traffic to a separate VLAN.

                  Rationale: Separating vMotion traffic onto separate physical NICs, or at the very least a VLAN, reduces overall network congestion while providing optimal bandwidth for cluster communications. Additionally, NIOC policies can be configured to ensure resource shares are provided where necessary.

                  Recommendation: Use dedicated physical NICs for IP storage if possible. If additional physical NICs are unavailable, move IP storage traffic to a separate VLAN.

                  Rationale: Separating IP storage traffic onto separate physical NICs, or at the very least a VLAN, reduces overall network congestion while providing optimal bandwidth for cluster communications. Additionally, NIOC policies can be configured to ensure resource shares are provided where necessary.

                  Recommendation: Use Network I/O Control (NIOC) to prioritize MarkLogic inter-cluster communication
                  traffic.

                  Rationale: Since MarkLogic is a shared-nothing architecture, guaranteeing consistent network communication between nodes in the cluster provides consistent and optimal performance.

                   

                  Network Adapter Configuration

                  Recommendation: Use enhanced vmxnet virtual network adapters in Virtual Machines.

                  Rationale: Enhanced vmxnet virtual network adapters can leverage both Jumbo Frames and TCP Segmentation Offloading to improve performance. Jumbo Frames allow for an increased MTU, reducing TCP header transmission overhead and CPU load. TCP Segmentation Offloading allows packets up to 64KB to be passed to the physical NIC for segmentation into smaller TCP packets, reducing CPU overhead and improving throughput.

                  Jumbo Frames

                  Recommendation: Use jumbo frames with MarkLogic Virtual Machines, ensuring that all components of the physical network support jumbo frames and are configured with an MTU of 9000.

                  Rationale: Jumbo frames increase the payload size of each TCP/IP frame sent across a network. As a result, the number of packets required to send a set of data is reduced, reducing overhead associated with the header of a TCP/IP frame. Jumbo frames are advantageous for optimizing the utilization of the physical network. However, if any components of the physical network do not support jumbo frames or are misconfigured, large frames are broken up and fragmented causing excessive overhead.

                   


                  Analyzing Resource Contention                                                                               


                  Processor Contention

                  Virtual Machines with CPU utilization above 90% and CPU ready metrics of 20% or higher are experiencing CPU contention; in practice, any CPU ready time at all indicates some level of contention.

                  The key metric for processor contention is %RDY.

                  Memory Contention

                  Interpreting memory contention metrics requires an understanding of VMware memory management techniques:

                  • Transparent Page Sharing
                    - Enabled by default in the hypervisor
                    - Deduplicates memory pages between virtual machines on a single host

                  • Balloon Driver
                    - Leverages the guest operating system's internal swapping mechanisms
                    - Implemented as a high-priority system process that balloons to consume memory, forcing the operating system to swap older pages
                    - Indicated by the MEMCTL/VMMEMCTL metric

                  • Memory Page Compression
                    - Only enabled when memory becomes constrained on the host
                    - Breaks large pages into smaller pages, then compresses the smaller pages
                    - Can generate up to a 2:1 compression ratio
                    - Increases processor load when reading and writing pages
                    - Indicated by the ZIP metric

                  • Swapping
                    - Hypervisor-level swapping
                    - Swapping usually happens to the vswp file allocated for the VM
                    - The vswp file can be stored with the VM or in a custom area (for instance, local disk on the ESXi host)
                    - Can use SSD Host Cache for faster swapping, but swapping is still very bad for performance
                    - Indicated by the SWAP metric

                   Free memory metrics less than 6% or memory utilization above 94% indicate VM memory contention.


                  Disk Contention

                  Disk contention exists if the value for kernelLatency exceeds 4ms or deviceLatency exceeds 15ms. Device latencies greater than 15ms indicate an issue with the storage array, potentially an oversubscription of LUNs being used by VMFS or RDMs on the VM, or a misconfiguration in the storage processor. Additionally, a
                  queueLength counter greater than zero may indicate a less than optimal queue size set for an HBA or queuing on the storage array.

                  Network Contention

                  Dropped packets, indicated by the droppedTx and droppedRx metrics, are usually a good sign of a network bottleneck or misconfiguration.

                  High latency for a virtual switch configured with load balancing can indicate a misconfiguration in the selected load balancing algorithm. Particularly if using the IP-hash balancing algorithm, check the switch to ensure all ports on the switch are configured for EtherChannel or 802.3ad. High latency may also indicate a
                  misconfiguration of jumbo frames somewhere along the network path. Ensure all devices in the network have jumbo frames enabled.

                  References                                                                                                                                        

                  VMware Performance Best Practices for vSphere 5.5 - http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.5.pdf

                  VMware Resource Management - http://pubs.vmware.com/vsphere-51/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-51-resource-management-guide.pdf

                  Summary

                  In releases 8.0-5.3 and 7.0-6.4, we've added code to detect lagged operations and log warnings.

                  Fast Query Timestamp

                  Every forest has a notion of a "fast query timestamp", also sometimes referred to as a "nonblocking timestamp". This is the maximum timestamp at which a query can run without waiting for the forest's timestamp to advance; it indicates the most current time at which the forest has complete state to answer a query. There are several reasons for forests to have this timestamp.

                  The first has to do with transaction commits, during which the forest places a finger on the commit timestamp for the duration of the commit. The point of this is to ensure that queries perceive committed transactions to be atomic. There can be multiple (even many) transactions with a finger on various timestamps at any given point in time.

                  The second has to do with asynchronous database replication, in which case each replicated journal frame is accompanied by an appropriate fast query timestamp from the master forest, sampled when the frame was journaled. The forest in the replica database will advance its fast query timestamp to track the journal stream. If replication is interrupted for some reason, the timestamp will stay fixed until replication resumes.

                  There is now code to detect and warn that a forest's fast query timestamp has lagged the cluster commit timestamp by an excessive amount. For forests in a master database, this means 30 seconds. For forests in a replica database, the warning starts at 60 seconds. The complaint frequency automatically backs off to once every 5 minutes for each forest, with one final warning when the lag is no longer excessive to properly identify when the issue was resolved. The text of the warning looks like

                  2016-09-09 10:37:01.225 Warning: Forest content-db-001-1 fast query timestamp (14734353609140210) lags commit timestamp (14734354209281070) by 60014 ms

                  This warning helps flag problems, such as overly long transactions that can hold up queries, earlier rather than later.

                  Journaling

                  There are times when it takes a very long time to write a journal frame, which may result in a lagged timestamp. Reasons can include underprovisioned disk, oversubscribed VM environments, VM migration, etc. These incidents will now get flagged by a new warning message like the following whenever writing a journal frame exceeds 30 seconds:

                  2016-08-22 21:52:18.636 Warning: forest content-db-f1 journal frame took 38882 ms to journal: {{fsn=99181221, chksum=0xbc959270, words=33}, op=commit, time=1471917138, mfor=15947305669564640543, mtim=14719107644243560, mfsn=99181221, fmcl=16964678471847070106, fmf=7272939609350931075, fmt=14445621385518980, fmfsn=103616323, sk=13614815415239633478, pfo=233342552}

                  Canary Thread

                  Another addition is a canary thread that wakes up each second, checks how long it was asleep, and warns if it was longer than 10 seconds. That message looks like

                  2016-09-09 10:37:01.225 Warning: Canary thread sleep was 12345 ms

                  Further Reading

                  Information on these and other events, including how to control the time limits:

                  MarkLogic Knowledgebase - IO Statistics: New performance trace events

                  Information on database replication lag:

                  MarkLogic Knowledgebase - Database Replication Lag Limit Explained

                  Summary

                  Customers will sometimes run into strange issues when submitting requests to a WebDAV application server from HTTP clients.

                  Detail

                  The purpose of WebDAV application servers in MarkLogic is to support the additional HTTP methods provided by the WebDAV RFC. These extra methods include MKCOL, PROPFIND, and COPY. In order to insert a document into a directory that does not exist, the WebDAV client needs to use these HTTP methods to check if the directory exists (PROPFIND), create the directory (MKCOL / PUT), and then insert the document (PUT).

                  The WebDAV RFC states: "When the MKCOL operation creates a new collection resource, all ancestors must already exist, or the method must fail with a 409 (Conflict) status code. For example, if a request to create collection /a/b/c/d/ is made, and neither /a/b/ nor /a/b/c/ exists, the request must fail."

                  Simple HTTP clients use only a subset of the methods necessary to properly interact with a WebDAV application server; it is recommended that you only use a WebDAV-specific client against a MarkLogic WebDAV application server.

                  Introduction

                  Debugging in general is difficult - especially so if you're not using the right tools. This knowledgebase article provides a brief introduction to the tools available for debugging MarkLogic's semantic features, including the Optic API, SQL on MarkLogic Server, and SPARQL on MarkLogic Server.

                  Semantic Stack Debugging Tools

                  Unlike traditional MarkLogic searches, which do filtered or unfiltered searches after index resolution, queries in MarkLogic's semantic stack use an optimizer to generate a query plan.

                  1. For the optimizer:
                    1. You can see your current optimizer settings in your logs if you enable the "Optic Optimization" and "Optic Statistics" (or their SPARQL counterparts, "SPARQL Cost Analysis" and "SPARQL Value Frequencies") trace events. Unfortunately, you won't be able to do much with these optimizer settings on your own - but MarkLogic Support will likely ask you for that output if you're running into issues with the optimizer for further debugging on our side.
                    2. While you might not be able to change individual optimizer settings on your own, be aware that as an alternative you can remove the optimizer from the equation entirely by setting your query's optimization level to "optimize=0".
                  2. For the plan:
                    1. You can output the relevant plan programmatically with op:explain, xdmp:sql-plan, or sem:sparql-plan (see the sketch after this list).
                    2. You can also output the plan to your logs by enabling the "Optic Plan" or "SPARQL AST" trace events.
                    3. Plan output lets you know how many and what kind of joins are involved in your query, which will then enable you to consider ways to execute your query with fewer or more efficient joins if you're looking for more performance.
                  3. For requests in general, MarkLogic Server also provides Request Monitoring, which enables you to configure logging of information related to requests, including metrics collected during request execution.
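
                  As a minimal sketch of the plan output described above (the schema, view, and column names are made up and assume a corresponding TDE view exists), op:explain returns the plan the optimizer would use without executing the query:

                  xquery version "1.0-ml";
                  import module namespace op = "http://marklogic.com/optic" at "/MarkLogic/optic.xqy";

                  (: build a plan against the hypothetical Medical.Authors view and explain it :)
                  let $plan :=
                    op:from-view("Medical", "Authors")
                    => op:where(op:eq(op:col("LastName"), "Smith"))
                  return op:explain($plan)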

                  Takeaways

                  • If you're looking to debug Optic, SQL, or SPARQL application code on your own, op:explain, xdmp:sql-plan, and sem:sparql-plan are your go-to tools to see how many and what kind of joins your query is currently using.
                  • If you need additional help, MarkLogic Support will ask you to enable the "Optic Optimization", "Optic Statistics", "Optic Plan", and "Optic Execution" trace events (or their SPARQL counterparts "SPARQL Cost Analysis", "SPARQL Value Frequencies", "SPARQL AST", and "SPARQL Execution").
                  • For customers using MarkLogic 10.0-8 or later, the "Optic Summary" trace event is also available.


                  Introduction

                  The session timeout in the XDBC Admin UI page controls the timeout of the server session object. This is different from the XCC client session object. The XCC client session object does not close itself based on a timeout.

                  Details

                  Every time a request comes into an XDBC application server, the counter on its associated server session timeout is reset. For example, given a 60 second server session timeout, that 60 second counter is reset when a second request is received, regardless of how long an XCC client thread might sleep.

                  Multi-Statement Transactions

                  However, if the XCC client code instead uses multi-statement transactions and the server session times out before the transaction is committed, you would see the following error:

                  SEVERE [1] (SessionImpl.throwIllegalState): Cannot commit without an active transaction

                  Multi-statement transaction client code needs to specifically include a call to Session.setTransactionMode at the beginning, and a call to Session.commit at the end. When using multi-statement transactions, if the client thread's sleep duration is set to be less than the XDBC application server session timeout, then that request should complete without error.

                  Introduction

                  An issue that has been brought to our attention in the past is one where a nightly scheduled backup is set up by a System Administrator and left to diligently work away in the background. Elsewhere, another team (or System Administrator) performs a major product upgrade or makes changes to the application such that a large reindexing process needs to kick off.

                  It is generally recommended that processes such as reindexes are treated as maintenance tasks and scheduled to run outside peak hours; primarily due to the additional load they may place on both the CPU and the IO subsystem. It's equally common for backup processes to be scheduled to run outside peak hours - largely for the benefit of capturing a complete backup of all the updates that were made during that particular working day.

                  The purpose of this Knowledgebase article is to describe the outcome of a scenario where a reindex is taking place and - during this time - a scheduled backup takes place.

                  The time cost of reindexing

                  MarkLogic Server has been designed to allow the process of reindexing to be stopped without having any negative impact on any of your queries. That is to say: a large and complex reindex may take a significant amount of time and as such, may need to be spread over multiple maintenance windows.

                  In the event where a reindex process does not run through to completion, your existing queries (that ran fine before) will still continue to run because they will be able to take advantage of the current on-disk index data.

                  This feature allows for easier upgrading to take place between newer releases of the product and allows you to iteratively add new functionality to your MarkLogic Application without breaking existing features.

                  When a scheduled backup takes place

                  For any mission critical situation, it's your data that is key. Any backup process is considered more important (and higher priority) than a reindex due to this fact. So in the event where a scheduled backup runs, the workload of the reindexer will be placed on hold until the backup has run to completion.

                  After the backup has completed, the reindexer (if enabled) will automatically start up again from where it left off.

                  The problem with this approach

                  The main issue with this process is that reindexing is both a time bound activity and one which provides regular feedback to the user - both through the Admin Interface on port 8001 and also through ErrorLog messages.

                  Concern can easily arise when a process such as reindexing is expected to complete within a specified amount of time, only for the user to find that this has not happened.

                  When the user investigates the ErrorLogs, they'll see evidence of the backup starting, but they will likely notice a period of time where no progress from the reindexer is reported.

                  This - understandably - can be a cause for concern.

                  Admin UI Improvements

                  From MarkLogic 8.0-3 we added further messaging to the Admin UI on port 8001 to help address this situation; from this release onwards, if a backup takes place, you will see a message indicating that rebalancing (or reindexing) has been disabled during either a backup or a restore.

                  Summary

                  It’s a lot easier to think of what a directory is useful for rather than to state what it is. Directories are a powerful and efficient way to group documents in a database. While collections are also powerful and efficient, directories have always seemed more natural because they have an obvious analog in filesystem directories and because they can effectively be serialized through the document URI. 

                  Details

                  Like documents, directories can have properties. You may have run into this while performing bulk loads, as the server tries to keep the last-modified date of the directory reflective of its most recent child documents. You can also put your own properties on a directory, which can be quite handy for assigning properties that are common to a group of documents.
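
                  As a minimal sketch (the directory URI and property name are made up, and the directory is assumed to already exist), a custom property can be added to a directory with the same properties functions used for documents:

                  declare namespace my = "http://example.com/my";

                  (: add a property to the properties fragment at the directory URI :)
                  xdmp:document-add-properties("/invoices/", element my:department { "finance" })
                  (: afterwards, xdmp:document-properties("/invoices/")//my:department returns the value :)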

                  Like documents, directories have permissions. You can control which documents users can "see" through WebDAV by controlling access at the directory level, and you can also assign default permissions on a directory that all of its child documents will inherit. This is especially useful if you are using permissions on your stored modules and editing them: you can simply load with the appropriate URI, and all the right permissions will be assigned at load time.

                  Directories seem like documents in some regards, but when you create a directory in MarkLogic, it is reported by the database status screen as a fragment rather than a document. Furthermore, input() and doc() do not return directories; they only return document nodes. You could have a million directories in the database and doc() would still return an empty sequence.

                  Directory Properties

                  The properties of a directory identify it as a directory. For example, calling xdmp:document-properties("/") on the root directory will report:

                  <prop:properties xmlns:prop="http://marklogic.com/xdmp/property">

                    <prop:directory/>

                  </prop:properties>

                  You can see how many directories you have by executing

                    xdmp:estimate(xdmp:document-properties(cts:uris())[.//prop:directory])

                  You can list all the directory URIs in the database by executing 

                    for $uri-prop in xdmp:document-properties(cts:uris())[.//prop:directory]  return base-uri($uri-prop)  

                  Directory Creation

                  If the database is configured to create directories automatically, then a document insert will result in directories being created (based on the URI of the document) if they do not already exist.

                  Warning: automatic directory creation carries a performance penalty, because a document insert causes additional write locks to be acquired for all directories implied by the URI; this may have the effect of serializing inserts. Unless there is a need for automatic directory creation (such as for WebDAV), it is recommended that the directory creation setting on the database be set to manual.

                  You can manually create a directory by calling  

                      xdmp:directory-create( $uri )

                  Directory Deletion

                  WARNING: xdmp:directory-delete( $uri ) deletes not only the directory, but also deletes all of its child and descendant documents and directories from the database.

                  Use caution when calling this function. Bulk delete calls like xdmp:directory-delete and xdmp:collection-delete delete the relevant documents' term lists without knowledge of those documents' URIs. Without the URIs, the server can't determine the mimetypes of the corresponding documents. Without the mimetypes, it cannot prove that the corresponding documents are or are not modules. Since the server doesn't know whether modules are being deleted, module caches will always be invalidated by bulk delete calls like xdmp:directory-delete and xdmp:collection-delete, regardless of the contents of the relevant directory or collection. If your application requests cannot afford the response-time impact of module cache re-warming, you should instead call xdmp:document-delete for each document in the relevant directory or collection, rather than using bulk delete operations like xdmp:directory-delete and xdmp:collection-delete.
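
                  The following is a minimal sketch of that per-document approach (the directory URI is illustrative; cts:uris requires the URI lexicon to be enabled):

                  for $uri in cts:uris((), (), cts:directory-query("/old-modules/", "infinity"))
                  where fn:doc-available($uri)   (: skip directory fragments; delete documents only :)
                  return xdmp:document-delete($uri)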

                  If all you want to do is delete the directory fragment itself, you just need to delete its properties node:

                      xdmp:node-delete(xdmp:document-properties( $uri ));

                  A golden record is a single, well-defined version of all the data entities in an organizational ecosystem.

                  In Data Hub Central, once you have gone through the ingest, map, and master steps, the documents in the sm-<EntityType>-mastered collection are considered golden records.

                  • For instance, if your EntityType is customer, then every document in the collection sm-Customer-mastered would be a golden record.

                  So, you would:

                  • First Ingest the data using Ingest step (or any other ML ingest mechanisms)
                  • Create Entity Models
                  • Map your data using Map steps
                  • Create match/merge rules in the match/merge step or master step
                  • Run the Master step
                  • The outcome of the merge step would be called a golden record which you can find in the collection specified above

                  How to find these golden records?

                  In Query Console, if you choose the data-hub-FINAL database and run the following query, the resulting documents will be the golden records: fn:collection("sm-Customer-mastered")

                  Note: The above example assumes the EntityType is "Customer" - replace it with your own EntityType when you run the query.

                  Summary

                  MarkLogic Server can ingest and query all sorts of data - XML, text, JSON, binary, generic, etc. - and it can do so with both great speed and scale. This article contains a simple rule of thumb to determine when you should do some query or data model tuning to take advantage of the speed and scale that MarkLogic Server can deliver.

                  Details

                  A very common development practice is to build your application against a smaller subset of your data, then to deploy that application against your entire production dataset. That practice works when using MarkLogic Server, too. However, do keep in mind that MarkLogic Server was engineered to deliver sub-second response time across very large amounts of data. In general, if your queries against MarkLogic Server are taking multiple seconds to return results when running against your development data subset, then it's very, very likely those same queries will take tens of seconds or even fail to return at all due to timing out when run against your larger production dataset.

                  When building your application, if you're seeing queries that take significantly longer than one second to return, you should absolutely begin the effort to optimize the relevant query, your data model, or both - especially if runtime increases as the amount of data increases.

                  Examples

                  1) For query tuning, consider the following:

                  for $source-doc in fn:collection($collection)
                  where
                  $source-doc/property::rowkey > $start-rowkey
                  order by $source-doc/property::rowkey ascending
                  return fn:document-uri($source-doc)

                  This code needs to step through a result set made up of every document in a given collection to evaluate whether or not each of those documents has a rowkey value greater than a given value. The runtime of the query as written will consequently increase as the number of documents and/or the number of evaluations increases.

                  In contrast:

                  for $source-doc in
                  cts:search(fn:collection($collection),
                  cts:properties-query(cts:element-range-query(xs:QName("rowkey"), ">", $start-rowkey)),
                  ("unfiltered")
                  )
                  order by $source-doc/property::rowkey descending
                  return fn:document-uri($source-doc)

                  Here, instead of iterating over a result set made up of every document in a given collection, and evaluating each document in that results set to see if they match a given criteria, the cts:search used here will return a result set composed of only the subset of documents that are both in a given collection that also match supplied query terms (in this case, rowkey > $start-rowkey). Note that you'll also need to define an element range index of type int on the rowkey element to take advantage of the resulting much faster index resolution instead of iteration/evaluation, otherwise this query will return the error XDMP-ELEMRIDXNOTFOUND.
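
                  For reference, the following is a minimal Admin API sketch that adds such an index (the database name "Documents" and the un-namespaced rowkey element are assumptions based on the example above):

                  xquery version "1.0-ml";
                  import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

                  let $config := admin:get-configuration()
                  let $db-id  := admin:database-get-id($config, "Documents")
                  (: int range index on the rowkey element, no namespace, no range value positions :)
                  let $index  := admin:database-range-element-index("int", "", "rowkey", "", fn:false())
                  return admin:save-configuration(
                    admin:database-add-range-element-index($config, $db-id, $index))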

                  In addition to avoiding overly large result set size via query terms, you'll also want to consider the kind of query you'll want to run and what that means in terms of your data model. It's actually possible to run queries both filtered and unfiltered (note the presence of "unfiltered" option in the query revision above). While it's possible to run your queries filtered (where the slower filtering pass will remove any false positives returned during the faster unfiltered index resolution phase of your search), for maximum performance you'll want to construct your data model in such a way that unfiltered queries will return accurate results without the need for a filtering pass. This leads us to:

                  2) Data model tuning - see our Best practices around data modeling and data loading Knowledge Base article, as well as the "XML and JSON Data Modeling Best Practices" on-demand MarkLogic University course, available here.

                  Other Resources

                  There's much, much more information in our Query Tuning and Performance Guide documentation. Additionally, to see how a given expression will be processed, you'll want to make use of xdmp:plan. To optimize query performance, you'll want to make use of xdmp:query-meters and xdmp:query-trace.

                   

                  MarkLogic Server is designed to scale horizontally, and goes to great effort to make sure queries can be parallelized independently of one another. Nevertheless, there are occasions where users will run into an issue where, when invoked in parallel, some subset of their queries will take much longer than usual to execute. The longer running parallel invocations are almost always due to those queries' runtime being informed by

                  a. the runtime of the query in isolation (in other words, absent any other activity on the database) but also

                  b. the time that query instance spends waiting on resources necessary for it to make forward progress.

                  Resolving this kind of performance issue requires a careful analysis of how queries are using resources as they work their way through your application stack. For example, if you have a web or application server in front of MarkLogic Server, how many threads do you have configured? How does that number compare to the number of threads configured on the MarkLogic application server to which it's connected? If the number of MarkLogic application server threads is much smaller than the number of potential incoming requests, then some of your queries will be fast because all they need to do is execute - and some of your queries will be slower because, in addition to the time needed to execute, they'll also need to wait for a MarkLogic application server thread to free up. In this case, you'll want to bring the numbers of threads into better alignment with one another - either by reducing the number of threads on the web or application server in front of MarkLogic, or by increasing the number of MarkLogic application server threads - or both.

                  You'll want to try and minimize the amount of time queries spend waiting for resources, in general. Application server threads are just one example of resources on which queries can sometimes wait. Queries can also wait for all sorts of other resources to free up - threads, RAM, storage I/O, read or write locks, etc. Ultimately, if you're seeing a performance profile where a single query invocation is fast but, under parallel invocation, some subset of invocations is fast and some slow (often seen as higher query runtime averages and larger query runtime standard deviations), then you very likely have a resource bottleneck somewhere in your application stack. Resolving such a bottleneck will involve some combination of increasing the amount of available resources, reducing the amount of parallel traffic, or improving the overall efficiency of any one instance of your query.

                  Background:

                  Due to concerns around high availability, every node in a MarkLogic Server instance has its own copy of configuration files like databases.xml, hosts.xml, etc. Consequently, it's important for each node to make sure it's working off the correct version of the various configuration files.

                  This is accomplished by using the Admin/config-lock.xqy library to make sure the appropriate configuration files are locked before they are updated, so that multiple requests through the Admin API or REST Management API don't corrupt the configuration files. As a consistency check, before locking, the Admin/config-lock.xqy library makes sure that the timestamps of the configuration files are more recent than the timestamps in the lock file. This ensures that if someone else has acquired the lock and updated the files, the lock won't be returned until the files are consistent.

                  Problem Statement:

                  If you restore the "App-Services" database from another cluster, you can wind up with timestamps in the lock file that bear no useful relationship to the actual configuration files. This is because the "App-Services" database is where the lock file /config-lock/config-timestamps.xml is located, which contains the timestamps of the configuration files from the last time they were locked - for the restore target cluster. Restoring the "App-Services" database from a different source cluster will overwrite the restore target's /config-lock/config-timestamps.xml file. This causes the error "MANAGE-TIMESTAMPOLD: Config files out of date on host" - which is triggered when you try to perform any subsequent update on the restore target cluster's MarkLogic Server configuration files.

                  Note: In general, this error is triggered by any PUT or POST operation to the REST Management API, with the exception of some of the security endpoints, which actually update the security database instead of the configuration files.

                  Workaround:

                  In the restore target cluster showing the "MANAGE-TIMESTAMPOLD" error, deleting the lock file “/config-lock/config-timestamps.xml” should fix the problem because that same file will be re-created with the correct timestamps.

                  Note: it is very important to make sure no one is running Admin updates before deleting the lock file, as the timestamps corresponding to those updates will be lost when you delete the lock file.
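
                  A minimal sketch of the workaround, run in Query Console with the "App-Services" database selected:

                  (: deletes the stale lock file; it will be re-created with correct timestamps
                     on the next configuration update :)
                  xdmp:document-delete("/config-lock/config-timestamps.xml")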

                  Best Practice:

                  The "App-Services" database is one of the default MarkLogic databases, used to track configuration file timestamp information local to the cluster on which it resides. It is not recommended to restore "App-Services" databases across MarkLogic clusters.

                  Introduction:

                  When trying to restore from a backup previously taken, the  XDMP-BACKDIRINUSE error message may sometimes be encountered:

                      XDMP-BACKDIRINUSE - Backup data directory currently has a backup job in progress

                  As described, the most common occurrence of this message is when a restore is attempted while a backup task is running.  However, you may also encounter this error when another process has the backup directory locked.

                  Solution:

                  To resolve this error, you will need to first determine if there is, indeed, a backup running to the same directory, or if the directory is locked by another process.

                  If there is another backup running, wait for it to complete or kill it, and attempt the restore again.

                  However, if there is no other backup task running, check whether there are any files in the backup directory with the prefix "Job." (this may happen when the backup files were copied while a backup job was running).

                  For example:

                  -rw-r--r-- 1 mlogic dba  4747 May 20 15:48  Job.F5CDF0424BDC0DE1

                  -rw-r--r-- 1 mlogic dba  4747 May 20 15:48  Job.B856D24DED41A543 

                  When there are files which start with Job.* in the backup directory, the server assumes that there is another backup job in progress and will throw the XDMP-BACKDIRINUSE error. Deleting these files from the directory (or renaming them) and performing the restore again should get you past this error.

                  If neither of these solutions are sufficient to get past this error, you should look for external (non-MarkLogic) processes that might be holding a lock on database backup files, such as file backup or virus scan programs. If these exist, either wait for the processes to complete or kill them, and then attempt the restore again.

                   

                   

                   

                   

                  Introduction

                  It is possible to encounter an XDMP-RIDXTOOBIG error during a database merge - for example, you may see an error that looks similar to: 

                  XDMP-FORESTNOT: Forest ... not available: XDMP-FORESTERR: Error in merge of forest ...: XDMP-RIDXTOOBIG: Too many entries in range index.

                  Encountering this error may result in the forest going offline.

                  Cause

                  The error XDMP-RIDXTOOBIG means that there are too many entries in the range index. Range indexes in MarkLogic server are limited to 2^32 (~4 billion) entries per stand. 

                  During a merge, if the resulting stand would have a range index with more than 2^32 entries, then the merge will fail with the above-mentioned error and the forest will be taken offline.

                  Solution

                  One way to avoid encountering the XDMP-RIDXTOOBIG error is to set the 'merge-max-size' of the database in question to a size at which it is unlikely to hit the range index entry limit. A value that we often recommend for the 'merge-max-size' setting is 32GB. The 'merge-max-size' setting enforces an upper limit (here, 32GB) on the size of any individual stand in the forest. MarkLogic Server does this by managing merges so that a merge will not occur if the resulting stand size would be bigger than that limit.

                  Note: In MarkLogic 7 and later releases, the default value for merge-max-size is 32GB, which is recommended as it provides a good balance between keeping the number of stands and preventing very large merges from using large amounts of disk space.
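
                  If you prefer to script the change rather than use the Admin UI, the following is a minimal Admin API sketch (the database name is an assumption; the value is specified in megabytes, so 32768 MB = 32 GB):

                  xquery version "1.0-ml";
                  import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

                  let $config := admin:get-configuration()
                  let $db-id  := admin:database-get-id($config, "my-database")
                  return admin:save-configuration(
                    admin:database-set-merge-max-size($config, $db-id, 32768))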

                  Links to other related documentation:

                  MarkLogic Server's Administrators Guide section on 'Understanding and Controlling Database Merges'

                  Knowledgebase article on 'Migrating to MarkLogic 7 and understanding the 1.5x disk rule'

                  Knowledgebase article on 'Range Indexes and Mapped File Initialization Errors'

                   

                   

                   

                   

                   

                   

                  xdmp:value() vs. xdmp:eval():

                  Both xdmp:value() and xdmp:eval() are used for executing strings of code dynamically. However, there are fundamental differences between the two:

                  • The code in the xdmp:value() is evaluated against the current context - if variables are defined in the current scope, they may be referenced without re-declaring them
                  • xdmp:eval() creates an entirely new context that has no knowledge of the context calling it - which means one must define a new XQuery prolog and variables from the main context. Those newly defined variables are then passed to the xdmp:eval() call as parameters and declared as external variables in the eval script

                  Function behavior when used inline:

                  Although both of these functions seem to fulfill the same purpose, it is very important to note that their behavior changes when used inline. Consider the following example:

                  declare namespace db = "http://marklogic.com/xdmp/database";

                  let $t := <database xmlns="http://marklogic.com/xdmp/database" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                              <database-name>aptp-dev-modules</database-name>
                            </database>
                  return
                    fn:fold-left(function($a, $b){ xdmp:value(fn:concat("$t/db:", "database-name")) }, (), (1,2,3))

                  (or)

                    fn:fold-left(function($a, $b){ $t/xdmp:value(fn:concat("db:", "database-name")) }, (), (1,2,3))

                  When a function is called inline, the expressions inside the function cannot be statically compiled to function items because the values of the closed-over variables are not yet available. Therefore, the query parser would have to look for any variable bindings during dynamic analysis to be able to evaluate the expression. Ideally, variables from the main context are passed to the function call as parameters. However, in the case of xdmp:value(), the function is expected to have the needed context to evaluate the expression and therefore the expression is evaluated without looking for any variable bindings - which can ultimately lead to unexpected behavior. This explains why the first return statement in the above example returns an ‘empty sequence’ and the second one returns the correct results because the variable is being referenced outside of the xdmp:value call. In other words, when used inline - xdmp:value() cannot reference variables declared in the current scope.

                  In contrast, in the case of xdmp:eval, the parser would know to look for variable bindings during dynamic analysis as this function is not expected to have the knowledge of the calling context. Consequently, when using xdmp:eval the context needs to be explicitly created and the variables explicitly passed to the call as parameters and declared as external variables.
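For example, the following is a minimal sketch of passing a value into xdmp:eval as an external variable; the variable name and value are illustrative only:

xquery version "1.0-ml";
let $t := <database xmlns="http://marklogic.com/xdmp/database">
            <database-name>aptp-dev-modules</database-name>
          </database>
return
  xdmp:eval('
    xquery version "1.0-ml";
    declare namespace db = "http://marklogic.com/xdmp/database";
    declare variable $t external;
    $t/db:database-name
  ',
  (xs:QName("t"), $t))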

                  Summary

By default, inter-node communication within a MarkLogic cluster is done over XDQP (XML Data Query Protocol) on a non-secure channel, with the assumption that all nodes reside within the same secure network. However, you can set the "XDQP SSL enabled" flag to true in order to make all inter-node communication occur over an SSL (Secure Sockets Layer) channel. This article describes the performance characteristics of enabling SSL for XDQP.
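The flag can be toggled in the Admin UI on the group configuration page, or programmatically. The following is a hedged sketch only; it assumes the Admin API exposes admin:group-set-xdqp-ssl-enabled in your release (check the Admin API documentation for your version) and uses the standard "Default" group name:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: assumption: admin:group-set-xdqp-ssl-enabled exists and takes a boolean :)
let $config := admin:get-configuration()
let $config := admin:group-set-xdqp-ssl-enabled(
                 $config,
                 admin:group-get-id($config, "Default"),
                 fn:true())
return admin:save-configuration($config)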

A few things about MarkLogic with XDQP SSL enabled...

1. MarkLogic Server, by default, has TLS enabled and enforced with FIPS mode (Federal Information Processing Standards).
2. MarkLogic uses the OpenSSL library (open source) for its encryption implementation - the industry standard for most major software vendors.
3. Enabling SSL on XDQP does NOT change the number of initial TCP channels (3) on the XDQP port (7999); rather, MarkLogic establishes 3 SSL channels instead of plain TCP channels. The overhead of establishing the 3 SSL channels is the extra SSL handshakes in addition to the 3-way TCP handshakes during cluster startup (which are needed in the non-SSL scenario as well).
4. Once the SSL channels are established, the nature and number of XDQP messages carried over SSL do not change purely because of the SSL configuration (bandwidth usage will only change based on load).

MarkLogic internal performance test results with XDQP SSL enabled...

                  The following results are from a single MarkLogic Lab infrastructure test.  Results may vary depending on a number of other factors, some described later in this article.

                  XDQP+SSL:

• 1 thread: 4.124% overhead
• 4 threads: 1.435% overhead
• 16 threads: 5.034% overhead
• 64 threads: -3.395% overhead

Based on our testing, we do not expect an XDQP SSL performance difference of more than ~5%.

Understanding the performance cost of SSL

While SSL does not add significant overhead to network traffic (apart from the initial SSL handshake), encryption itself has a processing cost.

1) Hardware/CPU support for encryption: Once the channels are established (at cluster startup), XDQP data is encrypted with symmetric cryptography using the OpenSSL library. Most modern CPUs provide hardware-based encryption support, accelerating the encryption itself. If enabling SSL over XDQP results in a considerable performance cost, you should check with your hardware provider whether your hardware has encryption support and ask to see their test data.

2) Resource availability for the current load pattern: Make sure that you have sufficient CPU and memory in your environment under the existing load, so that enabling XDQP SSL does not tip you over the edge.

3) Firewall: Various third-party tests confirm that a firewall (and other SSL inspection software) contributes to SSL performance overhead once SSL traffic grows beyond a certain level. If you are running into a performance issue, we recommend testing performance after disabling the firewall and other network inspection software across the cluster (including routers) and measuring the difference, to see whether that is where the performance issue lies.


                  Summary

MarkLogic uses a proprietary protocol for internal communication between nodes in a cluster. This article describes the XML Data Query Protocol (XDQP) and related configuration settings.

                  XDQP Version:

Each MarkLogic Server release communicates at a specific XDQP version.  The following XDQP version mismatch warning may be logged if a node in your cluster is running a different MarkLogic Server version and the two instances run at incompatible XDQP versions.

                  2016-01-13 01:14:04.292 Warning: XDQPServerConnection::init(172.18.128.26:7999-172.18.128.25:59736): XDMP-XDQPVER: XDQP version mismatch: engrlab18-128-026.engrlab.marklogic.com,7000502 172.18.128.25,7000100

                  In MarkLogic 8 and earlier versions, this is an error condition that must be corrected. To learn more about the XDQP version mismatch, please refer to our KB article 'What is XDMP-XDQPVER, and what should be done about it? '

                  XDQP Connections:

XDQP is an application protocol that runs over TCP, using ports 7999 and 7998. Port 7999 is used for heartbeat checks as well as E-node/D-node data communication. Port 7998 is mainly used for communication between clusters for replication.  This article mainly focuses on XDQP communication over port 7999.

XDQP connections are created when MarkLogic Server starts up and, barring any exception condition, remain persistent until the server is shut down. During steady-state cluster operations, no XDQP connections are opened or closed.

During startup, each node in the cluster creates 3 connections to every other node on port 7999. Symmetrically, each of those nodes also creates 3 connections back to the first node's port 7999. Therefore, each node in a cluster of N nodes has 6 x (N-1) connections; for example, in a 4-node cluster each host has 6 x 3 = 18 XDQP connections on port 7999. The multiple connection channels (3) for XDQP port 7999 exist for performance reasons, and the number of channels is NOT configurable.

MarkLogic Server sends a heartbeat message to each node in the cluster every second (NOT configurable). The heartbeat message synchronizes all servers to the same clock (transaction timestamp), keeps a consistent view of the quorum, propagates configuration changes, and may carry query data. If the heartbeat messages are interrupted for an extended period of time, a host may become disconnected from the cluster (host timeout).

Heartbeat messages are cycled across all 3 connection channels; since MarkLogic sends a heartbeat message to each node every second, we would expect to see traffic at least once on each connection channel every 3 seconds.

Heartbeats from other hosts are dropped if clock skew beyond the acceptable limit is detected, causing hosts to disconnect.

                  2016-12-08 07:13:46.475 Debug: XDQPServerConnection::run: SVC-SOCRECV: Socket receive error: getpeername 172.18.128.26:7999: Transport endpoint is not connected

                  2016-12-08 07:13:46.475 Debug: Stopping XDQPServerConnection, client=ML03, conn=172.18.128.26:7999, requests=555, recvTics=0, sendTics=4171, recvs=105709240, sends=54185927, recvBytes=274440048295, sendBytes=36220099744

                  2016-12-08 07:13:48.644 Debug: Destroying XDQPServerSession

New connections are also rejected if clock skew beyond the acceptable limit is detected (host timeout). The other host is declared down if the problem persists on new connection attempts.

                  2016-11-02 04:41:13.666 Debug: Stopping XDQPClientConnection, server=ML05-V, conn=172.18.128.25:45688-172.18.128.26:7999, sendTicks=49, recvTicks=0, sends=19201744

                  XDQP Maximum Connections:

MarkLogic will not handle incoming XDQP connection requests from more than 255 peer nodes, which effectively makes the upper limit for a MarkLogic cluster 256 nodes.

                  Connections are made as follows:

                  - all nodes in a cluster will connect to each other

                  - bootstrap nodes will connect to bootstrap nodes in another cluster when the clusters are connected

                  - d-nodes in one cluster will also connect to d-nodes in another cluster when their forests are connected for database replication

                  - in addition, bootstrap nodes will be connected to all the d-nodes in a connected cluster when those d-nodes are involved in database replication

                  If a node reaches the maximum connections it will throw an XDMP-XDQPMAX error.

Timeout Configuration Items

                  XDQP Timeout:

                  Is a Group level configuration item that "specifies the time, in seconds, before a request between a MarkLogic Server evaluator node (the node from which the query is issued) and a MarkLogic Server data node (the node from which the forest data is retrieved) times out."

A timeout on the XDQP channel happens at the application level, meaning that an XDQP timeout does NOT necessarily close the underlying connection channel on the first XDQP timeout, as you might expect from an HTTP timeout.

MarkLogic will deem an XDQP connection to be broken:

(1) If a send operation takes longer than the XDQP timeout.

(2) If no traffic has been received from the other host over XDQP for two times the XDQP timeout (default XDQP timeout of 10 seconds x 2 = 20 seconds).

(3) If there is clock skew greater than the host timeout between hosts.

Once an XDQP connection to a host is deemed broken, MarkLogic will throw a Retry exception and trigger an XDQP restart, closing and re-opening all 3 XDQP connections to that host. An XDQP restart for a given host causes all running requests accessing that host to be retried.

An XDQP restart does not affect the 3 connection channels *from* the other host to the local port 7999, as they are a separate group.

Generally, an XDQP restart fixes many networking issues and avoids a host timeout. A host timeout is far more severe, as it removes the other host from the cluster based on quorum and triggers failover.

By default, the XDQP timeout is set to 10 seconds.  To learn more about issues that may affect how you configure the XDQP timeout, refer to our KB article 'How to handle XDQP-TIMEOUT on a busy cluster'.

                  Host Timeout:

                  Is a Group level configuration item that "specifies the time, in seconds, before a MarkLogic Server host-to-host request times out. The host-to-host requests are used for communication between nodes in a MarkLogic Server cluster."

The host timeout pertains to how long a MarkLogic node can go without seeing any response from a peer host on the XDQP channel before MarkLogic closes the connection.

Besides covering a non-responding host or a network issue, the host timeout also limits the acceptable clock skew between nodes; clock skew cannot be greater than the host timeout.

By default, the host timeout is set to 3 times the XDQP timeout, i.e. 30 seconds. The reason for this default is to allow XDQP connection issues to be resolved before a host is judged offline, which would trigger failover. If there is a networking problem, it can usually be resolved with an XDQP restart, preventing a failover.

                  Host Initial timeout

                  Is a Group level configuration item that "specifies the time, in seconds, that an instance of MarkLogic Server will wait for another node to come online when the cluster first starts up before deciding that the node is down, and initiating failover for any forests that are assigned to that offline host."

The host initial timeout pertains to the establishment of the initial 3 TCP channels on XDQP port 7999. By default, MarkLogic will try for 4 minutes (240 seconds) to establish a connection before giving up on a peer node.
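These timeouts are configured per group in the Admin UI. As a hedged sketch, they can also be set through the Admin API, assuming the functions admin:group-set-xdqp-timeout, admin:group-set-host-timeout, and admin:group-set-host-initial-timeout are present in your release (verify against the Admin API documentation); "Default" is the standard group name:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config   := admin:get-configuration()
let $group-id := admin:group-get-id($config, "Default")
(: values shown are the documented defaults: 10s, 30s, and 240s :)
let $config := admin:group-set-xdqp-timeout($config, $group-id, 10)
let $config := admin:group-set-host-timeout($config, $group-id, 30)
let $config := admin:group-set-host-initial-timeout($config, $group-id, 240)
return admin:save-configuration($config)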


                  XML serialization and output options

                  XML as stored in MarkLogic Server

                  MarkLogic Server starts by parsing and indexing the document contents, converting the document from serialized XML (what you see in a file) to a compressed binary fragment representation of the XML data model—strictly, the XQuery Data Model (XDM).  The data model differs from the serialized format. For example, when serialized in XML an attribute value may be surrounded by single or double quotes; in the data model that difference is not recorded.  Character references are stored internally as codepoints.

Therefore, when XML is returned from the database, the content will be the same, but serialization details may vary.  If a file must be returned byte-for-byte, it can be stored in MarkLogic Server in its binary form.  However, binary documents are not indexed by MarkLogic, which means they cannot be directly searched.

                  XML as returned by MarkLogic Server

                  Though the original serialization information may not be stored in MarkLogic Server, there are a number of ways that output can be controlled when returning serialized XML from MarkLogic Server.

                  1. The XQuery xdmp:output option can be used at the code level: xdmp:output.
                  2. Output options may be used with xdmp:save when writing a file.
                  3. Output options may be specified at the app-server level:  Controlling App Server Access, Output, and Errors.
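As a brief, hedged illustration of options 1 and 2: the file path and document URI below are illustrative only, and the options element namespace follows the usual per-function convention (verify against the xdmp:save documentation for your version):

xquery version "1.0-ml";
(: 1. code-level serialization option :)
declare option xdmp:output "indent-untyped=yes";

(: 2. output options passed to xdmp:save :)
xdmp:save("/tmp/example.xml",
          fn:doc("/example.xml"),
          <options xmlns="xdmp:save">
            <indent>yes</indent>
          </options>)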

                   

                  Introduction

Fields are a great way of restricting what parts of your documents to search, based on XML element QNames or JSON property names. Fields are extremely useful when you have content in one or more elements or JSON properties that you want to query simply and efficiently as a single unit. But can you use field names you've created with XPath's fn:not()? In other words, given a field named "test-field-name", can you do something like fn:not(//test-field-name)? Unfortunately, you cannot, as the server will return an XDMP-UNINDEXABLEPATH error. There is, however, a workaround.

                  Workaround

The workaround is to create two fields, then to query across those two fields using cts:not-in-query (http://docs.marklogic.com/cts:not-in-query). Consider two documents:

                  Document 1

                  xdmp:document-insert(
                    "/test/fields-001.xml",
                    <doc>
                        <content>
                            <courtcase>
                                <metadata>
                                    <docinfo>
                                        <hier>
                                            <hierlev>
                                                <heading>
                                                    <title>1900</title>
                                                </heading>
                                                <hierlev>
                                                    <heading>
                                                        <title>Volume 10</title>
                                                    </heading>
                                                    <hierlev>
                                                        <heading>
                                                            <title>test title - (1900) 10 Ch.D. 900</title>
                                                        </heading>
                                                    </hierlev>
                                                </hierlev>
                                            </hierlev>
                                        </hier>
                                    </docinfo>
                                </metadata>
                            </courtcase>
                        </content>
                  </doc> ,
                  xdmp:default-permissions(),
                  ("test", "fields")
                  )

                  Document 2

xdmp:document-insert(
  "/test/fields-002.xml",
  <doc>
      <content>
          <courtcase>
              <metadata>
                  <docinfo>
                      <hier>
                          <hierlev>
                              <heading>
                                  <title>1879</title>
                              </heading>
                              <hierlev>
                                  <heading>
                                      <title>John had a little lamb</title>
                                  </heading>
                                  <hierlev>
                                      <heading>
                                          <title>Mary had a little lamb</title>
                                      </heading>
                                  </hierlev>
                              </hierlev>
                          </hierlev>
                      </hier>
                  </docinfo>
              </metadata>
          </courtcase>
      </content>
</doc> ,
xdmp:default-permissions(),
("test", "fields")
)

                  Say you're interested in three different paths:

1) All titles, which would be defined as fn:collection()//heading/title

2) Titles with lower-level titles, which would be defined as fn:collection()//hierlev[.//hierlev/heading/title]/heading/title

3) Titles with NO lower-level titles, which would be defined as fn:collection()//hierlev[fn:not(.//hierlev/heading/title)]/heading/title

Unfortunately, while we can express #3 in full XPath, we cannot express it in the subset of XPath used to define path fields. However, you can emulate #3 by defining fields corresponding to #1 and #2, then combining them in a cts:not-in-query.

Create the path fields

1. All titles

  Create a path field with name "titles-all" and path "//heading/title"

2. Titles with lower-level titles

  Create a path field with name "titles-with-lower-level-titles" and path "//hierlev[.//hierlev/heading/title]/heading/title"

Emulate the XPath you want by combining these two newly created path fields in a cts:not-in-query:

for $doc in cts:search(
  fn:collection("fields"),
  cts:not-in-query(
    cts:field-word-query("titles-all", $term),
    cts:field-word-query("titles-with-lower-level-titles", $term)
  )
)
return
  xdmp:node-uri($doc)

                  Summary

The ampersand is a special character in XQuery; in a string literal it must be represented using a predefined entity reference or a character reference.

                  XQuery W3C Recommendation

The XQuery 3.0 W3C Recommendation can be found at http://www.w3.org/TR/xquery-30/.

                  Section 2.4.5 'URI Literals'  states "Certain characters, notably the ampersand, can only be represented using a 'predefined entity reference' or a 'character reference'."

                  Section 3.1.1 'Literals' defines the predefined entity reference for ampersand as "&amp;".

                  Issues with the ampersand character

The ampersand character can be tricky to construct in an XQuery string, as it is an escape character to the XQuery parser. The ways to construct the ampersand character in XQuery are (see the sketch after this list):

                  • Use the XML entity syntax (for example, &amp;).
                  • Use a CDATA element (<![CDATA[element content here]]>), which tells the XQuery parser to read the content as character data.
• Use the repair option on xdmp:document-load, xdmp:document-get, or xdmp:unquote.
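A minimal sketch of the entity-reference and CDATA approaches; the string values are illustrative only, and all three expressions below yield the same string:

xquery version "1.0-ml";
(: predefined entity reference :)
let $s1 := "Fish &amp; Chips"
(: numeric character reference for the same codepoint :)
let $s2 := "Fish &#38; Chips"
(: CDATA section in a direct element constructor :)
let $e  := <note><![CDATA[Fish & Chips]]></note>
return ($s1, $s2, fn:string($e))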

                  For additional details and examples, please refer to XML Data Model Versus Serialized XML in the MarkLogic Server's XQuery and XSLT Reference Guide.

                  Introduction

                  This article discusses the use of XQuery in JavaScript and vice versa.

                  Using XQuery in JavaScript

                  A JavaScript module in MarkLogic can also import an XQuery library and access its functions and variables as if they were JavaScript. If you’re working primarily in JavaScript, but you have an existing library in XQuery or a specialized task that would be better suited to XQuery, you can write just that library in XQuery and import it into a JavaScript module.

The calling JavaScript module doesn't even need to know that the library was implemented in XQuery. MarkLogic automatically makes all of the public functions and variables of the XQuery library available as native JavaScript functions and variables in the calling JavaScript module.  (This is what's happening when you import one of MarkLogic's many libraries that come bundled with the server, such as Admin or Security.)

                  This capability will be key for those developers with existing investments in XQuery that want to start using JavaScript without having to rewrite all of their libraries.

                  Using JavaScript in XQuery

                  You can't import JavaScript libraries to XQuery, but you can call xdmp:invoke with a JavaScript main module or evaluate a string of JavaScript with xdmp:javascript-eval.
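For example, the following is a minimal sketch of evaluating a string of JavaScript from XQuery; it assumes the alternating name/value form of the $vars argument also applies to xdmp:javascript-eval as it does for xdmp:eval, and the variable name and value are illustrative only:

xquery version "1.0-ml";
(: the JavaScript declares n as an external variable;
   the second argument supplies its value :)
xdmp:javascript-eval(
  'var n; n * 2',
  (xs:QName("n"), 21)
)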

                  Introduction

The first time a query is executed, it will take longer to execute than subsequent runs.  This extra time for the first run becomes more pronounced when importing large libraries.  Why is this so, and is there anything we can do to improve the performance?

                  Details

When MarkLogic evaluates an XQuery script, it first compiles it into a complete XQuery program. When compiling the program, the transitive closure of all imported modules is linked together and all function and variable names are resolved.

                  MarkLogic maintains a cache of pre-parsed library modules, so library modules are not re-parsed when a program is compiled. But every unique program needs to be globally linked together with all its included library modules before it is executed. The time for this linking can result in "First Runs" being slower than subsequent runs.

                  Performance recommendation

When using library modules, you will likely see better performance if you parameterize frequently used queries through variables rather than through changes to the code itself. For example, use external variables in XDBC requests, or use request fields with App Server requests.
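As a hedged sketch of this pattern, the module path and variable name below are hypothetical; the point is that the program text stays identical between runs, so the compiled, linked program can be reused and only the variable value changes:

(: /queries/lookup.xqy - hypothetical invoked module :)
xquery version "1.0-ml";
declare variable $uri as xs:string external;
fn:doc($uri)

(: caller: the same compiled program is reused for every URI value :)
xdmp:invoke("/queries/lookup.xqy", (xs:QName("uri"), "/example/doc-001.xml"))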