Knowledgebase
Introduction

This article discusses the "Stand s has n fragments" messages that may appear in error log or system log files. These messages can appear at different log levels (Notice, Warning, Error, Critical, Alert, and Emergency); the severity increases as the number of fragments in a single stand increases, indicating increasing risk.

Fragment counts and their corresponding Log levels:

 In MarkLogic 8 and MarkLogic 9, the fragment count thresholds within a single stand for the log levels are:  

  • At around 84 million fragments, MarkLogic Server will report this with a Notice level log message
  • At around 109 million fragments, MarkLogic Server will report this with a Warning level log message
  • At around 134 million fragments, MarkLogic Server will report this with an Error level log message
  • At around 159 million fragments, MarkLogic Server will report this with a Critical level log message
  • At around 184 million fragments, MarkLogic Server will report this with an Alert level log message
  • At around 209 million fragments, MarkLogic Server will report this with an Emergency level log message

At 256 million fragments your data may be at risk of becoming corrupted due to integer overflow. The log level reflects the risk and is intended to get your attention at higher stand fragment counts.

Emergency level log entries

Consider an example Error Log entry where the following information is observed:

2015-06-20 10:13:39.746 Emergency: Stand /space/Data/Forests/App-Services/00000fae has 213404541 fragments.

At all levels, the messages should be monitored and managed, but at the Emergency level, you will need to take corrective action soon.  

Corrective Actions

Note that it is the number of fragments in a stand that is important, not the number of fragments in a forest. The actions that you take should decrease the size of the stands in a forest.

Some of the actions you can take:

  • If not already configured, MarkLogic databases should be configured with a merge-max-size value smaller than the current forest size (databases created in MarkLogic 7 or MarkLogic 8 have a default value of 32 GB).
  • If merge-max-size is already configured for the database, decrease the value of this setting (see the sketch below).
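
A minimal sketch of lowering merge-max-size with the Admin API (assuming a database named "Documents" and an illustrative 16 GB limit; the setting is specified in MB):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: lower merge-max-size (in MB) so that no single stand can grow too large :)
let $config := admin:get-configuration()
let $db-id  := admin:database-get-id($config, "Documents")
return admin:save-configuration(
  admin:database-set-merge-max-size($config, $db-id, 16384))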

Summary

Occasionally, you might see an "Invalid Database Online Event" error in your MarkLogic Server Error Log. This article will help explain what this error means, as well as provide some ways to resolve it.

What the Error Means

The XDMP-INVDATABASEONLINEEVENT error means that something went wrong during the database online trigger event. There are many situations that can trigger this event, such as a server restart or a configuration change on any of the databases. In most cases, this error is harmless - it is just giving you information.

Resolving the Error

We often see this error when the user id that is baked into the database online event created by CPF is no longer valid, and the net effect is that CPF's restart handling is not functioning. We believe reinstalling CPF should fix this issue.

If re-installing CPF does not resolve this error, you will want to further analyze and debug the code that is invoked by the restart trigger.

 

 

 

Details:

Upon boot of CentOS 6.3, MarkLogic users may encounter the following warning:

:WARNING: at fs/hugetlbfs/inode.c:951 hugetlb_file_setup+0x227/0x250() (Not tainted)

MarkLogic 6.0 and earlier have not been certified to run on CentOS 6.3. This message is due to MarkLogic using a resource that has been deprecated in CentOS 6.3. The message can be ignored, as it will not cause any issues with MarkLogic performance. Although this example specifically calls out CentOS 6.3, this message could potentially occur in other MarkLogic/Linux combinations.

Introduction

Some customers have reported seeing kernel level messages like this in their /var/log/messages file:

Jan 31 17:41:46 ml-c1-u3 kernel: [17467686.201893] TCP: Possible SYN flooding on port 7999. Sending cookie

This may also be seen as part of the output from a call to dmesg and could possibly follow a stack trace, for example:

[<ffffffff810d3d27>] ? audit_syscall_entry+0x1d7/0x200 
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b possible SYN flooding on port 7999. Sending cookies. possible SYN flooding on port 7999. Sending cookies.

What does it mean?

The tcp_syncookies configuration is likely enabled on your system.  You can check for this by viewing the contents of /proc/sys/net/ipv4/tcp_syncookies

$ cat /proc/sys/net/ipv4/tcp_syncookies
1

If the value returned is 1 (as in the example above), then tcp_syncookies are enabled for this host.

Possible SYN flooding

A SYN flood is a form of denial-of-service attack in which an attacker sends a succession of SYN requests to a target's system in an attempt to consume enough server resources to make the system unresponsive to legitimate traffic.

Source: Wikipedia https://en.wikipedia.org/wiki/SYN_flood

You would expect to see evidence of a SYN flood when a "flood" of TCP SYN messages is sent to the host. The kernel acknowledges each incoming SYN with a SYN-ACK but, during a flood, these SYN-ACKs are not followed by ACK messages from the client. The exchange of SYN, SYN-ACK, and ACK is known as the three-way handshake; its goal is to firmly establish communication on both the server and the client.

In the event of a real attack, a SYN flood will most likely originate from a fake IP address; during an attack, the client performing the "flood" is not waiting for the SYN-ACK response back from the server it is attacking.

Under normal operation (i.e. without SYN cookies), TCP connections are kept half-open after receiving the first SYN because of the handshake mechanism used to establish TCP connections. Because there is a limit to how many half-open connections the kernel can maintain at any given time, a flood of them is what characterises the problem as an attack.

The term half-open refers to TCP connections whose state is out of synchronization between the two communicating hosts, possibly due to a crash of one side. A connection which is in the process of being established is also known as embryonic connection.

Source: Wikipedia https://en.wikipedia.org/wiki/TCP_half-open

If SYN cookies are enabled, then the kernel doesn't track half-open connections. Instead, it relies on the sequence number in the subsequent ACK datagram to verify that the ACK follows a SYN and a SYN-ACK, which establishes full communication between client and server. By ignoring half-open connections, SYN floods are no longer a problem.

In the case of MarkLogic, this message can appear if the rate of incoming connections is perceived by the kernel as being unusually high. In that case it is not indicative of a real SYN flooding attack, but to the TCP/IP stack it exhibits the same characteristics, and the kernel responds by reporting a possible (but in this case spurious) attack.

Notes from the kernel documentation

See the section of the kernel documentation for tcp_syncookies - BOOLEAN for some further information regarding this feature:

The syncookies feature attempts to protect a socket from a SYN flood attack. This should be used as a last resort, if at all. This is a violation of the TCP protocol, and conflicts with other areas of TCP such as TCP extensions. It can cause problems for clients and relays. It is not recommended as a tuning mechanism for heavily loaded servers to help with overloaded or misconfigured conditions. For recommended alternatives see tcp_max_syn_backlog, tcp_synack_retries, and tcp_abort_on_overflow.

Further down, they state:

Note, that syncookies is fallback facility. It MUST NOT be used to help highly loaded servers to stand against legal connection rate. If you see SYN flood warnings in your logs, but investigation shows that they occur because of overload with legal connections, you should tune another parameters until this warning disappear. See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.

Source: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

Tuning on a MarkLogic Server

Any dmesg output indicating "possible SYN flooding on port 7999" may appear in tandem with very heavy XDQP (TCP) traffic within a MarkLogic cluster - this link provides further detail in relation to a similar scenario with Apache HTTP server. You can tune your TCP settings to try to avoid SYN Flooding error messages, but SYN flooding can also be a symptom of a system under resource pressure. 

If a MarkLogic Server instance sees SYN flooding messages on a system that is otherwise healthy, and the messages occur because of normal and expected MarkLogic Server communications, you may want to increase the backlog (tcp_max_syn_backlog) or adjust some of the other settings (such as tcp_synack_retries or tcp_abort_on_overflow). However, if SYN flooding messages occur only on a system that is under resource pressure, then solving the resource issue should be the focus.

How to disable SYN cookies

You can disable syncookies by adding the following line to /etc/sysctl.conf:

# disable TCP SYN Flood Protection
net.ipv4.tcp_syncookies = 0

Also note that a setting added to /etc/sysctl.conf only takes effect after a host reboot (or after reloading the settings with sysctl -p).

Further reading

Introduction

After upgrading to MarkLogic 10.x from any of the previous versions of MarkLogic, examples of the following Warning and Notice level messages may be observed in the ErrorLogs:

Warning: Lexicon '/var/opt/MarkLogic/Forests/Documents/00000006/c4ea1b602ee84a34+Lexicon' collation='http://marklogic.com/collation/zh-Hant' out of order


Notice: Repairing out of order lexicon /var/opt/MarkLogic/Forests/Documents/00000006/c4ea1b602ee84a34+Lexicon collation 'http://marklogic.com/collation/zh-Hant' version 0 to 602

Warning: String range index /space/Forests/Documents/0006ef0e/c0dc932d1b4bcaae-37c6e3905909f64e+string collation 'http://marklogic.com/collation/' out of order. 

Notice: Repairing out of order string range index /space/Forests/Documents/0006ef0e/c0dc932d1b4bcaae-37c6e3905909f64e+string collation 'http://marklogic.com/collation/' version 0 to 602

Starting with MarkLogic 10.0, the server automatically checks for any lexicons or string range indexes that may be in need of repair. Lexicons and range indexes "self-heal" in non-read-only stands whenever a lexicon or range index is opened within the stand.

Reason

This is due to changes introduced to the behavior of MarkLogic's root collation.

Starting with MarkLogic 10.0, the root collation has been modified, as have all collations that derive from it; this means there may be some subtle differences in search ordering.

For more information on the specifics of these changes, please refer to http://www.unicode.org/Public/UCA/6.0.0/CollationAuxiliary.html

This helps the server to support newer collation features, such as reordering entire blocks of script characters (for example: Latin, Greek, and others) with respect to each other. 

Implementing these changes has, under some circumstances, improved the performance of wildcard matching by more effectively limiting the character ranges that search scans (and returns) for wildcard-based matching.

Based on our testing, we believe this new ordering yields better performance in a number of circumstances, although it does create the need to perform full reindexing of any lexicon or string range index using the root collation.

MarkLogic Server will now check lexicons and string range indexes and will try to repair them where necessary. During the evaluation, MarkLogic Server will skip making further changes if any of the following conditions apply:

(a) They are already ordered according to the latest specification provided by ICU (1.8 at the time of writing)

(b) MarkLogic Server has already checked the stand and associated lexicons and indexes

(c) The indexes use codepoint collation (in which case, MarkLogic Server will be unable to change the ordering).

Whenever MarkLogic performs any repairs, it will always log a message at Notice level to inform users of the changes made.  If for any reason, MarkLogic Server is unable to make changes (e.g. a forest is mounted as read-only), MarkLogic will skip the repair process and nothing will be logged.

As these changes have been introduced from MarkLogic 10 onwards, you will most likely observe these messages in cases where recent upgrades (from prior releases of the product) have just taken place.

Repairs are performed on a stand by stand basis, so if a stand does not contain any values that require ordering changes, you will not see any messages logged for that stand.

Also, if any ordering issues are encountered during the merge of multiple stands, only one message will be logged for the merge, not one for each individual stand involved in that merge.

Summary

  • Repairs take place for any stand found to have a lexicon or string range index whose collation is out of order and out of date (e.g. utilising a collation described by an earlier version of ICU), unless that stand is mounted read-only.
  • Any repair will generate Notice messages when maintenance takes place.
  • The check/repair takes place whenever a lexicon or string range index is opened: during string range index access, lexicon calls (e.g. cts:values), range queries (e.g. cts:element-range-query), and merges.
  • The check looks for ICU version mismatches as well as items that are out of order, so for any lexicon or string range index with an older ordering that requires no further changes, no further action will be taken for that stand.

Known side effects

If the string range index or lexicon is very large, repairing can cause some performance overhead and may impact search performance during the repair process.

Solution

These messages can be avoided by issuing a full reindex of your databases immediately after performing your upgrade to MarkLogic 10.

Introduction

Forests in MarkLogic Server may be in one of several mount states. On mounting, local disk failover forests or database replication forests should both eventually reach the sync replicating or async replicating state. There are occasions, however, where local disk failover or database replication forests get stuck in the wait replication state. This knowledgebase article will itemize many of these wait replication scenarios, as well as the operational tactics to use in response.
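
As a quick way to see which forests are affected, the following sketch (the database name is illustrative) lists each forest in a database along with its current mount state:

xquery version "1.0-ml";

(: list every forest (including replicas) for a database and its current state :)
for $forest-id in xdmp:database-forests(xdmp:database("Documents"), fn:true())
let $state := xdmp:forest-status($forest-id)/*:state/fn:string()
return fn:concat(xdmp:forest-name($forest-id), " : ", $state)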

Wait replication scenarios

Wait replication as a result of lack of quorum

A quorum in MarkLogic Server represents more than 50% of the total number of nodes in the cluster. It's very important to note that this is the total number of nodes - regardless of group membership, forest assignment, whether nodes are running or not, etc. - if a machine exists in the hosts.xml configuration file and in the list of hosts in the Admin UI, it contributes to the total count.

While it's possible to run a MarkLogic cluster with only a subset of the configured nodes up, it's not a recommended configuration. In addition, if the number of active nodes in your cluster falls below the greater than 50% quorum threshold, you might run into forests in the wait replication state due to the lack of quorum.

What to do about it? You'll need to alter your cluster's configuration to meet the quorum requirement. That can mean either removing missing nodes from the cluster's configuration (essentially telling the cluster to stop looking for those missing nodes), or alternatively bringing up nodes that are currently part of the configuration, but not actively returning heartbeats (effectively letting the cluster see nodes it expects to be there). 

You can read more about quorum at the following knowledgebase articles:

Wait replication as a result of mixed file permissions

The root MarkLogic process is simply a restarter process which waits for the non-root (daemon) process to exit. If the daemon process exits abnormally, for any reason, the root process will fork and exec another process under the daemon process. The root process runs no XQuery scripts, opens no sockets, and accesses no database files. While it's possible to run the MarkLogic process as a non-root user, be very careful about forest file permissions - if your configured MarkLogic user doesn't have the necessary permissions, you might see wait replication and an inability to correctly failover to local disk failover forests when necessary - in which case you'll need to set your forest file permissions correctly to move forward. You can read more about running the MarkLogic process as a non-root user at:

Wait replication due to upgrading in the wrong order

Per our documentation, when upgrading you must first upgrade your replica environment, then subsequently upgrade your master environment.

If your cluster upgrades aren’t done in the correct order, you’re going to need to:

  1. Decouple your master and replica clusters, then stop the replica cluster

  2. Edit your replica cluster's databases.xml to remove entries with Security database replication

  3. Start the replica cluster, beginning with the node that hosts the Security forest

  4. Manually run http://(hostname of node hosting the Security forest):8001/security-upgrade-go.xqy?force=true

  5. Re-couple your master and replica clusters

You can read more about upgrading environments using database replication at:

Wait replication because you downgraded

MarkLogic Server does not support downgrades. If you do attempt to downgrade your installation, your replica forests will be stuck in wait replication.

What to do about it? As in the case of upgrading in the wrong order, you'll need to manually run http://(hostname of node hosting the Security forest):8001/security-upgrade-go.xqy?force=true. You can read more about MarkLogic Server and downgrades at:

Wait replication because your master and replica forest names don't match

By default, the "Connect Forests by Name" option is set to true. This means the server has certain expectations around how master and replica forests should be named.

What to do about it? Set "Connect Forests by Name" to false, then manually connect master and replica forests. You can read more about wait replication due to forest name mismatch at:

Wait replication as a result of merge blackouts (completely disabled merges)

What is merging and why do we need merge blackouts?

MarkLogic Server does lazy deletes, which marks documents obsolete (but doesn't actually delete them). Merges are when obsolete documents are actually deleted - in bulk, while also optimizing your data. Merge blackouts prevent this deferred deletion and optimization from happening. Merge blackouts can also sometimes result in wait replication. Consider a database that has both master and local disk failover forests where you have configured a merge blackout with the “disable merges completely” option (instead of “limit merges to” option). If a node failure on any of the nodes holding some of these forests were to occur during the merge blackout period, as soon as the failed node comes back online, all the forests associated with that specific node go into a “wait replication” state until the merge blackout period ends or is manually removed.

Notes:

  • Avoid completely disabling merges
  • If you do need to control merges, it's much better to set the maximum merge size in your blackout to a smaller number (“limit merges to” option)

Introduction

When configuring database replication, it is important to note that the Connect Forests by Name field is true by default. This works great because, when new forests of the same name are later added to the Master and Replica databases, they will be automatically configured for Database Replication.

The issue

The problem arises when you use replica forest names that do not match the original Master forest names. In that case, you may find that failover events cause forests to get stuck in the Wait Replication state. The usual methods of failing back to the designated masters will not work - restarting the replicas will not work, and neither will shutting down cluster/removing labels/restarting cluster.

Resolution

In this case, the way to fix the issue is to set Connect Forests by Name to false, and then you must manually connect the Master forests on the local cluster to the Replica forests on the foreign cluster, as described in the documentation: Connecting Master and Replica Forests with Different Names.

It is worth noting that, starting with MarkLogic 7, you are also allowed to rename the replica forests. Once you rename the replica forests to match the forest names of the designated master database (e.g., the Security database should have a Security forest in both the master and replica clusters), they will be automatically configured for Database Replication, as expected.

Introduction

This article will show you how to add a Fast Data Directory (FDD) to an existing forest.

Details

The fast data directory stores transaction journals and as many stands as it can accommodate. When the directory becomes full, larger stands are merged into the data directory, and once the size of the fast data directory approaches its limit, new stands are created in the data directory instead.

Although it is not possible to add an FDD path to a currently-existing forest, it is possible to do the following:

1. Destroy an existing forest configuration (while preserving the data)

2. Re-create a forest with the same name & data, with an FDD added

 

The queries below illustrate steps one and two of the process. Note that you can also do this with the Admin UI.

The query below will delete the forest configurations but not data.

Preparation:

1. Schedule a downtime window for this procedure (do NOT do this on a live production system)

2. Ensure that all ingestion and merging has stopped

3. To be on the safe side, take a backup of the forest before applying this procedure in production

4. Detach the forest before running these queries


1) Use the following API to Delete an existing forest configuration

NOTE: make sure to set the $delete-data parameter to false().

admin:forest-delete(
$config as element(configuration),
$forest-ids as xs:unsignedLong*,
$delete-data as xs:boolean
) as element(configuration)


2) Use the following API to create a new forest pointing to the old data directory, this time specifying the configured FDD:

admin:forest-create(
$config as element(configuration),
$forest-name as xs:string,
$host-id as xs:unsignedLong,
$data-directory as xs:string?,
[$large-data-directory as xs:string?],
[$fast-data-directory as xs:string?]
) as element(configuration)



Here's an example query that uses these APIs:

xquery version "1.0-ml";

declare namespace html = "http://www.w3.org/1999/xhtml";

import module namespace admin = "http://marklogic.com/xdmp/admin" 
at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()

(: preserve some path values from the old forest :)

let $forest-name := "YOUR_FOREST_NAME"

let $new-fast-data := "YOUR_NEW_FAST_DATA_DIR"

let $old-data := admin:forest-get-data-directory($config, admin:forest-get-id($config, $forest-name))

let $old-large-data := admin:forest-get-large-data-directory($config, admin:forest-get-id($config, $forest-name))

return
admin:save-configuration(admin:forest-delete(
$config, admin:forest-get-id($config, $forest-name),
fn:false())),

let $config1 := admin:get-configuration()
return
admin:save-configuration(admin:forest-create(
    $config1,
    $forest-name,
    xdmp:host(),
    $old-data,
    $old-large-data,
    $new-fast-data
))

You can create and attach the forest in a single transaction. This is also possible using the Admin UI (as two separate transactions), i.e. deleting only the forest configuration while preserving its data.

After attaching the forest, reindex the database; data will then migrate to the FDD. Note that the sample query needs to be executed on the host where the forest resides.
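
For completeness, a minimal sketch of attaching the re-created forest back to its database (the forest and database names are placeholders):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: attach the re-created forest back to its database :)
let $config := admin:get-configuration()
let $forest-id := admin:forest-get-id($config, "YOUR_FOREST_NAME")
let $db-id := admin:database-get-id($config, "YOUR_DATABASE_NAME")
return admin:save-configuration(
  admin:database-attach-forest($config, $db-id, $forest-id))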


 

 

Introduction

MarkLogic has shipped with a ReST API since MarkLogic 7.

In MarkLogic 8, the ReST API was vastly expanded, giving MarkLogic database administrators ways to manage almost all common MarkLogic administration tasks over an HTTP connection to MarkLogic's ReST endpoints.

This Knowledgebase article will cover some examples of common administration tasks and will show some working examples to give you a taste of what can be done if you're using the latest version of MarkLogic Server.

While there are a significant number of examples throughout our extensive documentation in this area, many of these make use of cURL. In this Knowledgebase article, we're going to use XQuery calls to demonstrate how the payloads are structured.

Creating a backup using a call to the ReST API (XQuery)

In the example code below, we demonstrate a call that will perform a backup of the Documents database, placing the backup in the /tmp directory.
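
A minimal sketch of such a call (assuming the Manage app server on port 8002 and digest authentication; substitute your own credentials):

xquery version "1.0-ml";

(: sketch: request a backup of the Documents database via the Management ReST API :)
let $payload := text {
  '{"operation": "backup-database", "backup-dir": "/tmp"}'
}
let $options :=
  <options xmlns="xdmp:http">
    <authentication method="digest">
      <username>admin</username>
      <password>admin</password>
    </authentication>
    <headers>
      <content-type>application/json</content-type>
    </headers>
  </options>
return xdmp:http-post(
  "http://localhost:8002/manage/v2/databases/Documents?format=json",
  $options, $payload)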

Running the query in the above code example will return a response (in JSON format) containing a job ID for the requested task:

{
"job-id": "4903378997555340415", 
"host-name": "yourhostnamehere"
}

The next example will demonstrate a status check for a given job ID

Query the status of an active or recent job
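
A minimal sketch of the status check (same assumptions as above, using the job ID returned by the backup request):

xquery version "1.0-ml";

(: sketch: check the status of a previously requested backup job :)
let $payload := text {
  '{"operation": "backup-status", "job-id": "4903378997555340415",
    "host-name": "yourhostnamehere"}'
}
let $options :=
  <options xmlns="xdmp:http">
    <authentication method="digest">
      <username>admin</username>
      <password>admin</password>
    </authentication>
    <headers>
      <content-type>application/json</content-type>
    </headers>
  </options>
return xdmp:http-post(
  "http://localhost:8002/manage/v2/databases/Documents?format=json",
  $options, $payload)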

The above query will return a response that would look like this:

{
"job-id": "4903378997555340415", 
"host-name": "yourhostnamehere", 
"status": "completed"
}

Further reading on the MarkLogic ReST API:

Alternatives to Configuration Manager

Overview

The MarkLogic Server Configuration Manager provided a read-only user interface to the MarkLogic Admin UI and could be used for saving and restoring configuration settings. The Configuration Manager tool was deprecated starting with MarkLogic 9.0-5, and is no longer available in MarkLogic 10.

Alternatives

There are a number of alternatives to the Configuration Manager. Most of the options take advantage of the MarkLogic Admin API, either directly or behind the scenes. The following is a list of the most commonly used options:

  • Manual Configuration
  • ml-gradle
  • Configuration Management API

Manual Configuration

For a single environment, the following Knowledge base covers the process of Transporting Resources to a New Cluster.

ml-gradle

For a repeatable process, the most widely used approach is ml-gradle.

A project would be created in Gradle, with the desired configurations. The project can then be used to deploy to any environment - test, QA, prod, etc. - creating a known configuration that can be maintained under source control, which is a best practice.

Similar to Configuration Manager, ml-gradle also allows for exporting the configuration of an existing cluster.

While ml-gradle is an open source community project that is not directly supported, it enjoys very good community and developer support.  The underlying APIs that ml-gradle uses are fully supported by MarkLogic.

Configuration Management API

An additional option is to use the Configuration Management API directly to export and import resources.

Summary

Both ml-gradle and the Configuration Management API use the MarkLogic Admin API behind the scenes but, for most use cases, our recommendation is to use ml-gradle rather than writing the same functionality from scratch.

Summary

On Internet Explorer 9 and Internet Explorer 10, application services UI should be run in Compatibility Mode.

Details:

When using the Application Services UI in Internet Explorer 9 or Internet Explorer 10, you may notice some minor UI bugs. These occur only within MarkLogic Application Services, NOT within applications built with it. These UI bugs can be avoided if you run IE 9 or IE 10 in Compatibility View.

Instructions on how to configure compatibility modes in IE 9 or IE 10: 

1. Press ALT-T to bring up the Tools menu
2. On the Tools menu, click 'Compatibility View Settings' 
3. Add the domain to the list of domains to render in compatibility view.

Introduction

Customers frequently ask for advice on managing backups outside the standard XQuery APIs or the web interface provided by MarkLogic.

This Knowledgebase article demonstrates two approaches to allow you to integrate the backup of a MarkLogic database into your dev-ops workflow by allowing such processes to be scripted or managed outside the product.

Creating a backup using the ReST API

You can use the ReST API to perform a database backup and to check on the status at any given time.

The examples listed below use XQuery to make the calls to the ReST API over HTTP, but you can adapt them to work with cURL; examples of that approach are also given below.

The process

Here is an example that demonstrates a backup of the Documents database:
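
A minimal XQuery sketch (assuming the Manage app server on port 8002 and digest authentication; substitute your own credentials and backup directory):

xquery version "1.0-ml";

(: sketch: back up the Documents database with journal archiving and replicas included :)
let $payload := text {
  '{"operation": "backup-database", "backup-dir": "/tmp/backup",
    "journal-archiving": true, "include-replicas": true}'
}
let $options :=
  <options xmlns="xdmp:http">
    <authentication method="digest">
      <username>admin</username>
      <password>admin</password>
    </authentication>
    <headers>
      <content-type>application/json</content-type>
    </headers>
  </options>
return xdmp:http-post(
  "http://localhost:8002/manage/v2/databases/Documents?format=json",
  $options, $payload)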

Running this should give you a job id as part of the response (in this example, we're using JSON to format the response but this can easily be changed by modifying the headers elements in the above sample to return application/xml instead):

{"job-id":"8774639830166037592", "host-name":"yourhostnamehere"}

Below is an example that demonstrates checking for the status of a given backup with the job-id given in the first step:
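
A minimal XQuery sketch of the status check (same assumptions as above):

xquery version "1.0-ml";

(: sketch: check the status of the backup job returned in the first step :)
let $payload := text {
  '{"operation": "backup-status", "job-id": "8774639830166037592",
    "host-name": "yourhostnamehere"}'
}
let $options :=
  <options xmlns="xdmp:http">
    <authentication method="digest">
      <username>admin</username>
      <password>admin</password>
    </authentication>
    <headers>
      <content-type>application/json</content-type>
    </headers>
  </options>
return xdmp:http-post(
  "http://localhost:8002/manage/v2/databases/Documents?format=json",
  $options, $payload)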

Example: using cURL (instead of XQuery)

Adapting the above examples so they work from cURL instead, you can generate a call that looks like this:

curl -s -X POST  --anyauth -u username:password --header "Content-Type:application/json" -d '{"operation": "backup-database", "backup-dir": "/tmp/backup", "journal-archiving": true, "include-replicas": true}'  http://localhost:8002/manage/v2/databases/Documents\?format\=json

And to check on the status, the cURL payload could be modified to look like this:

{"operation": "backup-status", "job-id" : "8774639830166037592","host-name": "yourhostnamehere"}

Further reading

Summary

Customers using the MarkLogic AWS Cloud Formation Templates may encounter a situation where someone has deleted an EBS volume that stored MarkLogic data (mounted at /var/opt/MarkLogic). Because the volume and the associated data are no longer available, the host is unable to rejoin the cluster.

Getting the host to rejoin the cluster can be complicated, but it will typically be worth the effort if you are running an HA configuration with Primary and Replica forests.

This article details the procedures to get the host to rejoin the cluster.

Preparing the New Volume and New Host

The easiest way to create the new volume is using a snapshot of an existing host's MarkLogic data volume.  This saves the work of manually copying configuration files between hosts, which is necessary to get the host to rejoin the cluster.

In the AWS EC2 Dashboard:Elastic Block Store:Volumes section, create a snapshot of the data volume from one of the operational hosts.

Next, in the AWS EC2 Dashboard:Elastic Block Store:Snapshots section, create a new volume from the snapshot in the correct zone, and note the new volume ID for use later.

(optional) Update the name of the new volume to match the format of the other data volumes

(optional) Delete the snapshot

Edit the Auto Scaling Group with the missing host to bring up a new instance, by increasing the Desired Capacity by 1

This will trigger the Auto Scaling Group to bring up a new instance. 

Attaching the New Volume to the New Instance

Once the instance is online and startup is complete, connect to the new instance via SSH.

Ensure MarkLogic is not running, by stopping the service and checking for any remaining processes.

  • sudo service MarkLogic stop
  • pgrep -la MarkLogic

Remove /var/opt/MarkLogic if it exists, and is mounted on the root partition.

  • sudo rm -rf /var/opt/MarkLogic

Edit /var/local/mlcmd and update the volume id listed in the MARKLOGIC_EBS_VOLUME variable to the volume created above.

  • MARKLOGIC_EBS_VOLUME="[new volume id],:25::gp2::,*"

Run mlcmd to attach and mount the new volume to /var/opt/MarkLogic on the instance

  • sudo /opt/MarkLogic/mlcmd/bin/mlcmd init-volumes-from-system
  • Check that the volume has been correctly attached and mounted

Remove contents of /var/opt/MarkLogic/Forests (if they exist)

  • sudo rm -rf /var/opt/MarkLogic/Forests/*

Run mlcmd to sync the new volume information to the DynamoDB table

  • sudo /opt/MarkLogic/mlcmd/bin/mlcmd sync-volumes-to-mdb

Configuring MarkLogic With Empty /var/opt/MarkLogic

If you did not create your volume from a snapshot as detailed above, complete the following steps.  If you created your volume from a snapshot, then skip these steps, and continue with Configuring MarkLogic and Rejoining Existing Cluster

  • Start the MarkLogic service, wait for it to complete its initialization, then stop the MarkLogic service:
    • sudo service MarkLogic start
    • sudo service MarkLogic stop
  • Move the configuration files out of /var/opt/MarkLogic/
    • sudo mv /var/opt/MarkLogic/*.xml /secure/place (using default settings; destination can be adjusted)
  • Copy the configuration files from one of the working instances to the new instance
    • Configuration files are stored here: /var/opt/MarkLogic/*.xml
    • Place a copy of the xml files on the new instance under /var/opt/MarkLogic

Configuring MarkLogic and Rejoining Existing Cluster

Note the host-id of the missing host found in /var/opt/MarkLogic/hosts.xml

  • For example, if the missing host is ip-10-0-64-14.ec2.internal
    • sudo grep "ip-10-0-64-14.ec2.internal" -B1 /var/opt/MarkLogic/hosts.xml

  • Edit /var/opt/MarkLogic/server.xml and update the value for host-id to match the value retrieved above

Start MarkLogic and view the ErrorLog for any issues

  • sudo service MarkLogic start; sudo tail -f /var/opt/MarkLogic/Logs/ErrorLog.txt

You should see messages about forests synchronizing (if you have local disk failover enabled, with replicas) and changing states from wait or async replication to sync replication.  Once all the forests are either 'open' or 'sync replicating', then your cluster is fully operational with the correct number of hosts.

At this point you can fail back to the primary forests on the new instances to rebalance the workload for the cluster.

You can also re-enable 'xdqp ssl enabled' by setting the value to true on the Group Configuration page, if you disabled that setting as part of these procedures.

Update the Userdata In the Auto Scaling Group

To ensure that the correct volume will be attached if the instance is terminated, the Userdata needs to be updated in a Launch Configuration.

Copy the Launch Configuration associated with the missing host.

Edit the details

  • (optional) Update the name of the Launch Configuration
  • Update the User data variable MARKLOGIC_EBS_VOLUME and replace the old volume id with the id for the volume created above.
    • MARKLOGIC_EBS_VOLUME="[new volume id],:25::gp2::,*"
  • Save the new Launch Configuration

Edit the Auto Scaling Group associated with the new node

Change the Launch Configuration to the one that was just created and save the Auto Scaling Group.

Next Steps

Now that normal operations have been restored, it's a good opportunity to ensure you have all the necessary database backups, and that your backup schedule has been reviewed to ensure it meets your requirements.

Backup/Restore settings for Local Disk Failover

When configuring backups for a database, the 'include replica forests' setting is important in order to handle forest failover events. When 'include replica forests' is set to 'true', both the master and the replica forests will be included in the database backup.

This KB article will go over an example failover scenario, and will show how a scheduled backup/restore works with different 'include replica forests' and 'journal archiving' settings.

Scenario

Consider a 3-node cluster with hosts Host-A, Host-B and Host-C, and a database 'backup-test' with the following forest assignments (forests ending with 'p' are primary and those ending with 'r' are replica). Under normal conditions, the primary forests will be in the 'open' state, and the replica forests will be in the 'sync replicating' state.

Host A                         Host B                         Host C
forest-1p (open)               forest-2p (open)               forest-3p (open)
forest-3r (sync replicating)   forest-1r (sync replicating)   forest-2r (sync replicating)


Failover and Forest states

Now consider what happens when Host-A goes offline. When Host-A's primary forests complete failover, its replica forests take over. The following will be the forest state layout when this happens:

Host A                         Host B                         Host C
forest-1p (disabled)           forest-2p (open)               forest-3p (open)
forest-3r (disabled)           forest-1r (open)               forest-2r (sync replicating)

Backup Examples: 

When 'Include replica Forests' is false and 'Journal Archiving' is true

forest-1p is disabled, and the corresponding replica forest-1r is now open because of the failover. In this case, a backup task will not succeed because replica forests have not been configured for backups. The following 'Warning' level message will be logged:

Warning: Not backing up database backup-test because first forest master forest-1p is not available, and replica backups aren't enabled

When Host-A is brought up again, the forest states will be

forest-1p - sync replicating
forest-1r - open

At this time, backups will succeed and because journal archiving is enabled, journals will be written to the backup data.

However, you will not be able to do a 'point in time restore' using journal archiving. When the configured master is not the acting master and backup is not enabled for replicas, the following error occurs when a point-in-time restore is attempted:

Operation failed with error message: xdmp:database-restore((xs:unsignedLong("5138652658926200166"), "/space/20160927-1125008228810", xs:dateTime("2016-09-27T11:06:21-07:00"), fn:true(), ()) -- Unable to restore replica forest forest-1r because the master forest forest-1p is not also restored, or is not acting master. Check server logs.

To get past this, the forests need to be failed back in order to make the 'configured master' the same as the 'acting master'.

When 'Include replica forests' is true and 'Journal Archiving' is true

In this case, backups will succeed when forests are failed over to their replica forests because replica forests are configured for backups. And, because journal archiving is enabled, journals will also be written to the backup data.

Even in this case, as in the previous case, point-in-time restore will not work until the forests are failed back.

Related documentation

MarkLogic Administrator's Guide: Backing up and Restoring a Database Following Local Disk Failover 

MarkLogic Administrator's Guide: Restoring Databases with Journal Archiving

MarkLogic Knowledgebase Article: Understanding the role of journals in relation to backup and restore journal archiving

MarkLogic Knowledgebase Article: Database backup / restore and local disk failover

Before executing significant operational procedures on production systems, such as

  • Production Go Live events;
  • Major version Upgrades;
  • Adding/removing nodes to a cluster;
  • Deploying a new application or an application upgrade;
  • ...

MarkLogic recommends:

  • Thorough testing of any operational procedures on non production systems.
  • Opening a ticket with MarkLogic Technical Support to give them a heads up, along with any useful collateral that would help expedite diagnostics of issues if any occur, such as
    • The finalized plan & timeline or schedule of the operational procedure
    • A support dump, taken before the operational procedure, in order to record the configuration of the system ahead of time; this may come in handy if an incident occurs, as we may want to know the actual changes that had been made. You can create a MarkLogic Server support dump from the Admin UI by selecting the 'Support' tab; select scope=cluster, detail=status only, destination=browser, then save the output to disk. Attach the support dump to the ticket as a file, either as an email attachment or by uploading it through our support portal.
    • A few days of error logs from before the operational event so that we can determine whether artifacts in the error logs are new or whether they existed prior to the event.
    • You can alternatively turn Telemetry on before the event and force an upload of the support dump & error logs.
    • Any architecture or design details of the system that you are able to share.
  • Please make sure that all individuals who are responsible for the event and who may need to contact the MarkLogic Technical Support team are registered MarkLogic Support contacts. They can register for an account per instructions available at https://help.marklogic.com/marklogic/AccountRequest.  They will want to register before the event as ONLY registered support contacts can create a ticket with MarkLogic Technical Support. We do not want registration and entitlement verification to get in the way of the ability to work on an urgent production issue.
  • Review the MarkLogic Support Handbook - http://www.marklogic.com/files/Mark_Logic_Support_Handbook.pdf. The following sections in the "HOW TO RECEIVE SUPPORT SERVICES" chapter of the handbook are useful to be acquainted with before an incident occurs
    • Section: What to do Prior to Logging a Service Request 
    • Section: Working with Support
    • Section: Escalation Process
    • Section: Understanding Case Priority and Response Time Targets
  • For urgent issues (production outages), remember that you can raise an urgent incident per the instructions in the support handbook; MarkLogic takes urgent incidents seriously, as every urgent issue results in a text message being sent to every support engineer, engineering management and the senior executive at MarkLogic. 
  • Enable Debug level logging so that any issues that arise can be more easily diagnosed.  Debug level logging does not have any noticeable impact on system performance.

Summary

In some cases it is necessary to change the default environment variables of a MarkLogic Server installation or configuration.

Making Changes to Defaults

When changes to the default configurations need to be made, we recommend using /etc/marklogic.conf to make those changes. The file will not exist in a default installation, and should be manually created. We recommend the file only contain the variables that are being changed or added. This file will also be unaffected by MarkLogic upgrades.

Note: We do not recommend making changes to /etc/sysconfig/MarkLogic, as this file is part of the MarkLogic installation package, and it may be replaced or changed during a MarkLogic upgrade with no notification. Any direct file customizations will be overwritten and lost, which can result in various problems when the MarkLogic service is restarted.

During startup, MarkLogic will first source its own environment variable file, and then it will source /etc/marklogic.conf, which ensures the locally defined variables take precedence.

Changing the Default Data Directory

A common use of the /etc/marklogic.conf file is to change the default data directory (/var/opt/MarkLogic).

export MARKLOGIC_DATA_DIR="/my/custom/path/MarkLogic"

If that file exists when the server is first initialized, then MarkLogic will run from the custom location. If MarkLogic has already been initialized, then you may need to stop the service and manually move /var/opt/MarkLogic to your custom location.

Using the MarkLogic AMI

When using the MarkLogic AMI, without using the MarkLogic Cloud Formation template, it is necessary to create /etc/marklogic.conf to disable the Managed Cluster feature.

export MARKLOGIC_MANAGED_NODE=0

If this is done after the instance is launched, then you may encounter the issue mentioned in the KB SVC_SOCHN Warning During Start Up on AWS.

Common Configurable Variables

  • MARKLOGIC_INSTALL_DIR - Where the MarkLogic binaries are installed
  • MARKLOGIC_DATA_DIR - Where MarkLogic stores configurations and forest data
  • MARKLOGIC_EC2_HOST - Whether MarkLogic will utilize EC2 specific features and settings
  • MARKLOGIC_AZURE_HOST - Whether MarkLogic will utilize Azure specific features and settings
  • MARKLOGIC_MANAGED_NODE - Whether MarkLogic will utilize the Managed Cluster feature
  • MARKLOGIC_USER - User that MarkLogic runs as
  • MARKLOGIC_HOSTNAME - Manually set the MarkLogic host name. Must be set prior to initialization or the hostname from the OS will be used
  • TZ - Allows for MarkLogic to operate with a different time zone setting than the OS

Further reading

Best Practice for Adding an Index in Production

Summary

It is sometimes necessary to remove or add an index on your production cluster. For a large database with more than a few GB of content, the resulting reindexing workload can be a time- and resource-intensive process that can affect query performance while the server is reindexing. This article points out some strategies for avoiding some of the pain points associated with changing your database configuration on a production cluster.

Preparing your Server for Production

In general, high performance production search implementations run with tight controls on the automatic features of MarkLogic Server. 

  • Re-indexer disabled by default
  • Format-compatibility set to the latest format
  • Index-detection set to none.
  • On a very large cluster (several dozen or more hosts), consider running with expunge-locks set to none
  • On large clusters with insufficient resources, consider bumping up the default group settings
    • xdqp-timeout: from 10 to 30
    • host-timeout: from 30 to 90

The xdqp and host timeouts will prevent the server from disconnecting prematurely when a data node is busy, which could otherwise trigger a false failover event. However, these changes will also increase the time to a legitimate failover in an HA configuration.
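
A sketch of raising these group timeouts via the Admin API (assuming the "Default" group; the setter names are assumed to follow the usual admin:group-set-&lt;setting&gt; pattern):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: raise xdqp and host timeouts for the Default group :)
let $config := admin:get-configuration()
let $group-id := admin:group-get-id($config, "Default")
let $config := admin:group-set-xdqp-timeout($config, $group-id, 30)
let $config := admin:group-set-host-timeout($config, $group-id, 90)
return admin:save-configuration($config)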

Preparing to Re-index

When an index configuration must be changed in production, you should:

  • First, index-detection should be set back to automatic
  • Then, the index configuration change should be made

When you have Database Replication Configured:

If you have to add or modify indexes on a database which has database replication configured, make sure the same changes are made on the Replica cluster as well. Starting with MarkLogic Server version 9.0-7, index data is also replicated from the Master to the Replica, but the server does not automatically check that both sides have the same index settings. Reindexing is disabled by default on a replica cluster. However, when database replication configuration is removed (such as after a disaster), the replica database will reindex as necessary. So it is important that the Replica database index configuration matches the Master’s to avoid unnecessary reindexing.

Note: If you are on a version prior to 9.0-7 - When adding/updating index settings, it is recommended that you update the settings on the Replica database before updating those on the Master database; this is because changes to the index settings on the Replica database only affect newly replicated documents and will not trigger reindexing on existing documents.

Further reading -

Master and Replica Database Index Settings

Database Replication - Indexing on Replica Explained

  • Finally, the reindexer should be enabled during off-hours to reindex the content.

Reindexing works by reloading all the URIs that are affected by the index change; this process tends to create lots of new and deleted fragments, which then need to be merged. Given that reindexing is very CPU and disk I/O intensive, the reindexer-throttle can be set to 3 or 2 to minimize the impact of the reindex.
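
As a sketch (assuming a database named "Documents"), the reindexer can be enabled with a reduced throttle via the Admin API:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: turn the reindexer on at a reduced throttle for the off-hours window :)
let $config := admin:get-configuration()
let $db-id := admin:database-get-id($config, "Documents")
let $config := admin:database-set-reindexer-enable($config, $db-id, fn:true())
let $config := admin:database-set-reindexer-throttle($config, $db-id, 3)
return admin:save-configuration($config)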

After the Re-index

After the re-index has completed, it is important to return to the old settings by disabling the reindexer and setting index-detection back to none.

If you're reindexing over several nights or weekends, be sure to allow some time for the merging to complete. So for example, if your regular busy time starts at 5AM, you may want to disable the reindexer at around midnight to make sure all your merging is completed before business hours.

By following the above recommendations, you should be able to complete a large re-index without any disruption to your production environment.

Summary

MarkLogic Server can ingest and query all sorts of data, such as XML, text, JSON, binary, generic, etc. There are some things to consider when choosing to simply load data "as-is" vs. doing some degree of data modeling or data transformation prior to ingestion.

Details

Loading data "as-is" can minimize time and complexity during ingest or document creation. That can, however, sometimes mean more complex, slower performing queries. It may also mean more storage space intensive indexing settings.

In contrast, doing some degree of data transformation prior to ingestion can sometimes result in dramatic improvements in query performance and storage space utilization due to reduced indexing requirements.

An Example

A simple example will demonstrate how a data model can affect performance. Consider the data model used by Apple's iTunes:

<plist version="1.0">
<dict>
  <key>Major Version</key><integer>10</integer>
  <key>Minor Version</key><integer>1</integer>
  <key>Application Version</key><string>10.1.1</string>
  <key>Show Content Ratings</key><true/>
  <dict>
    <key>Track ID</key><integer>290</integer>
    <key>Name</key><string>01-03 Good News</string>
          …
  </dict>
</dict>
 

Note the multiple <key> sibling elements at multiple levels, where both levels use the same element name (in this case, <dict>). Let's say you wanted to query a document like this for "Application Version". In this case, time will be spent performing index resolution for the enclosing element (here, <key>). Unfortunately, because there are multiple sibling elements all sharing the same element name, all of those sibling elements will need to be retrieved and then evaluated to see which of them actually match the given query criteria. Consider a slightly revised data model, instead:

 

<iTunesLibrary version="1.0">
<application>
  <major-version>10</major-version>
  <minor-version>1</minor-version>
  <app-version>10.1.1</app-version>
  <show-content-ratings>true</show-content-ratings>
  <tracks>
    <track-id>290</track-id>
    <name>01-03 Good News</name>
          …
  </tracks>
</application>

Here, we only need to query, and therefore retrieve and evaluate, the single <app-version> element, instead of performing multiple retrievals and evaluations as in the previous data model.
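
To make the difference concrete, a query against the revised model only needs to resolve a single, uniquely named element (a sketch using the element names from the example above):

xquery version "1.0-ml";

(: with the revised model, a single uniquely named element can be targeted directly :)
cts:search(
  fn:collection(),
  cts:element-value-query(xs:QName("app-version"), "10.1.1"))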

At Scale

Although this is a simple example, when processing millions or even billions of records, eliminating small processing steps could have significant performance impact.

BEST PRACTICES FOR EXPORTING AND IMPORTING DATA IN BULK

Handling large amounts of data can be expensive in terms of both computing resources and runtime. It can also sometimes result in application errors or partial execution. In general, if you’re dealing with large amounts of data as either output or input, the most scalable and robust approach is to break-up that workload into a series of smaller and more manageable batches.

Of course there are other available tactics. It should be noted, however, that most of those other tactics will have serious disadvantages compared to batching. For example:

  • Configuring time limit settings through Admin UI to allow for longer request timeouts - since you can only increase timeouts so much, this is best considered a short term tactic for only slightly larger workloads.
  • Eliminating resource bottlenecks by adding more resources – often easier to implement compared to modifying application code, though with the downside of additional hardware and software license expense. Like increased timeouts, there can be a point of diminishing returns when throwing hardware at a problem.
  • Tuning queries to improve your query efficiency – this is actually a very good tactic to pursue, in general. However, if workloads are sufficiently large, even the most efficient implementation of your request will eventually need to work over subset batches of your inputs or outputs.

For more detail on the above non-batching options, please refer to XDMP-CANCELED vs. XDMP-EXTIME.

WAYS TO EXPORT LARGE AMOUNTS OF DATA FROM MARKLOGIC SERVER

1.    If you can’t break-up the data into a series of smaller batches - use xdmp:save to write out the full results from query console to the desired folder, specified by the path on your file system. For details, see xdmp:save.

2.    If you can break-up the data into a series of smaller batches:

            a.    Use batch tools like MLCP, which can export bulk output from MarkLogic server to flat files, a compressed ZIP file, or an MLCP database archive. For details, see Exporting Content from MarkLogic Server.

            b.    Reduce the size of the desired result set until it saves successfully, then save the full output in a series of batches.

            c.    Page through result set:

                               i.     If dealing with documents, cts:uris is excellent for paging through a list of URIs (a short sketch follows this list). Take a look at cts:uris for more details.

                               ii.     If using Semantics

                                             1.    Consider exporting the triples from the database using the Semantics REST endpoints.

                                              2.    Take a look at the URL parameters start and pageLength – these parameters can be configured in your SPARQL query request to return the results in batches. See GET /v1/graphs/sparql for further details.
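
As mentioned in (c) above, a minimal sketch of paging through URIs with cts:uris and fn:subsequence (the collection name and batch size are illustrative; assumes the URI lexicon is enabled):

xquery version "1.0-ml";

(: return one "page" of URIs; adjust $start and $page-size per batch :)
let $page-size := 10000
let $start := 1
return fn:subsequence(
  cts:uris((), (), cts:collection-query("my-collection")),
  $start, $page-size)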

WAYS TO IMPORT LARGE AMOUNTS OF DATA INTO MARKLOGIC SERVER

1.    If you’re looking to update more than a few thousand fragments at a time, you'll definitely want to use some sort of batching.

             a.     For example, you could run a script in batches of say, 2000 fragments, by doing something like [1 to 2000], and filtering out fragments that already have your newly added element. You could also look into using batch tools like MLCP

             b.    Alternatively, you could split your input into smaller batches, then spawn each of those batches to jobs on the Task Server, which has a configurable queue. See:

                            i.     xdmp:spawn

                            ii.    xdmp:spawn-function

2.    Alternatively, you could use an external/community developed tool like CoRB to batch process your content. See Using Corb to Batch Process Your Content - A Getting Started Guide

3.    If using Semantics and querying triples with SPARQL:

              a.    You can make use of the LIMIT keyword to further restrict the result set size of your SPARQL query. See The LIMIT Keyword

              b.    You can also use the OFFSET keyword for pagination. This keyword can be used with the LIMIT and ORDER BY keywords to retrieve different slices of data from a dataset. For example, you can create pages of results with different offsets. See  The OFFSET Keyword
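
As an illustration of combining these keywords, a paged SPARQL query can be run from XQuery with sem:sparql (the triple pattern and page boundaries are arbitrary; assumes the triple index is enabled):

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

(: return one "page" of 1000 results, skipping the first 5000 :)
sem:sparql('
  SELECT ?s ?p ?o
  WHERE { ?s ?p ?o }
  ORDER BY ?s
  LIMIT 1000
  OFFSET 5000
')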

Introduction

This article outlines various factors influencing the performance of xdmp:collection-delete function and furthermore provides general best practices for improving the performance of large collection deletes.

What are collections?

Collections in MarkLogic Server are used to organize documents in a database. Collections are a powerful and high-performance mechanism to define and manage subsets of documents.

How are collections different from directories?

Although both collections and directories can be used for organizing documents in a database, there are some key differences. For example:

  • Directories are hierarchical, whereas collections are not. Consequently, collections do not require member documents to conform to any URI patterns. Additionally, any document can belong to any collection, and any document can also belong to multiple collections
  • You can delete all documents in a collection with the xdmp:collection-delete function. Similarly, you can delete all documents in a directory (as well as all recursive subdirectories and any documents in those directories) with a different function call - xdmp:directory-delete
  • You can set properties on a directory. You cannot set properties on a collection

For further details, see Collections versus Directories.

What is the use of the xdmp:collection-delete function?

xdmp:collection-delete is used to delete all documents in a database that belong to a given collection - regardless of their membership in other collections.

  • Use of this function always results in the specified unprotected collection disappearing. For details, see Implicitly Defining Unprotected Collections
  • Removing a document from a collection and using xdmp:collection-delete are similarly contingent on users having appropriate permissions to update the document(s) in question. For details, see Collections and Security
  • If there are no documents in the specified collection, then nothing is deleted, and the function still returns the empty sequence

What factors affect performance of xdmp:collection-delete?

The speed of xdmp:collection-delete depends on several factors:

Is there a fast operation mode available within the call xdmp:collection-delete?

Yes. The call xdmp:collection-delete("collection-uri") can potentially be fast in that it won't retrieve fragments. Be aware, however, that xdmp:collection-delete will retrieve fragments (and therefore perform much more slowly) when your database is configured with any of the following:

What are the general best practices in order to improve the performance of large collection deletes?

  • Batch your deletes
    • You could use an external/community developed tool like CoRB to batch process your content
    • Tools like CoRB allow you to create a "query module" (this could be a call to cts:uris to identify documents from a number of collections) and a "transform module" that works on each URI returned. CoRB will run the URI query and will use the results to feed a thread pool of worker threads. This can be very useful when dealing with large bulk processing. See: Using Corb to Batch Process Your Content - A Getting Started Guide
  • Alternatively, you could split your input (for example, URIs of documents inside a collection that you want to delete) into smaller batches
    • Spawn each of those batches to jobs on the Task Server instead of trying to delete an entire collection in a single transaction
    • Use xdmp:spawn-function to kick off deletions of one document at a time - be careful not to overflow the task server queue, however
      • Don't spawn single document deletes
      • Instead, make batches of a size that works most efficiently in your specific use case (see the sketch after this list)
    • One of the restrictions on the Task Server is that there is a set queue size - you should be able to increase the queue size as necessary
  • Scope deletes more narrowly with the use of cts:collection-query
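As a rough sketch of the batching approach described above (the collection name, batch size, and the choice to delete one document per call are all examples, not a prescription), each batch of URIs can be spawned to the Task Server:

xquery version "1.0-ml";

let $uris := cts:uris((), (), cts:collection-query("obsolete-data"))  (: requires the URI lexicon :)
let $batch-size := 500
for $i in 0 to (fn:count($uris) - 1) idiv $batch-size
let $batch := fn:subsequence($uris, $i * $batch-size + 1, $batch-size)
return
  xdmp:spawn-function(
    function() { for $uri in $batch return xdmp:document-delete($uri) },
    <options xmlns="xdmp:eval"><update>true</update></options>)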

Related knowledgebase articles:

 

Introduction

MarkLogic Server delivers performance at scale, whether we're talking about large amounts of data, users, or parallel requests. However, people do run into performance issues from time to time. Most of those performance issues can be found ahead of time via well-constructed and well-executed load testing and resource provisioning.

There are three main aspects to load testing against and resource provisioning for MarkLogic:

  1. Building your load testing suite
  2. Examining your load testing results
  3. Addressing hot spots

Building your load testing suite

The biggest issue we see with problematic load testing suites is unrepresentative load. The inaccuracy can be in the form of missing requests, missing query inputs, unanticipated query inputs, unanticipated or underestimated data growth rates, or even a population of requests that's skewed towards different load profiles compared to production traffic. For example - a given load test might heavily exercise query performance, only to find in production that ingest requests represent the majority of traffic. Alternatively, perhaps one kind of query represents the bulk of a given load test, when in reality that kind of query is dwarfed by the number of invocations of a different kind of query.

Ultimately, to be useful, a given load test needs to be representative of production traffic. Unfortunately, the less representative a load test is, the less useful it will be.

Examining your load testing results

Beginning with version 7.0, MarkLogic Server ships with a Monitoring History dashboard, visible from any host in your cluster on port 8002 at /history. The Monitoring History dashboard will illustrate usage of resources such as CPU, RAM, disk I/O, etc... both at the cluster and individual host levels. The Monitoring History dashboard will also illustrate the occurrence of read and write locks over time. It's important to get a handle on both resource and lock usage in the course of your load test as both will limit the performance of your application - but the way to address those performance issues depends on which class of usage is most prevalent.

Addressing hot spots

By having a representative load test and closely examining your load testing results, you'll likely find hot spots or slow performing parts of your application. MarkLogic Server's Monitoring History allows you to correlate resource and lock usage over time against the workload being submitted by your load tests. Once you find a hot spot, it's worthwhile examining it more closely by either running those requests in isolation, or at larger scales. For example, you could run 4x and 16x the number of parallel requests, or 4x and 16x the number of inputs to an individual request - both of which will give you an idea of how the suspect requests scale in response to increased load.

Once you've found a hot spot - what should you do about it? Well, that ultimately depends on the kind of usage you're seeing in your cluster's Monitoring History. If it's clear that your suspect requests are running into a resource bound (for example, 100% utilization of CPU/RAM/disk I/O/etc.), then you'll either need to provision more of that limiting resource (either through more machines, or more powerful machines, or both), or reduce the amount of load on the system provisioned as-is. It may also be possible to re-architect the suspect request to be more efficient with regard to its resource usage.

Alternatively you may find that your system is not, in fact, seeing a resource bound - where it appears there are plenty of spare CPU cycles/free RAM/low amounts of disk I/O/etc. If you're seeing poor performance in that situation, it's almost always the case that you'll instead see large spikes in the number of read/write locks taken as your suspect requests work through the system. Provisioning more hardware resources may help to some small degree in the presence of read/write locks, but what really needs to happen is the requests need to be re-architected to use as few locks as possible, and preferably to run completely lock free.

 

 

 

Introduction

While there are many different ways to define schemas in MarkLogic Server, one should be aware of both the location strategy the server will use (defined here: http://docs.marklogic.com/guide/admin/schemas) and the different locations in which your particular schema may reside.

Schema Location

Schemas can reside in either the Schemas database defined for your content database, or within the server's Config directory.  If there is no explicit schema map defined, the server will use the following schema location strategy:

1) If the XQuery program explicitly references a schema for the namespace in question, MarkLogic Server uses this reference (see the sketch after this list).
2) Otherwise, MarkLogic Server searches the schema database for an XML schema document whose target namespace is the same as the namespace of the element that MarkLogic Server is trying to type.
3) If no matching schema document is found in the database, MarkLogic Server looks in its Config directory for a matching schema document.
4) If no matching schema document is found in the Config directory, no schema is found.
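As a minimal sketch of option #1 above (the namespace, schema location, and document URI are placeholders, not real values), an XQuery module can reference a schema explicitly:

xquery version "1.0-ml";

(: Explicitly reference the schema for this namespace; the location hint is resolved
   against the schema database / Config directory :)
import schema namespace ex = "http://example.com/my-namespace" at "/schemas/my-schema.xsd";

validate { fn:doc("/content/sample.xml") }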

There can sometimes be issues with step #2 when there are multiple schema documents in the schema database whose target namespace matches the namespace of the element that MarkLogic Server is trying to type. In that situation, it would be best to explicitly define a default schema mapping - schema maps can be defined through the Admin API or the Admin User Interface. Be aware that you can define schema mappings at both the group level (in which case the mapping would then apply to all application servers in the group) or at the individual application server level.

Best Practices

Now that we know how the server locates schemas and where schemas can potentially reside - what are the best practices?

In general, it's best to localize your schema impacts as narrowly as possible. For example, instead of using a single Schemas database or the server's one and only Config directory, it would instead be better to define a specific Schemas database that would be used for the relevant content database. Similarly, unless you know you need a defined schema mapping to apply to every application server in a group, it would instead be better to define your schema mappings at the application server level as opposed to the group level.

Summary

Although not exhaustive, this article lists some best practices for the use of MarkLogic Server and Amazon's VPC

Details

  1. Nodes within a MarkLogic cluster need to communicate with one another directly, without the presence of a load balancer in-between them.
  2. Whether in the context of a VPC or not, before attempting to join a node to a cluster, one should verify whether each node is able to ping or ssh to the other (and vice versa). If you're not able to ping or ssh from one machine to another, then issues seen during a MarkLogic cluster join are very likely to be localized to the network configuration and should be diagnosed at the network layer.
  3. The following items should be double-checked when using VPCs:
    1. If a private subnet is used for any MarkLogic instance, that subnet needs access to the public internet for the following situations:
      1. If Managed Cluster support is used, MarkLogic requires access to AWS services which require outbound connectivity to the internet (at minimum to the AWS service web sites).
      2. If foreign clusters are used then MarkLogic needs to connect to all hosts in the foreign cluster
      3. If Amazon S3 is used then MarkLogic needs to communicate with the S3 public web services.
    2. It is assumed that the creator of the VPC has properly configured all subnets on which MarkLogic needs to be installed so that they have outbound internet access. There are many ways that private subnets can be configured to communicate outbound to the public internet. NAT instances are one example [AWS VPC NAT]. Another option is using DirectConnect to route outbound traffic through the organization's internet connection.
    3. All subnets which host instances running MarkLogic in the same cluster need to be able to communicate via port 7999.
    4. Inbound ssh connectivity is required for command line administration of each server requiring port 22 to be accessible from either a VPN or a public subnet.
    5. With regard to application traffic (as opposed to intra-cluster traffic as seen during cluster joining) connectivity to the MarkLogic server(s) needs to be open to whatever applications for which it is required. Application traffic can be sent through an internal or external load balancer, a VPN, direct access from applications in the same subnet or routing through another subnet.

Introduction

This knowledgebase article contains critical tips and best practices you'll need to know to best use MarkLogic Server with your favorite BI Tools.

BI Tool Q&A

Q: What's a TDE? Is that a Tableau Data Extract?

A: In MarkLogic terms, TDE stands for Template Driven Extraction. A template is a document (XML or JSON) that declares how a view is to be populated. It defines a context -- the root path of all the documents that are involved in this view -- then, for each column in the view, it defines a column name, type, and a path to the data inside the document. You can define the value of a column using several pieces of data in the document, plus some functions, even some programming operations such as IF. For example, if your documents have the "last-updated" year and month and day in different parts of the document, your Template can pull in those three pieces, concatenate them, then cast the result as a date.
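To make that concrete, here is a hedged sketch of what such a template might look like, inserted with tde:template-insert from Query Console. The element names (updated-year, updated-month, updated-day), the context, and the assumption that the date parts are zero-padded are illustrative only:

xquery version "1.0-ml";
import module namespace tde = "http://marklogic.com/xdmp/tde" at "/MarkLogic/tde.xqy";

let $template :=
  <template xmlns="http://marklogic.com/xdmp/tde">
    <context>/employee</context>
    <rows>
      <row>
        <schema-name>main</schema-name>
        <view-name>employees</view-name>
        <columns>
          <column>
            <name>id</name>
            <scalar-type>long</scalar-type>
            <val>id</val>
          </column>
          <column>
            <name>last-updated</name>
            <scalar-type>date</scalar-type>
            <!-- concatenate three separate parts of the document and cast the result to a date -->
            <val>xs:date(fn:concat(updated-year, "-", updated-month, "-", updated-day))</val>
          </column>
        </columns>
      </row>
    </rows>
  </template>
return tde:template-insert("/templates/employees.xml", $template)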

Q: When modifying TDEs, do I need to reindex?

A: TDEs map an SQL-like view on top of MarkLogic. If you change an existing view, you do need to reindex the database. Before kicking off a resource- and time- intensive reindex, however, be aware that there are some TDE configurations that cannot be updated. You can read more about exactly which kinds of TDEs may or may not be updated at the following knowledgebase article: Updating a TDE View.

Q: Can MarkLogic handle queries that require a large number of columns?

A: Yes, but you'll want to pay attention to potential performance impacts. In general, it's much better to spread a large number of columns across multiple TDEs, instead of having a single TDE containing all those same columns. Data modeling is also important here - TDEs should be meaningful with regard to their intended use. Definitely check out MLU's Data Modeling Series, in particular Progressive Transformation using the Envelope Pattern and Impact of Normalization: Lessons Learned.

Q: What are some common patterns and antipatterns for good performance with BI tools?

A: First, avoid using Nullable columns in filters and drilldowns. There are optimizations in MarkLogic Server's SQL engine to detect patterns with "null" - but different BI tools generate their code in different ways, which can sometimes result in code that circumvents those optimizations. In general, if performance is a priority, it's usually better to use an actual value such as "N/A" or "0".

Second, enable Query Reduction or similar options in your BI tool of choice. Without this option, if you choose to filter on a year - say "2018" - and then also select "2019", multiple SQL queries will be sent to MarkLogic in quick succession unnecessarily.

Q: What do I need to watch out for when connecting my BI tool to MarkLogic?

A: If performance is a priority, exercise caution when using joins. In general, the best practice is to create collections of data in MarkLogic that represent the subsets of data needed externally as closely as possible. You can learn more about what tools are available to see how many and what kind of joins are being used by your query in the What debugging tools are available for Optic, SQL, or SPARQL code in MarkLogic Server? knowledgebase article, and you can learn more about how to create more meaningful data models and subsets of your data models in the aforementioned MLU's Data Modeling Series, as well as in the MarkLogic World presentation Getting the Most from MarkLogic Semantics (also available in video form).

References

Introduction

If you're looking to use any of the interfaces built on top of MarkLogic's semantics engine (Optic API, SQL, or SPARQL) - you'll want to make sure you're using the best practices itemized in this knowledgebase article. It's not unusual to see one or even two orders of magnitude performance improvements, as a result. Note that this article is really just a distillation of the MarkLogic World presentation "Getting the Most from MarkLogic Semantics" - available in both pdf and YouTube formats.

Best Practices for Using Semantics at Scale

1) Scope your query - more constrained queries will do less work, and will therefore take less time

  • Trim resultsets early
  • Partition
    • Query partitions or subsets of your data, instead of your entire database
    • Define partitions with Collections
    • Make use of your partitions with collection queries (see the sketch after this list)
    • Use cts:query to partition even further
  • Keep like-triples in the same document
  • Use MarkLogic indexes to scope a query
    • Collection query (or SPARQL FROM) to partition the RDF space
    • Put ontologies and other lookup/mapping triples into their own graphs/collections
    • Consider pushing-down some SPARQL FILTERs to the document
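A minimal sketch of combining a SPARQL query with a collection-based partition via sem:store (the predicate IRI and collection name are invented for illustration):

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";

(: Only match triples in documents belonging to the "person-triples" collection :)
sem:sparql(
  'SELECT ?s ?name WHERE { ?s <http://example.org/name> ?name }',
  (),
  (),
  sem:store((), cts:collection-query("person-triples")))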

2) Pay attention to your data model

3) Resultset size specific tips

  • For small resultsets – from SPARQL, get the docs with a search
  • For large resultsets
    • Get docs in a single read, no joins
    • Large result sets may incur connection churning overhead – paginate large resultsets to ensure connection reuse

4) Hardware tips

  • Add more memory - allows the optimizer to choose faster plans
  • Add more hardware - allows for increased parallelization

5) Avoid unnecessary work

  • Re-use queries with bind variables - the query plan is cached for 5 minutes
  • Dedup processing
    • De-duplication has no effect on results if you have no duplicate triples and/or you use DISTINCT
    • Skipping dedup processing can result in substantial performance improvements

Introduction

Problems can occur when trying to explicitly search (or not search) parts of documents when using a global configuration approach to include and exclude elements.

Global Approach

Including and excluding elements in a document using a global configuration approach can lead to unexpected results that are complex to diagnose.  The global approach will require positions to be enabled in your index settings, expanding the disk space requirements of your indexes and may result in greater processing time of your position dependent queries.  It may also require adjustments to your data model to avoid unintended includes or excludes; and may require changes to your queries in order to limit the number of positions used.

If circumstances dictate that you must instead use the less preferred global configuration approach, you can read more about including/excluding elements in word queries here: http://docs.marklogic.com/guide/admin/wordquery#id_77008

Recommended Approach

In general, it's better to define specific fields, which are a mechanism designed to restrict your query to portions of documents based on elements. You can read more about fields here: http://docs.marklogic.com/guide/admin/fields
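For instance, assuming a field named "summary" has been configured on the database to include (or exclude) the relevant elements, a word search can then be restricted to that field - a minimal sketch:

xquery version "1.0-ml";

(: Search only within the content covered by the hypothetical "summary" field :)
cts:search(
  fn:collection(),
  cts:field-word-query("summary", "marklogic"))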

 

 

Introduction

Backing up multiple databases simultaneously may make some of the backups fail with error XDMP-FORESTOPIN.

 

Details

While configuring a scheduled backup, one can also select to back up the associated auxiliary databases such as Security, Schemas and Triggers. Generally, all the content databases share these auxiliary databases, so issues may arise when more than one scheduled backup tries to back up the same auxiliary database. When two backups try to back up the same auxiliary database, the backup will fail with an XDMP-FORESTOPIN error. Generally, this error occurs when the system attempts to start one forest operation (backup, restore, remove, clear, etc.) while another, exclusive operation is already in progress - for example, starting a new backup while a previous backup is still in progress.

 

Recommendations

One should be extra cautious when configuring scheduled backups that also include the auxiliary databases. If you really do want to back up the auxiliary databases along with the content database, then you need to pay special attention to the timing and ensure that no two backups of the same auxiliary database overlap.

As most applications don't make frequent changes to their auxiliary databases, MarkLogic recommends scheduling backups for them separately, instead of selecting them together with the content databases.

Introduction

In MarkLogic 8, support for native JSON and server side JavaScript was introduced.  We discuss how this affects the support for XML and XQuery in MarkLogic 8.

Details

In MarkLogic 8, you can absolutely use XML and XQuery. XML and XQuery remain central to MarkLogic Server now and into the future. JavaScript and JSON are complementary to XQuery and XML. In fact, you can even work with XML from JavaScript or JSON from XQuery.  This allows you to mix and match within an application—or even within an individual query—in order to use the best tool for the job.

See also:

Server-side JavaScript and JSON vs XQuery and XML in MarkLogic Server

XQuery and JavaScript interoperability

Introduction

Sometimes you may find that there are one or more tasks that are taking too long to complete or are hogging too many server resources, and you would like to remove them from the Task Server.  This article presents a way to cancel active tasks in the Task Server.

Details

To cancel active tasks in the Task Server, you can browse to the Admin UI, navigate to the Status tab of the Group's Task Server, and cancel the tasks. However, this may get tedious if there are many tasks to be terminated.

As an alternative, you can use the server monitoring built-ins to programmatically find and cancel the tasks. The documentation for the MarkLogic Server API includes information on all the built-in functions you will need (refer to http://docs.marklogic.com/xdmp/server-monitoring).

Sample Script

Here is a sample script that removes the task based on the path to the module that is being executed:

xquery version "1.0-ml";

(: Find the Task Server on this host and cancel the request that is running the given module :)
let $host-id := xdmp:host()
let $host-task-server-id := xdmp:host-status($host-id)//*:task-server/*:task-server-id/text()
let $task-server-status := xdmp:server-status($host-id, $host-task-server-id)
let $task-server-requests := $task-server-status/*:request-statuses
let $scheduled-task-request :=
  $task-server-requests/*:request-status[*:request-text = "/PATH/TO/SCHEDULED/TASK/MODULE.XQY"]/*:request-id/text()
return
  xdmp:request-cancel($host-id, $host-task-server-id, $scheduled-task-request)

Summary

MarkLogic stores all signed Certificates, private keys, and Certificate Authority Certificates inside the Security database. The Security database also stores Users, Passwords, Roles, Privileges, and many other authentication related configurations. While setting up a DR (Disaster Recovery) cluster, many administrators prefer to replicate the Security database to the DR cluster to avoid re-configuring the DR cluster with the same Users/Roles/Privileges etc.

Security Database Replication presents design challenges and issues while Accessing Application Servers on the DR cluster.

  • Certificates installed in the Master Cluster Security Database will get replicated to the DR cluster Security Database; however, those replicated Certificates are not useful to the DR Cluster, since Signed Certificates are typically tied to a single host (exceptions include SAN and Wildcard Certificates).
  • At the same time, we are not able to install new Signed Certificates on the DR Cluster, as the replicated Security Database is read-only.

This article discusses the different aspect of the above problem and provides a solution.

Configuration: Security Database replicated to DR Cluster

For the purposes of this article, we will consider a 3 node Master cluster coupled to a 3 node DR cluster, where the Security database is replicated from the Master to the DR Cluster. We will also have an Application Server attached to the "DemoTemp1" Certificate Template on the Master cluster.

       [Screenshots: Master Cluster hosts and DR Cluster hosts]

Issues in DR Cluster.

Certificate Authentication based on CN field 

When client browsers connect to the application server using HTTPS, they check to make sure your SSL Certificate matches the host name in the address bar. There are three ways for browsers to find a match:

  1.    The host name (in the address bar) exactly matches the Common Name (CN) in the certificate's Subject.
  2.    The host name matches a Wildcard Common Name. For example, www.example.com matches the common name *.example.com.
  3.    The host name is listed in the Subject Alternative Name field.

The most common form of SSL name matching is the first option -  SSL client compares server name to the Common Name in the server's certificate. 

Since the Temporary Signed Certificates have CN fields matching the Master Cluster nodes, the Application Server on the DR Cluster will fail when used with the MarkLogic generated Temporary Signed Certificate.

Certificate Requests

When we attach a Template to any application server, MarkLogic Server generates a Temporary Signed Certificate for all the nodes in the cluster in the Application Server's Group.

[Screenshots: Certificate Template status on the Master cluster and the DR cluster]

To install a Certificate signed by a 3rd party, replacing the Temporary Signed Certificate, we will need to generate certificate requests. You can generate certificate requests in MarkLogic for all nodes using the Request button under "Needed Certificate Request" on the Certificate Template "Status" tab.

  • On the Master cluster, MarkLogic will generate 3 Certificate requests, with the CN field matching each of the 3 nodes. All 3 new Certificate Requests are stored internally in the Security Database.
  • On the DR Cluster, clicking the Certificate Request button will result in an ERROR, since the DR Cluster has a replicated Security Database that is in a read-only ("open replica") state, i.e. Security database updates are not allowed.

Pending Certificate Requests

Each Certificate request is intended for a specific individual node, as the Certificate request originator incorporates the client FQDN into the Certificate CN field during request generation. MarkLogic Server will use the hostname (which in most cases matches your FQDN) as the CN field value in the Certificate Request.

Certificate requests generated on the Master Cluster are stored in the Security Database, which will get replicated to the DR Cluster Security Database (as/when Security database replication is configured). However, Certificate requests generated on the Master Cluster are not relevant to the DR Cluster, as they have the Master Cluster nodes' FQDNs in their CN fields.

[Screenshots: Certificate Template status after generating requests, on the Master cluster and the DR cluster]

Solution

To install Signed Certificates intended for the DR Cluster, where the Certificate CN field matches the FQDN of the DR Cluster nodes, we will need to install the DR cluster's Signed Certificates on the Master Cluster.  Those certificates will then be replicated to the DR Cluster through the normal database replication of the Security database.

Step 1. Generate Certificate Request (intended for DR nodes).

You generate the Certificate requests using XQuery in Query Console on the Master cluster itself, against the Security database, but the values used in your XQuery will be the DR/Replica Cluster nodes' FQDNs. For example, for the first node in the DR Cluster, "engrlab-130-026.engrlab.marklogic.com", you would run the below query from Query Console on any node of the Master Cluster against the Security database. Change the FQDN value for each node and run the query a total of 3 times.

xquery version "1.0-ml";
import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";

(: Generate a certificate request for a DR node; the common name and DNS name
   must be the FQDN of that DR node :)
pki:generate-certificate-request(
  pki:template-get-id(pki:get-template-by-name("DemoTemp1")),
  "engrlab-130-026.engrlab.marklogic.com",
  "engrlab-130-026.engrlab.marklogic.com",
  ())

Step 2. Download Certificate Request and Get them Signed.

We should be able to see the Certificate requests pertaining to each node (for the Master as well as the DR nodes) on the Certificate Template Status tab in both the Master Cluster and DR Cluster Admin GUIs. Download them and get them signed by your preferred Certificate Authority.

[Screenshots: Certificate Template status after the Query Console requests, on the Master cluster and the DR cluster]

Step 3. Install All Signed Certificates (for Master + DR Nodes) on Master Cluster 

Install all Signed Certificates (including the Certificates intended for the Replica Cluster) through the Certificate Template Import tab in the Master Cluster Admin GUI. If we try to install Certificates on the DR/Replica cluster from the Admin GUI, we will get an XDMP-FORESTNOT (Forest Security not available: open replica) error. The Application Server on the DR Cluster will find the appropriate Certificate for each node from the list of all installed Certificates. The screenshots below show the status of the Certificate Template from the Master cluster as well as the DR cluster (both should be identical).

[Screenshots: Final Certificate Template status on the Master cluster and the DR cluster]

Step 4. Importing Pre-Signed Cert where Keys are generated outside of MarkLogic.

Please read "Import pre-signed Certificate and Key for MarkLogic HTTPS App Server" to import a Certificate request/key generated outside of MarkLogic. For our purposes, we will need to import the Certificates (and their respective keys) for both clusters (Master as well as DR/Replica) from Query Console on the Master Cluster itself.

Further Reading

Summary

Each node in a MarkLogic Server cluster has a hostname, a human-readable nickname corresponding to the network address of the device. MarkLogic retrieves the hostname from the underlying operating system during installation. On Linux, we can retrieve the platform hostname value by running "$ hostname" from a shell prompt.

$ hostname

129-089.engrlab.marklogic.com

In most environments, the hostname is the same as the platform's Fully-Qualified Domain Name (FQDN). However, there are scenarios where the hostname could differ from the FQDN. In such environments you would use the FQDN (engrlab-129-089.engrlab.marklogic.com) to connect to the platform instead of the hostname:

$ ping engrlab-129-089.engrlab.marklogic.com

PING engrlab-129-089.engrlab.marklogic.com (172.18.129.89) 56(84) bytes of data.

64 bytes from engrlab-129-089.engrlab.marklogic.com (172.18.129.89): icmp_seq=1 ttl=64 time=0.011 ms

During Certificate installation to a Certificate Template in environments where the hostname and FQDN do not match, MarkLogic looks at the CN field in the installed Certificate to find a matching hostname in the cluster. However, since the CN field (reflecting the FQDN) does not match the hostname known to MarkLogic, MarkLogic does not assign the installed Certificate to any specific host in the cluster.

Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic, OU=Eng, CN=engrlab-129-089.engrlab.marklogic.com

Installing Certificates in this scenario results in the installed Certificate not replacing the Temporary Certificate, and the Temporary Certificate will still be used with HTTPS App Server instead of the installed Certificates.

This article details different solutions to address this issue. 

Solution:

1) Host Name change

By default MarkLogic picks up the hostname value presented by the underlying operating system. However, we can always change the hostname string stored in MarkLogic Server after installation using the Admin API function admin:host-set-name (http://docs.marklogic.com/admin:host-set-name).

Changing the hostname in MarkLogic (to reflect the FQDN) will not affect the underlying platform/OS hostname value, but will result in MarkLogic being able to find the correct host for the installed Certificate (CN field = hostname), and thus being able to link the installed Certificate to the specific host in the cluster.

2) XQuery code linking Installed Cert to specific Host

You can also use XQuery code from Query Console, run against the Security database (as the content source), to update the Certificate XML files in the Security database, linking the installed Certificate to a specific host.

Change the Certificate Template name and hostname in your XQuery to reflect the values from your environment.

 

Also, note that the above approach will not replace/overwrite the Temporary Certificate; however, the App Server will start using the installed Certificate from this point onwards instead of the Temporary Certificate. One can also delete the now-unused Temporary Certificate from Query Console without any negative effect.

3) Certificate with Subject Alternative Name (SAN Cert)

You can also request that your IT department (or Certificate issuer) provide a Certificate with a Subject Alternative Name (subjectAltName) that matches MarkLogic's understanding of the host. During installation of the Certificate, MarkLogic will look for alternative names and link the Certificate to the correct host based on the subjectAltName field.

 

Further Reading

 

Introduction: When you may need to change the state of forests

In most cases, all forests in your MarkLogic cluster will be configured to allow all (any) updates to be made.

If we consider running the following example in Query Console:
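The original snippet is not reproduced here; a minimal sketch of such a check, assuming a forest named "Documents", might be:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

(: Report the updates-allowed state ("all", "read-only", "delete-only" or "flash-backup") for one forest :)
admin:forest-get-updates-allowed(
  admin:get-configuration(),
  xdmp:forest("Documents"))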

In the majority of cases, calling the above function should return "all", indicating that the forest is in a state to allow incoming queries to read data from the forest and to allow queries to update content (and to add new content) into that forest.

At any given time, a forest can be configured to be in one of four different states:

  • all
  • read-only
  • delete-only
  • flash-backup

You may want to change the state of the forests in a given database for several reasons:

  • read-only: to run your application in a maintenance mode where data can be read but no data on-disk can be changed
  • delete-only: in a situation where you are migrating data from a legacy database or removing data from a given forest
  • flash-backup: in a situation where you need to quiesce all forests in a given database for long enough to allow you to make a file-level backup of the forest data

Forest states explained

Sample state management module

Below is an example template for modifying the state of all forests in a given database:
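The original module is not reproduced here; the following is a minimal sketch of the idea, assuming a database named "Documents" and using admin:forest-set-updates-allowed (adjust the database name and target state for your environment):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

declare function local:set-state($config, $forest-ids, $state) {
  (: thread the configuration through each forest in turn :)
  if (fn:empty($forest-ids)) then $config
  else
    local:set-state(
      admin:forest-set-updates-allowed($config, $forest-ids[1], $state),
      fn:subsequence($forest-ids, 2),
      $state)
};

(: set every forest in the "Documents" database to flash-backup :)
let $state := "flash-backup"   (: one of: all, read-only, delete-only, flash-backup :)
let $config := local:set-state(
  admin:get-configuration(),
  xdmp:database-forests(xdmp:database("Documents")),
  $state)
return admin:save-configuration($config)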

Further reading

Forest States
http://docs.marklogic.com/guide/admin/forests#id_43487
Setting Forests to "read only"
http://docs.marklogic.com/guide/admin/forests#id_72520
Setting Forests to "delete only"
http://docs.marklogic.com/guide/admin/forests#id_20932

Introduction

This article discusses some of the issues you should think about when preparing to change the IP addresses of a MarkLogic Server.

Detail: 

If the hostnames stay the same, then changing IP addresses should not have any adverse side effects since none of the default MarkLogic Server settings require an IP address.

Here are some caveats:

  1. Make sure there are no application servers that have an 'address' setting to an IP address that will no longer be accessible/exist after the change.
  2. Similarly, make sure there are no external (to MarkLogic Server) dependencies on the original IP addresses.
  3. Make sure you allow some time (on the order of minutes) for the DNS changes to propagate across the DNS servers before bringing up MarkLogic Server.
  4. Make sure the hosts themselves are reachable via the standard Unix channels (ping, ssh, etc) before starting MarkLogic Server.
  5. Make sure you test this in a non-production environment box before you implement it in production.

Introduction

If you have an existing MarkLogic Server instance running on EC2, there may be circumstances where you need to change the size of available storage.

This article discusses approaches to ensure a safe increase in the amount of available storage for your EC2 instances without compromising MarkLogic data integrity.

This article assumes that you have started your cluster using the CloudFormation templates provided by MarkLogic.

The recommended method (I.) is to shut down the cluster, do the resize using snapshots and start again. If you wish to avoid downtime an alternative procedure (II.) using multiple volumes and rebalancing is described below.

In both procedures we are recommending a single, large EBS volume as opposed to multiple smaller ones because:

1. Larger EBS volumes have faster IO as described by the Amazon EBS Volume types at http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

2. You have to keep enough spare capacity on every single volume to allow for merges.  MarkLogic disk space requirements are described in our Installation Guide.

I. Resizing using AWS snapshots

This is the recommended method. This procedure follows the same steps as official Amazon AWS documentation, but highlights MarkLogic specific steps. Please review AWS Documentation in detail before proceeding:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html

1. Make sure that you have an up to date backup of your data and a working restore plan.

2. Stop the MarkLogic cluster by going to AWS Console -> CloudFormation -> Actions -> Update Stack

[Screenshot: AWS CloudFormation Update Stack]

Click through the pages and leave all other settings intact, but change Nodes to 0, then review and confirm updating the stack. This will stop the cluster.

This is also covered in the MarkLogic EC2 documentation:

https://docs.marklogic.com/guide/ec2/managing#id_59478

4. Create a snapshot of the volume to resize.

5. Create a new volume from the snapshot.

Ensure that the new volume is sufficiently large to cover MarkLogic disk space requirements (generally at least 1.5x of the planned total forest size).

6. Detach the old volume.

7. Attach the newly expanded volume.

Steps 4-7 are exactly as covered in the AWS documentation and have no MarkLogic-specific parts.

8. Restart MarkLogic cluster, by going to AWS Console -> CloudFormation -> Actions -> Update Stack and changing Nodes to the original setting.

9. Connect to the machine using SSH and resize the logical partition to match the new size. This is covered in AWS documentation, the commands are:

- resize2fs for ext3 and ext4

- xfs_growfs for XFS

10. The new volume will have a different id. You need to update the CloudFormation template so that the data volumes are retained and remounted when the cluster or nodes are restarted. The easiest way is to use the mlcmd shell script provided by MarkLogic. While still connected via SSH, run the following:

/opt/MarkLogic/bin/mlcmd sync-volumes-to-mdb

This will synchronise the EBS volume id with the CloudFormation template.

At this point the procedure is complete and you can delete the old EBS volume and once you have verified that everything is working fine, also delete the snapshot created in step 4.

II. Resizing with no downtime, using MarkLogic Rebalancing

This method avoids cluster downtime, but it is slightly more complicated than procedure I, and rebalancing will take additional time and add load to the cluster while it runs. In most cases procedure I takes far less time to complete; however, the cluster is down for the duration. With this procedure the cluster can serve requests at all times.

This procedure follows the same steps as official Amazon AWS documentation where possible, but highlights MarkLogic specific steps. Please review AWS Documentation in detail before proceeding:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html

The procedure is described in more detail in the MarkLogic Server on Amazon EC2 Guide at https://docs.marklogic.com/guide/ec2/managing#id_81403

1. Create a new volume.

Ensure that the new volume is sufficiently large to cover MarkLogic disk space requirements (generally at least 1.5x of the planned total forest size).

2. Attach the volume to the EC2 instance. Please take a note of the EC2 device mount point, for example /dev/sdg and see here where it maps to in Linux and in RedHat: https://docs.marklogic.com/guide/ec2/managing#id_17077

3. SSH into the instance and execute the /opt/MarkLogic/bin/mlcmd init-volumes-from-system command to create a filesystem for the volume and update the Metadata Database with the new volume configuration. The init-volumes-from-system command will output a detailed report of what it is doing. Note the mount directory of the volume from this report.

4. Once the volume is attached and mounted to the instance, log into the Administrator Interface on that host and create a forest or forests, specifying host name of the instance and the mount directory of the volume as the forest Data Directory. For details on how to create a forest, see Creating a Forest in the Administrator's Guide.

5. Once the status of the new forest is set to "open", attach the new forest(s) to the database and retire all the forest(s) on the old volume. If you only have one data volume then this includes the forests for the Schemas, Security, Triggers, Modules, etc. databases. It is possible to script this part using XQuery, JavaScript or REST (see the sketch below):

https://docs.marklogic.com/admin:forest-create
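As a rough sketch only (the forest name, database name, and data directory are placeholders; verify the Admin API calls against your version's documentation), creating and attaching a forest on the new volume from Query Console could look like the following. Retiring the old forests can then be done through the Admin UI as described below, or scripted in the same way:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

(: Transaction 1: create a forest on the new volume :)
admin:save-configuration(
  admin:forest-create(
    admin:get-configuration(),
    "Documents-new-01",
    xdmp:host(),
    "/var/opt/new-volume/Forests"))
;
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

(: Transaction 2: attach the new forest to the database :)
admin:save-configuration(
  admin:database-attach-forest(
    admin:get-configuration(),
    xdmp:database("Documents"),
    xdmp:forest("Documents-new-01")))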

This will trigger rebalancing - database fragments will start to move to the new forests. This process can take several hours or days, depending on the size of the data, and the Admin UI will show you an estimate.

The Admin UI for this is covered here: https://docs.marklogic.com/guide/admin/forests#id_93728

and here is more information on rebalancing: https://docs.marklogic.com/guide/admin/database-rebalancing#id_87979

6. Once the old forest(s) have 0 fragments in them you can detach them and delete the old forest(s). The migration to a new volume is complete.

7. Optional removing of the old volume. If your original volume was data only, the original volume should be empty after this procedure and you can:

a) unmount the volume in Linux

b) delete the volume in AWS EC2 console

c) issue /opt/MarkLogic/bin/mlcmd sync-volumes-to-mdb. This will preserve the new volume mappings in the Cloud Formation template and the volumes will be preserved and remounted when nodes are restarted or even terminated.

Introduction

A common use case in many business applications is to find whether or not an element exists in any document. This article provides ways to find such documents and explains points that should be taken care of while designing a solution.

 

Solution

In general, the existence of an element in a document can be checked by using the below XQuery:

cts:element-query(xs:QName('myElement'),cts:and-query(()))

Note the empty cts:and-query construct here. An empty cts:and-query is used to fetch all fragments.

Hence, running the below search query will bring back all the documents having the element "myElement".
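The original snippet is not shown here, but a minimal version of such a search would be:

cts:search(
  fn:collection(),
  cts:element-query(xs:QName('myElement'), cts:and-query(())))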

 

Wrapping the query in a cts:not-query will bring back all the documents *not* having the element "myElement".
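Again as a minimal sketch:

cts:search(
  fn:collection(),
  cts:not-query(
    cts:element-query(xs:QName('myElement'), cts:and-query(()))))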

 

A search using cts:not-query is only guaranteed to be accurate if the underlying query being negated is accurate from its index resolution. Hence, to check for the existence of a specific XPath, we need to index that XPath.
For example, if you want to find documents having /path/1/A (and not /path/2/A) then you can create a field index for the path /path/1/A and then use that field in your query instead.

 

Things to remember

1.) Use unique element names within a single document, i.e. try not to use the same element name in multiple places within a document if they have different meanings for your use case. Either give them different element names or put them under different namespaces to remove any ambiguity. For example, if you have the element "table" in two places in a single document, then you can put them under different namespaces, such as html:table and furniture:table, or you can name them differently, such as html_table and furniture_table.

2.) If element names are unique within a document then you don't need to create additional indexes. If element names are not unique within a document and you are interested in only a specific XPath, then create path (field) indexes on those XPaths and use them in your not-query.

 

Introduction

MarkLogic Server has shipped with full support for the W3C XML Schema specification and schema validation capabilities since version 4.1 (released in 2009).

These features allow for the validation of complete XML documents or elements within documents against an existing XML Schema (or group of Schemas), whose purpose is to define the structure, content, and typing of elements within XML documents.

You can read more about the concepts behind XML Schemas and MarkLogic's support for schema based validation in our documentation:

https://docs.marklogic.com/guide/admin/schemas

Caching XML Schema data

In order to ensure the best possible performance at scale, all user created XML Schemas are cached in memory on each individual node within the cluster using a portion of that node's Expanded Tree Cache.

Best practices when making changes to pre-existing XML Schemas: clearing the Expanded Tree Cache

In some cases, when you are redeploying a revised XML Schema to an existing schema database, MarkLogic can sometimes refer to an older, cached version of the schema data associated with a given document.

Therefore, it's important to note that whenever you plan to deploy a new or revised version of a Schema that you maintain, as a best practice, it may be necessary to clear the cache in order to ensure that you have evicted all cached data stored for older versions of your schemas.

If you don't clear the cache, you may sometimes get references to the old, cached schema data and, as a result, you may get errors like:

XDMP-LEXVAL (...) Invalid lexical value

You can clear all data stored in the Expanded Tree Cache in two ways:

  1. By restarting MarkLogic service on every host in the cluster. This will automatically clear the cache, but it may not be practical on production clusters.
  2. By issuing a call to xdmp:expanded-tree-cache-clear() command on each host in the cluster. You can run the function in query console or via REST endpoint and you will need a user with admin rights to actually clear the cache.

An example script has been provided that demonstrates the use of XQuery to execute the call to clear the Expanded Tree Cache against each host in the cluster:

Please contact MarkLogic Support if you encounter any issues with this process.

Related KB articles and links:

Summary

XDMP-ODBCRCVMSGTOOBIG can occur when a non-ODBC process attempts to connect to an ODBC application server.  A couple of reasons this can happen are that an HTTP application has been accidentally configured to point to the ODBC port, or that a load balancer is sending HTTP health checks to an ODBC port.  There are a number of common error messages that can indicate whether this is the case.

Identifying Errors and Causes

One method of determining the cause of an XDMP-ODBCRCVMSGTOOBIG error is to take the size value and convert it to characters.  For example, given the following error message:

2019-01-01 01:01:25.014 Error: ODBCConnectionTask::run: XDMP-ODBCRCVMSGTOOBIG size=1195725856, conn=10.0.0.101:8110-10.0.0.103:54736

The size, 1195725856, can be converted to the hexadecimal value 47 45 54 20, which can be converted to the ASCII value "GET ".  So what we see is a GET request being run against the ODBC application server.
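If it helps, the same conversion can be done directly in Query Console; this sketch simply splits the 32-bit size into four bytes and maps them to characters:

xquery version "1.0-ml";

let $size := 1195725856
return
  fn:codepoints-to-string((
    ($size idiv 16777216) mod 256,   (: first byte :)
    ($size idiv 65536) mod 256,      (: second byte :)
    ($size idiv 256) mod 256,        (: third byte :)
    $size mod 256))                  (: fourth byte; yields "GET " :)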

Common Errors and Values

Error                                  | Hexadecimal  | Characters
XDMP-ODBCRCVMSGTOOBIG size=1195725856  | 47 45 54 20  | "GET "
XDMP-ODBCRCVMSGTOOBIG size=1347769376  | 50 55 54 20  | "PUT "
XDMP-ODBCRCVMSGTOOBIG size=1347375956  | 50 4F 53 54  | "POST"
XDMP-ODBCRCVMSGTOOBIG size=1212501072  | 48 45 4C 50  | "HELP"

Conclusion

XDMP-ODBCRCVMSGTOOBIG errors do not affect the operation of MarkLogic Server, but they can cause error logs to fill up with clutter.  Determining that the errors are caused by an HTTP request to an ODBC port can help to identify the root cause, so the issue can be resolved.

Summary

Meters data can be a good resource for getting an approximation of the number of requests being managed by the server at a given time. It's also important to understand how Meters data is generated, should there be a discrepancy between the Meters samples and the entries in the access log.

Meters Request Data

The Meters data is designed to record a sampling of activity every few seconds. Meters data is not designed to accurately record events that occur much less frequently than every few seconds. Request rates are 15-second moving averages, recalculated every second and available in real time through the xdmp:host-status, xdmp:server-status and xdmp:forest-status built-in functions.

Meters Samples

The metering subsystem samples these real-time rates on the minute and saves the samples in the Meters database. Meters sampled data of events that occur less frequently than the moving average period will be lower than the number of access log entries. The difference between the two will depend on when the last event happened and when the sample was taken.

This means that if an event happens once a minute, the request rate will rise when the event happens, but then decay away within a few seconds. If the sample is taken after the rate has decayed, the saved Meters data will be lower than the actual number of requests.

Conclusion

The result of the Meters sampling method is that it is not unusual for Meters to under-report the number of requests in certain circumstances.

Summary

In MarkLogic Server v7.0-2, the tokenizer keys for languages where MarkLogic provides generic language support were removed so that they now all use the same key. For example, Greek falls into this class of languages. This change was made as part of an optimization for languages in which MarkLogic Server has advanced stemming and tokenization support.

Stemmed searches that include characters from languages that do not have advanced language support, performed on MarkLogic Server v7.0-2 or later releases, against content loaded on a version previous to v7.0-2, may not return the expected results.

Resolution

In order to successfully run these stemmed searches, you can either:

  • Reindex the database; or
  • Reinsert the affected documents (i.e. the documents that contain characters in languages for which MarkLogic Server only has generic language support).

If these are not possible in your environment, you can always run the query unstemmed.

An Example

The following example demonstrates the issue

  1. On MarkLogic Server version 7.0-1, insert a document (test.xml) that contains the Greek character 'ε'.
  2. Run this query 
    xdmp:estimate( cts:search( doc('test.xml'), 'ε')),
    cts:contains( doc('test.xml'), 'ε')
  3. The query will return the correct results: 1, true
  4. Upgrade MarkLogic Server to version 7.0-3 or later and run the query again
  5. The query will return incorrect results: 0, false 
  6. Reindex the database and re-run the query
  7. The query will return the correct result once again.
     

Introduction

MarkLogic Server persists its configuration in XML files (for example, databases.xml, hosts.xml, etc.), copies of which exist on each node of a cluster. While you could use the Admin UI to manage an individual cluster's configuration, at scale and over multiple environments, the best practice is to build a source-control managed script that uses MarkLogic's configuration management APIs - specifically the Configuration Management API (CMA) and the REST Management API (RMA).

What is RMA?

RMA (REST Management API- /manage/v2) provides the ability to easily capture detailed information about MarkLogic Server objects and processes such as hosts, databases, forests, application servers, and groups from any tool that can make a RESTful call.

You can read more about the MarkLogic REST Management API at - http://docs.marklogic.com/REST/management.

What is CMA?

Configuration Management API (CMA) is a new, higher-level interface built on top of the REST Management API (RMA). CMA is intended to more easily integrate with downstream tooling like MarkLogic's ml-gradle and Java API, as well as third party options like node.js, bash, and curl.

Customers will typically want to package up configurations to manage the deployment of applications into development, test, and production environments. CMA makes it easier to set up complex MarkLogic features such as replication and failover across these different environments by providing common 'canned' scenarios.

You can read more about the MarkLogic Configuration Management API at - http://docs.marklogic.com/REST/configuration-management-api

How to invoke CMA?

Customers can create and apply configurations in three ways:

1. REST Management API: manage/v3: REST endpoint for generating and applying configurations.

Please refer to http://docs.marklogic.com/REST/configuration-management-api

For example: 

1.1 http://host:8002/manage/v3?format=json : This will return the configuration data in JSON format. If we change the format to xml, we will get the configuration in XML format.

1.2 http://host:8002/manage/v3?format=zip : This will return package.zip. This archive contains the configuration files (database, forest, hosts, etc.), a README, and ml-gradle property/build files. The README from the zip file has the details of how to install ml-gradle and apply the configurations to other instances.

2. XQuery: cma.xqy --- XQuery library for generating and applying configurations.

Please refer to https://docs.marklogic.com/cma for more details.

  • cma:apply-config: Apply a named configuration, overriding parameters and setting options.
  • cma:generate-config: Retrieve an individual resource, set of resources, or full cluster configuration; generate a configuration from scenarios.

For example:

xquery version "1.0-ml";
import module namespace cma="http://marklogic.com/manage/config"
   at "/MarkLogic/cma.xqy";
(: $zip is a previously retrieved configuration package, e.g. the package.zip generated by /manage/v3?format=zip :)
declare variable $zip external;
cma:apply-config($zip)

3. JavaScript: cma.sjs --- JavaScript library for generating and applying configurations.

Please refer to https://docs.marklogic.com/js/cma for more details.

  • cma.applyConfig: Apply a named configuration, overriding parameters and setting options.
  • cma.generateConfig: Retrieve an individual resource, set of resources, or full cluster configuration; generate a configuration from scenarios.

For example, to create a REST server:

// Create a REST server.
'use strict';

var cma = require('/MarkLogic/cma.sjs');

var json = {
    "config": [{
        "forest": [
            {"forest-name": "mydb1-f1"},
            {"forest-name": "mymodulesdb-f1"}
        ],
        "database": [
            {
                "database-name": "myDb",
                "forest": ["mydb1-f1"]
            },
            {
                "database-name": "myModulesDb",
                "forest": ["mymodulesdb-f1"]
            }
        ],
        "server": [{
            "server-name": "restapiServer",
            "server-type": "http",
            "group-name": "Default",
            "root": "/",
            "port": "8900",
            "url-rewriter": "/MarkLogic/rest-api/8000-rewriter.xml",
            "content-database": "myDb",
            "modules-database": "myModulesDb"
        }]
    }]
};
 
cma.applyConfig(json);

Takeaways

  • Use source controlled scripts exercising MarkLogic's Configuration Management API to reliably and consistently manage configuration changes across your environments.
  • Avoid a single monolithic configuration script. The best practice here is to modularize your configuration changes in the form of multiple scripts. If you have multiple databases and configurations, one recommendation is to maintain the scripts per database; this makes them easier to maintain and to apply to other instances.
  • Depending on your configuration requirements, you may find your script needing to make calls to both the higher-level Configuration Management API as well as the lower-level REST Management API. If you have performance issues with the CMA/RMA REST APIs, you can also call the JavaScript/XQuery APIs directly from your code.

Introduction

This Knowledgebase article outlines the procedure to enable HTTPS on an AWS Elastic Load Balancer (ELB) using Route 53 or an external supplier as the DNS provider and with an AWS generated certificate.

The AWS Certificate Manager (ACM) automatically manages and renews the certificate and this certificate will be accepted by all current browsers without any security exceptions.

The downside is that you do need control over your Hosted DNS name entry - either through Route 53 or through another provider.

Prerequisites

  1. MarkLogic AWS Cluster
  2. An AWS Route 53 hosted Domain or similar externally hosted Domain; the procedure described in this article assumes that Route 53 is being used, however where possible we have tried to detail the changes needed and these should also be applicable for another external DNS provider.

Procedure

  1. Click on your hostname in Route 53 to edit it.

  2. Create a new Alias Record Set to point to your Elastic Load Balancer.

  3. In the Record Set entry on the right hand side, enter an Alias name for your ELB host, select Alias and from the Alias Target select the ELB load balancer to use, then click the Create button to update the Route 53 entry.

  4. It can take a little while for AWS to propagate the DNS update throughout the network, but once it is available it is worth checking that you are able to reach your MarkLogic cluster using the new address.

  5. Once the Route 53 entry is updated and available you will need to request a new certificate through ACM. If you have other certificates already in ACM you can select Request a certificate.

Otherwise select Get Started with Provision Certificates and select Request a public certificate.

  6. Enter your required Certificate domain name and click Next:

Note: This should match your DNS Alias name entry created in Step 3.

In addition, you can also add extra records such as a "Wildcard" entry. This is particularly useful if you want to use the same certificate for multiple hostnames, e.g. if you have clusters identified by versions such as ml9.[yourdomain].com & ml10.[yourdomain].com

  7. Select DNS as the Validation Method and click "Review"

  8. Before confirming and proceeding, check that the hostnames are correct, as certificates with invalid hostnames will not be usable.

  9. To complete validation, AWS will require you to add random CNAME entries to the DNS record to confirm that you are the owner. If you are using Route 53 this is as simple as selecting each entry in turn (the number of entries will vary depending on the number of Domain name entries you specified in step 6) and clicking "Create record in Route 53". Once all entries have been created click Continue.

  10. If the update is successful a Success message is displayed.

  11. If your DNS Hostname is provided by an external provider you will need to download the entries using the "Export DNS configuration to a file" link and provide this information to your DNS provider to make the necessary updates.

The file is a simple CSV file and specifies one or more CNAME entries that need to be created with the required names and values. Once the AWS DNS validation process detects that these changes have been made, the certificate creation process will complete automatically.

Domain Name,Record Name,Record Type,Record Value
marklogic.[yourdomain].com,_c3949adef7f9a61dd6865a13e65acfdb.marklogic.[yourdomain].com.,CNAME,_7ec4e5ce2cf31212e20ce68d9d0ab9fd.kirrbxfjtw.acm-validations.aws.
*.[yourdomain].com,_9b2138934ee9bbe8562af4c66591d2de.[yourdomain].com.,CNAME,_924153c45d53922d31f7d254a216aed0.kirrbxfjtw.acm-validations.aws.
  12. Once the Certificate has been validated by either of the methods in Steps 9 or 11 the certificate will be marked as Issued and be available for the Load Balancer to use.

  13. Configure the ELB for HTTPS and the new AWS-generated certificate.
  14. Edit the ELB Listeners and change the Cipher.

  15. (Optional) For production environments it is recommended to allow TLSv1.2 only.

  16. Next select the Certificate and repeat Steps 15 and 16 for each listener that you want to secure.

  17. From the ACM available certificates select the newly generated certificate for this domain and click Save.

  18. Save the Listeners updates and ensure the update was successful.

  19. You should now be able to access your MarkLogic cluster securely over HTTPS using the AWS generated certificate.

Introduction

HAProxy (http://www.haproxy.org/) is a free, fast and reliable solution offering high availability, load balancing and proxying for TCP and HTTP-based applications.

MarkLogic 8 (8.0-8 and above) and MarkLogic 9 (9.0-4 and above) include improvements to allow you to use HAProxy to connect to MarkLogic Server.

MarkLogic Server supports balancing application requests using both the HAProxy TCP and HTTP balancing modes depending on the transaction mode being used by the MarkLogic application as detailed below:

  1. For single-statement auto-commit transactions running on MarkLogic version 8.0.7 and earlier or MarkLogic version 9.0.3 and earlier, only TCP mode balancing is supported. This is due to the fact that the SessionID cookie and transaction id (txid) are only generated as part of a multi-statement transaction.
  2. For multi-statement transactions or for single-statement auto-commit transactions running on MarkLogic version 8.0.8 and later or MarkLogic version 9.0.4 and later both TCP and HTTP balancing modes can be configured.

The Understanding Transactions in MarkLogic Server and Single vs. Multi-statement Transactions sections of the MarkLogic documentation should be referenced to determine whether your application uses single-statement or multi-statement transactions.

Note: Attempting to use HAProxy in HTTP mode with Single-statement transactions prior to MarkLogic versions 8.0.8 or 9.0.4 can lead to unpredictable results.

Example configurations

The following example configurations detail only the parameters relevant to enabling load balancing of a MarkLogic application; for details of all parameters that can be used, please refer to the HAProxy documentation.

TCP mode balancing

The following configuration is an example of how to balance requests to a 3-node MarkLogic application using the "roundrobin" balance algorithm, with session stickiness based on the source IP address. The health of each node is checked by a TCP probe to the application server every second.

backend app
mode tcp
balance roundrobin
stick-table type ip size 200k expire 30m
stick on src
default-server inter 1s
server app1 ml-node-1:8012 check id 1
server app2 ml-node-2:8012 check id 2
server app3 ml-node-3:8012 check id 3

HTTP mode balancing

The following configuration is an example of how to balance requests to a 3-node MarkLogic application using the "roundrobin" balance algorithm based on the "SessionID" cookie inserted by the MarkLogic server.

The health of each node is checked by issuing an HTTP GET request to the MarkLogic health check port and checking for the "Healthy" response.

backend app
mode http
balance roundrobin
cookie SessionID prefix nocache
option httpchk GET / HTTP/1.1\r\nHost:\ monitoring\r\nConnection:\ close
http-check expect string Healthy
server app1 ml-node-1:8012 check port 7997 cookie app1
server app2 ml-node-2:8012 check port 7997 cookie app2
server app3 ml-node-3:8012 check port 7997 cookie app3

Summary

MarkLogic Server organizes Trusted Certificate Authorities (CA) by Organization Name.  Trusted Certificate Authorities are the issuers of digital certificates, which in turn are used to certify the public key on behalf of the named subject as given in the certificate.  These certificates are used in the authentication process by:

  1. A MarkLogic Application Server configured to use SSL (HTTPS).
  2. Any Web Client which is making a connection to a MarkLogic Application Server over HTTPS (in the case of SSL Client Authentication).

Example Scenarios

Consider the following example:

$openssl x509 -in CA.pem -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 18345409437988140316 (0xfe97fcaf8a61b51c)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, ST=CA, L=San Carlos, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA
        Validity
            Not Before: Nov 30 04:08:31 2015 GMT
            Not After : Nov 29 04:08:31 2020 GMT
        Subject: C=US, ST=CA, L=San Carlos, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA

In this example, based on the Trusted CA Subject field, the CA certificate will be listed under the organization name "MarkLogic Corporation" (O=MarkLogic Corporation) in MarkLogic's list of Certificate Authorities.

You can view the full list of currently configured Trusted Certificate Authorities by logging into the MarkLogic administration Application Server (on port 8001) and viewing the status page: Configure -> Security -> Certificate Authorities

Trusted CA Certificate without Organization name (O=)

In some cases, there are legitimate Trusted CA Certificates which do not contain any further information about the Organization responsible for the certificate.

The example below shows a sample self signed root CA (DemoLab CA) which highlights this scenario:

$openssl x509 -in DemoLabCA.pem  -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 12836463831212471403 (0xb22447d80f91b46b)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: CN=DemoLab CA
        Validity
            Not Before: Nov 30 05:23:13 2015 GMT
            Not After : Nov 29 05:23:13 2020 GMT
        Subject: CN=DemoLab CA

If this certificate were to be loaded into MarkLogic, no name would appear under the list of Certificate Authorities provided through the administration Application Server at Configure -> Security -> Certificate Authorities.

In the case of the above example, it would be difficult to use the certificate validated by DemoLab CA (and to use DemoLab CA as our Trusted Certificate Authority) as MarkLogic will only list certificates that are associated with an Organization.

Solution

To work around this issue, we can configure MarkLogic to use the certificate through some scripting with Query Console.

1) Loading the CA using Query Console

Start by using a call to pki:insert-trusted-certificates to load the Trusted CA into MarkLogic. The sample Query Console code below demonstrates this process (please ensure this query is executed against the Security database):
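A minimal sketch of such a query (assuming the CA file from the example above has been saved as /tmp/DemoLabCA.pem on the host running the query; adjust the path as needed):

xquery version "1.0-ml";
import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";

(: Load the PEM file from the filesystem (path is illustrative) and register it as a
   trusted CA; this returns the xs:unsignedLong id of the inserted certificate :)
pki:insert-trusted-certificates(
  xdmp:document-get("/tmp/DemoLabCA.pem",
    <options xmlns="xdmp:document-get">
      <format>text</format>
    </options>)
)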

Make a note of the value of the id returned by MarkLogic. It returns an unsigned long (xs:unsignedLong), which is the id value that can be used later to retrieve that certificate.

2) Attach Trusted CA with "SSL Client Certificate Authorities" using Query Console

The next step is to associate the certificate that we just inserted from our filesystem (DemoLabCA.pem) with a given MarkLogic Application Server. Once this is done, any client connecting to that application server over SSL will be presented with the certificate, and DemoLab CA will be used to match the certificate using the Common Name value (Common Name eq "DemoLab CA").
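A sketch of this step, assuming the application server is called "DemoAppServer" in the "Default" group, and where 1234567890123456 is purely a placeholder for the certificate id returned in step 1:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $appserver-id := admin:appserver-get-id($config, admin:group-get-id($config, "Default"), "DemoAppServer")
(: Placeholder value - replace with the certificate id returned by step 1 :)
let $certificate-id := 1234567890123456
return admin:save-configuration(
  admin:appserver-set-ssl-client-certificate-authorities($config, $appserver-id, $certificate-id)
)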

3) Verify attached Trusted CA for Client Certificate Authorities
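A sketch of the verification query, using the same assumed application server name ("DemoAppServer") as in step 2:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $appserver-id := admin:appserver-get-id($config, admin:group-get-id($config, "Default"), "DemoAppServer")
(: Returns the ids of the trusted CAs configured as SSL client certificate authorities :)
return admin:appserver-get-ssl-client-certificate-authorities($config, $appserver-id)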

Executing the above code should return the same identifier (for the Trusted CA) as was returned by the code executed in step 1. Additionally, we can see that our Application Server (DemoAppServer) is now configured to expect an SSL Client Certificate Authority signed by DemoLab CA.

Further Reading

Introduction

MarkLogic Server is engineered to scale out horizontally by easily adding forests and nodes. Be aware, however, that when adding resources horizontally, you may also be introducing additional demand on the underlying resources.

Details

On a single node, you will see some performance improvement from adding additional forests, due to increased parallelization. There is a point of diminishing returns, though, where the number of forests can overwhelm the available resources such as CPU, RAM, or I/O bandwidth. Internal MarkLogic research (as of April 2014) shows the sweet spot to be around six forests per host (assuming modern hardware). Note that there is a hard limit of 1024 primary forests per database, and it is a general recommendation that the total number of forests should not grow beyond 1024 per cluster.

At the cluster level, you should see performance improvements from adding additional hosts, but attention should be paid to any potentially shared resources. For example, since resources such as CPU, RAM, and I/O bandwidth would now be split across multiple nodes, overall performance is likely to decrease if additional nodes are provisioned virtually on a single underlying server. Similarly, when adding additional nodes to the same underlying SAN storage, you'll want to pay careful attention to making sure there's enough I/O bandwidth to accommodate the number of nodes you want to connect.

More generally, additional capacity above a bottleneck generally exacerbates performance issues. If you find your performance has actually decreased after horizontally scaling out some part of your stack, it is likely that a part of your infrastructure below the part at which you made changes is being overwhelmed by the additional demand introduced by the added capacity.

Summary

MarkLogic Application Servers will keep a connection open after completing and responding to a request, waiting for another new request, until the Keep-Alive timeout expires. However, there is an exception scenario where the connection will close regardless of timeout settings when the content is larger than 1 MB. This article is intended to provide further insight into connection closure with respect to payload size.

HTTP Header

Content-Length

In general, Application Servers communicating over HTTP send the Content-Length header as part of their response HTTP headers to indicate how many bytes of data the client application should expect to receive. For example:

HTTP/1.1 200 OK
Content-type: application/sparql-results+json; charset=UTF-8
Server: MarkLogic
Content-Length: 1264
Connection: Keep-Alive
Keep-Alive: timeout=5

This requires Application Servers to know the length of the entire response before the very first bytes (the response HTTP headers) are put onto the wire. For small amounts of data, the time to calculate the content length is negligible; for large amounts of content, the calculation may be time-consuming, with the extreme being that the client finds the server unresponsive due to the delay in calculating the entire response length. Additionally, the server may need to bring the entire content into a memory buffer, putting further burden on server resources.

Chunked-encoding

To allow servers to begin transmitting dynamically-generated content before knowing the total size of that content, HTTP 1.1 supports chunked encoding. This technique is widely used in music and video streaming and other industries. Chunked encoding eliminates the need to know the entire content length before sending a portion of the data, thus making the server look more responsive.

At the time of this writing, MarkLogic Server (v8.0-6 and earlier releases) does not support chunked encoding. However, do look for this feature in future releases of MarkLogic Server.

Connection Close

In MarkLogic Server v7 and v8, MarkLogic Server closes the connection after transmitting content greater than 1 MB, which allows MarkLogic to avoid calculating the content length in advance. The client will not see a Content-Length header for larger (>1 MB) content in the HTTP response from MarkLogic; instead it will receive a Connection: close header in the HTTP response. After sending the entire content, MarkLogic Server will terminate the connection to indicate to the client that the end of the content has been reached.

Closing the connection for content larger than 1 MB is an exception to the Keep-Alive configuration. This may result in unexpected behavior in clients that rely on MarkLogic Server respecting the Keep-Alive configuration, so this behavior should be accounted for when designing a client application connection pool.

Client applications may have to send a TCP SYN again to establish a new connection for subsequent requests, which adds the overhead of a TCP 3-way handshake before the next request. However, in the context of transferring a larger payload (>1 MB), where many more round trips are added to the overall communication, the overhead of the TCP 3-way handshake is nominal.

Further Reading

Summary

CSV files are a very common data exchange format, often used as an export format for spreadsheets, databases or other applications. Depending on the application, you might be able to change the delimiter character to a hash (#), asterisk (*), etc. One common delimiter is the tab character. Content Pump supports reading and loading such CSV files.

Detail

The Content Pump -delimiter option defines which delimiter will be used to split the columns. Defining a tab as the value for the delimiter option on the command line isn't straightforward.

Loading tab delimited data files with Content Pump can result in an error message like the following:

mlcp>bin/mlcp.sh IMPORT -host localhost -port 9000 -username admin -password secret -input_file_path sample.csv -input_file_type delimited_text -delimiter '    ' -mode local
13/08/21 15:10:20 ERROR contentpump.ContentPump: Error parsing command arguments: 
13/08/21 15:10:20 ERROR contentpump.ContentPump: Missing argument for option: delimiter
usage: IMPORT [-aggregate_record_element <QName>]
... 

Depending on the command line shell, a tab needs to be escaped to be understood by the shell:

On bash shell, this should work: -delimiter $'\t'
On Bourne shell, this should work: -delimiter 'Ctrl+V followed by tab' 
Alternative way would be to use: -delimiter \x09 

If none of these work, another approach is to use the -options_file /path/to/options-file parameter. The options file can contain all of the same parameters as the command line. The benefit of using an options file is that the command line is simpler and characters are interpreted as intended. The options file contains multiple lines, where the first line is always the action (IMPORT, EXPORT, etc.), followed by pairs of lines: the first line of each pair is the option name and the second is its value.

A sample could look like the following:

IMPORT
-host
localhost
-port
9000
-username
admin
-password
secret
-input_file_path
/path/to/sample.csv
-delimiter
' '
-input_file_type
delimited_text


Make sure the file is saved in UTF-8 format to avoid any parsing problems. To define a tab as the delimiter, place a real tab between single quotes (i.e. '<tab>').

To use this option file with mlcp execute the following command:

Linux, Mac, Solaris:

mlcp>bin/mlcp.sh -options_file /path/to/sample.options

Windows:

mlcp>bin/mlcp.bat -options_file /path/to/sample.options

The options file can take any parameter which mlcp understands. It is important that the action command is defined on the first line. It is also possible to use both command line parameters and the options file. Command line parameters take precedence over those defined in the options file.

Summary

There are sometimes circumstances where the MarkLogic data directory owner can be changed. This can create problems where MarkLogic Server is unable to read and/or write its own files, but it is easily corrected.

MarkLogic Server user

There are sometimes circumstances where the MarkLogic data directory owner can be changed; this can create problems where MarkLogic Server is unable to read and/or write its own files.

The default location for the data directory on Linux is /var/opt/MarkLogic and the default owner is daemon.

If you are using a nondefault (non-daemon) user to run MarkLogic, for example mlogic, you would usually have 

    export MARKLOGIC_USER=mlogic

in 

    /etc/marklogic.conf 

Correct the data directory ownership

If the file ownership is incorrect, the way forward is to change the ownership back to the correct user.  For example, if using the default user daemon:

1.  Stop MarkLogic Server.

2.  Make sure that the user you are using is correct and available on this machine.

3.  Change the ownership of all the MarkLogic files (by default /var/opt/MarkLogic and any/all forests for this node) to daemon.  The change needs to be made recursively below the directory to include all files.  Assuming all nodes in the cluster run as daemon, you can use another unaffected node as a check.  You may need to use root/sudo permissions to change owner.  For example:

chown -R daemon:daemon /var/opt/MarkLogic

4.  Start MarkLogic Server.  It should now come up as the correct user and be able to manage its files.

References

Introduction:

MarkLogic Server allows you to set up an alerting application to notify users when new content is available that matches a predefined query. This can be achieved through the Alerting API with the Content Processing Framework (CPF). CPF is designed to keep state for documents, so it is easy to use CPF to keep track of when a document in a particular scope is created or updated, and then perform some action on that document. However, although alerting works for document updates and inserts, it does not occur for document deletes. You will have to create a custom CPF pipeline to catch the delete through an appropriate status transition.

Details

To achieve alerting for document delete, you will have to write your own custom pipeline with status transition to handle deletes. For example:

<status-transition>
   <annotation>custom delete action</annotation>
   <status>deleted</status>
   <priority>5000</priority>
   <always>true</always>
   <default-action>
       <module>/custom-delete-action.xqy</module>
   </default-action>
</status-transition>

The higher 'priority' value and 'always' = true indicate that the custom pipeline takes precedence over the default Status Change Handling pipeline for document deletes. Similarly, in the action module, you can write your custom code for alerting.
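A minimal sketch of such an action module (saved at the /custom-delete-action.xqy path referenced in the transition above, following the usual CPF action module conventions; the alerting call itself is reduced to a simple log entry for illustration):

xquery version "1.0-ml";
import module namespace cpf = "http://marklogic.com/cpf" at "/MarkLogic/cpf/cpf.xqy";

declare variable $cpf:document-uri as xs:string external;
declare variable $cpf:transition as node() external;

if (cpf:check-transition($cpf:document-uri, $cpf:transition))
then
  try {
    (: The document itself is already gone, so work from the URI :)
    xdmp:log(fn:concat("Alert: document deleted: ", $cpf:document-uri)),
    cpf:success($cpf:document-uri, $cpf:transition, ())
  }
  catch ($e) {
    cpf:failure($cpf:document-uri, $cpf:transition, $e, ())
  }
else ()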

Note: By default, when a document is deleted, the on-delete pre-commit trigger is fired and it calls the action in the Status Change Handling pipeline (if enabled) for the 'delete' status transition. It is recommended that you do not modify this pipeline as it can cause compatibility problems in future upgrades and releases of MarkLogic Server.

Introduction

If you're looking at the MarkLogic Admin UI on port 8001, you may have noticed that the status page for a given database displays the last backup dateTime for that database.

We have been asked in the past how this gets computed so the same check can be performed using your own code.

This Knowledgebase article will show examples that utilise XQuery to get this information and will explore the possibility of retrieving it using the MarkLogic REST API.

XQuery: How does the code work?

The simple answer is in the forest status for each of the forests in the database (note these values only appear if you have created a backup already).  For the sake of these examples, let's say we have a database (called "test") which contains 12 forests (test-1 to test-12).  We can get the backup status for these using a call to our ReST API:

http://localhost:8002/manage/LATEST/forests/test-1?view=status&format=html

In the results returned, you should see something like this:

last-backup : 2016-02-12T12:30:39.916Z datetime
last-incr-backup : 2016-02-12T12:37:29.085Z datetime

In generating that status page, what the MarkLogic code does is to create an aggregate: a database doesn't contain documents in MarkLogic; it contains forests and those forests contain documents.

Continuing the example above (with a database called "test" containing 12 forests), if we run the following:
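One way to express this in Query Console (a minimal sketch using the documented status built-ins; only the database name "test" comes from the example above):

xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

(: Return the forest-name element from the status of every forest in the database :)
for $forest-id in xdmp:database-forests(xdmp:database("test"))
return xdmp:forest-status($forest-id)/fs:forest-name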

This will return the forest status(es) for all forests in the database "test" and return the forest names using XPath, so in this case, we would see:

<forest-name xmlns="http://marklogic.com/xdmp/status/forest">test-1</forest-name>
[...]
<forest-name xmlns="http://marklogic.com/xdmp/status/forest">test-12</forest-name>

Our admin UI is interrogating each forest in turn for that database and finding out the metrics for the last backup.  So to put that into context, if we ran the following:
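A sketch of such a query, again against the example "test" database:

xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

(: Return the last-backup element for every forest in the database :)
for $forest-id in xdmp:database-forests(xdmp:database("test"))
return xdmp:forest-status($forest-id)/fs:last-backup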

This gives us:

<last-backup xmlns="http://marklogic.com/xdmp/status/forest">2016-02-12T12:30:39.946Z</last-backup>
[...]
<last-backup xmlns="http://marklogic.com/xdmp/status/forest">2016-02-12T12:30:39.925Z</last-backup>

The code (or the status report) doesn't want values for all 12 forests; it just wants the time the last forest completed the backup (because that's the real time the backup completed), so our code is running a call to fn:max:
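For example (a sketch; the cast to xs:dateTime makes fn:max compare the values as dateTimes):

xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

fn:max(
  for $forest-id in xdmp:database-forests(xdmp:database("test"))
  return xdmp:forest-status($forest-id)/fs:last-backup/xs:dateTime(.)
)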

Which gives us the max value (as these are all xs:dateTimes, it's finding the most recent date), which in the case of this example is:

2016-02-12T12:30:39.993Z

The same is true for the last incremental backup (note that all we're changing here is the XPath, which now selects the last-incr-backup element).

So we can get the max value for this by getting the most recent time across all forests:
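A sketch of the incremental-backup variant:

xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

fn:max(
  for $forest-id in xdmp:database-forests(xdmp:database("test"))
  return xdmp:forest-status($forest-id)/fs:last-incr-backup/xs:dateTime(.)
)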

This would give us 2016-02-12T12:37:29.161Z

Using the ReST API

The ReST API also allows you to get this information but you'd need to jump through a few hoops to get to it; the ReST API status for a given database would give you the names of all the forests attached to that database:

http://localhost:8002/manage/LATEST/databases/test

And from there you could GET the information for all of those forests:

http://localhost:8002/manage/LATEST/forests/test-1?view=status&format=html
[...]
http://localhost:8002/manage/LATEST/forests/test-12?view=status&format=html

Once you'd got all those values, you could do what MarkLogic's admin code does and get the max values for them - although at this stage, it might make more sense to write a custom endpoint that returns this information, something like:
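A sketch of such an endpoint module (the module name, request field and output element names are all illustrative):

xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

let $db := xdmp:get-request-field("db")
let $statuses :=
  for $forest-id in xdmp:database-forests(xdmp:database($db))
  return xdmp:forest-status($forest-id)
return
  <backup-status database="{$db}">
    <last-backup>{fn:max($statuses/fs:last-backup/xs:dateTime(.))}</last-backup>
    <last-incr-backup>{fn:max($statuses/fs:last-incr-backup/xs:dateTime(.))}</last-incr-backup>
  </backup-status>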

Where you could make a call to that module to get the aggregates (e.g.):

http://[server]:[port]/[modulename.xqy]?db=test

This would return the backup status for whichever database name is passed in.

 

Problem:

When searching for matches using OR'ed word-queries, in the case where there are overlapping matches (i.e. one query contains the text of another query), the results of a cts:highlight query may not be as desired.

 

For example:

 

let $p := <p>From the memoirs of an accomplished artist</p>
let $query :=
  cts:or-query((
    cts:word-query("accomplished artist"),
    cts:word-query("memoirs of an accomplished artist")
  ))
return cts:highlight($p, $query, <m>{$cts:text}</m>)

 

 The desired outcome of this would be:

               <p>From the <m>memoirs of an accomplished artist</m> </p>

 Whereas, the actual results are:

                <p>From the <m>memoirs of an </m> <m>accomplished artist</m></p>

 

This behavior is by design and the results are expected: cts:highlight breaks up overlapping areas into separate matches.

The cts:highlight built-in variables $cts:queries and $cts:action help in understanding how this works, as well as in working around this problem.

  $cts:queries --> returns the matching queries for each of the matched texts.

  $cts:action --> can be used with xdmp:set to specify what should happen next

  • "continue" - (default) Walk the next match. If there are no more matches, return all evaluation results.
  • "skip" - Skip walking any more matches and return all evaluation results
  • "break" - Stop walking matches and return all evaluation results

   For example, replacing the return statement in the original query with the following:

return cts:highlight($p, $query,
  <m>{
    $cts:text,
    <number-of-matches>{count($cts:queries)}</number-of-matches>,
    <matched-by>{$cts:queries}</matched-by>
  }</m>)

 

==>

 

<p>From the
  <m>memoirs of an
    <number-of-matches>1</number-of-matches>
    <matched-by>
      <cts:word-query xmlns:cts="http://marklogic.com/cts">
        <cts:text xml:lang="en">memoirs of an accomplished artist</cts:text>
      </cts:word-query>
    </matched-by>
  </m>
  <m>accomplished artist
    <number-of-matches>2</number-of-matches>
    <matched-by>
      <cts:word-query xmlns:cts="http://marklogic.com/cts">
        <cts:text xml:lang="en">memoirs of an accomplished artist</cts:text>
      </cts:word-query>
      <cts:word-query xmlns:cts="http://marklogic.com/cts">
        <cts:text xml:lang="en">accomplished artist</cts:text>
      </cts:word-query>
    </matched-by>
  </m>
</p>

 

These results give us a better understanding of how the text is being matched. We can see that "accomplished artist" is matched by both word-queries, 'accomplished artist' and 'memoirs of an accomplished artist'; hence the results of cts:highlight seem different.

To work around this problem, we can insert a small piece of code: 

 

let $p := <p>From the memoirs of an accomplished artist</p>
let $query :=
  cts:or-query((
    cts:word-query("accomplished artist"),
    cts:word-query("memoirs of an accomplished artist")
  ))
return cts:highlight($p, $query,
  if (count($cts:queries) gt 1) then xdmp:set($cts:action, "continue")
  else
    let $matched-text := <x>{$cts:queries}</x>/cts:word-query/cts:text/data(.)
    return <m>{$matched-text}</m>
)

 

==>

 

<p>From the <m>memoirs of an accomplished artist</m></p>

 

 

Please note that this solution relies on assumptions about what's inside the or-query, but this example could be modified to handle other overlapping situations.

 

   

 




      Summary

      Packer from HashiCorp is an open source provisioning tool, allowing for the automated creation of machine images, extending the ability to manage infrastructure to machine images. Packer supports a number of different image types including AWS, Azure, Docker, VirtualBox and VMWare.

      These powerful tools can be used together to deploy a MarkLogic Cluster to AWS using the MarkLogic CloudFormation Template, using a customized Amazon Machine Image (AMI). The MarkLogic CloudFormation Template is the preferred method recommended by MarkLogic for building out MarkLogic clusters within AWS. By default the MarkLogic CloudFormation Template uses the official MarkLogic AMIs.

      While this guide will cover some portions of Terraform, the primary focus will be using Packer to customize an official MarkLogic AMI. For more detailed information on Terraform, we recommend reading Deploying MarkLogic to AWS with Terraform, which includes more detailed information on using Terraform, as well as the example files referenced later in this article.

      Setting Up Packer

      For the purpose of this example, I will assume that you have already installed the AWS CLI, with the correct credentials, and you have installed Packer.

      Packer Templates

      A Packer template is a JSON configuration file that is used to define the image that we want to build. Templates have a number of keys available for defining the machine image, but the most commonly used ones are builders, provisioners and post-processors.

      • builders are responsible for creating the images for various platforms.
      • provisioners is the section used to install and configure software running on machines before turning them into images.
      • post-processors are actions applied to the images after they are created.

      Creating a Template

      For our example, we are going to take the official MarkLogic AMI and apply some customizations before creating a new image.

      Defining Variables

      Variables help make the build more flexible, so we will utilize a separate variables file, vars.json, to define parts of our build.

      {
      "vpc_region": "us-east-1",
      "vpc_id": "vpc-06d3506111cea30d0",
      "vpc_public_sn_id": "subnet-03343e69ae5bed127",
      "vpc_public_sg_id": "sg-07693eb077acb8635",
      "ami_filter": "release-MarkLogic-10*",
      "ami_owner": "679593333241",
      "instance_type": "t3.large",
      "ssh_username": "ec2-user"
      }

      Creating Our Template

      Now that we have some of the specific build details defined, we can create our template, base_ami.json. In this case we are going to use the build and provisioners keys in our build.

      {
        "builders": [
          {
            "type": "amazon-ebs",
            "region": "{{user `vpc_region`}}",
            "vpc_id": "{{user `vpc_id`}}",
            "subnet_id": "{{user `vpc_public_sn_id`}}",
            "associate_public_ip_address": true,
            "security_group_id": "{{user `vpc_public_sg_id`}}",
            "source_ami_filter": {
              "filters": {
                "virtualization-type": "hvm",
                "name": "{{user `ami_filter`}}",
                "root-device-type": "ebs"
              },
              "owners": ["{{user `ami_owner`}}"],
              "most_recent": true
            },
            "instance_type": "{{user `instance_type`}}",
            "ssh_username": "{{user `ssh_username`}}",
            "ami_name": "ml-{{isotime \"2006-01-02-1504\"}}",
            "tags": {
              "Name": "ml-packer"
            }
          }
        ],
        "provisioners": [
          {
            "type": "shell",
            "script": "./baseInit.sh"
          },
          {
            "destination": "/tmp/",
            "source": "./marklogic.conf",
            "type": "file"
          },
          {
            "type": "shell",
            "inline": [ "sudo mv /tmp/marklogic.conf /etc/marklogic.conf" ]
          }
        ]
      }

      In the build section we have defined the network and security group configurations and the source AMI details. We have also defined the naming convention (ml-YYYY-MM-DD-TTTT) for our new AMI with ami_name and added a tag, ml-packer. Both of those will make it easier to find our AMI when it is time to use it with Terraform.

      Provisioners

      In our example, we are using the shell provisioner to execute a script against the machine, the file provisioner to copy the marklogic.conf file to the machine, and the shell provisioner to move the file to /etc/, all of which will be run prior to creating the image. There are also provisioners available for Ansible, Salt, Puppet, Chef, and PowerShell, among others.

      Provisioning Script

      For our custom image, we've determined that we need an additional piece of software installed, which we will do inside a script. We've named the script baseInit.sh, and it is stored in the same directory as our packer template.

      #!/bin/bash
      echo "**** Starting setup.sh ****"
      echo "Installing Git"
      sudo yum install -y git
      echo "**** Finishing setup.sh ****"

      Executing Our Build

      Now that we've completed setting up our build, it's time to use packer to create the image.

      packer build -debug -var-file=vars.json base_ami.json

      Here you can see that we are telling packer to do a build using base_ami.json and referencing our variables file with the -var-file flag. We've also added the -debug flag which will disable parallelism and enable debug mode. In debug mode, packer will stop after each step and prompt you to hit Enter to go to the next step.

      The last part of the build output will print out the details of our new image:

      ==> Builds finished. The artifacts of successful builds are:
      --> amazon-ebs: AMIs were created:
      us-east-1: ami-0100....

      Terraform and the MarkLogic CloudFormation Template

      At this point we have our image and want to use it when deploying the MarkLogic CloudFormation Template. Unfortunately there is no simple way to do this, as the MarkLogic CloudFormation Template does not have the option to specify a custom AMI. Fortunately Terraform has some functions available that we can use to make the changes to the Template.

      Variables

      First we want to add a couple entries to our existing Terraform variables file.

      variable "ami_tag" {
        type = string
        default = "ml-packer"
      }

      variable "search_string" {
        type = string
        default = "ImageId: "
      }

      The first variable, ami_tag, is the tag we added to the AMI when it was built. The second variable, search_string, is described in the Updates to Terraform Root Module section below.

      Data Source

      To retrieve the AMI, we need to define a data source. In this case it will be an aws_ami data source. We are going to call the file data-source.tf.

      data "aws_ami" "ml_ami" {
        filter {
          name = "state"
          values = ["available"]
        }

        filter {
          name = "tag:Name"
          values = ["${var.ami_tag}"]
        }
        owners = ["self"]
        most_recent = true
      }

      So we are filtering the available AMIs, only looking at ones that are owned by our own account (self), tagged with the value that we defined in our variables file, and then if more than one AMI is returned, using the most recent.

      Updates to Terraform Root Module

      Now we are ready to make a couple of updates to our Terraform root module file to integrate the new AMI into our deployment. In our last example, we used the MarkLogic CloudFormation template from its S3 bucket. For this deployment, we are going to use a local copy of the template, mlcluster-template.yaml.

      Replace the template_url line with the following line:

      template_body = replace(file("./mlcluster-template.yaml"), "/${var.search_string}.*/","${var.search_string} ${data.aws_ami.ml_ami.id}")

      When we updated the variables in our Terraform variable file, we created the variable search_string. In the MarkLogic CloudFormation Template, the value for the Image ID is determined by the region and whether you are running the Essential Enterprise or Bring Your Own License version of MarkLogic Server. Here we are using a regular expression with the replace function to update that line to reference the AMI we just created with Packer, which we have already retrieved via the data source.

      Deploying with Terraform

      Now we are ready to run Terraform to deploy our cluster. First we want to double check that the template looks correct before we attempt to create the CloudFormation stack. The output of terraform plan will show the CloudFormation template that will be deployed. Check the output to make sure that the value for ImageId shows our desired AMI.

      Once we have confirmed our new AMI is being referenced, we can then run terraform apply to create a new stack using the template. This can be validated by opening a command line on one of the new hosts and checking that Git is installed and that /etc/marklogic.conf exists.

      Wrapping Up

      At this point, we have now customized the official MarkLogic AMI to create our own AMI using Packer. We have then used Terraform to update the MarkLogic CloudFormation Template and to deploy a CloudFormation stack based on the updated template.

      Introduction

      In the Scalability, Availability & Failover Guide, the node communication section describes a quorum as >50% of the nodes in a cluster.

      Is it possible for a database to be available for reads and writes, even if a quorum of nodes is not available in the cluster?

      The answer is yes: there are configurations and sequences of events that can lead to forests remaining online when fewer than 50% of the hosts in the cluster are online.

      Details

      If a single forest in a database is not available, the database will not be accessible. It is also true that as long as all of a database's forests are available in the cluster, the database will be available for reads and writes regardless of any quorum issues.

      Of course, the Security database must also be available in the cluster for the cluster to function.

      Forest Availability: Simple Case

      In the simplest case, if you have a forest that is not configured with either local-disk failover or shared-disk failover, then as long as the forest's host is online and exists in the cluster, the forest will be available regardless of any quorum issues.

      To explain this case in more detail: imagine a 3-node MarkLogic cluster containing 3 hosts (let's call them host-a, host-b and host-c). We initialize host-a as the primary host (so this is the first host set up in the cluster and it contains the master Security database) and we then join host-b and host-c to host-a to complete the cluster.

      Shortly after that, if we shut both the joiner hosts (host-b and host-c) down, so that only host-a remained online, we would see a chain of messages in the primary host's ErrorLog indicating there was no longer quorum within the cluster:

      2020-05-21 01:19:14.632 Info: Detected quorum (3 online, 1 suspect, 0 offline)
      2020-05-21 01:19:18.570 Warning: Detected suspect quorum (3 online, 2 suspect, 0 offline)
      2020-05-21 01:19:29.715 Info: Disconnecting from domestic host host-b.example.marklogic.com because it has not responded for 30 seconds.
      2020-05-21 01:19:29.715 Info: Disconnected from domestic host host-b.example.marklogic.com
      2020-05-21 01:19:29.715 Info: Detected suspect quorum (2 online, 1 suspect, 1 offline)
      2020-05-21 01:19:33.668 Info: Disconnecting from domestic host host-c.example.marklogic.com because it has not responded for 30 seconds.
      2020-05-21 01:19:33.668 Info: Disconnected from domestic host host-c.example.marklogic.com
      2020-05-21 01:19:33.668 Warning: Detected no quorum (1 online, 0 suspect, 2 offline)

      Under these circumstances, we would be able to access the host's admin GUI on port 8001 and it would respond without issue.  We would be able to access Query Console on that host on port 8000 and would be able to inspect the primary host's databases.  We would also be able to access the Monitoring History on port 8002 - all directly from the primary host.

      In this scenario, because the primary host remains online and the joining hosts are offline; and because we have not yet set up failover anywhere, there is no requirement for quorum, so host-a remains accessible.

      If host-a also happened to have a database with forests that only resided on that host, these would be available for queries at this time.  However, this is a fairly limited use case because in general, if you have a 3-node cluster, you would have a database whose forests reside on all three hosts in the cluster with failover forests configured on alternating hosts. 

      As soon as you do this, if you lose one host and you don't have failover configured, the database becomes unavailable (due to a crucial forest being offline); if you do have failover forests configured, you would still be able to access the database on the remaining two hosts.

      However, if you then shut down another host, you would lose quorum (which is a requirement for failover).

      Forest Availability: Local Disk Failover

      For forests configured for local disk failover, the sequence of events is important:

      In response to a host failure that makes an "open" forest inaccessible, the forest will fail over to the configured forest replica as long as a quorum exists and the configured replica forest was in the "sync replicating" state. In this case, the configured replica forest will transition to the "open" state; it becomes the acting master forest and is available to the database for both reads and writes.

      Additionally, an "open" forest will not go offline in response to another host being evicted from the cluster.

      However, once cluster quorum is lost, forest failovers will no longer occur.

      Conclusion

      Depending on how your forests are distributed in the cluster and depending on the order of host failures, it is possible for a database to remain online even when there is no longer a quorum of hosts in the cluster.

      Of course, databases with many forests spread across many hosts typically can't stay online if you lose quorum because some forest(s) will become unavailable.

      Recommendation

      Even though it is possible to have a functioning cluster with less than a quorum of hosts online, you should not architect your high availability solution to depend on it.

      Summary

      This article discusses what happens when you backup or restore your database after a local disk failover event on one of the database forests.

      Introduction

      MarkLogic Server provides high availability in the event of a data node failure. Data node failures can include operating system crashes, MarkLogic Server restarts, power failures, or persistent system failures (for example, hardware failures). With forest-level failover enabled and configured, a machine that hosts a forest can go down and the MarkLogic Server cluster automatically recovers from the outage and continues to process queries without any immediate action needed by an administrator. In MarkLogic Server, if a forest becomes unavailable, then the entire database to which this forest is attached becomes unavailable for further query operations. Without failover, such a failure requires manual intervention (by an administrator) to either reconfigure the forest to another host or to remove the forest from the configuration (cluster). With failover, you can configure the forest to automatically switch to a replica forest on a different host. MarkLogic Server failover provides for high availability and maintains data and transactional integrity in the event of a data node failure.

      The failover scenarios are well documented on our developer web site.

      Local Disk Failover

      Local-disk failover allows you to configure a forest on another host to serve as a replica forest which will take over when the primary (master) forest's host goes offline. You can create one or more replica forests for each primary forest. Replica forests contain the exact same data as the primary forest and are kept consistent transactionally.

      It is helpful to use the following terms to refer to the forest configurations and states:

      • Configured Master is the forest which is originally configured as the primary forest.
      • Configured Replica is a forest on another host that is configured as a replica forest of the primary. 
      • Acting Master is the forest that is serving as the master forest, regardless of the configuration.
      • Acting Replica is the forest that is serving as the replica forest, regardless of the configuration.

      Database Backup when a forest is failed over

      If you attempt to take a database backup or perform a database restore when one of the forests of the database has failed over to its replica (i.e. the Configured Replica is serving as Acting Master), it may result in XDMP-FORESTNOTOPEN or XDMP-HOSTDOWN errors.

      When a database backup takes place, by default, everything associated with the database gets backed up. You can also choose to back up individual forests (only the forests selected while configuring the backup are backed up).

      A replica forest will only be backed up when 'Include replica forests' is enabled. If you have not configured the backup to include replica forests, then the replica forests will not be backed up even if one is the acting master. If the Configured Master is also not available, then neither forest will be backed up. In this circumstance, you may see a message in the error logs similar to "Warning: Not backing up database test because first forest master is not available, and replica backups aren't enabled."

      Restore when a forest is failed over

      Restores will fail if executed when a forest is failed over (i.e. the Configured Replica is serving as Acting Master). In this circumstance, you may see a message in the error logs similar to "Operation failed with error message. Check server logs." or "XDMP-HOSTDOWN".

      How to detect if a forest is failed over

      In the Admin UI:

      1. Click the Forests icon in the left tree menu;
      2. Click the Summary tab;
      3. You will see the configured replica in the open state (this indicates that the Configured Replica is serving as Acting Master).
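      Alternatively, a quick Query Console check (a sketch; "test" stands in for your database name) can list the state of each configured forest, including replicas:

      xquery version "1.0-ml";
      declare namespace fs = "http://marklogic.com/xdmp/status/forest";

      (: Include replica forests and report the current state of each one :)
      for $forest-id in xdmp:database-forests(xdmp:database("test"), fn:true())
      let $status := xdmp:forest-status($forest-id)
      return fn:concat($status/fs:forest-name, " : ", $status/fs:state)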

      At the time of the failover event, you may see messages in the Error Log similar to:
      2013-10-03 12:49:53.873 Info: Disconnecting from domestic host rh6v-intel64-9.marklogic.com in cluster 16599165797432706248 because it has not responded for 30 seconds.
      2013-10-03 12:49:53.873 Info: Disconnected from host rh6v-intel64-9.marklogic.com
      2013-10-03 12:49:53.873 Info: Unmounted forest test_P
      2013-10-03 12:49:53.875 Info: Forest test_R assuming the role of master with new precise time 13808297938747190
      2013-10-03 12:49:53.875 Debug: Recovering undo on forest test_R
      2013-10-03 12:49:53.875 Debug: Recovered undo at endTimestamp 13807844927734200 minQueryTimestamp 0 on forest test_R

      Revert back from the failover state:

      When the configured master is the acting replica, this is considered the "failover state". In order to revert back, you must either restart the acting master forest or restart the host on which the acting master forest is locally mounted. After restarting, the forest will automatically revert to being the Configured Master if its host is online. To check the status of the forests, see the Forests Summary tab in the Admin Interface.


      Conclusion 

      For backup and restore to work correctly, clusters configured with local disk failover must have no forests in a failed over state. If a cluster is configured with local disk failover, and if some of its forests are failed over to their local disk replicas, the conditions causing the fail over must be resolved, and the cluster must be returned to the original forest configuration before backup and restore operations may resume.

      INTRODUCTION

      From the documentation:

      Queries on a Replica database must run at a timestamp that lags the current cluster commit timestamp due to replication lag. Each forest in a Replica database maintains a special timestamp, called a Non-blocking Timestamp, that indicates the most current time at which it has complete state to answer a query. As the Replica forest receives journal frames from its Master, it acknowledges receipt of each frame and advances its nonblocking timestamp to ensure that queries on the local Replica run at an appropriate timestamp. Replication lag is the difference between the current time on the Master and the time at which the oldest unacknowledged journal frame was queued to be sent to the Replica.

      To read more:

      http://tinyurl.com/7zwq4l2

      SCENARIO

      Consider the following customer scenario:

      • The storage the database resides on at one site fails.
      • This requires the customer to run for a period of time on a single site.
      • The storage / MarkLogic server are recovered at the site where the failure occurred.
      • The customer needs to re-establish replication between the two sites

      QUESTIONS AND ANSWERS

      Q: Should we tune the lag limit to suit our application?

      A: We have found in our own performance testing that increasing the lag limit beyond the default is typically not helpful.

      When the master has a sustained rate of updates, a large lag limit causes it to run quickly ahead of the replica, then stall for an extended period of time until the replica catches up. This pattern repeats over and over and gives inconsistent performance on the master.

      A smaller lag limit causes the master to suspend updates more frequently but for shorter periods of time, resulting in more consistent perceived performance.

      Q: Is there any option to restore the replica database to a point in time from a backup of the master database & re-initiate replication from that point onwards?

      A: It's fine to restore a backup to the failed system when it comes back online and before configuring replication in the reverse direction.

      Q: Is there a limit to how old a backup of the replica database can be (e.g. can a replica be restored from months back in comparison to the master) and will it still sync back to the master without issue? And does this depend on what journal data is available?

      A: There is no limit to how old a backup can be; the system will calculate all the deltas and apply them.

      Q: Are there any documented API built-ins for any of these things?

      A: Indeed; all the replication information is available through a call to xdmp:forest-status()

      xdmp:forest-status( 
        xdmp:database-forests( 
          xdmp:database("MyDatabase"), 
          fn:true()))

      For further information:

      http://tinyurl.com/d6vbpk4

      Q: Can you also advise if the replication lag limit mentioned in section 1.2.5 and the related possibility of transactions stalling on the master database applies during the bulk replication phase?

      A: As long as the replica's forests are in "open replica" state, the replica will respond to queries at any commit timestamp it is able to support irrespective of whether replication is lagged.

      A new feature in MarkLogic 5 is an application server setting for multi-version concurrency control (by default this is set to contemporaneous - meaning it will run from the latest timestamp that any query has committed - irrespective of whether there are still transactions in-flight).

      Conversely, if nonblocking is chosen (i.e. if you create an application server to query a replica database and you set multi-version concurrency control to nonblocking), the server will choose the last timestamp where all pending transactions are known to have successfully committed.

      If you wish to evaluate a query against a replica database you can use xdmp:database-nonblocking-timestamp() to determine the most current query timestamp that will not block.
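      For example (a sketch; "MyDatabase" is a placeholder, and xdmp:timestamp-to-wallclock simply converts the returned timestamp into a readable dateTime - check the current API documentation for the exact signature in your release):

      xquery version "1.0-ml";
      let $ts := xdmp:database-nonblocking-timestamp(xdmp:database("MyDatabase"))
      return xdmp:timestamp-to-wallclock($ts)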

      Introduction

      Database Replication replicates fragments/documents from a source database to a target database. You may see different database sizes (even when active fragment counts are the same) between the Master and Replica databases. This article provides an overview of the variables and reasons behind this observation.

      Database Replication:

      Database Replication operates at the forest level by copying journal frames from a forest in the Master database and replaying them on a corresponding forest in the foreign replica database. In other words, when journal frames are replayed on the replica database, the group of documents that resides in a single stand of the master database does not necessarily reside in the same stand on the replica database - i.e. the distribution of fragments within stands differs between the master and replica.

      Also, note that Master and Replica forests can be distributed differently across hosts in each cluster. Even when they are distributed identically (Master DB forest name to Replica DB forest name), you could still see a different number of stands between them.

      Database Size, Deleted Fragment and Merge:

      The current database size depends on a number of factors, such as the number of documents, indexes, and deleted fragments in each stand. The number of deleted fragments in any stand itself depends on the merge policy, the background merge process, the processing cycles available, the Linux memory configuration, memory usage at any given time, and the application usage pattern.

      Conclusion:

      Master Cluster and Replica Cluster are separate entities. Although connected, they operate independently. The Replica database on the target cluster provides data consistency. However, how data is spread across stands, including the retention of deleted fragments, will differ between the Master and Replica clusters. Hence you may see different sizes between Master and Replica databases, even where the active fragments are the same.

      Further Reading

      Introduction

      If your MarkLogic Server has its logging level set to "Debug", it is common to see a chain of 'Detecting' and 'Detected' messages that look like this in your ErrorLogs:

      2015-01-27 11:11:04.407 Debug: Detected indexes for database Documents: ss, fp, fcs, fds, few, fep, sln
      2015-01-27 11:11:04.407 Debug: Detecting compatibility for database Documents
      2015-01-27 11:11:04.407 Debug: Detected compatibility for database Documents

      This message will appear immediately after forests are unmounted and subsequently remounted by MarkLogic Server.

      What would cause the forests to be unmounted and remounted?

      • Heavy network activity leading to a cluster (XDQP) "Heartbeat" timeout
      • Changes made to forest configuration or indexes
      • Any incident that may cause a "Hung" message

      What are "Hung" messages?

      Whenever you see a "Hung" message it's very often indicative of a loss of connection to the IO subsystem (especially the case when forests are mounted on network attached storage rather than local disk). Hung messages are explained in a little more detail in this Knowledgebase article:
      https://help.marklogic.com/Knowledgebase/Article/View/35/0/hung-messages-in-the-errorlog

      What do the "Detected" messages mean and what can I do about them?

      Whenever you see a group of "Detecting" messages:

      2015-01-14 13:06:26.016 Debug: Detecting indexes for database XYZ

      There was an event where MarkLogic chose to (or was required to) attempt to unmount and remount forests (and the event may also be evident in your ErrorLogs).

      The detecting index message will occur soon after a remount, indicating that MarkLogic Server is examining forest data to check whether any reindexing work is required for all databases available to the node which have Forests attached:

      2015-01-14 13:06:26.687 Debug: Detected indexes for database XYZ: ss, wp, fp, fcs, fds, ewp, evp, few, fep

The 'Detected indexes' line (shown above) indicates that the scan has completed and that the database has been identified as having been configured with a number of indexes. For the line above, these are:

ss - stemmed searches
wp - word positions
fp - fast phrase searches
fcs - fast case sensitive searches
fds - fast diacritic sensitive searches
ewp - element word positions
evp - element value positions
few - fast element word searches
fep - fast element phrase searches

      From this list, we are able to determine which indexes were detected.  These messages will occur after every remount if you have index detection set to automatic in the database configuration.

      Every time the forest is remounted, in addition to a recovery process (where the Journals are scanned to ensure that all transactions logged were safely committed to on-disk stands), there are a number of other tests the server will do. These are configured with three options at database level:

      • format compatibility
      • index detection
      • expunge locks

      By default, these three settings are configured with the "automatic" setting (in MarkLogic 7), so if you have logging set to "Debug" level, you'll know that these options are being worked through on remount:

      2015-01-14 13:06:26.016 Debug: Detecting indexes for database XYZ (represents the task for "automatic" index detection where the reindexer checks for configuration changes)
      2015-01-14 13:06:26.687 Debug: Detecting compatibility for database XYZ (represents the task for "automatic" format compatibility where the on-disk stand format is detected)

These default values may change across releases of MarkLogic Server. In MarkLogic 8, expunge locks is set to none, but the other two are still set to automatic.

      Can these values be changed safely and what happens if I change these?

Unmounting / remounting times can be made much shorter by configuring these settings away from automatic, but there are some caveats involved. If you upgrade to a future release of the product, the on-disk stand format may change (although it is still 5.0 as of MarkLogic 8), so any pinned value may need revisiting. Setting format compatibility explicitly to 5.0 should cause the "Detecting compatibility" messages to disappear and speed up remount times.

The same is true for disabling index detection, but it's important to note that, with index detection disabled, changing index settings on the database will no longer cause the reindexer to perform any checks on remount; in that case you would need to re-enable index detection for changes to database index settings to be picked up and reindexed.
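If you do decide to change these settings, they can be adjusted through the Admin UI or programmatically. The following is a minimal sketch using the Admin API, assuming a database named "Documents" and that the values shown ("5.0" and "none") are appropriate for your environment:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db     := xdmp:database("Documents")
(: pin the on-disk stand format instead of detecting it on every remount :)
let $config := admin:database-set-format-compatibility($config, $db, "5.0")
(: skip automatic index detection on remount :)
let $config := admin:database-set-index-detection($config, $db, "none")
return admin:save-configuration($config)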

      Summary

This article provides steps for debugging applications that use the Alerting API but are not triggering alerts.

      Details

1) Check that all required components are present in the database where alerting is set up: config, actions, rules. Run the attached script 'getalertconfigs.xqy' through Query Console and review the output.

2) As documented in our Search Developer's Guide, test the alert manually with alert:invoke-matching-actions().

      Example:

      alert:invoke-matching-actions("my-alert-config-uri", 
            <doc>hello world</doc>, <options/>)

      3) Use the rule's query to test against the database to check that the expected documents are returned by the query.

      Take the query text from the rule and run it through Query Console using a cts:search() on the database.  This will confirm whether the expected documents are a positive match.  If the documents are returned and no alert is triggered, then further debugging will be needed on the configuration or the query may need to be modified.
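For example, if the rule is built around a simple word query, the check in Query Console might look like the following sketch (the word query and the database you run it against are placeholders for your own rule's query):

xquery version "1.0-ml";
(: substitute the query text taken from your alerting rule :)
cts:search(fn:collection(), cts:word-query("hello world"))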

      Introduction 

      Division operations involving integer or long datatypes may generate XDMP-DECOVRFLW in MarkLogic 7. This is the expected behavior but it may not be obvious upon initial inspection.  

For example, two similar queries with different input values, executed in Query Console on a Linux or Mac machine running MarkLogic 7, give the following results:

1. This query returns correct results

let $estimate := xs:unsignedLong("220")
let $total := xs:unsignedLong("1600")
return $estimate div $total * 100

==> 13.75

2. This query returns the XDMP-DECOVRFLW error

let $estimate := xs:unsignedLong("227")
let $total := xs:unsignedLong("1661")
return $estimate div $total * 100

==> ERROR: XDMP-DECOVRFLW (err:FOAR0002)

      Details

      The following defines relevant behaviors in MarkLogic 7 and previous releases.

• In MarkLogic 7, if all the operands involved in a div operation are integer, long, or integer sub-types in XML, then the resulting value of the div operation is stored as an xs:decimal.
• In versions prior to MarkLogic 7, if an xs:decimal value was large and occupied all available digits, it was implicitly cast to an xs:double for further operations - i.e. beginning with MarkLogic 7, this implicit casting no longer occurs.
• The xs:decimal datatype can accommodate 18 digits.
• In MarkLogic 7 on Linux and Mac, an xs:decimal can occupy all available digits, depending on the actual value (227 div 1661 = 0.1366646598434677905), i.e. all digits of the xs:decimal are occupied.
• MarkLogic 7 on Windows does not perform division with full decimal precision (227 div 1661 produces 0.136664659843468); as a result, not all digits of the xs:decimal are occupied.
• MarkLogic 7 generates the overflow exception FOAR0002 when an operation is performed on an xs:decimal that is already at full decimal precision.

In the example above, multiplying the result by 100 gives an error on Linux/Mac, while it is fine on Windows.

Recommendations:

We recommend that xs:double be used in division-related operations in order to explicitly cast the resulting value to a larger datatype.

For example, either of the following will return results:

      xs:double($estimate) div $total * 100

      $estimate div $total * xs:double(100)
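Putting this together, the failing example from above runs cleanly once one of the operands is cast to xs:double:

let $estimate := xs:unsignedLong("227")
let $total := xs:unsignedLong("1661")
return xs:double($estimate) div $total * 100

==> approximately 13.6664659843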


      Context:

      There are options 'maintain last modified' and 'maintain directory last modified' on the Admin UI for a database, which when turned on add properties to every document inserted in the database.  There may be a need to remove all the property fragments of all the documents in the database when the properties no longer need to be retained.

      Problem:

Turning these options off for a database ensures that properties will not be created for new documents. However, existing document properties will not be removed by turning these settings off.

      Solution:

      To delete existing document properties, the following query can be used:

       

xdmp:node-delete(xdmp:document-properties("your-document-uri"))
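To remove the property fragments for every document in the database, a variation along the following lines can be used. This is only a sketch: it assumes the URI lexicon is enabled (so that cts:uris is available), and on a large database it should be broken up into batches (for example with xdmp:spawn or a tool such as CoRB) rather than run in a single transaction:

xquery version "1.0-ml";
for $uri in cts:uris((), ())
(: only touch documents that actually have a properties fragment :)
let $props := xdmp:document-properties($uri)
where fn:exists($props)
return xdmp:node-delete($props)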

       

      Please make sure that 'maintain last modified' and 'maintain directory last modified' options are turned off for the database, so that the property fragment does not get recreated for the document.
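These two settings can be turned off through the Admin UI, or programmatically with the Admin API. A minimal sketch, assuming the database is called "Documents" (verify the function names against the Admin API documentation for your release):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db     := xdmp:database("Documents")
let $config := admin:database-set-maintain-last-modified($config, $db, fn:false())
let $config := admin:database-set-maintain-directory-last-modified($config, $db, fn:false())
return admin:save-configuration($config)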

       

       

      Summary

      Terraform from HashiCorp is a deployment tool that many organizations use to manage their infrastructure as code. It is platform agnostic, allowing for the deployment and configuration of on-site physical infrastructure, as well as cloud infrastructure such as AWS, Azure, VSphere and more.

Terraform uses the HashiCorp Configuration Language (HCL) to allow for concise descriptions of infrastructure. HCL is a JSON-compatible language, and was designed to be both human and machine friendly.

      This powerful tool can be used to deploy a MarkLogic Cluster to AWS using the MarkLogic CloudFormation Template. The MarkLogic CloudFormation Template is the preferred method recommended by MarkLogic for building out MarkLogic clusters within AWS.

      Setting Up Terraform

For the purposes of this example, I will assume that you have already installed Terraform and the AWS CLI, and that you have configured your AWS credentials. You will also need to have a working directory that has been initialized using terraform init.

      Terraform Providers

      Terraform uses Providers to provide access to different resources. The Provider is responsible for understanding API interactions and exposing resources. The AWS Provider is used to provide access to AWS resources.

      Terraform Resources

      Resources are the most important part of the Terraform language. Resource blocks describe one or more infrastructure objects, like compute instances and virtual networks.

The aws_cloudformation_stack resource allows Terraform to create a stack from a CloudFormation template.

      Choosing a Template

      MarkLogic provides two templates for creating a managed cluster in AWS.

• MarkLogic cluster in a new VPC
• MarkLogic cluster in an existing VPC

I've chosen to deploy my cluster to an existing VPC. When deploying to an existing VPC, you will need to gather the VPC ID, as well as the Subnet IDs for the public and private subnets.

      Defining Variables

      The MarkLogic CF Template takes a number of input variables, including the region, availability zones, instance types, EC2 keys, encryption keys, licenses and more. We have to define our variables so they can be used as part of the resource.

Variables in HCL can be declared in a separate file, which allows for deployment flexibility. For instance, you can create a Development resource and a Production resource by using different variable files.

      Here is a snippet from our variables file:

      variable "cloudform_resource_name" {
      type = string
      default = "Dev-Cluster-CF"
      }
      variable "stack_name" {
      type = string
      default = "Dev-Cluster"
      }
      variable "ml_version" {
      type = string
      default = "10.0-4"
      }
      variable "availability_zone_names" {
      type = list(string)
      default = ["us-east-1a","us-east-1b","us-east-1c"]
      }
      ...

      In the snippet above, you'll notice that we've defined the availability_zone_names as a list. The MarkLogic CloudFormation template won't take a list as an input, so later we will join the list items into a string for the template to use.

      This also applies to any of the other lists defined in the variable files.

      Using the CloudFormation Resource

Now we need to define the resource in HCL that will allow us to deploy a CloudFormation template to create a new stack.

The first thing we need to do is tell Terraform which provider we will be using and define some default options:

          provider "aws" {
          profile = "default"
          #access_key = var.access_key
          secret_key = var.secret_key
          region = var.aws_region
          }

      Next, we need to define the `aws_cloudformation_stack` configuration options, setting the variables that will be passed in when the stack is created:

          resource "aws_cloudformation_stack" "marklogic" {
          name = var.cloudform_resource_name
          capabilities = ["CAPABILITY_IAM"]
      
      
          parameters = {
          IAMRole = var.iam_role
          AdminUser = var.ml_admin_user
          AdminPass = var.ml_admin_password
          Licensee = "My User - Development"
          LicenseKey = "B581-REST-OF-LICENSE-KEY"
          VolumeSize = var.volume_size
          VolumeType = var.volume_type
          VolumeEncryption = var.volume_encryption
          VolumeEncryptionKey = var.volume_encryption_key
          InstanceType = var.instance_type
          SpotPrice = var.spot_price
          KeyName = var.secret_key
          AZ = join(",","${var.avail_zone}")
          LogSNS = var.log_sns
          NumberOfZones = var.number_of_zones
          NodesPerZone = var.nodes_per_zone
          VPC = var.vpc_id
          PublicSubnets = join(",","${var.public_subnets}")
          PrivateSubnets = join(",","${var.private_subnets}")
          }
          template_url = "${var.template_base_url}${var.ml_version}/${var.template_file_name}"
          }

      Deploying the Cluster

      Now that we have defined our variables and our resources, it's time for the actual deployment.

      $> terraform apply

      This will show us the work that Terraform is going to attempt to perform, along with the settings that have been defined so far.

      Once we confirm that things look correct, we can go ahead and apply the resource.

Now we can check the AWS Console to see our stack.

And we can also use the ELB to log in to the Admin UI.

      Wrapping Up

We have now deployed a 3 node cluster to an existing VPC using Terraform. The cluster is now ready to have our Data Hub or other application installed.

      Deploying MarkLogic in AWS with Ansible

      Summary

Ansible, owned by Red Hat, is an open source provisioning, configuration, and application deployment tool that many organizations use to manage their infrastructure as code. Unlike options such as Chef and Puppet, it is agentless, utilizing SSH to communicate between servers. Ansible also does not need a central host for orchestration; it can run from nearly any server, desktop, or laptop. It supports many different platforms and services, allowing for the deployment and configuration of on-site physical infrastructure, as well as cloud and virtual infrastructure such as AWS, Azure, VSphere, and more.

      Ansible uses YAML as its configuration management language, making it easier to read than other formats. Ansible also uses Jinja2 for templating to enable dynamic expressions and access to variables.

Ansible is a flexible tool that can be used to deploy a MarkLogic Cluster to AWS using the MarkLogic CloudFormation Template. The MarkLogic CloudFormation Template is the preferred method recommended by MarkLogic for building out MarkLogic clusters within AWS.

      Setting Up Ansible

      For the purpose of this example, I will assume that you have already installed Ansible, the AWS CLI, and the necessary python packages needed for Ansible to talk to AWS. If you need some help getting started, Free Code Camp has a good tutorial on setting up Ansible with AWS.

      Inventory Files

Ansible uses Inventory files to help determine which servers to perform work on. They can also be used to customize settings for individual servers or groups of servers. For our example, we have set up our local system with all the prerequisites, so we need to tell Ansible how to treat the local connections. For this demonstration, here is my inventory, which I've named hosts:

      [local]
      localhost              ansible_connection=local

      Ansible Modules

Ansible modules are discrete units of code that are executed on a target. The target can be the local system, or a remote node. The modules can be executed from the command line, as an ad-hoc command, or as part of a playbook.

      Ansible Playbooks

Playbooks are Ansible's configuration, deployment, and orchestration language. Playbooks are how the power of Ansible and its modules is extended from basic configuration and management all the way to complex, multi-tier infrastructure deployments.

Choosing a Template

      MarkLogic provides two templates for creating a managed cluster in AWS.

      1. MarkLogic cluster in new VPC
      2. MarkLogic cluster in an existing VPC

I've chosen to deploy my cluster to an existing VPC. When deploying to an existing VPC, you will need to gather the VPC ID, as well as the Subnet IDs for the public and private subnets.

      Defining Variables

      The MarkLogic CF Template takes a number of input variables, including the region, availability zones, instance types, EC2 keys, encryption keys, licenses and more. We have to define our variables so they can be used as part of the resource.

      Variables in Ansible can be declared in a separate file, which allows for deployment flexibility.

      Here is a snippet from our variables file:

      # vars file for marklogic template and version
      ml_version: '10.0-latest'
      template_file_name: 'mlcluster.template'
      template_base_url: 'https://marklogic-template-releases.s3.amazonaws.com/'

       

      # CF Template Deployment Variables
      aws_region: 'us-east-1'
      stack_name: 'Dev-Cluster-An3'
      IAMRole: 'MarkLogic'
      AdminUser: 'admin'
      ...

      Using the CloudFormation Module

Now we need to create our playbook and choose the module that will allow us to deploy a CloudFormation template to create a new stack: the cloudformation module, which allows us to create a CloudFormation stack.

      Next, we need to define the cloudformation configuration options, setting the variables that will be passed in when the stack is created.

      # Use a template from a URL
      - name: Ansible Test
        hosts: local

       

        vars_files:
          - ml-cluster-vars.yml

       

        tasks:
          - cloudformation:
              stack_name: "{{ stack_name }}"
              state: "present"
              region: "{{ aws_region }}"
              capabilities: "CAPABILITY_IAM"
              disable_rollback: true
              template_url: "{{ template_base_url+ml_version+'/'+ template_file_name }}"
            args:
              template_parameters:
                IAMRole: "{{ IAMRole }}"
                AdminUser: "{{ AdminUser }}"
                AdminPass: "{{ AdminPass }}"
                Licensee: "{{ Licensee }}"
                LicenseKey: "{{ LicenseKey }}"
                KeyName: "{{ KeyName }}"
                VolumeSize: "{{ VolumeSize }}"
                VolumeType: "{{ VolumeType }}"
                VolumeEncryption: "{{ VolumeEncryption }}"
                VolumeEncryptionKey: "{{ VolumeEncryptionKey }}"
                InstanceType: "{{ InstanceType }}"
                SpotPrice: "{{ SpotPrice }}"
                AZ: "{{ AZ | join(', ') }}"
                LogSNS: "{{ LogSNS }}"
                NumberOfZones: "{{ NumberOfZones }}"
                NodesPerZone: "{{ NodesPerZone }}"
                VPC: "{{ VPC }}"
                PrivateSubnets: "{{ PrivateSubnets | join(', ') }}"
                PublicSubnets: "{{ PublicSubnets | join(', ') }}"
              tags:
                Stack: "ansible-test"

      Deploying the cluster

Now that we have defined our variables and created our playbook, it's time for the actual deployment.

      ansible-playbook -i hosts ml-cluster-playbook.yml -vvv

      The -i option allows us to reference the inventory file we created. As the playbook runs, it will output as it starts and finishes tasks in the playbook.

      PLAY [Ansible Test] ************************************************************************************************************

       

      TASK [Gathering Facts] *********************************************************************************************************
      ok: [localhost]

       

      TASK [cloudformation] **********************************************************************************************************
      changed: [localhost]

      When the playbook finishes running, it will print out a recap which shows the overall results of the play.

      PLAY RECAP *********************************************************************************************************************
      localhost                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

This recap tells us that 2 tasks ran successfully, resulting in 1 change and no failed tasks, which is our sign that things worked.

If we want to see more information as the playbook runs, we can add one of the verbose flags (-v or -vvv) to provide more information about the parameters the script is running, and the results.

      Now we can check the AWS Console to see our stack:

And we can also use the ELB to log in to the Admin UI.

      Wrapping Up

We have now deployed a 3 node cluster to an existing VPC using Ansible. The cluster is now ready to have our Data Hub or other application installed. We can now use the git module to get our application code, and deploy it using ml-gradle.

      Introduction

This KB article lists some available tools for continuous integration and for automatically deploying MarkLogic Server.

      Deployment Options

ml-gradle is a Gradle plugin that can be used for configuration and application deployments. Application deployments are maintained as projects, which can be deployed to any environment - Development, QA, Production, etc.

      The MarkLogic Configuration Management API is a RESTful API that allows retrieving, generating, and applying configurations for MarkLogic clusters, databases, and application servers.

The MarkLogic Management API is a REST-based API that allows you to administer MarkLogic Server and access MarkLogic Server instrumentation with no provisioning or set-up. You can use the API to perform administrative tasks such as initializing or extending a cluster; creating databases, forests, and App Servers; and managing tiered storage partitions. The API also provides the ability to easily capture detailed information on MarkLogic Server objects and processes, such as hosts, databases, forests, App Servers, groups, transactions, and requests, from a wide variety of tools.

      The MarkLogic Admin APIs provide a flexible toolkit for creating new and managing existing configurations of MarkLogic Server.
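As a small illustration of scripting against the Admin API, the following sketch creates a new database (the database name is a placeholder, and the script assumes it is run by a user with sufficient privileges); forests could then be created and attached with admin:forest-create and admin:database-attach-forest in subsequent steps:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: create a new database that uses the default Security and Schemas databases :)
let $config := admin:database-create(
  admin:get-configuration(),
  "example-db",
  xdmp:database("Security"),
  xdmp:database("Schemas"))
return admin:save-configuration($config)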

      Integration Testing

      MarkLogic Unit Test is a testing component that was originally part of the Roxy project. This component enables you to build unit tests that are written in and can test against both XQuery and Server-side JavaScript.

      Implementation Specific Tools

      CloudFormation Templates

MarkLogic CloudFormation templates enable you to launch clusters with an Elastic Load Balancer, Elastic Block Storage, Auto Scaling Group, and so on. Your cluster can be in either one Availability Zone or three Availability Zones. Multiple nodes can be placed within each Availability Zone. You can choose whether to deploy to an existing VPC, or a new VPC. The templates can also be used with tools like Terraform and Ansible.

      Python

      The MarkLogic Python API aims to provide complete coverage of the capabilities in the MarkLogic REST API in idiomatic Python.

      Jenkins

      Jenkins is often used with MarkLogic Server for building deployable artifacts, staging build artifacts, running automated tests, and deploying said artifacts. Jenkins has great REST endpoints that make it easy to get / put job configurations, and enable / disable jobs from scripts.

      Jenkins provides a driver to the continuous integration / continuous delivery process that can integrate with other tools. In combination with ml-gradle, it can be used to run deploy module/unit test on code check-in.

      One pipeline example used with Jenkins is to:

      1. Pull the code from Git
      2. Deploy to DEV with ml-gradle
      3. Run MarkLogic Unit Test
      4. Email a report of the success/failure
      5. Kick off job to deploy to another environment

Also note that the most important best practice here is to make sure Jenkins runs on a host other than a MarkLogic host.

      SUMMARY

      This article will help MarkLogic Administrators to monitor the health of their MarkLogic cluster. By studying the attached scripts, you will learn how to find out which hosts are down and which forests have failed over, enabling you to take the necessary recovery actions.

      Initial Setup

      On a separate Linux host (not a member of the cluster), download the file attachments from this article, making sure that they all reside within the same directory.

      Here is a general description of each file:

      cluster-name.conf - Example configuration file used by script. Configures information for monitoring one ML cluster. 

      ml-ck-for-life.sh - A very simple, low-load check that all the nodes of a cluster are up and running.

      ml-ck-for-health.sh - A more detailed check for essential cluster functionality with alerting (paging and/or emails to DBAs) if warranted. This script relies on at least one external XQuery file (mon-report-failed-over-forests.xqy) and makes use of the REST MGMT API as well as REST XQuery requests.

      mon-report-failed-over-forests.xqy - External XQuery file used by ml-ck-for-health.sh

       

      Preparing the CONF File for Use on Your Cluster

      Before running the scripts, the cluster-name.conf needs to be customized for your specific cluster. Start by changing the file name to match the name of your cluster, e.g.,

      $ mv cluster-name.conf some-other-name.conf

      Where "some-other-name" is the actual name of the cluster, or of the application that is hosted on that cluster.

Next, you will need to customize some of the internal variables inside the CONF file itself. Here are the contents of the cluster-name.conf file, as downloaded:

      CLUSTER_NAME="CLUSTER-NAME"
      CLUSTER_NODES=( node1.my-company.com node2.my-company.com node3.my-company.com )
      # MarkLogic Credentials for the REST Management port - 8002
USER_PW_MGMT=rest-manager-user:rest-manager-password
      # MarkLogic Credentials for the XQuery eval port - 8000
      USER_PW_XQ=user-name:user-password
      UNIX_USER=unix-user-name
      PAGE_ADDRESSES=ml.alert.page@my-company.com
      MAIL_ADDRESSES=ml.alert.mail@my-company.com

      ---------  end of listing ---------

      For CLUSTER_NAME, provide the cluster-name listed in the cluster's /var/log/MarkLogic/clusters.xml file.

      For CLUSTER_NODES, write in the host-names for each node in your cluster.

      For USER_PW_MGMT, provide the user-name and password for the REST MANAGEMENT user, the format is name:password.

      For USER_PW_XQ, provide the user-name and password for the user who will execute the XQuery scripts, the format is name:password.

      The UNIX_USER is a local Unix username with the correct rwx access rights for this directory.

      The PAGE_ADDRESSES & MAIL_ADDRESSES are alert email addresses who will be notified whenever there is a failover event.

      Periodicity

      The script ml-ck-for-health.sh was created with the idea it would be run repeatedly at a certain interval to keep tabs on system health. For example, it can be configured to be invoked with a cron job. A frequency of 5 to 120 minutes is a good candidate range. Ten minutes is a good time if you would like to be woken up (on average) within 5 minutes of a failover event.

      Setting up SSH Passwordless Login

In the monitoring script ml-ck-for-health.sh, section (6), FOREST STATUS CHANGE, requires SSH access to the cluster hosts. That is because this section greps through the MarkLogic Server ErrorLogs. To enable this part of the script to run without prompting the user, "ssh passwordless login" should be set up between the monitoring host and all the cluster hosts. There are many examples of how to do this on the internet, for example: http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/ Alternatively, this monitoring section can be commented out.

Also regarding section (6), the grep command is set up to grep the latest 10 minutes from the ErrorLog. If this script is configured to run less often than every 10 minutes, the grep command line should be adapted to cover the desired period between script runs.

      Example Usage

      You are now ready to execute the failover monitoring scripts! Here is how you would execute them:


      $ ./ml-ck-for-health.sh some-other-name.conf MY-CLUSTER-NAME

      $ ./ml-ck-for-life.sh some-other-name.conf

      [where "some-other-name" and MY-CLUSTER-NAME are your actual CONF and cluster-name, as described above]

      Monitoring Multiple Clusters

      So, given a monitoring machine with a directory of cluster configuration files in the style of cluster-name.conf, those configuration files could be iterated through to monitor a suite of clusters from a single monitoring machine. It should be fairly easy to build a custom shell script to iterate through various cluster CONF files.

Final Thoughts and Limitations

      Please be aware that the ml-ck-for-health.sh script is only partially implemented. In particular, the Replication Lag and Replication Failure sections are left as exercises for the user.

      This script is being presented as a backup, lowest common denominator monitoring solution. For a more complete solution, you should explore other options, such as Splunk or Nagios.

       

       

       

      Introduction

According to Wikipedia, DevOps is a set of practices that combines software development (Dev) and IT operations (Ops) with the goal of shortening the Systems Development Lifecycle and providing continuous delivery with high software quality. This KB will provide some guidance for system deployment and configuration, which can be integrated into an organization's DevOps processes.

      For more information on using MarkLogic as part of a Continuous Integration/Continuous Delivery process, see the KB  Deployment and Continuous Integration Tools.

      Deploying a Cluster

      Deploying a MarkLogic cluster that will act as the target environment for the application code being developed is one piece of the DevOps puzzle. The approach that is chosen will depend on many things, including the tooling already in use by an organization, as well as the infrastructure that will be used for the deployment.  We will cover two of the most common environments, On-Premise and Cloud.

      On-Premise Deployments

      On-Premise deployments, which can include using bare metal servers, or Virtual Machine infrastructure (such as VMware), are one common environment. You can deploy a cluster to an on-premise environment using tools such as shell scripts, or Ansible. In the Scripting Administrative Tasks Guide, there is a section on Scripting Cluster Management, which provides some examples of how a cluster build can be automated.

      Once the cluster is deployed, some of the specific configuration tasks that may need to be performed on the cluster can be done using the Management API, and the Configuration Management API.

      Cloud Deployments

Cloud deployments utilize flexible compute resources provided by vendors such as Amazon Web Services (AWS) or Microsoft Azure. The Management API and the Configuration Management API can be used to configure the cluster in these environments as well.

For AWS, MarkLogic provides an example CloudFormation template that can be used to deploy a cluster to Amazon's EC2 environment. Tools like the AWS Command Line Interface (CLI), Terraform, or Ansible can be used to extend the MarkLogic CloudFormation template and automate the process of creating a cluster in AWS EC2. The template can be used to deploy a cluster using the AWS CLI, to Deploy a Cluster Using Terraform, or to Deploy a Cluster Using Ansible.

      For Azure, MarkLogic has provided Solution Templates for Azure which can be extended for automated deployments using the Azure CLI, Terraform or Ansible.

      As with the on-premise deployments, configuration tasks can be performed on the cluster using the Management API, and the Configuration Management API.

      Summary

      This is just a brief introduction into some aspects of DevOps processes for deploying and configuring a MarkLogic Cluster.

      Summary:

After adding or removing a forest and its corresponding replica forest in a database, we have seen instances where the rebalancer does not properly distribute the documents among existing and newly added forests.

In this particular instance, an XDMP-HASHLOCKINGRETRY debug-level message was reported repeatedly in the error logs. The messages look something like:

      2016-02-11 18:22:54.044 Debug: Retrying HTTPRequestTask::handleXDBCRequest 83 because XDMP-HASHLOCKINGRETRY: Retry hash locking. Forests config hash does not match.

      2016-02-11 18:22:54.198 Debug: Retrying ForestRebalancerTask::run P_initial_p2_01 50 because XDMP-HASHLOCKINGRETRY: Retry hash locking. Forests config hash does not match.

      Diagnosing

Gather statistics about the rebalancer to see the number of documents being scheduled. If you run the attached script "rebalancer-preview.xqy" in Query Console on your MarkLogic Server cluster, it will produce rebalancer statistics in tabular format.

• Note that you must first change the database name (YourDatabaseNameOnWhichNewForestsHaveBeenAdded) on the 3rd line of the XQuery script "rebalancer-preview.xqy":

declare variable $DATABASE as xs:string := xdmp:get-request-field("db", "YourDatabaseNameOnWhichNewForestsHaveBeenAdded");

If you are experiencing this issue, the newly added forests will show zero in the "Total to be moved" column of the generated HTML page.

      Resolving

Perform a cluster-wide restart in order to get past this issue. The restart is required to reload all of the configuration files across the cluster. The rebalancer will also check to see if additional rebalancing work needs to occur. The rebalancer should work as expected after the restart, and the XDMP-HASHLOCKINGRETRY messages should no longer appear in the logs. If you run the rebalancer-preview.xqy script again, the statistics should now show the number of documents scheduled to be moved.
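The cluster-wide restart can be performed from the Admin UI, or with a one-line XQuery call such as the following (an empty sequence for the first argument restarts every host in the cluster):

xdmp:restart((), "Restarting to reload configuration after XDMP-HASHLOCKINGRETRY")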

      You can also validate the rebalancer status from the Database Status page in the Admin UI.

The XDMP-HASHLOCKINGRETRY rebalancer issue has been fixed in the latest MarkLogic Server releases. However, the rebalancer-preview.xqy script can still be used to help diagnose other perceived issues with the rebalancer.

       

      Search fundamentals

       

      Difference between cts:contains and fn:contains

1) fn:contains is a substring match, whereas cts:contains performs query matching.

2) cts:contains can therefore utilize general queries and stemming, whereas fn:contains cannot.

For example:

       

      Example.xml

      <test>daily running makes you fit</test>

       

• fn:contains(fn:doc("Example.xml"), "ning") returns true
• cts:contains(fn:doc("Example.xml"), "ning") returns false
• fn:contains(fn:doc("Example.xml"), "ran") returns false
• cts:contains(fn:doc("Example.xml"), "ran") returns true

       

       

Note:

The cts:contains examples check the document against cts:word-query terms. Stemming reduces words down to their root, allowing for smaller term lists.

1) Words from different languages are treated differently, and will not stem to the same root word entry from another language.

2) Nouns will not stem to verbs and vice versa. For example, the word "runner" will not stem to "run".


      Introduction

MarkLogic Server provides a variety of disaster recovery (DR) facilities, including full backup, incremental backup, and journal archiving, that when combined with other MarkLogic features can create a complete disaster recovery strategy. This paper shows some examples of how these features can be combined. It is not comprehensive, nor does it reflect features offered only in the latest releases.

      Details

This article covers two perspectives: first, a quick overview of the metrics used by businesses to measure the quality of their disaster recovery strategies; then, an overview of how the features that MarkLogic offers in various categories can be combined.

More?: High Availability and Disaster Recovery features, High Availability & Disaster Recovery datasheet, Scalability, Availability, and Failover Guide

      Disaster Recovery Criteria

      In order to configure MarkLogic Server to perform well in Disaster Recovery situations, we should first define what parameters we will use to measure each possible approach. For most situations, these four measures are used: 

      Long Term Retention Policy (LTR): Long Term Retention Policy can be driven by any number of business, regulatory and other criteria. It is included here because MarkLogic's backup files are often a key part of an LTR strategy. 

Recovery Point Objective (RPO): The requirement for how up-to-date the database has to be post-recovery, with respect to its state immediately before the incident that required the recovery.

Recovery Time Objective (RTO): The requirement for the time elapsed between the incident and the recovery to the RPO.

Cost: The storage cost, the computational resource cost, and the operations cost of the overall deployment strategy.

      Flexible Replication Features

Flexible Replication can be used to support LTR objectives, but is generally not useful for disaster recovery.

      More? Flexible Replication Guide

      Platform Support Features

Flash backup provides a way to leverage the backup features of your deployment platform while maintaining transaction integrity. Platform-specific solutions can often achieve RPO and RTO targets that would be impossible through other means.

      More? Flash Backup

      High Availability Features

      Forest replication provides recovery from host failures.

      More? Scalability, Availability, and Failover Guide

      Disaster Recovery Features

      Database Replication

      Database Replication is the process of maintaining copies of forests on databases in multiple MarkLogic Server clusters.

      More? Understanding Database Replication

      Backups

Of all your backup options, full backups restore the quickest, but take the most time to back up and possibly the most storage space. Each full backup is a backup set, in that it contains everything you need to restore to the point of the backup.

      Full backups with journal archiving allow restores to a point after the backup, but the journal archive grows in an unbounded way with the number of transactions, and replaying the journals to get to your recovery point takes time proportional to the number of transactions in the journal archive, so over time, this becomes less efficient.

With full + incremental backups, a backup set is a full backup plus the incremental backups taken after that full backup. Incremental backups are quick to back up, but take longer to restore, and over time the backup set gets larger and larger, so it may end up consuming more backup space than a full backup alone (depending on your backup retention policy).

      Full + incremental backups with journal archiving have the same characteristics as incremental backups, except that you can roll forward from the most recent incremental. With this strategy, the journal archive doesn't grow in an unbounded way because the archive is purged when you take the next incremental backup. Note that if your RPO is between incremental backups, you must also enable a merge timestamp by setting the merge timestamp to a negative value (see below).

More?: Administrator's Guide to Backing Up and Restoring a Database; How does "point-in-time" recovery work with Journal Archiving?

      Forest Merge Configurations

      Forest merges recover the disk space occupied by deleted documents. A negative merge timestamp delays that permanent deletion. If we want incremental backups to contain all the fragments that were deleted since the last incremental backup then we want to set the delay to a period greater than the incremental backup period. This requires more disk space for the incremental backups and also requires additional space in the live database, but provides the most flexibility.

Setting retain-until-backup on a given database (through the Admin UI or through an API call) has a similar effect by telling the server to keep the deleted fragments until a full backup or an incremental backup completes. Many clients choose to use both the negative merge timestamp and retain-until-backup options together.
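As a rough sketch of how these two settings might be applied with the Admin API: the database name is a placeholder, and the negative merge timestamp literal below is also a placeholder expressed in the server's timestamp units, so derive the value that matches your incremental backup period from the admin:database-set-merge-timestamp documentation.

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db     := xdmp:database("Documents")
(: negative value: delay permanent removal of deleted fragments (placeholder value) :)
let $config := admin:database-set-merge-timestamp($config, $db, -864000000000)
(: also keep deleted fragments until they have been captured by a backup :)
let $config := admin:database-set-retain-until-backup($config, $db, fn:true())
return admin:save-configuration($config)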

More?: admin:database-set-merge-timestamp, admin:database-set-retain-until-backup


      Conclusion

Planning to meet a Long Term Retention (LTR) policy, a Recovery Point Objective (RPO), a Recovery Time Objective (RTO), and a cost goal is a key part of developing an overall MarkLogic deployment plan. MarkLogic offers a wealth of tools that can complement each other when they are properly coordinated. As is clear from this article, the choices are many, broad, and interrelated.

      Introduction

      In the more recent versions of MarkLogic Server, there are checks in place to prevent the loading of invalid documents (such as documents with multiple root nodes).  However, documents loaded in earlier versions of MarkLogic Server can now result in duplicate URI or duplicate document errors being reported.

Additionally, under normal operating conditions, a document/URI is saved in a single forest. If the load process somehow gets compromised, then users may see issues such as duplicate URIs (i.e. the same URI in different forests) and duplicate documents (i.e. the same document/URI appearing more than once in the same forest).

      Resolution

      If the XDMP-DBDUPURI (duplicate URI) error is encountered, refer to our KB article "Handling XDMP-DBDUPURI errors" for procedures to resolve.

If you don't see XDMP-DBDUPURI errors, but running fn:doc() on a document returns multiple nodes, then it could be a case of duplicate documents in the same forest. To check that the problem is actually duplicate documents, run either xdmp:describe(fn:doc(...)) or fn:count(fn:doc(...)). If these commands return more than one result - e.g. xdmp:describe(fn:doc("/testdoc.xml")) returns (fn:doc("/testdoc.xml"), fn:doc("/testdoc.xml")), or fn:count(fn:doc("/testdoc.xml")) returns 2 - then the problem is duplicate documents in the same forest (and not duplicate URIs).

      To fix duplicate documents, the document will need to be reloaded.

      Introduction

This article discusses the effect of the case sensitivity of a search term on the search score, and thus on the final order of search results, for a secondary query that uses cts:boost-query and a weight. A case-insensitive word term is treated as the lowercase word term, so there is no difference in the frequencies and scores of results between an any-case/case-insensitive search term and a lowercase search term with the "case-sensitive" option, or when neither "case-sensitive" nor "case-insensitive" is present. If neither "case-sensitive" nor "case-insensitive" is present, the text of the search term is used to determine case sensitivity.

      Understanding relevance score

In MarkLogic, search results are returned in relevance order: the most relevant results are first in the result sequence and the least relevant are last.
More details on the relevance score and its calculation are available at https://docs.marklogic.com/guide/search-dev/relevance

One of the many ways to control the relevance score is to use a secondary query to boost it: https://docs.marklogic.com/guide/search-dev/relevance#id_30927 . This article uses examples of secondary queries that boost relevance scores to show the impact of the case (upper, lower, or unspecified) of search terms on the relevance score and on the order of results returned.

      A few examples to understand this scenario

Consider a few scenarios where the queries below try to boost certain search results using cts:boost-query and a weight for the word "washington" in the returned results.

      Example 1: Search with lowercase search term and option for case not specified

      Query1:
      xquery version "1.0-ml";
      declare namespace html = "http://www.w3.org/1999/xhtml";

      for $hit in
      ( cts:search(
      fn:doc()/test,

      cts:boost-query(cts:element-word-query(xs:QName("test"),"George" ),
      cts:element-word-query(xs:QName("test"),"washington",(), 10.0) )
      )
      )

      return element hit {
      attribute score { cts:score($hit) },
      attribute fit { cts:fitness($hit) },
      attribute conf { cts:confidence($hit) },
      $hit
      }


      Results for Query1:
      <hit score="28276" fit="0.9393904" conf="0.2769644">
      <test>Washington, George... </test>
      </hit>
      ...
      ...
      <hit score="16268" fit="0.7125317" conf="0.2100787">
      <test>George washington was the first President of the United States of America...</test>
      </hit>
      ...

       

      Example 2: Search with lowercase search term and case-sensitive option

      Query2:
      xquery version "1.0-ml";
      declare namespace html = "http://www.w3.org/1999/xhtml";

      for $hit in
      ( cts:search(
      fn:doc()/test,

      cts:boost-query(cts:element-word-query(xs:QName("test"),"George" ),
      cts:element-word-query(xs:QName("test"),"washington",("case-sensitive"), 10.0) )
      )
      )

      return element hit {
      attribute score { cts:score($hit) },
      attribute fit { cts:fitness($hit) },
      attribute conf { cts:confidence($hit) },
      $hit
      }


      Results for Query2:
      <hit score="28276" fit="0.9393904" conf="0.2769644">
      <test>Washington, George... </test>
      </hit>
      ...
      ...
      <hit score="16268" fit="0.7125317" conf="0.2100787">
      <test>George washington was the first President of the United States of America...</test>
      </hit>
      ...

       

      Example 3: Search with uppercase search term and option case-insensitive, in cts:boost-query like below with rest of query similar to above queries

      Query3:

      cts:boost-query(cts:element-word-query(xs:QName("test"),"George" ),
      cts:element-word-query(xs:QName("test"),"Washington",("case-insensitive"), 10.0) )

      Results for Query3:
      <hit score="28276" fit="0.9393904" conf="0.2769644">
      <test>Washington, George... </test>
      </hit>
      ...
      ...
      <hit score="16268" fit="0.7125317" conf="0.2100787">
      <test>George washington was the first President of the United States of America...</test>
      </hit>
      ...


Clearly, the queries above produce the same scores, with the same fitness and confidence values. This is because a case-insensitive word term is treated as the lowercase word term, so there can be no difference in the frequencies of those two terms (any-case/case-insensitive and lowercase/case-sensitive), and therefore no difference in scoring - hence there is no difference between the results of Query2 and Query3.
For cases where case sensitivity is not specified, the text of the search term is used to determine case sensitivity: the search term in Query1 contains no uppercase characters, hence it is treated as "case-insensitive".

       

Now let us look at a query with an uppercase search term and the case-sensitive option.

      Example 4: Search with uppercase search term and option case-sensitive, in cts:boost-query like below with rest of query similar to above queries

      Query4:

      cts:boost-query(cts:element-word-query(xs:QName("test"),"George" ),
      cts:element-word-query(xs:QName("test"),"Washington",("case-sensitive"), 10.0) )

      Results for Query4:
      <hit score="44893" fit="0.9172696" conf="0.3489831">
      <test>Washington, George was the first... </test>
      </hit>
      ...
      ...
      <hit score="256" fit="0.0692672" conf="0.0263533">
      <test>George washington was the first President of the United States of America...</test>
      </hit>
      ...

       

As we can clearly see, the scores change for the results of Query4, and thus the final order of results is also updated.


      Conclusion:

When using a secondary query with cts:boost-query and a weight to boost certain search results, it is important to understand the impact of the case of the search text on the result sequence. A case-insensitive word term is treated as the lowercase word term, so there is no difference in the frequencies of any-case/case-insensitive and lowercase/case-sensitive search terms, and therefore no difference in scoring. For a search term with uppercase characters in its text and the "case-sensitive" option, scores are boosted as expected in comparison with a case-insensitive search. If neither "case-sensitive" nor "case-insensitive" is present, the text of the search term is used to determine case sensitivity: if the text contains no uppercase characters, the search is case-insensitive; if it contains uppercase characters, the search is case-sensitive.

       

      Background

      MarkLogic Server includes element level security (ELS), an addition to the security model that allows you to specify security rules on specific elements within documents. Using ELS, parts of a document may be concealed from users who do not have the appropriate roles to view them. ELS can conceal the XML element (along with properties and attributes) or JSON property so that it does not appear in any searches, query plans, or indexes - unless accessed by a user with appropriate permissions.

      ELS protects XML elements or JSON properties in a document using a protected path, where the path to an element or property within the document is protected so that only roles belonging to a specific query roleset can view the contents of that element or property. You specify that an element is part of a protected path by adding the path to the Security database. You also then add the appropriate role to a query roleset, which is also added to the Security database.

      ELS uses query rolesets to determine which elements will appear in query results. If a query roleset does not exist with the associated role that has permissions on the path, the role cannot view the contents of that path.

      Notes:

      1. A user with admin privileges can access documents with protected elements by using fn:doc to retrieve documents (instead of using a query). However, to see protected elements as part of query results, even a user with admin privileges will need to have the appropriate role(s).
      2. ELS applies to both XML elements and JSON properties; so unless spelled out explicitly, 'element' refers to both XML elements and JSON properties throughout this article.

      You can read more about how to configure Element Level Security here, and can see how this all works at this Element Level Security Example.
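As a brief illustration (a sketch only, not a complete configuration - see the Security Guide referenced above for the full workflow, including query rolesets), a path can be protected by running something like the following against the Security database, where the element name, role name, and permission are placeholder values:

xquery version "1.0-ml";
import module namespace sec = "http://marklogic.com/xdmp/security"
  at "/MarkLogic/security.xqy";

(: only users with the "els-reader" role will be able to see <secret> elements in query results :)
sec:protect-path("//secret", (), (xdmp:permission("els-reader", "read")))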

      Node-update

      One of the commonly used document level capabilities is 'update'. Be aware, however, that document level update is too powerful to be used with ELS permissions as someone with document level update privileges could update not only a node, but also delete the whole document. Consequently, a new document-level capability - 'node-update' - has been introduced. 'node-update' offers finer control when combined with ELS through xdmp:node-replace and xdmp:node-delete functions as they can be used to update/delete only the specified nodes of a document (and not the document itself in its entirety).
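For example, a user whose role carries the 'node-update' capability (at the document level and on any relevant protected path) can modify or remove an individual element without being able to delete the whole document. A simple sketch against a hypothetical document:

(: replace just the <salary> element of the document :)
xdmp:node-replace(fn:doc("/employees/jane.xml")/employee/salary, <salary>95000</salary>);

(: or delete just that element, leaving the rest of the document intact :)
xdmp:node-delete(fn:doc("/employees/jane.xml")/employee/salary)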

      Document-level vs Element-level security

      Unlike at the document-level:

• 'update' and 'node-update' capabilities are equivalent at the element-level. However, at the document-level, if a user only has the 'node-update' capability on a document, that user cannot delete the document. In contrast, the 'update' capability allows the user to delete the document
      • 'Read', 'insert' and 'update' are checked separately at the element level i.e.:
        • read operations - only permissions with 'read' capability are checked
        • node update operations - only permissions with 'node-update' (update) capability are checked
        • node insert operations - only permissions with  'insert' capability are checked

      Note: read, insert, update and node-update can all be used at the element-level i.e., they can be part of the protected path definition.

      Permissions:

      Document-level:

      1. update: A node can be updated by any user that has an 'update' capability at the document-level
      2. node-update:  A node can be updated by any user with a 'node-update' capability as long as they have sufficient privileges at the element-level

      Element-level:

      1. If a node is protected but no 'update/node-update' capabilities are explicitly granted to any user, that node can be updated by any user as long as they have 'update/node-update' capabilities at the document-level
      2. If any user is explicitly granted 'update/node-update' capabilities to that node at the element level, only that specific user is allowed to update/delete that node. Other users who are expected to have that capability must be explicitly granted that permission at the element level

      How does node-replace/node-delete work?

      When a node-replace/node-delete is called on a specific node:

      1. The user trying to update that node must have at least a 'node-update' (or 'update') capability to all the nodes up until (and including) the root node
2. None of the descendant nodes of the node being replaced/deleted can be protected by different roles. If they are protected:
        1. 'node-delete' isn’t allowed as deleting this node would also delete the descendant node which is supposed to be protected
        2. 'node-replace' can be used to update the value (text node) of the node but replacing the node itself isn’t allowed

      Note: If a caller has the 'update' capability at the document level, there is no need to do element-level permission checks since such a caller can delete or overwrite the whole document anyway.

      Takeaways:

      1. 'node-update' was introduced to offer finer control with ELS, in contrast to the document level 'update'
      2. 'update' and 'node-update' permissions behave the same at element-level, but differently at the document-level
        1. At document-level, 'update' is more powerful as it gives the user the permission to delete the entire document
        2. All permissions talk to each other at document-level. In contrast, permissions are checked independently at the element-level
          1. At the document level, an update permission allows you to read, insert to and update the document
          2. At the element level, however, read, insert and update (node-update) are checked separately
            1. For read operations, only permissions with the read capability are checked
            2. For node update operations, only permissions with the node-update capability are checked
            3. For node insert operations, only permissions with the insert capability are checked (this is true even when compartments are used).
      3. Can I use ELS without document level security (DLS)?
        1. ELS cannot be used without DLS
        2. Consider DLS the outer layer of defense, whereas ELS is the inner layer - you cannot get to the inner layer without passing through the outer layer
      4. When to use DLS vs ELS?
        1. ELS offers finer control on the nodes of a document and whether to use it or not depends on your use-case. We recommend not using ELS unless it is absolutely necessary as its usage comes with serious performance implications
        2. In contrast, DLS offers better performance and works better at scale - but is not an ideal choice when you need finer control as it doesn’t allow node-level operations 
      5. How does ELS performance scale with respect to different operations?
        1. Ingestion - depends on the number of protected paths
          1. During ingestion, the server inspects every node for ELS to do a hash lookup against the names of the last steps from all protected paths
          2. For every protected path that matches the hash, the server does a full test of the node against the path - the higher the number of protected paths, the higher the performance penalty
3. While the hash lookup is very fast, the full test is comparatively much slower - and the corresponding performance penalty increases when there are a large number of nodes that match the last steps of the protected paths
            1. Consequently, we strongly recommend avoiding the use of wildcards at the leaf-level in protected paths
            2. For example: /foo/bar/* has a huge performance penalty compared to /foo/*/bar
        2. Updates - as with ingestion, ELS performance depends on the number of protected paths
        3. Query/Search - in contrast to ELS ingestion or update, ELS query performance depends on the number of query rolesets
          1. Because ELS query performance depends on the number of query rolesets, the concept of Protected PathSet was introduced in 9.0-4
          2. A Protected PathSet allows OR relationships between permissions on multiple protected paths that cover the same element
          3. Because query performance depends on the number of relevant query rolesets, it is highly recommended to use helper functions to obtain the query rolesets of nodes configured with element-level security

      Further Reading

      Introduction

Some customers have reported problems when attempting to access the Configuration Manager application. In the past, this has been attributed to part of the upgrade process failing for some reason (for example: a port required by MarkLogic already being in use) or, in some cases, to a default database having been removed by the customer at some previous stage.

      XDMP-ARGTYPE Error

      If you see this error when you attempt to access the Configuration Manager:

      500 Internal Server Error XDMP-ARGTYPE XDMP-ARGTYPE: (err:XPTY0004) fn:concat( "could not initialize management plugins with scope: ", $reut:PLUGIN-SCOPE, ": ", xdmp:quote($e)) -- arg1 is not of type xs:anyAtomicType?

      Resolving the error

      Ensure you have an Extensions database configured by doing the following:

      • Log into the MarkLogic Admin interface on port 8001 - http://[your-host]:8001/
      • Under "Databases" box, ensure a database called Extensions is listed

      If it does not exist, download and run the script attached to this article (create-extensions-db.xqy).

      Summary

      Does MarkLogic provide encryption at rest?

      MarkLogic 9

      MarkLogic 9 introduces the ability to encrypt 'data at rest' - data that is on media (on disk or in the cloud), as opposed to data that is being used in a process. Encryption can be applied to newly created files, configuration files, or log files. Existing data files can be encrypted by triggering a merge or re-index of the data.

      For more information about using Encryption at Rest, see Encryption at Rest in the MarkLogic Security Guide.

      MarkLogic 8 and Earlier releases

      MarkLogic 8 does not provide support for encryption at rest for its own forests.

      Using Amazon S3 Encryption For Backups

      If you are hosting your data locally, would like to back up to S3 remotely, and your goal is that there cannot possibly exist unencrypted copies of your data outside your local environment, then you could backup locally and store the backups to S3 with AWS Client-Side encryption. MarkLogic does not support AWS Client-Side encryption, so this would need to be a solution outside MarkLogic.

      See also: MarkLogic documentation: S3 Storage.

      See also: AWS: Protecting Data Using Encryption.

      Introduction

      Here we compare XDBC servers and the Enhanced HTTP server in MarkLogic 8.

      Details

      XDBC servers are still fully supported in MarkLogic Server version 8. You can upgrade existing XDBC servers without making any changes and you can create new XDBC servers as you did in previous releases.

      The Enhanced HTTP Server is an additional feature on HTTP servers which is protocol and binary transport compatible with XCC clients, as long as you use the xcc.httpcompliant=true system property.
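
For example, an existing XCC-based Java application can be pointed at an Enhanced HTTP port simply by setting that system property when the JVM is launched (the jar and class names below are placeholders):

  java -Dxcc.httpcompliant=true -cp xcc.jar:myapp.jar com.example.MyXccApp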

      The XCC protocol is actually just HTTP, but the details of how to handle body, headers, responses, etc., are "built in" to the XCC client libraries and the XDBC server. The HTTP server in MarkLogic 8 now shares the same low-level code and can dispatch XCC-like requests.

      Introduction

      This article talks about best practices for use of external proxies vs using rewriter rules in the Enhanced HTTP server.

      Details

Whether to use external proxies versus using rewriter rules in the Enhanced HTTP application server is an application design tradeoff, not dissimilar to using a single HTTP application server with an XQuery rewriter or endpoint that can dynamically dispatch to different databases and modules (using eval-in). The Enhanced HTTP server does this type of dispatching much more efficiently, but the concept is similar, with the same pros and cons.

      It is mostly an application and business management issue—by sharing the same port you share the same server configuration (authentication, server settings) and the "outside world" only sees one port, so configuring port-based security on firewalls, routers, or load balancers is more difficult.

      Summary

      A forest reindex timeout error may occur when there are transactions holding update locks on documents for an extended period of time. A reindexer process is started as a result of a database index change or a major MarkLogic Server upgrade.  The reindexer process will not complete until after update locks are released.

      Example error text seen in the MarkLogic Server ErrorLog.txt file:

      XDMP-FORESTERR: Error in reindex of forest Documents: SVC-EXTIME: Time limit exceeded

      Detail

Long-running transactions can occur if MarkLogic Server is participating in a distributed transaction environment. In this case, transactions are managed through a Resource Manager. Each transaction is executed as a two-phase commit: in the first phase, the transaction is prepared for a commit or a rollback; the actual commit or rollback occurs in the second phase. More details about XA transactions can be found in the Application Developer's Guide - Understanding Transactions in MarkLogic Server.

In a situation where the Resource Manager gets disconnected between the two phases, all transactions may be left in a "prepare" state within MarkLogic Server. The Resource Manager maintains transaction information and will clean up transactions left in the "prepare" state after a successful reconnect. In the rare case where this doesn't happen, all transactions left in the "prepare" state will stay in the system until they are cleaned up manually. The method for manual intervention is described in the XCC Developer's Guide - Heuristically Completing a Stalled Transaction.

In order for an XA transaction to take place, it needs to prepare the execution for the commit. If updates are being made to pre-existing documents, update locks are held against the URIs of those documents. When reindexing is occurring during this process, the reindexer will wait for these locks to be released before it can successfully reindex the new documents. Because the reindexer is unable to complete due to these pending XA transactions, the hosts in the cluster are unable to finish the reindexing task and will eventually throw a timeout error.

      Mitigation

To avoid these kinds of reindexer timeouts, it is recommended that the database is checked for outstanding XA transactions in the "prepare" state before starting a reindexing process. There are two ways to verify whether the database has outstanding transactions in the "prepare" state:

      • In the Admin UI, navigate  to each forest of the database and review the status page; or
      • Run the following XQuery code (in Query Console):

  xquery version "1.0-ml";
  declare namespace fo = "http://marklogic.com/xdmp/status/forest";

  (: For each forest in the current database, return any transaction
     coordinators that are still in the "prepare" state :)
  for $f in xdmp:database-forests(xdmp:database())
  return
    xdmp:forest-status($f)//fo:transaction-coordinator[fo:decision-state = 'prepare']

      In the case where there are transactions in the "prepare" state, a roll-back can be executed:

      • In the Admin UI, click on the "rollback" link for each transaction; or
      • Run the following XQuery code (in Query Console):

  xquery version "1.0-ml";
  declare namespace fo = "http://marklogic.com/xdmp/status/forest";

  (: Roll back every XA transaction left in the "prepare" state by calling
     xdmp:xa-complete with commit = false and remember = false :)
  for $f in xdmp:database-forests(xdmp:database())
  return
    for $id in xdmp:forest-status($f)//fo:transaction-coordinator[fo:decision-state = 'prepare']/fo:transaction-id/fn:string()
    return
      xdmp:xa-complete($f, $id, fn:false(), fn:false())

      Introduction

      Query Console is an interactive web-based query development tool for writing and executing ad-hoc queries in XQuery, Server-Side JavaScript, SQL and SPARQL. Query Console enables you to quickly test code snippets, debug problems, profile queries, and run administrative XQuery scripts.  Query Console uses workspaces to assist users with organizing queries.  A user can have multiple workspaces, and each workspace can have multiple queries.

      Issue

In MarkLogic Server v9.0-11, v10.0-3 and earlier releases, users may experience delays, lag or latency between when a key is pressed on the keyboard and when it appears in the Query Console query window. This typically happens when there are a large number of queries in one of the user's workspaces.

      Workaround

      A workaround to improve performance is to reduce the number of queries in each workspace.  The same number of queries can be managed by increasing the number of workspaces and reducing the number of queries in each workspace.  We suggest keeping no more than 30 queries in a workspace to avoid these latency issues.  

      The MarkLogic Development team is looking to improve the performance of Query Console, but at the time of this writing, this performance issue has not yet been resolved. 

      Further Reading

      Query Console User Guide

      Introduction

      Users of Java based batch processing applications, such as CoRB, XQSync, mlcp and the hadoop connector may have seen an error message containing "Premature EOF, partial header line read". Depending on how exceptions are managed, this may cause the Java application to exit with a stacktrace or to simply output the exception (and trace) into a log and continue.

      What does it mean?

      The premature EOF exception generally occurs in situations where a connection to a particular application server connection was lost while the XCC driver was in the process of reading a result set. This can happen in a few possible scenarios:

      • The host became unavailable due to a hardware issue, segfault or similar issue;
      • The query timeout expired (although this is much more likely to yield an XDMP-EXTIME exception with a "Time limit exceeded" message);
      • Network interruption - a possible indicator of a network reliability problem such as a misconfigured load balancer or a fault in some other network hardware.

      What does the full error message look like?

      An example:

      INFO: completed 5063408/14048060, 103 tps, 32 active threads
       Feb 14, 2013 7:04:19 AM com.marklogic.developer.SimpleLogger logException
       SEVERE: fatal error
       com.marklogic.xcc.exceptions.ServerConnectionException: Error parsing HTTP
       headers: Premature EOF, partial header line read: ''
       [Session: user=admin, cb={default} [ContentSource: user=admin,
       cb={none} [provider: address=localhost/127.0.0.1:8223, pool=0/64]]]
       [Client: XCC/4.2-8]
       at
       com.marklogic.xcc.impl.handlers.AbstractRequestController.runRequest(AbstractRequestController.java:116)
       at com.marklogic.xcc.impl.SessionImpl.submitRequest(SessionImpl.java:268)
       at com.marklogic.developer.corb.Transform.call(Unknown Source)
       at com.marklogic.developer.corb.Transform.call(Unknown Source)
       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
       at java.util.concurrent.FutureTask.run(FutureTask.java:166)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
       at java.util.concurrent.FutureTask.run(FutureTask.java:166)
       at
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       at
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       at java.lang.Thread.run(Thread.java:679)
       Caused by: java.io.IOException: Error parsing HTTP headers: Premature EOF,
       partial header line read: ''
       at com.marklogic.http.HttpHeaders.nextHeaderLine(HttpHeaders.java:283)
       at com.marklogic.http.HttpHeaders.parseResponseHeaders(HttpHeaders.java:248)
       at com.marklogic.http.HttpChannel.parseHeaders(HttpChannel.java:297)
       at com.marklogic.http.HttpChannel.receiveMode(HttpChannel.java:270)
       at com.marklogic.http.HttpChannel.getResponseCode(HttpChannel.java:174)
       at
       com.marklogic.xcc.impl.handlers.EvalRequestController.serverDialog(EvalRequestController.java:68)
       at
       com.marklogic.xcc.impl.handlers.AbstractRequestController.runRequest(AbstractRequestController.java:78)
       ... 11 more
       2013-02-14 07:04:19.271 WARNING [12] (AbstractRequestController.runRequest):
       Cannot obtain connection: Connection refused

      Configuration / Code: things to try when you first see this message

A possible cause of errors like this is the JVM starting garbage collection, with the collection taking long enough to exceed the server timeout setting. If this is the case, try adding the -XX:+UseConcMarkSweepGC Java option.

      Setting the "keep-alive" value to zero for the affected XDBC application server will disable socket pooling and may help to prevent this condition from arising; with keep-alive set to zero, sockets will not be re-used. With this approach, it is understood that disabling keep-alive should not be expected to have a significant negative impact on performance, although thorough testing is nevertheless advised.

      Summary

      Here we discuss various methods for sharing metering data with Support:  telemetry in MarkLogic 9 and exporting monitoring data.

      Discussion

      Telemetry

      In MarkLogic 9, enabling telemetry collects, encrypts, packages, and sends diagnostic and system-level usage information about MarkLogic clusters, including metering, with minimal impact to performance. Telemetry sends information about your MarkLogic Servers to a protected and secure location where it can be accessed by the MarkLogic Technical Support Team to facilitate troubleshooting and monitor performance.  For more information see Telemetry.

      Meters database

      If telemetry is not enabled, make sure that monitoring history is enabled and data has been collected covering the time of the incident.  See Enabling Monitoring History on a Group for more details.  

      Backup of Meters database

      A backup of the full Meters database will provide all the available raw data and is very useful, but is often very large and difficult to transfer, so an export of a defined time range is often requested.

      Exporting data

Either of the attached scripts can be used in lieu of a Meters database backup. They will provide the raw metering XML files from a defined period of time, which can be reloaded into MarkLogic and used with the standard tools.

      exportMeters.xqy

      This XQuery export script needs to be executed in Query Console against the Meters database and will generate zip files stored in the defined folder for the defined period of time.

      Variables for start and end times, batch size, and output directory are set at the top of the script.

      get-raw.sh

      This bash version will use MLCP to perform a similar export but requires an XDBC server and MLCP installed. By default the script creates output in a subdirectory called meters-export. See the attached script for details. An example command line is

      ./get-raw.sh localhost admin admin "2018-04-12T00:00:00" "2018-04-14T00:00:00"

      Introduction

To avoid index bloat, MarkLogic records word positions in its indexes only once, for the word-query field. When word positions are necessary to accurately match element-word queries, they are normally taken from the word-query field. When elements are excluded from the word-query field, words under those elements are not indexed, so their positions are not recorded. In MarkLogic 7.0-5 and 8.0-1, a code change was included to avoid false negatives resulting from an element-word query expecting positions from words in elements descended from excluded elements. This code change was to not use positions from the word-query field for element-word searches if the word-query field has exclusions.

      Implications

Unfortunately, this solution can sometimes result in false positives - which is captured in 7.0-5 bug #33207 and 8.0-1 bug #32686 (you can read more about both of these bugs in our Fixed Bugs Report). Consequently, a follow-up refinement was shipped in 7.0-5.1 & 8.0-2 to allow the affected queries to be fully resolvable via indexes. To take advantage of this update, three changes are required:

      1) Upgrade to 7.0-5.1 or later, or 8.0-2 or later

      2) Database index settings must be updated to tell MarkLogic Server to use positions in this scenario and therefore avoid the previously seen false positives. There are two changes that could be made. Either:

      2a. The element in the element-word query must be explicitly included in the word-query field

      ...or:

      2b. All the word-query excluded elements must be configured as phrase-around elements.

      3) After the relevant database index settings are updated and the upgrade has been applied, a reindex must be performed

      If these changes are made, positions in the word-query field should then be used, which should then ultimately result in the elimination of false positives.

      Introduction

       A "fast data directory" is configurable for each forest, and can be set to a directory built on a fast file system, such as one using SSDs. Refer to Using a mix of SSD and spinning drives. If configured MarkLogic Server will try to put as many writes and seeks to the Fast Data Directory (FDD) as it can. As such, it will try to put as many on disk stands as possible onto the FDD. Frequently updated documents tend to reside in the smaller stands and thus are more likely to reside on the FDD.

      This article attempts to explain how you should account for the FDD when sizing disk space for your MarkLogic Server.

      Details

      Forest journals will be placed on the fast data directory. 

Each time an automatic merge is performed, MarkLogic Server will attempt to save the results onto the forest's fast data directory. If there is not sufficient space on the FDD, MarkLogic Server will use the forest's primary data directory. To preserve space for future small stands, MarkLogic Server is conservative in deciding whether to put the merge destination stands on the FDD, which means that even if there is enough available space, it may store the result in the forest's regular data directory. For more details, refer to the resource consumption white paper.

It is also important to know when the Fast Data Directory is not used: stands created by a manually triggered merge are not stored on the fast data directory, but in the forest's primary data directory. Manual merges can be executed by calling the xdmp:merge function or from within the Admin UI. Forest migration (forest-migrate) and restoring backups also do not put stands in the fast data directory.

      Conclusion

      MarkLogic Server maintains some disk space in the FDD for checkpoints and journaling. However, since the Fast Data Directory is not used in some procedures, we should not count the size of the FDD when sizing the disk space needed for forest data.

      Introduction

      The Performance Considerations section of the Loading Content Into MarkLogic Server documentation states 

      "When you load content, MarkLogic Server performs updates transactionally, locking documents as needed and saving the content to disk in the journal before the transaction commits. By default, all documents are locked during an update and the journal is set to preserve committed transactions, even if the MarkLogic Server process ends unexpectedly."

      There are two types of locking which are specified at the database level:

      • Fast locking employs a hashed locking scheme (based on the URI) where each fragment URI has a designated forest, so the lock created during the insert is restricted only to that forest.
      • Setting up a database with "strict" locking will force the coordination of an update lock across all forests in the database (and across the cluster) until the insert has taken place.

Fast locking has been the default setting for newly created MarkLogic databases since MarkLogic 5 (released October 2011).

      When should I use strict locking?

If at any point in your code you are specifying the forest into which to insert a document or fragment (using a technique commonly referred to as in-forest evaluation), configuring that database with "strict" locking is definitely the safest choice. If your code always allows the server to determine the target forest for the document/fragment, you're perfectly safe using fast locking.
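
For reference, in-forest evaluation looks something like the following sketch, where the application (rather than the server) picks the target forest; the URI and forest name are illustrative assumptions:

  xquery version "1.0-ml";

  (: The sixth argument pins the insert to a named forest - this is the pattern
     for which "strict" locking is the safest choice :)
  xdmp:document-insert(
    "/orders/order-1001.xml",
    <order id="1001"/>,
    xdmp:default-permissions(),
    xdmp:default-collections(),
    0,
    xdmp:forest("Documents-001"))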

      In the situation where two different people create the same document (with the same URI) and where fast locking was taking place, this would result in:

      • A transaction culminating in an insert into a given forest (as assigned by the ML node servicing the request) for the first fragment
      • An "update" transaction (in the same forest) where the first fragment is then marked as deleted
• A new fragment takes the place of the first fragment to complete the second transaction

      Subsequent merges would then remove the stand entry for the first fragment (now deleted/replaced by the subsequent transaction)

      The fast option would not create a dangerous race condition unless your application would allow two different people to insert a document with the same URI into two different forests as two separate transactions and where URI assignment is handled by your XQuery/application layer; if the code responsible for making those transactions were to inadvertently assign the same URI to two different forests in a cluster, this could cause a problem that strict locking would guard against. If your application always allows MarkLogic to assign the forest for the document, there is no danger whatsoever in keeping to the server default of "fast" locking.

Additionally, consider what kind of failover your system is using. When using fast journaling with local disk failover, the journal disk write needs to fail on both the master and replica nodes for data loss to occur, so there's no need for strict journaling in this scenario. In contrast, strict journaling should be used with shared-disk failover, as data loss is possible with fast journaling if a single node fails before the OS flushes the buffer to disk.

      Is there a performance implication in switching to strict locking?

      Fast locking will be faster than strict locking, but the performance penalty is largely going to be dependent on a number of factors; the number of forests in a given database, the number of nodes across which the database forests are spread and the speed at which all nodes in the cluster can coordinate a transaction across the cluster (Network/IO) will all have some (potentially minimal) impact.

      If the conditions of your application suit, we recommend staying with the default of fast locking on all your databases.

      There may be reasons for using 'strict' locking - especially if you are considering loading documents using in-forest-evaluation in your code.

      Further reading

      https://docs.marklogic.com/guide/ingestion/performance

      Summary

      There are situations where the SVC-DIRREM, SVC-DIROPEN and SVC-FILRD errors occur on backups to an NFS mounted drive. This article explains how this condition can occur and describes a number of recommendations to avoid such errors.

Under normal operating conditions, with proper mounting options for a remote drive, MarkLogic Server is not expected to report SVC-xxxx errors. Most likely, these errors are a result of improper NFS disk mounting or other IO issues.

      We will begin by exploring methods to narrow down the server which has the disk issue and then list some things to look into in order to identify the cause.

      Error Log and Sys Log Observation

      The following errors are typical MarkLogic Error Log entries seen during an NFS Backup that indicate an IO subsystem error.   The System Log files may include similar messages.

              Error: SVC-DIRREM: Directory removal error: rmdir '/Backup/directory/path': {OS level error message}

              Error: SVC-DIROPEN: Directory open error: opendir '/Backup/directory/path': {OS level error message}

        Error: Backup of forest 'forest-name' to 'Backup path' SVC-FILRD: File read error: open '/Backup/directory/path': {OS level error message}

      These SVC- error messages include the {OS level error message} retrieved from the underlying OS platform using generic C runtime strerror() system call.  These messages are typically something like "Stale NFS file handle" or "No such file or directory".

If only a subset of hosts in the cluster are generating these types of errors ...

      You should compare the problem host's NFS configuration with rest of the hosts in the cluster to make sure all of the configurations are consistent.

      • Compare nfs versions (rpm -qa | grep -i nfs)
      • Compare nfs configurations (mount -l -t nfs, cat /etc/mtab, nfsstat)
      • Compare platform version (uname -mrs, lsb_release -a) 

      NFS mount options 

      MarkLogic recommends the NFS Mount settings - 'rw,bg,hard,nointr,noac,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0'

      • Vers=3 :  Must have NFS client version v3 or above
      • TCP : NFS must be configured to use TCP instead of default UDP
      • NOAC : To improve performance, NFS clients cache file attributes. Every few seconds, an NFS client checks the server's version of each file's attributes for updates. Changes that occur on the server in those small intervals remain undetected until the client checks the server again. The noac option prevents clients from caching file attributes so that applications can more quickly detect file changes on the server.
        • In addition to preventing the client from caching file attributes, the noac option forces application writes to become synchronous so that local changes to a file become visible on the server immediately. That way, other clients can quickly detect recent writes when they check the file's attributes.
        • Using the noac option provides greater cache coherence among NFS clients accessing the same files, but it extracts a significant performance penalty. As such, judicious use of file locking is encouraged instead. The DATA AND METADATA COHERENCE section contains a detailed discussion of these trade-offs.
        • NOTE: The noac option is a combination of the generic option sync, and the NFS-specific option actimeo=0.
• ACTIMEO=0 : Using actimeo sets all of acregmin, acregmax, acdirmin, and acdirmax to the same "0" value. If this option is not specified, the NFS client uses the defaults for each of these options listed above.
      • NOINTR : Selects whether to allow signals to interrupt file operations on this mount point. If neither option is specified (or if nointr is specified), signals do not interrupt NFS file operations. If intr is specified, system calls return EINTR if an in-progress NFS operation is interrupted by a signal.
        • Using the intr option is preferred to using the soft option because it is significantly less likely to result in data corruption.
        • The intr / nointr mount option is deprecated after kernel 2.6.25. Only SIGKILL can interrupt a pending NFS operation on these kernels, and if specified, this mount option is ignored to provide backwards compatibility with older kernels.
      • BG : If the bg option is specified, a timeout or failure causes the mount command to fork a child which continues to attempt to mount the export. The parent immediately returns with a zero exit code. This is known as a "background" mount.
      • HARD (vs soft) : Determines the recovery behavior of the NFS client after an NFS request times out. If neither option is specified (or if the hard option is specified), NFS requests are retried indefinitely. If the soft option is specified, then the NFS client fails an NFS request after retrans retransmissions have been sent, causing the NFS client to return an error to the calling application.
        • Note: A so-called "soft" timeout can cause silent data corruption in certain cases. As such, use the soft option only when client responsiveness is more important than data integrity. Using NFS over TCP or increasing the value of the retrans option may mitigate some of the risks of using the soft option. 
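
Putting the recommended options together, a mount invocation might look like the following (the NFS server name, export path and mount point are placeholders):

  # Example only - substitute your own NFS server, export path and mount point
  mount -t nfs -o rw,bg,hard,nointr,noac,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0 \
    nfs-server:/export/backups /Backup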

      Issue persists => Further debugging 

If the issue persists after checking the NFS configuration and implementing the MarkLogic recommended NFS mount settings, then you will need to debug the NFS connection during an issue period. You should enable rpcdebug for NFS on the hosts showing the NFS errors, and then analyze the resulting syslogs during a period that is experiencing the issues:

              rpcdebug -m nfs -s all

The resulting logs may give you additional information to help understand the source of the failures.

       

      Introduction

      It has long been possible to store binary files in MarkLogic. In the MarkLogic 5 release in 2011, binary support was enhanced to allow for even more control over binary files.

      The purpose of this Knowledgebase article is not to cover MarkLogic's binary support in depth but to demonstrate a technique for retrieving a list of URIs for binary files which are managed in a MarkLogic Database.

      Retrieving a list of binary document URIs from MarkLogic Server

      The following code will use a call to cts:uris to get back a list of all URIs pointing to binary documents for a given MarkLogic database; note that this example assumes that you have the uri lexicon enabled in your database:
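
A minimal sketch of such a query (the original example is not reproduced here, so this may differ from the script referenced in the article) is:

  xquery version "1.0-ml";

  (: Requires the URI lexicon to be enabled on the database.
     Filters the lexicon entries down to those whose root node is binary. :)
  for $uri in cts:uris((), ())
  where xdmp:node-kind(fn:doc($uri)/node()) eq "binary"
  return $uri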

      Further reading

      People often want fine-grained entitlement control in the applications they build on top of MarkLogic Server. This article discusses two options and their performance implications.

      Best Practice

      Often, we'll see people attempt an implementation using MarkLogic users and roles. While MarkLogic Server can easily handle a large number of roles in total, you'll run into scalability and performance issues if you have a large number of roles per user. Additionally, you'll want to minimize the number of updates to documents in your Security database as every update requires Security caches to be re-validated, thus incurring a performance penalty.

      Instead, for a more scalable and performant solution, you will want to build your entitlements into your documents at the application level, then query those entitlement values with element range indexes on the elements containing those entitlement values.
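
As a sketch of this approach, assume each document carries an <entitlement> element backed by a string element range index (the element name and values are illustrative, not prescribed):

  xquery version "1.0-ml";

  (: The range index lets the entitlement constraint be resolved from indexes
     alongside the rest of the query :)
  cts:search(fn:collection(),
    cts:and-query((
      cts:element-range-query(xs:QName("entitlement"), "=", ("gold", "silver")),
      cts:word-query("example search terms"))))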

      Summary

      When attempting to start MarkLogic Server on older versions of Linux (Non-supported platforms), a "Floating Point Exception" may prevent the server from starting.

      Example of the error text from system messages:

      kernel: MarkLogic[29472] trap divide error rip:2ae0d9eaa80f rsp:7fffd8ae7690 error:0

      Detail

Older Linux kernels will, by default, utilize older libraries. When a software product such as MarkLogic Server is built using a newer version of gcc, it is possible that it will fail to execute correctly on older systems. We have seen this in cases where the glibc library is out of date and does not contain certain symbols that were added in newer versions. Refer to the Red Hat bug that explains this issue: https://bugzilla.redhat.com/show_bug.cgi?id=482848

      The recommended solution is to upgrade to a newer version of your Linux distribution.  While you may be able to resolve the immediate issue by only upgrading the glibc library, it is not recommended.

      Introduction

Attached to this article is an XQuery module, "appserver-status.xqy", which will generate a report on all requests currently "in-flight" across all application servers in your cluster.

      Usage

Run this in Query Console (be sure to display results as HTML output); it will generate an HTML table showing all requests currently "in-flight" across all application servers in your cluster. For any transaction taking over 60 seconds, it provides extra detail to help understand and identify bottlenecks where specific modules (or tasks) may be having an adverse effect on the overall performance of the cluster.

      The information generated by this module can be used in conjunction with any ticket opened with the support team where assistance is required to better understand and resolve performance issues relating to specific modules. This module could also be used in a situation where DBAs want to perform routine health checks on their cluster to find and identify slow running queries.

      Introduction

      At the time of this writing (MarkLogic 9), MarkLogic Server cannot perform spherical queries, as the geospatial indexes do not support a true 3D coordinate system.  In situations where cylindrical queries are sufficient, you can create a 2D geospatial index and a separate range index on an altitude value. An "and-query" with these indexes would result in a cylindrical query.

      Example

      Consider the following sample document structure:
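
(The original sample is not reproduced here; the structure below is an assumed illustration consistent with the index configuration that follows.)

  <flight>
    <location>
      <lat>37.655983</lat>
      <long>-122.425525</long>
    </location>
    <alt>3</alt>
  </flight>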

      Configure these 2 indexes for your content database:

      1. Geospatial Element Pair index specifying latitude localname as ‘lat’ , longitude localname ‘long’ and ‘parent localname’ as ‘location’ in configuration
      2. Range element index with localname as ‘alt’ with int scalar type

      Assuming you have data in your content database matching above document structure, this query:
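
(A sketch of such a query, using the element names assumed above:)

  xquery version "1.0-ml";

  (: Cylinder = 2D circle (geospatial element pair index) AND altitude (element range index) :)
  cts:search(fn:doc(),
    cts:and-query((
      cts:element-pair-geospatial-query(
        xs:QName("location"), xs:QName("lat"), xs:QName("long"),
        cts:circle(1000, cts:point(37.655983, -122.425525))),
      cts:element-range-query(xs:QName("alt"), "<", 4))))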

will return all the documents whose location (i.e., point) falls within the cylinder centered at 37.655983, -122.425525, with a radius of 1000 miles and an altitude of less than 4 miles.

      Note that in MarkLogic Server 9 geospatial region match was introduced, so the above technique can be extended beyond cylinders.

      Introduction

The MarkLogic Monitoring History dashboard (http://localhost:8002/history/) is probably the easiest way to gather monitoring history data, but almost all of the information available within the monitoring dashboard is also available over our ReST APIs:

      Application Server Status details

Information on Application Servers can be found at https://docs.marklogic.com/REST/GET/manage/v2/servers and here's an example for getting detailed metrics - http://localhost:8002/manage/v2/servers?group-id=Default&view=metrics&format=xml

      For Application Server status information - https://docs.marklogic.com/REST/GET/manage/v2/servers@view=status and here's an example with detailed metrics http://localhost:8002/manage/v2/servers?view=status&group-id=Default&format=xml&fullrefs=true

      To access status information for a specific Application Server (for example, the TaskServer), you can get the current status by adding the name to the URI - http://localhost:8002/manage/v2/servers/TaskServer?group-id=Default&view=status&format=xml

      You can also get the configuration information for a given application server (for example: "Admin") over the ReST API - http://localhost:8002/manage/v2/servers/Admin/properties?group-id=Default&format=xml

      Database and Forest status details

      For databases and forests, you can similarly use the endpoints for /databases or /forests:

      Database level examples include:
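
For example, assuming a database named Documents: status - http://localhost:8002/manage/v2/databases/Documents?view=status&format=xml and configuration properties - http://localhost:8002/manage/v2/databases/Documents/properties?format=xml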

      Forest level examples include:
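
For example, assuming a forest named Documents: status - http://localhost:8002/manage/v2/forests/Documents?view=status&format=xml and configuration properties - http://localhost:8002/manage/v2/forests/Documents/properties?format=xml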

      Introduction

The query below will fail to insert a document whose URI contains a special character (such as ',' or '>'):

The above code fails with the error listed below:

      [1.0-ml] XDMP-DOCENTITYREF: xdmp:unquote("<?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?>&#10;<...") -- Invalid entity reference "." at line 2

       

To resolve this issue, the xdmp:url-encode function can be used. For example:

(: $n is assumed to hold the codepoint(s) of the special character from the earlier example, which is not shown here :)
let $node := xdmp:unquote(fn:concat('<?xml version="1.0" encoding="UTF-8"?>
<tns:simpleuri xmlns:tns="http://www.example.org/uri" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.org/uri uri.xsd ">',
  xdmp:url-encode(fn:codepoints-to-string($n)), '.org
</tns:simpleuri>'))

The MarkLogic knowledge base article Using URL encoding to handle special characters in a document URI explains a recommended approach for safely handling special characters (using URL encoding). A document URI containing special characters, as mentioned in that article, should be encoded before it is inserted into MarkLogic 8.

      Summary

While it is possible to load documents into MarkLogic Server with document URIs containing unencoded special characters, it is recommended to follow best practice and URL-encode document URIs, as this will help you design robust applications free from the side effects caused by such special characters in other areas of your application stack.

      Additional References

      ISO/IEC 8859-1

W3Schools: HTML Unicode (UTF-8) Reference

       

      MarkLogic default Group Level Cache and Huge Pages settings

      The table below shows the default (and recommended) group level cache settings based on a few common RAM configurations for the 9.0-9.1 release of MarkLogic Server:

Total RAM (MB) | List Cache (MB) | Compressed Tree Cache (MB) | Expanded Tree Cache (MB) | Triple Cache (MB) | Triple Value Cache (MB) | Default Huge Page Ranges
8192 (8GB) | 1024 (1 partition) | 512 (1 partition) | 1024 (1 partition) | 512 (1 partition) | 1024 (2 partitions) | 1280 to 1994
16384 (16GB) | 2048 (1 partition) | 1024 (2 partitions) | 2048 (1 partition) | 1024 (2 partitions) | 2048 (2 partitions) | 2560 to 3616
24576 (24GB) | 3072 (1 partition) | 1536 (2 partitions) | 3072 (1 partition) | 1536 (2 partitions) | 3072 (4 partitions) | 3840 to 4896
32768 (32GB) | 4096 (2 partitions) | 2048 (3 partitions) | 4096 (2 partitions) | 2048 (3 partitions) | 4096 (6 partitions) | 5120 to 6176
49152 (48GB) | 6144 (2 partitions) | 3072 (4 partitions) | 6144 (2 partitions) | 3072 (4 partitions) | 6144 (8 partitions) | 7680 to 8736
65536 (64GB) | 8064 (3 partitions) | 4032 (6 partitions) | 8064 (3 partitions) | 4096 (6 partitions) | 8192 (11 partitions) | 10080 to 11136
98304 (96GB) | 12160 (4 partitions) | 6080 (8 partitions) | 12160 (4 partitions) | 6144 (8 partitions) | 12160 (16 partitions) | 15200 to 16256
131072 (128GB) | 16384 (6 partitions) | 8192 (11 partitions) | 16384 (6 partitions) | 8192 (11 partitions) | 16384 (22 partitions) | 20480 to 21020
147456 (144GB) | 18432 (6 partitions) | 9216 (12 partitions) | 18432 (6 partitions) | 9216 (12 partitions) | 18432 (24 partitions) | 23040 to 24096
262144 (256GB) | 32768 (9 partitions) | 16384 (11 partitions) | 32768 (9 partitions) | 16128 (22 partitions) | 32256 (32 partitions) | 40320 to 42432

      Note that these values are safe to use for MarkLogic 7 and above.

      For all the databases that ship with MarkLogic Server, the Huge Pages ranges on this table will cover the out-of-the box configuration. Note that adding more forests will cause the second value in the range to increase.

      From MarkLogic Server 9.0-7 and above

      In the 9.0-7 release and above (and all versions of MarkLogic 10), automatic cache sizing was introduced; this setting is usually recommended.

      Maximum group level cache settings

      Assuming a Server configured with 256GB RAM (and above), these are the maximum sizes for the three main group level caches and will utilise 180GB (184320MB) per host for the Group Level Caches:

      • Expanded Tree Cache - 73728 (72GB) (with 9 8GB partitions)
      • List Cache - 73728 (72GB) (with 9 8GB partitions)
      • Compressed Tree Cache - 36864 (36GB) (with 11 3 GB partitions)

We have found that configuring 4GB partitions for the Expanded Tree Cache and the List Cache generally works well in most cases; for this you would set the number of partitions to 18.

For the Compressed Tree Cache, the number of partitions can be set to 22.
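
A sketch of applying these maximum settings through the Admin API follows (sizes are in MB; the group name "Default" is an assumption):

  xquery version "1.0-ml";
  import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

  let $config := admin:get-configuration()
  let $group := admin:group-get-id($config, "Default")
  (: Expanded Tree Cache and List Cache: 72GB each, in 18 x 4GB partitions :)
  let $config := admin:group-set-expanded-tree-cache-size($config, $group, 73728)
  let $config := admin:group-set-expanded-tree-cache-partitions($config, $group, 18)
  let $config := admin:group-set-list-cache-size($config, $group, 73728)
  let $config := admin:group-set-list-cache-partitions($config, $group, 18)
  (: Compressed Tree Cache: 36GB, in 22 partitions :)
  let $config := admin:group-set-compressed-tree-cache-size($config, $group, 36864)
  let $config := admin:group-set-compressed-tree-cache-partitions($config, $group, 22)
  return admin:save-configuration($config)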

      Important note

The maximum number of configurable partitions is 32.

Each cache partition should be no more than 8192 MB.

      Introduction

      MarkLogic Server has a notion of groups, which are sets of similarly configured hosts within a cluster.

      Application servers (and their respective ports) are scoped to their parent group.

      Therefore, you need to make sure that the host and its exposed port to which you're trying to connect both exist in the group where the appropriate application server is defined. For example, if you attempt to connect to a host defined in a group made up of d-nodes, you'll only see application servers and ports defined in the d-nodes group. If the application server you actually want is in a different group (say, e-nodes), you'll get a connection error, instead.

      Questions

      Can I use any xdmp builtins to show which application servers are linked to particular groups?

      The code example below should help with this:
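
(The original example is not reproduced here; the following sketch lists, for every group in the cluster, the application servers defined in it.)

  xquery version "1.0-ml";

  for $group in xdmp:groups()
  return fn:concat(
    xdmp:group-name($group), ": ",
    fn:string-join(
      for $server in xdmp:group-servers($group)
      return xdmp:server-name($server),
      ", "))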

      Problem:

      The errors 'XDMP-MODNOTFOUND - Module not found' and 'XDMP-NOPROGRAM - Server unable to build program from request' may occur when the requested XQuery document does not exist or the user does not have the right permissions on the module.

      Solution:

When either of these errors is encountered, the first step is to check whether the requested XQuery module is actually present in the modules database. Make sure the document URI matches the 'root' of the relevant app-server.

The 'modules' field of the app-server configuration specifies the name of the database in which the app-server locates the XQuery application code (if it is not set to 'File-system'). When it is set to a specific database, only documents in that database whose URIs begin with the specified root directory are executable. For example, if the app-server 'root' is set to "/codebase/xquery/", then only documents in the database whose URIs start with "/codebase/xquery/" are executable.

If it is set to 'File-system', make sure the requested module exists in the location specified by the 'root' directory of the app-server.

Defining a 'File-system' location is often used on single-node DEV systems but is not recommended in a clustered environment. To keep code deployment simple, it is recommended to use a Modules database in a clustered production system.
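
One way to perform that existence check from Query Console is a sketch along the following lines (the modules database name and module URI are assumptions):

  xquery version "1.0-ml";

  (: Evaluate fn:doc-available against the app-server's modules database :)
  xdmp:eval(
    'fn:doc-available("/codebase/xquery/main.xqy")',
    (),
    <options xmlns="xdmp:eval">
      <database>{xdmp:database("Modules")}</database>
    </options>)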

Once you have made sure that the module does exist, the next step is to check whether the user has the right permissions to execute it. More often than not, the error is caused by a permissions issue.

      (i) Check app-server privileges

The 'privilege' field in the app-server configuration, when set, specifies the execute privilege required to access the server. Only users who are assigned this privilege can access the server and the application code. Absence of this privilege may cause the XDMP-NOPROGRAM error.

Make sure the user accessing the app-server has the specified privileges. This can be checked by using sec:user-privileges() (which should be run against the Security database).

      The documentation here - http://docs.marklogic.com/guide/admin/security#id_63953 contains more detailed information about privileges.

      (ii) Check permission on the requested module

      The user trying to access the application code/modules is required to have the 'execute' permission on the module. Make sure all the xquery documents have 'read' and 'execute' permissions for the user trying to access them. This can be verified by executing the following query against your 'modules' database:

                       xdmp:document-get-permissions("/your-xqy-module")

This returns a list of permissions on the document, showing the capability each role has, in the following format:

                    <sec:permission xmlns:sec="http://marklogic.com/xdmp/security">
                    <sec:capability>execute</sec:capability>
                    <sec:role-id>4680733917602888045</sec:role-id>
                    </sec:permission>
                    <sec:permission xmlns:sec="http://marklogic.com/xdmp/security">
                    <sec:capability>read</sec:capability>
                    <sec:role-id>4680733917602888045</sec:role-id>
                    </sec:permission>

      You can then map the role-ids to their role names as below: (this should be done against the Security database)

                    import module namespace sec="http://marklogic.com/xdmp/security" at "/MarkLogic/security.xqy";
                    sec:get-role-names((4680733917602888045))

      If you see that the module does not have execute permission for the user, the required permissions can be added as below: (http://docs.marklogic.com/xdmp:document-add-permissions)

             xdmp:document-add-permissions("/document/uri.xqy",
               (xdmp:permission("role-name", "read"),
                xdmp:permission("role-name", "execute")))


      Introduction

      Recent exploits in the TLS protocol such as POODLE, FREAK, LogJam, and SLOTH have rendered TLSv1.0 and SSLv3 largely obsolete.  Additionally, standards councils such as PCI (Payment Card Industry) and NIST (National Institute of Standards & Technology) are moving to disallow the use of these protocols.

      This article will describe the MarkLogic configuration changes needed to harden a MarkLogic HTTP Application Server so that only secure versions of TLS are used and where clients attempting to connect with TLSv1.0 or earlier protocols are rejected.

Note: Since this article was first written, MarkLogic Server has added an administrator function to disable individual SSL and TLS protocol versions. If you are still running MarkLogic version 8.0-5 or earlier, you can continue to use the solution outlined below; otherwise, users of MarkLogic 9 or later should use the new AppServer Set SSL Disabled Protocols function to control which SSL and TLS protocol versions are available.

      Configuration

      The TLS protocol versions accepted and the Cipher suites selected are controlled by the specification list set in the "SSL Ciphers" field on the HTTP App Server Configuration panel:

      The format of the specification list follows the OpenSSL format as described in the OpenSSL Cipher suite documentation and comprises one or more colon ":" separated ciphers strings which control which cipher suites are enabled or disabled. 

The default specification used by MarkLogic enables ALL ciphers except those considered LOW encryption, and places them in order of @STRENGTH:

      ALL:!LOW:@STRENGTH

While sufficient for many needs, the default settings still allow cipher negotiations that are no longer considered secure, as well as weak signature algorithms such as MD2 and MD5. The following cipher specification string enhances security by permitting only AES and Triple DES (3DES) ciphers, while disabling the MD2 and MD5 signature algorithms.

      ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD2:!MD5

      PCI DSS 3.2 & NIST SP 800-52 compliance

At this stage, while the MarkLogic HTTP Application Server is now using stronger security, it will still permit a client to connect using TLSv1.0. In order to comply with PCI DSS 3.2, compliant sites must stop using TLSv1.0 by 30th June 2018, while NIST SP 800-52 requires that sites use only TLSv1.1, with a recommendation to use TLSv1.2 where possible.

      TLSv1.2 and browser support

      For TLSv1.2, older browsers should be upgraded to current versions.

      Making these changes may require users accessing your application to upgrade older browsers such as Firefox < 27.0 or Internet Explorer < 11.0 as these versions do not support TLSv1.2 by default.

The MarkLogic App Server utilizes OpenSSL, which does not explicitly support enabling or disabling a specific TLS protocol version; however, by disabling all of the cipher suites associated with a particular version, you effectively get the same outcome.

      SSLv3, TLSv1.0 & TLSv1.1 share the same common ciphers, so adding "!SSLv3" to the cipher specification will cause all client connection attempts using any of these protocols to fail.

      ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD2:!MD5:!SSLv3
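
The specification can be applied in the "SSL Ciphers" field on the app server configuration panel, or scripted through the Admin API as in this sketch (the group and app server names are assumptions):

  xquery version "1.0-ml";
  import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

  let $config := admin:get-configuration()
  let $appserver := admin:appserver-get-id($config, admin:group-get-id($config, "Default"), "my-http")
  return admin:save-configuration(
    admin:appserver-set-ssl-ciphers($config, $appserver,
      "ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD2:!MD5:!SSLv3"))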

      Testing using the OpenSSL s_client utility shows that attempts to connect using TLSv1.0 fail with SSL alert 40 indicating no common cipher was available.

      openssl s_client -connect 192.168.99.100:8010 -debug -tls1
      CONNECTED(00000003)
      ..
      140735283961936:error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:s3_pkt.c:1472:SSL alert number 40
      140735283961936:error:1409E0E5:SSL routines:ssl3_write_bytes:ssl handshake failure:s3_pkt.c:656:

      While connecting using TLSv1.2 is successful.

      openssl s_client -connect 192.168.99.100:8010 -debug -tls1_2
      CONNECTED(00000003)
      ...
      ---
      New, TLSv1/SSLv3, Cipher is AES256-GCM-SHA384
      Server public key is 2048 bit
      Secure Renegotiation IS supported
      Compression: NONE
      Expansion: NONE
      No ALPN negotiated
      SSL-Session:
      Protocol : TLSv1.2
      Cipher : AES256-GCM-SHA384

      Further reading

      On MarkLogic Security Certification

      How does MarkLogic Server's high-availability work in AWS?

AWS provides fault tolerance within a geographic region through the use of Availability Zones (AZs), while MarkLogic provides that ability through Local Disk Failover (LDF). If you're using AWS, the best practice is to place each MarkLogic node/EC2 instance in a different Availability Zone within a single region, where a given data forest is in one AZ (AZ A) while its LDF forest is in a different AZ (AZ B). This way, in the event that access to Availability Zone A is lost, the host in Availability Zone A will fail over to its LDF on the host in Availability Zone B, thereby ensuring high availability within your MarkLogic cluster.

      Further reading:

      Should failover be configured for the Security forest?

      A cluster is not functional without its Security database. Consequently, it’s important to ensure high-availability of the Security database’s forest by configuring failover for that forest.

      Further reading:

      Should my forests have more than one Local Disk Failover forest?

      High-availability through Local Disk Failover with one LDF forest is designed to allow the cluster to survive the failure of a single host. If you're using AWS, careful forest placement across AWS availability zones can provide high-availability even in the event of an entire availability zone going down. With rare exceptions, additional LDF forests are typically not worth the additional complexity and cost for the vast majority of MarkLogic deployments.

      Do I still have high-availability post failover? What happens to the data forest? How can I fail back my forests to the way they were?

      When a failover event occurs, the LDF forest takes over as the acting data forest and the configured data forest will assume the role of the acting LDF forest as soon as it is successfully restarted. At this point, as long as both forests are still available, the cluster continues to be high availability but with forests reversing their originally intended roles. To fail back the forests into the roles they were originally intended, you will need to wait until the acting data forest (the originally intended LDF) and acting LDF (the originally intended data forest) are synchronized, then manually restart the acting data forest/intended LDF. At that point, the acting LDF/intended data forest “fails back” to take over its original role of acting data forest, and the acting data forest/intended LDF will once again assume its original role of acting LDF. In short, failover is automatic, but failing back requires a manual restart of the acting data forest/intended LDF. When failing back, it's very important to wait until the forests are synchronized - if you fail back before the forests are fully synchronized, you'll lose any data in the acting data forests that's yet to be propagated back to the acting LDF/intended data forest.

      Further reading:

      Introduction: getting more information about the bugs fixed between releases

As a general recommendation, we encourage customers to keep the server up to date with patch releases in any case.

      If you would like a list of some of the published bugs that were addressed between two releases of the server (for example: 5.0-3 and 5.0-4.1), you can perform the following steps:

      - Log into the support portal at http://help.marklogic.com
      - Click on the "Fixed bugs" icon to take you to the bugtrack list
      - Select 5.0-3 in the From: dropdown box
      - Select 5.0-4.1 in the To: dropdown box
      - Click 'Show' to generate an HTML table or View PDF to export the results in a PDF document

      Step one: login

      Provide your credentials and use the form on the left-hand side to log in to access the support portal

      Log into the support portal

      Step two: select the "Fixed bugs" link from the icons on the page

      Select 'Fixed Bugs' to go to the bugtrack list

      Step three: select the release 'range' from the two dropdown lists on the Fixed Bugs page

      Use the Show button to update the page or download the list in PDF format as required

      Select the versions from the 'From' and 'To' lists to generate the report

      Introduction

In Amazon Web Services, AMIs have unique IDs per region. There are many cases where you will want to use multiple regions (for example: maintaining two clusters in separate geographical regions). Below is an example of how to find the list of current AMIs.

      Log in to Amazon Web Services

      Example image showing the AWS Login Page

      Find your MarkLogic instance on Amazon AWS Marketplace

      Example image showing the MarkLogic 8 HVM in Amazon's Marketplace

      For example: https://aws.amazon.com/marketplace/pp/B00U36DS6Y

      Click continue

      Example Continue button

      View the table

      Choose the version of MarkLogic Server that you're planning to use from the version dropdown.

      Image of a table showing all AMI IDs available for this item in the AWS Marketplace

      You will see a table containing a list of all current regions and the corresponding AMI ID for our instances for each available region.

      Further reading

      Summary

      MarkLogic Server has several different features that can help manage data across multiple database instances. Those features differ from each other in several important ways - this article will focus on high-level distinctions and will provide pointers to other materials to help you decide which of these features could work best for your particular use case.

       Details

      Backup/Restore - database backup and restore operations in MarkLogic Server provide consistent database-level views of your data. Propagating data from one instance to another via backup/restore involves a MarkLogic administrator using a completed backup from the source instance as the restore archive on the destination instance. You can read more about Backup/Restore here: http://docs.marklogic.com/guide/admin/backup_restore.
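
      As a minimal illustration (the database name and backup path are placeholders), a full backup can be started from XQuery, and the corresponding restore is then run on the destination instance against the completed backup directory:

      xquery version "1.0-ml";
      (: Back up every forest in the database to a backup directory :)
      xdmp:database-backup(
        xdmp:database-forests(xdmp:database("Documents")),
        "/backups/Documents")

      (: On the destination instance, restore from the completed backup:
         xdmp:database-restore(
           xdmp:database-forests(xdmp:database("Documents")),
           "/backups/Documents") :)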

      Flexible Replication - can be used to maintain copies of data on multiple MarkLogic Servers. Unlike backup/restore (which relies on taking a consistent, database-level view of the data at a particular timestamp), Flexible Replication creates a copy of a document in another database and keeps that copy in sync (possibly with some time-lag/latency) with the original in the course of normal operations. You can read more about Flexible Replication here: http://docs.marklogic.com/guide/flexrep/rep_intro. Do note that:

      • Flexible Replication is asynchronous. Asynchronous Replication refers to a configuration in which the Master does not wait for confirmation that the update has been received by the Replica before sending further updates.
      • Flexible Replication does not use the same transaction boundaries on the replica as on the master. For example, 10 documents might be inserted in a single transaction on a Flexible Replication master. Those 10 documents will eventually be inserted on a Flexible Replication replica, but there is no guarantee that the replica instance will also use a single transaction to do so.

      Database Replication - is used to maintain copies of data on multiple MarkLogic Servers. Database Replication creates a copy of a document in another database and keeps that copy in sync (possibly with some time-lag/latency) with the original in the course of normal operations. You can read more about Database Replication here: http://docs.marklogic.com/guide/database-replication/dbrep_intro. Note that:

      a. Database Replication is, like Flexible Replication, asynchronous.

      b. In contrast to Flexible Replication, Database Replication operates by copying journal frames from the Master database and replaying the transactions described by those journal frames on the foreign Replica database.

      XA Transactions - MarkLogic Server can participate in distributed transactions by acting as a Resource Manager in an XA/JTA transaction. If there are multiple MarkLogic Server instances participating as XA resources in a given XA transaction, then it's possible to use that XA transaction as a synchronized means of replicating data across those multiple MarkLogic instances. You can read more about XA Transactions in MarkLogic Server here: http://docs.marklogic.com/guide/xcc/concepts#id_57048.

      Introduction

      Upgrading individual MarkLogic instances and clusters is generally very easy to do and in most cases requires very little downtime. Usually, shutting down the MarkLogic instance on each host in turn, uninstalling the current release, installing the updated release, and restarting each MarkLogic instance is all you need to be concerned about.

      However, unanticipated problems do sometimes come to light and the purpose of this Knowledgebase article is to offer some practical advice as to the steps you can take to ensure the process goes as easily as possible - this is particularly important if you're planning an upgrade between major releases of the product.

      Prerequisites

      While the steps outlined under the process heading below offer practical advice as to what to do to ensure your data is safeguarded (by recommending that backups are taken prior to upgrading), another very useful step would be to ensure you have your current configuration files backed up.

      Each host in a MarkLogic cluster is configured using parameters which are stored in XML Documents that are available on each host. These are usually relatively small files and will zip up to a manageable size.

      If you cd to your "Data" directory (on Linux this is /var/opt/MarkLogic; on Windows this is C:\Program Files\MarkLogic\Data and on OS X this is /Users/{username}/Library/Application Support/MarkLogic), you should see several xml files (assignments, clusters, databases, groups, hosts, server).

      Whenever MarkLogic updates any of these files, it creates a backup using the same naming convention used for older ErrorLog files (_1, _2 etc). We recommend backing up all configuration files before following the steps under the next heading.

      Process

      1) Take a backup for each database in your cluster

      2) Turn reindexing off for each database in your cluster

      3) Starting with the node hosting your Security and Schemas forests, uninstall the current maintenance release MarkLogic version on your cluster, then install the latest maintenance release in that feature release (for example, if you're currently running version 10.0-2, you'll want to update to the latest available MarkLogic 10 maintenance release - at the time of this writing, it is 10.0-4).

      4) Start up the host in your cluster hosting your Security and Schemas forests, then the remaining hosts in the cluster.

      5) Access the Admin UI on the node hosting your Security and Schemas forests and accept the license agreement, either for just that host (Accept button) or for all of the hosts in the cluster (Accept for Cluster button). If you choose the Accept for Cluster button, a summary screen appears showing all of the hosts in the cluster. Click the Accept for Cluster button to confirm acceptance (all of the hosts must be started in order to accept for the cluster). If you accepted the license for just the one host, you must go to the Admin Interface on each of the other hosts and accept the license for each host before it can operate.

      6) If you're upgrading across feature releases, you may now repeat steps #3-5 until you reach the desired feature and maintenance release on your cluster (for example, if trying to upgrade from MarkLogic 8 to MarkLogic 10,  after installing 8.0-latest, you'll repeat steps 3-5 for version 9.0-latest).

      7) After you've finished upgrading across all the relevant feature releases, re-enable reindexing for each database in your cluster.
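
      Steps 2 and 7 can also be scripted. A hedged Admin API sketch (the database name is a placeholder) for turning reindexing off follows; setting the last argument to fn:true() re-enables it after the upgrade:

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $db-id := admin:database-get-id($config, "my-database")   (: placeholder database name :)
      (: Use fn:true() in step 7 to re-enable reindexing :)
      let $config := admin:database-set-reindexer-enable($config, $db-id, fn:false())
      return admin:save-configuration($config)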

      For more details, please go through the section "Upgrading a Cluster to a New Maintenance Release of MarkLogic Server" in the "Scalability, Availability, and Failover" guide.

      If you've got database replication in place across both a master and replica cluster, then be aware that:

      1) You do not need to break replication between the clusters

      2) You should plan to upgrade both the master cluster and replica cluster. If you upgrade just the master, connectivity between the two clusters will stop due to different XDQP versions. 

      3) If the Security database isn't replicated, then there shouldn't be anything special you need to do other than upgrade the two clusters.

      4) If the security database is replicated, do the following:

      • Upgrade the Replica cluster and run the upgrade scripts. This will update the Replica's Security database to indicate that it is current. It will also do any necessary configuration upgrades.
      • Upgrade the Master cluster and run the upgrade scripts. This will update the Master's Security database to indicate that it is current. It will also do any necessary configuration upgrades.

      For more details, see Updating Clusters Configured with Database Replication.

      Back-out Plan

      MarkLogic does not support restoring a backup made on a newer version of MarkLogic Server onto an older version of MarkLogic Server. Your Back-out plan will need to take this into consideration.

      See the section below for recommendations on how this should be handled.

      Further reading

      Backing out of your upgrade: steps to ensure you can downgrade in an emergency

      Product release notes

      The "Upgrade Support" section of the release notes.

      All known incompatibilities between releases

      The "Upgrading from previous releases" section of the documentation

      MarkLogic Support Fixed Bug List

      Introduction

      spell:suggest() and spell:suggest-detailed() aren't simply looking for character differences between the provided strings and the strings in your dictionaries - they're also factoring in differences in the resulting phonetics represented by those strings.

      Detail

      There is an undocumented option that can be passed along to increase the phonetic-distance threshold (which is 1, by default). For example, consider the following:

      xquery version "1.0-ml";

      spell:suggest-detailed(
        ('customDictionary.xml'),
        'acknowledgment',
        <options xmlns="http://marklogic.com/xdmp/spell">
          <phonetic-distance>2</phonetic-distance>
        </options>
      )

      =>

      <spell:suggestion original="acknowledgment"
          dictionary="customDictionary.xml"
          xmlns:xml="http://www.w3.org/XML/1998/namespace"
          xmlns:spell="http://marklogic.com/xdmp/spell">
        <spell:word distance="9" key-distance="2" word-distance="45" levenshtein-distance="1">acknowledgement</spell:word>
      </spell:suggestion>

      Note that the option "distance-threshold" corresponds to "distance" in the result, and "phonetic-distance" corresponds to "key-distance."

      Also note that increasing the phonetic-distance may cause spell:suggest() and spell:suggest-detailed() to use significantly more CPU. Metaphones are short keys, so a larger distance may match a very large fraction of the dictionary, which would then mean each of those matches would need to be checked in the distance algorithms.

      Background

      A database consists of one or more forests. A forest is a collection of documents (mostly XML trees, thus the name), implemented as a physical directory on disk. Each forest holds a set of documents and all their indexes. 

      When a new document is loaded into MarkLogic Server, the server puts this document in an in-memory stand and writes the action to an on-disk journal to maintain transactional integrity in case of system failure. After enough documents are loaded, the in-memory stand will fill up and be flushed to disk, written out as an on-disk stand. As more documents are loaded, they go into a new in-memory stand. At some point this in-memory stand fills up as well, and the in-memory stand gets written as yet another new on-disk stand.

      To read a single term list, MarkLogic must read the term list data from each individual stand and unify the results. To keep the number of stands at a manageable level where that unification isn't a performance concern, MarkLogic runs merges in the background. A merge takes some of the stands on disk and creates a new singular stand out of them, coalescing and optimizing the indexes and data, as well as removing any previously deleted fragments.

      Each forest has its own in-memory stand and set of on-disk stands. Loading and indexing content is a largely parallelizable activity, so splitting the loading effort across forests and potentially across machines in a cluster can help scale the ingestion work.

      Deletions and Multi-Version Concurrency Control (MVCC)

      What happens if you delete or change a document? If you delete a document, MarkLogic marks the document as deleted but does not immediately remove it from disk. The deleted document will be removed from query results based on its deletion markings, and the next merge of the stand holding the document will bypass the deleted document when writing the new stand. MarkLogic treats any changed document like a new document, and treats the old version like a deleted document.

      This approach is known in database circles as Multi-Version Concurrency Control (MVCC). In an MVCC system, changes are tracked with a timestamp number that increments for each transaction as the database changes. Each fragment gets its own creation time (the timestamp at which it was created) and deletion time (the timestamp at which it was marked as deleted, starting at infinity for fragments not yet deleted).

      For a request that doesn't modify data the system gets a performance boost by skipping the need for any URI locking. The query is viewed as running at a certain timestamp, and throughout its life it sees a consistent view of the database at that timestamp, even as other (update) requests continue forward and change the data.

      Updates and Deadlocks

      An update request, because it isn't read-only, has to use read/write locks to maintain system integrity while making changes. Read-locks block for write-locks; write-locks block for both read and write-locks. An update has to obtain a read-lock before reading a document and a write-lock before changing (adding, deleting, modifying) a document. Lock acquisition is ordered, first-come first-served, and locks are released automatically at the end of a request.

      In any lock-based system you have to worry about deadlocks, where two or more updates are stalled waiting on locks held by the other. In MarkLogic deadlocks are automatically detected with a background thread. When the deadlock happens on the same host in a cluster, the update farthest along (with the most locks) wins and the other update gets restarted. When it happens on different hosts, because lock count information isn't in the wire protocol, both updates start over. MarkLogic differentiates queries from updates using static analysis. Before running a request, it looks at the code to determine if it includes any calls to update functions. If so, it's an update. If not, it's a query. Even if at execution time the update doesn't actually invoke the updating function, it still runs as an update.

      For the most part, locking is not under the control of the user. The one exception is the xdmp:lock-for-update($uri) call, which requests a write-lock on a document URI without actually having to issue a write, and in fact without the URI even having to exist.
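
      A minimal sketch (the URI and content are hypothetical) of an update request that takes a write-lock up front via xdmp:lock-for-update, even though the document may not exist yet:

      xquery version "1.0-ml";
      (: Hold a write-lock on the URI for the rest of this transaction :)
      xdmp:lock-for-update("/orders/order-1234.xml"),
      (: ... any dependent reads/updates happen here under the lock ... :)
      xdmp:document-insert("/orders/order-1234.xml", <order id="1234"/>)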

      When a request potentially touches millions of documents (such as sorting a large data set to find the most recent items), a query request that runs lock-free will outperform an update request that needs to acquire read-locks and write-locks. In some cases you can speed up the query work by isolating the update work to its own transactional context. This technique only works if the update doesn't have a dependency on the outer query, but that turns out to be a common case. For example, let's say you want to execute a content search and record the user's search string to the database for tracking purposes. The database update doesn't need to be in the same transactional context as the search itself, and would slow things down if it were. In this case it's better to run the search in one context (read-only and lock-free) and the update in a different context. See the xdmp:eval() and xdmp:invoke() functions for documentation on how to invoke a request from within another request and manage the transactional contexts between the two.
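
      A hedged sketch (the search string, log URI scheme, and element names are all hypothetical) of running the logging update in its own transactional context with xdmp:eval, so the outer search remains a lock-free query:

      xquery version "1.0-ml";
      let $q := "marklogic merge policy"    (: hypothetical user search string :)
      return (
        (: The search itself runs read-only and lock-free :)
        cts:search(fn:doc(), cts:word-query($q))[1 to 10],
        (: Record the search string in a separate transaction :)
        xdmp:eval('
          declare variable $q external;
          xdmp:document-insert(
            fn:concat("/search-log/", xdmp:random(), ".xml"),
            <search-log><q>{$q}</q><when>{fn:current-dateTime()}</when></search-log>)',
          (xs:QName("q"), $q),
          <options xmlns="xdmp:eval">
            <isolation>different-transaction</isolation>
          </options>)
      )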

      Document Lifecycle

      Let's track the lifecycle of a document from first load to deletion until the eventual removal from disk. A document load request acquires a write-lock for the target URI as part of the xdmp:document-load() function call. If any other request is already doing a write to the same URI, our load will block for it, and vice versa. At some point, when the full update request completes successfully (without any errors that would implicitly cause a rollback), the actual insertion work begins, processing the queue of update work orders.

      MarkLogic starts by parsing and indexing the document contents, converting the document from XML to a compressed binary fragment representation. The fragment gets added to the in-memory stand. At this point the fragment is considered a nascent fragment, a term you'll see sometimes on the administration console status pages. Being nascent means it exists in a stand but hasn't been fully committed. (On a technical level, nascent fragments have creation and deletion timestamps both set to infinity, so they can be managed by the system while not appearing in queries prematurely.) If you're doing a large transactional insert you'll accumulate a lot of nascent fragments while the documents are being processed. They stay nascent until they've been committed.

      Once the fragment is placed into the in-memory stand, the request is ready to commit. It obtains the next timestamp value, journals its intent to commit the transaction, and then makes the fragment available by setting the creation timestamp for the new fragment to the transaction's timestamp. At this point it's a durable transaction, replayable in event of server failure, and it's available to any new queries that run at this timestamp or later, as well as any updates from this point forward (even those in progress). As the request terminates, the write-lock gets released.

      Our document lives for a time in the in-memory stand, fully queryable and durable, until at some point the in-memory stand fills up and gets written to disk. Our document is now in an on-disk stand. Sometime later, based on merge algorithms, the on-disk stand will get merged with some other on-disk stands to produce a new on-disk stand. The fragment will be carried over, its tree data and indexes incorporated into the larger stand. This might happen several times.

      At some point a new request makes a change to the document, such as with an xdmp:node-replace() call. The request making the change first obtains a read-lock on the URI when it first accesses the document, then promotes the read-lock to a write-lock when executing the xdmp:node-replace() call. If another write-lock were already present on the URI from another executing update, the read-lock would have blocked until the other write-lock released. If another read-lock were already present, the lock promotion to a write-lock would have blocked. Assuming the update request finishes successfully, the work runs similar to before: parsing and indexing the document, writing it to the in-memory stand as a nascent fragment, acquiring a timestamp, journaling the work, and setting the creation timestamp to make the fragment live. Because it's an update, it has to mark the old fragment as deleted also, and does that by setting the deletion timestamp of the original fragment to the transaction timestamp. This combination effectively replaces the old fragment with the new. When the request concludes, it releases its locks. Our document is now deleted, replaced by the new version.

      The old fragment still exists on disk, of course. In fact, any query that was already in progress before the update incremented the timestamp, or any query doing time travel with an old timestamp, can still see it. Eventually the on-disk stand holding the fragment will be merged again, at which point the old fragment will be completely removed from the system. It won't be written into the new on-disk stand. That is, unless the administration "merge timestamp" was set to allow deep time travel. In that case it will live on, sticking around in case any new queries want to time travel to see old fragments.

      Summary

      The following article explains the way in-memory caches are used by MarkLogic Server and how they can be utilized to improve query execution.

       

      Detail

      MarkLogic Server provides several caches that are used to improve performance during query execution. When a query executes for the first time, the server populates these caches, storing term lists and data fragments in memory.

      MarkLogic Server keeps a lot of its configuration information in databases, and has a lot of caches to make it run faster, but those caches get populated the first time things are accessed. The server also uses book-keeping terms in the indexes to keep track of whether all documents have been indexed with the current settings. MarkLogic caches this information, but has to query the indexes on the first request to warm the cache.

      The in-memory cache in MarkLogic Server holds data that was recently added to the system and is still in an in-memory stand; that is, it holds data that has not yet been written to disk.

      For updates, if there is no in-memory stand on a forest when a new document is inserted, the server will create it. This stand is big enough for thousands of documents, but the cost of creating it will be seen in the time taken for the first document added to it.

       


      How will the in-memory cache help improve query execution

      When a query is executed, in-memory data structures like range indexes and lexicons get pinned into RAM the first time they are used. The easiest way to speed things up is to "warm the caches" by running a small sample program that exercises the relevant functionality (for example, type-ahead queries) prior to starting production. You can also keep the server warm by performing a non-time-critical stub update at regular intervals (every 30 seconds to 1 minute). If the server is idle, this keeps the caches and the in-memory stand warm; if the server is busy, it only adds a small amount of extra work. Once this is done, the functionality will be fast for all users in all future sessions.
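
      A minimal sketch of such a "keep warm" stub update (the URI and element are hypothetical); it could be invoked from a scheduled task every 30-60 seconds:

      xquery version "1.0-ml";
      (: Trivial update that keeps the in-memory stand and caches warm :)
      xdmp:document-insert("/admin/keep-warm.xml",
        <keep-warm last-run="{fn:current-dateTime()}"/>)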

      Introduction

      MarkLogic does not recommend having more than one forest for the Security database.

      The Security database is typically fairly small and there is no reason to have more than one forest for the Security database. Having more than one Security forest causes additional complexity during failover events, server upgrades, and restarts. A functioning Security database is critical to the stability of a MarkLogic Cluster and it is easier to recover from a host failure if the Security database is configured with only a single forest and a single replica forest. 

      In terms of high availability and forest failover, one local disk failover forest should be configured. In terms of database replication, a replica forest in the replica cluster should be configured.

      If you have more than one Security forest:

      We have seen incidents where customers attached more than one Security forest, either intentionally or inadvertently (a scripting bug or user error), and then ran into issues while detaching them.

      When the database rebalancer is enabled for the database (default setting) and when a new forest is attached, the database will automatically redistribute the content across all attached forests. Problems can then arise when security forests are detached without preserving their content. This is true for any database, but is problematic when dealing with the Security database. 

      When a Security database forest is detached without first retiring it (and verifying documents are moved out of it), some Security documents will be removed from the database. This may lead to users being locked out of the cluster or render the cluster unusable.  If this occurs on your MarkLogic cluster, please contact MarkLogic Support to help with the repair.

      Best Practice

      • Do not configure more than one forest for any system database, including the Security database.
      • If you have multiple forests in your Security database and need to come back in line with our one-forest recommendation:
        • Retire the extra Security database forests;
        • Verify all extra forests are drained of content (zero documents / zero fragments);
        • Detach the extra forests.
      • Once your cluster is in line with our one forest recommendation, disable the rebalancer for the Security database.
      • Configure a single replica forest to achieve high availability.
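
      A hedged Admin API sketch of the retire/verify/detach sequence above (the extra forest name is a placeholder; run each stage separately and only detach once the forest reports zero documents and fragments):

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $db-id := xdmp:database("Security")
      let $forest-id := xdmp:forest("Security2")   (: placeholder extra forest :)
      (: Retire the extra forest so the rebalancer drains its content into the remaining forest :)
      let $config := admin:database-retire-forest($config, $db-id, $forest-id)
      return admin:save-configuration($config)

      (: Later: verify with xdmp:forest-counts(xdmp:forest("Security2")) that the forest is empty,
         then detach it with admin:database-detach-forest(). :)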

      Further reading

      Administering Security in MarkLogic

      Database Rebalancing in MarkLogic

      Restoring Security Database

      Security Database restore leading to lingering Certificate Template id in Config files

      Introduction

      This Knowledgebase article is a general guideline for backups using the journal archiving feature for both free space requirements and expected file sizes written to the archive journaling repository when archive journaling is enabled and active.

      The MarkLogic environment used here was an out-of-the box version 9.x with one change of adding a new directory specific to storing the archive journal backup files.

      It is assumed that the reader of this article already has a basic understanding of the role of Journal Archiving in the Backup and Restore feature of MarkLogic Server. See the references below for further details.

      How much free space is needed for the Archive Journal files in a backup?

      MarkLogic Server uses the forest size of the active forest to confirm whether the journal archive repository has enough free space to accommodate that forest, but if additional forests already exist on the same volume, then there may be an issue in the Server's "free-space" calculation as the other forests are never used in the algorithm that calculates the free space available for the backup and/or archive journal repositories. Only one forest is used in the free-space calculation.

      In other words, if multiple forests exist on the same volume, there may not be enough free space available on that specific volume because of the additional forests, especially during a high rate of ingestion. If that is the case, then it is advised to provide enough free space on that volume to accommodate the sizes of all the forests: Required Free Space (approximately) = (Number of Forests) x (Size of largest Forest). For example, four forests of up to 500 GB each would call for roughly 2 TB of free space on that volume.

      What can we expect to see in the journal archiving repository in terms of file sizes for specific ingestion types and sizes? That brings us to the other side of the question.

      How is the Journal Archive repository filling up?

      1 MByte of raw XML data loaded into the server (as either a new document ingestion or a document update) will result in approximately 5 to 6 MBytes of data being written to the corresponding Journal Archive files.  Additionally, adding Range Indexes will contribute to a relatively small increase in consumed space.

      Ingesting/updating RDF data results in slightly less data being written to the journal archive files.

      In conclusion, for both new document ingestion and document updates, the typical expansion ratio of Journal Archive size to input file size is between 5 and 6, but it can be higher than that depending on the document structure and any added range indexes.

      References:

      Introduction

      Content processing applications often require multi-step processing. Each step in the process performs a particular task or set of tasks. The Content Processing Framework in MarkLogic Server supports these types of multi-step conversion processes. Sometimes, during a document delete operation, the CPF action might fail with an 'XDMP-CONFLICTINGUPDATES' error, which can be seen in the document-properties file:

      Sample message:

      <error:format-string>XDMP-CONFLICTINGUPDATES: xdmp:document-set-property("FILE-NAME", <cpf:state xmlns:cpf="http://marklogic.com/cpf">http://marklogic.com/states/deleted</cpf:state>) -- Conflicting updates xdmp:document-set-property("FILE-NAME", /cpf:state) and xdmp:document-delete("FILE-NAME")</error:format-string>

      This error message indicates that an update statement (e.g., xdmp:document-set-property) is trying to update a document while a conflicting update (e.g., xdmp:document-delete) occurs in the same transaction.

       

      Detail

      Actions that want to delete the target URI need special handling because MarkLogic CPF also wants to keep track of progress in the properties, and simply calling document-delete [ xdmp:document-delete($cpf:document-uri) ] can't do that.

      Following are ways to achieve the expected behavior and get past the XDMP-CONFLICTINGUPDATES error:

      1) Perform a "soft delete" on the document and let CPF take care of deleting it. This can be done by setting the document's processing status to "deleted" via the cpf:document-set-processing-status API function. Setting the document's processing status to "deleted" tells CPF to clean up the document and not update properties at the same time.

      cpf:document-set-processing-status( $uri-to-delete, "deleted" )

      Additional details can be found at: http://docs.marklogic.com/cpf:document-set-processing-status


      2) If you want to keep a record of the URI that is being deleted, you can delete its root node instead of the document. The CPF state will still be recorded in document-properties, even after the document content is gone.

      xdmp:node-delete(doc($uri-to-delete))

      Details at: http://docs.marklogic.com/xdmp:node-delete

      Introduction

      Sometimes, when a host is removed from a cluster in an improper manner (e.g., by some means other than the Admin UI or Admin API), the removed host can still try to communicate with its old cluster, but the cluster will recognize it as a "foreign IP" and will log a message like the one below:

      2014-12-16 00:00:20.228 Warning: XDQPServerConnection::init(10.0.80.7:7999-10.0.80.39:44247): SVC-SOCRECV: Socket receive error: wait 10.0.80.7:7999-10.0.80.39:44247: Timeout

      Explanation: 

      XDQP is the internal protocol that MarkLogic uses for communications amongst the hosts in a cluster; it uses port 7999 by default. In this message, the local host 10.0.80.7 is receiving socket connections from the foreign host 10.0.80.39.

       

      Debugging Procedure, Step 1

      To find out if this message indicates a socket connection from an IP address that is not part of the cluster, the first place to look is in the hosts.xml files. If the IP address is not found in hosts.xml, then it is a foreign IP. In that case, the following steps will help to identify the processes that are listening on port 7999.

       

      Debugging Procedure, Step 2

      To find out who is listening on XDQP ports, try running the following command in a shell window on each host:

            $ sudo netstat -tulpn | grep 7999

      You should only see MarkLogic as a listener:

           tcp 0 0 0.0.0.0:7999 0.0.0.0:* LISTEN 1605/MarkLogic

      If you see any other process listening on 7999, you have found your culprit. Shut down those processes and the messages will go away.

       

      Debugging Procedure, Step 3

      If the issue persists, run tcpdump to trace packets to/from "foreign" hosts using the following command:

           tcpdump -n host {unrecognized IP}

      Shut down MarkLogic on those hosts. Also, shut down any other applications that are using port 7999.

       

      Debugging Procedure, Step 4

      If the cluster hosts are on AWS, you may also want to check your Elastic Load Balancer ports. This may be tricky, because instances will change IP addresses if they are rebooted, so work with AWS Support to help you find the AMI or load balancer instance that is pinging your cluster.

      In the case that the "foreign host" is an elastic load balancer, be sure to remove port 7999 from its rotation/scheduler. In addition, you should set the load balancer to use port 7997 for the heartbeat functionality.

      Introduction

      Sometimes, when a cluster is under heavy load, it may show a lot of XDQP-TIMEOUT messages in the error log. Often, a subset of hosts in the cluster may become so busy that the forests they host get unmounted and remounted repeatedly. Depending on your database and group settings, the act of remounting a forest may be very time-consuming, because all hosts in the cluster are forced to do the extra work of index detection.

      Forest Remounts

      Every time a forest remounts, the error log will show a lot of messages like these:

      2012-08-27 06:50:33.146 Debug: Detecting indexes for database my-schemas
      2012-08-27 06:50:33.146 Debug: Detecting indexes for database Triggers
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Last-Login: sln
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Triggers: sln
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Schemas: sln
      2012-08-27 06:50:35.370 Debug: Detected indexes for database Modules: sln
      2012-08-27 06:50:35.373 Debug: Detected indexes for database Security: sln
      2012-08-27 06:50:35.485 Debug: Detected indexes for database my-modules: sln
      2012-08-27 06:50:35.773 Debug: Detected indexes for database App-Services: sln
      2012-08-27 06:50:35.773 Debug: Detected indexes for database Fab: sln
      2012-08-27 06:50:35.805 Debug: Detected indexes for database Documents: ss, fp

      ... and so on ...

      This can go on for several minutes and will cost you more down time than necessary, since you already know the indexes for each database.

      Improving the situation

      Here are some suggestions for improving this situation:

      1. Browse to Admin UI -> Databases -> my-database-name
      2. Set ‘index detection’ to ‘none’
      3. Set ‘expunge locks’ to ‘none’

      Repeat these steps for all active databases.

      Now tweak the group settings to make the cluster less sensitive to an occasional busy host:

      1. Browse to Admin UI -> Groups -> E-Nodes
      2. Set ‘xdqp timeout’ to 30
      3. Set ‘host timeout’ to 90
      4. Click OK to make this change effective.

      The database-level changes tell the server to skip index detection and lock expunging, which speeds up forest remounting when a host is perceived to be offline. The group changes will cause the hosts in that group to be a little more forgiving before declaring a host offline, thus preventing forest unmounting when it's not really needed.
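
      If you prefer to script these changes, a hedged Admin API sketch (the database and group names are placeholders for your environment) might look like:

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $db-id := admin:database-get-id($config, "my-database-name")   (: placeholder :)
      let $group-id := admin:group-get-id($config, "E-Nodes")            (: placeholder :)
      (: Database settings: skip index detection and lock expunging on remount :)
      let $config := admin:database-set-index-detection($config, $db-id, "none")
      let $config := admin:database-set-expunge-locks($config, $db-id, "none")
      (: Group settings: be more forgiving of an occasionally busy host :)
      let $config := admin:group-set-xdqp-timeout($config, $group-id, 30)
      let $config := admin:group-set-host-timeout($config, $group-id, 90)
      return admin:save-configuration($config)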

      If after performing these changes, you find that you are still experiencing XDQP-TIMEOUT's, the next step is to contact MarkLogic Support for assistance. You should also alert your Development team, in case there is a stray query that is causing the data nodes to gather too many results.

      Related Reading

      XML Data Query Protocol (XDQP)

      Introduction

      Under normal operations, only a single user object is created for a given user-name. However, when users are migrated from another security database and the recommended checks are not performed, duplicate user-names might be created.

      Resolution

      When there are duplicate user-names in the database, you may see the following message on the Admin UI or in the error logs:

      500: Internal Server Error
      XDMP-AS: (err:XPTY0004) get-element($col, "sec:user", "sec:user-name", $user-name, "SEC-USERDNE") -- Invalid coercion: (fn:doc("http://marklogic.com/xdmp/users/*******")/sec:user, fn:doc("http://marklogic.com/xdmp/users/*******")/sec:user) as element()?

       

      To fix duplicate user-names, the extra security object that is created needs to be removed. You can delete one of the extra security objects, which should have a URI similar to:


      http://marklogic.com/xdmp/users/******* where "*******" represents the user-id.

       

      To resolve the issue, follow the steps below:

      1. Perform a backup of your Security database in case manual recovery is required.

      2. Login to the QConsole with admin credentials.

      3. Select "Security" database as the content-source

      4. Delete the security object by executing xdmp:document-delete($uri) with $uri set to the Uri of the duplicate user.
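
      To find the URI of the duplicate object for step 4, a small sketch like the following (the user-name "jsmith" is hypothetical) can be run against the Security database; delete one of the returned URIs with xdmp:document-delete():

      xquery version "1.0-ml";
      declare namespace sec = "http://marklogic.com/xdmp/security";

      (: List the document URIs of all user objects sharing the same user-name :)
      for $user in fn:doc()/sec:user[sec:user-name eq "jsmith"]
      return fn:base-uri($user)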

      Introduction

      For hosts that don't use a standard US locale (en_US), there are instances where some lower-level calls will return data that cannot be parsed by MarkLogic Server. An example of this is a host configured with a different locale making a call to the Cluster Status page (cluster-status.xqy):

      XDMP-LEXVAL exception

      The problem

      The problem you have encountered is a known issue: MarkLogic Server uses a call to strtof() to parse the values as floats:

      http://linux.die.net/man/3/strtof

      Unfortunately, this uses a locale-specific decimal point. The issue in this environment is likely due to the operating system using a numeric locale where the decimal point is a comma, rather than a period.

      Resolving the issue

      The workaround for this is as follows:

      1. Create a file called /etc/marklogic.conf (unless one already exists)

      2. Add the following line to /etc/marklogic.conf:

      export LC_NUMERIC=en_US.UTF-8

      After this is done, you can restart the MarkLogic process so the change is detected and try to access the cluster status again.

      Summary

      This Knowledgebase article outlines the necessary steps required in importing an existing (pre-signed) Certificate into MarkLogic Server and configuring a MarkLogic Application Server to utilize that certificate.

      Existing (Pre-signed) Certificate vs. Certificate Request Generated by MarkLogic

      MarkLogic will allow you to use an existing certificate or will allow you to generate a Certificate Request. The key difference between the two lies in who generates the public-private keys and the other fields in the certificate.

      For a Pre-Signed Certificate: In this instance, the keys already exist outside of MarkLogic Server, and a 3rd-party tool will have populated the CN (Common Name) and other subject fields to generate the Certificate Request File (.csr) containing the public key.

      For a Certificate Request Generated by MarkLogic: In this instance, new keys are generated by MarkLogic Server (it does this while creating the new template), while the CN and other fields are added by the MarkLogic Server administrator (or user) through the web-based MarkLogic Admin GUI during New Certificate Template creation.

      The section in MarkLogic's online documentation on Creating a Certificate Template covers the steps required to generate a certificate template from within MarkLogic Server: http://docs.marklogic.com/guide/security/SSL#id_35140

        

      Steps to Import Pre-Signed Certificate and Key into MarkLogic

      1) Create a Certificate Template 

      Create a new Certificate Template with the fields similar to your existing Pre-Signed Certificate

      For example, your current Certificate file - presigned.marklogic.com.crt

      [amistry@engrlab18-128-026 PreSignedCert]$ openssl x509 -in ML.pem -text 
      Certificate:
          Data:
              Version: 1 (0x0)
              Serial Number: 7 (0x7)
          Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=CA, L=San Carlos, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic CA
              Validity
                  Not Before: Nov 30 04:12:33 2015 GMT
                  Not After : Nov 29 04:12:33 2017 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=DemoLab Corporation, OU=Engineering, CN=presigned.engrlab.marklogic.com
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (1024 bit)
       
       
      For the above certificate, we will create the custom template below in the Admin GUI -> Configure -> Security -> Certificate Templates -> Create tab.
      We will save the new template as "DemoLab Corporation Template".
       
       
      Example image showing the certificate template configuration

      Note - The fields above are placeholders only for the signed certificate; MarkLogic mainly uses them to generate the Certificate Signing Request (.csr). For a certificate request generated by a 3rd-party tool, it does NOT matter whether the template fields match the final signed certificate exactly.

      Once the signed certificate is imported, the App Server will use it, and the SSL client will only see field values from the signed certificate (even if they differ from the Template configuration page).

      2) Create an HTTPS App Server

      Please follow Procedures for Enabling SSL on App Servers except for the "Creating Certificate Template" part as we have created the Template to match our existing pre-signed Certificate. 

      3) Verify Pre-signed Certificate and Private Key file 

      Prior to installing a pre-signed certificate and private key the following verification should be performed to ensure that both certificate and key are valid and are in the correct format. 

      * Generate and display the certificate checksum using the OpenSSL utility

      [admin@sitea ~]# openssl x509 -noout -modulus -in cert.pem | openssl md5

      (stdin)= 2ddd2ca48ad2eb4eba082f5da3fd33ab

      * Generate and display the private key checksum

      [admin@sitea ~]# openssl rsa -noout -modulus -in key.key | openssl md5

      (stdin)= 2ddd2ca48ad2eb4eba082f5da3fd33ab

      The checksum from both commands should return identical values, if the values do not match or if you are prompted for additional information such as the private key password then the certificate and private keys are not valid and should be corrected before proceeding.

      Note: Proceeding to the next step without verifying the certificate and the private key could lead to the MarkLogic server being made inaccessible. 

      4) Install Pre-signed Certificate and Key file to Certificate Template using Query Console

      Since the certificate was pre-signed, MarkLogic does not have the private key that goes along with it. We will install the pre-signed certificate and key into MarkLogic using the XQuery below in Query Console.

      Note: The query must be run against the Security database.

      Please change the certificate template name and the certificate/key file locations in the XQuery below to reflect values from your environment.

      xquery version "1.0-ml";
      import module namespace pki = "http://marklogic.com/xdmp/pki" at "/MarkLogic/pki.xqy";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";
      
      (: Update Template name for your environment :)
      let $templateid := pki:template-get-id(pki:get-template-by-name("TemplateName"))
      (: Path on the MarkLogic host that is readable by the MarkLogic server process (default daemon) :)
      (:   File suffix could also be .txt or other format :)
      let $path-to-cert := "/cert.pem"
      let $path-to-key := "/key.key"
      
      return
      pki:insert-host-certificate($templateid,
        xdmp:document-get($path-to-cert,
          <options xmlns="xdmp:document-get"><format>text</format></options>),
        xdmp:document-get($path-to-key,
          <options xmlns="xdmp:document-get"><format>text</format></options>)
      )
      

      The above will associate the pre-signed certificate and key with the template created earlier, which is linked to the HTTPS App Server.

      Important note: pki:insert-trusted-certificates can also be used in place of pki:insert-host-certificate in the above example.

      Introduction

      This article discusses the effects of the incremental backup implementation on Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).

      Details

      With MarkLogic 8 you can have multiple daily incremental backups with minimal impact on database performance.

      Incrementals complete more quickly than full backups reducing the backup window. A smaller backup window enables more frequent backups, reducing the RPO of the database in case of disaster.

      However, RTO can be longer when using incremental backups compared to just full backups, because multiple backups must be restored to recover.

      There are two modes of operation when using incremental backups:

      Incremental since last full. Here, each incremental has to store all the data that has changed since the last full backup. Since a restore only has to go through a single incremental data set, the server is able to perform a faster restore.  However, each incremental data set is bigger and takes longer to complete than the previous data set because it stores all changes that were included in the previous incremental.

      Please note when doing “Incremental since last full”:-

      - Create a new incremental backup directory for each incremental backup

      - Call database-incremental-backup with incremental-dir set to the new incremental backup directory

       

      Incremental since last incremental.  In this case, a new incremental stores only changes since the last incremental, also known as delta backups. By storing only the changes since the last incremental, the incremental backup sets are smaller in size and are faster to complete.  However, a restore operation would have to go through multiple data sets.

      Please note when doing “Incremental since last incremental”:-

      - Create an incremental backup directory ONCE

      - Call database-incremental-backup with the same incremental backup directory.

      See also the documentation on Incremental Backup.

       

       

      Indexing Best Practices

      MarkLogic Server indexes records (or documents/fragments) on ingest. When a database's index configuration is changed, the server will consequently reindex all matching records.

      Indexing and reindexing can be a CPU and I/O intensive operation. Reindexing creates a lot of new fragments, with the original fragments being marked for deletion. These deleted fragments will then need to be merged out. All of this activity can potentially affect query performance, especially in systems with under-provisioned hardware.

      Reindexing in Production

      If you need to add or modify an index on a production cluster, consider scheduling the reindex during a time when your cluster is less busy. If your database is too large to completely reindex during a single period of low usage, consider running the reindex over several periods of time. For example, if your low usage period is during a weekend, the process may look like:

      • Change your index configuration on a Friday night
      • Let the reindex run for most of the weekend
      • To pause the reindex, set the reindexer-enable field to 'false' for the database being reindexed. Be sure to allow sufficient time for the associated merging to complete before system load comes back.
      • If needed, reindexing can continue over the next weekend - the reindexer process will pick up where it left off before it was disabled.

      You can refer to https://help.marklogic.com/Knowledgebase/Article/View/18/15/how-reindexing-works-and-its-impact-on-performance for more details on invoking reindexing on production.

            When you have Database Replication Configured

      If you have to add or modify indexes on a database which has database replication configured, make sure the same changes are made on the Replica cluster as  well. Starting with ML server version 9.0-7, index data is also replicated from the Master to the Replica, but it does not automatically check if both sides have the same index settings. Reindexing is disabled by default on a replica cluster. However, when database replication configuration is removed (such as after a disaster),  the replica database will reindex as necessary. So it is important that the Replica database index configuration matches the Master’s to avoid unnecessary reindexing.

      Further reading -

      Master and Replica Database Index Settings

      Database Replication - Indexing on Replica Explained

      Avoid Unused Range Indexes, Fields, and Path Indexes

      In addition to taking up extra disk space, Range, Field, and Path Indexes require extra work when it's time to reindex. Field and Path indexes may also require extra indexing passes.

      Avoid Using Namespaces to Implement Multi-Tenancy

      It's a common use case to want to create some kind of partition (or multiple partitions) between documents in a particular database. In such a scenario it's far better to 1) constrain the partitioning information to a particular element in a document (then include a clause over that element in your searches), than it is to 2) attempt to manage partitions via unique element namespaces corresponding to each partition. For example, given two documents in two different partitions, you'll want them to look like this:

      1a. <doc><partition>partition1</partition><name>Joe Smith</name></doc>

      1b. <doc><partition>partition2</partition><name>John Smith</name></doc>

      ...vs. something like this:

      2a. <doc xmlns:p="http://partition1"><p:name>Joe Smith</p:name></doc>

      2b. <doc xmlns:p="http://partition2"><p:name>John Smith</p:name></doc>

      Why is #1 better? In terms of searching the data once it's indexed, there's actually not much of a difference - one could easily create searches to accommodate both approaches. The issue is how the indexing works in practice. MarkLogic Server indexes all content on ingest. In scenario #2, every time a new partition is created, a new range element index needs to defined in the Admin UI, which means your index settings have changed, which means the server now needs to reindex all of your content - not just the documents corresponding to the newly introduced partition. In contrast, for scenario #1, all that would need to be done is to ingest the documents corresponding to the new partition, which would then be indexed just like all the other existing content. There would be a need, however, to change the searches in scenario #1, as they would not yet include a clause to accommodate the new partition (for example: cts:element-value-query(xs:QName("partition"), "partition2")) - but the overall impact of adding a partition is changing the searches in scenario #1, which is ultimately far, far less intrusive a change than reindexing your entire database as would be required in scenario #2. Note that in addition to a database-wide reindex, searches would also need to change in scenario #2, as well.
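
      For illustration (the element names follow the sketch in scenario #1 above), a search constrained to a single partition might look like:

      xquery version "1.0-ml";
      cts:search(fn:doc(),
        cts:and-query((
          cts:element-value-query(xs:QName("partition"), "partition2"),
          cts:element-word-query(xs:QName("name"), "Smith")
        )))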

      Keep an Eye on I/O Throughput

      Reindexing can lead to heavy merge activity and may lead to disk I/O bottlenecks if not managed carefully. If you have a system that is available 24-7 with no downtime window, then you may need to throttle the reindexer in order to keep the disk I/O to a minimum. We suggest the following database settings for reindexing a system that must always remain in use:

      • reindexer-throttle = 3
      • large-size-threshold = 1048576

      You can also adjust the following group settings to help limit background I/O:

      • background-io-limit = 100

      This will limit the background I/O for that group to 100 MB/sec per host across all hosts in that group. This should only be configured if merges are causing problems; it is a way of throttling back the I/O used by the merging process. This is a good starting point, and it may be increased in increments of 50 if you find that your merges are progressing too slowly. Proceed with caution, as too low a background I/O limit can have negative performance or even catastrophic consequences.
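
      A hedged Admin API sketch of these settings (the database and group names are placeholders; as noted above, only set background-io-limit if merges are actually causing problems):

      xquery version "1.0-ml";
      import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

      let $config := admin:get-configuration()
      let $db-id := admin:database-get-id($config, "my-database-name")   (: placeholder :)
      let $group-id := admin:group-get-id($config, "Default")            (: placeholder :)
      let $config := admin:database-set-reindexer-throttle($config, $db-id, 3)
      let $config := admin:database-set-large-size-threshold($config, $db-id, 1048576)
      (: Limit background (merge) I/O to 100 MB/sec per host for the group :)
      let $config := admin:group-set-background-io-limit($config, $group-id, 100)
      return admin:save-configuration($config)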

      General Recommendations

      In general, your indexing/reindexing and subsequent search experience will be better if you avoid unused range, field, and path indexes, schedule reindexing for periods of low usage (throttling it where necessary), and keep an eye on I/O throughput while the associated merges complete.

      Summary

      The MarkLogic Admin GUI is a convenient place to deploy the normal certificate infrastructure or to use the temporary certificates generated by MarkLogic. However, for certain advanced solutions/deployments we need XQuery-based admin operations to configure MarkLogic.

      This knowledgebase article discusses a solution for deploying a SAN or wildcard certificate in a 3-node (or larger) cluster.

       

      Certificate Types and MarkLogic Default Config

      Certificate Types

      In general, when browsers connect to a server using HTTPS, they check to make sure your SSL certificate matches the host name in the address bar. There are three ways for browsers to find a match:

      a) The host name (in the address bar) exactly matches the Common Name in the certificate's Subject.

      b) The host name matches a Wildcard Common Name. Please find an example at the end of this article.

      c) The host name is listed in the Subject Alternative Name (SAN) field as part of the X509v3 extensions. Please find an example at the end of this article.

      The most common form of SSL name matching is for the SSL client to compare the server name it connected to with the Common Name (CN field) in the server's Certificate. It's a safe bet that all SSL clients will support exact common name matching.

      MarkLogic allows this common scenario (a) to be configured from the Admin GUI; below we discuss deploying certificates featuring (b) and (c).

      Default Admin GUI based Configuration 

      By default, MarkLogic generates a temporary certificate for all the nodes in the group for the current cluster when a certificate template is assigned (the exception is when the template assignment is done through XQuery).

      The temporary certificate generated for each node has that node's hostname as the CN field - designed for the common scenario (a).

      We have two paths to install a CA-signed certificate in MarkLogic:

      1) Generate the certificate request in MarkLogic, get it signed by the CA, and import it through the Admin GUI; or

      2) Generate the certificate request + private key outside of MarkLogic, get the certificate request signed by the CA, and import the signed certificate + private key using an admin script.

      Problem Scenario

      In both of the above cases, while installing/importing the signed certificate, MarkLogic will look to replace the temporary certificate by comparing the CN field of the installed certificate with the temporary certificate's CN field.

      Now, if we have a wildcard certificate (b) or a SAN certificate (c), our signed certificate's CN field will never match the temporary certificate's CN field, hence MarkLogic will not remove the temporary certificates - MarkLogic will continue using the temporary certificates.

       

      Solution

      After installing a SAN or wildcard certificate, we may find App Servers still using the temporary certificates (which were not replaced while installing the SAN/wildcard certificate).

      Run the XQuery below against the Security database to associate the installed certificate and its private key with every host in the cluster, so the nodes stop presenting their temporary certificates. The XQuery needs the URI lexicon to be enabled (enabled by default). [Please change the certificate template name and the CN value in the XQuery below to reflect values from your environment.]

      xquery version "1.0-ml";
      
      import module namespace pki = "http://marklogic.com/xdmp/pki"  at "/MarkLogic/pki.xqy";
      import module namespace admin = "http://marklogic.com/xdmp/admin"  at "/MarkLogic/admin.xqy";
            
      
      let $hostIdList := let $config := admin:get-configuration()
                         return admin:get-host-ids($config)
                           
      for $hostid in $hostIdList
      return
        (: FDQN name matching Certificate CN field value :)
        let $fdqn := "TestDomain.com"
      
        (: Change to your Template Name string :)
        let $templateid := pki:template-get-id(pki:get-template-by-name("YourTemplateName"))
      
        for $i in cts:uris()
        where 
        (   (: locate Cert file with Public Key :)
            fn:doc($i)//pki:template-id=$templateid 
            and fn:doc($i)//pki:authority=fn:false()
            and fn:doc($i)//pki:host-name=$fdqn
        )
        return <h1> Cert File - {$i} .. inserting host-id {$hostid}
        {xdmp:node-insert-child(doc($i)/pki:certificate, <pki:host-id>{$hostid}</pki:host-id>)}
        {
            (: extract cert-id :)
            let $certid := fn:doc($i)//pki:certificate/pki:certificate-id
            for $j in cts:uris()
            where 
            (
                (: locate Cert file with Private key :)
                fn:doc($j)//pki:certificate-private-key/pki:template-id=$templateid 
                and fn:doc($j)//pki:certificate-private-key/pki:certificate-id=$certid
            )
            return <h2> Cert Key File - {$j}
            {xdmp:node-insert-child(doc($j)/pki:certificate-private-key,
              <pki:host-id>{$hostid}</pki:host-id>)}
            </h2>
        } </h1>
      

      The above associates the installed certificate and its private key with every host in the cluster, forcing all nodes to present the installed certificate instead of their temporary certificates.

       

      Example: SAN (Subject Alternative Name) Certificate

      For 3 node cluster (engrlab-128-101.engrlab.marklogic.com, engrlab-128-164.engrlab.marklogic.com, engrlab-128-130.engrlab.marklogic.com)

$ openssl x509 -in ML.pem -text -noout
      Certificate:
          Data:
              Version: 3 (0x2)
              Serial Number: 9 (0x9)
              Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=NY, L=NewYork, O=MarkLogic, OU=Engineering, CN=Support CA
              Validity
                  Not Before: Apr 20 19:50:51 2016 GMT
                  Not After : Jun  6 19:50:51 2018 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic, OU=Eng, CN=TestDomain.com
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                  RSA Public Key: (1024 bit)
                      Modulus (1024 bit):
                          00:97:8e:96:73:16:4a:cd:99:a8:6a:78:5e:cb:12:
                          5d:e5:36:42:d2:b8:52:51:53:6c:cf:ab:e4:c6:37:
                          2c:15:12:80:c1:1b:53:29:4c:52:76:84:80:1d:ee:
                          16:41:a6:31:c5:7b:0d:ca:d7:e5:da:d7:67:fe:80:
                          89:9f:0d:bc:46:4f:f0:7e:46:88:26:d5:a0:24:a6:
                          06:d1:fa:c0:c7:a2:f2:11:7f:5b:d5:8d:47:94:a8:
                          06:d9:46:8f:af:dd:31:d5:15:d2:7a:13:39:3e:81:
                          32:bd:5c:bd:62:9d:5a:98:1d:20:0e:30:d4:57:3f:
                          7f:89:e6:20:ae:88:4d:85:d7
                      Exponent: 65537 (0x10001)
              X509v3 extensions:
                  X509v3 Key Usage: 
                      Key Encipherment, Data Encipherment
                  X509v3 Extended Key Usage: 
                      TLS Web Server Authentication
                  X509v3 Subject Alternative Name: 
                      DNS:engrlab-128-101.engrlab.marklogic.com, DNS:engrlab-128-164.engrlab.marklogic.com, DNS:engrlab-128-130.engrlab.marklogic.com
          Signature Algorithm: sha1WithRSAEncryption
              52:68:6d:32:70:35:88:1b:70:df:3a:56:f6:8a:c9:a0:9d:5c:
              32:88:30:f4:cc:45:29:7d:b5:35:18:a0:9a:45:37:e9:22:d1:
              c5:50:1d:50:b8:20:87:60:9b:c1:d6:a8:0c:5a:f2:c0:68:8d:
              b9:5d:02:10:39:40:b3:e5:f6:ae:f3:90:31:57:4c:e0:7f:31:
              e2:79:e6:a8:c0:e6:3f:ea:c5:75:67:3e:cd:ea:88:5d:60:d6:
              01:59:3c:dc:e0:47:96:3b:59:4a:13:85:bb:87:70:d0:a2:6b:
              0f:d4:84:1d:d1:be:e8:a5:67:c3:e3:59:05:0d:5d:a5:86:e6:
              e4:9e

      Example: Wild-Card Certificate

For a 3-node cluster (engrlab-128-101.engrlab.marklogic.com, engrlab-128-164.engrlab.marklogic.com, engrlab-128-130.engrlab.marklogic.com):

      $ openssl x509 -in ML-wildcard.pem -text -noout
      Certificate:
          Data:
              Version: 1 (0x0)
              Serial Number: 7 (0x7)
              Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=NY, L=NewYork, O=MarkLogic, OU=Engineering, CN=Support CA
              Validity
                  Not Before: Apr 24 17:36:09 2016 GMT
                  Not After : Jun 10 17:36:09 2018 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic Corporation, OU=Engineering Support, CN=*.engrlab.marklogic.com
       

      Introduction

      Okta provides secure identity management and single sign-on to any application, whether in the cloud, on-premises or on a mobile device.

The following describes the procedure required to integrate MarkLogic with Okta identity management and Microsoft Windows Active Directory using the Okta AD Agent.

      This document assumes that the users accessing MarkLogic are defined in the Windows Active Directory only and do not currently have Okta User Profiles defined.

      Authentication Flow

       The authentication flow in this scenario will be as follows:

      1. The user opens a Browser connection to the Site Single Sign-On Portal page.
      2. The user enters their Active Directory credentials
3. Okta verifies the user credentials using the Okta AD Agent
      4. If successful, the user is presented with a selection of applications they can sign-on to.
      5. The user selects the required application and Okta completes the sign-on using the stored user credentials.

      Requirements

      • MarkLogic Server version 8 or 9
      • Okta Admin account access
      • Okta AD Agent
      • Active Directory Server

      For the purpose of this document the following Active Directory user entry will be used as an example:

      # LDAPv3
      # base <dc=MarkLogic,dc=Local> with scope subtree
      # filter: (sAMAccountName=martin.warnes)
      # requesting: *
      #
      
      # Martin Warnes, Users, marklogic.local
      dn: CN=Martin Warnes,CN=Users,DC=marklogic,DC=local
      objectClass: top
      objectClass: person
      objectClass: organizationalPerson
      objectClass: user
      cn: Martin Warnes
      sn: Warnes
      givenName: Martin
      distinguishedName: CN=Martin Warnes,CN=Users,DC=marklogic,DC=local
      sAMAccountName: martin.warnes
      memberOf: CN=mladmins,CN=Users,DC=marklogic,DC=local
      sAMAccountType: 805306368
      userPrincipalName: martin.warnes@marklogic.local

      Notes

1. By default, Okta uses the email address as the username; however, MarkLogic usernames cannot contain certain special characters, such as the @ symbol, so the sAMAccountName will be used to sign on to MarkLogic. This will be configured later in the Okta Application definition.
2. One or more memberOf attributes should be assigned to the Active Directory user entry; these will be used to assign MarkLogic roles without requiring duplicate user entries in the MarkLogic security database.

      Step 1. Create a MarkLogic External Security definition

       An External Security definition is required to authenticate and authorize Okta users against a Microsoft Windows Active Directory server.

       Full details on configuring an external security definition can be found at:

       https://docs.marklogic.com/8.0/guide/security/external-auth

You should ensure that both “authentication” and “authorization” are set to “ldap”; for details on the remaining settings, consult your Active Directory administrator.

      Step 2. Assign Active Directory group membership to MarkLogic Roles

In order to assign the correct roles and permissions to Okta users, you will need to map Active Directory memberOf attributes to MarkLogic roles.

In our example, the Active Directory user entry martin.warnes belongs to the following group:

       memberOf: CN=mladmins,CN=Users,DC=marklogic,DC=local

To ensure that all members of this group are assigned the MarkLogic admin role, you simply need to add the memberOf attribute value as an external name in the admin role, as below:
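
If you prefer to script this mapping rather than configure it through the Admin UI, the external name can also be set with the Security API. The following is a minimal sketch, run against the Security database, using the role name and group DN from the example above:

xquery version "1.0-ml";
import module namespace sec = "http://marklogic.com/xdmp/security" at "/MarkLogic/security.xqy";

(: Run against the Security database: map the AD group DN to the MarkLogic "admin" role :)
sec:role-set-external-names("admin",
  "CN=mladmins,CN=Users,DC=marklogic,DC=local")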

      Step 3. Configure the MarkLogic AppServer

      For each App Server that you wish to integrate with Okta, you will need to set the “authentication” to “basic” and select the “external security” definition.

As HTTP Basic Authentication is considered insecure, it is highly recommended that you secure the App Server connection using HTTPS by configuring and selecting an “SSL certificate template”.

       Further details on configuring SSL for AppServers can be found at:

       https://docs.marklogic.com/8.0/guide/admin/SSL

      Step 4. Install and Configure Okta AD Integration

      In order for Okta to authenticate your Active Directory users, you will first need to download and install the Okta AD Agent using the following instructions supplied by Okta

      https://support.okta.com/help/Documentation/Knowledge_Article/Install-and-Configure-the-Okta-Active-Directory-Agent-1689483166

Once installed, your Okta Administrator will be able to complete the AD Agent configuration and select which AD users to import into Okta.

      Step 5. Create Okta MarkLogic application

From the Okta Administrator console, select “Add Application”, search for the Basic Authentication template, and click “Add”.

On the “General Settings” tab, enter the MarkLogic App Server URL, making sure to use HTTP or HTTPS as appropriate, depending on whether you have chosen to secure the listening port using TLS.

       Check the “Browser plugin auto-submit” option.

On the Sign-On options panel, select “Administrator sets username, password is the same as user’s Okta password”.

       For “Application username format” select “AD SAM Account name” from the drop-down selection.

Once the Okta application is created, you should assign the users permitted to access the application.

When assigning a user, you will be prompted to check the AD credentials; at this point, you should just check that Okta has selected the correct "sAMAccountName" value. The password will not be modifiable.

      Repeat Step 5. for each AppServer you wish to access via the Okta SSO portal.

      Step 6. Sign-on to Okta SSO Portal

      All assigned MarkLogic applications should be shown:

      Selecting one of the MarkLogic applications should automatically log you in using your AD Credentials stored within Okta.

      Additional Reading

      Introduction

MarkLogic Server provides pre-commit and post-commit triggers, which listen for certain events and invoke a configured XQuery module after the event occurs. It is a common use case to create a common function in a library module which is shared among the different trigger modules called by various triggers. This article shows an example of creating and using such a shared library module in a post-commit trigger.

      Example

      This example shows a simple post commit trigger that fires when a new document is created.

1. For this example, create a database 'minidb' and then set its triggers database to itself (minidb). Also, create another database 'minimodules' to store all modules.

2. Using Query Console, create a trigger definition by evaluating the XQuery below against the triggers database (minidb):
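
The original listing is not shown here, so the following is a minimal sketch of such a trigger definition; the trigger name, module URI and directory scope are illustrative and assume the databases from step 1:

xquery version "1.0-ml";
import module namespace trgr = "http://marklogic.com/xdmp/triggers" at "/MarkLogic/triggers.xqy";

(: Run against the triggers database (minidb): fire post-commit when a document is created under /mini/ :)
trgr:create-trigger(
  "mini-create-trigger",
  "Log creation of documents under /mini/",
  trgr:trigger-data-event(
    trgr:directory-scope("/mini/", "infinity"),
    trgr:document-content("create"),
    trgr:post-commit()),
  trgr:trigger-module(xdmp:database("minimodules"), "/", "log-create.xqy"),
  fn:true(),
  xdmp:default-permissions())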

3. Create the trigger module by running the XQuery below against the modules database:
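
Again, the original listing is not reproduced here; the sketch below inserts a hypothetical trigger module (URI and library namespace are made up) that delegates to a shared library function:

xquery version "1.0-ml";
(: Run against the modules database (minimodules) :)
xdmp:document-insert("/log-create.xqy", text {
'xquery version "1.0-ml";
import module namespace trgr = "http://marklogic.com/xdmp/triggers" at "/MarkLogic/triggers.xqy";
import module namespace lib = "http://example.com/mini-lib" at "/mini-lib.xqy";
declare variable $trgr:uri as xs:string external;

lib:log-created($trgr:uri)'
})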

      4. Insert a library module into the modules database (minimodules):
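
A minimal sketch of such a shared library module is shown below; the namespace and URI match the hypothetical trigger module above, and the log text mirrors the Task Server output shown in step 6:

xquery version "1.0-ml";
(: Run against the modules database (minimodules) :)
xdmp:document-insert("/mini-lib.xqy", text {
'xquery version "1.0-ml";
module namespace lib = "http://example.com/mini-lib";

declare function lib:log-created($uri as xs:string) as empty-sequence()
{
  xdmp:log(fn:concat("*****Document with /mini root ", $uri,
    " was created.*****", fn:current-dateTime()))
};'
})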

      5. Now insert the sample document into the content database (minidb):
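
For example (the document content is arbitrary; the URI must begin with /mini so that the trigger fires):

xquery version "1.0-ml";
(: Run against the content database (minidb) :)
xdmp:document-insert("/mini/test-25-1-1.xml", <test>hello</test>)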

      6. Check output in logs:

After a new document with a URI prefixed with "/mini" is inserted into the content database, the Task Server log file records the following message:

      2018-04-25 11:40:50.224 Info: *****Document with /mini root /mini/test-25-1-1.xml was created.*****2018-04-25T11:40:50+05:30

      NOTE: Module imports are relative to root.

      References:

      1. Creating and Managing Triggers With triggers.xqy - https://docs.marklogic.com/guide/app-dev/triggers

      Introduction

We are always looking for ways to understand and address performance issues within the product, and to that end the following new diagnostic features have been added.

      New Trace Events in MarkLogic Server

      Some new diagnostic trace events have been added to MarkLogic Server:

      • Background Time Statistics - Background thread period and further processing timings are added to xdmp:host-status() output if this trace event is set.
      • Journal Lag 30 - A forest will now log a warning message if a frame takes more than 30 seconds to journal.
        • Please note that this limit can be adjusted down by setting the Journal Lag # trace event (where # is {1, 2, 5 or 10} seconds).
• Canary Thread 10 - A new "canary thread" that does nothing but sleep for a second and check how long it has been since it went to sleep.
        • It will log messages if the interval between sleeping has exceeded 10 seconds.
        • This can be adjusted down by setting the Canary Thread # trace event (where # is {1, 2, 5 or 10} seconds).
      • Canary Thread Histogram - Adding this trace event will cause MarkLogic to write to the ErrorLog a histogram of timings once every 10 minutes.
      • Forest Fast Query Lag 10 - By default, a forest will now warn if the fast query timestamp is lagging by more than 30 seconds.
        • This can be adjusted down by setting the Forest Fast Query Lag # (where # is {1, 2, 5, or 10} seconds).
        • Note that Warning level messages will be repeatedly logged at intervals while the lag limit is exceeded, with the time between logged messages doubling until it reaches 60 seconds.
        • There will be a final warning when the lag drops below the limit again as a way to bracket the period of lag.

Examples of some of the new statistics can be viewed in the Admin UI by going to the URLs shown in the section headings below (replacing hostname with the name of a node in your cluster and replacing TheDatabase with the name of the database that you would like to monitor).

You can clear the forest insert and journal statistics by adding clear=true to the request in your browser.

      These changes now feature in the current releases of both MarkLogic 7 and MarkLogic 8 and are available for download from our developer website:

      Hints for interpreting new diagnostic pages

      Here's some further detail on what the numbers mean.

      First, a note about how bucketing is performed on these diagnostic pages:

      For each operation category (e.g. Timestamp Wait, Semaphore, Disk), the wait time will fall into a range of values, which need to be bucketed.

The bucketing algorithm starts with 1000 buckets to cover the whole range, but then collapses them into a small set of buckets that cover the whole span of values. The algorithm aims to:

      1. End up with a small number of buckets

2. Include extreme (outlier) values

3. Spread out multiple values so that they are not too "bunched up" and are therefore easier to interpret.

      Forest Journal Statistics (http://hostname:8001/forest-journal-statistics.xqy?database=TheDatabase)

When we journal a frame, there is a sequence of operations:

      1. Wait on a semaphore to get access to the journal.
      2. Write to the journal buffer (possibly waiting for I/O if exceeding the 512k buffer)
      3. Send the frame to replica forests
      4. Send the frame to journal archive/database replica forests
      5. Release the semaphore so other threads can access the journal
      6. Wait for everything above to complete, if needed.
        1. If it's a synchronous op (e.g. prepare, commit, fast query timestamp), we wait for disk I/O
        2. If there are replica forests, we wait for them to acknowledge that they have journaled and replayed.
        3. If the journal archive or database replica is lagged, wait for it to no longer be lagged.

We note the wall clock time before and after these various operations, so we can track how long they're taking.

      On the replica side, we also measure the "Journal Replay" time which would be inserting into the in-memory stand, committing, etc.

      Here's an example for a master and its replica.

      Forest F-1-1

      Timestamp Wait
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 280 99.64 280 99.64
      50..59 1 0.36 281 100.00
      Semaphore
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 816 100.00 816 100.00
      Disk
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 204 99.51 204 99.51
      10..19 1 0.49 205 100.00
      Local-Disk Replication
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 804 99.26 804 99.26
      10..119 6 0.74 810 100.00
      Journal Archive

      No Information

      Database Replication

      No Information

      Journal Total
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 810 99.26 810 99.26
      10..119 6 0.74 816 100.00
      Journal Replay

      No Information

      Forest F-1-1-R

      Timestamp Wait

      No Information

      Semaphore
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 811 100.00 811 100.00
      Disk
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 203 99.02 203 99.02
      10..59 2 0.98 205 100.00
      Local-Disk Replication

      No Information

      Journal Archive

      No Information

      Database Replication

      No Information

      Journal Total
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 809 99.75 809 99.75
      10..59 2 0.25 811 100.00
      Journal Replay
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 807 99.63 807 99.63
      10..119 3 0.37 810 100.00

      Forest Insert Statistics (http://hostname:8001/forest-insert-statistics.xqy?database=TheDatabase)

      When we're inserting a fragment into an in-memory stand, we also have a sequence of operations.

      1. Wait on a semaphore to get access to the in-memory stand.
2. Wait on the insert throttle (e.g. if there are too many stands)
      3. Wait for the stand's journal semaphore, to serialize with the previous insert if needed.
      4. Release the stand insert semaphore.
      5. Journal the insert.
      6. Release the stand journal semaphore.
      7. Start the checkpoint task if the stand is full.

      As with the journal statistics, we note the wall clock time between these operations so we can track how long they're taking.

      On the replica side, the behavior is similar, although the journal and insert are in reverse order (we journal before inserting into the in-memory stand). If it's a database replica forest, we also have to regenerate the index information (Filled IPD).

Here is an example for a master and its replica.

      Forest F-1-1

      Journal Throttle
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 606 100.00 606 100.00
      Insert Sem
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 604 99.67 604 99.67
      80..199 2 0.33 606 100.00
      Filled IPD

      No Information

      Stand Throttle
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 606 100.00 606 100.00
      Stand Insert
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 605 99.84 605 99.84
      100..109 1 0.17 606 100.00
      Journal Sem
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 604 99.67 604 99.67
      10..119 2 0.33 606 100.00
      Journal
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 603 99.50 603 99.50
      10..119 3 0.50 606 100.00
      Total
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 597 98.51 597 98.51
      10..19 6 0.99 603 99.50
      200..229 3 0.50 606 100.00

      Forest F-1-1-R

      Journal Throttle

      No Information

      Insert Sem
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 606 100.00 606 100.00
      Filled IPD

      No Information

      Stand Throttle
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 606 100.00 606 100.00
      Stand Insert
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 605 99.84 605 99.84
      110..119 1 0.17 606 100.00
      Journal Sem
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 606 100.00 606 100.00
      Journal

      No Information

      Total
Bucket (ms)   Count   %       Cumulative   Cumulative %
      0..9 605 99.84 605 99.84
      110..119 1 0.17 606 100.00

      Further reading

      To learn more about diagnostic trace events, please refer to our documentation and Knowledgebase articles and note that some trace events may only log information if logging is set to debug:

      Summary

The jemalloc library is included with the MarkLogic Server installation and is configured to be used by default. Its use is recommended, as it has shown a performance boost over the default Linux malloc library.

There have been cases where, even though it is configured, the library is not used. This article gives some possible ways to debug that.

      Diagnostics

The following message appears in the ErrorLog on startup if jemalloc is not in use:

      Warning: Memory allocator is not jemalloc; check /etc/sysconfig/MarkLogic

      Solutions

1) Make sure to use a superuser shell or sudo and run 'service MarkLogic restart'.

2) Verify that the jemalloc library is present in the install directory (i.e. /opt/MarkLogic/lib/libjemalloc.so.1).

      3) Has the /etc/sysconfig/MarkLogic configuration file been modified from the default?  Try setting the configuration file back to the default and restarting the server.

4) Confirm that /etc/sysconfig/MarkLogic contains the following lines:
      # preload jemalloc
      if [ -e $MARKLOGIC_INSTALL_DIR/lib/libjemalloc.so.1 ]; then
         export LD_PRELOAD=$MARKLOGIC_INSTALL_DIR/lib/libjemalloc.so.1
      fi

      Details

      For more information on the jemalloc library, please review the article provided by Facebook Engineering

      https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/

      Introduction

      This article compares JSON support in MarkLogic Server versions 6, 7, and 8, and the upgrade path for JSON in the database.

How is native JSON different from the previous JSON support?

Previous versions of MarkLogic Server provided XQuery APIs that converted between JSON and XML. This translation is lossy in the general case, meaning developers were forced to make compromises on either or both ends of the transformation. Even though the transformation was implemented in C++, it still added significant overhead to ingestion. All of these issues go away with JSON as a native document format.
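
To illustrate, here is a minimal sketch of inserting a natively stored JSON document from XQuery using the JSON node constructors introduced in MarkLogic 8 (the URI and content are made up for the example):

xquery version "1.0-ml";
(: The document is stored natively as JSON - no XML translation is involved :)
xdmp:document-insert("/example/order.json",
  object-node { "orderId": 1, "status": "open" })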

      How do I upgrade my JSON façade data to native JSON?

      For applications that use the previous JSON translation façade (for example: through the Java or REST Client APIs), MarkLogic 8 comes with sample migration scripts to convert JSON stored as XML into native JSON.

The migration script will upgrade a database’s content and configuration from the XML format that was used to represent JSON data in MarkLogic 6 and 7 to native JSON, specifically converting documents in the http://marklogic.com/xdmp/json/basic namespace.
       
If you are using the MarkLogic 7 JSON support, you will also need to migrate your code to use the native JSON support. The resulting application code is expected to be more efficient, but application developers will need to make minor code changes.
       
      See also:
       
      Version 8 JSON incompatibilities
       

      Introduction

MarkLogic Server provides a couple of useful techniques for keeping values in memory or resolving values without having to scan for documents on disk.

      Options

      There are a few options available:

1. cts:element-values performs a lexicon lookup, so it gets those values directly from the range indexes; you can add an options parameter and use the "map" option to get the call to return a map directly, as per the documentation, which may give you what you need without having to do any further work.

      See: http://docs.marklogic.com/cts:element-values
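
As a minimal sketch (assuming an element range index exists on an element named status, which is made up for this example), the "map" option returns the values and their frequencies in a single call:

xquery version "1.0-ml";
(: Requires an element range index on <status/>; returns a map of value => frequency :)
cts:element-values(xs:QName("status"), (), ("map", "item-frequency"))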

      2. Storing a map as a server field is a popular approach and is widely used for storing data that needs to be accessed routinely by queries.

      Bear in mind that there is a catch to this approach as the map is not available to all nodes in a cluster - it is only available to the node responsible for evaluating the original request, so if you're using this technique in a clustered environment, the results may not be what is expected.

Also note that if you're planning on storing a large number of maps in server fields on nodes in the cluster, it's important to make sure the hosts are provisioned with enough memory to accommodate these maps on top of group-level caches and memory for query allocation, stands, range indexes, document retrieval and the like.

      See: http://docs.marklogic.com/map:map

      And: http://docs.marklogic.com/xdmp:set-server-field
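
Below is a minimal sketch of the server-field pattern; the field name and map contents are illustrative. Remember that the field only exists on the host that evaluates the request:

xquery version "1.0-ml";
(: Build the map once, cache it in a server field, and reuse it on subsequent requests :)
let $cached := xdmp:get-server-field("my-lookup-cache")
return
  if (fn:exists($cached)) then $cached
  else
    let $m := map:map()
    let $_ := map:put($m, "key-1", "value-1")
    return xdmp:set-server-field("my-lookup-cache", $m)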

      3. xdmp:set only allows you to set a value for the life of a single query but this technique can be useful in some circumstances - especially in situations where you're interested in keeping track of certain values throughout the processing of a module or a function within a module.

      See: http://docs.marklogic.com/xdmp:set

4. If you have a situation where you have a large number of complex queries - particularly ones where lexicon lookups or calls to range indexes won't resolve the data you need and where lots of documents will need to be retrieved from disk - you should consider using registered queries.

      See: http://docs.marklogic.com/cts:registered-query

Note that registered queries utilise the List Cache, so if you plan to adopt this method, we recommend careful testing to ensure your caches are sized sufficiently to suit the needs of your application.
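
A minimal sketch of the registered-query pattern is shown below; the word query stands in for a much more complex query, and in practice you would register once and reuse the returned id across requests:

xquery version "1.0-ml";
(: Register the query once, keep the id, and reuse it in later searches :)
let $id := cts:register(cts:word-query("marklogic"))
return cts:search(fn:collection(), cts:registered-query($id, "unfiltered"))[1 to 10]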

      Summary

This article explains how to kill a long-running query and describes the related timeout configurations.

      Problem Scenario

      At some point, we've all run into an inefficient long running query. What should we do if we don't want to wait for the query to complete? If we cancel the browser request, that would end the connection, but it wouldn't end the program invocation (called a "request") on the MarkLogic Server side. On the server side, that program invocation would continue to run until the execution is complete.

      Most of the time, this isn't really an issue. The server, of course, is multi-threaded, handling many concurrent transactions. We can just cancel the browser request, move on, and let the query finish when it finishes. However, sometimes it becomes necessary to free up server resources by killing the query and starting over. To do this, we need access to the Admin interface. 

      Sample Long running Query 

      Example only, please don't try this on any production machines!

      for $x in 1 to 1000000
      return collection()[1 + xdmp:random(1000)]
       
      This query is asking for 1,000,000 random documents, and will take a long time to execute. How can we cancel this query?

      How to Cancel/Kill the Query

      Go to the Administrative interface (at http://localhost:8001/ if you're running MarkLogic locally). At the top of the screen, you'll see a tab labeled "Status." Click that:

      screenshot1.jpg

      This will take you to the "System Status" screen. This page reveals status information about hosts, databases, forests, and app servers. The App Server section is what we're concerned with. Scanning down the "Queries" column, we see that the "Admin" server is processing a query (namely, the one that generated the page we see). Everything looks okay so far. But just below that, we see that the "App-Services" server is just over 3 minutes into processing a query. That's our slow one. Query Console runs on the "App-Services" app server, which explains why we see it there. Go ahead and click the "App-Services" link:

      screenshot2.jpg

      This takes us to the "App-Services" status page. So far, there's still no "cancel" button. One more click will reveal it ("show more"):

      screenshot3.jpg

      We can now see an individual entry for the currently running query. Here we see it's called "eval.xqy"; that's the query module that Query Console invokes when you submit a query. If you were running your own query module (instead of using Query Console), then you would see its name here instead. To cancel the query, click the "[cancel]" link:

      screenshot4.jpg

      One more click (on the confirmation page).

      screenshot5.jpg

      This takes us back to the status page, where we see MarkLogic Server is in the process of canceling our query:

      screenshot6.jpg

The above page will continue to say "cancelling..." even though the query has already been cancelled and no longer exists, until we refresh the page.

      A quick refresh of the above page shows that the query is no longer present.

      screenshot7.jpg

       

      What happens if you forget to cancel a query?

      MarkLogic will continue to execute the query until a time limit is reached, at which point the Server will cancel the query for you. For example, here's what Query Console eventually returns back if we don't bother to cancel the query:

      screenshot8.jpg

      How long is this time limit?

This depends on your server configuration. We can set the timeout in the query itself, using the xdmp:set-request-time-limit() function, but even that will be limited by your server's "max time limit".
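
For example (a minimal sketch), a query that knows it will run for a while can raise its own limit, up to the configured maximum:

xquery version "1.0-ml";
(: Raise this request's timeout to 120 seconds; the App Server "max time limit" still caps it :)
xdmp:set-request-time-limit(120)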

      For example, on the "Configure" tab of my "App-Services" app server, you can see that the "default time limit" is set to 10 minutes (600 seconds), and the longest any query can allow itself to run (by setting its own request time limit) is one hour (3600 seconds):

      screenshot9.jpg

       

      Update and delete operations can be performance intensive and have negative effects on search performance when done in a conventional way, where data is updated or deleted in-place. To avoid these performance impacts during update and delete operations, MarkLogic Server updates and deletes "lazily."

      In MarkLogic Server, when you delete a document, it is not removed from disk immediately as that document's fragments are instead marked as "obsolete." Marking a document as obsolete tags its fragments for later removal, and also hides its fragments from subsequent query results. Updates happen in a similar way, where instead of updating in-place, MarkLogic Server marks the old versions of the fragments in an old stand as "obsolete" for later deletion, while also creating new versions of those fragments in a new stand (initially an in-memory stand, which is eventually written down as a new on-disk stand).

      Eventually, merges occur to move any unchanged fragments from an old stand into a new stand. Old fragments marked obsolete are ultimately deleted after the merge creating the new stand finishes, where the old stands that were used as input into that merge are finally removed from disk. Merging is very important - this is the mechanism by which MarkLogic Server both frees up disk space and optimizes its on-disk data structures, as well as reduces the number of fragments evaluated during its queries and searches.

      While lazy deletion results in faster updates and deletes, be aware that residual impacts can be seen in terms of both disk space and query performance if merges are not done in a timely manner.

      Further reading:

      Multi-Version Concurrency Control
      How do updates work in MarkLogic Server?
      ML Performance: Understanding System Resources

      Introduction

MarkLogic Server can be configured so that users are authenticated using an external authentication protocol, such as Lightweight Directory Access Protocol (LDAP) or Kerberos. These external agents serve as centralized points of authentication or repositories for user information from which authorization decisions can be made. If, after following the configuration instructions in our documentation, authentication does not work as expected, this article gives some additional debugging ideas.

      Details

The following areas should be checked when your LDAP authentication is not working as expected:

1. Verify that the cyrus-sasl-md5 library is installed on each MarkLogic Server node.

2. Run the following LDAP search command to check whether the LDAP server is properly set up:

ldapsearch -H ldap://{Your LDAP Server URI}:389 -x -s base

a. Once you run the ldapsearch command, make sure digest-md5 is supported:

      supportedSASLMechanisms: DIGEST-MD5

      b. Identify the correct LDAP Service name:

      e.g ldapServiceName: MLTEST1.LOCAL:dc1$@MLTEST1.LOCAL


      3. On Windows platforms, the services.keytab file is created using Active Directory Domain Services (AD DS) on a Windows server. If you are using Active Directory Domain Services (AD DS) on a computer that is running Windows Server 2008 or Windows Server 2008 R2, be sure that you have installed the hot fix described in http://support.microsoft.com/kb/975697.

      Introduction: the issue

MarkLogic performs nested lookups on the LDAP groups assigned to a user to determine which roles the user will be assigned. If the groups belong to multiple Active Directory Domains within a federated Active Directory Forest, then MarkLogic user authorization can fail with a subordinate Referral error, as seen below:

      2019-07-30 13:27:23.002 Notice: XDMP-LDAP: ldap_search_s failed on ldap server ldap://ad1.myhost.com:389: Referral (10)

      Cause

MarkLogic has been configured to connect to the local Domain Controller LDAP ports 389 (LDAP) or 636 (LDAPS); however, a local Domain Controller can only search domains to which it has access.

      Example

      A user is a member of the following groups which belong to two separate Active Directory domains, subA, and subC.

      Using a Local Domain Controller for subA for external authorization would result in a login failure when attempting to perform the nested group lookup for the domain subC

      member=CN=Group Onw,OU=OrgUnitAGroups,OU=OrgUnitA,DC=subA,DC=domain
      member=CN=Group Two,OU=OrgUnitAGroups,OU=OrgUnitA,DC=subA,DC=domain
      member=CN=Group Three,OU=OrgUnitCGroups,OU=OrgUnitC,DC=subC,DC=domain

      Solution

If you have multiple Active Directory Domains federated into an Active Directory forest, you should use the Global Catalog port 3268 (LDAP) or 3269 (LDAPS) to prevent failures when searching for group memberships that are defined in other domains.

      Optional workaround

A large number of nested groups can potentially lead to a decrease in login performance. If you do not need to rely on nested lookups to determine group membership for MarkLogic roles (i.e. all of the groups required are returned by the initial user search request), then you should consider setting the "ldap nested lookup" parameter to false in the External Security configuration.

      Doing this would also prevent subordinate domain searches and allow you to continue to use an Active Directory Domain Controller instead of switching to the Global Catalog.

      Further reading

      Summary

      A leap second, as defined by wikipedia is "a one-second adjustment that is occasionally applied to Coordinated Universal Time (UTC) in order to keep its time of day close to the mean solar time. Without such a correction, time reckoned by Earth's rotation drifts away from atomic time because of irregularities in the Earth's rate of rotation."  At the time of this writing, the next leap second to be inserted is on June 30, 2015 at 23:59:60 UTC.

For systems that use the Network Time Protocol (NTP) to synchronize the network time across all the hosts in their MarkLogic cluster, the MarkLogic Server software is not impacted by the leap second (i.e. we expect everything to work fine at the MarkLogic layer).

For systems where synchronization of the system clocks requires UTC time to be set backwards, the change must be accounted for anywhere time-dependent data is stored. In this case, we recommend that our customers implement NTP in their environment. Otherwise, the application layer will need to handle discontinuous time.

      Transactional Consistency

      The algorithm that MarkLogic Server uses to maintain transactional consistency of data is not wall clock dependent and, as such, is not affected by the leap second.

      Network Time Protocol (NTP)

NTP works very hard not to make time go backwards, as clock readings are constrained to always increase - every reading increases the NTP clock. NTP adjusts things gradually by slowing down or speeding up the clock, and not by making discrete changes unless time is off by a lot. A second is not a lot; an hour is a lot. Regardless of the leap second, adjustments for computer clock drift can easily be more than a second and happen frequently.

      When Time Goes Backwards

Without NTP and left on their own, computer clocks are really not that accurate. If synchronizing the system clocks on the hosts of a MarkLogic cluster requires the clocks to be set backwards, then the application layer will need to account for and handle discontinuous date-times in its data.

      Beginning with MarkLogic Server version 8,  the temporal feature was introduced.  If the system clock is adjusted backwards, there are conditions where temporal document inserts and updates will fail with an appropriate error code.  This is by design and expected.

      Our recommendation is to implement NTP on all hosts of a MarkLogic cluster to eliminate the need to handle discontinuous time at the application layer. 

      Further Reading

Red Hat article on the Leap Second - https://access.redhat.com/articles/15145

Microsoft Support article on the Leap Second - http://support.microsoft.com/kb/909614

       

      Summary

The internal mechanisms MarkLogic Server uses to implement security are query constraints. Lexicon search performance may be impacted by these security query constraints. If performed with admin credentials, lexicon searches will not be impacted by the security query constraints.

      Detail

Query time grows proportionately with the number of matches for a given search across a set of documents (not the actual number of documents in your database). The presence of security constraints will contribute a significantly larger number of matches than if the same lexicon search were performed with admin credentials. In order to minimize the number of matches (and therefore query time) for a given lexicon search, you'll want to amp your lexicon searches to an admin user.

      For MarkLogic Server v6.0, the absolute maximum number of MarkLogic Servers in a Cluster is 256, but the optimum is around 64.

      Summary

      MarkLogic recommends the default "ordered" option for Linux ext3 and ext4 file-systems.

File system administrators in Linux are tempted to use the data=writeback option to achieve higher throughput from their file system, but this comes with the side effects of potential data corruption and data-security breaches. This article explains both file system options with respect to MarkLogic Server.

      "data=ordered"

The Linux ext3 and ext4 file systems have a default data option of "ordered", which writes data to the main file system before committing metadata to the journal.

      https://www.kernel.org/doc/Documentation/filesystems/ext4.txt

      https://www.kernel.org/doc/Documentation/filesystems/ext3.txt

Both of these file systems go the extra mile to protect your files: with the default data=ordered, the data associated with the metadata is written as well, assuring file system integrity at the application layer - essential for MarkLogic Server data integrity.

      "data=writeback"

Other journaled file systems, like XFS and JFS, journal only metadata to disk; to make ext3 and ext4 behave like XFS and other such journaling file systems, an administrator could set 'data=writeback' in their mount options.

The 'data=writeback' mode does not preserve data ordering when writing to the disk, so commits to the journal may happen before the data is written to the file system. This method is faster because only the metadata is journaled, but it is not good at protecting data integrity in the face of a system failure.

If there is a crash between the time when metadata is committed to the journal and when data is written to disk, the post-recovery metadata can point to incomplete, partially written or incorrect data on disk, which can lead to corrupt data files. Additionally, data which was supposed to be overwritten in the file system could be exposed to users, resulting in a security risk.

      Linus Torvalds comments on 'data=writeback'

      "it makes things much smoother, since now the actual data is no longer in the critical path for any journal writes, but anybody who thinks that's a solution is just incompetent.  We might as well go back to ext2 then. If your data gets written out long after the metadata hit the disk, you are going to hit all kinds of bad issues if the machine ever goes down."   - http://thread.gmane.org/gmane.linux.kernel/811167/focus=811654

       

      Introduction

      Here we discuss management of temporal documents.

      Details

      In MarkLogic, a temporal document is managed as a series of versioned documents in a protected collection. The ‘original’ document inserted into the database is kept and never changes. Updates to the document are inserted as new documents with different valid and system times. A delete of the document is also inserted as a new document.

      In this way, a temporal document always retains knowledge of when the information was known in the real world and when it was recorded in the database.

APIs

      By default the normal xdmp:* document functions (e.g., xdmp:document-insert) are not permitted on temporal documents.

      The temporal module (temporal:* functions; see Temporal API) contains the functions used to insert, delete, and manage temporal documents.

      All temporal updates and deletes create new documents and in normal operations this is exactly what will be desired.

      See also the documentation: Managing Temporal Documents.
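
As a sketch of what a temporal insert looks like (assuming a temporal collection named "orders-temporal" has already been created with its system and valid axes configured; the URI, element names and content are illustrative):

xquery version "1.0-ml";
import module namespace temporal = "http://marklogic.com/xdmp/temporal" at "/MarkLogic/temporal.xqy";

(: Each update through the temporal API creates a new version rather than changing the original :)
temporal:document-insert("orders-temporal", "/orders/order-1.xml",
  <order>
    <id>1</id>
    <valid-start>2016-01-01T00:00:00Z</valid-start>
    <valid-end>2017-01-01T00:00:00Z</valid-end>
  </order>)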

      Updates and deletes outside the temporal functions

      Note: normal use of the temporal feature will not require this sort of operation.

      The function temporal:collection-set-options can be used with the updates-admin-override option to specify that users with the admin role can change or delete temporal documents using non-temporal functions, such as xdmp:document-insert and xdmp:document-delete.

For example, you may need to run a CoRB job or other administrative transform without updating the system dates on the documents; say, changing the values M/F to Male/Female.

       

      Introduction

When CPF is installed on a database, a number of new documents are created in the Triggers database associated with that database.

      This Knowledgebase article is designed to show you what CPF creates on install, in the event that you want to safely disable and remove it from your system.

      Getting started

      Below is a layout of all databases and their associated document counts with a clean install of MarkLogic 9.0-2:

Database ID              Database Name   Document Count
      8723423541597683063 App-Services 14
      12316032390759111212 Modules 0
      1695527226691932315 Fab 0
      11723073009075196192 Security 1526
      15818912922008798974 Triggers 0
      5212638700134402198 Documents 0
      4320540002505594119 Extensions 0
      9023394855382775954 Last-Login 0
      11598847197347642387 Schemas 0
      12603105430027950215 Meters 48

      Adding CPF

      After installing CPF on the Documents database (with conversion enabled), we now see:

Database ID              Database Name   Document Count
      8723423541597683063 App-Services 15
      12316032390759111212 Modules 0
      1695527226691932315 Fab 0
      11723073009075196192 Security 1526
      15818912922008798974 Triggers 39
      5212638700134402198 Documents 0
      4320540002505594119 Extensions 0
      9023394855382775954 Last-Login 0
      11598847197347642387 Schemas 0
      12603105430027950215 Meters 498

If we ignore Meters and App-Services, we can see that, by default, a CPF install will create a number of documents in the Triggers database:

      /cpf/domains.css
      /cpf/pipelines.css
      http://marklogic.com/cpf/configuration/configuration.xml
      http://marklogic.com/cpf/domains/4361761515557042908.xml
      http://marklogic.com/cpf/pipelines/10451885084298751684.xml
      http://marklogic.com/cpf/pipelines/11486027894562997537.xml
      http://marklogic.com/cpf/pipelines/1182872541253698578.xml
      http://marklogic.com/cpf/pipelines/11925472395644624519.xml
      http://marklogic.com/cpf/pipelines/12665626287133680551.xml
      http://marklogic.com/cpf/pipelines/12977232154552215987.xml
      http://marklogic.com/cpf/pipelines/13371411038103584886.xml
      http://marklogic.com/cpf/pipelines/13468360248543629252.xml
      http://marklogic.com/cpf/pipelines/13721894103731640519.xml
      http://marklogic.com/cpf/pipelines/14473927355946353823.xml
      http://marklogic.com/cpf/pipelines/16071401642383641119.xml
      http://marklogic.com/cpf/pipelines/17008133204004114953.xml
      http://marklogic.com/cpf/pipelines/1707825679528566193.xml
      http://marklogic.com/cpf/pipelines/17486255598951175231.xml
      http://marklogic.com/cpf/pipelines/1789191734187967847.xml
      http://marklogic.com/cpf/pipelines/2145494300111008849.xml
      http://marklogic.com/cpf/pipelines/2272288885870389220.xml
      http://marklogic.com/cpf/pipelines/2585221667797881502.xml
      http://marklogic.com/cpf/pipelines/4684095308382280821.xml
      http://marklogic.com/cpf/pipelines/6055693256331806191.xml
      http://marklogic.com/cpf/pipelines/7250675434061295808.xml
      http://marklogic.com/cpf/pipelines/7354167915842037706.xml
      http://marklogic.com/cpf/pipelines/7492839190910743342.xml
      http://marklogic.com/cpf/pipelines/8329675320036351600.xml
      http://marklogic.com/cpf/pipelines/8537493622930387355.xml
      http://marklogic.com/cpf/pipelines/8877791654658876902.xml
      http://marklogic.com/cpf/pipelines/8988716724908642408.xml
      http://marklogic.com/cpf/pipelines/9432621469736814202.xml
      http://marklogic.com/xdmp/triggers/10905847201437369653
      http://marklogic.com/xdmp/triggers/11663386212502595308
      http://marklogic.com/xdmp/triggers/12471659507809075185
      http://marklogic.com/xdmp/triggers/15932603084768890631
      http://marklogic.com/xdmp/triggers/16817738273312375366
      http://marklogic.com/xdmp/triggers/17731123999892629453
      http://marklogic.com/xdmp/triggers/6779751200800194600

      Files created by CPF

      http://marklogic.com/cpf/configuration

      One of these files is the CPF configuration.xml file

      http://marklogic.com/cpf/domains

      One of these documents describes the default domain which is created when CPF is installed:

      Default Documents
      http://marklogic.com/cpf/pipelines

      Of the 39 files created, we can see from the URI listing above that the majority (28) of these are prefaced with http://marklogic.com/cpf/pipelines. These files describe each of the standard conversion pipelines that ship with the server. These are:

      Alerting
      Alerting (spawn)
      Calais Entity Enrichment Sample
      Conversion Processing
      Conversion Processing (Basic)
      Data Harmony Enrichment Sample
      DocBook Conversion
      Document Filtering (Properties)
      Document Filtering (XHTML)
      Entity Enrichment
      Flexible Replication
      HTML Conversion
      Janya Entity Enrichment Sample
      MS Office Conversion
      Office OpenXML Extract
      PDF Conversion
      PDF Conversion (Image Batching)
      PDF Conversion (Page Layout with Reblocking)
      PDF Conversion (Page Layout, Image Batching)
      PDF Conversion (Page Layout)
      PDF Conversion (Paged Text, No Rendering)
      Schema Validation
      SRA NetOwl Entity Enrichment Sample
      Status Change Handling
      Temis Entity Enrichment Sample
      WordprocessingML Process
      XHTML Conversion Processing
      XInclude Processing
      http://marklogic.com/xdmp/triggers

      Seven of the files are triggers - all of which are namespaced with the cpf prefix:

      cpf:any-property Default Documents
      cpf:create Default Documents
      cpf:delete Default Documents
      cpf:restart
      cpf:state Default Documents
      cpf:status Default Documents
      cpf:update Default Documents

      Removing the core files created when CPF was initially installed will disable it from further functioning in your environment.

      Scripting the removal of default CPF components

This GitHub gist demonstrates a method for removing CPF configuration from a given database - in the example below, the "Triggers" database is specified:

      Introduction

      If you have an existing MarkLogic Server cluster running on EC2, there may be circumstances where you need to upgrade the existing AMI with the latest MarkLogic rpm available. You can also add a custom OS configuration.

      This article assumes that you have started your cluster using the CloudFormation templates with Managed Cluster feature provided by MarkLogic.

      Procedure
To manually upgrade the MarkLogic AMI, follow these steps:

      1. Launch a new small MarkLogic instance from the AWS MarketPlace, based on the latest available image. For example, t2.small based on MarkLogic Developer 9 (BYOL). The instance should be launched only with the root OS EBS volume.
      Note: If you are planning to leverage the PAYG-PayAsYouGo model, you must choose MarkLogic Essential Enterprise.
      a. Launch a MarkLogic instance from AWS MarketPlace, click Select and then click Continue:

      b. Choose instance type. For example, one of the smallest available, t2.small
      c. Configure instance details. For example, default VPC with a public IP for easy access
      d. Remove the second EBS data volume (/dev/sdf)
      e. Optional - Add Tags
      f. Configure Security Group - only SSH access is needed for the upgrade procedure
      g. Review and Launch
      Review step - AWS view:

      2. SSH into your new instance and switch the user to root in order to execute the commands in the following steps.

      $ sudo su -

      Note: As an option, you can also use "sudo ..." for each individual command.

      3. Stop MarkLogic and uninstall MarkLogic rpm:

      $ service MarkLogic stop
      $ rpm -e MarkLogic

      4. Update-patch the OS:

      $ yum -y update

      Note: If needed, restart the instance (For example: after a kernel upgrade/core-libraries).
      Note: If you would like to add more custom options/configuration/..., they should be done between steps 4 and 5.

      5. Install the new MarkLogic rpm
a. Upload the MarkLogic rpm to the instance (for example, via "scp" or S3)
      b. Install the rpm:

      $ yum install [<path_to_MarkLogic_RPM>]/[MarkLogic_RPM]

      Note: Do not start MarkLogic at any point of AMI's preparation.

      6. Double check to be sure that the following files and log traces do not exist. If they do, they must be deleted.

      $ rm -f /var/local/mlcmd.conf
      $ rm -f /var/tmp/mlcmd.trace
      $ rm -f /tmp/marklogic.host

      7. Remove artifacts
      Note: Performing the following actions will remove the ability to ssh back into the baseline image. New credentials are applied to the AMI when launched as an instance. If you need to add/change something, mount the root drive to another instance to make changes.

      $ rm -f /root/.ssh/authorized_keys
$ rm -f /home/ec2-user/.ssh/authorized_keys
      $ rm -f /home/ec2-user/.bash_history
      $ rm -rf /var/spool/mail/*
      $ rm -rf /tmp/userdata*
      $ rm -f [<path_to_MarkLogic_RPM>]/[MarkLogic_RPM]
      $ rm -f /root/.bash_history
      $ rm -rf /var/log/*
      $ sync

      8. Optional - Create an AMI from the stopped instance.[1] The AMI can be created at the end of step 7.

      $ init 0

      [1] For more information: https://docs.aws.amazon.com/toolkit-for-visual-studio/latest/user-guide/tkv-create-ami-from-instance.html

      At this point, your custom AMI should be ready and it can be used for your deployments. If you are using multiple AWS regions, you will have to copy the AMI as needed.

      Additional references:
      [2] Upgrading the MarkLogic AMI - https://docs.marklogic.com/8.0/guide/ec2/managing#id_69624

      Introduction

      A powerful new feature was added to MarkLogic 8 - the ability to build applications around a declarative HTTP rewriter. You can read more about MarkLogic Server's HTTP rewriter and some of the new features it provides in our documentation.

      This article will cover some basic tips for debugging applications that make use of this feature.

      Validating your rewriter rules (Using XML Schema)

      The rewriter adheres to an XML Schema. At runtime the rewriter is not validated against this schema; this is by design so that potentially minor errors don't risk taking your application offline. As a best practice, we recommend validating your rewriters manually every time you make a change. In order to do this, you can use MarkLogic Server or any other tool that supports XML validation (the schema is standard XSD 1.0).  If you want to view the schema, it's copied to Config/rewriter.xsd when you install the product.

      In order to validate from within MarkLogic using XQuery you can simply execute:

      validate { fn:doc("/path/to/your/rewriter.xml") }

      The above will validate the XML if your rewriter rules are stored in a database. If you're using the filesystem, you can use xdmp:document-get instead.
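
For example, if the rewriter file lives on the filesystem of the host you are connected to (the path below is just a placeholder):

validate { xdmp:document-get("/space/app/rewriter.xml") }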

      Alternatively, you can copy / paste the XML body into Query Console and wrap it with a call to validate as below:

      validate { * Paste your rewriter rules here * }

      The above approach should work without any issue as long as there is no content in your rewriter XML that contains any XQuery reserved syntax.

General rewriter debugging and tracing

For simple "print"-style debugging, you can manually add trace statements at any point an eval rule is allowed, like this:

      <trace event="customevent">data</trace>

      Then enable diagnostics (in your group settings) and add "customevent"; your custom trace will now show up in ErrorLog.txt whenever that endpoint is accessed. To read more on the use of trace events in your applications, refer to this Knowledgebase article

There is also error code handling:

      <error code="MYAPP-EXCEPTION" data1="value1" data2="... 

      You can also add ids - these will be traced out - which may aid debugging

      <match id="match-id-for-myregex" regex=".* ...

      Useful diagnostic trace events

      Note that additional trace events can generate a lot of data and may slow your application down, so make sure these do not get left on in a production-critical environment

      Below are some trace events you can use and a brief description of what each trace event does:

Rewriter Parser - Details of the parsing of the rewriter XML file
Rewriter Evaluator - Execution traces of rules as evaluated
Rewriter Evaluator Verbose - Additional (more verbose) tracing
Declarative Rewriter - Entry points into and out of the rewriter from the app server request handler
Rewriter Print Rules - After parsing and validation of the rewriter, a full dump of the internal data structures that resulted

      Additional points to note

      Use of the "Evaluator" traces will write to the ErrorLog.txt on every request.

      The "Parser" trace event will only occur once or upon updating your rewriter.

      Introduction

Prior to the 9.0-9 release, MarkLogic provided support for the Oracle JDK 8. However, Oracle has recently announced the End of Public Updates of Java SE 8.

      What can we expect from MarkLogic?

      MarkLogic will support OpenJDK 9, OpenJDK 10 and OpenJDK 11 starting with MarkLogic Server 9.0-9 and associated products.

      These products include:

      From the 9.0-9 release onwards, we will no longer QA test our products with Oracle JDK.

We will support the Amazon Corretto JDK as part of our Amazon offerings. Corretto meets the Java SE standard and is certified compliant by AWS using the Java Technical Compatibility Kit.

      The latest version of MarkLogic Server is available to download from:

      http://developer.marklogic.com/products

      JDK Requirements for Data Hub Framework (DHF) Users

      Requirements are discussed in further detail in the DHF documentation, however it's important to note that versions of DHF prior to the 5.2 release require Java 8.

      Summary

The default configuration of MarkLogic Application Servers is not vulnerable to the FREAK SSL attack.

      What is the FREAK SSL attack?

      Tuesday 2015/03/03 - Researchers of miTLS team (joint project between Inria and Microsoft Research) disclosed a new SSL/TLS vulnerability — the FREAK SSL attack (CVE-2015-0204). The vulnerability allows attackers to intercept HTTPS connections between vulnerable clients and servers and force them to use ‘export-grade’ cryptography, which can then be decrypted or altered.

      Read more about the FREAK SSL attack.

      Testing a webserver

      You can verify whether a webserver is attackable by the FREAK attack with this free SSL vulnerability checker.

      FIPS

      MarkLogic Server uses FIPS-capable OpenSSL to implement the Secure Sockets Layer (SSL v3) and Transport Layer Security (TLS v1) protocols. When you install MarkLogic Server, FIPS mode is enabled by default and SSL RSA keys are generated using secure FIPS 140-2 cryptography. This implementation disallows weak ciphers and uses only FIPS 140-2 approved cryptographic functions. Read more about OpenSSL FIPS mode in MarkLogic Server, and how to configure it.

      As long as FIPS mode was not explicitly disabled, MarkLogic Application Servers are not vulnerable to the FREAK SSL attack. 
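
To confirm the current setting on your cluster, you can run a quick check from Query Console. The sketch below assumes the standard Admin API and simply reports whether FIPS mode is enabled:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
(: Returns true when cluster-wide FIPS mode is enabled (the default). :)
admin:cluster-get-ssl-fips-enabled(admin:get-configuration())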

      OpenSSL

Eliminating the vulnerability for all configurations requires an update to the OpenSSL library. MarkLogic Server continually updates the implementation version of the OpenSSL library, so every MarkLogic Server maintenance release published after the discovery of this vulnerability will include an OpenSSL version that is not vulnerable to the FREAK attack.

      Conclusion

      As long as FIPS mode is enabled, which is the default configuration, MarkLogic Application Servers are not vulnerable to the FREAK SSL attack

       

      Summary

MarkLogic 9 introduces Certificate based User Authentication, which allows users to log into MarkLogic Server without being required to enter a user name and password. In previous versions, certificates were only used to restrict client access to MarkLogic Server with the Digest/Basic User Authentication scheme. In addition to Certificate based User Authentication using internal user and external name verification, MarkLogic 9 also permits authenticating and authorizing user certificates against an LDAP or Active Directory server, granting access based on MarkLogic roles and LDAP group membership. By using this method of authentication and authorization, a site is able to maintain all user access externally, without the need to manage a separate set of users within the MarkLogic security database.


This document expands on the concepts and configuration examples described in the associated "MarkLogic Certificate based User Authentication" knowledge base article and shows the additional steps required to configure MarkLogic to authorize a user certificate against an LDAP or Active Directory server. It is highly recommended that you familiarize yourself with that article first, as it covers in more detail the steps required to set up the MarkLogic App Server so that TLS Client Authentication is configured correctly to request and verify the certificates that may be presented by the user.

      Creating the External Security definition

      To authorize users presenting a certificate you should first create a new External Security definition selecting “Certificate” for authentication and LDAP for authorization.

       ExternalSecurity.png

      Next, configure the LDAP server entry.

      LDAPServer.png

      Notes:

• Unlike standard user authorization, when MarkLogic searches for the user certificate it performs a base object search using the full certificate distinguished name, rather than a sub-tree search off the "ldap base". The MarkLogic UI currently requires an entry for the "ldap base" even though it is not used, so you will need to enter a dummy value to satisfy UI validation.
• When performing the LDAP search, MarkLogic will request the "ldap attribute" value to use when creating the temporary userid. Take care when selecting this value to ensure that it is unique for all possible certificate DNs that may be presented.
• Ensure that the "ldap default user" has the required permissions to search for the certificate within the LDAP or Active Directory server and return the required attributes.
• MarkLogic uses the "memberOf" and "member" attributes to return group and group-of-group membership. If your LDAP or Active Directory server uses different attributes, such as "isMemberOf", you can override them in the "memberOf" and "member" attribute fields.

      Configuring the App Server

      Configure the App Server to use “certificate” authentication, set “Internal Security” to false and select the external security definition created above.

      AppServer1.png

Enable TLS Client Authentication and configure the SSL Client Certificate authorities that you will accept to sign the user certificates. Any certificate presented that is not signed by one of the specified CAs will be rejected.

      AppServer2.png 

      AppServer3.png

For more details on configuring the CA certificates required for certificate based authentication, please refer to the knowledge base article "MarkLogic Certificate based User Authentication".

      Configure MarkLogic Security Roles

      For each role specify one or more external names that match the “memberOf” attribute returned for the Certificate DN.

      Role.png
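
If you prefer to script this step instead of using the Admin UI, the sketch below (run against the Security database) assigns an external name to a role using sec:role-set-external-names; the role name and group DN are the example values used in this article:

(: execute this against the security database :)
xquery version "1.0-ml";
import module namespace sec = "http://marklogic.com/xdmp/security"
    at "/MarkLogic/security.xqy";
(: The role name and group DN below are example values - substitute your own. :)
sec:role-set-external-names(
  "mladmin",
  "cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local"
)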
       
To confirm that users are being authorized to the MarkLogic App Server correctly, connect using your browser or a command line tool such as cURL.

      MacPro-4505:~ $ curl -k --cert ./mluser1.p12:password https://localhost:8013
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
      <head>
      <title>Welcome to the MarkLogic Test page.</title>
      </head>
      <body><p>This application is running on MarkLogic Server version 9.0-1.1</p></body>

       
      Within the AppServer AccessLog, you should see a mapping for a new temporary userid to the expected role.

      External User(mluser1) is Mapped to Temp User(mluser1) with Role(s): mladmin
      ::1 - mluser1 [18/Jul/2017:16:07:05 +0100] "GET / HTTP/1.1" 200 347 - "curl/7.51.0"

      Troubleshooting

If a user is not able to connect using their certificate, the first thing to check is whether the Certificate Distinguished Name (DN) can be found in the LDAP or Active Directory database and whether it contains the required userid and memberOf attributes.

      Using a tool such as OpenSSL determine the correct Subject Certificate DN, e.g.

      MacPro-4505:~ $ openssl x509 -in mluser1.pem -text
      Certificate:
      Data:
      Version: 3 (0x2)
      Serial Number: 1497030421 (0x593adf15)
      Signature Algorithm: sha256WithRSAEncryption
      Issuer: CN=User Signing Authority, O=MarkLogic, OU=Support
      Validity
      Not Before: Jun 9 17:47:13 2017 GMT
      Not After : Jun 9 17:47:13 2018 GMT
      Subject: CN=mluser1, OU=Users, DC=MarkLogic, DC=Local
       
      Next using an LDAP lookup tool such as “ldapsearch” or "ldp.exe" on Microsoft Windows, perform a base Object search for the Certificate DN requesting the LDAP user and memberOf attribute (with the entries matching your LDAP External Security settings).

If either the userid or memberOf attribute is missing, access will be denied.


      MacPro-4505:~ $ ldapsearch -H ldap://192.168.66.240:389 -x -D "cn=manager,dc=marklogic,dc=local" -W -s base -b "cn=mluser1,ou=Users,dc=MarkLogic,dc=Local" "memberOf" "cn"
      # extended LDIF
      #
      # LDAPv3
      # base <cn=mluser1,ou=Users,dc=MarkLogic,dc=Local> with scope baseObject
      # filter: (objectclass=*)
      # requesting: memberOf uid
      #
      # mluser1, Users, MarkLogic.Local
      dn: cn=mluser1,ou=Users,dc=MarkLogic,dc=Local
      uid: mluser1
      memberOf: cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local
      # search result
      search: 2
      result: 0 Success
       
If MarkLogic is able to successfully locate the certificate and return the required attributes, then check whether the external names in the security role match (case-sensitively) the "memberOf" attribute returned by the LDAP search.

      The following XQuery can be used to show all the external names assigned to a specific role. 


      (: execute this against the security database :)
      xquery version "1.0-ml";
      import module namespace sec = "http://marklogic.com/xdmp/security"
          at "/MarkLogic/security.xqy";
      sec:role-get-external-names("mladmin")


      Result

      cn=AppAdmin,ou=Groups,dc=MarkLogic,dc=Local


If MarkLogic is still not able to authenticate users, it is very useful to use a packet capture tool such as Wireshark to check whether MarkLogic is able to contact the LDAP or Active Directory server and is receiving the expected successful admin bind and search for the Certificate DN.

      The following example trace shows a successful BIND using the LDAP Default user followed by a successful search for the Certificate DN.

      LDAPWireshark.png

      Further Reading

      Summary

MarkLogic 9 introduces Certificate based User Authentication, which allows users to log into MarkLogic Server without being required to enter a user name and password. In previous versions, certificates were only used to restrict client access to MarkLogic Server with the Digest/Basic User Authentication scheme. Certificate based User Authentication can be configured using either Internal User or External Name based user configurations.

      Certificate Authentication: Internal User vs External Name based Authentication:

The difference between Internal User and External Name based authentication lies in whether the user named in the certificate's CN field (demoUser1 in our example) exists as an internal user in the MarkLogic Security database, or whether the user retrieved from the certificate's Subject field (the whole Subject field as a DN) is mapped as an External Name value on an existing user.

      User Certificate Example:

A few common steps and examples are listed below to add clarity. For our example setup, the certificate presented by the App Server user (demoUser1) will be as follows.

      $ openssl x509 -in UserCert.pem -text -noout
      Certificate:
          Data:
              Version: 1 (0x0)
              Serial Number: 7 (0x7)
          Signature Algorithm: sha1WithRSAEncryption
              Issuer: C=US, ST=NY, L=New York, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic DemoCA
              Validity
                  Not Before: Jul 11 02:58:24 2017 GMT
                  Not After : Aug 27 02:58:24 2019 GMT
              Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic Corporation, OU=Engineering, CN=demoUser1
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (1024 bit)
                      Modulus:
                          .....................
                      Exponent: 65537 (0x10001)
          Signature Algorithm: sha1WithRSAEncryption

      CA Certificate (User Cert Signer) Import from Admin GUI

In order to allow MarkLogic Server to accept the certificate presented by a user, the Certificate Authority (CA) certificate used to sign the user certificate needs to be installed into MarkLogic. We can install the CA certificate (below) used to sign the demoUser1 certificate using the Admin GUI -> Configure -> Security -> Certificate Authority Import tab.

      $ openssl x509 -in CACert.pem -text -noout
      Certificate:
          Data:
              Version: 3 (0x2)
              Serial Number: 9774683164744115905 (0x87a6a68cc29066c1)
          Signature Algorithm: sha256WithRSAEncryption
              Issuer: C=US, ST=NY, L=New York, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic DemoCA
              Validity
                  Not Before: Jul 11 02:53:18 2017 GMT
                  Not After : Jul  6 02:53:18 2037 GMT
              Subject: C=US, ST=NY, L=New York, O=MarkLogic Corporation, OU=Engineering, CN=MarkLogic DemoCA
              Subject Public Key Info:
                  Public Key Algorithm: rsaEncryption
                      Public-Key: (4096 bit)
                      Modulus:
                         ......................
                      Exponent: 65537 (0x10001)
              X509v3 extensions:
                  X509v3 Subject Key Identifier:
                      D9:45:B9:9A:DC:93:7B:DB:47:07:C6:96:63:57:13:A7:A8:F1:D0:C8
                  X509v3 Authority Key Identifier:
                      keyid:D9:45:B9:9A:DC:93:7B:DB:47:07:C6:96:63:57:13:A7:A8:F1:D0:C8
                  X509v3 Basic Constraints: critical
                      CA:TRUE
                  X509v3 Key Usage: critical
                      Digital Signature, Certificate Sign, CRL Sign
          Signature Algorithm: sha256WithRSAEncryption

      CA Certificate Import into MarkLogic from Query Console

We can also import the above Certificate Authority with an XQuery call to pki:insert-trusted-certificates to load the trusted CA into MarkLogic. The sample Query Console code below demonstrates this process.

      (Please ensure this query is executed against the Security database)
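
A minimal sketch of that call is shown below; the file path is a placeholder for wherever your CACert.pem file resides:

xquery version "1.0-ml";
import module namespace pki = "http://marklogic.com/xdmp/pki"
    at "/MarkLogic/pki.xqy";
(: Load the CA certificate (PEM) and register it as a trusted certificate.
   The file path below is a placeholder - point it at your CACert.pem. :)
pki:insert-trusted-certificates(
  xdmp:document-get("/path/to/CACert.pem",
    <options xmlns="xdmp:document-get">
      <format>text</format>
    </options>)
)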

      Certificate Template & Template CA import into Client (Browser/SSL Client)

To enable an SSL App Server, we will either

1) Create a Certificate Template to utilize a self-signed certificate,

or, 2) Import a pre-signed certificate into MarkLogic.

In both of the above cases, we will need to import the CA used to sign the certificate used by the MarkLogic SSL App Server into the client browser/SSL client (example clients below).

      Importing a Self Signed Certificate Authority into Mozilla Firefox 

      Importing a Self Signed Certificate Authority into Windows

Once the template is created, we will link it with our App Server to enable an SSL-based App Server.

      Certificate Authentication: CN as Internal User vs External Name based Internal User

The difference between the above two lies in whether the user named in the certificate's CN field (demoUser1 in our example) exists in the MarkLogic Security database as an internal user, versus whether the user retrieved from the certificate's Subject field is mapped as an External Name on an existing user.

      1.) Certificate Authentication: Certificate CN field value as MarkLogic Security Database Internal User

      Steps to configure Certificate based User Authentication for our User demoUser1 as MarkLogic Internal User.

      a.) Create User "demoUser1" with necessary roles in MarkLogic Security (Internal User).

      DemoUser1_Internal_User.png

b.) On the App Server page, we will set the authentication scheme to "certificate" and internal security to "true". Also, unless you want to have some users authenticated as external users as well, you should leave the external security object set to "none".

      AppServer_Authentication_Certificate.png

c.) On the App Server, also select the CA that will be used to sign the client/user certificate as an accepted Certificate Authority (please see the CA Certificate section earlier for our example).

      ClientCert_CA.png

Once configured, a browser with the user certificate (demoUser1) installed will be able to log into the above App Server as the internal user demoUser1. (Note: we will also need to assign the necessary roles to the internal user to access resources as needed.)

      2.) Certificate Authentication: User Certificate Subject field value as External Name for Internal User

      Steps to configure Certificate based User Authentication for our User demoUser1 as MarkLogic External Name for Internal User "newUser1".

a.) Create the user "newUser1" with the necessary roles in MarkLogic Security (internal user), and configure the user certificate's Subject field as an External Name on that user.

      NewUser1_External_Name.png
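
As an alternative to setting the external name through the Admin UI, the sketch below (run against the Security database) maps the certificate Subject DN onto the internal user with sec:user-set-external-names; both the user name and the DN are the example values from this article, and the DN string should match the form MarkLogic reports for the presented certificate:

(: execute this against the security database :)
xquery version "1.0-ml";
import module namespace sec = "http://marklogic.com/xdmp/security"
    at "/MarkLogic/security.xqy";
(: Map the certificate Subject DN to internal user newUser1. Both values
   below are the example values from this article - substitute your own. :)
sec:user-set-external-names(
  "newUser1",
  "C=US, ST=NJ, L=Princeton, O=MarkLogic Corporation, OU=Engineering, CN=demoUser1"
)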

      b.) Create an External Security object with Certificate based Authentication.

      External_Sec_Object.png

c.) On the External Security object configuration itself, select the CA that will be used to sign the client/user certificate as an accepted Certificate Authority (please see the CA Certificate section earlier for our example).

Please note: the configuration below is different from configuring the client CA on the App Server (which is required for the internal user case).

      External_Sec_ClientCert_CA.png

d.) For External Name (certificate Subject field) based linkage to an internal user, the App Server needs to point to our External Security object.

      AppServer_ExternalSec_Link.png

      Summary

MarkLogic may fail to start with an XDMP-ENCODING error: Initialization: XDMP-ENCODING: (err:XQST0087) Unsupported character encoding: ascii. This is caused by a mismatch between the Linux locale character set and the UTF-8 character set required by MarkLogic.

      Solution

This issue occurs when the Linux locale LANG setting is not set to a UTF-8 locale. It can be resolved by setting the locale to "en_US.UTF-8". This should be done for the root user for default installations of MarkLogic. To change the system-wide locale settings, /etc/locale.conf needs to be modified, which can be done using the localectl command.

      • sudo localectl set-locale LANG=en_US.utf8

If MarkLogic is configured to run as a non-root user, then the locale can be set in that user's environment using the $HOME/.i18n file. If the file does not exist, please create it and ensure it contains the following:

      • export LANG="en_US.UTF-8"

      If that does not resolve the issue in the user environment, then you may need to look at setting LC_CTYPE, or LC_ALL for the locale.

      • LC_CTYPE will override the character set part of the LANG setting, but will not change other locale settings.
      • LC_ALL will override both LC_CTYPE and all locale configurations of the LANG setting.

      On Azure MarkLogic VM

      On an Azure MarkLogic VM, you may encounter the error when attempting to start the MarkLogic service.  The MarkLogic process is started using systemctl and not service. To start the service, use the following command:

      • sudo systemctl start MarkLogic

      References

      https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/ch-keyboard_configuration

      https://access.redhat.com/solutions/974273

      https://www.unix.com/man-page/centos/1/localectl/

      http://man7.org/linux/man-pages/man1/locale.1.html

      Summary
      Overlarge workloads, underprovisioned environments, or a combination of the two often result in false failovers - where MarkLogic Server will perceive an overloaded node as unavailable. Failover events redistribute the affected node’s traffic to the remaining nodes in the cluster. False failover events, unfortunately, redistribute an overloaded node’s workload to the likely similarly overloaded (and now even fewer number of) nodes remaining in the cluster. While it’s possible to mitigate this scenario in the short term by allowing more time for nodes to talk to one another, long term correction requires throttling of workloads, increasing the environment’s hardware provisioning, or a combination of the two.

      What does failover look like in MarkLogic Server?
      High availability systems require continuity within a cluster. MarkLogic Server delivers high availability by providing fault tolerance - if a node in a MarkLogic cluster fails, other nodes automatically pick up the workload so that the data stored in forests is always available. 

      More specifically, failover in MarkLogic Server is designed to address data node (“d-node”) or forest-level failure. D-node failures can include operating system crashes, MarkLogic Server restarts, power failures, or persistent system failures (hardware failures, for example). A forest-level failure is any disk I/O or other failure that results in an error state on the forest. 

      Failover in MarkLogic Server is "hot" in the sense that switchover occurs immediately to failover hosts already running in the same cluster, with no node restarts required. Failing back from a failover host to the primary host, however, needs to be done manually and requires a node restart.

      When a node is perceived as no longer communicating with the rest of the cluster, and a quorum of greater than 50% of the nodes in the cluster vote to remove the affected node, then a failover event will occur automatically. A node is defined to no longer be communicating with the rest of the cluster when that node fails to respond to cluster wide heartbeats within the defined host timeout.

      What does false failover look like in MarkLogic Server?
      False failover events in MarkLogic Server occur when a node is present and working, but so overloaded that it can no longer respond to cluster wide heartbeats within the specified host timeout. In other words, during false failover events the affected node is so busy that it is unable to communicate its status to the other nodes in the cluster, and consequently unable to prevent the other nodes from voting to remove it from the cluster.

There can be many reasons for a busy node or cluster. One cause that is often overlooked is the infrastructure itself, especially when virtualization is involved: virtualization lets you get more out of your resources by allowing VMs to share them, under the assumption that not all systems will need their assigned resources at the same time. However, if multiple VMs are under load at once, they can outstrip the available physical resources because more than 100% of those resources have been assigned to the VMs, causing what is called "resource starvation".

      What should I do about false failover events in MarkLogic Server?
Recall that a node is voted out when it can no longer respond to the rest of the cluster within the specified host timeout. It might be possible to mitigate false failovers in the short term by temporarily increasing the environment’s XDQP and host timeouts. Larger timeouts would give all the nodes in the cluster more time to respond to clusterwide heartbeats, which under heavy load should decrease the frequency of false failover events. That said - DO NOT get in the habit of simply increasing your timeouts to larger and larger values. Increasing timeouts to avoid false failovers is, at best, a temporary, short-term tactic.

      Long term correction of false failover events requires better alignment between your system's workloads and its hardware provisioning. You could, for example, reduce the workload, or spread the same workload over more time, or increase your system’s hardware provisioning. All of these tactics would free up the affected nodes to respond to the clusterwide heartbeat in a more timely manner, thereby avoiding false failover events. You can read more about aligning your workloads and hardware footprint at:

      1. MarkLogic Performance: Understanding System Resources
      2. Performance Issues in MarkLogic Server: what they look like - and what you should do about them

      Further reading:

      MarkLogic Server is optimized for query performance - if you're coming from a relational database background, you might be surprised by how much storage and storage bandwidth might be used. To better understand this behavior, it's important to recall the following:

      Speed over storage savings - While it makes sense to minimize storage footprint from a storage utilization perspective, MarkLogic Server trades space for time to take advantage of rapidly falling storage prices.

      Lazy Deletes - To better prioritize query performance, in MarkLogic Server record deletions happen in the form of "lazy deletes" where the record (or "document") is first marked as "obsolete" and consequently hidden from query results. The work of actually deleting any one record is deferred for a later time, when multiple obsolete documents can be removed and your remaining data optimized all at the same time and in bulk during a merge operation.

Index on ingest - MarkLogic Server indexes documents as they're ingested. If your data model and index configuration are where they need to be, that means your data is ready to be queried as soon as it's in a MarkLogic Server database. If your index configuration isn't quite where you want it, however, revising it means reindexing your entire database, creating lots of obsolete documents and resulting in potentially multiple large merge operations. This is why it's always better in MarkLogic Server to optimize your index settings in smaller environments before propagating those index settings to your bigger environments, and why you'll want to do fewer, bigger index configuration changes instead of many small index configuration changes. Each index configuration change - regardless of size - will trigger a reindex, so you'll want to minimize the number of reindexes you need to perform instead of minimizing the number of changes in any one reindex.

        In addition to reindexing, other aspects of MarkLogic Server that take up significant storage bandwidth include:

        • Rebalancing - which redistributes your data across your database
        • Local disk failover/database replication - both make copies of your data, and those copies need their own resources
        • Backup/restore - backup is making a copy of your data, and restore is effectively a mass update of your data
        • Mass updates of existing documents - Because of the way updates are performed in MarkLogic Server (read more), updating a large number of existing records will create a large number of obsolete documents, and consequently result in lots of large merge operations. To help reduce performance overhead, and if you have no need to preserve attributes of your existing data (read more), you might want to consider simply reloading data into an empty database, instead (which would result in avoiding the creation of obsolete documents and consequent merges)

        References:

        MarkLogic Fundamentals of Resource Consumption
        Understanding MarkLogic Minimum Disk Space Requirements
        MarkLogic - Lazy Deletes
        Mass Updates - "node-replace" vs "document-insert"

        Introduction

        A MarkLogic cluster is a group of inter-connected individual machines (often called “nodes” or “hosts”) that work together to perform computationally intensive tasks. Clustering offers scalability and high-availability by avoiding single-points of failure. This knowledgebase article contains tips and best practices around clustering, especially in the context of scaling out.

        How many nodes should I have in a cluster?

        If you need high-availability, there should be a minimum of three nodes in a cluster to satisfy quorum requirements.

        Anything special about deploying on AWS?

        Quorum requirements hold true even in a cloud environment where you have Availability Zones (or AZs). In addition to possible node failure, you can also defend against possible AZ failure by splitting your d-Nodes and e-Nodes evenly across three availability zones.

        Load distribution after failover events

        If a d-node experiences a failover event, the remaining d-nodes pick up its workload so that the data stored in its forests remains available.

        Failover forest topology is an important factor in both high-availability and load-distribution within a cluster. Consider the example below of a 3-node cluster where each node has two data forests (dfs) and two local disk-failover forests (ldfs):

        • Case 1: In the event of a fail over, if both dfs (df1.1 and df1.2) from node1 fail over to node2, the load on node2 would double (100% to 200%, where node2 would now be responsible for its own two forests - df2.1 and df2.2 - as well as the additional two forests from node1 - ldf1.1 and ldf1.2)
        • Case 2: In the event of a fail over, if we instead set up the replica forests in such a way that when node1 goes down, df1.1 would fail over to node2 and df1.2 would fail over to node3, then the load increase would be reduced per node. Instead of one node going from 100% to 200% load, two nodes would instead go from 100% to 150%, where node2 is now responsible for its two original forests - df2.1 and df2.2, plus one of node1's failover forests (ldf1.1), and node3 would also now be responsible for its two original forests - df3.1 and df3.2, plus one of node1's failover forests (ldf1.2)

        Growing or scaling out your cluster

        If you need to fold in additional capacity to your cluster, try to add nodes in "rings of three." Each ring of three can have its own independent failover topology, where nodes 1, 2, and 3 will fail over to each other as described above, and nodes 4, 5, and 6 will fail over to each other separate from the original ring of three. This results in minimal configuration changes for any nodes already in your cluster when adding capacity.

        Important related takeaways

        • In addition to the standard MarkLogic Server clustering requirements, you'll also want to pay special attention to the hardware specification of individual nodes
          • Although the hardware specification doesn’t have to be exactly the same across all nodes, it is highly recommended that all d-nodes be of the same specification because cluster performance will ultimately be limited by the slowest d-node in the system
  • You can read more about the effect of slow d-nodes in a cluster in the "Check the Slowest D-Node" section of our "Performance Testing With MarkLogic" whitepaper
        • Automatic fail-back after a failover event is not supported in MarkLogic due to the risks of unintentional overwrites, which could potentially result in accidental data loss. Should a failover event occur, human intervention is typically required to manually fail-back. You can read more about the considerations involved in failing a forest back in the following knowledgebase article: Should I flip failed over forests back to their respective masters? What are the risks if I leave them?

         

        Further reading

        Introduction

There is a lot of useful information in MarkLogic Server's documentation surrounding many of the new features of MarkLogic 9 - including the new SQL implementation, improvements made to the ODBC driver and the new system for generating SQL "view" templates for your data. This article attempts to pull it all together by showing all the steps needed to create a successful connection and to verify that everything is set up correctly and works as expected.

This guide presents a step-by-step walkthrough covering the installation of all the necessary components, the configuration of the ODBC driver, and the loading of data into MarkLogic in order to create a Template View that will allow a SQL query to be rendered.

        Prerequisites

We're starting with a clean install of Red Hat Enterprise Linux 7:

        $ uname -a
        Linux engrlab-128-084.engrlab.marklogic.com 3.10.0-327.4.5.el7.x86_64 #1 SMP Thu Jan 21 04:10:29 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

        In this example, I'm using yum to manage the additional dependencies (openssl-libs and unixODBC) required for the MarkLogic ODBC driver:

        $ sudo yum install openssl-libs
        Package 1:openssl-libs-1.0.2k-8.el7.x86_64 already installed and latest version
        Nothing to do
        
        $ sudo yum install unixODBC
        Package unixODBC-2.3.1-11.el7.x86_64 already installed and latest version
        Nothing to do
        

        If you want to use the latest version of unixODBC (2.3.4 at the time of writing), you can get it using cURL by running curl -O ftp://ftp.unixodbc.org/pub/unixODBC/unixODBC-2.3.4.tar.gz

        $ curl -O ftp://ftp.unixodbc.org/pub/unixODBC/unixODBC-2.3.4.tar.gz
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed
        100 1787k  100 1787k    0     0   235k      0  0:00:07  0:00:07 --:--:--  371k

        Please note - as per the documentation, this method will require unixODBC to be compiled so additional dependencies may need to be met for this.

        This article assumes that you have downloaded the ODBC driver for MarkLogic Server and the MarkLogic 9 install binary and have those available on your machine:

        $ ll
        total 310112
        -r--r--r-- 1 support support 316795526 Nov 16 04:19 MarkLogic-9.0-3.x86_64.rpm
        -r--r--r-- 1 support support    754596 Nov 16 04:18 mlsqlodbc-1.3-3.x86_64.rpm
        
        Getting started: installing and configuring MarkLogic 9 with an ODBC Server

        We will start by installing and starting MarkLogic 9:

        $ sudo rpm -i MarkLogic-9.0-3.x86_64.rpm
        $ sudo service MarkLogic start
        Starting MarkLogic:                                        [  OK  ]

        From there, we can point our browser at http://host:8001 and walk through the initial MarkLogic install process:

        As soon as the install process has been completed and you have created an Administrator user for MarkLogic Server, we're ready to create an ODBC Application Server.

        To do this, go to Configure > Groups > Default > App Servers and select the Create ODBC tab:

        Next we're going to make the minimal configuration necessary by entering the required fields - the odbc server name, the Application Server module directory root and the port.

        In this example we will configure the Application Server using the following values:

odbc server name: ml-odbc
root: /
port: 5432

        After this is done, confirm that the Application Server has been created by going to Configure > Groups > Default > App Servers and ensure that you can see the ODBC Server listed and configured on port 5432 as per the image below:

        Getting started: Setting up the MarkLogic ODBC Driver

        Use RPM to install the ODBC driver:

        $ sudo rpm -i mlsqlodbc-1.3-3.x86_64.rpm
        odbcinst: Driver installed. Usage count increased to 1.
            Target directory is /etc

        Configure the base template as instructed in the installation guide:

        $ odbcinst -i -s -f /opt/MarkLogic/templates/mlsql.template
        Getting started: ensure unixODBC is configured

        To ensure the unixODBC commandline client is configured, you can run isql -h to bring up the help options:

        $ isql -h
        
        **********************************************
        * unixODBC - isql                            *
        **********************************************
        * Syntax                                     *
        *                                            *
        *      isql DSN [UID [PWD]] [options]        *
        *                                            *
        * Options                                    *
        *                                            *
        * -b         batch.(no prompting etc)        *
        * -dx        delimit columns with x          *
        * -x0xXX     delimit columns with XX, where  *
        *            x is in hex, ie 0x09 is tab     *
        * -w         wrap results in an HTML table   *
        * -c         column names on first row.      *
        *            (only used when -d)             *
        * -mn        limit column display width to n *
        * -v         verbose.                        *
        * -lx        set locale to x                 *
        * -q         wrap char fields in dquotes     *
        * -3         Use ODBC 3 calls                *
        * -n         Use new line processing         *
        * -e         Use SQLExecDirect not Prepare   *
        * -k         Use SQLDriverConnect            *
        * --version  version                         *
        *                                            *
        * Commands                                   *
        *                                            *
        * help - list tables                         *
        * help table - list columns in table         *
        * help help - list all help options          *
        *                                            *
        * Examples                                   *
        *                                            *
        *      isql WebDB MyID MyPWD -w < My.sql     *
        *                                            *
        *      Each line in My.sql must contain      *
        *      exactly 1 SQL command except for the  *
        *      last line which must be blank (unless *
        *      -n option specified).                 *
        *                                            *
        * Please visit;                              *
        *                                            *
        *      http://www.unixodbc.org               *
        *      nick@lurcher.org                      *
        *      pharvey@codebydesign.com              *
        **********************************************

If you're not seeing the above message, it is possible that another application on your system is overriding this. For this configuration, the isql command is found at /usr/bin/isql:

$ which isql
/usr/bin/isql
        Getting started: initial connection test

If you're happy that isql is correctly installed, we're ready to test the connection using isql -v:

        $ isql -v MarkLogicSQL admin admin
        +---------------------------------------+
        | Connected!                            |
        |                                       |
        | sql-statement                         |
        | help [tablename]                      |
        | quit                                  |
        |                                       |
        +---------------------------------------+
        SQL>

        Let's confirm that it's really working by loading some data into MarkLogic and creating an SQL view around that data.

        Loading sample data into MarkLogic

        To load data, we're going to use Query Console to insert the same sample data that is created in the Quick Start Documentation:

        To access Query Console, point your browser at http://host:8000 and make note of the following:

        Ensure the database is set to Documents (or at least, matches the database specified by your ODBC Application Server) and ensure that the Query Type is set to JavaScript

        When these are both set correctly, run the code to generate sample data (note that this data is taken from the quick start guide and reproduced here for convenience):

        After that has run, you should see a null response back from the query:

        To confirm that the data was loaded successfully, you can use the Explore button.  You should now see that 22 employee documents (rows) are now in the database:

        Create the template view

        Now the documents are loaded, a tabular view for that data needs to be created.

        Ensure the database is (still) set to Documents (or at least, matches the database specified by your ODBC Application Server) and ensure that the Query Type is now set to XQuery

        As soon as this is set, you can run the code below to generate the template view (note that this data is taken from the quick start guide and reproduced here for convenience):
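
If you do not have the quick start code to hand, the sketch below gives a rough idea of what such a template looks like; the /employee context, schema name, and column names are illustrative assumptions and should be adapted to match the sample documents you loaded:

xquery version "1.0-ml";
import module namespace tde = "http://marklogic.com/xdmp/tde"
    at "/MarkLogic/tde.xqy";
(: An illustrative employees view template. The context path and column
   definitions are assumptions - adapt them to your sample data. :)
let $template :=
  <template xmlns="http://marklogic.com/xdmp/tde">
    <context>/employee</context>
    <rows>
      <row>
        <schema-name>main</schema-name>
        <view-name>employees</view-name>
        <columns>
          <column>
            <name>EmployeeID</name>
            <scalar-type>int</scalar-type>
            <val>EmployeeID</val>
          </column>
          <column>
            <name>FirstName</name>
            <scalar-type>string</scalar-type>
            <val>FirstName</val>
          </column>
          <column>
            <name>LastName</name>
            <scalar-type>string</scalar-type>
            <val>LastName</val>
          </column>
        </columns>
      </row>
    </rows>
  </template>
return tde:template-insert("/templates/employees.xml", $template)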

        And to confirm this was loaded, Query Console should report an empty sequence was returned.

        Test the template using a SQL Query

        The database should remain set to Documents and ensure that the Query Type is now set to SQL:

        Then you can run the following SQL Query:

        SELECT * FROM employees

        If everything has worked correctly, Query Console should render a view of the table in response to your query:

        Test the SQL Query via the ODBC Driver

        All that remains now is to go back to the shell and test the same connection over ODBC.

        To do this, we're going to use the isql command again and run the same request there:

        $ isql -v MarkLogicSQL admin admin
        +---------------------------------------+
        | Connected!                            |
        |                                       |
        | sql-statement                         |
        | help [tablename]                      |
        | quit                                  |
        |                                       |
        +---------------------------------------+
        SQL> select * from employees
        <<< RESPONSE CUT >>>
        SQLRowCount returns 7
        7 rows fetched
        

        Further reading

        Introduction

        This article details changes to the upgrade procedures for MarkLogic 9 AMIs.

MarkLogic 9 now supports 1-click deployment in AWS Marketplace. This is in addition to the existing options of manually launching an AMI and launching MarkLogic clusters via CloudFormation templates. In order to make 1-click launch possible, our AMIs have a pre-configured data volume (device on /dev/sdf). The updated CloudFormation templates account for the pre-configured data volume. This change also requires a different approach to our documented upgrade process.

        Details

        As per MarkLogic EC2 Guide, the main goal of the upgrade is to update AMI IDs in CloudFormation in order to upgrade all instances in the stack. There is now an additional step to handle the blank data volume that is pre-configured on MarkLogic AMIs.

        Always backup your data before attempting any upgrade procedures!

        Scenario 1:  You are using unmodified CF templates that were published by MarkLogic on http://developer.marklogic.com/products/cloud/aws starting from version 8.0-3.

        1. Update your CloudFormation stack with the latest template as there were no breaking changes since 8.0-3. The current templates for MarkLogic 9 include new AWS regions, new AMI IDs, and code to remove blank data volume that is bundled with current AMIs.
2. In the EC2 Dashboard, stop one instance at a time and wait for it to be replaced with a new one.
3. For a rolling upgrade (and as a good practice) terminate the other nodes one by one, starting with the node that has the Security database. They will come up and reconnect without any UI interaction.
4. Go to port 8001 on any new instance, where an upgrade prompt should be displayed.
        5. Click OK and wait for the upgrade to complete on the instance.

        Scenario 2: You made some changes to MarkLogic templates or you are using custom templates.

        1. Download current templates from http://developer.marklogic.com/products/cloud/aws.
        2. Locate the AMI IDs by searching for "AWSRegionArch2AMI" block in the template.
          "AWSRegionArch2AMI": {
                "us-east-1": {
                  "HVM": "ami-54a8652e"
                },
                "us-east-2": {
                  "HVM": "ami-2ab29f4f"
                }, ...
        3. Locate AMI IDs in the old template and replace them with the ones from the new template. 
        4. Locate "BlockDeviceMappings" section in the new template that was downloaded in step 1. This block of code was added to remove blank volume that is part of the new 1-click AMIs.
5. Update the old template to include "BlockDeviceMappings" as a property of LaunchConfig. There will be one or three LaunchConfig blocks depending on the template used. Those can be located by searching for "AWS::AutoScaling::LaunchConfiguration". Here is an example of the new property under LaunchConfig.
  "LaunchConfig": {
    "Type": "AWS::AutoScaling::LaunchConfiguration",
    "Properties": {
      "BlockDeviceMappings": [{
        "DeviceName": "/dev/sdf",
        "NoDevice": true,
        "Ebs": {}
      }],
      ...
        6. Once all the changes are saved, update your stack with the updated CloudFormation template. Make sure the stack update is complete.
7. In the EC2 Dashboard, terminate nodes one by one, starting with the node that has the Security database. New nodes will come up after a couple of minutes and reconnect without any UI interaction.
8. Wait for all nodes to be up and in green state.
9. Go to port 8001 on any new instance, where an upgrade prompt should be displayed.
        10. Click OK and wait for the upgrade to complete on the instance.

        Scenario 3: You have instances that were brought up directly from MarkLogic AMI. For each MarkLogic instance in your cluster, do the following:

        1. Terminate the instance.
        2. Launch a new instance from the upgraded AMI.
3. Detach the blank volume that is mounted on /dev/sdf (it should be 10GB in size).
        4. Attach the EBS data volume associated with the original instance.

        More details on how to update CloudFormation stack can be found at http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks.html

        Introduction: the decimal type

        In order to be compliant with the XQuery specification and to satisfy the needs of customers working with financial data, MarkLogic Server implements a decimal type, available in XQuery and server-side JavaScript.

The decimal type has been implemented for very specific requirements: decimals have about a dozen more bits of precision than doubles, but they take up more memory and arithmetic operations over them are much slower.

Use the double type where possible

Unless you have a specific requirement to use the decimal data type, in most cases it's better and faster to use the double data type to represent large numbers.

        Specific details about the decimal data type

If you still want or need to use the decimal data type, below are its limitations and details on how exactly it is implemented in MarkLogic Server:

        o   Precision

        • How many decimal digits of precision does it have?

        The MarkLogic implementation of xs:decimal representation is designed to meet the XQuery specification requirements to provide at least 18 decimal digits of precision. In practice, up to 19 decimal digits can be represented with full fidelity.

        • If it is a binary number, how many binary digits of precision does it have?

         A decimal number is represented inside MarkLogic with 64 binary bits of digits and an additional 64 bits of sign and a scale (specifies where the decimal point is).

        • What are the exact upper and lower bounds of its precision?

        -18446744073709551615 to 18446744073709551615 

Any operation producing a number outside this range will result in an XDMP-DECOVRFLW error (decimal overflow).

        o   Scale

        • Does it have a fixed scale or floating scale?

        It has a floating scale.

        • What are the limitations on the scale?

        -20 to 0

So you can only represent non-zero magnitudes between 1 * (10^-20) and 18446744073709551615

        • Is the scale binary or decimal?

        Decimal

        • How many decimal digits can it scale?

        20

        • How many binary digits can it scale?

        N/A

        • What is the smallest number it can represent and the largest?

        smallest: -1*(2^64)
        closest to zero: 1*(10^-20)
        largest: (2^64)

        • Are all integers safe or does it have a limited safe range for integers?

        It can represent 64 bit unsigned integers with full fidelity.
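
As a small illustration you can run in Query Console, the sketch below compares the same 19-digit value held as xs:decimal and as xs:double; the decimal arithmetic is exact, while the double loses the subtraction to rounding:

xquery version "1.0-ml";
(: xs:decimal keeps all 19 digits of the largest representable value,
   while xs:double rounds to roughly 16 significant digits. :)
let $big := "18446744073709551615"
return (
  xs:decimal($big) - 1,   (: exact: 18446744073709551614 :)
  xs:double($big) - 1     (: unchanged - the 1 is lost in rounding :)
)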

         

        o   Limitations

        • Does it have binary rounding errors?

        The division algorithm on Linux in particular does convert to an 80-bit binary floating point representation to calculate reciprocals - which can result in binary rounding errors. Other arithmetic algorithms work solely in base 10.

        • What numeric errors can it throw and when?

        Overflow: Number is too big or small to represent
Underflow: Number is too close to zero to represent
        Loss of precision: The result has too many digits of precision (essentially the 64bit digits value has overflowed)

        • Can it represent floating point values, such as NaN, -Infinity, +Infinity, etc.?

         No

        o   Implementation

        • How is the DECIMAL data type implemented?

        It has a representation with 64 bits of digits, a sign, and a base 10 negative exponent (fixed to range from -20 to 0). So the value is calculated like this:

        sign * digits * (10 ^ -exponent)

        • How many bytes does it consume?

        On disk, for example in triple indexes, it's not a fixed size as it uses integer compression. At maximum, the decimal scalar type consumes 16 bytes per value: eight bytes of digits, four bytes of sign, and four bytes of scale. It is not space efficient but it keeps the digits aligned on eight-byte boundaries.

        Summary

        A database or forest backup in MarkLogic Server may be significantly slower than just performing a file copy (cp in Linux).  Why is this so?

        Details

Using cp on very large files on a large-memory Linux system can produce huge numbers of dirty pages that can saturate I/O channels for minutes while the data is flushed to disk. cp also doesn’t wait for the data to be written before returning. As a result, cp is very unfriendly to other applications running on the same system.

        When MarkLogic Server performs a backup, it works hard not to saturate any subsystem or resource. MarkLogic takes care that the number of dirty pages at any one time is never very large, and it keeps the i/o queues short so that any concurrent database queries and updates are not significantly impacted by the backup. Finishing the backup in the fastest possible time is not the priority. 

        Can I make it go faster?

        Yes, there is a diagnostic trace event “Unthrottle Backup” that turns off throttling in MarkLogic. However, even with throttling turned off, MarkLogic will still work to keep the number of dirty pages low.

The diagnostic trace event can be enabled from the MarkLogic Server Admin UI by navigating to Configure -> Groups -> {group-name} -> Diagnostics, setting "trace events activated" to true, adding "Unthrottle Backup" (without quotes), and pressing "ok".
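
Alternatively, the trace event can be enabled programmatically. The sketch below is an illustrative example using the Admin API against the Default group:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
(: Illustrative example: add the "Unthrottle Backup" trace event to the
   Default group and make sure trace events are activated. :)
let $config := admin:get-configuration()
let $group := admin:group-get-id($config, "Default")
let $config := admin:group-add-trace-event($config, $group,
                 admin:group-trace-event("Unthrottle Backup"))
let $config := admin:group-set-events-activated($config, $group, fn:true())
return admin:save-configuration($config)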

        Introduction

        MarkLogic automatically provides 

        • ANSI REPEATABLE READ level of isolation for update transactions, and 
        • Serializable isolation for read-only (query) transactions.

        MarkLogic can be made to provide ANSI SERIALIZABLE isolation for update transactions, but doing so requires developers to manage their own predicate locks.

        Isolation Levels - Background

        There are many possible levels of isolation, and many different taxonomies of isolation levels. The most common taxonomy (familiar to those with a RDBMS background) is the one defined by ANSI SQL, which defines four levels of isolation based on read phenomena that are possible at each level. ANSI has a definition for each phenomenon, but these definitions are open to interpretation. Broad interpretation results in more rigorous criteria for each isolation level (and therefore better isolation at each level), whereas strict interpretation results in less rigorous isolation at each level. Here I’ll use a shorthand notation to describe these phenomena, and will use the broad rather than the strict interpretation. The notation specifies the operation, the transaction performing the operation, and the item or domain on which the operation is performed. Operations in my notation are:

        • Write (w)
        • Read (r)
        • Commit (c)
        • Abort/rollback (a)

        An example of this shorthand: w1[x] means transaction1 writes to item x.

        Now the phenomena:

        • A dirty read happens when a transaction T2 reads an item that is being written by concurrently running transaction T1. In other words: w1[x]…r2[x]…((c1 or a1) and (c2 or a2) in any order). This phenomenon could lead to an anomaly in the case where T1 later aborts, and T2 has then read a value that never existed in the database.
        •  A non-repeatable read happens when a transaction T2 writes an item that was read by a transaction T1 prior to T1 completing. In other words: r1[x]…w2[x]…((c1 or a1) and (c2 or a2) in any order). Non-repeatable reads don’t produce the same anomalies as dirty reads, but can produce errors in cases where T1 relies on the value of x not changing between statements in a multi-statement transaction (e.g. reading and then updating a bank account balance).
        • A phantom read happens when a transaction T1 retrieves a set of data items matching some search condition and concurrently running transaction T2 makes a change that modifies the set of items that match that condition. In other words: (r1[P] and w2[x in P] in any order)…((c1 or a1) and (c2 or a2) in any order), where P is a set of results. Phantom reads are usually less serious than dirty or non-repeatable reads because it generally doesn’t matter if item x in P is written before or after T1 finishes unless T1 is itself explicitly reading x. And in this case the phenomenon would no longer be a phantom, but would instead be a dirty or non-repeatable read per the definitions above. That said, there are some cases where phantom reads are important.

         The isolation levels ANSI defines are based on which of these three phenomena are possible at that isolation level. They are:

        • READ UNCOMMITTED – all three phenomena are possible at this isolation level.
        • READ COMMITTED – Dirty reads are not possible, but non-repeatable and phantom reads are.
        • REPEATABLE READ – Dirty and non-repeatable reads are not possible, but phantom reads are.
        • SERIALIZABLE – None of the three phenomena are possible at this isolation level.

        Note that as defined above, ANSI SERIALIZABLE is not sufficient for transactions to be truly serializable (in the sense that running them concurrently and running them in series would in all cases produce the same result), so SERIALIZABLE is an unfortunate choice of names for this isolation level, but that’s what ANSI called it.

        Update Transaction Locks

        Typically, a DBMS will avoid dirty and non-repeatable reads by taking locks on records (called item locks). Locks are either shared locks (which can be held by more than one transaction) or exclusive locks (which can be held by only one transaction at a time). In most DBMSes (including MarkLogic), locks taken when reading an item are shared and locks taken when writing an item are exclusive.

        MarkLogic prevents dirty and non-repeatable reads in update transactions by taking item locks on items that are being read or written during a transaction and releasing those locks only on completion of the transaction (post-commit or post-abort). When a transaction needs to lock an item on which another transaction has an exclusive lock, that transaction waits until either the lock is released or the transaction times out. Deadlock detection prevents cases where two transactions are waiting on each other for exclusive locks. In this case one of the transactions will abort and restart.

        In addition, MarkLogic prevents some types of phantom reads by taking item locks on the set of items in a search result. This prevents phantom reads involving T2 removing an item in a set that T1 previously searched, but does not prevent phantom reads involving T2 inserting an item in a set that T1 previously searched, or those involving T2 searching for items and seeing a deletion caused by T1.

        Avoiding All Phantom Reads

        To avoid all phantom reads via locking, it is necessary to take locks not just on items that currently match the search criteria, but also on all items that could match the search criteria, whether they currently exist in the database or not. Such locks are called predicate locks. Because you can search for pretty-much anything in MarkLogic, guaranteeing a predicate lock for arbitrary searches would require locking the entire database. From a concurrency and throughput perspective, this is obviously not desirable. MarkLogic therefore leaves the decision to take predicate locks and the scope of those locks in the hands of application developers. Because the predicate domain can frequently be narrowed down with some application-specific knowledge, this provides the best balance between isolation and concurrency. To take a predicate lock, you lock a synthetic URI representing the predicate domain in every transaction that reads from or writes to that domain. You can take shared locks on a synthetic URI via fn:doc(URI). Exclusive locks are taken via xdmp:lock-for-update(URI).
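
As a sketch of what this looks like in practice: a transaction that writes into the predicate domain takes an exclusive lock on the synthetic URI before modifying documents, while a transaction that reads from the domain takes a shared lock on the same URI with fn:doc. The URI and document below are hypothetical examples chosen for illustration, not part of any MarkLogic API:

xquery version "1.0-ml";
(: A minimal sketch of a predicate lock. "/locks/orders-for-acme" is a
   hypothetical synthetic URI the application uses to represent the
   predicate domain "orders belonging to customer acme". :)
let $predicate-uri := "/locks/orders-for-acme"
return (
  (: exclusive lock, held until this update transaction completes :)
  xdmp:lock-for-update($predicate-uri),
  (: this insert changes the set of documents matching the predicate :)
  xdmp:document-insert("/orders/order-1001.xml",
    <order customer="acme"><total>42</total></order>)
)

A reader that must not see phantoms would take a shared lock on the same URI within its own transaction, for example with fn:doc("/locks/orders-for-acme").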

        Note that predicate locks should only be taken in situations where phantom reads are intolerable. If your application can get by with REPEATABLE READ isolation, you should not take predicate locks, because any additional locking results in additional serialization and will impact performance.

        Summary

        To summarize, MarkLogic automatically provides ANSI REPEATABLE READ level of isolation for update transactions and true serializable isolation for read-only (query) transactions. MarkLogic can be made to provide ANSI SERIALIZABLE isolation for update transactions, but doing so requires developers to manage their own predicate locks.

        Summary

        Text is stored in MarkLogic Server in Unicode NFC normalized form.

        Discussion

        In MarkLogic Server, all text is converted into Unicode NFC normalized form before tokenization and storage. 

        Unicode considers NFC-compatible characters to be essentially equivalent. See the Unicode normalization FAQ and Conformance Requirements in the Unicode Standard.

        Example

        For example, consider the NFC equivalence of the codepoints x2126 (Ω, OHM SIGN) and x03A9 (Ω, GREEK CAPITAL LETTER OMEGA). This is shown for the x2126 entry in the Unicode code chart for the U2100 block.

        You can see the effects of normalization alone, and during tokenization, by running the following in MarkLogic Server's Query Console:

        xquery version "1.0-ml";
        (: equivalence of Ω forms :)
        let $s := fn:codepoints-to-string (xdmp:hex-to-integer ('2126'))
        let $token := cts:tokenize ($s)
        return (
            'original: '||xdmp:integer-to-hex (fn:string-to-codepoints ($s)),
            'normalized: '||xdmp:integer-to-hex (fn:string-to-codepoints (fn:normalize-unicode ($s, 'NFC'))),
            'tokenized: '||xdmp:describe ($token, (), ())
        )
        

        The results show the original value, the normalized value, and the resulting token:

        original: 2126
        normalized: 3a9
        tokenized: cts:word("&#x03a9;")
        

        Abstract

        In MarkLogic Server version 9, the default tokenization and stemming code has been changed for all languages (except English tokenization). Some tokenization and stemming behavior will change between MarkLogic 8 and MarkLogic 9. We expect that, in most cases, results will be better in MarkLogic 9.

        Information is given for managing this change in the Release Notes at Default Stemming and Tokenization Libraries Changed for Most Languages, and for further related features at New Stemming and Tokenization.

        In-depth discussion is provided below for those interested in details.

        General Comments on Incompatibilities

        General implications of tokenization incompatibilities

        If you do not reindex, old content may no longer match the same searches, even for unstemmed searches.

        General tokenization incompatibilities

        There are some edge-case changes in the handling of apostrophes in some languages; in general this is not a problem, but some specific words may now include an apostrophe within a token, or break at one, where they previously did not.

        Tokenization is generally faster for all languages except English and Norwegian (which use the same tokenization as before).

        General implications of stemming incompatibilities

        Where there is only one stem, and it is now different: Old data will not match stemmed searches without reindexing, even for the same word.

        Where the new stems are more precise: Content that used to match a query may not match any more, even with reindexing.

        Where there are new stems, but the primary stem is unchanged: Content that used to not match a query may now match it with advanced stemming or above. With basic stemming there should be no change.

        Where the decompounding is different, but the concatenation of the components is the same: Under decompounding, content may match a query when it used to not match, or may not match a query when it used to match, when the query or content involves something with one of the old/new components. Matching under advanced or basic stemming would be generally the same.

        General stemming incompatibilities

        • MarkLogic now has general algorithms backing up explicit stemming dictionaries.  Words not found in the default dictionaries will sometimes be stemmed when they previously were not.
        • Diminutives/augmentatives are not usually stemmed to base form.
        • Comparatives/superlatives are not usually stemmed to base form.
        • There are differences in the exact stems for pronoun case variants.
        • Stemming is more precise and restricted by common usage. For example, if the past participle of a verb is not usually used as an adjective, then the past participle will not be included as an alternative stem. Similarly, plural forms that only have technical or obscure usages might not stem to the singular form.
        • Past participles will typically include the past participle as an alternative stem.
        • The preferred order of stems is not always the same: this will affect search under basic stemming.

        Reindexing

        It is advisable to reindex to be sure there are no incompatibilities. Where the data in the forests (tokens or stems) does not match the current behavior, reindexing is recommended. This will have to be a forced reindex or a reload of specific documents containing the offending data. For many languages this can be avoided if queries do not touch on specific cases. For certain languages (see below) the incompatibility is great enough that it is essential to reindex.

        Language Notes

        Below we give some specific information and recommendations for various languages.

        Arabic

        stemming

        The Arabic dictionaries are much larger than before. Implications:  (1) better precision, but (2) slower stemming.

        Chinese (Simplified)

        tokenization

        Tokenization is broadly incompatible.

        The new tokenizer uses a corpus-based language model.  Better precision can be expected.

        recommendation

        Reindex all Chinese (simplified).

        Chinese (Traditional)

        tokenization

        Tokenization is broadly incompatible.

        The new tokenizer uses a corpus-based language model.  Better precision can be expected.

        recommendation

        Reindex all Chinese (traditional).

        Danish

        tokenization

        This language now has algorithmic stemming, and may have slight tokenization differences around certain edge cases.

        recommendation

        Reindex all Danish content if you are using stemming.

        Dutch

        stemming

        There will be much more decompounding in general, but MarkLogic will not decompound certain known lexical items (e.g., "bastaardwoorden").

        recommendation

        Reindex Dutch if you want to query with decompounding.

        English

        stemming

        Stems for British spelling variants may include the British form as an additional stem, although the first stem will still be the US variant.

        Stemming produces more alternative stems. Implications are (1) stemming is slightly slower and (2) index sizes are slightly larger (with advanced stemming).

        Finnish

        tokenization

        This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.

        recommendation

        Reindex all content in this language if you are using stemming.

        French

        See general comments above.

        German

        stemming

        Decompounding now applies to more than just pure noun combinations. For example, it applies to "noun plus adjectives" compound terms. Decompounding is more aggressive, which can result in identification of more false compounds. Implications: (1) stemming is slower, (2) decompounding takes more space, and (3) for compound terms, search gives better recall, with some loss of precision.

        recommendation

        Reindex all German.

        Hungarian

        tokenization

        This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.

        recommendation

        Reindex all content in this language if you are using stemming.

        Italian

        See general comments above.

        Japanese

        tokenization

        Tokenization is broadly incompatible.

        The tokenizer provides internal flags that the stemmer requires.  This means that (1) tokenization is incompatible for all words at the storage level due to the extra information and (2) if you install a custom tokenizer for Japanese, you must also install a custom stemmer.

        stemming

        Stemming is broadly incompatible.

        recommendation

        Reindex all Japanese content.

        Korean

        stemming

        Particles (e.g., 이다) are dropped from stems; they used to be treated as components for decompounding.

        There is different stemming of various honorific verb forms.

        North Korean variants are not in the dictionary, though they may be handled by the algorithmic stemmer.

        recommendation

        Reindex Korean unless you use decompounding.

        Norwegian (Bokmal)

        stemming

        Previously, hardly any decompounding was in evidence; now it is pervasive.

        Implications: (1) stemming is slower, (2) decompounding takes more space, and (3) search gives better recall, with some loss of precision, at least where it comes to compounds.

        recommendation

        Reindex Bokmal if you want to query with decompounding.

        Norwegian (Nynorsk)

        stemming

        Previously hardly any decompounding was in evidence; now it is pervasive.

        Implications: (1) stemming is slower, (2) decompounding takes more space, and (3) search gives better recall, with some loss of precision, at least where it comes to compounds.

        recommendation

        Reindex Nynorsk if you want to query with decompounding.

        Norwegian (generic 'no')

        stemming

        Previously 'no' was treated as an unsupported language; now it is treated as both Bokmal and Nynorsk: for a word present in both dialects, all stem variants from both will be present.

        recommendation

        Do not use 'no' unless you really must; reindex if you want to query it.

        Persian

        See general comments above.

        Portuguese

        stemming

        More precision with respect to feminine variants (e.g., ator vs atriz).

        Romanian

        tokenization

        This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.

        recommendation

        Reindex all content in this language if you are using stemming.

        Russian

        stemming

        Inflectional variants of cardinal or ordinal numbers are no longer stemmed to a base form.

        Inflectional variants of proper nouns may stem together due to the backing algorithm, but it will be via affix-stripping, not to the nominal form.

        Stems for many verb forms used to be the perfective form; they are now the simple infinitive.

        Stems used to drop ё but now preserve it.

        recommendation

        Reindex all Russian.

        Spanish

        See general comments above.

        Swedish

        stemming

        Previously hardly any decompounding was in evidence; now it is pervasive.

        Implications: (1) stemming is slower, (2) decompounding takes more space, and (3) search gives better recall, with some loss of precision, at least where it comes to compounds.

        recommendation

        Reindex Swedish if you want to query with decompounding.

        Tamil

        tokenization

        This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.

        recommendation

        Reindex all content in this language if you are using stemming.

        Turkish

        tokenization

        This language now has algorithmic stemming and may have slight tokenization differences around certain edge cases.

        recommendation

        Reindex all content in this language if you are using stemming.

        What is MarkLogic Data Hub?

        MarkLogic’s Data Hub increases data integration agility, in contrast to time-consuming upfront data modeling and ETL. Grouping all of an entity’s data into one consolidated record with that data’s context and history, a MarkLogic Data Hub provides a 360° view of data across silos. You can ingest your data from various sources into the Data Hub, standardize your data, and then more easily consume that data in downstream applications. For more details, please see our Data Hub documentation.

        Note: Prior to version 5.x, Data Hub was known as the Data Hub Framework (DHF).

        Takeaways:

        • In contrast to previous versions, Data Hub 5 is largely configuration-based. Upgrading to Data Hub 5 will require either:
          • Conversion of legacy flows from the code-based approach of previous versions to the configuration-based format of Data Hub 5
          • Executing your legacy flows with the “hubRunLegacyFlow” Gradle task
        • It’s very important to verify the “Version Support” information on the Data Hub GitHub README.md before installing or upgrading to any major Data Hub release

        Pre-requisites:

        One of the pre-requisites for installing Data Hub is to check for the supported/compatible MarkLogic Server version. For details, see our version compatibility matrix. Other pre-requisites can be seen here.

        New installations of Data Hub

        We always recommend installing the latest Data Hub version compatible with your current MarkLogic Server version. For example:

        • If a customer is running MarkLogic Server 9.0-7, they should install the most recent compatible Data Hub version (5.0.2), even if previous Data Hub versions (such as 5.0.1, 5.0.0, 4.x, and 3.x) also work with server version 9.0-7.

        • Similarly, if a customer is running 9.0-6, the recommended Data Hub version would be 4.3.1 rather than previous versions such as 4.2.x, 4.1.x, 4.0.0, and 3.x.

        Note: A specific MarkLogic server version can be compatible with multiple Data Hub versions and vice versa, which allows independent upgrades of either Data Hub or MarkLogic Server.

         

        Upgrading from a previous version

        1. To determine your upgrade path, first find your current Data Hub version in the “Can upgrade from” column in the version compatibility matrix.
        2. While Data Hub should generally work with future server versions, it’s always best to run the latest Data Hub version that's also explicitly listed as compatible with your installed MarkLogic Server version.
        3. If required, make sure to upgrade your MarkLogic Server version to be compatible with your desired Data Hub version. You can upgrade MarkLogic Server and Data Hub independently of each other as long as you are running a version of MarkLogic Server that is compatible with the Data Hub version you plan to install. If you are running an older version of MarkLogic Server, then you must upgrade MarkLogic Server first, before upgrading Data Hub.

        Note: Data Hub is not designed to be 'backwards' compatible with any version before the MarkLogic Server version listed with the release. For example, you can’t use Data Hub 3.0.0 on 9.0-4 – you’ll need to either downgrade to Data Hub 2.0.6 while staying on MarkLogic Server 9.0-4, or alternatively upgrade MarkLogic Server to version 9.0-5 while staying on Data Hub 3.0.0.

        • Example 1 - Scenario where you DO NOT NEED to upgrade MarkLogic Server:


        • Current Data Hub version: 4.0.0
        • Target Data Hub version: 4.1.x
        • ML server version: 9.0-9
        • The “Can upgrade from” value for the target version shows 2.x, which means you need to be on at least Data Hub 2.x. Since the current Data Hub version is 4.0.0, this requirement is met.
        • Unless there is a strong reason for choosing 4.1.x, we highly recommend upgrading to the latest 4.x version compatible with MarkLogic Server 9.0-9 - which in this example is 4.3.2. Consequently, the recommended upgrade path here becomes 4.0.0-->4.3.2 instead of 4.0.0-->4.1.x.
        • Since 9.0-9 is supported by the recommended Data Hub version 4.3.2, there is no need to upgrade MarkLogic Server.
        • Hence, the recommended path is Data Hub 4.0.0-->4.3.2.

         

        • Example 2 - Scenario where you NEED to upgrade MarkLogic Server:


        • Current Data Hub version: 3.0.0
        • Target Data Hub version: 5.0.2
        • ML server version: 9.0-6
        • The “Can upgrade from” value for the target version shows Data Hub version 4.3.1, which means you need to be on at least 4.3.x (4.3.1 or 4.3.2, depending on your MarkLogic Server version). Since the current Data Hub version 3.0.0 does not satisfy this requirement, the upgrade path after this step becomes Data Hub 3.0.0-->4.3.x
        • As per the matrix, the latest compatible Data Hub version for 9.0-6 is 4.3.1, so the path becomes 3.0.0-->4.3.1
        • From the matrix, the minimum supported MarkLogic Server version for 5.0.2 is 9.0-7, so you will have to upgrade your MarkLogic Server version before upgrading your Data Hub version to 5.0.2.
        • Because 9.0-7 is supported by all three versions under consideration (3.0.0, 4.3.1 and 5.0.2), the recommended path can be either:
          1. 3.0.0-->4.3.1-->upgrade MarkLogic Server to at least 9.0-7-->upgrade Data Hub to 5.0.2
          2. Upgrade MarkLogic Server to at least 9.0-7-->upgrade Data Hub from 3.0.0 to 4.3.1-->upgrade Data Hub to 5.0.2
        • Recall that Data Hub 5 moved to a configuration-based approach from previous versions’ code-based approach. Upgrading to Data Hub 5 from a previous major version will require either:
          • Conversion of legacy flows from the code-based approach of previous versions to the configuration-based format of Data Hub 5
          • Executing your legacy flows with the “hubRunLegacyFlow” Gradle task

        Links for Reference:

        https://docs.marklogic.com/datahub/upgrade.html


        Summary

        In addition to supporting multiple languages, MarkLogic Server also supports the ISO codes listed below for representing the names of these languages.

         

        MarkLogic supported ISO codes

        MarkLogic supports the following ISO codes for the representation of language names:
        1. ISO 639-1
        2. ISO 639-2/T, and
        3. ISO 639-2/B

        Note:
        a. MarkLogic uses the 2-letter ISO 639-1 codes, including zh's zh_Hant variant, and
        b. MarkLogic uses the 3-letter ISO 639-2 codes. For the full list of ISO 639-2 codes, see http://www.loc.gov/standards/iso639-2/php/code_list.php
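
        As an illustrative sketch, the same language can be named by either form wherever a language argument is accepted - for example as the language argument of cts:tokenize (assuming, per the note above, that both the 2-letter and 3-letter codes are accepted):

        xquery version "1.0-ml";
        (: sketch: tokenize French text, naming the language by its ISO code :)
        cts:tokenize("C'est la vie", "fr"),   (: ISO 639-1 :)
        cts:tokenize("C'est la vie", "fra")   (: ISO 639-2/T; should be equivalent per the note above :)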


        Note that MarkLogic supports only the languages listed below (see http://docs.marklogic.com/guide/search-dev/languages#id_64343):
        English
        French
        Italian
        German
        Russian
        Spanish
        Arabic
        Chinese (Simplified and Traditional)
        Korean
        Persian (Farsi)
        Dutch
        Japanese
        Portuguese
        Norwegian (Nynorsk and Bokmål)
        Swedish

         

        Suggestion

        The function cdict:get-languages() can be used to get ISO Codes for all supported languages. Here is an example of the usage:

          xquery version "1.0-ml";
          import module namespace cdict = "http://marklogic.com/xdmp/custom-dictionary" 
        		  at "/MarkLogic/custom-dictionary.xqy";
        
          cdict:get-languages()
        
          ==> ("en", "ja", "zh", "zh_Hant")

         

        Summary

        There are many different kinds of locks present in MarkLogic Server.

        Transaction locks are obtained when MarkLogic Server detects the potential of a transaction to change the database, at which point the server considers it to be an update transaction. Once a lock is acquired, it is held until the transaction ends. Transaction locks are set by MarkLogic Server either explicitly or implicitly depending on the configured commit mode. Because it's very common to see poorly performing application code written against MarkLogic Server due to unintentional locking, the two concepts of transaction type and commit mode have been combined into a single, simpler control - transaction mode.

        MarkLogic Server also has the notion of document and directory locks. Unlike transaction locks, document and directory locks must be set explicitly and are persistent in the database - they are not tied to a transaction. Document locks also apply to temporal documents. Any version of a temporal document can be locked in the same way as a regular document.

        Cache partition locks are used by threads that make changes to a cache. A thread needs to acquire a write lock for both the relevant cache and cache partition before it makes the change.

        Transaction Locks and Commit Mode vs. Transaction Mode

        Transaction lock types are associated with transaction types. Query type transactions do not use locks to obtain a consistent view of data, but rather the state of the data at a particular timestamp. Update type transactions have the potential to change the database and therefore require locks on documents to ensure transactional integrity. 

        So - if an update transaction type is run in explicit commit mode, then locks are acquired for all statements in an update transaction -  whether or not those statements perform updates. Once a lock is acquired, it is held until the transaction ends. If an update transaction type is run in auto commit mode, by default MarkLogic Server detects the transaction type through static analysis of the first statement in that transaction. If the server detects the potential for updates during static analysis, then the transaction is considered an update transaction - which results in a write lock being acquired.

        In multi-statement transactions, if an update transaction type is run in explicit commit mode, then the transaction is an update transaction and locks are acquired for all statements in an update transaction - even if no update occurs. In auto commit mode MarkLogic Server determines the transaction type through static analysis of the first statement. If in auto commit mode, and the first statement is a query, and an update occurs later in that transaction, MarkLogic Server will throw an exception. In multi-statement transactions, the transaction ends only when it is explicitly committed or rolled back. Failure to explicitly commit or roll back a multi-statement transaction might retain locks until the transaction times out or reaches the end of the session - at which point the transaction rolls back.

        Best practices:

        1) Avoid unnecessary transaction locks or holding on to transaction locks for too long. For single-statement transactions, do not explicitly set the transaction type to update if running a query. For multi-statement transactions, always explicitly commit or rollback the relevant transaction to free transaction locks as soon as possible.

        2) It's very common for users to write code that unintentionally takes write locks. One of the best ways to avoid unintentional locks is to use transaction modes instead of transaction types/commit modes. Transaction mode combines transaction type and commit mode into a single configurable value. You can read more about transaction mode in our documentation at: Transaction Mode Overview.

        3) Be aware that when setting transaction mode, the xdmp:commit and xdmp:update XQuery prolog options affect only the next transaction created after their declaration; they do not affect an entire session. Use xdmp:set-transaction-mode or xdmp.setTransactionMode if you need to change the transaction mode settings at the session level.
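
        As a minimal sketch of best practice 1 above, the following multi-statement update transaction commits explicitly so that its write locks are released as soon as possible (the URI and content are illustrative):

        xquery version "1.0-ml";
        declare option xdmp:transaction-mode "update";

        xdmp:document-insert("/example/audit-entry.xml", <entry/>),  (: takes a write lock :)
        xdmp:commit()  (: ends the transaction and releases the locks it holds :)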

        Document and Directory Locks

        Document and directory locks are not tied to a transaction. These locks must be explicitly set and are stored as lock documents in a MarkLogic Server database, so they can last for a specified time period or persist until explicitly unlocked.

        Each document and directory can have a lock. The lock can be used as part of an application's update strategy. MarkLogic Server provides the flexibility for clients to set up a locking policy suitable for their environment. For example, if only one user is allowed to update specific database objects, you can set the lock to be "exclusive." In contrast, if you have multiple users updating the same database object, you can set the lock to be "shared."

        Unlike transaction locks, document and directory locks are persistent in the database and are consequently searchable.   
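
        As an illustrative sketch, an exclusive, persistent document lock can be taken as follows (the URI, owner string, and one-hour timeout are hypothetical values):

        xquery version "1.0-ml";
        xdmp:lock-acquire("/example/order.xml", "exclusive", "0", "claims-app", 3600)
        (: xdmp:lock-release("/example/order.xml") releases it explicitly;
           xdmp:document-locks() returns the persisted lock documents, which can be
           queried like any other documents :)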

        Temporal Document Locks

        A temporal collection contains bi-temporal or uni-temporal documents. Each version of a temporal document can be locked in the same way as a regular, non-temporal document.

        Cache and Cache Partition Locks

        If a thread attempts to make a change to a database cache, it needs to acquire a write lock for the relevant cache and cache partition. This cache or cache partition write lock serializes write access, which keeps data in the relevant cache or cache partition thread-safe. While cache and cache partition locks are short-lived, be aware that in the case of a single cache partition, all of the threads needing to access it would need to serialize through a single cache partition write lock. With multiple cache partitions, multiple write locks can be acquired, one per partition - which allows multiple threads to make concurrent cache partition updates.

        References and Additional Reading:

        1) Understanding Transactions in MarkLogic Server

        2) Cache Partitions

        3) Document and Directory Locks

        4) Understanding Locking in MarkLogic Server Using Examples

        5) Understanding XDMP-DEADLOCK

        6) Understanding the Lock Trace Diagnostic Trace Event

        7) How MarkLogic Server Supports ACID Transactions

        Updates are a key aspect of data manipulation in MarkLogic Server, and can sometimes be performance intensive, especially when performed in bulk. You should therefore take time to consider exactly how your application will perform updates. Moreover, a given document is often associated with data other than its content, such as attributes, permissions, collections, quality, and metadata - all of these attributes can be affected by the chosen update method.

        MarkLogic Server offers various methods to update a document, but there are two major ways to do it, in general:

        • node-replace - Replaces a node in an existing document
        • document-insert - Inserts an entirely new document into the database or replaces the content of an existing document based on whether or not a document with a specified URI already exists.

        Although there is no material performance difference between node-replace and document-insert, node-replace is generally the better choice for updates because it preserves document attributes such as permissions, collections, quality, and metadata. document-insert, by contrast, replaces all of these attributes along with the content of the document unless they are explicitly looked up and supplied with the insert call.

        Note: Using ‘node-replace’ is the authoritative way of updating documents among all the node-level update functions.
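
        The following sketch contrasts the two approaches against a hypothetical document "/example/order.xml" whose root <order> element contains a <status> element:

        xquery version "1.0-ml";
        (: node-replace: updates one node in place; permissions, collections,
           quality and metadata on the document are preserved :)
        xdmp:node-replace(fn:doc("/example/order.xml")/order/status,
                          <status>shipped</status>)

        (: document-insert: replaces the whole document; unless the existing
           permissions/collections/quality are looked up and passed in explicitly,
           they are reset to defaults:
           xdmp:document-insert("/example/order.xml",
                                <order><status>shipped</status></order>)
        :)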

        Mass-updates:

        For updating a small set of documents where it is important to preserve all attributes of a document, ‘node-replace’ would be a better choice as it saves the overhead of finding the existing attributes by yourself. On the other hand, if query performance holds a higher priority over preserving the existing attributes of a document, ‘document-insert’ would likely be a better choice as it is faster when used without querying for the attributes. There is, however, no significant difference between the two if used in a similar fashion.

        With the release of MarkLogic Server versions 8.0-8 and 9.0-4, information detailing memory use, broken out by major areas, is periodically recorded to the error log. These diagnostic messages can be useful for quickly identifying memory resource consumption at a glance and aid in determining where to investigate memory-related issues.

        Error Log Message and Description of Details

        At one hour intervals, an Info level log message will be written to the server error log in the following format:

        Info: Memory 18% phys=147456 virt=246146(166%) rss=27330(18%) anon=53794(36%) file=250(0%) forest=1021(0%) cache=40960(27%) registry=1(0%)

        The error log entry contains memory-related figures for non-zero statistics: Raw figures are in megabytes; Percentages are relative to the amount of physical memory reported by the operating system. The figures include:

        Memory: Percentage of physical memory consumed by the MarkLogic Server process;
        phys: Size of physical memory in the machine;
        virt: Size of virtual address space reported by the operating system. This figure is often greater than 100%;
        swap: The amount of swap consumed by the MarkLogic Server process;
        rss: Resident Set Size reported by the operating system;
        anon: Anonymous mapped memory used by the MarkLogic Server process;
        file: Total amount of memory-mapped data files used by the MarkLogic Server process. (The MarkLogic Server executable itself, for example, is memory-mapped by the operating system, but is not included in this figure.);
        forest: Forest-related memory allocated by the MarkLogic Server process;
        cache: User-configured cache memory (list cache, expanded tree cache, etc.) consumed by the MarkLogic Server process;
        registry: Amount of memory consumed by registered queries;
        huge: Huge page memory reserved by the operating system, and percentage comparing this to total physical memory;
        join: Memory consumed by joins for active running queries within the MarkLogic Server process, and percentage comparing this to total physical memory;
        unclosed: Unclosed memory, signifying memory consumed by unclosed or obsolete stands still held by the MarkLogic Server process, and percentage comparing this figure to total physical memory.

        In addition to reporting once an hour, the Info level error log entry is written whenever the amount of main memory used by MarkLogic Server changes by more than five percent from one check to the next. MarkLogic Server will check the raw metering data obtained from the operating system once per minute. If metering is disabled, the check will not occur and no log entries will be made.

        With the release of MarkLogic Server versions 8.0-8 and 9.0-5, this same information will be available in the output from the function xdmp:host-status().

        <host-status xmlns="http://marklogic.com/xdmp/status/host">
        . . .
        <memory-process-size>246162</memory-process-size>
        <memory-process-rss>27412</memory-process-rss>
        <memory-process-anon>54208</memory-process-anon>
        <memory-process-rss-hwm>73706</memory-process-rss-hwm>
        <memory-process-swap-size>0</memory-process-swap-size>
        <memory-system-pagein-rate>0</memory-system-pagein-rate>
        <memory-system-pageout-rate>14.6835</memory-system-pageout-rate>
        <memory-system-swapin-rate>0</memory-system-swapin-rate>
        <memory-system-swapout-rate>0</memory-system-swapout-rate>
        <memory-size>147456</memory-size>
        <memory-file-size>279</memory-file-size>
        <memory-forest-size>1791</memory-forest-size>
        <memory-unclosed-size>0</memory-unclosed-size>
        <memory-cache-size>40960</memory-cache-size>
        <memory-registry-size>1</memory-registry-size>
        . . .
        </host-status>
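
        For example, a few of these figures can be pulled out of the host status for the current host with a short query (element names follow the sample output above; values are in MB):

        xquery version "1.0-ml";
        declare namespace hs = "http://marklogic.com/xdmp/status/host";

        let $status := xdmp:host-status(xdmp:host())
        return (
            "rss: "    || fn:data($status/hs:memory-process-rss),
            "anon: "   || fn:data($status/hs:memory-process-anon),
            "forest: " || fn:data($status/hs:memory-forest-size),
            "cache: "  || fn:data($status/hs:memory-cache-size)
        )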


        Additionally, with the release of MarkLogic Server 8.0-9.3 and 9.0-7, Warning-level log messages may be reported when the host is low on memory — the messages will indicate the areas involved, for example:

        Warning: Memory low: forest+cache=97%phys

        The messages are reported if the total memory used by the mentioned areas is greater than 90% of physical memory (phys). As a best practice, the total of these areas should never be more than around 80% of physical memory, and should be even less if you are using the host for query processing.

        If the hosts are regularly encountering these warnings, remedial action to support the memory requirements might include:

        • Adding more physical memory to each of the hosts;
        • Adding additional hosts to the cluster to spread the data across;
        • Adding additional forests to any under-utilized hosts.

        Other action might include:

        • Archiving/dropping any older forest data that is no longer used;
        • Reviewing the group level cache settings to ensure they are not set too high, as they make up the cache part of the total. For reference, default (and recommended) group level cache settings based on common RAM configurations may be found in our Group Level Cache Settings based on RAM Knowledgebase article.

        Summary

        This enhancement to MarkLogic Server allows for easy periodic monitoring of memory consumption over time, and records it in a summary fashion in the same place as other data pertaining to the operation of a running node in a cluster. Since all these figures have at their source raw Meters data, more in-depth investigation should start with the Meters history. However, having this information available at a glance can aid in identifying whether memory-related resources need to be explored when investigating performance, scale, or other like issues during testing or operation.

        Introduction

        The MarkLogic Monitoring History feature allows you to capture and view critical performance data from your cluster. By default, this performance data is stored in the Meters database. This article explains how you can plan for the additional disk space required for the Meters database.

        Meters Database Disk Usage

        Just like any other database, the Meters database is made up of forests, which in turn are made up of stands that reside physically on disk. Because the Meters database is used by Monitoring History to store critical performance data for your cluster, the amount of information can grow significantly as the number of hosts, forests, databases, and so on increases. This makes it necessary to plan for and manage the disk space required by the Meters database.

        Recommendation

        Meters database stores critical performance data of your cluster. The size of data is proportional to the number of hosts, app servers, forests, databases etc. Typically, the raw retention settings have the largest impact on size.

        MarkLogic's recommendation for a new install is to start with the default settings and monitor usage over the first two weeks of an install. The performance history charts, constrained to just show the Meters database, will show an increasing storage utilization over the first week, then leveling off for the second week. This would give you a decent idea of space utilization going forward.

        You can then adjust the number of days of raw measurements that are retained.

        You can also add additional forests to spread the Meters database over more hosts if needed.

        Monitoring History

        The Monitoring History feature allows you to capture and view critical performance data from your cluster. Monitoring History capture is enabled at the group level. Once the performance data has been collected, you can view the data in the Monitoring History page.

        By default, the performance data is stored in the Meters database. A consolidated Meters database that captures performance metrics from multiple groups can be configured, if there is more than one group in the cluster.

        Monitoring History Data Retention Policy

        How long the performance data should be kept in the Meters database before it is deleted can be configured with the data retention policy. (http://docs.marklogic.com/guide/monitoring/history#id_80656)

        If it is observed that meters data is not being cleared according to the retention policy, the first place to check would be the range indexes configured for the Meters database.

        Range indexes and the Meters Database

        The Meters database is configured with a set of range indexes which, if not configured correctly (or not present), can prevent cleanup of the Meters database according to the configured retention policy.

        Range indexes may be missing or misconfigured in either of the following scenarios:

        •  if the cluster was upgraded from a version of MarkLogic Server before 7.0 and the upgrade had some issues
        •  if the indexes were manually created (when using another database for meters data instead of the default Meters database)

        The size of the meters database can grow significantly as the cluster grows, so it is important that the meters database is cleared per the retention policy.

        The required indexes (as of 8.0-5 and 7.0-6) are attached as an ML Configuration Manager package(http://docs.marklogic.com/guide/admin/config_manager#id_38038). Once these are added, the Meters database will reindex and the older data should be deleted.

        Note that deletion of data older than the retention period occurs no sooner than the retention period itself; data older than the retention period may still be retained for an unspecified amount of time.

        Related documentation

        http://docs.marklogic.com/guide/monitoring

        https://help.marklogic.com/Knowledgebase/Article/View/259/0/metering-database-disk-space-requirements

        SUMMARY:

        Prior to MarkLogic 4.1-5, role-ids were randomly generated.  MarkLogic Server now uses a hash algorithm that ensures roles created with the same name will be assigned the same role-id.  As a result, migrating data from a forest created prior to MarkLogic 4.1-5 to a newer installation can cause the user to be met with a "role not defined" error.  To work around this issue, we will need to create a new role with the role-id defined in the legacy system.

        Procedure:

        This process creates a new role with the same role-id from your legacy installation and assigns this old role to your new role with the correct name.

        Step 1: You will need to find the role-id of the legacy role. This will need to be run against the security DB on the legacy server. 


        xquery version "1.0-ml";
        import module namespace sec="http://marklogic.com/xdmp/security" at
        "/MarkLogic/security.xqy";

        let $role-name := "Enter Roll Name Here" 

        return
        /sec:role[./sec:role-name=$role-name]/sec:role-id/text()



        Step 2: In the new environment, store the attached module to the following location on the host containing the security DB.

        /opt/MarkLogic/Modules/role-edit/create-master-role.xqy

        Step 3: Ensure that you have created the role on the new cluster.

        Step 4: Run the following code against the new cluster's security database. This will create a new role with the legacy role-id. Be sure to enter the role name, description, and role-id from Step 1.

        xquery version "1.0-ml";
        import module namespace cmr="http://role-edit.com/create-master-role" at
        "/role-edit/create-master-role.xqy";

        let $role-name := "ENTER ROLE NAME"
        let $role-description := "ENTER ROLE DESCRIPTION"
        let $legacy-role-id := 11658627418524087702 (: Replace this with the Role ID from Step 1:)

        let $legacy-role := fn:concat($role-name,"-legacy")
        let $legacy-role-create := cmr:create-role-with-id($legacy-role, $role-description, (), (), (), $legacy-role-id)

        return
        fn:concat("Inserted role named ",$legacy-role," with id of ",$legacy-role-id)



        Step 5: Run the following code against the new cluster's security database to assign the legacy role to the new role.

        xquery version "1.0-ml";
        import module namespace sec="http://marklogic.com/xdmp/security" at
        "/MarkLogic/security.xqy";

        let $role-name := "ENTER ROLE NAME"
        let $legacy-role := fn:concat($role-name,"-legacy")

        return
        (
        sec:role-set-roles($role-name, ($legacy-role)),
        "Assigned ",$legacy-role," role to ",$role-name," role"
        )


         

        You should now have a new role named [your-role]-legacy.  This legacy role will contain the role-id from your legacy installation and will be assigned to [your-role] on the new installation.  Legacy documents in your DB will now have the same rights they had in the legacy system.

        Introduction

        Those familiar with versions of MarkLogic Server prior to MarkLogic 7 may have heard the 3X disk space rule mentioned. At the time of writing, references to it can be found in the MarkLogic 5 documentation and the MarkLogic 6 documentation.

        The Monitoring Metrics of Interest section in the Monitoring MarkLogic Guide refers to the 3X rule in a preparatory question on disk allocation for a database:

        • Is there enough disk space for forest data and merges? Merges require at least twice as much free disk space as used by the forest data (3X rule). If a merge runs out of disk space, it will fail.

        For anyone reading the requirements guidelines for MarkLogic 7 (and above), you may have noticed a section that suggests that you should plan to ensure disk space is available to:

        • 1.5 times the disk space of the total forest size. Specifically, each forest on a filesystem requires its filesystem to have at least 1.5 times the forest size in disk space (or, for each forest less than 32GB, 3 times the forest size). This translates to 1.5 times the disk space of the source content after it is loaded.

          For example, if you plan on loading content that will result in a 100 GB database, reserve at least 150GB of disk space. The disk space reserve is required for merges.

        This Knowledgebase article will cover both requirements and offer some further guidance as to how to plan and size your databases and - crucially - how you can take advantage of the newer 1.5X rule.

        3X

        The original logic behind the allocation of 3X disk space was to provide ample space to allow for a situation where a database is fully reindexed. The allocation would be in thirds according to the following measures:

        1. Your Data
        2. Space for reindexing
        3. Space for merges

        The 3X disk provision rule was offered as a very general (and very safe for production) rule to cover the most extreme example where your data gets reindexed in its entirety and then merges have to take place on top of that.

        ... but why 3X?

        To understand this, we need to briefly explore what happens when a document is updated in MarkLogic Server.

        As an update is made to a document - and the same rule applies when index configuration changes cause a document to be rewritten - the transaction takes place at a given timestamp (a given point in time). At that point, the original fragment is marked as deleted and a new fragment is written to an in-memory stand. Eventually, the in-memory stand is written to disk.

        For a period of time - especially at times where a MarkLogic instance/cluster is busy performing a large number of updates - it's likely that there will be occasions where two versions of the same fragment exist in different stands on disk; one stand will contain the fragment now marked as deleted and the other stand will contain the newly written fragment - which will be used by any subsequent queries running at later timestamps.

        ... so that covers 2X - what about the other third?

        When a merge takes place, merge candidate stands are identified and a new stand is created. As the candidate stands are read through, the active fragments are copied over to the new stand.

        At the point where the merge takes place, the new stand coexists with the older stands because - like updates and reindexing - queries will still need to run against the candidate stands; the timestamp will only get moved on to accommodate the data in the new stand as soon as the process has completed in its entirety.

        While all of this is taking place, other updates could be taking place to documents in other stands and the same rules apply to those fragments too.

        So the 3X rule provides a true safeguard; allowing for a situation where forest sizes are likely to swell way above and beyond the size of the data they contain, to accommodate the fragments marked deleted for queries at earlier timestamps and to accommodate the additional headroom required by a merge of some very large stands.

        1.5X

        Some changes were made in MarkLogic 7 which effectively reduce the footprint of your data on-disk. With some careful planning, you can take advantage of the lower sizing rule.

        While the documentation still acknowledges the 3X rule (which is still true if you're performing an upgrade directly from MarkLogic 6 or earlier without making any other configuration changes), a new default configuration has been introduced for databases created under MarkLogic 7: the merge max size.

        What does the merge max size do?

        This setting enforces an upper limit of 32GB on the size of an individual stand.

        With previous versions of the product, the expectation would be for the contents of a forest to merge down to one large stand. That is: given a quiesced database, on full completion of a merge, all content (all active fragments) should be in a single stand.

        For databases on MarkLogic 7 (and later), you can now expect to see more stands - each with a maximum size of 32GB.

        This means you should expect to see your data in more stands than you would have done on prior versions of the product, but it also means that you can lower the amount of disk space you need due to this size restriction.

        From MarkLogic 7 onwards - with the merge max size correctly set - the largest amount of space a single merge operation should require would be 64GB.

        ... but why 1.5X?

        If we return to this line in the documentation:

        • For example, if you plan on loading content that will result in a 100 GB database, reserve at least 150GB of disk space. The disk space reserve is required for merges.

        Given that we now have an upper limit on the size of a stand (32GB), as two smaller stands are being merged to create the new, larger stand and given the space required by other concurrent operations that may be taking place in other stands, a space limit of 1.5X should now cover any merges (and subsequent updates to documents).

        For further understanding or the 1.5X rule, read our knowledgebase article 'Explanation of the 1.5X Disk Space Requirement' .

        How do I find out whether my database is configured for this new merge max size?

        If you're on the admin interface at http://[yourhostname]:8001

        Go to: Configure > Databases > [Your Database Name] > Merge Policy

        On the right-hand panel, you should see the merge max size; the default should now be 32768
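
        Alternatively, the setting can be checked and changed from Query Console; the sketch below assumes the Admin API getter/setter for merge max size and an illustrative database name ("Documents"). The value is in MB, so 32768 corresponds to the 32GB default discussed above:

        xquery version "1.0-ml";
        import module namespace admin = "http://marklogic.com/xdmp/admin"
            at "/MarkLogic/admin.xqy";

        let $config := admin:get-configuration()
        let $db     := xdmp:database("Documents")
        return (
            "current merge max size (MB): " || admin:database-get-merge-max-size($config, $db),
            admin:save-configuration(
                admin:database-set-merge-max-size($config, $db, 32768))
        )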

        Important caveats

        MarkLogic 7 is designed to allow you to work with more stands. While you should still be concerned when you see a system with a very large number of small stands, this slightly different rule requires a shift in thinking, and it has implications in particular when you start to think about applying the 1.5X disk space rule in your environment.

        In releases prior to MarkLogic 7, the expectation (over time) was that all data in a forest would ultimately attempt to get merged into a single stand.

        In MarkLogic 7, at least with the default setting of the merge-max-size (to 32768 - 32GB), it is understood that a reasonably large forest would now be divided into a number of 32GB stands.

        If you are strictly following this rule for all reasonably large forests on your system, then the 1.5X rule can safely be used operationally in a production environment; however, reliance on the rule requires careful management when migrating an existing system, as running out of disk space can have catastrophic consequences for a live system.

        For very small forests, the 1.5X rule does not apply.  Due to the 32GB stand size overhead, your forests need to be sufficiently large in order to benefit from the 1.5X rule. 

        You should treat the 1.5x rule as an absolute minimum requirement for disk space for a given database. If you are going to use it, we would recommend having a strategy in place for allocating more space until you are confident that the cluster can run safely within the lower (1.5x) boundaries.

        I'm upgrading from an earlier version of MarkLogic to MarkLogic 7 - I have changed the merge max size to 32768. Can I reclaim the disk space?

        It's important to note that the 1.5x guidelines will only work if your forests all contain stands that have the new maximum size of 32GB. If your forests still contain larger stands, you'll need to break these down before you can consider reclaiming disk space. 

        ... Breaking Large Stands Down

        If your forests contain stands larger than 32 GB, you will want to break these stands down in order to take advantage of the lower disk space requirements.

        Different techniques can be followed to break the stands and reclaim disk space:

        1. Re-ingesting the content of the forests with large stands - When documents are re-ingested in a forest, the old fragments will be marked as deleted and the new fragment will be written to a new stand. Once there are sufficient deleted fragments, the large stands will be merged down into smaller stands.
        2. Perform re-indexing – A Forced re-index will update every fragment in the database, effectively re-loading the content - the original fragments will be marked as deleted and the new fragments will be written to a new stand. Once there are sufficient deleted fragments, the large stands will be merged down into smaller stands.  
        3. Forest rebalancing - Rebalance active fragments out of the existing forests and retire the old forests, with merge max size configured; this merges out the deleted fragments in the old stands and keeps the active fragments in smaller stands in the other, rebalanced forests.

        Conclusion

        The major points for the 1.5X rule:

        • The estimated 1.5X disk space utilization is only true for databases where merge-max-size is correctly set and for forests that are sufficiently large. For databases created in MarkLogic Server 7 or later, the default merge-max-size is 32768 (32GB).
        • If you're upgrading from earlier releases, you would need to make sure you set this value as part of your upgrade process.
          • After upgrading from a version previous to MarkLogic 7, you will have to take explicit steps to decrease the size of any pre-existing large stands. 

         

        Summary

        New and updated mimetypes were added for MarkLogic 8.  If your MarkLogic Server instance has customized mimetypes, the upgrade to MarkLogic Server v8.0-1 will not update the mimetypes table. 

        Details

        MarkLogic 8 includes the following new mimetype values:

        Name                                    Extension           Format
        application/json                        json                json
        application/rdf+json                    rj                  json
        application/sparql-results+json         srj                 json
        application/xml                         xml xsd xvs sch     xml
        text/json                                                   json
        text/xml                                                    xml
        application/vnd.marklogic-javascript    sjs                 text
        application/vnd.marklogic-ruleset       rules               text

        If you upgraded to 8.0 from a previous version of MarkLogic Server and if you have ever customized your mimetypes (for example, using the MIME Types Configuration page of the Admin Interface), the upgrade will not automatically add the new mimetypes to your configuration. If you have not added any mimetypes, then the new mimetypes will be automatically added during the upgrade. You can check if you have these mimetypes configured by going to the Mimetype page of the Admin Interface and checking if the above mimetypes exist. If they exist, then there is nothing you need to do.
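
        As an alternative to the Admin Interface, the currently configured mimetypes can also be listed from Query Console with a short query:

        xquery version "1.0-ml";
        import module namespace admin = "http://marklogic.com/xdmp/admin"
            at "/MarkLogic/admin.xqy";

        admin:mimetypes-get(admin:get-configuration())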

        Effect

        Not having these mimetypes may lead to application level failures - for example: running Javascript code via Query Console will fail. 

        Resolving Manually

        If you do not have the above mimetypes after upgrading to 8.0, you can manually add the mimetypes to your configuration using the Admin Interface. To manually add the configuration, perform the following

        1. Open the Admin Interface in a browser (for example, open http://localhost:8001).
        2. Navigate to the Mimetypes page, near the bottom of the tree menu.
        3. Click the Create tab.
        4. Enter the name, the extension, and the format for the mimetype (see the table above).
        5. Click OK.
        6. Repeat the preceding steps for each mimetype in the above table.

        Please be aware that updating the mimetype table results in a MarkLogic Server restart.  You will want to execute this procedure when MarkLogic Server is idle or during a maintenance window.

        Resolve by Script

        Alternatively, if you do not have the above mimetypes after upgrading to 8.0, you can add the mimetypes to your configuration by executing the following script in Query Console:

        xquery version "1.0-ml";

        import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";
        declare namespace mt = "http://marklogic.com/xdmp/mimetypes";

        let $config := admin:get-configuration()
        let $all-mimetypes := admin:mimetypes-get($config) (: existing mimetypes defined :)
        let $new-mimetypes := (admin:mimetype("application/json", "json", "json"),
            admin:mimetype("application/rdf+json", "rj", "json"),
            admin:mimetype("application/sparql-results+json", "srj", "json"),
            admin:mimetype("application/xml", "xml xsd xvs sch", "xml"),
            admin:mimetype("text/json", "", "json"),
            admin:mimetype("text/xml", "", "xml"),
            admin:mimetype("application/vnd.marklogic-javascript", "sjs", "text"),
            admin:mimetype("application/vnd.marklogic-ruleset", "rules", "text"))
        (: remove intersection to avoid conflicts :)
        let $delete-mimetypes :=
            for $mimetype in $all-mimetypes
            return if ($mimetype//mt:name/data() = $new-mimetypes//mt:name/data()) then $mimetype else ()
        let $config := admin:mimetypes-delete($config, $delete-mimetypes)
        (: save new mimetype definitions :)
        return admin:save-configuration( admin:mimetypes-add( $config, $new-mimetypes))
        (: executing this query will result in a restart of MarkLogic Server :)

        Please be aware that updating the mimetype table results in a MarkLogic Server restart.    You will want to execute this script when MarkLogic Server is idle or during a maintenance window.

        Fixes

        At the time of this writing, it is expected that the upgrade scripts will be improved in a maintenance release of MarkLogic Server so that these updates occur automatically.

        Introduction

        In this article, we discuss use of xdmp:cache-status in monitoring cache status, and explain the values returned.

        Details

        Note that this is a relatively expensive operation, so it’s not something to run every minute, but it may be valuable to run it occasionally for information on current cache usage.

        Output format

        The values returned by xdmp:cache-status are per host, defaulting to the current host. It takes an optional host-id to allow you to gather values from a specific host in the cluster.
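        For reference, a minimal call looks like the following (the host name used in the second expression is hypothetical; pass no argument to get the current host):

        xquery version "1.0-ml";
        (: cache status for the current host :)
        xdmp:cache-status(),
        (: cache status for a specific host in the cluster, looked up by name :)
        xdmp:cache-status(xdmp:host("host1.example.com"))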

        The output of xdmp:cache-status will look something like this:

        <cache-status xmlns="http://marklogic.com/xdmp/status/cache">
          <host-id>18349804367231394552</host-id>
          <host-name>macpro-2113.local</host-name>
          <compressed-tree-cache-partitions>
            <compressed-tree-cache-partition>
              <partition-size>512</partition-size>
              <partition-table>0.2</partition-table>
              <partition-used>0.8</partition-used>
              <partition-free>99.2</partition-free>
              <partition-overhead>0</partition-overhead>
            </compressed-tree-cache-partition>
          </compressed-tree-cache-partitions>
          <expanded-tree-cache-partitions>
            <expanded-tree-cache-partition>
              <partition-size>1024</partition-size>
              <partition-table>0.7</partition-table>
              <partition-busy>0</partition-busy>
              <partition-used>30.4</partition-used>
              <partition-free>69.6</partition-free>
              <partition-overhead>0</partition-overhead>
            </expanded-tree-cache-partition>
          </expanded-tree-cache-partitions>
          <list-cache-partitions>
            <list-cache-partition>
              <partition-size>1024</partition-size>
              <partition-table>0.2</partition-table>
              <partition-busy>0</partition-busy>
              <partition-used>0</partition-used>
              <partition-free>100</partition-free>
              <partition-overhead>0</partition-overhead>
            </list-cache-partition>
          </list-cache-partitions>
          <triple-cache-partitions>
            <triple-cache-partition>
              <partition-size>1024</partition-size>
              <partition-busy>0</partition-busy>
              <partition-used>0</partition-used>
              <partition-free>100</partition-free>
            </triple-cache-partition>
          </triple-cache-partitions>
          <triple-value-cache-partitions>
            <triple-value-cache-partition>
              <partition-size>512</partition-size>
              <partition-busy>0</partition-busy>
              <partition-used>0</partition-used>
              <partition-free>100</partition-free>
            </triple-value-cache-partition>
          </triple-value-cache-partitions>
        </cache-status>
        

        Values

        cache-status contains information for each partition of the caches:

        • The list cache holds search term lists in memory and helps optimize XPath expressions and text searches.
        • The compressed tree cache holds compressed XML tree data in memory. The data is cached in memory in the same compressed format that is stored on disk.
        • The expanded tree cache holds the uncompressed XML data in memory (in its expanded format).
        • The triple cache holds triple data.
        • The triple value cache holds triple values.

        The following are descriptions of the values returned:

        • partition-size: The size of a cache partition, in MB.
        • partition-table: The percentage of the table for a cache partition that is currently used. The table is a data structure with a fixed overhead per cache entry, used for cache administration; it fixes the number of entries that can be resident in the cache. If the partition table is full, something must be removed before another entry can be added to that cache partition.
        • partition-busy: The percentage of the space in a cache partition that is currently used and cannot be freed.
        • partition-used: The percentage of the space in a cache partition that is currently used.
        • partition-free: The percentage of the space in a cache partition that is currently free.
        • partition-overhead: The percentage of the space in a cache partition that is currently overhead.

        When do I get errors?

        You will get a cache-full error when nothing can be removed from the cache to make room for a new entry.

        The "partition-busy" value is the most useful indicator of getting a cache-full error. It tells you what percent of the cache partition is locked down and cannot be freed to make room for a new entry. 

         

        MarkLogic recommends that the Security database only have 1 primary forest.  Having more than one primary forest for the Security database can cause failover issues when doing upgrades and restarts.  The Security database should have a single primary forest, and one replica forest to support High Availability.

        More details available in the knowledge base article How many forests should my Security database have?

        Refer to our documentation for Configuring the Security and Auxiliary Databases to Use Failover Forests

        Summary

        When restarting very large forests, some customers have noted that it may take a while for them to mount. While the forests are mounting, the database is unable to come online, thus impacting the availability of your main site. This article shows you how to change a few database settings to improve forest-mounting time.

         


         

        When encountering delays with forest mounting time after restarts, we usually recommend the following settings:

        • format-compatibility set to the latest format
        • expunge-locks set to none
        • index-detection set to none

        Additionally, some customers might be able to spread out the work of memory mapping forest indexes by setting preload-mapped-data to false - though it should be noted that instead of the necessary time being taken during the mounting of the forest, memory-mapped file data will be loaded on demand through page faults as the server accesses it.
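        As a sketch, the expunge-locks, index-detection, and preload-mapped-data settings can also be changed through the Admin API (the database name "Documents" below is just an example):

        xquery version "1.0-ml";
        import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

        let $config := admin:get-configuration()
        let $db := xdmp:database("Documents") (: example database name :)
        let $config := admin:database-set-expunge-locks($config, $db, "none")
        let $config := admin:database-set-index-detection($config, $db, "none")
        let $config := admin:database-set-preload-mapped-data($config, $db, fn:false())
        return admin:save-configuration($config)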

        While the above settings should help with forest mounting time, in general, their effects can be situationally dependent. You can read more about each of these settings in our documentation here: http://docs.marklogic.com/admin-help/database. In particular:


        1) Regarding format compatibility: "The automatic detection occurs during database startup and after any database configuration changes, and can take some time and system resources for very large forests and for very large clusters. The default value of automatic is recommended for most installations." While automatic is recommended in most cases, you should try changing this setting if you are seeing long forest mount times.

        2) Regarding expunge-locks: "Setting this to none is only recommended to speed cluster startup time for extremely large clusters. The default setting of automatic, which cleans up the locks as they expire, is recommended for most installations."

        3) Regarding index-detection: "This detection occurs during database startup and after any database configuration changes, and can take some time and system resources for very large forests and for very large clusters. Setting this to none also causes queries to use the current database index settings, even if some settings have not completed reindexing. The default value of automatic is recommended for most installations."

        It may also be worth considering why forests are taking a long time to mount. If your data size has grown significantly over the lifetime of the affected database, it might be the case that your forests are now overly large, in which case a better approach might be to instead distribute the data across more forests.

        Introduction
         
        The MarkLogic Java Client API's 'DatabaseClient' instance represents a database connection sharable across threads. The connection is stateless, except that authentication is done the first time a client interacts with the database via a Document Manager, Query Manager, or other manager. For instance, you may instantiate a DatabaseClient as follows:
         
        // Create the database client

        DatabaseClient client = DatabaseClientFactory.newClient(host, port,
                                                  user, password, authType);

        And release it as follows:
        // release the client
        client.release();

        Details on DatabaseClient Usage

        To use the Java Client API efficiently, it helps to know a little bit about what goes on behind the scenes.

        You specify the enode or load balancer host when you create a database client object.  Internally, the database client object instantiates an Apache HttpClient object to communicate with the host.

        The internal Apache HttpClient object creates a connection pool for the host.  The connection pool makes it possible to reuse a single persistent HTTP connection for many requests, typically improving performance.

        Setting up the connection pool has a cost, however.

        As a result, we strongly recommend that applications create one database client for each unique combination of host, database, and user.  Applications should share the database client across threads.  In addition, applications should keep a reference to the database client for the entire life of the application interaction with that host.


        For instance, a servlet might create the database client during initialization and release it during destruction. The same servlet may also use two separate database client instances with different permissions, one for read-only users and one with read/write permissions for editors. In the latter case, both client instances are used throughout the life of the servlet and released during servlet destruction.

        Summary

        Clock synchronization plays a critical part in the operation of a MarkLogic Cluster.

        MarkLogic Server expects the system clocks to be synchronized across all the nodes in a cluster, as well as between Primary and Replica clusters. The acceptable level of clock skew (or drift) between hosts is less than 0.5 seconds; values greater than 30 seconds will trigger XDMP-CLOCKSKEW errors and could impact cluster availability.

        Tools

        Network Time Protocol (NTP) is the recommended solution for maintaining system clock synchronization.  NTP services can be provided by public (internet) servers, private servers, network devices, peer servers and more.

        NTP Basics

        NTP uses a daemon process (ntpd) that runs on the host.  The ntpd process periodically wakes up, polls the configured NTP servers to get the current time, and then adjusts the local system clock as necessary.  Time can be adjusted in two ways: by immediately stepping to the correct time, or by slowly speeding up or slowing down the system clock until it has reached the correct time. The frequency at which ntpd wakes up, called the polling interval, can be adjusted based on the level of accuracy needed, anywhere from roughly 1 to 17 minutes.  NTP uses a hierarchy of server levels called strata.  Each stratum synchronizes with the layer above it and provides synchronization to the layer below it.

        Public NTP Reference Servers

        There are many public NTP reference servers available for time synchronization.  It's important to note that the most common public NTP reference server addresses are for a pool of servers, so hosts synchronizing against them may end up using different physical servers.  Additionally, the level of polling recommended for cluster synchronization is usually higher, and excessive polling could result in the reference server throttling or blocking traffic from your systems.

        Stand Alone Cluster

        For a cluster that is not replicated or connected to another cluster in some way, the primary concern is that all the hosts in the cluster be in sync with each other, rather than being accurate to UTC.

        Primary/Replica Clusters

        Clusters that act as either Primary or Replicas need to be synchronized with each other for replication to work correctly.  This usually means that the hosts in both clusters should reference the same NTP servers.

        NTP Configuration

        Time Synchronization Configuration Files

        It is common to have multiple servers referenced in the chronyd configuration file, /etc/chrony.conf, or the ntpd configuration file, /etc/ntp.conf. NTP may not choose the server based on the order in the file.  Because of this, hosts could synchronize with different reference servers, introducing differences in the system clocks between the hosts in the cluster. Many organizations already have devices that can act as NTP servers in their infrastructure, as many network devices are capable of acting as NTP servers, as are Windows Primary Domain Controllers.  These devices can use default polling intervals, which avoids excessive polling against public servers.

        Once you have identified your NTP server, you can configure the NTP daemon on the cluster hosts. We suggest using a single reference server for all the cluster hosts, then adding all the hosts in the cluster as peers of the current node.  We also suggest adding an entry for the local host as its own server, assigning it a high stratum number (low priority) so it is only used as a last resort. Using peers allows the cluster hosts to negotiate and elect a host to act as the reference server, providing redundancy in case the reference server is unavailable.

        Common Configuration Options

        The burst option sends a burst of 8 packets when polling, to increase the average quality of the time offset statistics.  Using it against a public NTP server is considered abuse.

        The iburst option sends a burst of 8 packets at initial synchronization, which is designed to speed up the initial synchronization at startup.  Using it against a public NTP server is considered aggressive.

        The minpoll and maxpoll settings are expressed as powers of two seconds, so a setting of 4 means 2^4 = 16 seconds; setting both minpoll and maxpoll to 4 will cause the host to check the time roughly four times a minute.

        Time Synchronization with chronyd

        The following is a sample chrony.conf file:

        # Primary NTP Source

        server *.*.*.200 burst iburst minpoll 4 maxpoll 4

        # Allow peering as a backup to the primary time servers

        peer mlHost01 burst iburst minpoll 4 maxpoll 4
        peer mlHost02 burst iburst minpoll 4 maxpoll 4
        peer mlHost03 burst iburst minpoll 4 maxpoll 4

        # Serve time even if not synchronized to a time source (for peering)
        local stratum 10

        # Allow other hosts on subnet to get time from this host (for peering)
        # Can also be specified by individual IP
        # https://chrony.tuxfamily.org/manual.html#allow-directive
        allow *.*.*.0

        # By default chrony will not step the clock after the initial few time checks.
        # Changing the makestep option allows the clock to be stepped if its offset is larger than .5 seconds.
        makestep 0.5 -1

        The other settings (driftfile, rtsync, log) can be left as is, and the new settings will take effect after the chronyd service is restarted.

        Time Synchronization with ntpd

        The following is a sample ntpd.conf file:

        #The current host has an ip of 10.10.0.1
        server ntpserver burst iburst minpoll 4 maxpoll 4
         
        #All of the cluster hosts are peered with each other.
        peer mlHost01 burst iburst minpoll 4 maxpoll 4
        peer mlHost02 burst iburst minpoll 4 maxpoll 4
        peer mlHost03 burst iburst minpoll 4 maxpoll 4
         
        #Add the local host so the peered servers can negotiate
        # and choose a host to act as the reference server
        server 10.10.0.1
        fudge 10.10.0.1 stratum 10

        The fudge setting is used to alter the stratum of the server from the default of 0.

        Choosing Between NTP Daemons

        Red Hat states that chrony is the preferred NTP daemon, and should be used when possible.

        Chrony should be preferred for all systems except for the systems that are managed or monitored by tools that do not support chrony, or the systems that have a hardware reference clock which cannot be used with chrony.

        As always, system configuration changes should always be tested and validated prior to putting them into production use.


        Summary

        On March 1, 2016, a vulnerability in OpenSSL named DROWN, a man-in-the-middle attack that stands for “Decrypting RSA with Obsolete and Weakened eNcryption", was announced. All MarkLogic Server versions 5.0 and later are *not* affected by this vulnerability.

        Advisory

        The Advisory reported by OpenSSL.org states

        CVE-2016-0800 (OpenSSL advisory)  [High severity] 1st March 2016: 

        A cross-protocol attack was discovered that could lead to decryption of TLS sessions by using a server supporting SSLv2 and EXPORT cipher suites as a Bleichenbacher RSA padding oracle. Note that traffic between clients and non-vulnerable servers can be decrypted provided another server supporting SSLv2 and EXPORT ciphers (even with a different protocol such as SMTP, IMAP or POP) shares the RSA keys of the non-vulnerable server. This vulnerability is known as DROWN (CVE-2016-0800). Recovering one session key requires the attacker to perform approximately 2^50 computation, as well as thousands of connections to the affected server. A more efficient variant of the DROWN attack exists against unpatched OpenSSL servers using versions that predate 1.0.2a, 1.0.1m, 1.0.0r and 0.9.8zf released on 19/Mar/2015 (see CVE-2016-0703 below). Users can avoid this issue by disabling the SSLv2 protocol in all their SSL/TLS servers, if they've not done so already. Disabling all SSLv2 ciphers is also sufficient, provided the patches for CVE-2015-3197 (fixed in OpenSSL 1.0.1r and 1.0.2f) have been deployed. Servers that have not disabled the SSLv2 protocol, and are not patched for CVE-2015-3197 are vulnerable to DROWN even if all SSLv2 ciphers are nominally disabled, because malicious clients can force the use of SSLv2 with EXPORT ciphers. OpenSSL 1.0.2g and 1.0.1s deploy the following mitigation against DROWN: SSLv2 is now by default disabled at build-time. Builds that are not configured with "enable-ssl2" will not support SSLv2. Even if "enable-ssl2" is used, users who want to negotiate SSLv2 via the version-flexible SSLv23_method() will need to explicitly call either of: SSL_CTX_clear_options(ctx, SSL_OP_NO_SSLv2); or SSL_clear_options(ssl, SSL_OP_NO_SSLv2); as appropriate. Even if either of those is used, or the application explicitly uses the version-specific SSLv2_method() or its client or server variants, SSLv2 ciphers vulnerable to exhaustive search key recovery have been removed. Specifically, the SSLv2 40-bit EXPORT ciphers, and SSLv2 56-bit DES are no longer available. In addition, weak ciphers in SSLv3 and up are now disabled in default builds of OpenSSL. Builds that are not configured with "enable-weak-ssl-ciphers" will not provide any "EXPORT" or "LOW" strength ciphers. Reported by Nimrod Aviram and Sebastian Schinzel.

        Fixed in OpenSSL 1.0.1s (Affected 1.0.1r, 1.0.1q, 1.0.1p, 1.0.1o, 1.0.1n, 1.0.1m, 1.0.1l, 1.0.1k, 1.0.1j, 1.0.1i, 1.0.1h, 1.0.1g, 1.0.1f, 1.0.1e, 1.0.1d, 1.0.1c, 1.0.1b, 1.0.1a, 1.0.1)

        Fixed in OpenSSL 1.0.2g (Affected 1.0.2f, 1.0.2e, 1.0.2d, 1.0.2c, 1.0.2b, 1.0.2a, 1.0.2)

        MarkLogic Server Details

        MarkLogic Server disallows SSLv2 and disallows weak ciphers in all supported versions.  As a result, MarkLogic Server is not affected by this vulnerability.

        Whenever MarkLogic releases a new version of MarkLogic Server, OpenSSL versions are reviewed and updated. 

         

        Note: The Ops Director feature has been deprecated with MarkLogic 10.0-5.

        Introduction

        Ops Director enables you to monitor MarkLogic clusters ranging from a single node to large multi-node deployments. A single Ops Director server can monitor multiple clusters. Ops Director provides a unified browser-based interface for easy access and navigation.

        Ops Director presents a consolidated view of your MarkLogic infrastructure, to streamline monitoring and troubleshooting of clusters with alerting, performance, and log data. Ops Director provides enterprise-grade security of your cluster configuration and performance data with robust role-based access control and information security powered by MarkLogic Server.

        Problems installing Ops Director 2.0.0, 2.0.1 & 2.0.1-1

        Check gradle.properties

        To successfully install Ops Director, the value for mlhost in gradle.properties must be a hostname, and that hostname must match the name of one of the hosts in the cluster.  You cannot use localhost to install Ops Director, nor can you use a host name other than one listed as a host in the cluster, as this affects the use of certificates for authentication to the OpsDirectorSystem application server.

        Check for App-Services

        Ops Director can sometimes encounter errors when attempting to install in groups other than Default. To successfully install, the Ops Director installer needs to be able to connect to the App-Services application server on port 8000 in the group where Ops Director is being installed.  There are two ways to work around this issue:

        • Create a copy of the App-Services app server in the new group, then install Ops Director
          • Be aware this allows QConsole access in the new group, for users with appropriate privileges. 
          • If you wish to prevent QConsole access in that group, the App-Services application server should be deleted after Ops Director has been installed.
        • Install Ops Director in the Default group, then move the host to the new group, and create the OpsDirector app servers in the new group.
          • Be aware this allows Ops Director access to remain in the Default group.
          • If you wish to prevent Ops Director access in the Default, the Ops Director application servers should be deleted from the Default group.
            • To do this you must also copy the scheduled tasks associated with Ops Director over to the new group, and delete the scheduled tasks from the old group

        See the attached Workspace OpsDirCopyAppServers.xml which has scripts to do the following:

        • Copy and/or remove the App-Services app server
        • Copy and/or remove the OpsDirectorSystem/OpsDirectorApplication/SecureManage app servers
        • Copy and/or remove the scheduled tasks associated with the Ops Director application.

        Also note that Ops Director will install forests on all hosts in the cluster, regardless of group assignments.

        Managing a Cluster

        Check DNS Settings

        When setting up a managed host, it's important to note that the hosts in both the Ops Director cluster, and the cluster being managed must be able to resolve hostnames via DNS.  Modifying the /etc/hosts file is not sufficient.

        Check Ops Director Scheduled Tasks

        When setting up a managed host, you may encounter a XDMP-DEADLOCK error, or have an issue seeing the data for a managed cluster.  If this occurs do the following:

        • Un-manage the affected cluster.  If there are any issues un-managing the cluster, use the procedures in this KB under the Problems with Un-managing Clusters to un-manage the cluster
        • Disable the scheduled tasks associated with Ops Director
          • /common/tasks/info.xqy
          • /common/tasks/running.xqy
          • /common/tasks/expire.xqy
          • /common/tasks/health.xqy
        • Manage the cluster again
        • Enable the scheduled tasks that were disabled

        Verify Necessary Ports are Open

        Assuming the default installation ports are in use, verify the following access:

        • 8003 Inbound TCP on the Managed Cluster, accessed by the Ops Director Cluster.
        • 8008 Inbound TCP on the Ops Director Cluster, accessed by the Ops Director Users.
        • 8009 Inbound TCP on the Ops Director Cluster, accessed by the Managed Cluster

        Upgrading Ops Director

        When upgrading to a new version of Ops Director, it may be necessary to uninstall the previous version.  To do that, you must un-manage any clusters being managed by Ops Director prior to uninstalling the application.

        Un-managing Clusters

        The first step in uninstalling Ops Director is to remove any clusters from being managed from Ops Director.  This is done via the Admin UI on a host in the managed cluster, as detailed in the Ops Director Guide: Disconnecting a Managed Cluster from Ops Director

        Uninstalling Ops Director 2.0.0 & 2.0.1

        These versions of Ops Director use the ml-gradle plugin for deployment.  To uninstall these versions, you will also use gradle, as detailed in the Ops Director Guide: Removing Ops Director 2.0.0 and 2.0.1

        Uninstalling Ops Director 1.1 or Earlier

        If you are using the 1.1  version that was installed via the Admin UI, then it can be uninstalled via the Admin UI as detailed in the Ops Director Guide: Removing Ops Director 1.1 or Earlier

        Problems with Uninstalling Ops Director

        Occasionally an Ops Director installation may partially fail, due to misconfiguration, or missing dependencies.  Issues can also occur that prevent the standard removal methods from working correctly.  In these cases, Ops Director can be removed manually using the attached QConsole Workspace, OpsDirRemove.xml.  The instructions for running the scripts are contained in the first tab of the workspace.

        Problems with Un-managing Clusters

        Occasionally, disconnecting a managed cluster from Ops Director may partially fail.  If this occurs, you can use the attached QConsole Workspace, OpsDirUnmanage.xml.  The instructions for running the scripts are contained in the first tab of the workspace.

        Further Reading

        Installing, Uninstalling, and Configuring Ops Director

        Monitoring MarkLogic with Ops Director

        Introduction

        MarkLogic offers many different ways to access your data. The best interface to use is ultimately determined by your use case. The table below is taken from MarkLogic University's on demand training course "Using the Optic API." That course runs only 12 minutes, and the table below appears at the 9:50 mark.

        When to use which API

        • OPTIC - Data shape: Multi-model. Output: Rows, documents, any structure. Strengths: Combines aspects of each query mechanism. Sample query: Review liability terms for every Race with >1000 Runners that offered a "cash prize".
        • SEARCH - Data shape: Documents. Output: Documents, parts of documents (snippet, highlight). Strengths: Discovery, relevance, fuzzy text, matching/stemming. Sample query: What is the best holiday race for me to enter (nearest, highest rated, most relevant)?
        • SQL - Data shape: Relational lens. Output: Rows, values. Strengths: Joins, aggregates, summarizing large data sets, exact matches. Sample query: Which Races had the most evenly balanced Runners based on gender?
        • SPARQL - Data shape: Semantic data. Output: Solutions. Strengths: Relating entities, linking facts, inferring relationships. Sample query: Find Runners who ran Races in Europe.

        Summary

        Performance of MarkLogic Server query evaluation can be impacted by the number of roles inherited by the user running the query.

        Impact of the number of roles inherited by a user on query evaluation

        When application users are assigned the necessary application roles, security evaluation for each user comes into play. By design, query performance is inversely proportional to the number of roles inherited by the user executing the query: each additional role a user inherits makes queries run by that user take a little longer to evaluate against the security schema.

        Question: How does the number of roles inherited by a user increase query evaluation time?

        For each role that a user has, MarkLogic Server adds an index term to every query the user executes.

        For example, if a user inherits ten roles, MarkLogic Server adds ten terms to every query the user executes; One hundred roles adds one hundred terms to every query; One thousand roles adds one thousand terms to every query that specific user runs.

        If your testing shows that the performance of queries with hundreds of terms is acceptable, then having a user inherit hundreds of roles may also be acceptable. However, if a query with hundreds of terms is too slow, then a user inheriting hundreds of roles will also be too slow.

        Question: Does a large number of roles spread across different users, rather than all inherited by a single user, have an impact on query performance?

        You can have thousands of roles defined without query performance being affected by security evaluation overhead, as long as those roles are not all inherited by the same user. It is only when those roles are all inherited by a single user that they increase the security evaluation overhead for queries run by that particular user.

        Query performance is not correlated with the total number of roles, but there is performance degradation with the number of roles per user. MarkLogic can easily handle tens of thousands of total roles, but cannot easily handle more than tens of roles per user.

        Recommendation:

        It is unlikely that a user inheriting thousands of roles will see acceptable query performance. Unless it is absolutely necessary, and the role evaluation performance overhead has been considered, we recommend against assigning thousands of roles to a single user.


        Summary

        This article briefly looks at the performance implications of ad hoc queries versus passing external variables to a query in a module.

        Details

        Programmatically, you can achieve similar results by dynamically generating ad hoc queries on the client as you can by defining your queries in modules and passing in external variable values as necessary.

        Dynamically generating ad hoc queries on the client side results in each of your queries being compiled and linked with library modules before they can be evaluated - for every query you submit. In contrast, queries in modules only experience that performance overhead the first time they're invoked.

        While it's possible to submit queries to MarkLogic Server in any number of ways, in terms of performance, it's far better to define your queries in modules, passing in external variable values as necessary.
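        As an illustration, here is a minimal sketch of a module that takes an external variable, and of invoking it from a client request. The module path and variable name are hypothetical:

        (: /modules/get-doc.xqy - deployed to the modules database :)
        xquery version "1.0-ml";
        declare variable $uri as xs:string external;
        fn:doc($uri)

        (: caller - the module is compiled and cached once; only the variable value changes per request :)
        xdmp:invoke("/modules/get-doc.xqy", (xs:QName("uri"), "/example/doc-1.xml"))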

        Summary

        MarkLogic does not enforce a programmatic upper limit on how many range indexes you *can* have. This leaves open the question of how many range indexes should be used in your application. The answer is that you should have as many as the application requires, with the caveat that there are some infrastructure limits that should be taken into account. For instance:

        1. More Memory Mapped file Handles (file fd)

        The operating system limits how many file handles a given process can have open at any point in time. This limit therefore affects how many range index files, and hence how many range indexes, a given MarkLogic process can have; however, you can configure higher file handle limits on most platforms (ulimit, vm.max_map_count).

        2. More RAM requirement 

        The in-memory footprint of a node includes structures such as the in-memory list cache, in-memory tree cache, in-memory range indexes, in-memory reverse index (if reverse queries are enabled), and in-memory triple index (if triple positions are enabled); multiply those by the total number of forests, plus a buffer.

        A large number of range indexes can result in huge index expansion in memory use. Also, the values mentioned above are in addition to the memory that MarkLogic Server requires to maintain its HTTP servers, perform merges, reindex, and rebalance, as well as to process queries.

        Tip: Memory consumption can be reduced by configuring a database to optimize range indexes for minimum memory usage (memory-size); the default is configured for maximum performance (facet-time).

        UI : Admin UI > Databases > {database-name} > Configure > range index optimize [facet-time or memory-size]

        API : admin:database-set-range-index-optimize 
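        A minimal sketch of the API call (the database name "Documents" is just an example):

        xquery version "1.0-ml";
        import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

        let $config := admin:get-configuration()
        return admin:save-configuration(
          admin:database-set-range-index-optimize($config, xdmp:database("Documents"), "memory-size"))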

        3. Longer Merge Times (Bigger stands due to Large index expansion)

        A large number of range indexes ends up expanding the data in forests. For a given host size and number of hosts, larger stand sizes in a forest will make range index queries faster; however, they will also make merge times slower. If we want both queries and merges to be fast with a large number of range indexes, we will need to scale out the number of physical hosts.

        4. More CPU, Disk & IO requirement 

        Merges are IO-intensive processes; this, combined with frequent updates and loads, could result in CPU as well as IO bottlenecks.

        5. Longer Forest Mount times

        In general, each configured range index with data takes two memory-mapped files per stand.

        A typical busy host has on the order of 10 forests, each forest with on the order of 10 stands; So a typical busy host has on the order of 100 stands.

        Now for 100 stands -

        • With 100 range indexes, we have in the order of 10,000 files to open and map when the server starts up.
        • While for 1,000 range indexes, we have in the order of 100,000 files to open and map when the server starts up.
        • While for 10,000 range indexes, we have in the order of 1,000,000 mapped files to open and map when the server starts up.

        As we increase the number of range indexes, at some point the server will take an unreasonably long time to start up (unless we add equivalent processing power).

        The amount of time one is willing to wait for the server to start up is not a hard limit; the real question is what "reasonable" start-up behavior looks like to the server administrator on the current hardware.

        Conclusion

        Range indexes numbering in the thousands start to affect performance if they are not managed properly and the above considerations are not accounted for. In most scenarios the solution to the problem is not "how many indexes can we configure", but rather "how many indexes do we need".

        MarkLogic considers on the order of 100 configured range indexes a "reasonable" limit, because it results in "reasonable" behavior of the server.

        Tips for Best Performance for Solutions with lots of Range Indexes

        Before launching your application, review the number of Range Indexes and work to 1) Remove ones that are not being used, and 2) Consolidate any range indexes that are mutually redundant. This will help you get under the prescribed 100 range index limit.

        On systems that already have a large number of range indexes (say 100+), merging multiple stands may become a performance issue, so you will need to think about easing the query and merge load. Here are some strategies for easing the load on your system:

        1. Increase merge-max-size from 32768 to 49152 on your database (see the sketch after this list). This will create larger stands and lower the number of merges that need to be performed.
        2. The "preload mapped data" setting (default false) can be left as false to speed up the merging of forest stands. Bear in mind that this comes at the cost of slower query performance immediately after a forest mounts.
        3. If your system begins to slow down due to merging activity, you can spread the load by adding more hosts and forests to your cluster. The smaller forests and stands will merge and load faster when there are more CPU cores and IO bandwidth to service them.
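        For item 1 above, a minimal sketch of raising merge-max-size via the Admin API (the database name "Documents" is just an example; the value is in MB):

        xquery version "1.0-ml";
        import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

        let $config := admin:get-configuration()
        return admin:save-configuration(
          admin:database-set-merge-max-size($config, xdmp:database("Documents"), 49152))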

        Further Reading

        Performance implications of updating Module and Schema databases

        This article briefly looks at the performance implications of adding or modifying modules or schemas to live (production) databases.

        Details

        When XQuery modules or schemas are referenced for the first time after upload, they are parsed and then cached in memory so that subsequent access is faster.

        When a module is added or updated, the modules cache is invalidated and every module (for all Modules databases within the cluster) will need to be parsed again before they can be evaluated by MarkLogic Server.

        Special consideration should be made when updating modules or schemas in a production environment as reparsing can impact the performance of MarkLogic server for the duration that the cache is being rebuilt.

        MarkLogic was designed with the assumption that modules and schemas are rarely updated. As such, the recommendation is that updates to modules or schemas in production environments are made during periods of low activity or out of hours.


        Overview

        Performance issues in MarkLogic Server typically involve either 1) unnecessary waiting on locks or 2) overlarge workloads. The goal of this knowledgebase article is to give a high level overview of both of these classes of performance issue, as well as some guidelines in terms of what they look like - and what you should do about them.

        Waiting on Locks

        We often see customer applications waiting on unnecessary read or write locks. 

        What does waiting on read or write locks look like? You can see read or write lock activity in our Monitoring History dashboard at port 8002 in the Lock Rate, Lock Wait Load, Lock Hold Load, and Deadlock Wait Load displays. This scenario will typically present with low resource utilization, but spikes in the read/write lock displays and high request latency.

        What should you do when faced with unnecessary read or write locks? Remediation of this scenario pretty much always goes through optimization of either request code, data model, or both. Additional hardware resources will not help in this case because there is no hardware resource bound present. You can learn more about data model optimizations through MarkLogic University's On-Demand courses, in particular XML and JSON Data Modeling Best Practices and Impact of Normalization: Lessons Learned

        Relevant Knowledgebase articles:

        1. Understanding XDMP Deadlock
        2. How Do Updates Work in MarkLogic Server?
        3. Fast vs Strict Locking
        4. Read Only Queries Run at a Timestamp & Update Transactions use Locks
        5. Performance Theory: Tales From MarkLogic Support

        Overlarge Workloads

        Overlarge workloads typically take two forms: a. too many concurrent workloads or b. work intensive individual requests

        Too Many Concurrent Workloads

        With regard to too many concurrent workloads - we often see clusters exhibit poor performance when subjected to many more workloads than the cluster can reasonably handle. In this scenario, any individual workload could be fine - but when the total amount of work over many, many concurrently running workloads is large, the end result is often the oversubscription of the underlying resources.

        What does too many concurrent workloads look like? You can see this scenario in our Monitoring History at port 8002, in the Disk I/O, CPU, Memory Footprint, App Server Request Rate, App Server Latency, or Task Server Queue Size displays. This scenario will typically present with spikes in both App Server Latency and App Server Request Rate, and correlated maximum level plateaus in one or more of the aforementioned hardware resource utilization charts.

        What should you do when faced with too many concurrent workloads? Remediation of this scenario pretty much always involves the addition of more rate-limiting hardware resource(s). This assumes, of course, that request code and/or data model are both already fully optimized. If either could be further optimized, then it might be possible to enable a higher request count given the same amount of resources - see the "Work Intensive Individual Requests" section, below. Rarely, in circumstances where traffic spikes are unpredictable - but likely - we’ve seen customers incorporate load shedding or traffic management techniques in their application architectures. For example, when request times pass a certain threshold, traffic is then routed through a less resource hungry code path.

        Note that concurrent workloads entail both request workload and maintenance activities such as merging or reindexing. If your cluster is not able to serve both requests and maintenance activities, then the remediation tactics are the same as listed above: you either need to a. add more rate-limiting hardware resource(s) to serve both, or b. incorporate load shedding or traffic management techniques, such as restricting maintenance activities to periods where the necessary resources are indeed available.

        Relevant Knowledgebase articles:

        1. When submitting lots of parallel queries, some subset of those queries take much longer - why?
        2. How reindexing works, and its impact on performance
        3. MarkLogic Server I/O Requirements Guide
        4. Sizing E-nodes
        5. Performance Theory: Tales From MarkLogic Support

        Work Intensive Individual Requests

        With regard to work intensive individual requests - we often see clusters exhibit poor performance when individual requests attempt to do too much work. Too much work can entail an unoptimized query, but it can also be seen when an otherwise optimized query attempts to work over a dataset that has grown past its original hardware specification.

        What do work intensive requests look like? You can see this scenario in our Monitoring History at port 8002, in the Disk I/O, CPU, Memory Footprint, App Server Request Rate, App Server Latency, or Task Server Queue Size displays. This scenario will typically present with spikes in one or more system resources (Disk I/O, CPU, Memory Footprint) and App Server Latency. In contrast to the "Too Many Concurrent Requests" scenario App Server Request Rate should not exhibit a spike.

        What should you do when faced with work intensive requests? As in the case with too many concurrent requests, it's sometimes possible to address this situation with additional hardware resources. However, remediation in this scenario more typically involves finding additional efficiencies via code or data model optimizations. Code optimizations can be made with the use of xdmp:plan() and xdmp:query-trace(). You can learn more about data model optimizations through MarkLogic University's On-Demand courses, in particular XML and JSON Data Modeling Best Practices and Impact of Normalization: Lessons Learned. If the increase in work is rooted in data growth, it's also possible to reduce the amount of data; customers pursuing this route typically do periodic data purges or use features like Tiered Storage.
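        For instance, a minimal sketch of both calls (the word query is purely illustrative):

        xquery version "1.0-ml";
        (: turn on query tracing for this request; trace events are written to the server log :)
        xdmp:query-trace(fn:true()),
        (: show how the server plans to resolve a search from the indexes :)
        xdmp:plan(cts:search(fn:doc(), cts:word-query("example")))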

        Relevant Knowledgebase articles:

        1. Gathering information to troubleshoot long-running queries
        2. Fast searches: resolving from the indexes vs. filtering
        3. What do I do about XDMP-LISTCACHEFULL errors?
        4. Resolving XDMP-EXPNTREECACHEFULL errors
        5. When should I look into query or data model tuning?
        6. Performance Theory: Tales From MarkLogic Support

        Additional Resources

        1. Monitoring MarkLogic Guide
        2. Query Performance and Tuning Guide
        3. Performance: Understanding System Resources

         

        ATTENTION

        This knowledgebase article dates from 2014 - which is a long time ago in terms of available hardware and MarkLogic development. While some of the fundamental principles in the article below still apply, you'll find more recent specific guidance in this "Performance Testing with MarkLogic" whitepaper.


        Performance Theory: Tales From MarkLogic Support

        This article is a snapshot of the talk that Jason Hunter and Franklin Salonga gave at MarkLogic World 2014, also titled "Performance Theory: Tales From The MarkLogic Support Desk." Jason Hunter is Chief Architect and Frank Salonga is Lead Engineer at MarkLogic.

        MarkLogic is extremely well-designed, and from the ground up it’s built for speed, yet many of our support cases have to do with performance. Often that’s because people are following historical conventions that no longer apply. Today, there are big-memory systems using a 64-bit address space with lots of CPU cores, holding disks that are insanely fast (but that haven’t grown in speed as much as they have in size*), hooked together by high-speed bandwidth. MarkLogic lives natively in this new reality, and that changes the guidelines you want to follow for finding optimal performance in your database.

        The Top 10 (Actually 16) Tips

        The following is a list of top 16 tips to realize optimal performance when using MarkLogic, all based on some of the common problems encountered by our customers:

        1. Buy Enough Iron
        MarkLogic is optimized for server-grade systems, those just to the left of the hockey-stick price jump. Today (April 2014) that means 16 cores, 128-256 Gigs of RAM, 8-20 TB of disk, 2 disk controllers.

        2. Aim for 100KB docs +/- 2 Orders of Magnitude
        MarkLogic’s internal algorithms are optimized for documents around 100 KB (remember, in MarkLogic, each document should be one unit of query and should be seen more like relational rows than tables). You can go down to 1 KB but below that the memory/disk/lock overhead per document starts to be troublesome. And, you can go up to 10 MB but above that line the time to read it off disk starts to be noticeable.

        3. Avoid Fragmentation
        Just avoid it, but if you must, then understand the tradeoffs.  See also Search and Fragmentation.

        4. Think of MarkLogic Like an Only Child
        It’s not a bug to use 100 percent of the CPU—that’s a feature. MarkLogic assumes you want maximum performance given available resources. If you’re using shared resources (a SAN, a virtual machine) you may want to impose restrictions that limit what MarkLogic can use.

        5. Six Forests, Six Replicas
        Every use case is different, but in general deployments of MarkLogic 7 are proving optimal with 6 forests on each computer and (if doing High Availability) 6 replicas.

        6. Earlier Indexing is Better Indexing
        Adding an index after loading requires touching every document with data relating to that index. Turning off an index is instant, but no space will be reclaimed until the re-index occurs. A little thought into index settings before loading will save you time.

        7. Filtering: Your Friend or Foe
        Indexes isolate candidate documents, then filtering verifies the hits. Filtering lets you get accurate results even without accurate indexes (e.g., a case sensitive query without the case sensitive index). So, watch out, as filtering can hide bad index settings! If you really trust the indexes, you can use “unfiltered.” It is best to perfect your index settings in a small test environment, then apply them to production.

        8. Use Meaningful Markup If You Can
        If you can use meaningful markup (where the tags describe the content they hold) you get both prettier XML and XML that’s easier to write indexes against.

        9. Don’t Try to Outsmart Merging
        Contact support if you plan to change any of the advanced merge settings (max size, min size, min ratio, timeout periods). You shouldn’t usually tweak these. If you’re thinking about merge settings, you’re probably underprovisioned (See Recommendation #1).

        10. Big Reads Go In Queries, Not Updates
        Hurrah! Using MVCC for transaction processing means lock-free reads. But, to be a “read” your module can’t include any update calls. This is determined by static analysis in advance, so even if the update call isn’t made, it still changes your behavior. Locks are cheap but they’re not free, and any big search to find the top 10 results will lock the full result set during the sort. Whenever possible, do update calls in a separate nested transaction context using xdmp:invoke() with an option specifying “different-transaction”.

        11. Taste Test
        Load a bit of data early, so you can get an idea about rates, sizes, and loads. Different index settings will affect performance and sizes. Test at a few sizes because some things scale linearly, some logarithmically.

        12. Measure
        Measure before. Measure after. Measure at all levels. When you know what’s normal, you can isolate when something goes different. MarkLogic 7 can internally capture “Monitoring History” to a Meters database. There are also tools such as Cacti, Ganglia, Nagios, Graphite, and others.

        13. Keep a Staging Box
        A staging box (or cluster) means you can measure changes in isolation (new application code, new indexes, new data models, MarkLogic upgrades, etc.). If you’re running on a cluster, then stage on a cluster (because you’ll see the effects of distribution, like net traffic and 2-phase commits). With AWS it’s easier than ever to “spin up” a cluster to test something.

        14. Adjust as Needed
        You need to be measuring so you know what is normal and then know what you should adjust. So, what can you adjust?

        • Code: Adjusting your code often provides the biggest bang
        • Memory sizes: The defaults assume a combo E-node/D-node server
        • Indexes: Best in advance, maybe during tasting. Or, try on staging
        • Cluster size and forest distribution: This is much easier in MarkLogic 7

        15. Follow Our Advice on Swap Space
        Our release notes tell you:

        • Windows: 2x the physical memory
        • Linux: 1x the physical memory (minus any huge pages), or 32GB, whichever is lower
        • Solaris: 1x-2x the physical memory

        MarkLogic doesn’t intend to leverage swap space! But, for an OS to give memory to MarkLogic, it wants the swap space to exist. Remember, disk is 100x cheaper than RAM, and this helps us use the RAM.

        16. Don’t Forget New Features
        MarkLogic has plenty of features that help with performance, including MLCP, tiered storage, and semantics. With the MLCP fast-load option, you can perform forest assignments on the client, and directly insert to that forest. It’s really a sharp tool, but you don’t use it if you’re changing forest topology or assignment policies. With tiered storage, you can use HDFS as cheap mass storage of data that doesn’t need high performance. Remember, you can “partition” data (i.e. based on dates) and let it age to slower disks. With semantics, you have a whole new way to model your data, which in many cases can produce easier to optimize queries.

        That’s it! With these pro tips, you should be able to handle the most common performance issues. 

        *With regard to storage, as you add capacity, it is critical that you add throughput in order to maintain a fast system (http://tylermuth.wordpress.com/2011/11/02/a-little-hard-drive-history-and-the-big-data-problem/)

        Summary

        This is a procedure to assist with maintenance activities that may require the MarkLogic service to be shut down for a period of time, or for an OS reboot, while minimizing unavailability. It is assumed that High Availability (HA) is configured using local disk failover and that all primary forests have a replica forest configured.

        NOTE: Security and App-Services databases must also be configured for HA.

        When a host in a MarkLogic cluster becomes unavailable, the host is not fully disconnected from the cluster until the configured host timeout (default is 30 seconds) expires. If a primary forest resides on that host, the database and any application that references it will be unavailable from the time the host becomes unavailable until all replica forests assume the role of acting primary.

        If the host unavailability is planned, then you can take steps to minimize the database and application unavailability. This article describes such a procedure.

        Planning

        When a host from the MarkLogic cluster is taken offline, all the remaining hosts must assume the workload previously performed by that host. For this reason, we recommend:

        • Scheduling server maintenance during low usage periods.
        • Evenly distributing a host's replica forests across the other nodes in the cluster so that the extra workload is evenly distributed when that host is unavailable.
        • Minimizing the number of hosts removed for maintenance at any one time.

        If performing maintenance on more than one host at a time:

        • Define a maintenance group of hosts containing primary forests that have their local disk replica forests on hosts not in the maintenance group.
        • All required forests must have replica forests defined. This includes all content forests, security database forests, and forests for all linked schema databases.

        Maintenance groups should be sized so that the remaining available hosts represent a reasonable portion of compute, memory, and IO resources and can absorb the extra workload required during the maintenance period.

        Step 0: Verify all replica forests are synchronized

        Before initiating this procedure, verify that all replica forests are in sync with their primary forests by checking that the forest status of each replica is in the "sync replicating" state.

        This can be achieved using the MarkLogic Server administrative function xdmp:forest-status or the Management API GET /manage/v2/forests/{id|name}?view=status endpoint.
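        For example, the following sketch lists the name and state of every forest (including replicas) for a database, so you can confirm that the replicas report "sync replicating". The database name "Documents" is just an example:

        xquery version "1.0-ml";
        declare namespace fs = "http://marklogic.com/xdmp/status/forest";

        for $forest-id in xdmp:database-forests(xdmp:database("Documents"), fn:true())
        let $status := xdmp:forest-status($forest-id)
        return fn:concat($status/fs:forest-name, " : ", $status/fs:state)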

        Step 1: Shutdown the host via REST API, forcing an immediate failover

        Make a call to the /manage/v2/hosts/{id|name} (POST) endpoint, setting failover to true.

        curl --anyauth --user user:password -X POST -i --data "state=shutdown&failover=true" \
        -H "Content-type: application/x-www-form-urlencoded" \
        http://localhost:8002/manage/v2/hosts/my-host?format=JSON

        Using this endpoint with the failover parameter tells the cluster to use fast failover, which immediately fails the primary forests managed by that host over to their replicas, instead of waiting 30 seconds for the host to time out.

        Step 2: Verify failover succeeded

        Wait until all of the replica forests take over – configured replica forests are now the acting primary forests and in the “open” state, while the configured primary forest is now disabled. You can manually monitor forest status in the Admin UI by refreshing the Forest status display. Once all forests have assumed their new roles, the database will be online.

        This step can also be achieved using the methods identified in Step 0.

        Step 3: Verify forests are synchronized

        Once maintenance has been completed and all hosts are back online, some of the replica forests may still be acting as primaries. Verify that all acting replicas are in sync with the acting primary forests by checking the forest status and confirming that the acting replicas are in the "sync replicating" state.

        This step can also be achieved using the methods identified in Step 0.

        Step 4: Force configured primary forests to resume acting primary forest role

        In order to force the configured primary forests to assume the role of acting primary forests, restart the configured replica / acting primary forests together. Restarting all forests together will help minimize outage impact.

        This step can also be achieved using the MarkLogic Server administrative function xdmp:forest-restart or the Management API POST /manage/v2/forests/{id|name} endpoint.
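        For example, a minimal sketch using xdmp:forest-restart (the forest names are hypothetical; pass all of the acting primary forests together so they restart as a group):

        xquery version "1.0-ml";
        xdmp:forest-restart((
          xdmp:forest("Documents-1-R"),
          xdmp:forest("Documents-2-R")
        ))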


        Introduction

        Administrators can achieve very fine granularity on restores when incremental backups are used in conjunction with log archiving.

        Details

        Journal archiving can enable a restore to be performed to any timestamp since the last incremental backup.  For example, when using daily incremental backups in conjunction with 24-hour log archive retention, a restore can be made to any point in the previous 24 hours.

        This capability enables administrators to go back to the exact point in time before a user error caused bad data to be ingested into the database, minimizing any data loss on the restore. Although this is a very powerful capability, the entire operation to perform a restore is simplified. Administrators can execute a simple operation as the server restores the backup set and replays the journal starting from the timestamp given by the admin.

        For further information, see the documentation Restoring from an Incremental Backup with Journal Archiving.

        Summary

        There are index settings that may be problematic if your documents contain encoded binary data (such as Base64 encoded binary).  This article identifies a couple of these index settings and explains the potential pitfall.

        Details

        When word lexicons or string range indexes are enabled, each stand in the database's forests will contain a file called the 'atom data' file.  This file holds the relevant unique tokens, potentially every unique token in the stand.  If your documents contain encoded binary data, all of the encoded binary may be treated as atom data and stored in the atom data file.

        Pitfall: There is an undocumented 4GB limit on the size of the atom data file.  If this limit is exceeded for the content of a forest, then stand merges will begin to fail with the error

            "XDMP-FORESTERR: Error in merge of forest forest-nameSVC-MAPBIG: Mapped file too large to map: NNN bytes: '\path\Forests\forest-name\stand-id\AtomData'"

        Workarounds

        There are a few options that you can pursue to get around this problem:

        1. Do not include encoded binary data in your documents.  An alternative is to store the binary content separately using MarkLogic Server's support for binary documents, and to include a reference to the binary document in the original.

        2. If word lexicons are required, and the encoded binary data is limited to a finite number of elements in your documents, then you can create word query exclusions for those elements. In the MarkLogic Server Admin UI, word query element exclusions can be configured by navigating to Configure -> Databases -> {database-name} -> Word Query -> Exclude tab. 

        3. If a string range index is defined on an element that contains encoded binary, then you can either remove the string range index or change the document data model so that the element containing the encoded binary is not shared with an element that requires a string range index. 

         

         

        Introduction

        Looking at the MarkLogic Admin UI, you may have noticed that the status page for a given database displays the last backup date and time for that database. We have been asked in the past how this gets computed so the same check can be performed using your own code. This Knowledgebase article shows examples that utilise XQuery to get this information and explores the possibility of retrieving it using the MarkLogic ReST API.

        XQuery: How does the code work?

        The simple answer is in the forest status for each of the forests in the database (note these values only appear if you have created a backup already).  For the sake of these examples, let's say I have a database (called "test") which contains 12 forests (test-1 to test-12).  I can get the backup status using a call to our ReST API:

        http://localhost:8002/manage/LATEST/forests/test-1?view=status&format=html

        In the results returned, you should see something like:

        last-backup : 2016-02-12T12:30:39.916Z datetime
        last-incr-backup : 2016-02-12T12:37:29.085Z datetime
        

        In generating that status page in the MarkLogic Admin UI code, we create an aggregate - a database doesn't contain documents in MarkLogic, it contains forests and those forests contain documents.

        Continuing the example above (with a database called "test" containing 12 forests) if I run the following:
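
        A minimal sketch of such a query:

        xquery version "1.0-ml";
        (: Get the status of every forest attached to the "test" database and
           return just the forest names :)
        declare namespace fs = "http://marklogic.com/xdmp/status/forest";

        for $f in xdmp:database-forests(xdmp:database("test"))
        return xdmp:forest-status($f)/fs:forest-name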

        This will return the forest status(es) for all forests in the database "test" and return the forest names using XPath, so in my case, I would see:

        <forest-name xmlns="http://marklogic.com/xdmp/status/forest">test-1</forest-name>
        [...]
        <forest-name xmlns="http://marklogic.com/xdmp/status/forest">test-12</forest-name>
        

        The MarkLogic Admin UI interrogates each forest in turn for that database and finds the metrics for the last backup.  To put that into context, if we ran the following:
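
        A minimal sketch:

        xquery version "1.0-ml";
        (: Return the last full backup time reported by each forest in "test" :)
        declare namespace fs = "http://marklogic.com/xdmp/status/forest";

        for $f in xdmp:database-forests(xdmp:database("test"))
        return xdmp:forest-status($f)/fs:last-backup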

        This gives us:

        <last-backup xmlns="http://marklogic.com/xdmp/status/forest">2016-02-12T12:30:39.946Z</last-backup>
        [...]
        <last-backup xmlns="http://marklogic.com/xdmp/status/forest">2016-02-12T12:30:39.925Z</last-backup>
        

        The code (or the status report) doesn't want values for all 12 forests, it just wants the time the last forest completed the backup (because that's the real time the backup completed), so our code is running a call to fn:max:
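
        A minimal sketch:

        xquery version "1.0-ml";
        (: The most recent last-backup value across all forests in "test" :)
        declare namespace fs = "http://marklogic.com/xdmp/status/forest";

        fn:max(
          for $f in xdmp:database-forests(xdmp:database("test"))
          return xdmp:forest-status($f)/fs:last-backup/xs:dateTime(.)
        )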

        Which gives us the max value (as these are all xs:dateTimes, it's finding the most recent date), which in the case of this example is:

        2016-02-12T12:30:39.993Z

        The same is true for the last incremental backup; the only change is the XPath, which now points at the last-incr-backup element. We can get the most recent incremental backup time by taking fn:max across all forests.
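
        A minimal sketch of that query:

        xquery version "1.0-ml";
        (: The most recent last-incr-backup value across all forests in "test" :)
        declare namespace fs = "http://marklogic.com/xdmp/status/forest";

        fn:max(
          for $f in xdmp:database-forests(xdmp:database("test"))
          return xdmp:forest-status($f)/fs:last-incr-backup/xs:dateTime(.)
        )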

        This would give us 2016-02-12T12:37:29.161Z

        Using the ReST API

        The ReST API does allow you to get this information but you'd need to jump through a few hoops to get to it:

        The ReST API status for a given database would give you the names of all the forests attached to that database:

        http://localhost:8002/manage/LATEST/databases/test

        And from there you could GET the information for all of those forests:

        http://localhost:8002/manage/LATEST/forests/test-1?view=status&format=html
        [...]
        http://localhost:8002/manage/LATEST/forests/test-12?view=status&format=html

        Once you'd got all those values, you could calculate the max values for them - but at this point, I think it would make more sense to write a custom endpoint that returns this information, something like:
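
        A minimal sketch of such a module (the module name, element names and "db" request field are placeholders):

        xquery version "1.0-ml";
        (: Hypothetical module (e.g. backup-status.xqy): returns the most recent
           full and incremental backup times for the database named in the "db"
           request field :)
        declare namespace fs = "http://marklogic.com/xdmp/status/forest";

        let $db := xdmp:get-request-field("db")
        let $statuses :=
          for $f in xdmp:database-forests(xdmp:database($db))
          return xdmp:forest-status($f)
        return
          element backup-status {
            element database { $db },
            element last-backup { fn:max($statuses/fs:last-backup/xs:dateTime(.)) },
            element last-incr-backup { fn:max($statuses/fs:last-incr-backup/xs:dateTime(.)) }
          }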

        Where you could make a call to that module to get the aggregates (e.g.):

        http://[server]:[port]/[modulename.xqy]?db=test

        This would return the aggregated backup status for whichever database name is passed in as the db parameter.

        Introduction

        In this Knowledgebase article, we will discuss a technique that allows you to scope queries so that matches occur only within a given parent element.

        Details

        cts:element-query

        Consider a scenario where you have an XML document structured in this way:

        <rootElement>
          <id>7635940284725382398</id>
          <parentElement>
            <childElement1>valuea</childElement1>
            <childElement2>false</childElement2>
          </parentElement>
          <parentElement>
            <childElement1>valuea</childElement1>
            <childElement2>truthy</childElement2>
          </parentElement>
          <parentElement>
            <childElement1>valueb</childElement1>
            <childElement2>true</childElement2>
          </parentElement>
          <childElement1>valuec</childElement1>
        </rootElement>

        And you want to find the document where a parentElement has a childElement1 with a value of 'valuec'.

        A search like

        cts:search (/,
            cts:element-value-query(xs:QName('childElement1'), 'valuec', 'exact')
        )

        will give you the above document, but doesn't consider where the childElement1 value is. This isn't what you want. Search queries perform matching per fragment, so there is no constraint that childElement1 be in any particular spot in the fragment.

        Wrapping a cts:element-query around a subquery will constrain the subquery to exist within an instance of the named element. Therefore,

        cts:search (/,
            cts:element-query (
                xs:QName ('parentElement'),
                cts:element-value-query(xs:QName('childElement1'), 'valuec', 'exact')
            )
        )

        will not return the above document since there is no childElement1 with a value of 'valuec' inside a parentElement.

        This applies to more-complicated subqueries too. For example, looking for a document that has a childElement1 with a value of 'valuea' AND a childElement2 with a value of 'true' with a search such as

        cts:search (/, 
            cts:and-query ((
                cts:element-value-query(xs:QName('childElement1'), 'valuea', 'exact'),
                cts:element-value-query(xs:QName('childElement2'), 'true', 'exact')
            ))
        )

        will return the above document. But you may want these two child element-values both inside the same parentElement. This can be accomplished with

        cts:search (/, 
            cts:element-query (
                xs:QName ('parentElement'),
                cts:and-query ((
                    cts:element-value-query(xs:QName('childElement1'), 'valuea', 'exact'),
                    cts:element-value-query(xs:QName('childElement2'), 'true', 'exact')
                ))
            )
        )

        This should give you expected results, as it won't return the above document since the two child element-value queries do not match inside the same parentElement instance.

        Filtering and indexes

        Investigating a bit further, if you run the query with xdmp:query-meters you will see (depending on your database settings) 

            <qm:filter-hits>0</qm:filter-hits>
            <qm:filter-misses>1</qm:filter-misses>

        What is happening is that the query can only determine from the current indexes that there is a fragment with a parentElement, and a childElement1 with a value of 'valuea', and a childElement2 with a value of 'true'. Then, after retrieving the document and filtering, it finds that the document is not a complete match and so does not return it (thus filter-misses = 1).
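
        A minimal sketch of inspecting those counters after running the scoped search above:

        xquery version "1.0-ml";
        (: Run the scoped search, then read the filter counters for this request :)
        let $results :=
          cts:search(/,
            cts:element-query(
              xs:QName("parentElement"),
              cts:and-query((
                cts:element-value-query(xs:QName("childElement1"), "valuea", "exact"),
                cts:element-value-query(xs:QName("childElement2"), "true", "exact")
              ))
            )
          )
        return (
          fn:count($results),
          xdmp:query-meters()//*:filter-hits,
          xdmp:query-meters()//*:filter-misses
        )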

        (To learn more about filtering, refer to Understanding the Search Process section in our Query Performance and Tuning Guide.)

        At scale you may find this filtering slow, or the query may hit Expanded Tree Cache limits if it retrieves many false positives to filter through.

        If you have the correct positions enabled, the indexes can resolve this query without retrieving the document and filtering. In this case, after setting both

        element-word-positions

        and

        element-value-positions

        to true on the database and reindexing, xdmp:query-meters now shows

        <qm:filter-hits>0</qm:filter-hits>
        <qm:filter-misses>0</qm:filter-misses>

        (To track element-value-queries inside element-queries you need element-word-positions and element-value-positions enabled. The former is for element-query and the latter is for element-value-query.)

        Now this query can be run without filtering. However, if a document contains many instances of the parent element, the position calculations can become quite expensive.

        Position details

        Further details: Empty-element positions are problematic. Positions are word positions, and the position of an element is the word position of the first word after the element starts to the word position of the first word after the element ends. Positions of attributes are the positions of their element. If everything is an empty element, you have no words and everything has the same position and so positions cannot discriminate between elements.

        Reindexing

        Note that if you change these settings you will need to reindex your database, and the usual tradeoffs apply (larger indexes and slower indexing). Please see the following for guidance on adding an index and reindexing in general:

        See also:

        Reindexing impact
        Adding an index in production

        Summary

        This article explains why you may encounter a Cross-Site Request Forgery (CSRF) error (SECURITY-BADREQUEST) when using MarkLogic Server's Query Console application, and how the issue can be resolved.

        Details

        Since the 8.0-6 release of MarkLogic Server, the security of Query Console has been increased. Every time you load the application in the browser, there is a handshake between the browser and server, generating a secure CSRF token for the logged-in user. This pairs the client with the server, allowing for secure communication. If another person logs into Query Console as the same user, their browser will perform another handshake, generating a new token and storing it on the server for that user. The user who was previously paired with the server will now have the wrong token and will see the CSRF error when performing any action in the app that makes a request to the server, until they refresh.

        MarkLogic is implementing the industry standard recommendation for CSRF. At this time, there is no option to disable this security feature.

        Best Practice

        Best practice would be to create a new user on MarkLogic Server for each person using the system. The "qconsole-user" role is enough to use the Query Console application. If they must be administrators, you can give them the "admin" role, but note that with this special role, the user will have the authority to perform any activity in MarkLogic Server, including adding or deleting users, adding or deleting documents, changing passwords, and so on.
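
        As an illustration, a minimal sketch of creating such a user with only the "qconsole-user" role; the user name and password are placeholders, and sec:create-user must be evaluated against the Security database:

        xquery version "1.0-ml";
        (: Create a Query Console-only user; run this against the Security database :)
        import module namespace sec = "http://marklogic.com/xdmp/security"
          at "/MarkLogic/security.xqy";

        sec:create-user(
          "qc-analyst",             (: placeholder user name :)
          "Query Console user",     (: description :)
          "change-me-please",       (: placeholder password :)
          "qconsole-user",          (: roles :)
          (),                       (: default permissions :)
          ()                        (: default collections :)
        )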

        Further Reading

        Summary

        The Admin user bypasses all Security settings (roles/privileges) in the Security database and bypasses the document permissions in user databases. Any benchmark load test should be performed using a "real world" user account (i.e. not with the Admin user). 

        Treat Admin as a super user.

        MarkLogic treats the Admin user as a super user. When an Admin user executes a query, the query is not evaluated against any Security database settings and it bypasses all document permission checks (i.e. read, write, update) and Query privileges.

        When comparing the performance of a query run by the Admin user versus a non-Admin user, the non-Admin user's queries may show longer execution times, depending upon how many roles the user inherits, the size of the Security database, and the nature of the query. You may not notice a difference for an isolated single query execution, but under a large load the difference may be noticeable.

        Question: What is the expected performance difference between the Admin user and a non-Admin user?

        Each non-Admin user is different and will likely inherit a different number of roles and hold different permissions on documents. Hence, the security evaluation overhead differs per user and should be tested and benchmarked in your specific environment.

        Recommendation:

        Non-Admin application users and roles should be part of any query development process from the initial phase, which will give a good measure of the performance impact of the security schema.

        Further Reading

        Summary

        There is a limit to the number of registered queries held in the forest registry.  If your application does not account for that fact, you may get unexpected results. 

        Where is it?

        If a specific registered query is not found, then a cts:search operation with an invalid cts:registered-query throws an XDMP-UNREGISTERED exception. The XDMP-UNREGISTERED error occurs when a query could not be found in a forest query registry. If a query that had been previously registered cannot be found, it may have been discarded automatically. In the most recent versions of MarkLogic Server at the time of this writing, the forest query registry only holds up to about 48,000 of the most recently used registered queries. If you register more than that, the least recently used ones are discarded.

        Recommendation

        To avoid registered queries being dropped, it’s a good idea to unregister queries when you know they aren’t needed any more.
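
        A minimal sketch of the register / use / deregister pattern:

        xquery version "1.0-ml";
        (: Register a query, use it in a search, then deregister it so it no longer
           occupies a slot in the forest query registry :)
        let $id := cts:register(cts:word-query("example"))
        let $count := xdmp:estimate(cts:search(fn:doc(), cts:registered-query($id, "unfiltered")))
        return (
          $count,
          cts:deregister($id)
        )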

        Summary

        If your index settings have a very large number of range indexes specified (on the order of thousands or even tens of thousands), you may find your MarkLogic Server instance returning a message saying that it "Cannot allocate memory" - even when your OS monitoring metrics indicate that there appears to be plenty of unused RAM.

        XDMP-FORESTERR: Error in startup of forest: SVC-MAPINI: Mapped file initialization error: mmap: Cannot allocate memory

        Detail

        The issue is not how much memory a system has, but how it's being used. In the interests of performance, MarkLogic Server indexes your content upon ingestion, serializes those indexes to data structures on disk, and then memory-maps them. While it's true that each of those memory maps requires some amount of RAM, if you've got thousands of indexes and system monitoring reports RAM to spare, then you might be running up against Linux's default vm.max_map_count limit.

        While it's possible to get past this issue by simply increasing the vm.max_map_count limit, you should seriously consider revisiting your index usage, as 1) it's likely the current indexing scheme could be replaced by a different one that uses far fewer indexes, and 2) when your configuration exceeds roughly 100 range indexes, you'll likely need to take special care to size and manage your topology so that you don’t run out of system resources, as well as potentially make configuration changes to the Linux kernel on the d-nodes to which the relevant forests are assigned.

        ---

        Related Blog Post - 10000 Range Indexes  

        Introduction

        Seeing too many "stand limit" messages in your logs frequently? This article explains what this message means to your application and what actions should you take.

         

        What are Stands and how their numbers can increase?

        A stand holds a subset of the forest data and exists as a physical subdirectory under the forest directory. This directory contains a set of compressed binary files with names like TreeData, IndexData, Frequencies, Qualities, and such. This is where the actual compressed XML data (in TreeData) and indexes (in IndexData) can be found.

        At any given time, a forest can have multiple stands. To keep the number of stands to a manageable level MarkLogic runs merges in the background. A merge takes some of the stands on disk and creates a new singular stand out of them, coalescing and optimizing the indexes and data, as well as removing any previously deleted fragments.

        MarkLogic Server has a fixed limit for the maximum number of stands (64). When that limit is reached you will no longer be able to update your system. While MarkLogic automatically manages merges and it is unlikely that you will reach this limit, there are a few configuration settings under user control that may impact merges and lead to this issue. For example:

        1.) You can manage merges using Merge Policy Controls. For example, setting a low merge max size stops merges beyond the configured size, so the overall number of stands keeps growing.

        2.) A low background-io-limit value means less I/O is available for background tasks such as merges. This can also adversely affect the merge rate, so the number of stands may grow.

        3.) Low in-memory settings may not keep up with an aggressive data load. For example, if you are bulk loading large documents and have a low in-memory tree size, stands may accumulate and reach the hard limit.

         

        What can you do to keep the number of stands within a manageable limit?

        While MarkLogic automatically manages merges to keep the number of stands at a manageable level, it adds a Warning entry to the logs when it sees the number of stands growing alarmingly, for example: Warning: Forest XXXXX is at 92% of stand limit

        If you see such messages in your logs, you should take some action as reaching the hard limit of 64 would mean you will no longer be able to update your system.

        Here's what you can check and do to lower the number of stands.

        1.) If you have configured merge policy controls, check whether they actually match your application usage, and change the relevant settings as needed. For instance:

          • There should be no merge blackouts during ingestion, or any time there is heavy updating of your content.

          • Beginning with MarkLogic version 7, the server is able to manage merges with less free space required on your drives (1.5 times the size of your content). This is accomplished by setting the merge max size to 32768 (32GB). Although this does create more stands, this is OK on newer systems, since the server is able to use extra CPU cores in parallel.

        2.) If you have configured background-io-limit, check whether it is sufficient for your application usage. If needed, increase the value so that merges can make use of more I/O. You should only use this setting on systems that have limited disk I/O. In general you want to first set it to 200, and if the disk I/O still seems to be overwhelmed, set it to 150, and so on. A setting of 100 may be too low for systems that are doing ingestion, since the merge process needs to be able to keep up with stand creation.

        3.) If you are performing bulk loads, check whether the in-memory settings are sufficient and can be increased. If needed, increase the relevant values so that in-memory stands (and, as a result, on-disk stands) accommodate more data, thereby decreasing the number of stands. If you do grow the in-memory caches, make sure to grow the database journal files by a corresponding amount. This will ensure that a single large transaction will be able to fit in the journals.

         

        Conclusion 

        If you decide to control MarkLogic's merge process, you should monitor the system for any adverse effects and take action accordingly. MarkLogic Server continuously assesses the state of each database, and the default merge settings, together with the dynamic nature of merges, will keep the database tuned optimally at all times. So if you are unsure, let MarkLogic handle the merges for you!

        Introduction

        This article presents the steps to create a read-only access user and a full-access user for a WebDAV server.

        Details

        For read-only WebDAV access, connect to WebDAV using the credentials of a user who does not have the rights to insert or update documents. This can be accomplished by creating a user and assigning roles to them through the steps given below.

        1. If one does not already exist, create a WebDAV server (Instructions available in the MarkLogic Server Administrators Guide)

        • leave default user to "nobody", and 
        • leave required privilege empty

        2. Create a role - for the purpose of these instructions, call the new role "Read_only_Access" 

        • After you have entered a name for the new role (Read_only_Access), refresh the page and scroll to the "Default Permissions" section near the end of the page. The default permissions section allows you to assign a capability to a particular role. In this case, select the "Read_only_Access" role from the role drop-down, along with the "read" capability.

        3. Create a user and grant that user the "Read_only_Access" role.

        4. Create another role - for the purpose of these instructions, call the new role "Write_only_Access"

        • After you have entered a name for the new role (Write_only_Access), refresh the page and scroll to the "Default Permissions" section near the end of the page. The default permissions section allows you to assign a capability to a particular role. In this case, select the "Write_only_Access" role from the role drop-down, along with the "read", "insert", "execute" and "update" capabilities.

        5. Create another user and grant that user the "Write_only_Access" role.

        6. Set permissions on the "/" directory so the "Read_only_Access" and "Write_only_Access" roles can view and make changes, respectively.  This can also be accomplished in code:

           xdmp:document-add-permissions("/",xdmp:permission("Read_only_Access","read"))

          xdmp:document-add-permissions("/",xdmp:permission("Write_only_Access",("read", "insert","execute","update"))

        7. When you connect with a WebDAV client, both users will be able to view the root "/" directory, but cannot create files or folders there. To allow that, create a URI privilege for the "/" URI and add the "Write_only_Access" role to it.

        Now the "Read_only" user can read those documents, and the "Write_only" user can both read and update the documents.

        Existing Documents

        While the users just created will have the expected access to all new documents, for previously existing documents in the database you will need to add the read permission to the documents contained in your database. This can be accomplished with xdmp:document-add-permissions().

        For example:
            xdmp:document-add-permissions("/example.xml", xdmp:permission("Read_only_Access", "read"))

        MarkLogic Documentation

        For more details on how to manage security, please refer to the Security Administration section of our Administrator's Guide.

         

         

         

         

        Overview

        Update transactions run with readers/writers locks, obtaining locks as needed for documents accessed in the transaction. Because update transactions only obtain locks as needed, update statements always see the latest version of a document. The view is still consistent for any given document from the time the document is locked. Once a document is locked, any update statements in other transactions wait for the lock to be released before updating the document.

        Read only query transactions run at a particular system timestamp, instead of acquiring locks, and have a read-consistent view of the database. That is, the query transaction runs at a point in time where all documents are in a consistent state.

        The system timestamp is a number maintained by MarkLogic Server that increases every time a change or a set of changes occurs in any of the databases in a system (including configuration changes from any host in a cluster). Each fragment stored in a database has system timestamps associated with it to determine the range of timestamps during which the fragment is valid.

        On a clustered system where there are multiple hosts, the timestamps need to be coordinated across all hosts. MarkLogic Server does this by passing the timestamp in every message communicated between hosts of the cluster, including the heartbeat message. Typically, the message carries two important pieces of information:

        • The origin host id
        • The precise time on the host at the time that heartbeat took place

        In addition to the heartbeat information, the "Label" file for each forest in the database is written as changes are made. The Label file also contains timestamp information; this is what each host uses to ascertain the current "view" of the data at a given moment in time. This technique is what allows queries to be executed at a 'point in time' to give insight into the data within a forest at that moment.

        You can learn more about transactions in MarkLogic Server by reading the Understanding Transactions in MarkLogic Server section of the MarkLogic Server Application Developers Guide.

        The distribute timestamps option on the App Server specifies how the latest timestamp is distributed after updates. This affects the performance of updates and the timeliness of read-after-write query results from other hosts in the group.

        When set to fast, updates return as quickly as possible. No special timestamp notification messages are broadcasted to other hosts. Instead, timestamps are distributed to other hosts when any other message is sent. The maximum amount of time that could pass before other hosts see the update timestamp is one second, because a heartbeat message is sent to other hosts every second.

        When set to strict, updates immediately broadcast timestamp notification messages to every other host in the group. Updates do not return until their timestamp has been distributed. This ensures timeliness of read-after-write query results from other hosts in the group.

        When set to cluster, updates immediately broadcast timestamp notification messages to every other host in the cluster. Updates do not return until their timestamp has been distributed. This ensures timeliness of read-after-write query results from any host in the cluster, so requests made to any app server on any host in the cluster will see immediately consistent results.

        The default value for "distribute timestamps" option is fast. The remainder of this article is applicable when fast mode is used.

        Read after Write in Fast Mode

        We will look at the different scenarios for the case where a read occurs in a transaction immediately following an update transaction.

        • If the read transaction is executed against an application server on the same node of the cluster (or any node that participated in the update) then the read will execute at a timestamp equal to or greater than the time that the update occurred.
        • If the read is executed in the context of an update transaction, then, by acquiring locks, the view of the documents will be the latest version of the documents.
        • If the read is executed in a query transaction, then the query will execute at the latest timestamp that the host on which it was executed is aware of. Although this will always produce a transactionally consistent view of the database, it may not return the latest updates. The remainder of this article addresses this case.

        Consider an example, written against the XCC API, that performs the following steps:

        • Instantiates two XCC ContentSource Objects - each connecting to a different host in the cluster.
        • Establishes a short loop (which runs the enclosed steps 10 times)
          • Creates a unique UUID which is used as a URI for the Document
          • Establishes a session with the first host in the cluster and performs the following:
            • Gets the timestamp (session.getCurrentServerPointInTime()) and writes it out to the console / stdout
            • Inserts a simple document containing a single element (<ok/>) as a document-node into a given database
            • Gets the timestamp again and writes it out to the console / stdout
          • The session with the first host is then closed. A new session is established with the second host and the following steps are performed:
            • Gets the timestamp at the start of the session and writes it out to the console / stdout
            • An attempt is made to retrieve the document which was just inserted
          • On success the second session will be closed.
          • If the document could not be read successfully, an immediate retry attempt follows thereafter - which will result in a successful retrieval.

        Running this test will yield one of two results for each iteration of the loop:

        Query Transaction at Timestamp that includes Update

        Most of the time, you will find that the two hosts' timestamps are in lockstep - note that there is no difference between the timestamp returned by getCurrentServerPointInTime() after the document has been inserted on the first host and the timestamp reported by the second host just before the attempt is made to retrieve the document.

        ----------------- START OF INSERT / READ CYCLE (1) -----------------
        First host timestamp before document is inserted: 	13673327800295300
        First host timestamp after document is inserted: 	13673328229180040
        Second host timestamp before document is read: 	13673328229180040
        ------------------ END OF INSERT / READ CYCLE (1) ------------------

        However, you may also see this:

        ----------------- START OF INSERT / READ CYCLE (10) -----------------
        First host timestamp before document is inserted: 	13673328311216780
        First host timestamp after document is inserted: 	13673328322546380
        Second host timestamp before document is read: 	13673328311216780
        ------------------ END OF INSERT / READ CYCLE (10) ------------------

        Note that on this run the timestamps are out of sync; at the point where getCurrentServerPointInTime() is called on the second connection, its timestamp still matches the one taken just before the document was inserted.

        Yet this also returns results that include the updates; in the interval between the timestamp being written to the console and the construction and submission of the newAdhocQuery(), the document has become available and was successfully retrieved during the read process.

        The path with an immediate retry

        Now let's explore what happens when the read only query transaction runs at a point in time that does not include the updates:

        ----------------- START OF INSERT / READ CYCLE (2) -----------------
        First host timestamp before document is inserted: 	13673328229180040
        First host timestamp after document is inserted: 	13673328240679460
        Second host timestamp before document is read: 		13673328229180040
        WARNING: Immediate read failed; performing an immediate retry
        Second host timestamp for read retry: 		13673328240679460
        Result Sequence below:
        <?xml version="1.0" encoding="UTF-8"?>
        <ok/>
        ------------------ END OF INSERT / READ CYCLE (2) ------------------

        Note that on this occasion, we see an outcome that starts much like the previous example; the timestamps mismatch and we see that we've hit the point in the code where our validation of the response fails.

        Also note that the timestamp at the point where the retry takes place is now back in step; from this, we can see that the document should be available even before the retry request is executed. Under these conditions, the response (the result) is also written to stdout so we can be sure the document was available on this attempt.

        Multi Version Concurrency Control

        In order to guarantee that the "holistic" view of the data is current and available in a read-only query transaction across each host in the cluster, two things need to take place:

        • All forests need to be up-to-date and all pending transactions need to be committed.
        • Each host must be in complete agreement as to the 'last known good' (safest) timestamp from which the query can be allowed to take place.

        In all situations, to ensure a complete (and reliable) view of the data, the read-only query transaction must take place at the lowest known timestamp across the cluster.

        The latest timestamp information is communicated with every message between nodes in the cluster; the first "failed" attempt to read the document necessitates communication between the hosts, and in doing so propagates a new "agreed" timestamp across every node in the cluster.

        It is because of this that the retry always works: at the point where the immediate read-after-write fails, the timestamp changes are propagated, and the new timestamp provides the waypoint at which the retry query takes place. This is why a single retry is always sufficient.

        Context

        This KB article talks specifically about how the Rebalancer interacts with database replication, and how to solve the issues that may arise if not configured correctly.

        For a general discussion on how rebalancing works in MarkLogic, refer to this article and the server documentation.

        Rebalancing and Database Replication

        When database replication is configured for a database, rebalancing is disabled by default on the Replica database and no rebalancing will occur until the database replication configuration is deleted. Until the time when the primary is available, the forest-to-forest mapping will remain.

        Note that the Replica databases must have at least as many forests as the Master database. Otherwise, not all of the data on the Master database will be replicated.

        It is important to make sure that the assignment policy on the Replica is the same as the Master - so that in a DR situation, when the Replica takes over as the Primary, rebalancing is not triggered.

        Forest order mismatch can cause Rebalancing

        Forest order is the order in which forests are attached to the database. When the document assignment policy is set to 'Segment', 'Legacy' or 'Bucket', it is required that the Replica database configuration should have the same forest order as the Master to ensure rebalancing does not occur if or when replication is deconfigured.

        If there is a difference in forest orders between the Master and the Replica, a Warning level message is logged on the Replica, which looks like this:

        2015-10-21 13:34:59.359 Warning: forest order mismatch: local forest Test_12 is at position 15 
        while foreign master forest 2108358988113530610 (cluster=8893136914265436826) is at position 12

        In this state, when database replication is deleted between the clusters, the database on the Replica cluster will start to rebalance right away, and it could take a variable amount of time depending on how many documents need to be rebalanced.

        Fixing the forest order:

        On clusters with database replication enabled and both Master and Replica databases in sync (document counts match and all primary forests on the Replica database are in the 'open replica' state), the following steps help in removing the mismatch and making the forest order the same on both the Master and the Replica:

        i. Make sure that both Master and Replica databases have the same rebalancer assignment policy.

        ii. Disable the rebalancer and reindexer, if you have them enabled on both clusters for the database in question - for example:
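
        A minimal sketch for this step (the database name is a placeholder; run it on each cluster):

        xquery version "1.0-ml";
        (: Disable the rebalancer and reindexer for the database while the forest
           order is corrected; re-enable them in step v :)
        import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

        let $config := admin:get-configuration()
        let $dbid := admin:database-get-id($config, "content-db-master")
        let $config := admin:database-set-rebalancer-enable($config, $dbid, fn:false())
        let $config := admin:database-set-reindexer-enable($config, $dbid, fn:false())
        return admin:save-configuration($config)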

        iii. Obtain the forest order from the Master cluster - below is the query for an example database:

        xquery version "1.0-ml";

        (: Returns a list of forests in order for a given database :)

        import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

        let $config := admin:get-configuration()
        let $dbid := admin:database-get-id($config, "content-db-master")

        return admin:database-get-attached-forests($config,$dbid) ! xdmp:forest-name(.)

        Example output for this query is

        content-forest-2, content-forest-1, content-forest-3

        iv. On the Replica cluster, reorder the forests according to the order returned on the Master from step iii:

        xquery version "1.0-ml";

        import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";
        let $config := admin:get-configuration()
        let $dbid := admin:database-get-id($config, "content-db-replica")
        let $forest-names-in-order := (
        "content-forest-2",
        "content-forest-1",
        "content-forest-3"
        )

        let $forest-ids := $forest-names-in-order ! xdmp:forest (.)
        let $config := admin:database-reorder-forests($config, $dbid, $forest-ids)
        return (
        'reordering to: ' || fn:string-join ($forest-names-in-order, ', '),
        admin:save-configuration($config)
        )

        v. Re-enable rebalancer and reindexer on both clusters, if you had them enabled previously.

        vi. Verify that the Warning messages on the Replica cluster no longer appear (these messages are logged once every hour).

        Further Reading:

        Database Rebalancing

        Understanding what work the rebalancer will do

        Using the rebalancer to move the content in one forest to another location

        Checking database replication status

        On a MarkLogic 7 cluster or a MarkLogic 8 cluster that was previously upgraded from MarkLogic Server version 6, reindexing of the triple index does not always get triggered when the triple index is turned off. Reindexing is performed after turning off an index in order to reclaim space that the index was using.

        The workaround is to force a manual reindexing.

        Summary

        When used as a file system, GFS needs to be tuned for optimal performance with MarkLogic Server.

        Recommendations

        Specifically, we recommend tuning the demote_secs and statfs_fast parameters. The demote_secs parameter determines the amount of time GFS will wait before demoting a lock on a file that is not in use. (GFS uses a time-based locking system.) One of the ways that MarkLogic Server makes queries go fast is its use of memory mapped index files. When index files are stored on a GFS filesystem, locks on these memory-mapped files are demoted purely on the basis of demote_secs, regardless of use. This is because they are not accessed using a method that keeps the lock active -- the server interacts with the memory map, not direct access to the on-disk file.

        When a GFS lock is demoted, pages from the memory-mapped index files are removed from cache. When the server makes another request of the memory-mapped file, GFS must acquire another lock and the requested page(s) from the on-disk file must be read back into cache. The lock reacquisition process, as well as the I/O needed to load data from disk into cache, may cause noticeable performance degradation.

        Starting with MarkLogic Server 4.0-4, MarkLogic introduced an optimization for GFS. From that maintenance release forward, MarkLogic gets the status of its memory-mapped files every hour, which results in the retention of the GFS locks on those files so that they do not get demoted. Therefore, it is important that demote_secs is equal to or greater than one hour. It is also recommended that the tuning parameter statfs_fast is set to "1" (true), which makes statfs on GFS faster.

        Using gfs_tool, you should be able to set the demote_secs and statfs_fast parameters to the following values:

        demote_secs 3600

        statfs_fast 1

        While we're discussing tuning a Linux filesystem, the following Linux tuning tips are also worth noting:

        • Use the deadline elevator (aka I/O scheduler), rather than cfq, on all hosts in the cluster. This has been added to our installation requirements for RHEL. With RHEL-4, this requires the elevator=deadline option at boot time. With RHEL-5, this can be changed at any time via /sys/block/*/queue/scheduler
        • If you are running on a VM slice, then no-op I/O scheduler is recommended.
        • Set the following kernel tuning parameters:

        Edit /etc/sysctl.conf:

        vm.swappiness = 0

        vm.dirty_background_ratio=1

        vm.dirty_ratio=40

        Use sudo sysctl -f to apply these changes.

        • It is very important to have at least one journal per host that will mount the filesystem. If the number of hosts exceeds the number of journals, performance will suffer. It is, unfortunately, impossible to add more journals without rebuilding the entire filesystem, so be sure to set journals up for each host during your initial build.

         

        Working with RedHat

        Should you run into GFS-related problems, running the following script will provide all the information that you need in order to work with the Red Hat Support Team:


        mkdir /tmp/debugfs

        mount -t debugfs none /tmp/debugfs

        mkdir /tmp/$(hostname)-hangdata

        cp -rf /tmp/debugfs/dlm/ /tmp/$(hostname)-hangdata

        cp -rf /tmp/debugfs/gfs2/ /tmp/$(hostname)-hangdata

        echo 1 > /proc/sys/kernel/sysrq 

        echo 't' > /proc/sysrq-trigger 

        sleep 60

        cp /var/log/messages /tmp/$(hostname)-hangdata/

        clustat > /tmp/$(hostname)-hangdata/clustat.out

        cman_tool services > /tmp/$(hostname)-hangdata/cman_tool-services.out

        mount -l > /tmp/$(hostname)-hangdata/mount-l.out

        ps aux > /tmp/$(hostname)-hangdata/ps-aux.out

        tar cjvf /tmp/$(hostname)-hangdata.tar.bz /tmp/$(hostname)-hangdata/

        umount /tmp/debugfs/

        rm -rf /tmp/debugfs

        rm -rf /tmp/$(hostname)-hangdata

        Introduction

        MarkLogic is supported on the XFS filesystem. The minimum system requirements can be found here:

        https://developer.marklogic.com/products/marklogic-server/requirements-9.0

        The default mount options will generally give good performance, assuming the underlying hardware is capable enough in terms of IO performance and durability of writes, but if you can test your system adequately, you can consider different mount options.

        The values provided here are just general recommendations. If you wish to fine-tune your storage performance, you need to ensure that you do adequate testing, both with MarkLogic and with low-level tools such as fio:

        http://freecode.com/projects/fio

        1. I/O Schedulers

        Unless you have a directly connected single HDD or SSD, noop is usually the best choice; see here for more details:

        https://help.marklogic.com/Knowledgebase/Article/View/8/0/notes-on-io-schedulers

        2. XFS Mount options

        relatime - The default atime behaviour is relatime, which has almost no overhead compared to noatime but still maintains sane atime values. All Linux filesystems have used this as the default since around kernel 2.6.30, and XFS has used relatime-like behaviour since 2006, so there should rarely be a need to use noatime on XFS for performance reasons.

        attr2 - This option enables an "opportunistic" improvement in the way inline extended attributes are stored on disk. It is the default and should be kept as such in most scenarios.

        inode64 - To sum up, this allows XFS to create inodes anywhere without worrying about backwards compatibility, which should result in better scalability. See here for more information: https://access.redhat.com/solutions/67091

        sunit=x,swidth=y - XFS allows you to specify RAID settings. This enables the file system to optimize its read and write access for RAID alignment, e.g. by committing data as complete stripe sets for maximum throughput. These RAID optimizations can significantly improve performance, but only if your partition is properly aligned or if you avoid misalignment by creating the XFS filesystem on a device without partitions.

        largeio, swalloc - These are intended to further optimize streaming performance on RAID storage. You need to do your own testing.

        isize=512 - XFS allows inlining of data into inodes to avoid the need for additional blocks and the corresponding expensive extra disk seeks for directories. To use this efficiently, the inode size should be increased to 512 bytes or larger.

        allocsize=131072k (or larger) - XFS can be tuned to a fixed allocation size for optimal streaming write throughput. This setting can have a significant impact on interim space usage on systems with many parallel write and create operations.

        As with any advice of this nature, we strongly advise that you always do your own testing to ensure that options you choose are stable and reliable for your workload.

        Summary

        The XDMP-LABELBADMAGIC error appears when attempting to mount a forest with a corrupted or zero length Label file.  This article identifies a potential cause and provides the steps required to work around this issue.

        Details

        The XDMP-LABELBADMAGIC error is often seen on systems where the server was running out of disk space.  If there is no space for MarkLogic Server to write the forest's Label file, a zero length Label file may result. The side effect of that would be the XDMP-LABELBADMAGIC error.

        Below is an example showing how this error might appear in ErrorLog.txt when the Triggers forest has a zero length Label file.

        2013-03-21 13:02:11.835 Alert: XDMP-FORESTERR: Error in mount of forest Triggers: XDMP-LABELBADMAGIC: Bad forest label magic number: 0x0 instead of 0x1020304

        2013-03-21 13:02:11.835 Error: NullAssignment::localMount: XDMP-LABELBADMAGIC: Bad forest label magic number: 0x0 instead of 0x1020304

        In order to recover from this error, you will need to manually remove the bad Label file.  Removing the Label file will force MarkLogic Server to recreate the file and will allow the forest to be mounted.

        Steps for recovery:

        1. Make sure MarkLogic Server is shutdown on the affected host.

        2. Remove the Label file for the forest displaying the error

        a. In Linux the default location is "/var/opt/MarkLogic/Forests/[Forest-Name]/Label"

        b. In Windows the default location is "c:\Program Files\MarkLogic\Data\Forests\[Forest-Name]\Label"

        3. Restart MarkLogic Server.

        Introduction

        In some situations an existing cluster node needs to be replaced. There are multiple reasons for this, such as hardware failure or a hardware upgrade.

        In this Knowledgebase article we will outline the steps necessary to replace the node by reusing the existing cluster configuration without registering it again.

        Important notes:

        • The replacement node must have the same architecture as all other nodes of the cluster (e.g., Windows, Linux, Solaris). The CPUs must also have the same number of bits (e.g., 64, 32).
        • The replacement node must have the same (or higher) count of CPU cores
        • The replacement node must have the same (or higher) allocated disk space and mount points as the old node
        • The replacement node must have the same hostname as the old node, unless the node is an AWS EC2 instance using MARKLOGIC_EC2=1 (the default when using MarkLogic AMIs)

        Preparation steps for re-joining a node into the cluster

        • Install and configure the operating system
          • make sure the mount points match the old setup
          • in case the previous storage is healthy it can be reused (forests located on it will be mounted)
        • For any non-MarkLogic data (such as XQuery modules, Deployment scripts etc.) required to run on this node, ensure these are manually zipped and copied over as part of the staging process
        • Copy over MarkLogic configuration files (/var/opt/MarkLogic/*.xml) from a backup of the old node
          • If xdqp ssl enabled is set to true, change the setting to false.  If you can’t do this through the Admin UI, you can manually update the value of xdqp-ssl-enabled to false.
          • To re-enable ssl for xdqp connections once the node has rejoined the cluster, you will need to regenerate the replacement host certificate.  Follow the instructions in the Regenerating a XDQP Host Certificates section of this article.

        Downloading MarkLogic for the New Host

        MarkLogic Server, and the optional MarkLogic Converters and Filters, can be downloaded from the MarkLogic Developer Community. The most recent versions can be found at the following URLs, which provide the option of downloading via either https or curl:

        If the exact version you are running is not available, you may still be able to download it by getting the download link for the closest current version (8,9 or 10), and editing the minor version number in the link.

        So if you need 10.0-1, and the current available version is 10.0-2, when you choose the Download via Curl option, you will get a download link that looks like this:

        https://developer.marklogic.com/download/binaries/10.0/MarkLogic-10.0-2-amd64.msi?t=SomeHashValue/1&email=myemail%40mycompany.com

        Update the URL with the minor release version you need:

        https://developer.marklogic.com/download/binaries/10.0/MarkLogic-10.0-1-amd64.msi?t=SomeHashValue/1&email=myemail%40mycompany.com

        If you are unable to get the version you need this way, then contact MarkLogic Support.

        Rejoining the Replacement Node to the Cluster

        There are two methods to rejoin a host into the cluster, depending on the availability of configuration files.

        1. Using an older set of configuration files from the node being replaced
        2. Creating a new set of configuration files from another node in the cluster

        Method 1: Rejoining the Cluster With Existing Configuration Files

        This procedure can only be performed if existing configuration files from /var/opt/MarkLogic/*.xml are available from the lost/old node; otherwise it will fail and cause a lot of problems.

        • Perform a standard MarkLogic server installation on the new target node
          • $ rpm -Uvh /path/to/MarkLogic-<version>.x86_64.rpm or yum install /path/to/MarkLogic-<version>.x86_64.rpm
          • $ rpm -Uvh /path/to/MarkLogicConverters-<version>.x86_64.rpm or yum install /path/to/MarkLogicConverters-<version>.x86_64.rpm (optional)
          • Verify local configuration settings in /etc/marklogic.conf (optional)
          • Do not start MarkLogic server
        • Create a new data directory
          • $ mkdir /var/opt/MarkLogic (default location; might already exist if this is a separate mount point)
          • Verify ownership of the data directory, daemon:daemon by default.
            • To fix: $ chown -R daemon:daemon /var/opt/MarkLogic
        • Copy an existing set of configuration files into the data directory
          • $ cp /path/to/old/config/*.xml /var/opt/MarkLogic
          • Verify ownership of the configuration files, daemon:daemon by default.
            • To fix: $ chown daemon:daemon /var/opt/MarkLogic/*.xml
        • Perform a last sanity check
          • Hostname must be the same as the old node, except for AWS EC2 nodes as mentioned above
          • Verify firewall or Security Group rules are correct
          • Verify mount points, file ownership and permissions are correct
        • Start MarkLogic
          • $ service MarkLogic start
        • Monitor the startup process

        After starting the node it will reuse the existing configuration settings and assume the identity of the missing node. 

        Method 2: Rejoining the Cluster With Configuration Files From Another Node

        This procedure is required if there is no older configuration file set available. For example no file backup was made from /var/opt/MarkLogic/*.xml. It requires manual editing of a configuration file.  

        • Perform a standard MarkLogic server installation on the new target node
          • $ rpm -Uvh /path/to/MarkLogic-<version>.x86_64.rpm or yum install /path/to/MarkLogic-<version>.x86_64.rpm
          • $ rpm -Uvh /path/to/MarkLogicConverters-<version>.x86_64.rpm or yum install /path/to/MarkLogicConverters-<version>.x86_64.rpm (optional)
          • Verify local configuration settings in /etc/marklogic.conf (optional)
        • Start MarkLogic, and perform a normal server setup as a single node. DO NOT join the cluster now.
          • $ service MarkLogic start
          • Perform a basic setup
          • DO NOT join the host to the cluster!
        • Stop MarkLogic, and move current configuration files in /var/opt/MarkLogic to a new location
          • $ service MarkLogic stop
          • $ mv /var/opt/MarkLogic/*.xml /some/place
        • Copy a configuration files set from one of the other nodes over
          • $ scp <othernode>:/var/opt/MarkLogic/*.xml /var/opt/MarkLogic
          • Verify ownership of the data directory, daemon:daemon by default.
            • To fix: $ chown -R daemon:daemon /var/opt/MarkLogic
        • Make note of the <host-id> for the node being recreated in hosts.xml
          • $ grep -B1 hostname /var/opt/MarkLogic/hosts.xml
        • Edit /var/opt/MarkLogic/server.xml **Note: This step is critically important to ensure correct operation of the cluster.
          • Use a UTF-8 safe editor like nano or vi
          • Update <host-id> with the value found in /var/opt/MarkLogic/hosts.xml
          • Update <license-key> value if necessary.
          • Update <licensee> value if necessary.
          • Save the changes
        • Perform a last sanity check
          • <host-id> must match the <host> defined in hosts.xml.
            • Important: host will not start if these values do not match 
          • Hostname must be the same as the old node, unless the node is a