Knowledgebase : Errors

Summary

Occasionally, you might see an "Invalid Database Online Event" error in your MarkLogic Server Error Log. This article will help explain what this error means, as well as provide some ways to resolve it.

What the Error Means

The XDMP-INVDATABASEONLINEEVENT error means that something went wrong during the database online trigger event. Many situations can trigger this event, such as a server restart or a configuration change to any of the databases. In most cases, this error is harmless - it is just giving you information.

Resolving the Error

We often see this error when the user id that is baked into the database online event created by CPF is no longer valid, and the net effect is that CPF's restart handling is not functioning. We believe reinstalling CPF should fix this issue.

If re-installing CPF does not resolve this error, you will want to further analyze and debug the code that is invoked by the restart trigger.


Details:

Upon boot of CentOS 6.3, MarkLogic users may encounter the following warning:

:WARNING: at fs/hugetlbfs/inode.c:951 hugetlb_file_setup+0x227/0x250() (Not tainted)

MarkLogic 6.0 and earlier have not been certified to run on CentOS 6.3. This message is due to MarkLogic using a resource that has been deprecated in CentOS 6.3. The message can be ignored, as it will not cause any issues with MarkLogic performance. Although this example specifically calls out CentOS 6.3, the message could potentially occur with other MarkLogic/Linux combinations.
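If you wish to inspect Huge Pages activity on the host (a generic Linux check, not specific to this warning), the current counts are reported in /proc/meminfo:

$ grep Huge /proc/meminfo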

Introduction

After upgrading to MarkLogic 10.x from any of the previous versions of MarkLogic, examples of the following Warning and Notice level messages may be observed in the ErrorLogs:

Warning: Lexicon '/var/opt/MarkLogic/Forests/Documents/00000006/c4ea1b602ee84a34+Lexicon' collation='http://marklogic.com/collation/zh-Hant' out of order


Notice: Repairing out of order lexicon /var/opt/MarkLogic/Forests/Documents/00000006/c4ea1b602ee84a34+Lexicon collation 'http://marklogic.com/collation/zh-Hant' version 0 to 602

Warning: String range index /space/Forests/Documents/0006ef0e/c0dc932d1b4bcaae-37c6e3905909f64e+string collation 'http://marklogic.com/collation/' out of order. 

Notice: Repairing out of order string range index /space/Forests/Documents/0006ef0e/c0dc932d1b4bcaae-37c6e3905909f64e+string collation 'http://marklogic.com/collation/' version 0 to 602

Starting with MarkLogic 10.0, the server now automatically checks for any lexicons or string range indexes that may be in need of repair.  Lexicons and range indexes perform "self-healing" in non-read-only stands whenever a lexicon/range index is opened within the stand.

Reason

This is due to changes introduced to the behavior of MarkLogic's root collation.

Starting with MarkLogic 10.0, the root collation has been modified, along with all collations that derive from it, which means there may be some subtle differences in search ordering.

For more information on the specifics of these changes, please refer to http://www.unicode.org/Public/UCA/6.0.0/CollationAuxiliary.html

This helps the server to support newer collation features, such as reordering entire blocks of script characters (for example: Latin, Greek, and others) with respect to each other. 

Implementing these changes has, under some circumstances, improved the performance of wildcard matching by more effectively limiting the character ranges that search scans (and returns) for wildcard-based matching.

Based on our testing, we believe this new ordering yields better performance in a number of circumstances, although it does create the need to perform full reindexing of any lexicon or string range index using the root collation.

MarkLogic Server will now check lexicons and string range indexes and will try to repair them where necessary.  During the evaluation, MarkLogic Server will skip making further changes if any of the following conditions apply:

(a) They are already ordered according to the latest specification provided by ICU (1.8 at the time of writing)

(b) MarkLogic Server has already checked the stand and associated lexicons and indexes

(c) The indexes use codepoint collation (in which case, MarkLogic Server will be unable to change the ordering).

Whenever MarkLogic performs any repairs, it will always log a message at Notice level to inform users of the changes made.  If for any reason, MarkLogic Server is unable to make changes (e.g. a forest is mounted as read-only), MarkLogic will skip the repair process and nothing will be logged.

As these changes have been introduced from MarkLogic 10 onwards, you will most likely observe these messages in cases where recent upgrades (from prior releases of the product) have just taken place.

Repairs are performed on a stand by stand basis, so if a stand does not contain any values that require ordering changes, you will not see any messages logged for that stand.

Also, if any ordering issues are encountered during the process of a merge of multiple stands, there will only be one message logged for the merge, not one for each individual stand involved in that merge.

Summary

  • Repairs will take place for any stand found to have a lexicon or string range index whose collation is out of order and out of date (e.g. utilising a collation described by an earlier version of ICU), unless that stand is mounted as read-only.
  • Any repair will generate Notice-level messages when the maintenance takes place.
  • The check/repair takes place whenever a lexicon or string range index is opened: for any string range index, lexicon call (e.g. cts:values), or range query (e.g. cts:element-range-query), and during merges.
  • The check looks for ICU version mismatches as well as items that are out of order; if a lexicon or string range index uses an older ordering but requires no further changes, no further action will be taken for that stand.

Known side effects

If the string range index or lexicon is very large, repairing can cause some performance overhead and may impact search performance during the repair process.

Solution

These messages can be avoided by issuing a full reindex of your databases immediately after performing your upgrade to MarkLogic 10.
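If you prefer to script this, here is a hedged sketch using the Admin API; it assumes admin:database-set-reindexer-timestamp, which causes fragments older than the given timestamp to be reindexed ("Documents" is an example database name, and the same setting is exposed on the database configuration page in the Admin UI):

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";
let $config := admin:get-configuration()
return admin:save-configuration(
  admin:database-set-reindexer-timestamp(
    $config, xdmp:database("Documents"),
    xdmp:wallclock-to-timestamp(fn:current-dateTime())))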

Introduction

When configuring database replication, it is important to note that the Connect Forests by Name field is true by default. This is convenient because, when new forests of the same name are later added to the Master and Replica databases, they will be automatically configured for Database Replication.

The issue

The problem arises when you use replica forest names that do not match the original Master forest names. In that case, you may find that failover events cause forests to get stuck in the Wait Replication state. The usual methods of failing back to the designated masters will not work - restarting the replicas will not work, and neither will shutting down cluster/removing labels/restarting cluster.

Resolution

In this case, the way to fix the issue is to set Connect Forests by Name to false, and then you must manually connect the Master forests on the local cluster to the Replica forests on the foreign cluster, as described in the documentation: Connecting Master and Replica Forests with Different Names.

It is worth noting that, starting with MarkLogic 7, you are also allowed to rename the replica forests. Once you rename the replica forests to match the forest names of the designated master database (e.g., the Security database should have a Security forest in both the master and replica clusters), they will be automatically configured for Database Replication, as expected.

Summary

XDMP-ODBCRCVMSGTOOBIG can occur when a non-ODBC process attempts to connect to an ODBC application server.  A couple of reasons this can happen are that an HTTP application has been accidentally configured to point to the ODBC port, or a load balancer is sending HTTP health checks to an ODBC port. There are a number of common error messages that can indicate whether this is the case.

Identifying Errors and Causes

One method of determining the cause of an XDMP-ODBCRCVMSGTOOBIG error is to take the size value and convert it to characters.  For example, given the following error message:

2019-01-01 01:01:25.014 Error: ODBCConnectionTask::run: XDMP-ODBCRCVMSGTOOBIG size=1195725856, conn=10.0.0.101:8110-10.0.0.103:54736

The size, 1195725856, can be converted to the hexadecimal value 47 45 54 20, which in turn converts to the ASCII string "GET ".  So what we see is a GET request being run against the ODBC application server.
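If you want to perform this conversion in Query Console, here is a minimal sketch using the xdmp:integer-to-hex and xdmp:hex-to-integer built-ins (the size value is taken from the error above):

xquery version "1.0-ml";
(: Convert an XDMP-ODBCRCVMSGTOOBIG size value to the four ASCII characters it encodes :)
let $size := 1195725856
let $hex := xdmp:integer-to-hex($size)  (: "47455420" :)
return fn:codepoints-to-string(
  for $i in (1, 3, 5, 7)
  return xdmp:hex-to-integer(fn:substring($hex, $i, 2)))
(: returns "GET " :)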

Common Errors and Values

Error                                   Hexadecimal   Characters
XDMP-ODBCRCVMSGTOOBIG size=1195725856   47 45 54 20   "GET "
XDMP-ODBCRCVMSGTOOBIG size=1347769376   50 55 54 20   "PUT "
XDMP-ODBCRCVMSGTOOBIG size=1347375956   50 4F 53 54   "POST"
XDMP-ODBCRCVMSGTOOBIG size=1212501072   48 45 4C 50   "HELP"

Conclusion

XDMP-ODBCRCVMSGTOOBIG errors do not affect the operation of MarkLogic Server, but they can fill the error logs with clutter.  Determining that the errors are caused by an HTTP request to an ODBC port can help identify the root cause so the issue can be resolved.

Summary

CSV files are a very common data exchange format, often used as an export format for spreadsheets, databases and other applications. Depending on the application, you might be able to change the delimiter character to a hash (#), an asterisk (*), etc. Another common delimiter is the tab character. Content Pump supports reading and loading such CSV files.

Detail

The Content Pump -delimiter option defines which delimiter will be used to split the columns. Defining a tab as the value for the delimiter option on the command line isn't straightforward.

Loading tab-delimited data files with Content Pump can result in an error message like the following:

mlcp>bin/mlcp.sh IMPORT -host localhost -port 9000 -username admin -password secret -input_file_path sample.csv -input_file_type delimited_text -delimiter '    ' -mode local
13/08/21 15:10:20 ERROR contentpump.ContentPump: Error parsing command arguments: 
13/08/21 15:10:20 ERROR contentpump.ContentPump: Missing argument for option: delimiter
usage: IMPORT [-aggregate_record_element <QName>]
... 

Depending on the command line shell, a tab needs to be escaped to be understood by the shell:

On bash shell, this should work: -delimiter $'\t'
On Bourne shell, this should work: -delimiter 'Ctrl+V followed by tab' 
An alternative way would be to use: -delimiter \x09
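For example, on bash the failing command from above can be rewritten as follows (same illustrative host, port and credentials):

mlcp>bin/mlcp.sh IMPORT -host localhost -port 9000 -username admin -password secret -input_file_path sample.csv -input_file_type delimited_text -delimiter $'\t' -mode local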

If none of these work, another approach is to use the -options_file /path/to/options-file parameter. The options file can contain all of the same parameters as the command line. The benefit of using an options file is that the command line is simpler and characters are interpreted as intended. The options file contains multiple lines: the first line is always the action, such as IMPORT or EXPORT, followed by pairs of lines where the first line of each pair is the option name and the second is its value.

A sample could look like the following:

IMPORT
-host
localhost
-port
9000
-username
admin
-password
secret
-input_file_path
/path/to/sample.csv
-delimiter
' '
-input_file_type
delimited_text


Make sure the file is saved in UTF-8 format to avoid any parsing problems. To define a tab as the delimiter, place a real tab between single quotes (i.e. '<tab>').

To use this option file with mlcp execute the following command:

Linux, Mac, Solaris:

mlcp>bin/mlcp.sh -options_file /path/to/sample.options

Windows:

mlcp>bin/mlcp.bat -options_file /path/to/sample.options

The options file can take any parameter which mlcp understands. It is important that the action command is defined on the first line. It is also possible to use both command line parameters and the options file; command line parameters take precedence over those defined in the options file.
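For example, assuming the sample options file above, a port supplied on the command line overrides the port in the file (8010 is just an illustrative value):

mlcp>bin/mlcp.sh -options_file /path/to/sample.options -port 8010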

Introduction 

Division operations involving integer or long datatypes may generate XDMP-DECOVRFLW in MarkLogic 7. This is the expected behavior but it may not be obvious upon initial inspection.  

For example, similar queries with slightly different input values, executed in Query Console on a Linux/Mac machine running MarkLogic 7, give the following results:

1. This query returns correct results

let $estimate := xs:unsignedLong("220")
let $total := xs:unsignedLong("1600")
return $estimate div $total * 100

==> 13.75

2. This query returns the XDMP-DECOVRFLW error

let $estimate := xs:unsignedLong("227")
let $total := xs:unsignedLong("1661")
return $estimate div $total * 100

==> ERROR : XDMP-DECOVRFLW: (err:FOAR0002)

Details

The following defines relevant behaviors in MarkLogic 7 and previous releases.

  • In MarkLogic 7, if all the operands involved in a div operation are integer, long or integer sub-types in XML, then the resulting value of the div operation is stored as an xs:decimal.
  • In versions prior to MarkLogic 7, if an xs:decimal value was large and occupied all digits, it was implicitly cast to an xs:double for further operations - i.e. beginning with MarkLogic 7, implicit casting no longer occurs in this situation.
  • xs:decimal can accommodate 18 digits as a datatype.
  • In MarkLogic 7 on Linux and Mac, xs:decimal can occupy all digits depending upon the actual value (227 div 1661 = 0.1366646598434677905); all 18 digits are occupied in the xs:decimal.
  • MarkLogic 7 on Windows does not perform division with full decimal precision (227 div 1661 produces 0.136664659843468); as a result, not all 18 digits are occupied in the xs:decimal.
  • MarkLogic 7 will generate an overflow exception (FOAR0002) when an operation is performed on an xs:decimal that is already at full decimal precision.

In the example above, multiplying the result by 100 gives an error on Linux/Mac, while it succeeds on Windows.

Recommendations:

We recommend using xs:double for all division-related operations in order to explicitly cast the resulting value to a larger datatype.

For example, these will return results:

xs:double($estimate) div $total * 100

$estimate div $total * xs:double(100)


Introduction

In more recent versions of MarkLogic Server, there are checks in place to prevent the loading of invalid documents (such as documents with multiple root nodes).  However, documents loaded in earlier versions of MarkLogic Server can now result in duplicate URI or duplicate document errors being reported.

Additionally, under normal operating conditions, a document/URI is saved in a single forest. If the load process is somehow compromised, users may see issues like duplicate URIs (i.e. the same URI in different forests) and duplicate documents (i.e. the same document/URI appearing more than once in the same forest).

Resolution

If the XDMP-DBDUPURI (duplicate URI) error is encountered, refer to our KB article "Handling XDMP-DBDUPURI errors" for procedures to resolve.

If you don't see XDMP-DBDUPURI errors but running fn:doc() on a document returns multiple nodes, then it could be a case of a duplicate document in the same forest.

To check that the problem is actually duplicate documents, run either xdmp:describe(fn:doc(...)) or fn:count(fn:doc(...)). If these commands return more than one result - e.g. xdmp:describe(fn:doc("/testdoc.xml")) returns (fn:doc("/testdoc.xml"), fn:doc("/testdoc.xml")), or fn:count(fn:doc("/testdoc.xml")) returns 2 - then the problem is duplicate documents in the same forest (and not duplicate URIs).

To fix duplicate documents, the document will need to be reloaded.

Before reloading, you can take a look at the two versions to see if there is a difference.  Check fn:doc("/testdoc.xml")[1] versus fn:doc("/testdoc.xml")[2] to see if there is a difference, and decide which one you want to reload.

If there is a difference, it may also point to the operation that created the situation.
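As a minimal sketch, fn:deep-equal can be used to compare the two copies (using the example URI from above):

xquery version "1.0-ml";
(: fn:doc() returns both copies when the same URI is duplicated within a forest :)
let $copies := fn:doc("/testdoc.xml")
return
  if (fn:deep-equal($copies[1], $copies[2]))
  then "Copies are identical - reload either one"
  else "Copies differ - inspect both before reloading"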

Summary

A forest reindex timeout error may occur when there are transactions holding update locks on documents for an extended period of time. A reindexer process is started as a result of a database index change or a major MarkLogic Server upgrade.  The reindexer process will not complete until after update locks are released.

Example error text seen in the MarkLogic Server ErrorLog.txt file:

XDMP-FORESTERR: Error in reindex of forest Documents: SVC-EXTIME: Time limit exceeded

Detail

Long running transactions can occur if MarkLogic Server is participating in a distributed transaction environment. In this case, transactions are managed through a Resource Manager, and each transaction is executed as a two-phase commit. In the first phase, the transaction is prepared for a commit or a rollback; the actual commit or rollback occurs in the second phase. More details about XA transactions can be found in the Application Developer's Guide - Understanding Transactions in MarkLogic Server.

In a situation where the Resource Manager gets disconnected between the two phases, transactions may be left in a "prepare" state within MarkLogic Server. The Resource Manager maintains transaction information and will clean up transactions left in "prepare" state after a successful reconnect. In the rare case where this doesn't happen, transactions left in "prepare" state will stay in the system until they are cleaned up manually. The method of manual intervention is described in the XCC Developer's Guide - Heuristically Completing a Stalled Transaction.

In order for an XA transaction to take place, it needs to prepare the execution for the commit. If updates are being made to pre-existing documents, update locks are held against the URIs of those documents. When reindexing occurs during this process, the reindexer will wait for these locks to be released before it can successfully reindex the new documents. Because the reindexer is unable to complete due to these pending XA transactions, the hosts in the cluster are unable to finish the reindexing task and will eventually throw a timeout error.

Mitigation

To avoid these kind of reindexer timeouts, it is recommended that the database is checked for outstanding XA transactions in "prepare" state before starting a reindexing process. There are two ways to verify if the database has outstanding transactions in "prepare" state:

  • In the Admin UI, navigate  to each forest of the database and review the status page; or
  • Run the following XQuery code (in Query Console):

    xquery version "1.0-ml"; 
    declare namespace fo = "http://marklogic.com/xdmp/status/forest";   

    for $f in xdmp:database-forests(xdmp:database()) 
    return    
      xdmp:forest-status($f)//fo:transaction-coordinator[fo:decision-state = 'prepare']

In the case where there are transactions in the "prepare" state, a roll-back can be executed:

  • In the Admin UI, click on the "rollback" link for each transaction; or
  • Run the following XQuery code (in Query Console):

    xquery version "1.0-ml"; 
    declare namespace fo = "http://marklogic.com/xdmp/status/forest";

    for $f in xdmp:database-forests(xdmp:database()) 
    return    
      for $id in xdmp:forest-status($f)//fo:transaction-coordinator[fo:decision-state = 'prepare']/fo:transaction-id/fn:string()
      return
        xdmp:xa-complete($f, $id, fn:false(), fn:false())

Introduction

Query Console is an interactive web-based query development tool for writing and executing ad-hoc queries in XQuery, Server-Side JavaScript, SQL and SPARQL. Query Console enables you to quickly test code snippets, debug problems, profile queries, and run administrative XQuery scripts.  Query Console uses workspaces to assist users with organizing queries.  A user can have multiple workspaces, and each workspace can have multiple queries.

Issue

In MarkLogic Server v9.0-11, v10.0-3 and earlier releases, users may experience delays, lag or latency between when a key is pressed on the keyboard and when it appears in the Query Console query window.  This typically happens when there are a large number of queries in one of the user's workspaces.

Workaround

A workaround to improve performance is to reduce the number of queries in each workspace.  The same total number of queries can be managed by increasing the number of workspaces and reducing the number of queries in each.  We suggest keeping no more than 30 queries in a workspace to avoid these latency issues.

The MarkLogic Development team is looking to improve the performance of Query Console, but at the time of this writing, this performance issue has not yet been resolved. 

Further Reading

Query Console User Guide

Summary

When attempting to start MarkLogic Server on older versions of Linux (Non-supported platforms), a "Floating Point Exception" may prevent the server from starting.

Example of the error text from system messages:

kernel: MarkLogic[29472] trap divide error rip:2ae0d9eaa80f rsp:7fffd8ae7690 error:0

Detail

Older Linux kernels will, by default, utilize older libraries.  When a software product such as MarkLogic Server is built using a newer version of gcc, it is possible that it will fail to execute correctly on older systems.  We have seen this in cases where the glibc library is out of date and does not contain certain symbols that were added in newer versions.  Refer to the RedHat bug that explains this issue: https://bugzilla.redhat.com/show_bug.cgi?id=482848

The recommended solution is to upgrade to a newer version of your Linux distribution.  While you may be able to resolve the immediate issue by only upgrading the glibc library, it is not recommended.

Problem:

The errors 'XDMP-MODNOTFOUND - Module not found' and 'XDMP-NOPROGRAM - Server unable to build program from request' may occur when the requested module does not exist or the user does not have the right permissions on the module.

Solution:

When either of these errors is encountered, the first step is to check whether the requested XQuery/JS module is actually present in the modules database. Make sure the document URI matches the 'root' of the relevant app server.

The 'Modules' field of the app server configuration specifies the name of the database in which the app server locates the application code (if it is not set to 'File-system'). When it is set to a specific database, only documents in that database whose URIs begin with the specified root directory are executable. For example, if the app server's 'root' is set to "/codebase/xquery/", then only documents in the database whose URIs start with "/codebase/xquery/" are executable.
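A quick way to check whether a module exists is to run the following against the app server's modules database (the URI below is illustrative):

xquery version "1.0-ml";
(: Returns true if a module document exists at this URI :)
fn:doc-available("/codebase/xquery/main.xqy")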

If set to 'File-system' make sure the requested module exists in the location specified in the 'root' directory of the app-server. 

Defining a 'File-system' location is often used on single-node DEV systems, but it is not recommended in a clustered environment. To keep the deployment of code simple, it is recommended to use a modules database in a clustered production system.

Once you have made sure that the module exists, the next step is to check whether the user has the right permissions to execute it. More often than not, the error is caused by a permissions issue.

(i) Check app-server privileges

The 'privilege' field in the app server configuration, when set, specifies the execute privilege required to access the server. Only users who are assigned this privilege can access the server and the application code. Absence of this privilege may cause the XDMP-NOPROGRAM error.

Make sure the user accessing the app server has the specified privilege. This can be checked using sec:user-privileges() (which should be run against the Security database).
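For example (run against the Security database; "some-user" is a placeholder for the user accessing the app server):

xquery version "1.0-ml";
(: Lists the privileges assigned, directly or via roles, to a user :)
import module namespace sec = "http://marklogic.com/xdmp/security"
  at "/MarkLogic/security.xqy";
sec:user-privileges("some-user")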

The documentation here - http://docs.marklogic.com/guide/admin/security#id_63953 contains more detailed information about privileges.

(ii) Check permission on the requested module

The user trying to access the application code/modules is required to have the 'execute' permission on the module. Make sure all the xquery documents have 'read' and 'execute' permissions for the user trying to access them. This can be verified by executing the following query against your 'modules' database:

                 xdmp:document-get-permissions("/your-module")

This returns a list of permissions on the document - with the capability that each role has - in the following format:

              <sec:permission xmlns:sec="http://marklogic.com/xdmp/security">
              <sec:capability>execute</sec:capability>
              <sec:role-id>4680733917602888045</sec:role-id>
              </sec:permission>
              <sec:permission xmlns:sec="http://marklogic.com/xdmp/security">
              <sec:capability>read</sec:capability>
              <sec:role-id>4680733917602888045</sec:role-id>
              </sec:permission>

You can then map the role-ids to their role names as below: (this should be done against the Security database)

              import module namespace sec="http://marklogic.com/xdmp/security" at "/MarkLogic/security.xqy";
              sec:get-role-names((4680733917602888045))

If you see that the module does not have execute permission for the user, the required permissions can be added as below (http://docs.marklogic.com/xdmp:document-add-permissions):

             xdmp:document-add-permissions("/document/uri.xqy",
               (xdmp:permission("role-name","read"),
                xdmp:permission("role-name", "execute")))


Problem Statement: AWS has updated the Lambda Python runtime version to python:3.9.v19 in the us-east regions. This fails to satisfy some dependencies that we package with our Managed Cluster Lambda code, causing creation of the Managed ENI stack and the NodeManager stack to fail. Stack creation works fine in other AWS regions (us-west-2, eu-central-1), as the Lambda runtime there still uses python:3.9.v18.

Proposed Solution: 

  1. For newly created clusters that use custom templates with ML Managed ENI and NodeManager as reference, below is what needs to be changed.

Managed ENI and NodeManager Template Reference (the RuntimeManagementConfig block shown below needs to be added; the region "us-east-2" should be edited based on the region where the stack is created):

Managed ENI

ManagedEniFunction:
    Type: 'AWS::Lambda::Function'
    DependsOn: ManagedEniExecRole
    Properties:
      Code:
        S3Bucket: !Ref S3Bucket
        S3Key: !Join ['/', [!Ref S3Directory,'managed_eni.zip']]
      Handler: managedeni.handler
      Role: !GetAtt [ManagedEniExecRole, Arn]
      Runtime: python3.9
      RuntimeManagementConfig:
        RuntimeVersionArn: 'arn:aws:lambda:us-east-2::runtime:edb5a058bfa782cb9cedc6d534ac8b8c193bc28e9a9879d9f5ebaaf619cd0fc0'
        UpdateRuntimeOn: 'Manual'
      Timeout: '180'

NodeManager

NodeManagerFunction:
    Type: 'AWS::Lambda::Function'
    DependsOn: NodeManagerExecRole
    Properties:
      Code:
        S3Bucket: !Ref S3Bucket
        S3Key: !Join ['/', [!Ref S3Directory,'node_manager.zip']]
      Handler: nodemanager.handler
      Role: !GetAtt [NodeManagerExecRole, Arn]
      Runtime: python3.9
      RuntimeManagementConfig:
        RuntimeVersionArn: 'arn:aws:lambda:us-east-2::runtime:edb5a058bfa782cb9cedc6d534ac8b8c193bc28e9a9879d9f5ebaaf619cd0fc0'
        UpdateRuntimeOn: 'Manual'
      Timeout: '180'

2. For newly created clusters using the default Lambda templates offered by MarkLogic, "ml-managedeni.template" and "ml-nodemanager.template": the MarkLogic team has already patched these templates, covering releases 10.0-9.2 through 10.0-9.5 and 11.0.0 through 11.0.2. For any older MarkLogic 10 versions, customers need to raise a support ticket and we will address it.

3. For customers who have an existing stack and perform upgrades on a regular basis: please follow the steps below on the existing Managed ENI and NodeManager Lambda functions manually, one time, before performing any upgrades.

Look for the Managed ENI function in the AWS Lambda console in the region where the stack was deployed.

Under Runtime Settings → Edit runtime management configuration

Select the Manual option and input the ARN of the previous runtime python:3.9.v18 (arn:aws:lambda:us-west-2::runtime:edb5a058bfa782cb9cedc6d534ac8b8c193bc28e9a9879d9f5ebaaf619cd0fc0). The region must be edited accordingly, based on where your Lambda function is located.

Repeat the same steps for the NodeManager Lambda function as well, and save it before performing any upgrades.

Introduction

Sometimes, when a host is removed from a cluster in an improper manner (e.g., by some means other than the Admin UI or Admin API), the removed host can still try to communicate with its old cluster, but the cluster will recognize it as a "foreign IP" and will log a message like the one below:

2014-12-16 00:00:20.228 Warning: XDQPServerConnection::init(10.0.80.7:7999-10.0.80.39:44247): SVC-SOCRECV: Socket receive error: wait 10.0.80.7:7999-10.0.80.39:44247: Timeout

Explanation: 

XDQP is the internal protocol that MarkLogic uses for communications amongst the hosts in a cluster; it uses port 7999 by default. In this message, the local host 10.0.80.7 is receiving socket connections from the foreign host 10.0.80.39.

 

Debugging Procedure, Step 1

To find out whether this message indicates a socket connection from an IP address that is not part of the cluster, the first place to look is the hosts.xml files. If the IP address is not found in hosts.xml, then it is a foreign IP. In that case, the following steps will help identify the processes that are listening on port 7999.

 

Debugging Procedure, Step 2

To find out who is listening on XDQP ports, try running the following command in a shell window on each host:

      $ sudo netstat -tulpn | grep 7999

You should only see MarkLogic as a listener:

     tcp 0 0 0.0.0.0:7999 0.0.0.0:* LISTEN 1605/MarkLogic

If you see any other process listening on 7999, you have found your culprit. Shut down those processes and the messages will go away.

 

Debugging Procedure, Step 3

If the issue persists, run tcpdump to trace packets to/from "foreign" hosts using the following command:

     tcpdump -n host {unrecognized IP}

Shut down MarkLogic on those hosts. Also, shut down any other applications that are using port 7999.

 

Debugging Procedure, Step 4

If the cluster hosts are on AWS, you may also want to check your Elastic Load Balancer ports. This may be tricky, because instances will change IP addresses if they are rebooted, so work with AWS Support to help you find the AMI or load balancer instance that is pinging your cluster.

In the case that the "foreign host" is an elastic load balancer, be sure to remove port 7999 from its rotation/scheduler. In addition, you should set the load balancer to use port 7997 for the heartbeat functionality.

Introduction

Sometimes, when a cluster is under heavy load, the error log may show a lot of XDQP-TIMEOUT messages. Often, a subset of hosts in the cluster may become so busy that the forests they host get unmounted and remounted repeatedly. Depending on your database and group settings, the act of remounting a forest may be very time-consuming, because all hosts in the cluster are forced to do the extra work of index detection.

Forest Remounts

Every time a forest remounts, the error log will show a lot of messages like these:

2012-08-27 06:50:33.146 Debug: Detecting indexes for database my-schemas
2012-08-27 06:50:33.146 Debug: Detecting indexes for database Triggers
2012-08-27 06:50:35.370 Debug: Detected indexes for database Last-Login: sln
2012-08-27 06:50:35.370 Debug: Detected indexes for database Triggers: sln
2012-08-27 06:50:35.370 Debug: Detected indexes for database Schemas: sln
2012-08-27 06:50:35.370 Debug: Detected indexes for database Modules: sln
2012-08-27 06:50:35.373 Debug: Detected indexes for database Security: sln
2012-08-27 06:50:35.485 Debug: Detected indexes for database my-modules: sln
2012-08-27 06:50:35.773 Debug: Detected indexes for database App-Services: sln
2012-08-27 06:50:35.773 Debug: Detected indexes for database Fab: sln
2012-08-27 06:50:35.805 Debug: Detected indexes for database Documents: ss, fp

... and so on ...

This can go on for several minutes and will cost you more downtime than necessary, since you already know the indexes for each database.

Improving the situation

Here are some suggestions for improving this situation:

  1. Browse to Admin UI -> Databases -> my-database-name
  2. Set ‘index detection’ to ‘none’
  3. Set ‘expunge locks’ to ‘none’

Repeat steps 1-3 for all active databases.

Now tweak the group settings to make the cluster less sensitive to an occasional busy host:

  1. Browse to Admin UI -> Groups -> E-Nodes
  2. Set ‘xdqp timeout’ to 30
  3. Set ‘host timeout’ to 90
  4. Click OK to make this change effective.
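If you prefer to script these group changes, here is a minimal Admin API sketch (assuming the group is named "E-Nodes" as above):

xquery version "1.0-ml";
(: Sets xdqp timeout to 30s and host timeout to 90s for one group :)
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";
let $config := admin:get-configuration()
let $group := admin:group-get-id($config, "E-Nodes")
let $config := admin:group-set-xdqp-timeout($config, $group, 30)
let $config := admin:group-set-host-timeout($config, $group, 90)
return admin:save-configuration($config)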

The database-level changes tell the server to speed up cluster startup time when a server node is perceived to be offline. The group-level changes cause the hosts in that group to be a little more forgiving before declaring a host offline, thus preventing forest unmounting when it's not really needed.

If, after performing these changes, you find that you are still experiencing XDQP-TIMEOUTs, the next step is to contact MarkLogic Support for assistance. You should also alert your development team, in case there is a stray query that is causing the data nodes to gather too many results.

Related Reading

XML Data Query Protocol (XDQP)

Summary

When configuring a server to add a foreign cluster you may encounter the following error:

Forbidden
Host does not match origin or inferred origin, or is otherwise untrusted.

This error will typically occur when using MarkLogic Server versions prior to 10.0-6, in combination with Chrome versions newer than 84.x.

Our recommendation to resolve this issue is to upgrade to MarkLogic Server 10.0-6 or newer. If that is not an option, then using a different browser, such as Mozilla Firefox, or downgrading to Chrome version 84.x may also resolve the error.

Changes to Chrome

Starting in version 85.x of Chrome, there was a change made to the default Referrer-Policy, which is what causes the error. The old default was no-referrer-when-downgrade, and the new value is strict-origin-when-cross-origin. When no policy is set, the browser's default setting is used. Websites are able to set their own policy, but it is common practice for websites to defer to the browser's default setting.

A more detailed description can be found at developers.google.com

Introduction

For hosts that don't use a standard US locale (en_US), there are instances where some lower-level calls will return data that cannot be parsed by MarkLogic Server. An example of this is a host configured with a different locale making a call to the Cluster Status page (cluster-status.xqy):

[Screenshot: XDMP-LEXVAL exception raised when loading the Cluster Status page]

The problem

The problem you have encountered is a known issue: MarkLogic Server uses a call to strtof() to parse the values as floats:

http://linux.die.net/man/3/strtof

Unfortunately, this uses a locale-specific decimal point. The issue in this environment is likely due to the operating system using a numeric locale where the decimal point is a comma, rather than a period.

Resolving the issue

The workaround for this is as follows:

1. Create a file called /etc/marklogic.conf (unless one already exists)

2. Add the following line to /etc/marklogic.conf:

export LC_NUMERIC=en_US.UTF-8

After this is done, you can restart the MarkLogic process so the change is detected and try to access the cluster status again.

Summary

Hung messages in the ErrorLog indicate that MarkLogic Server was blocked while waiting on host resources, typically I/O or CPU. 

Debug Level

The presence of Debug-level Hung messages in the ErrorLog does not indicate a critical problem, but it does indicate that the server is under load and intermittently unresponsive for some period of time. A server that is logging Debug-level Hung messages should be closely monitored, and the reason(s) for the hangs should be understood.  You'll get a Debug-level message if the hang time is greater than or equal to the group's XDQP timeout.

Warning Level

When the duration of the Hung message is greater than or equal to two times the group's XDQP timeout setting, the Hung message will appear at the Warning log level. Additionally, if the host is unresponsive to the rest of the cluster (that is, the other hosts have not received a heartbeat from it for the group's host timeout number of seconds), a failover may be triggered.

Common Causes

Hung messages in the ErrorLog have been traced back to the following root causes:

  • MarkLogic Server is installed on a Virtual Machine (VM), and
    • The VM does not have sufficient resources provisioned for peak use; or
    • The underlying hardware is not provisioned with enough resources for peak use.
  • MarkLogic Server is using disk space on a Storage Area Network (SAN) or Network Attached Storage (NAS) device, and
    • The SAN or NAS is not provisioned to handle peak load; or
    • The network that connects the host to the storage system is not provisioned to handle peak load.
  • Other enterprise level software is running on the same hardware as MarkLogic Server. MarkLogic Server is designed with the assumption that it is running on dedicated hardware.
  • A file backup or a virus scan utility is running against the same disk where forest data is stored, overloading the I/O capabilities of the storage system.
  • There is insufficient I/O bandwidth for the merging of all forests simultaneously.
  • Re-indexing overloads the I/O capabilities of the storage system.
  • A query that performs extremely poorly, or a number of such queries, caused host resource exhaustion.

Forest Failover

If the cause of the Hung message also leaves the server unresponsive to cluster heartbeat requests from the other servers in the cluster for a duration greater than the host timeout, the host will be considered unavailable and will be voted out of the cluster by a quorum of its peers.  If this happens, and failover is configured for forests stored on the unresponsive host, those forests will fail over.

Debugging Tips

Look at system statistics (such as SAR data) and system logs from your server for entries that occurred during the time-span of the Hung message.  The goal is to pinpoint the resource bottleneck that is the root cause.

Provisioning Recommendation

The host on which MarkLogic Server runs needs to be correctly provisioned for peak load. 

MarkLogic recommends that your storage subsystem simultaneously support:

  •     20MB/s read throughput, per forest
  •     20MB/s write throughput, per forest

We have found that customers who are able to sustain these throughput rates have not encountered operational problems related to storage resources.

Configuration Tips

If the Hung message occurred during an I/O-intensive background task (such as a database backup, merge or reindex), consider setting or decreasing the background I/O limit - this group-level configuration controls the I/O resources that background I/O tasks will consume.

If the Hung message occurred during a database merge, consider decreasing the merge priority in the database’s Merge Policy.  For example, if the priority is set to "normal", then try decreasing it to "lower".

 

Summary

There are scenarios where you may want to restore a database from a MarkLogic Server backup that was taken from a database on a different cluster. 

Examples

Two example scenarios where this may be appropriate:

- For development or testing purposes - you may want to take the content from one system to perform development or testing on a different cluster.

- A system failed, and you need to recreate a cluster and restore the database to the last known good state.

Constraints

There are constraints on performing a database restore from a MarkLogic database backup across clusters:

  1. The source and target servers must be the same Operating System.  More specifically, they must be able to use the same MarkLogic Server installation package.
  2. The backups must be accessible from all servers on which a forest in the target database resides.   
  3. The path to the backups must be identical on all of the servers.
  4. The MarkLogic process must have sufficient access credentials to read the files in the backup.
  5. If the number of hosts and/or forests is different, see Restoring a Reconfigured Database.

If running MarkLogic versions prior to 9.0-4, then the following conditions must also be met:

  1. The forest names must be identical in both the source database and the target database.
  2. The number of forests in both the source and target databases should be the same.  If the source database has a forest that does not reside on the target, then that forest data will not be included in the target after the database restore is complete.

Note: Differences in index configuration and/or forest order may result in reindexing or rebalancing after the restore is complete

Debugging Problems

If you are experiencing difficulties restoring a database backup, you can validate the backup using xdmp:database-backup-validate, or xdmp:database-incremental-backup-validate:

1. In Query Console, execute a simple script that validates restoring the backup, something like:

xquery version "1.0-ml";
let $db-name := "Documents"
let $db-backup-path := "/my-backup-dir/test"
return xdmp:database-restore-validate(
    xdmp:database-forests(xdmp:database($db-name)),
    $db-backup-path)

But with $db-name and $db-backup-path set appropriately.  The result will be a backup plan in XML format. Look at both the ‘forest-status’ and ‘directory-status’ for each of the forests; both should have the “okay” value.

A common error for the ‘directory-status’ is “non-existent”.  If you get this error, check the following.

- Verify that the backup directory exists on each server in the cluster that has a forest in the database;

- Verify that the backup directory has a “Forests” subdirectory, and the “Forests” directory contains subdirectories for each of the forests that reside on the Server.

- For the above directories, subdirectories and file contents, verify that the MarkLogic process has the proper credentials to access them.

2. If xdmp:database-backup-validate or xdmp:database-incremental-backup-validate does not indicate any errors, then look in the MarkLogic Server ErrorLog.txt for any errors reported during the time of the restore.  It is a good idea to set the MarkLogic Server group's ‘File log level’ to ‘debug’ in order to get detailed error messages.

Helpful Commands:  

On Unix Systems, the following commands may be useful in troubleshooting:

  • Check the 'file system access user ID' for the MarkLogic process
    • ps -A -o fuser,pid,comm | grep MarkLogic
  • View file/directory permissions, owner and group
    • ls -l
  • Change ownership recursively.  In a default installation this should be daemon
    • chown -R daemon.daemon /path/to/Backup
  • Add read and write permissions recursively
    • chmod -R +rw /path/to/Backup

Further Reading

Transporting Resources to a New Cluster

Phases of Backup or Restore Operation

Restoring a Reconfigured Database

Summary

MarkLogic may fail to start with an XDMP-ENCODING error: Initialization: XDMP-ENCODING: (err:XQST0087) Unsupported character encoding: ascii.  This is caused by a mismatch between the Linux locale character set and the UTF-8 character set required by MarkLogic.

Solutions

There are two primary causes of this error. The first is using service instead of systemctl to start MarkLogic on some Linux distros.  The second is related to the Linux language settings.

Starting MarkLogic Service

On an Azure MarkLogic VM, as well as some more recent Linux distros, you must use systemctl, and not service to start MarkLogic. To start the service, use the following command:

  • sudo systemctl start MarkLogic

Linux Language Settings

This issue occurs when the Linux locale LANG setting is not set to UTF-8.  It can be fixed by changing the value of LC_ALL to "en_US.UTF-8".  This should be done for the root user for default installations of MarkLogic.  To change the system-wide locale settings, /etc/locale.conf needs to be modified; this can be done using the localectl command.

  • sudo localectl set-locale LANG=en_US.UTF-8

If MarkLogic is configured to run as a non-root user, then the locale can be set in the user's environment.  This can be done using the $HOME/.i18n file.  If the file does not exist, please create it and ensure it contains the following:

  • export LANG="en_US.UTF-8"

If that does not resolve the issue in the user's environment, then you may need to look at setting LC_CTYPE or LC_ALL for the locale.

  • LC_CTYPE will override the character set part of the LANG setting, but will not change other locale settings.
  • LC_ALL will override both LC_CTYPE and all locale configurations of the LANG setting.
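To verify the active settings, you can run the standard locale command; LANG, LC_CTYPE and LC_ALL should all report a UTF-8 character set:

$ locale
$ locale charmap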

References

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/ch-keyboard_configuration

https://access.redhat.com/solutions/974273

https://www.unix.com/man-page/centos/1/localectl/

http://man7.org/linux/man-pages/man1/locale.1.html

Introduction

Rosetta 2 is a seamless, very efficient emulator designed to bridge the transition between Intel and Apple Silicon processors (e.g. M1[x]). The first time you launch a Mac app on an Apple Silicon computer, you might be asked to install the Rosetta component to open it. 
Currently, when installing MarkLogic Server DMG (pkg) on Apple Silicon macOS, you will be blocked by the following error:
“MarkLogic Server ([version]) can’t be installed on this computer.
MarkLogic Server requires an Intel processor.”
The error above is caused by MarkLogic’s macOS system call to verify whether it is running on an Intel processor. This legacy check was required when Apple was transitioning from PowerPC to Intel CPUs (announced in June 2005, Rosetta 1 emulation); MarkLogic Server has never been available for PowerPC-based Apple computers. In order to install MarkLogic’s Intel package on Apple Silicon, the legacy check has to be removed from the installation script.

Procedure

*1. Open a Terminal [0] and install Rosetta2 emulation software.

$ softwareupdate --install-rosetta

Note: For additional information, please check the official Apple Rosetta 2 article. [1]
[1] https://support.apple.com/en-us/HT211861  
* Step not required if Rosetta 2 is already installed for other Intel-based applications.

2. Download any MarkLogic Server DMG from the MarkLogic Developer Community website [2]
[2] https://developer.marklogic.com/products/marklogic-server  

3. Mount the DMG and copy the install package to a writable temporary location in the local filesystem

$ cp -R /Volumes/MarkLogic/ /Users/[your_user_name]/tmp

4. In a Terminal window, edit Contents/Resources/InstallationCheck in a text editor (e.g. vim or nano)

$ vim /Users/[your_username]/tmp/MarkLogic-[downloaded_package_version].pkg/Contents/Resources/InstallationCheck 

Note: As an alternative, in the GUI-Finder, right-click and "Show Package Contents”. Navigate to “Contents/Resources/“, and edit the file “InstallationCheck” with a GUI text editor.

5. Delete or comment out the following block (lines 46-52), then save the file “InstallationCheck”:

 46 echo "Checking for Intel CPU"
 47 if [[ $CPU_TYPE != "7" ]] ;
 48    then
 49    echo "MarkLogic Server requires a CPU with an Intel instruction set."
 50    exit 114;     # displays message 18
 51 fi
 52 echo "$CPU_NAME is an Intel CPU."

Save the file and back out of the folder.

6. Install the MarkLogic package from the GUI Finder or CLI as intended. [3]
[3] https://docs.marklogic.com/guide/installation/procedures#id_28962 

Conclusions
• The procedure in this knowledge base article allows you to install MarkLogic Server on macOS Rosetta 2 - Apple Silicon M1 / M[x].
• macOS is supported for development only. Conversion (Office and PDF) and entity enrichment are not available on macOS. [4]
• The legacy installation check is removed starting from the MarkLogic 10.0-10+ release.
• Even with the legacy check removed, Rosetta 2 emulation software will still be required until an official native M1 / M[x] MarkLogic Server package becomes available.

References
[0] https://support.apple.com/guide/terminal/open-or-quit-terminal-apd5265185d-f365-44cb-8b09-71a064a42125/ 
[1] https://support.apple.com/en-us/HT211861  

[2] https://developer.marklogic.com/products/marklogic-server  
[3] https://docs.marklogic.com/guide/installation/procedures#id_28962 
[4] https://docs.marklogic.com/guide/installation/intro#id_63469 

SUMMARY:

Prior to MarkLogic 4.1-5, role-ids were randomly generated.  We now use a hash algorithm that ensures that roles created with the same name will be assigned the same role-id.  Attempting to migrate data from a forest created prior to MarkLogic 4.1-5 to a newer installation can therefore leave the user with a "role not defined" error.  To work around this issue, we need to create a new role with the role-id defined in the legacy system.

Procedure:

This process creates a new role with the same role-id from your legacy installation and assigns this old role to your new role with the correct name.

Step 1: You will need to find the role-id of the legacy role. This will need to be run against the security DB on the legacy server. 

<code>

xquery version "1.0-ml";
import module namespace sec="http://marklogic.com/xdmp/security" at
"/MarkLogic/security.xqy";

let $role-name := "Enter Role Name Here" 

return
/sec:role[./sec:role-name=$role-name]/sec:role-id/text()

</code>


Step 2: In the new environment, store the attached module to the following location on the host containing the security DB.

/opt/MarkLogic/Modules/role-edit/create-master-role.xqy

Step 3: Ensure that you have created the role on the new cluster.

Step 4: Run the following code against the new cluster's security DB. This will create a new role with the legacy role-id. Be sure to enter the role name, description, and role-id from Step 1.

<code>
xquery version "1.0-ml";
import module namespace cmr="http://role-edit.com/create-master-role" at
"/role-edit/create-master-role.xqy";

let $role-name := "ENTER ROLE NAME"
let $role-description := "ENTER ROLE DESCRIPTION"
let $legacy-role-id := 11658627418524087702 (: Replace this with the Role ID from Step 1:)

let $legacy-role := fn:concat($role-name,"-legacy")
let $legacy-role-create := cmr:create-role-with-id($legacy-role, $role-description, (), (), (), $legacy-role-id)

return
fn:concat("Inserted role named ",$legacy-role," with id of ",$legacy-role-id)

</code>


Step 5: Run the following code against the new cluster's security database to assign the legacy role to the new role.

<code>
xquery version "1.0-ml";
import module namespace sec="http://marklogic.com/xdmp/security" at
"/MarkLogic/security.xqy";

let $role-name := "ENTER ROLE NAME"
let $legacy-role := fn:concat($role-name,"-legacy")

return
(
sec:role-set-roles($role-name, ($legacy-role)),
"Assigned ",$legacy-role," role to ",$role-name," role"
)

</code>

 

You should now have a new role named [your-role]-legacy.  This legacy role will contain the role-id from your legacy installation and will be assigned to [your-role] on the new installation.  Legacy documents in your DB will now have the same rights they had in the legacy system.

Summary

There is a limit to the number of registered queries held in the forest registry.  If your application does not account for that fact, you may get unexpected results. 

Where is it?

If a specific registered query is not found, a cts:search operation with an invalid cts:registered-query throws an XDMP-UNREGISTERED exception. The XDMP-UNREGISTERED error occurs when a query could not be found in a forest query registry. If a query that had been previously registered cannot be found, it may have been discarded automatically.  In the most recent versions of MarkLogic Server (at the time of this writing), the forest query registry only holds up to about 48,000 of the most recently used registered queries; if you register more than that, the least recently used ones get discarded.

Recommendation

To avoid registered queries being dropped, it’s a good idea to unregister queries when you know they aren’t needed any more.
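As a minimal sketch (the word query is just an example), a query can be registered, used by its id, and then deregistered once it is no longer needed:

xquery version "1.0-ml";
(: Register a query, use it, then remove it from the forest query registry :)
let $id := cts:register(cts:word-query("example"))
let $count := xdmp:estimate(
  cts:search(fn:collection(), cts:registered-query($id, "unfiltered")))
return ($count, cts:deregister($id))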

Range indexes and invalid values

We will discuss range index type casting and the behavior based on the invalid-values setting.

Casting values

We can cast a string to an unsignedLong as
xs:unsignedLong('4235234')
and the return is 4235234 as an unsignedLong.  However, if we try
xs:unsignedLong('4235234x')
it returns an error
XDMP-CAST: (err:FORG0001) xs:unsignedLong("4235234x") -- Invalid cast: "4235234x" cast as xs:unsignedLong
Similarly,
xs:unsignedLong('')
returns an error
XDMP-CAST: (err:FORG0001) xs:unsignedLong("") -- Invalid cast: "" cast as xs:unsignedLong
This same situation can arise when a document contains invalid values.  The invalid-values setting on the range index determines what happens in the case of a value that can't be cast to the type of the range index.

Range indexes---values and types

Understanding Range Indexes discusses range indexes in general, and Defining Element Range Indexes discusses typed values.
Regarding the invalid-values parameter of a range index:
In the invalid values field, choose whether to allow insertion of documents that contain elements or JSON properties on which range index is configured, but the value of those elements cannot be coerced to the index data type. You can choose either ignore or reject. By default, the server rejects insertion of such documents. However, if you choose ignore, these documents can be inserted. This setting does not change the behavior of queries on invalid values after documents are inserted into the database. Performing an operation on an invalid value at query time can still result in an error.

Behavior with invalid values

Create a range index

First, create a range index of type unsignedLong on the id element in the Documents database:
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
let $config := admin:get-configuration()
let $dbid := xdmp:database('Documents')
let $rangespec := admin:database-range-element-index('unsignedLong', '', 'id', (), fn:false())
return
     admin:save-configuration (admin:database-add-range-element-index($config, $dbid, $rangespec))

Insert a document with a valid id value

We can insert a document with a valid value:
xdmp:document-insert ('test.xml', <doc><id>4235234</id></doc>)
Now if we check the values in the index as
cts:values (cts:element-reference (xs:QName ('id')))
we get the value 4235234 with type unsignedLong.  We can search for the document with that value as
cts:search (/, cts:element-range-query (xs:QName ('id'), '=', 4235234), 'filtered')
and the document is correctly returned.

Insert a document with an invalid id value

With the range index still set to reject invalid values, we can try to insert a document with a bad value
xdmp:document-insert ('test.xml', <doc><id>4235234x</id></doc>)
That gives an error as expected:
XDMP-RANGEINDEX: xdmp:eval("xquery version &quot;1.0-ml&quot;;&#10;xdmp:document-insert ('te...", (), <options xmlns="xdmp:eval"><database>16363513930830498097</database>...</options>) -- Range index error: unsignedLong fn:doc("test.xml")/doc/id: XDMP-LEXVAL: Invalid lexical value "4235234x"

and the document is not inserted.

Setting invalid-values to ignore and inserting an invalid value

Now we use the Admin UI to set the invalid-values setting on the range index to ignore.  Inserting a document with a bad value as
xdmp:document-insert ('test.xml', <doc><id>4235234x</id></doc>)
now succeeds.  But remember, as mentioned above, "... if you choose ignore, these documents can be inserted. This setting does not change the behavior of queries on invalid values after documents are inserted into the database. Performing an operation on an invalid value at query time can still result in an error."

Values.  Checking the values in the index 

cts:values (cts:element-reference (xs:QName ('id')))
does not return anything.
Unfiltered search.  Searching unfiltered for a value of 7 as
cts:search (/, cts:element-range-query (xs:QName ('id'), '=', xs:unsignedLong (7)), 'unfiltered')
returns our document (<doc><id>4235234x</id></doc>).  This is a false positive.  When you insert a document with an invalid value, that document is returned for any search using the index.
Filtered search.  We can search filtered for a value of 7 to see if the false positive can be removed from the results:
cts:search (/, cts:element-range-query (xs:QName ('id'), '=', xs:unsignedLong (7)), 'filtered')
throws an error 

XDMP-CAST: (err:FORG0001) cts:search(fn:collection(), cts:element-range-query(fn:QName("","id"), "=", xs:unsignedLong("7")), "filtered") -- Invalid cast: xs:untypedAtomic("4235234x") cast as xs:unsignedLong

That's because when the document is used in filtering, the invalid value is cast to match the query and it throws an error as in the earlier cast test.

Adding a new index and reindexing

If you have documents already in the database, and add an index, the reindexer will automatically reindex the documents.

If a document contains an invalid value for one of your indexes, the reindexer will still reindex the document but will issue a Debug-level message about the problem:

2023-06-26 16:44:28.646 Debug: IndexerEnv::putRangeIndex: XDMP-RANGEINDEX: Range index error: unsignedLong fn:doc("/test.xml")/doc/id: XDMP-LEXVAL: Invalid lexical value "4235234x"

The reindexer will not reject or delete the document.  You can use the URI given in the message to find the document and correct the issue.

Finding documents with invalid values

Since documents with invalid values are always returned by unfiltered searches, you can find them by running an and-query of two searches that are normally mutually exclusive.  For the document with the invalid value,

cts:uris ((), (),
    cts:and-query ((
        cts:element-range-query (xs:QName ('id'), '=', 7),
        cts:element-range-query (xs:QName ('id'), '=', 8)
    ))
)

returns /test.xml.

Summary

Disk utilization is an important part of the host's ecosystem.  Filling the file system can have disastrous effects on server performance and data integrity.  It is very important to ensure that your host always has an appropriate amount of free disk space.

Detection

When the file system runs out of space there will be a dramatic decrease in query performance and merges will cease.  There will also be a number of entries in the ErrorLog.txt file that look like these:

SVC-FILWRT: File write error: write 'filename': No space left on device

Error in merge of forest [Forest-Name]: XDMP-MERGESPACE: Not merging due to disk space limitations, need=xxxMB, have=xxxMB

Mitigation

The best practice is to ensure that the total physical disk space available is sufficient to store all your forest data.  If you find yourself dangerously low on disk space, the following methods can be used to correct the situation.

  1. Move/Remove any unwanted files from the file system. This might include cleaning up log files that have grown very large.
  2. Add additional storage to the host.  This may require that you move the location of forest data.  Please see Moving Forests Across Storage Devices for more details.

In the event that the data directory containing the forest data and the directory containing the MarkLogic configuration files are on the same partition (which is common on Windows installations), a completely full file system will leave MarkLogic unable to write its configuration files, and you will not be able to perform any task in the Admin UI.  In that case, you will need to manually move your forest data if you cannot free enough disk space to allow the configuration files to be written.  The manual procedure below is a special case and is not recommended for situations where the Admin UI can be used.  Please refer to "Moving Forests Across Storage Devices" for our recommended process.

Step 1. Stop MarkLogic

Step 2. Move the forest data to another location.  Be sure to maintain permissions assigned to the forest data.

Step 3. Start MarkLogic

Step 4.  Detach your forest from the DB

  • In the Admin UI navigate to Configure -> Databases -> [Your-Database] -> Forests
  • Uncheck the box next to your forest and click "OK".

Step 5. Create a new forest, specifying the new storage location

  • In the Admin UI navigate to Configure -> Forests
  • Click the "Create" tab and fill in the appropriate information

Step 6. Copy all data from the old forest directory into the new forest directory

Step 7.  Attach the new forest to the DB

  • In the Admin UI navigate to Configure -> Database -> [Your-Database] -> Forests
  • Check the box next to the newly created forest and click "OK".

Step 8. Ensure there were no errors while mounting the new forest

  • In the Admin UI navigate to Configure -> Database -> [Your-Database] and click the status tab
  • It is also a good idea to look for any errors in the error log (/var/opt/MarkLogic/Logs/ErrorLog.txt)
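You can also check the forest's state programmatically; a minimal sketch, assuming the new forest is named "New-Forest":

(: returns a status element; look for <state>open</state> :)
xdmp:forest-status(xdmp:forest("New-Forest"))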

Step 9. Provided there were no issues with the new forest, you can now delete the old forest

  • In the Admin UI navigate to Configure -> Forests -> [Old-Forest]
  • On the "Configure" tab select delete.
  • The old forest data will need to be deleted manually.

Database and Forest Size References:

MarkLogic Installation Guide: Disk Space Requirements 

Knowledge Base Article: Beginning in MarkLogic 7, the 3x disk space requirement can be reduced if configured and managed.


Summary

The XDMP-LABELBADMAGIC error appears when attempting to mount a forest with a corrupted or zero length Label file.  This article identifies a potential cause and provides the steps required to work around this issue.

Details

The XDMP-LABELBADMAGIC error is often seen on systems where the server was running out of disk space.  If there is no space for MarkLogic Server to write the forest's Label file, a zero length Label file may result. The side effect of that would be the XDMP-LABELBADMAGIC error.

Below is an example showing how this error might appear in ErrorLog.txt when the Triggers forest has a zero length Label file.

2013-03-21 13:02:11.835 Alert: XDMP-FORESTERR: Error in mount of forest Triggers: XDMP-LABELBADMAGIC: Bad forest label magic number: 0x0 instead of 0x1020304

2013-03-21 13:02:11.835 Error: NullAssignment::localMount: XDMP-LABELBADMAGIC: Bad forest label magic number: 0x0 instead of 0x1020304

In order to recover from this error, you will need to manually remove the bad Label file.  Removing the Label file will force MarkLogic Server to recreate the file and will allow the forest to be mounted.

Steps for recovery:

1. Make sure MarkLogic Server is shut down on the affected host.

2. Remove the Label file for the forest displaying the error

a. In Linux the default location is "/var/opt/MarkLogic/Forests/[Forest-Name]/Label"

b. In Windows the default location is "c:\Program Files\MarkLogic\Data\Forests\[Forest-Name]\Label"

3. Restart MarkLogic Server.

Summary

Occasionally, while running a query, you may see the following message returned: XDMP-EXPNTREECACHEFULL: Expanded tree cache full.

The expanded tree cache is the memory pool used for MarkLogic's proprietary representation of XML fragments while they are being used during query processing. XML documents are stored on disk as fragments, in a highly compressed form. As XML fragments are needed during query evaluation, they are retrieved from disk in compressed format and cached in the compressed tree cache. When the query needs to actually retrieve elements, values, or otherwise traverse the contents of one of these fragments, the fragment is uncompressed and cached in the expanded tree cache.

Consequently, the expanded tree cache needs to be large enough to maintain a copy of every expanded XML fragment that is simultaneously needed during query processing. (Note that this does not necessarily imply that every fragment used by a given query is needed simultaneously; a lot depends on what a query does and how it is written.) Expanded fragments may be 3-5x the byte count of the original XML.

The error message XDMP-EXPNTREECACHEFULL: Expanded tree cache full means that MarkLogic has run out of room in the expanded tree cache during query evaluation, and that consequently it cannot continue evaluating the complete query.

Options

There are four approaches to solving this problem:

  1.  Change the problem query so that it does not need to use as much XML data.

  2.  Tune the problem query so that it does not need to simultaneously cache as much XML.

  3.  Increase the size of the expanded tree cache, using the setting under Groups > Default > Configure

  4.  Ensure that your content is properly fragmented, if appropriate.

Change the problem query

Approach (1) generally means a change in requirements (for instance, returning only 100 results instead of 500 results).

Tune the problem query

Approach (2) requires solid knowledge of XQuery performance tuning. For instance, in some cases it's possible for a particular query to process 20GB of XML with only 128 MB of expanded tree cache IF the query is written properly. An initial implementation of that query, however, could easily require a 20 GB expanded tree cache. Typically, our professional services staff is involved in these exercises, but if you want to send us the problem query, we're happy to take a quick look and see if we can give you general advice.

Increase size of expanded tree cache (Restart Required)

Approach (3) will work so long as you have sufficient available memory and the memory required is not over-large, but it may be a band-aid for a problem that should really be fixed through approach (2), such as the 20 GB example outlined above.

Alternatively, if there is not sufficient memory to increase total cache size, you can increase the size of the cache partitions by decreasing the number of partitions. More partitions allow more concurrency, but make each individual cache partition smaller, which could make it more likely for the cache to fill up.  There is a maximum size for the expanded tree cache of 32768 MB (73728 MB as of v7.0-5 and v8.0-2), and each partition should be no more than 8192 MB.

The server will determine the cache settings it uses at startup and log them at Debug level.  For example:

    2022-06-15 08:59:24.614 Debug: Initializing expanded tree cache: 65280 MB, 8 partitions at 8160 MB/partition

This can be used to confirm the size and number of cache partitions in use.  

Note that both of the above solutions do require a cluster restart. As a last resort, if a cluster restart is not desired, you may use Query Console to make a call as follows:

    xdmp:expanded-tree-cache-clear()

Note that this call is undocumented, but it will allow you to clear the cache on a host without a server restart. In a cluster, you will have to call it for each host. Also, please note that it will kill the cache for each host that you invoke it on, which will result in a temporary hit on performance until the cache warms up again.

Ensure content is properly fragmented

Approach (4) reflects the fact that large XML documents can take up a lot of memory during query evaluation, and MarkLogic's fragmentation capabilities are designed with this in mind. Fragmentation allows MarkLogic to load only the needed parts of large documents during query evaluation, thereby reducing memory requirements. Fragmentation does have other ramifications for query evaluation, however, as described in the Developer's Guide. If your expanded tree cache problem occurs while working with large documents, fragmentation may be an appropriate solution. If your problem occurs while working with small documents, fragmentation will not help.

Introduction

There have been a number of reported incidents where database replication has been configured with the Schemas database replicating to itself on a foreign cluster. In a situation where MarkLogic's default Schemas database is used for data, this configuration will cause instability and will likely lead to a system outage that can only be resolved by breaking replication for the Schemas database.

MarkLogic's documentation (pre-ML9) on database replication warns users against this, stating that:

You cannot replicate a Master database that acts as its own schema database. When replicating a Master schema database, create a second empty schema database for the Replica schema database on the Replica cluster.

What happens if I attempt to do this?

In earlier releases of MarkLogic Server, this has been known to cause cluster outage and several support cases have been raised due to this configuration causing undesired - and unexpected - results.

In newer releases of the product (such as MarkLogic 8.0-5 and above), the Admin GUI on port 8001 and our admin APIs will now stop users from making this configuration change in the following ways:

  • In the Admin UI on port 8001, setting up replication between clusters and configuring database replication for the Schemas database should now fail with the message "Cannot setup replication for a database whose schema database is itself"
  • Using the Admin API, coupling a cluster and then calling admin:database-set-foreign-master against the Schemas database should fail, instead causing an ADMIN-DBREPSCHEMADBSAMEASDB exception to be thrown.

This is still an issue for MarkLogic 7 and earlier releases, so it's important to always ensure that you are using a separate database for your replica as advised in our documentation.

Best Practice: Always separate out Schemas Databases where necessary

If your application makes use of Schemas, create a completely separate schemas database for your application.  Doing this keeps your application self-contained and allows you to replicate it as you would any other database.

Introduction

This article is intended to address the impact of AWS deprecation of Python 3.6 (Lambda runtime dependency) and Classic Load Balancer (CLB) on MarkLogic Cloud Formation Templates (CFT).

Background

AWS announced deprecation of Python 3.6 and Classic Load Balancer (CLB).

For Python 3.6, please refer to 'Runtime end of support dates'.
For Classic Load Balancer, please refer to 'Migrate Classic Load Balancer'.

MarkLogic-provided CFTs for MarkLogic 10 prior to 10.0-9.2 are impacted by the Python 3.6 deprecation, as MarkLogic uses custom Lambdas. CFTs prior to 10.0-9.2 are also impacted by the CLB deprecation, since the MarkLogic single-host deployment uses a CLB.

Solutions

1. Upgrade to latest MarkLogic CFT templates:

Starting with the 10.0-9.2 release, the MarkLogic CFT uses Python 3.9 and no longer uses a CLB for single-host deployments.

The fully-qualified domain name (FQDN) of the node is based on the internal IP address from the persistent, reusable ENI. In a single-host cluster without a CLB, the FQDN of the node is listed in the stack outputs as the endpoint for accessing the Admin UI. For example, http://ip-10.x.x.x.ap-southeast-2.compute.internal:8001.

For a single-host cluster in a private subnet, a client residing in the public domain will not be able to connect to the host directly. Your AWS Administrator will need to set up a bastion host (jump box) or a reverse proxy, which acts as an addressable middle tier to route traffic to the MarkLogic host. Alternatively, your Administrator can assign an Elastic IP to the single host, which makes it publicly accessible.

2. Running with MarkLogic prior to 10.0-9.2

2.1: Modify MarkLogic's most current CFT.

You can use the latest version of the MarkLogic CFT, and then change the MarkLogic AMI version inside that CFT to refer to the specific prior version of the MarkLogic AMI.

2.2: Customized CFT (derived from MarkLogic CFT but with specific modification).

You can modify your copy of the template to upgrade to Python 3.9 and remove the use of the CLB.

a) To upgrade the Python changes: Please refer to the custom lambda templates (ml-managedeni.template, ml-nodemanager.template) and search for "python3.6" and replace it with "python3.9".

Format to build the URL: https://marklogic-db-template-releases.s3.<<AWS region>>.amazonaws.com/<<ml-version>>/ml-nodemanager.template

Download the 10.0-7.1 custom Lambda templates for upgrade using the links below:

https://marklogic-db-template-releases.s3.us-west-2.amazonaws.com/10.0-7.1/ml-managedeni.template

https://marklogic-db-template-releases.s3.us-west-2.amazonaws.com/10.0-7.1/ml-nodemanager.template

After the changes are done, the modified templates should be uploaded to the s3 bucket. Also, the 'TemplateURL' should be updated in the main CFTs (mlcluster-vpc.template, mlcluster.template) under 'Resources' -> ManagedEniStack, 'Resources' -> NodeMgrLambdaStack.

b) To remove the CLB changes: Please refer to the latest CFT version (mlcluster-vpc.template, mlcluster.template) and compare/modify the templates accordingly.

c) To upgrade the Python version of an existing stack without redeployment: Please navigate to the AWS Lambda console (Lambda->Functions->ActivityManager-Dev->Edit runtime setting) and update the runtime to use "Python 3.9".

AWS deprecation does not impact an already deployed stack, since the Lambda functions are created during service creation (and only deleted when the service is terminated). Similarly, updating the cluster capacity does not impact an existing deployed stack.

MarkLogic Cloud Services (DHS)

The issue is already addressed by the MarkLogic Cloud Services team with an upgrade of underlying dependency to "Python 3.9".

MarkLogic 9

Please note that this Knowledgebase article refers to MarkLogic 10 Cloud Formation Template changes alone. For MarkLogic 9 Cloud Formation templates, work on recommended solutions is still in progress.

References

  1. MarkLogic 10.0-9.2 Release Notes Addendum
  2. Latest MarkLogic CFT

Summary

When attempting to send email from MarkLogic, whether from Ops Director, Query Console, or another application, you might encounter one of the following errors in your MarkLogic Server Error Log, or in the Query Console results pane.

  • Error sending mail: STARTTLS: 502 5.5.1 Error: command not implemented
  • Error sending mail: STARTTLS: 554 5.7.3 Unable to initialize security subsystem

This article will help explain what these errors mean, as well as provide some ways to resolve them.

What these Errors Mean

These errors indicate that MarkLogic is attempting to send an SMTPS email through the relay, and the relay either does not support SMTPS, or SMTPS has not been configured correctly.

Resolving the Error

One possible cause of this error is when the smtp relay setting for MarkLogic Server is set to localhost.  The error can be resolved by using the Admin Interface to update the smtp relay setting with the organizational SMTP host or relay.  That setting can be found under Configure --> Groups --> [GroupName]: Configure tab, then search for 'smtp relay'.
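The same change can be scripted with the Admin API; here is a minimal sketch, assuming the "Default" group and a hypothetical relay host smtp.example.com:

import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
let $config := admin:get-configuration()
let $group := admin:group-get-id($config, "Default")
return
    admin:save-configuration(admin:group-set-smtp-relay($config, $group, "smtp.example.com"))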

If this error occurs when testing the Email configuration for Ops Director, you can configure Ops Director to use SMTP instead of SMTPS by ensuring the Login and Password fields are blank.  These fields can be found under Console Settings --> Email Configuration in the Ops Director application. (Note: The Ops Director feature has been deprecated with MarkLogic 10.0-5.)

Alternatively, install/configure an SMTP server with SMTPS support.

Related Reading

https://en.wikipedia.org/wiki/SMTPS

https://www.f5.com/services/resources/deployment-guides/smtp-servers-big-ip-v114-ltm-afm

Summary

Some disk related errors, such as SVC-MAPINI, seen on MarkLogic Servers running on the Linux platform can sometimes be attributed to background services attempting to read or monitor MarkLogic data files.

SVC-MAPINI Errors

In some cases when background services are attempting to access MarkLogic data files, you may encounter an error similar to the following:

SVC-MAPINI: Mapped file initialization error: open '/var/opt/MarkLogic/Forests/my-forest-02/0000145a/Timestamps': Operation not permitted

The most common cause of this issue is Anti-Virus software.

Resolution

To avoid file access conflicts, MarkLogic recommends that all MarkLogic data files, typically under /var/opt/MarkLogic/, be excluded from access by any background services, including AV software. As a general rule, ONLY MarkLogic Server should be maintaining MarkLogic Server data files. If those directories MUST be scanned, then MarkLogic should be shut down, or the forests fully quiesced, to prevent issues.

Summary

The try/catch expression allows you to catch and handle exceptions in your XQuery code. Most exceptions can be caught with a try/catch block, but some exceptions are not. This article describes scenarios where exceptions are not caught.

Uncatchable Exceptions:

The following exceptions are not caught by a try/catch block.

  • SVC-CANCELED
  • XDMP-CANCELED
  • SVC-MEMCANCELED
  • XDMP-MEMCANCELED
  • XDMP-DISABLED
  • XDMP-ROLLBACK
  • XDMP-EXLIMIT

Differences between XDMP-CANCELED and SVC-CANCELED:

There is really no difference between XDMP-CANCELED and SVC-CANCELED. Both these messages mean the same thing and are triggered the same way, either by an explicit cancel or implicitly by the client closing the connection while the request is still running.

This message usually indicates that an operation such as a merge, backup or query was explicitly canceled. The message includes information about what operation was canceled. Cancellation may occur through the Admin Interface or by calling an explicit cancellation function, such as xdmp:request-cancel(). This is also what the server normally does when it detects that a query has been canceled or 'abandoned' at the browser level: the server maintains a keep-alive timeout of under 2 minutes, after which it cancels the query and logs it in the ErrorLog.txt file (if the app server has log-errors set to true).

Errors that happen during the commit:

The MarkLogic query evaluator looks at all the updates that were requested, and performs them together at the end of the transaction after all the XQuery statements have been evaluated – including after the try/catch expression.  Errors that occur during the commit phase of an update transaction are not caught by a try/catch block.

The workaround for this limitation is to execute the updates in an xdmp:eval() or xdmp:invoke() call with 'different-transaction' isolation, and wrap the eval or invoke inside a try/catch block. In this way, commit-time errors can be caught.

Examples of commit time errors include XDMP-RANGEINDEX.
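As an illustration, the following minimal sketch wraps an update in xdmp:eval() with 'different-transaction' isolation so that a commit-time error (such as the XDMP-RANGEINDEX example) becomes catchable; the URI and content are illustrative:

xquery version "1.0-ml";
declare namespace error = "http://marklogic.com/xdmp/error";
try {
  (: the eval commits in its own transaction, so commit-time errors surface here :)
  xdmp:eval(
    'xdmp:document-insert("/test.xml", <doc><id>4235234x</id></doc>)',
    (),
    <options xmlns="xdmp:eval"><isolation>different-transaction</isolation></options>)
}
catch ($e) {
  fn:string($e/error:code)  (: e.g. XDMP-RANGEINDEX :)
}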

Static errors (e.g. syntax errors):

Static (syntax) errors are detected before the try/catch expression has been evaluated and, as a result, are not caught by the try/catch block.

Some system level errors:

Some system level errors (e.g. out of memory) are handled by MarkLogic Server and not exposed to the application.

Lazily Evaluated Exceptions

MarkLogic Server is a lazy evaluator.  Some exceptions are thrown only when you actually look at the result and not when the operation is requested.  In these circumstances, you can force the evaluator to wait for the results to complete inside a try-catch block by using functions such as valueOf() or xdmp:eager(). 
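For example, in this minimal sketch the division error is raised while control is still inside the try block because xdmp:eager() forces evaluation; without it, a lazily evaluated sequence could escape the block and throw later, when the result is consumed:

xquery version "1.0-ml";
declare namespace error = "http://marklogic.com/xdmp/error";
try {
  (: force the sequence to be fully evaluated inside the try block :)
  xdmp:eager(for $i in (2, 1, 0) return 10 div $i)
}
catch ($e) {
  fn:concat("caught: ", $e/error:code)
}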

Summary

An XDMP-DBDUPURI error will occur if the same URI occurs in multiple forests of the same database. This article describes some of the conditions under which this can occur and describes a number of strategies to help identify, prevent and fix them.

If you encounter multiple documents returned for one URI without an error, please see Duplicate Documents.

Under normal operating conditions, duplicate URIs are not allowed to occur, but there are ways that programmers and administrators can bypass the server safeguards.

Since duplicate URIs are considered a form of corruption, any query that encounters one will fail and post an error similar to the following:

XDMP-DBDUPURI: URI /foo.xml found in forests Library06 and Library07

We will begin by exploring the different ways that duplicate URIs can be created. Once we understand how this situation can occur, we will discuss how to prevent it from happening in the first place.  We will also discuss ways to resolve the XDMP-DBDUPURI error when it does occur.

How Administrators Can Cause Duplicate URIs

There are several administrative actions that can result in duplicate URIs:

  1. By detaching a forest from its parent database (for administrative purposes - e.g., backup, restore) while allowing updates to continue on the database. If an update is committed to a URI that exists in the detached forest, the database will create a new copy of that URI in a different forest. When the forest is re-attached to the database, you will have duplicates of these URIs.
  2. By detaching a forest from Database-1 and then attaching it to Database-2. Database-2 may already have some of the URIs that the new forest contains, including directory fragments covering common URI paths (such as "/").
  3. By performing a forest-level restore from forest-a to forest-b, where the database that contains forest-b already has some URIs that also exist in forest-a.

Prevention: the causes and our recommendations

To prevent case #1: Instead of detaching the forest to perform administrative operations, put the forests in read-only mode instead.

You can do this by setting 'updates-allowed' to 'read-only' in the forest settings. This will let the database know that a given URI exists, but will disallow updates on it, thus preventing any duplicates from being created.

Case #2 can be prevented by not using forest attach/detach for content migration between databases.  There are other alternatives such as database replication.

The best way to avoid case #3 is by using database level restore, rather than forest level restore.

If you must use forest restore, make sure to use an Admin API script that double-checks that any given forest backup is being restored to the corresponding restore target. Be sure to test your script thoroughly before making changes in your production and other critical environments.

How Programmers Can Create Duplicate URIs

There are several ways that programmers can create duplicate URIs:

1. By using an xdmp:eval() to insert content with one or more forests set in the database option. We normally check whether a URI exists in all forests before inserting, but xdmp:eval bypasses that safeguard.

2. By using the OUTPUT_FAST_LOAD option in the MapReduce connector (see the mapreduce Javadoc for more details).

3. By loading content with the database 'locking' option set to 'off'.

Prevention: the causes and our recommendations

To prevent case #1, avoid using 'place keys' (specifying a forest in the database option) during document inserts. This will allow the database to decide where the document goes and thereby prevent duplicates. You can also use the API xdmp:document-assign() to figure out where xdmp:document-insert() would place that URI, and then pass that value in to the xdmp:eval().

In reality, while there can be minor performance gains from using in-forest evals ('place keys'), the practice of loading documents into specified forests is generally not advised, so the example code should be seen as an illustration of the process. We do not consider this to be a best practice.

In an in-forest-eval function like the minimal sketch below (the function, URI, and forest name are illustrative), you can either use a hardcoded forest name:
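declare function local:in-forest-eval(
  $uri as xs:string,
  $doc as node(),
  $forest as xs:unsignedLong
) {
  (: evaluating against a single forest bypasses the database-wide URI existence check :)
  xdmp:eval(
    'declare variable $URI external;
     declare variable $DOC external;
     xdmp:document-insert($URI, $DOC)',
    (xs:QName("URI"), $uri, xs:QName("DOC"), $doc),
    <options xmlns="xdmp:eval"><database>{$forest}</database></options>)
};

local:in-forest-eval("/foo.xml", <doc/>, xdmp:forest("Library06"))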

Or you can call it using the output of the xdmp:document-assign() function, which assigns the URI to the forest the database itself would choose, preventing duplicate URIs (again, a sketch using the same illustrative function):
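let $uri := "/foo.xml"
let $forests := xdmp:database-forests(xdmp:database())
(: document-assign returns the 1-based index of the forest this URI maps to :)
let $position := xdmp:document-assign($uri, fn:count($forests))
return local:in-forest-eval($uri, <doc/>, $forests[$position])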

It is important to note that there is generally no performance advantage in using the manual xdmp:document-assign(); if you're using this in your code, you should consider instead using xdmp:document-insert() as this approach will manage the forest assignment for you.

To prevent case #2, use the default settings for ContentOutputFormat when using the MarkLogic Connector for Hadoop. Here is the explanation from the documentation:

To prevent duplicate URIs, the MarkLogic Connector for Hadoop defaults to a slower protocol for ContentOutputFormat when it detects the potential for updates to existing content. In this case, MarkLogic Server manages the forest selection, rather than the MarkLogic Connector for Hadoop. This behavior guarantees unique URIs at the cost of performance.

You may override this behavior and use direct forest updates by doing the following:

  • Set mapreduce.marklogic.output.content.directory. This guarantees all inserts will be new documents. If the output directory already exists, it will either be removed or cause an error, depending on the value of the mapreduce.marklogic.output.content.cleandir setting.
  • Set mapreduce.marklogic.output.content.fastload to true. When fastload is true, the MarkLogic Connector for Hadoop always optimizes for performance, even if duplicate URIs are possible.

You can safely set mapreduce.marklogic.output.content.fastload to true if the number of forests in the database will not change while the job runs, and at least one of the following is true:

  • Your job only creates new documents. That is, you are certain that the URIs do not exist in any document or property fragments in the database.
  • The URIs output with ContentOutputFormat may already be in use, but both of these conditions are true:
    • The in-use URIs were not originally inserted using forest placement.
    • The number of forests in the database has not changed since initial insertion.
  • You have set mapreduce.marklogic.output.content.directory.

For case #3, be sure to use either the 'fast' or the 'strict' locking option on your target database when loading content. From the documentation:

[This option] Specifies how robust transaction locking should be.

When set to strict, locking enforces mutual exclusion on existing documents and on new documents.

When set to fast, locking enforces mutual exclusion on existing and new documents. Instead of locking all the forests on new documents, it uses a hash function to select one forest to lock. In general, this is faster than strict. However, for a short period of time after a new forest is added, some of the transactions need to be retried internally. When set to off, locking does not enforce mutual exclusion on existing documents or on new documents; only use this setting if you are sure all documents you are loading are new (a new bulk load, for example), otherwise you might create duplicate URIs in the database.

It is OK to use the 'off' setting only if performing a new bulk load onto a fresh database.

Repairing Duplicate URIs

Once you encounter duplicate URIs, you will need to delete them as soon as possible in order to restore functionality to the affected database.

Here are some utility XQuery scripts that will help you to do this work:

1. Script to view the document singled out in the error message.

2. Script to allow you to delete a duplicate document or property.

3. This script helps you delete a duplicate directory.

4. If you need to find duplicate uris, this script will show duplicate documents.
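The original utility scripts are not reproduced here, but the common technique is to evaluate against one forest at a time; here is a minimal sketch (the URI is illustrative) that shows each forest's copy of a duplicate URI:

xquery version "1.0-ml";
let $uri := "/foo.xml"
for $forest in xdmp:database-forests(xdmp:database())
let $copy :=
  xdmp:eval(
    'declare variable $URI external; fn:doc($URI)',
    (xs:QName("URI"), $uri),
    <options xmlns="xdmp:eval"><database>{$forest}</database></options>)
where fn:exists($copy)
return (xdmp:forest-name($forest), $copy)

The unwanted copy can then be removed by evaluating xdmp:document-delete() against the offending forest in the same way.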

Summary

Deadlocks occur when two transactions are each waiting to acquire a lock and neither can continue until the other releases a lock. The XDMP-DEADLOCK error log message indicates that MarkLogic Server detected a deadlock. When the XDMP-DEADLOCK error log message occurs at the ‘Debug’ log level, the deadlock was successfully resolved by MarkLogic Server. When the XDMP-DEADLOCK error log message occurs at the ‘Notice’ log level, the deadlock was not resolvable by MarkLogic.

Deadlock messages across MarkLogic versions:

MarkLogic 8.0-8.1
2019-10-15 11:10:30.161 Info: XDMP-DEADLOCK: Deadlock detected locking Documents /test.xml
2019-10-15 11:10:27.994 Info: XDMP-DEADLOCK: Deadlock detected locking Documents #10466205726835857951
2019-10-15 11:10:27.992 Info: XDMP-DEADLOCK: Deadlock detected locking Documents #10466205726835857951

MarkLogic 9.0-9.4
2019-10-15 11:14:36.544 Info: Deadlock detected locking Documents /test.xml
2019-10-15 11:14:34.414 Info: Deadlock detected locking Documents #10466205726835857951
2019-10-15 11:14:34.413 Info: Deadlock detected locking Documents #10466205726835857951

Details

MarkLogic Server is designed to automatically detect and resolve deadlocks. When a deadlock is detected, one of the deadlocked transactions is retried, allowing the other to acquire the lock and continue. When this expected behavior occurs, an XDMP-DEADLOCK is written to the e-node error log as a ‘Debug’ message to indicate that a deadlock occurred and was resolved.

If the deadlock cannot be resolved by repeated retries, an XDMP-DEADLOCK message is written to the e-node error log as a ‘Notice’ message.

Deadlocks are also reported at ‘Info’ level on the d-node on which they occur.

It is common for deadlocks to be the cause of a poor performing application.  The file log level needs to be set to the ‘debug’ level or higher in order to detect that the retryable deadlock conditions are occurring. You can set the file log level in the Admin UI by navigating to -> Configure -> Groups -> {group-name} -> Configure tab -> file log level = debug; press the “ok” button. 

Response

If XDMP-DEADLOCK appears as an infrequent ‘Debug’ message, no action is required. Deadlocks are a normal part of database operations, and the system successfully resolved the deadlock.

If XDMP-DEADLOCK appears frequently as a Debug message, you may have a performance issue. Revise your query or content structure to reduce the frequency of the deadlock.

If XDMP-DEADLOCK appears as a ‘Notice’ message, MarkLogic Server was unable to automatically resolve the deadlock. Examine the error message for details about the contentious resource. Revise your query or content structure to avoid the deadlock.

Diagnostics

You can obtain additional information in the error logs for debugging deadlocks:

  1. Setting the "file log level" to finer will yield log messages showing what updates are doing. You can set the file log level in the Admin UI by navigating to -> Configure -> Groups -> {group-name} -> Configure tab -> file log level = finer; press the “ok” button. 
  2. The trace event "Locking" will give additional locking information in the error logs. You can add the "Locking" trace event in the Admin UI by navigating to -> Configure -> Groups -> {group-name} -> Diagnostics -> trace events activated = true; Add "Locking"; press the “ok” button. 
  3. You may also be able to identify the transaction request by correlating timestamps in the error logs with those in the access logs. 

Application Level Remediation

Once you have identified a transaction that references a common document, there are a couple of things that you can do to remediate XDMP-DEADLOCK issues.

  1. Re-architect your application so that it does not reference the common document.
  2. Isolate all updates to shared documents inside their own transactions. This can be done using xdmp:eval with isolation level of ‘different transaction’.
  3. Call the xdmp:lock-for-update() function at the beginning of the update transaction for the referenced documents. When called at the beginning of a transaction, xdmp:lock-for-update() locks the URI immediately, as opposed to later in the transaction as MarkLogic Server is a lazy evaluator. If there are multiple transactions doing a bunch of work and then attempting to lock the same URI, some of the transaction could be forced to redo work (as is often the case when you see the Debug level XDMP-DEADLOCK log message). By using xdmp:lock-for-update(), the lock is acquired by the first transaction immediately, and the other transactions wait to acquire the lock, resulting in no extra work being performed and some deadlock conditions are avoided. xdmp:lock-for-update() simply forces the update lock upfront and the lock will be released at the end of the transaction. 
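Here is a minimal sketch of option 3; the URI and document structure are illustrative:

xquery version "1.0-ml";
(: acquire the write lock up front, before any reads of the shared document :)
xdmp:lock-for-update("/shared/config.xml"),
let $doc := fn:doc("/shared/config.xml")
return xdmp:node-replace($doc/config/updated, <updated>{fn:current-dateTime()}</updated>)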

Other Problematic Application Patterns

There are some application patterns that might lead to deadlocks in a system with concurrent queries:

  • Update transactions where the result of a search returns a large number of documents:  In this case, the update transaction will read-lock every matching document.  Some alternatives:
    • rewrite searches to return a small number of documents;
    • isolate the transaction context of updates from the search queries that return large result sets;
    • If you just want to get a list of URIs without locking you can use cts:uris instead of cts:search, as cts:uris does not lock.
  • Undetected deadlock
    • The most likely reason for an undetected deadlock is a request deadlocking against itself. This can happen when an update holding a lock then calls an xdmp:eval() or xdmp:invoke(), to invoke another nested update that tries to acquire the lock.
    • One way this shows up in the error logs is as a timeout on xdmp:lock-for-update()

                    Notice:  SVC-EXTIME: xdmp:lock-for-update("/test-uri.xml") -- Time limit exceeded

Directory Creation

The “directory creation” database configuration set to ‘automatic’ is a common cause of deadlocks. For document inserts, the server has to lock all the directory (properties) fragments that match the document URI "path", which could cause deadlocks if other insert threads are trying to lock the same set of documents.

For example: If you insert the document '/a/path/to/content.xml', the server would create the following directories:

 /

/a/

/a/path/

/a/path/to/

These directories would have to be locked every time you want to put content into the "/a/path/to/" URI path. If you are experiencing deadlocks on directories, then you'll want to set directory creation to "manual". Note that "automatic" directory creation was the default setting in MarkLogic Server 5.0 and prior, and "automatic" is still required if the database will be attached to a WebDav Application Server. However, beginning in MarkLogic Server 6.0, directory creation was changed to "manual" by default.
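This can also be set programmatically; a minimal sketch using the Admin API, assuming a database named "Documents":

import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
let $config := admin:get-configuration()
return
    admin:save-configuration(
        admin:database-set-directory-creation($config, xdmp:database("Documents"), "manual"))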

Update Transactions, Locks and Lifetime

Update transactions do not run at a set timestamp (as Query transactions do).  Update transactions see the latest view of any given document at the time it is first accessed by any statement in the transaction. Because an update transaction must successfully obtain locks on all documents it reads or writes in order to complete evaluation, there is no chance that a given update transaction will see 'half' or 'some' of the updates made by other transactions (i.e. transactional integrity is enforced).

Within an update transaction, query operations require read locks and update operations require reader/writer locks. A reader/writer lock is an exclusive lock and cannot be acquired until any locks held by other transactions are released. A read lock is not exclusive. Once a lock is acquired, it is held until the transaction ends. This prevents other transactions from updating a read locked document and ensures a read-consistent view of the document. 

For a more thorough examination of MarkLogic transactions, refer to the "Understanding Transactions in MarkLogic Server" section of the MarkLogic Server Application Developer's Guide.

Further Reading


MarkLogic Knowledgebase: Understanding the Lock Trace diagnostic event

MarkLogic Documentation: Understanding Transactions in MarkLogic Server

MarkLogic Knowledgebase: How do updates work in MarkLogic Server?

MarkLogic Knowledgebase: Read only queries run at a timestamp and Update transactions use locks

 

Summary

The XDMP-INMMTREEFULL, XDMP-INMMLISTFULL, XDMP-INMMINDXFULL, XDMP-INMREVIDXFULL, XDMP-INMMTRPLFULL & XDMP-INMMGEOREGIONIDXFULL messages are informational only.  These messages indicate that in-memory storage is full, resulting in the forest stands being written out to disk. There is no error as MarkLogic Server is working as expected.

Configuration Settings

If these messages consistently appear more frequently than once per minute, increasing the ‘in-memory’ settings in the affected database may be appropriate.

  • XDMP-INMMTREEFULL corresponds to the “in memory tree size” setting. "in memory tree size" specifies the amount of cache and buffer memory to be allocated for managing fragment data for an in-memory stand.
  • XDMP-INMMLISTFULL corresponds to the “in memory list size” setting. "in memory list size" specifies the amount of cache and buffer memory to be allocated for managing termlist data for an in-memory stand.
  • XDMP-INMMINDXFULL corresponds to the “in memory range index size” setting. "in memory range index size" specifies the amount of cache and buffer memory to be allocated for managing range index data for an in-memory stand.
  • XDMP-INMREVIDXFULL corresponds to the “in memory reverse index size” setting. "in memory reverse index size" specifies the amount of cache and buffer memory to be allocated for managing reverse index data for an in-memory stand. 
  • XDMP-INMMTRPLFULL corresponds to the “in memory triple index size” setting. "in memory triple index size" specifies the amount of cache and buffer memory to be allocated for managing triple index data for an in-memory stand. 
  • XDMP-INMMGEOREGIONIDXFULL corresponds to the “in memory geospatial region index size” setting. "in memory geospatial region index size" specifies the amount of cache and buffer memory to be allocated for managing geo region index data for an in-memory stand. 

Increasing the in-memory settings has implications for the ‘journal size’ setting. The default value of journal size should be sufficient for most systems; it is calculated at database configuration time based on the size of your system. If you change the other memory settings, however, the journal size should equal the sum of the in memory list size and the in memory tree size. Additionally, you should add space to the journal size if you use range indexes (particularly if you use a lot of range indexes or have extremely large range indexes), as range index data can take up journal space.
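For example, here is a minimal sketch that raises the in memory list and tree sizes and keeps the journal size equal to their sum (the "Documents" database and the 1024 MB values are illustrative):

import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
let $config := admin:get-configuration()
let $dbid := xdmp:database("Documents")
let $config := admin:database-set-in-memory-list-size($config, $dbid, 1024)
let $config := admin:database-set-in-memory-tree-size($config, $dbid, 1024)
(: journal size should equal the sum of the in memory list and tree sizes :)
let $config := admin:database-set-journal-size($config, $dbid, 2048)
return admin:save-configuration($config)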

Introduction

This KB article is for those customers who want to upgrade their DHS (Data Hub Service) Data Hub version from Data Hub 5.1.0 (or earlier) to Data Hub 5.2.x+ on AWS.

Note: This process only applies for requests to MarkLogic Support to upgrade the Data Hub version on a DHS AWS service.

Details

Customers who want to upgrade their DHS Data Hub version from Data Hub 5.1.0 (or earlier) to Data Hub 5.2.x in DHS AWS should be made aware of the following.

The user can still upgrade to Data Hub 5.2.x but with the following caveats:

Old DHS Roles -> DH 5.2 Roles

  • Flow Developer -> data-hub-developer
  • Flow Operator -> data-hub-operator, data-hub-monitor
  • Endpoint Developer -> data-hub-developer
  • Endpoint User -> data-hub-operator
  • Service Security Admin -> data-hub-security-admin, data-hub-admin, pii-reader

    To determine which Data Hub version customers can upgrade to, see Version Compatibility in the DHS AWS documentation: https://docs.marklogic.com/cloudservices/aws/refs/version-compatibility.html

    Summary

    This article describes the errors thrown when decoding URLs and how to detect invalid characters to avoid the errors.

    Details

    When decoding certain URLs using xdmp:url-decode(), it is possible that certain characters will cause one of two errors to be thrown. 

    1. XDMP-UTF8SEQ is thrown if the percent-encoded bytes do not form a valid UTF-8 octet sequence. A good description of UTF-8 can be found at: https://en.wikipedia.org/wiki/UTF-8 
    2. XDMP-CODEPOINT is thrown if the UTF-8 octet sequence specifies a Unicode codepoint invalid for XML.

    The specification for the Uniform Resource Identifier (URI): Generic Syntax can be found here: https://tools.ietf.org/html/rfc3986. In particular, the following section explains why certain characters are invalid: "Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters."

    The code below can be used to detect invalid characters.  Make sure to remove any invalid characters prior to URL decoding.

    (codepoint <= 0x8) ||
    (codepoint >= 0xb && codepoint <= 0xc) ||
    (codepoint > 0xd && codepoint < 0x20) ||
    (codepoint >= 0xd800 && codepoint < 0xe000) ||
    (codepoint > 0xfffd && codepoint < 0x10000) ||
    (codepoint >= 0x110000)
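    Alternatively, you can simply attempt the decode and trap the two errors; here is a minimal sketch (the function name is illustrative):

    xquery version "1.0-ml";
    declare namespace error = "http://marklogic.com/xdmp/error";

    (: returns the decoded string, or the empty sequence if the input is invalid :)
    declare function local:safe-url-decode($encoded as xs:string) as xs:string? {
      try { xdmp:url-decode($encoded) }
      catch ($e) {
        if ($e/error:code = ("XDMP-UTF8SEQ", "XDMP-CODEPOINT"))
        then ()
        else xdmp:rethrow()
      }
    };

    local:safe-url-decode("%C3%A9")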

    Introduction:

    When trying to restore from a backup previously taken, the  XDMP-BACKDIRINUSE error message may sometimes be encountered:

        XDMP-BACKDIRINUSE - Backup data directory currently has a backup job in progress

    As described, the most common occurrence of this message is when a restore is attempted while a backup task is running.  However, you may also encounter this error when another process has the backup directory locked.

    Solution:

    To resolve this error, you will need to first determine if there is, indeed, a backup running to the same directory, or if the directory is locked by another process.

    If there is another backup running, wait for it to complete or kill it, and attempt the restore again.

    However, if there is no other backup task running, check if there are any files in the backup directory with the prefix "Job.*" (this may happen when the backup files were copied during a running backup job).

    For example:

    -rw-r--r-- 1 mlogic dba  4747 May 20 15:48  Job.F5CDF0424BDC0DE1

    -rw-r--r-- 1 mlogic dba  4747 May 20 15:48  Job.B856D24DED41A543 

    When there are files which start with Job.* in the backup directory, the server assumes that there is another backup job in progress and will throw the XDMP-BACKDIRINUSE error. Deleting these files from the directory (or renaming them) and performing the restore again should get you past this error.

    If neither of these solutions are sufficient to get past this error, you should look for external (non-MarkLogic) processes that might be holding a lock on database backup files, such as file backup or virus scan programs. If these exist, either wait for the processes to complete or kill them, and then attempt the restore again.

    Summary

    XDMP-CANCELED indicates that a query or operation was canceled either explicitly or as a result of a system event. XDMP-EXTIME also indicates that a query or operation was canceled, but in that case the cancellation occurs because the elapsed processing time exceeded a timeout setting.

    XDMP-CANCELED:  Canceled request

    The XDMP-CANCELED error message usually indicates that an operation such as a merge, backup or query was explicitly canceled. The message includes information about what operation was canceled. Cancellation may occur through the Admin Interface or by calling an explicit cancellation function, such as xdmp:request-cancel().

    An XDMP-CANCELED error message can also occur when a client breaks the network socket connection to the server while a query is running (i.e. if the client abandons the request), resulting in the query being canceled.

    try/catch:

    An XDMP-CANCELED exception will not be caught in a try/catch block.

    XDMP-EXTIME: Time limit exceeded

    An XDMP-EXTIME error will occur if a query or other operation exceeded its processing time limit. Surrounding messages in the ErrorLog.txt file may pinpoint the operation which timed out.

    Inefficient Queries

    If the cause of the timeout is an inefficient or incorrect query, you should tune the query. This may involve tuning your query to minimize the amount of filtering required. Tuning queries in MarkLogic often includes maintaining the proper indexes for the database so that the queries can be resolved during the index resolution phase of query evaluation. If a query requires filtering of many documents, then the performance will be adversely affected. To learn more about query evaluation, refer to Section 2.1 'Understanding the Search Process' of the MarkLogic Server's Query Performance and Tuning Guide available in our documentation at https://docs.marklogic.com/guide/performance.pdf.  

    MarkLogic has tools that can be used to help evaluate the characteristics of your queries. The best way to analyze a single query is to instrument the query with query trace, query meters and query profiling API calls: query trace can be used to determine if the queries are resolvable in the index, or if filtering is involved; query meters gives statistics from a query execution; and query profiling provides information regarding how long each statement in your query took. Information regarding these APIs is available in the Query Performance and Tuning Guide.

    The Query Console makes it easy to profile a query in order to view sub-statement execution times. Once you have identified the poor performing statements, you can focus on optimizing that part of the code.

    Inadequate Processing Limit

    If the cause of the timeout is an inadequate processing limit, you may be able to configure a more generous limit through the Admin Interface. 

    A common setting which can contribute to the XDMP-EXTIME error message is the 'default time limit' setting for an Application Server or the Task Server.  An alternative to increasing the 'default time limit' is to use xdmp:set-request-time-limit() within your query.  Note that neither the 'default time limit' nor the request time limit can be larger than the "max time limit".

    Resource Bottlenecks

    If the cause of the timeout is the result of a resource bottleneck where the query or operation was not being serviced adequately, you will need to tune your system to eliminate the resource bottleneck. MarkLogic recommends that all systems where MarkLogic Server is installed should monitor the resource usage of its system components (i.e. CPU, memory, I/O, swap, network, ...) so that resource bottlenecks can easily be detected.

    try/catch

    XDMP-EXTIME can be caught in a try/catch block.

    Summary

    A query will fail with an XDMP-CONFLICTINGUPDATES exception if an update statement attempts to perform an update to a document that will conflict with other updates occurring in the same statement.

    Details

    Update statements are conceptually performed sequentially, with each working from the state left by the previous statement. However, update actions within a statement are conceptually performed concurrently, working from the same state.

    When executing an update statement, all the update actions requested are accumulated for processing. It is only after all code in the update statement has finished that the accumulated updates are processed in a batch.

    Any given fragment is only written once per statement. In a single update statement, you can request multiple changes to a single fragment and MarkLogic Server coalesces them into a single fragment write. The server detects conflicting updates when it coalesces updates into a single fragment write.

    Examples

     - A single update transaction that attempts to update a node and then attempts to add a child element to that node in the same transaction will fail with an XDMP-CONFLICTINGUPDATES exception.

     - A single update transaction that attempts to insert a document and then attempts to insert a node to that document will fail with an XDMP-CONFLICTINGUPDATES exception.

     - A single update transaction that attempts to insert a document at the same URI twice will also fail with an XDMP-CONFLICTINGUPDATES exception.

    Application Remedies

    To avoid a XDMP-CONFLICTINGUPDATES exception, you can revise your application code to either:

     - Perform the conflicting operations in two separate transactions (see the sketch after this list). There are multiple ways in MarkLogic Server to separate the execution of a statement into multiple transactions, such as by using the semicolon transaction delimiter; by using the xdmp:eval() or xdmp:invoke() functions with the isolation option set to 'different-transaction'; or by spawning a task using the xdmp:spawn() function.

     - Combine the changes so that all updates within a node hierarchy occur within a single document update or document insert.
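    For example, here is a minimal sketch of the first remedy, using the semicolon transaction delimiter so that the document insert commits before the child node is added (the URI and content are illustrative):

    xquery version "1.0-ml";
    xdmp:document-insert("/example.xml", <doc><a>1</a></doc>)
    ;
    xquery version "1.0-ml";
    xdmp:node-insert-child(fn:doc("/example.xml")/doc, <b>2</b>)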

    Introduction

    It is possible to encounter an XDMP-RIDXTOOBIG error during a database merge - for example, you may see an error that looks similar to: 

    XDMP-FORESTNOT: Forest ... not available: XDMP-FORESTERR: Error in merge of forest ...: XDMP-RIDXTOOBIG: Too many entries in range index.

    Encountering this error may result in the forest going offline.

    Cause

    The error XDMP-RIDXTOOBIG means that there are too many entries in the range index. Range indexes in MarkLogic Server are limited to 2^32 (~4 billion) entries per stand.

    During a merge, if the resulting stand will have a range index with more than 2^32 entries, then the merge will fail with the above mentioned error and the forest will be taken offline.

    Solution

    One way to avoid encountering the XDMP-RIDXTOOBIG error would be to set the 'merge-max-size' of the database in question to a size at which the resulting stands are unlikely to hit the range index entry limit.  A value that we often recommend for the 'merge-max-size' setting is 32GB. The 'merge-max-size' setting will enforce an upper limit of 32GB on the size of any individual stand in the forest.  MarkLogic Server does this by managing merges so that a merge will not occur if the resultant stand size would be bigger than that size.

    Note: In MarkLogic 7 and later releases, the default value for merge-max-size is 32GB, which is recommended as it provides a good balance between keeping the number of stands and preventing very large merges from using large amounts of disk space.
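    A minimal sketch for setting this via the Admin API (the "Documents" database name is illustrative; the value is in MB):

    import module namespace admin = "http://marklogic.com/xdmp/admin"
        at "/MarkLogic/admin.xqy";
    let $config := admin:get-configuration()
    return
        admin:save-configuration(
            admin:database-set-merge-max-size($config, xdmp:database("Documents"), 32768))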

    Links to other related documentation:

    MarkLogic Server's Administrators Guide section on 'Understanding and Controlling Database Merges'

    Knowledgebase article on 'Migrating to MarkLogic 7 and understanding the 1.5x disk rule'

    Knowledgebase article on 'Range Indexes and Mapped File Initialization Errors'

    xdmp:value() vs. xdmp:eval():

    Both xdmp:value() and xdmp:eval() are used for executing strings of code dynamically. However, there are fundamental differences between the two:

    • The code in the xdmp:value() is evaluated against the current context - if variables are defined in the current scope, they may be referenced without re-declaring them
    • xdmp:eval() creates an entirely new context that has no knowledge of the context calling it - which means one must define a new XQuery prolog and variables from the main context. Those newly defined variables are then passed to the xdmp:eval() call as parameters and declared as external variables in the eval script

    Function behavior when used inline:

    Although both these functions seem to fulfill the same purpose, it is very important to note that their behavior changes when used inline. Consider the following example:

    declare namespace db = "http://marklogic.com/xdmp/database";
    let $t := <database xmlns="http://marklogic.com/xdmp/database" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                <database-name>aptp-dev-modules</database-name>
              </database>

    return

    fn:fold-left(function($a, $b){ xdmp:value(fn:concat("$t/db:", "database-name")) }, (), (1,2,3))
    (or)
    fn:fold-left(function($a, $b){ $t/xdmp:value(fn:concat("db:", "database-name")) }, (), (1,2,3))

    When a function is called inline, the expressions inside the function cannot be statically compiled to function items because the values of the closed-over variables are not yet available. Therefore, the query parser would have to look for any variable bindings during dynamic analysis to be able to evaluate the expression. Ideally, variables from the main context are passed to the function call as parameters. However, in the case of xdmp:value(), the function is expected to have the needed context to evaluate the expression and therefore the expression is evaluated without looking for any variable bindings - which can ultimately lead to unexpected behavior. This explains why the first return statement in the above example returns an ‘empty sequence’ and the second one returns the correct results because the variable is being referenced outside of the xdmp:value call. In other words, when used inline - xdmp:value() cannot reference variables declared in the current scope.

    In contrast, in the case of xdmp:eval, the parser would know to look for variable bindings during dynamic analysis as this function is not expected to have the knowledge of the calling context. Consequently, when using xdmp:eval the context needs to be explicitly created and the variables explicitly passed to the call as parameters and declared as external variables.
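    For example, here is a minimal sketch of passing the variable into xdmp:eval() explicitly as an external variable (mirroring the $t example above):

    xquery version "1.0-ml";
    let $t := <database xmlns="http://marklogic.com/xdmp/database">
                <database-name>aptp-dev-modules</database-name>
              </database>
    return
      xdmp:eval('
        declare namespace db = "http://marklogic.com/xdmp/database";
        declare variable $t external;
        fn:string($t/db:database-name)',
        (xs:QName("t"), $t))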