Summary

On Internet Explorer 9 and Internet Explorer 10, the Application Services UI should be run in Compatibility View.

Details:

When using the Application Services UI in Internet Explorer 9 or Internet Explorer 10, you may notice some minor UI bugs. These bugs occur only within MarkLogic Application Services itself, not within applications built with it. They can be avoided by running IE 9 or IE 10 in Compatibility View.

Instructions on how to configure Compatibility View in IE 9 or IE 10:

1. Press ALT-T to bring up the Tools menu
2. On the Tools menu, click 'Compatibility View Settings' 
3. Add the domain to the list of domains to render in compatibility view.

Introduction

This article gives you enough information to understand the output from Query Console's profiler.

Details

Query

Consider the following XQuery:

xquery version '1.0-ml';
let $path := '/Users/chamlin/Downloads/medsamp2012.xml'
let $citations := xdmp:document-get($path)//MedlineCitation
for $citation at $i in $citations
return
xdmp:document-insert(fn:concat("/",$i,$citation/PMID/fn:string(),".xml"), $citation)

This FLWOR expression loads an XML file into memory, then finds each MedlineCitation element and inserts it as a document in the database. Although this example is very simple, it gives us enough information to understand what the profiler does and how to read its output.

Scenario / Walkthrough

Setup

  • Download the small dataset for medline at http://www.nlm.nih.gov/databases/dtd/medsamp2012.xml and save it to your MarkLogic machine
  • Open a buffer in Query Console
  • Load the XML fragments into your nominated database by executing the XQuery shown above, altering $path so it points to your downloaded medsamp2012.xml file
  • You should have loaded 156 Medline XML fragments in your database if everything worked correctly.  If you receive an error, make sure that the file is available to MarkLogic and has the correct permissions to allow access

Profile the query

Now run the same query again, only this time, ensure "Profile" is selected before you hit the run button.

You should see something like this:

[Image: Query Console profiler output]

The header shows overall statistics for the query:

Profile 627 Expressions PT0.286939S
The number of XQuery expression evaluations, along with the total query execution time expressed as an xs:dayTimeDuration (hence the PT prefix).
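To compare profiler runs numerically, the duration in the header can be converted to seconds. The following is a minimal sketch in Python (used here only for illustration). The helper name duration_to_seconds is hypothetical and not part of MarkLogic, and the pattern covers only the days, hours, minutes and seconds form of xs:dayTimeDuration.

```python
import re

# Subset of xs:dayTimeDuration: optional days, then optional hours, minutes
# and (possibly fractional) seconds after the "T" separator.
_DURATION = re.compile(
    r"^(?P<sign>-)?P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def duration_to_seconds(text):
    """Convert a duration string such as 'PT0.286939S' to seconds."""
    match = _DURATION.match(text)
    # Reject strings with no components at all, such as "P" or "PT".
    if match is None or text in ("P", "-P") or text.endswith("T"):
        raise ValueError(f"not a dayTimeDuration: {text!r}")
    parts = match.groupdict(default="0")
    total = (
        int(parts["days"]) * 86400
        + int(parts["hours"]) * 3600
        + int(parts["minutes"]) * 60
        + float(parts["seconds"])
    )
    return -total if match.group("sign") else total


elapsed = duration_to_seconds("PT0.286939S")  # the header value shown above
```

Here elapsed holds the total execution time of the profiled request as a float number of seconds, which is easier to compare across runs than the raw duration string.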

The table shows the profiler results for the expressions evaluated in the request, one row for each expression:

Module:Line No.:Col No.
The point in the code where the expression can be found.
Count
The number of times the expression was evaluated.
Shallow %
The percentage of time spent evaluating a particular expression compared to the entire query, excluding any time spent evaluating any sub-expressions.
Shallow µs
The time (in microseconds) taken for all the evaluations of a particular expression. This excludes time spent evaluating any sub-expressions.
Deep %
The percentage of time spent evaluating a particular expression compared to the entire query, including any time spent evaluating any sub-expressions.
Deep µs
The time (in microseconds) taken for all the evaluations of a particular expression. This includes time spent evaluating any sub-expressions.
Expression
The particular XQuery expression being profiled and timed.  Expressions can represent FLWOR expressions, literal values, arithmetic operations, functions, function calls, and other expressions.

Shallow time vs deep time

In profiler output you will usually want to pay the most attention to expressions that have a large shallow time.  These expressions are doing the most work, exclusive of work done in sub-expressions.

If an expression has a very large deep time, but almost no shallow time, then most of the time is being spent in sub-expressions.

For example, in the profiler output shown, the FLWOR expression at .main:2:0 has the most deep time because it includes all the other expressions, but little shallow time because it does not do much work itself. The expression at .main:3:45 also has a lot of deep time, but nearly all of it comes from the sub-expression at .main:3:18, which takes the most time.
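The relationship between the two columns can be sketched in a few lines of Python. This is an illustration only: the timings are made up, the Expr class is hypothetical, and one expression location is invented to complete the tree. It assumes shallow time equals deep time minus the deep time of the direct sub-expressions, which is what the column descriptions above imply.

```python
from dataclasses import dataclass, field


@dataclass
class Expr:
    location: str
    deep_us: int  # time including all sub-expressions
    children: list = field(default_factory=list)

    @property
    def shallow_us(self):
        # Shallow time is what remains once the children's deep time is removed.
        return self.deep_us - sum(child.deep_us for child in self.children)


def walk(expr):
    """Yield an expression followed by all of its descendants."""
    yield expr
    for child in expr.children:
        yield from walk(child)


# Made-up timings; only the nesting mirrors the walkthrough above.
document_get = Expr(".main:3:18", 195_000)
path_step = Expr(".main:3:45", 200_000, [document_get])
insert_call = Expr("insert call (hypothetical location)", 80_000)
flwor = Expr(".main:2:0", 286_000, [path_step, insert_call])

# Same ordering as the profiler's default sort: shallow time, descending.
ranked = sorted(walk(flwor), key=lambda e: e.shallow_us, reverse=True)
for expr in ranked:
    print(f"{expr.location:36} shallow={expr.shallow_us:>7} deep={expr.deep_us:>7}")
```

With these numbers the outer FLWOR has the largest deep time but only a small shallow time, while the innermost expression comes first in the ranking because its shallow and deep times are the same.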

Sorting

The default sorting of the table is by Shallow % descending. This is generally a good view, as it brings the expressions taking the most shallow time to the top. You can sort on a different column by clicking on the column header.

Cold vs warm

Timings may change for a query if you execute it more than once, due to the caching performed by MarkLogic.  A query will be slower if it needs data that is not available in the caches (cold) vs where much of the information is available from caches (warm).  This is by design and gives better performance as the system runs and caches frequently used information.

Lazy evaluation

Another characteristic of MarkLogic Server is its use of lazy evaluation.  A relatively expensive evaluation may return quickly without performing the work necessary to produce results.  Then, when the results are needed, the work will actually be performed and the evaluation time will be assigned at that point.  This can give surprising results.

Wrapping an expression in xdmp:eager() will evaluate it eagerly, giving a better idea of how much time it really takes because the time for evaluation will be better allocated in the profiler output.
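As an analogy only (this is Python, not MarkLogic's evaluator), a generator shows the same effect: creating the lazy value costs almost nothing, and the work is charged to whichever later step first consumes the results. The function expensive is a hypothetical stand-in for a costly expression.

```python
evaluated = []


def expensive(n):
    """Stand-in for a costly expression; records when it actually runs."""
    evaluated.append(n)
    return n * n


# Building the generator returns immediately: expensive() has not run yet.
lazy = (expensive(n) for n in range(3))
work_before_use = list(evaluated)

# Forcing the results is what triggers the work, so a profiler would
# attribute the time to this line rather than to the line above.
results = list(lazy)
work_after_use = list(evaluated)
```

Materializing the sequence up front, as list() does here, plays the role that xdmp:eager() plays in the profiled XQuery: the cost shows up where the expression is written rather than where its value is first needed.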

Summary

This article discusses XDMP-FRAGTOOLARGE errors encountered while using Information Studio.

Details

Information Studio uses the Fab database. The Fab database retains the state information related to the document transformation and distribution processes. Documents that generate errors during a load operation are retained in the Fab database.

If there are transformation steps configured for the flow, the collector loads the documents to the Fab database, where they are processed by a Content Processing Framework (CPF) pipeline. The CPF pipeline transforms the content and distributes the resulting documents to the destination database.

Depending on the size and/or structure of your content, you may occasionally encounter XDMP-FRAGTOOLARGE errors when performing transformations in Information Studio. When this occurs, consider increasing the in-memory-list size setting for the Fab database.

Summary

The MarkLogic Admin GUI is a convenient place to deploy a standard certificate infrastructure or to use the temporary certificates generated by MarkLogic. However, certain advanced solutions and deployments require XQuery-based admin operations to configure MarkLogic.

This article discusses how to deploy a SAN or wildcard certificate in a cluster of three or more nodes.

 

Certificate Types and MarkLogic Default Config

Certificate Types

In general, when browsers connect to a server using HTTPS, they check that the server's SSL certificate matches the host name in the address bar. There are three ways for browsers to find a match:

a) The host name (in the address bar) exactly matches the Common Name in the certificate's Subject.

b) The host name matches a wildcard Common Name. See the example at the end of this article.

c) The host name is listed in the Subject Alternative Name (SAN) field, which is part of the X509v3 extensions. See the example at the end of this article.

The most common form of SSL name matching is for the SSL client to compare the server name it connected to with the Common Name (CN field) in the server's Certificate. It's a safe bet that all SSL clients will support exact common name matching.
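The three matching rules can be sketched as a small Python function. This is a simplified illustration, not how any particular browser or MarkLogic implements the check; real TLS clients apply stricter rules. The function names are hypothetical, and the host names are the ones used in the examples at the end of this article.

```python
def matches_pattern(hostname, pattern):
    """Compare a host name with a certificate name, ignoring case.

    A leading "*" label in the pattern stands for exactly one left-most label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def certificate_matches(hostname, common_name, san_dns_names=()):
    """Return True if the host matches the CN (exact or wildcard) or any SAN entry."""
    if matches_pattern(hostname, common_name):
        return True
    return any(matches_pattern(hostname, name) for name in san_dns_names)


host = "engrlab-128-101.engrlab.marklogic.com"
exact_only = certificate_matches(host, "TestDomain.com")
wildcard = certificate_matches(host, "*.engrlab.marklogic.com")
san = certificate_matches(host, "TestDomain.com", [
    "engrlab-128-101.engrlab.marklogic.com",
    "engrlab-128-164.engrlab.marklogic.com",
    "engrlab-128-130.engrlab.marklogic.com",
])
```

In this sketch exact_only is False while wildcard and san are True: a certificate whose CN is TestDomain.com matches the cluster host only through its SAN entries, never through a plain CN comparison.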

MarkLogic allows the common scenario (a) to be configured from the Admin GUI. The rest of this article discusses deploying certificates of types (b) and (c).

Default Admin GUI based Configuration 

By default, MarkLogic generates a temporary certificate for every node in the group of the current cluster when a certificate template is assigned to MarkLogic Server (the exception is when the template is assigned through XQuery).

Each node's temporary certificate has that node's host name as its CN field, which is designed for the common scenario (a).

There are two ways to install a CA-signed certificate in MarkLogic:

1) Generate a certificate request, get it signed by a CA, and import the signed certificate through the Admin GUI.

2) Generate a certificate request and private key outside of MarkLogic, get the certificate request signed by a CA, and import the signed certificate and private key using an Admin script.

Problem Scenario

In both of the above cases, while installing or importing the signed certificate, MarkLogic looks for a temporary certificate to replace by comparing the CN field of the installed certificate with the CN field of each temporary certificate.

If we have a wildcard certificate (b) or a SAN certificate (c), the CN field of the signed certificate will never match the CN field of a temporary certificate. As a result, MarkLogic will not remove the temporary certificates and will continue using them.

 

Solution

After installing a SAN or wildcard certificate, we may find that an application server still uses the temporary certificate, which was not replaced when the SAN or wildcard certificate was installed.

Run the XQuery below against the Security database to remove all temporary certificates. The query requires the URI lexicon to be enabled (it is enabled by default). Change the certificate template name and the FQDN in the query to reflect the values from your environment.

xquery version "1.0-ml";

import module namespace pki = "http://marklogic.com/xdmp/pki"  at "/MarkLogic/pki.xqy";
import module namespace admin = "http://marklogic.com/xdmp/admin"  at "/MarkLogic/admin.xqy";
      

let $hostIdList := let $config := admin:get-configuration()
                   return admin:get-host-ids($config)
                     
for $hostid in $hostIdList
return
  (: FQDN matching the certificate's CN field value :)
  let $fqdn := "TestDomain.com"

  (: Change to your Template Name string :)
  let $templateid := pki:template-get-id(pki:get-template-by-name("YourTemplateName"))

  for $i in cts:uris()
  where 
  (   (: locate Cert file with Public Key :)
      fn:doc($i)//pki:template-id=$templateid 
      and fn:doc($i)//pki:authority=fn:false()
      and fn:doc($i)//pki:host-name=$fqdn
  )
  return <h1> Cert File - {$i} .. inserting host-id {$hostid}
  {xdmp:node-insert-child(doc($i)/pki:certificate, <pki:host-id>{$hostid}</pki:host-id>)}
  {
      (: extract cert-id :)
      let $certid := fn:doc($i)//pki:certificate/pki:certificate-id
      for $j in cts:uris()
      where 
      (
          (: locate Cert file with Private key :)
          fn:doc($j)//pki:certificate-private-key/pki:template-id=$templateid 
          and fn:doc($j)//pki:certificate-private-key/pki:certificate-id=$certid
      )
      return <h2> Cert Key File - {$j}
      {xdmp:node-insert-child(doc($j)/pki:certificate-private-key,
        <pki:host-id>{$hostid}</pki:host-id>)}
      </h2>
  } </h1>

The query above will remove all temporary certificates (including the template CA) and their private keys, leaving only the installed certificate associated with the template and forcing all nodes to use the installed certificate.

 

Example: SAN (Subject Alternative Name) Certificate

For a 3-node cluster (engrlab-128-101.engrlab.marklogic.com, engrlab-128-164.engrlab.marklogic.com, engrlab-128-130.engrlab.marklogic.com):

$ openssl x509 -in ML.pem -text -noout
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 9 (0x9)
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, ST=NY, L=NewYork, O=MarkLogic, OU=Engineering, CN=Support CA
        Validity
            Not Before: Apr 20 19:50:51 2016 GMT
            Not After : Jun  6 19:50:51 2018 GMT
        Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic, OU=Eng, CN=TestDomain.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (1024 bit)
                Modulus (1024 bit):
                    00:97:8e:96:73:16:4a:cd:99:a8:6a:78:5e:cb:12:
                    5d:e5:36:42:d2:b8:52:51:53:6c:cf:ab:e4:c6:37:
                    2c:15:12:80:c1:1b:53:29:4c:52:76:84:80:1d:ee:
                    16:41:a6:31:c5:7b:0d:ca:d7:e5:da:d7:67:fe:80:
                    89:9f:0d:bc:46:4f:f0:7e:46:88:26:d5:a0:24:a6:
                    06:d1:fa:c0:c7:a2:f2:11:7f:5b:d5:8d:47:94:a8:
                    06:d9:46:8f:af:dd:31:d5:15:d2:7a:13:39:3e:81:
                    32:bd:5c:bd:62:9d:5a:98:1d:20:0e:30:d4:57:3f:
                    7f:89:e6:20:ae:88:4d:85:d7
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: 
                Key Encipherment, Data Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication
            X509v3 Subject Alternative Name: 
                DNS:engrlab-128-101.engrlab.marklogic.com, DNS:engrlab-128-164.engrlab.marklogic.com, DNS:engrlab-128-130.engrlab.marklogic.com
    Signature Algorithm: sha1WithRSAEncryption
        52:68:6d:32:70:35:88:1b:70:df:3a:56:f6:8a:c9:a0:9d:5c:
        32:88:30:f4:cc:45:29:7d:b5:35:18:a0:9a:45:37:e9:22:d1:
        c5:50:1d:50:b8:20:87:60:9b:c1:d6:a8:0c:5a:f2:c0:68:8d:
        b9:5d:02:10:39:40:b3:e5:f6:ae:f3:90:31:57:4c:e0:7f:31:
        e2:79:e6:a8:c0:e6:3f:ea:c5:75:67:3e:cd:ea:88:5d:60:d6:
        01:59:3c:dc:e0:47:96:3b:59:4a:13:85:bb:87:70:d0:a2:6b:
        0f:d4:84:1d:d1:be:e8:a5:67:c3:e3:59:05:0d:5d:a5:86:e6:
        e4:9e

Example: Wild-Card Certificate

For a 3-node cluster (engrlab-128-101.engrlab.marklogic.com, engrlab-128-164.engrlab.marklogic.com, engrlab-128-130.engrlab.marklogic.com):

$ openssl x509 -in ML-wildcard.pem -text -noout
Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number: 7 (0x7)
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, ST=NY, L=NewYork, O=MarkLogic, OU=Engineering, CN=Support CA
        Validity
            Not Before: Apr 24 17:36:09 2016 GMT
            Not After : Jun 10 17:36:09 2018 GMT
        Subject: C=US, ST=NJ, L=Princeton, O=MarkLogic Corporation, OU=Engineering Support, CN=*.engrlab.marklogic.com
 

Introduction
 
In the MarkLogic Java Client API, a DatabaseClient instance represents a database connection that can be shared across threads. The connection is stateless, except that authentication is done the first time a client interacts with the database via a Document Manager, Query Manager, or other manager. For instance, you may instantiate a DatabaseClient as follows:
 
// Create the database client

DatabaseClient client = DatabaseClientFactory.newClient(host, port,
                                          user, password, authType);

And release it as follows:

// Release the client
client.release();

Details on DatabaseClient Usage

To use the Java Client API efficiently, it helps to know a little bit about what goes on behind the scenes.

You specify the enode or load balancer host when you create a database client object.  Internally, the database client object instantiates an Apache HttpClient object to communicate with the host.

The internal Apache HttpClient object creates a connection pool for the host.  The connection pool makes it possible to reuse a single persistent HTTP connection for many requests, typically improving performance.

Setting up the connection pool has a cost, however.

As a result, we strongly recommend that applications create one database client for each unique combination of host, database, and user.  Applications should share the database client across threads.  In addition, applications should keep a reference to the database client for the entire life of the application interaction with that host.


For instance, a servlet might create the database client during initialization and release the database client during destruction. The same servlet may also use two separate database client instances with different permissions, one for read-only users and one with read/write permissions for editors. In the latter case, both client instances are used throughout the life of the servlet and released during servlet destruction.

Introduction

This KB article is for customers who want to upgrade their DHS (Data Hub Service) Data Hub version from Data Hub 5.1.0 (or earlier) to Data Hub 5.2.x+ on AWS.

Note: This process only applies for requests to MarkLogic Support to upgrade the Data Hub version on a DHS AWS service.

Details

Customers who want to upgrade their DHS Data Hub version from Data Hub 5.1.0 (or earlier) to Data Hub 5.2.x in DHS AWS should be aware of the following.

The user can still upgrade to Data Hub 5.2.x but with the following caveats:

Old DHS Roles             DH 5.2 Roles
Flow Developer            data-hub-developer
Flow Operator             data-hub-operator, data-hub-monitor
Endpoint Developer        data-hub-developer
Endpoint User             data-hub-operator
Service Security Admin    data-hub-security-admin, data-hub-admin, pii-reader

To determine which Data Hub version customers can upgrade to, see Version Compatibility in the DHS AWS documentation: https://docs.marklogic.com/cloudservices/aws/refs/version-compatibility.html

    Introduction

    Postman is a valuable tool for testing the behaviour of HTTP connections and for exploring MarkLogic through its ReST-based APIs.  

    Postman can be downloaded as a Chrome (browser based) application or as a standalone application from https://www.getpostman.com/

    Getting Started

    Step one: Choosing a MarkLogic ReST endpoint to query

    In this example, we're using the standard ReST endpoint that ships with MarkLogic Server version 7 and above.  For this example, everything is already set up to connect immediately on port 8000.  

    We are going to use an HTTP GET request to retrieve the configuration properties, one of the many features exposed by MarkLogic's ReST API.

    If you're running this example locally, the endpoint you will be accessing will be:

    http://localhost:8000/LATEST/config/properties

    If you run this in your browser, you'll see something like this getting returned:

    <rapi:properties xmlns:rapi="http://marklogic.com/rest-api">
    <rapi:content-versions>none</rapi:content-versions>
    <rapi:debug>false</rapi:debug>
    <rapi:document-transform-all>true</rapi:document-transform-all>
    <rapi:document-transform-out/>
    <rapi:update-policy>merge-metadata</rapi:update-policy>
    <rapi:validate-options>true</rapi:validate-options>
    <rapi:validate-queries>false</rapi:validate-queries>
    </rapi:properties>

    We can use Postman to do the same work: first set the HTTP method to GET, then enter the API endpoint URL shown above.

    Step two: configuring Digest authentication for your user

    Postman will need to perform authentication on your behalf; we can set this up to use Digest authentication to communicate with MarkLogic Server.  

    Start by selecting the Authorization tab and, in the Type dropdown, select Digest Auth.

    Step three: configure your user credentials

    After "Digest Auth" has been set, enter your username and password, and use public for the Realm.

    Step four: submit the request and view the response (as XML)

    Use the Send button and note that you can view the HTTP Headers passed and the HTTP Response body.

    Step five: understanding content negotiation - returning JSON

    Postman will also allow you to configure headers; we're going to add an Accept header and request that the HTTP response from MarkLogic is JSON. To do this, use the dropdown box to select the correct mimetype for JSON (application/json).

    Step six: verifying the JSON response data

    As soon as this is set, run the request (using the Send button) and check that the content returned is JSON.
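For readers who want to script the same request, steps one to six can be approximated with Python's standard library. This is a minimal sketch rather than a replacement for Postman: the function names and the placeholder credentials are hypothetical, and it assumes the server answers with a Digest challenge for the realm public, as configured above.

```python
import urllib.request

PROPERTIES_URL = "http://localhost:8000/LATEST/config/properties"


def build_digest_opener(url, user, password, realm="public"):
    """Return an opener that answers Digest challenges for the given realm."""
    passwords = urllib.request.HTTPPasswordMgr()
    passwords.add_password(realm, url, user, password)
    handler = urllib.request.HTTPDigestAuthHandler(passwords)
    return urllib.request.build_opener(handler)


def build_properties_request(url=PROPERTIES_URL, accept="application/json"):
    """Build the GET request; the Accept header drives content negotiation."""
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")


def fetch_properties(user, password, accept="application/json"):
    """Send the request and return (status, content type, body text)."""
    opener = build_digest_opener(PROPERTIES_URL, user, password)
    with opener.open(build_properties_request(accept=accept)) as response:
        body = response.read().decode("utf-8")
        return response.status, response.headers.get("Content-Type"), body
```

Calling fetch_properties("your-user", "your-password") against a running server should return the properties as JSON, and passing accept="application/xml" requests the XML form shown in step one.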

    Putting it all together

    1. Using Postman to GET a list of API endpoints from MarkLogic

    In this case, we're going to call the following MarkLogic ReST API endpoint to list all configured ReST Application Servers:

    http://localhost:8002/LATEST/rest-apis

    Set up Postman to call this endpoint.

    2. Change the GET to a POST

    Use the dropdown to change the HTTP method from GET to POST.

    Note that you should now get an exception (HTTP 400) in the response body, and the message code thrown by MarkLogic Server will be RESTAPI-INVALIDCONTENT.

    As we're now performing a POST, the API is telling us that it's expecting a payload to be sent along with the request; in this case, the POST should be accompanied with information about the particular resource we're going to create.

    If we want to create a new ReST application server, here is a minimal set of parameters that can be sent over to MarkLogic Server:

    <rest-api xmlns="http://marklogic.com/rest-api">
       <name>myTest</name>
       <database>Documents</database>
       <port>9111</port>
    </rest-api>

    In the example above, we're specifying the Application Server name, the default database and the port.

    In the Body tab in Postman, set the body format to raw and ensure that XML (application/xml) is specified as the content type for the body.

    Finally, add your payload content and press Send.

    If everything went to plan, you should find that a 201 Created status is returned by MarkLogic Server.
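The same POST can be prepared in Python with the standard library. This sketch only builds the payload and the request object. Sending it still requires an opener that handles Digest authentication, and the function names are hypothetical. The values myTest, Documents and 9111 are the ones used in the payload above.

```python
import urllib.request
import xml.etree.ElementTree as ET

REST_API_NS = "http://marklogic.com/rest-api"
REST_APIS_URL = "http://localhost:8002/LATEST/rest-apis"


def build_rest_api_payload(name, database, port):
    """Serialize the minimal rest-api description as UTF-8 XML bytes."""
    # Use the rest-api namespace as the default namespace when serializing.
    ET.register_namespace("", REST_API_NS)
    root = ET.Element(f"{{{REST_API_NS}}}rest-api")
    for tag, value in (("name", name), ("database", database), ("port", str(port))):
        ET.SubElement(root, f"{{{REST_API_NS}}}{tag}").text = value
    return ET.tostring(root, encoding="utf-8")


def build_create_request(payload, url=REST_APIS_URL):
    """Build the POST request, declaring the body as application/xml."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


payload = build_rest_api_payload("myTest", "Documents", 9111)
request = build_create_request(payload)
```

Posting an empty or malformed body is what produces the HTTP 400 response described earlier, so building the payload programmatically is one way to avoid that error in scripts.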

    Further Reading

    MarkLogic ReST API docs and blog posts...

    https://docs.marklogic.com/guide/rest-dev/service

    Background:

    Due to concerns around high availability, every node in a MarkLogic Server instance has its own copy of configuration files like databases.xml, hosts.xml, etc. Consequently, it's important for each node to make sure it's working off the correct version of the various configuration files.

    This is accomplished by using the Admin/config-lock.xqy library, which locks the appropriate configuration files before they are updated so that multiple requests through the Admin API or REST Management API don't corrupt them. As a consistency check before locking, the library makes sure that the timestamps of the configuration files are more recent than the timestamps in the lock file. This ensures that if someone else has acquired the lock and updated the files, the lock won't be returned until the files are consistent.
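The idea behind the consistency check can be illustrated with a short Python sketch. It is not the code in the config-lock library: the function name, the data layout and the numeric timestamps are all hypothetical, and it assumes that a configuration file whose timestamp equals the one recorded in the lock file counts as consistent.

```python
def stale_config_files(file_timestamps, lock_timestamps):
    """Return the config files that are older than the lock file says they should be."""
    return sorted(
        name
        for name, locked_at in lock_timestamps.items()
        if file_timestamps.get(name, 0) < locked_at
    )


# Timestamps recorded in the lock file the last time the files were locked.
lock_record = {"databases.xml": 150, "hosts.xml": 200}

# Case 1: the local files are at least as recent as the lock record.
consistent = stale_config_files({"databases.xml": 150, "hosts.xml": 210}, lock_record)

# Case 2: the lock record claims a newer databases.xml than this host has.
out_of_date = stale_config_files({"databases.xml": 100, "hosts.xml": 200}, lock_record)
```

In the first case the list is empty and the lock could be granted. In the second case databases.xml is reported as stale, which is the kind of mismatch that arises when the timestamps in the lock file do not correspond to the local configuration files.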

    Problem Statement:

    If you restore the "App-Services" database from another cluster, you can wind up with timestamps in the lock file that bear no useful relationship to the actual configuration files. This is because the "App-Services" database is where the lock file /config-lock/config-timestamps.xml is located, which contains the timestamps of the configuration files from the last time they were locked - for the restore target cluster. Restoring the "App-Services" database from a different source cluster will overwrite the restore target's /config-lock/config-timestamps.xml file. This causes the error "MANAGE-TIMESTAMPOLD: Config files out of date on host" - which is triggered when you try to perform any subsequent update on the restore target cluster's MarkLogic Server configuration files.

    Note: In general, this error is triggered by any PUT or POST operation to the REST Management API, with the exception of some of the security endpoints, which actually update the security database instead of the configuration files.

    Workaround:

    In the restore target cluster showing the "MANAGE-TIMESTAMPOLD" error, deleting the lock file “/config-lock/config-timestamps.xml” should fix the problem because that same file will be re-created with the correct timestamps.

    Note: it is very important to make sure no one is running Admin updates before deleting the lock file, as the timestamps corresponding to those updates will be lost when you delete the lock file.

    Best Practice:

    The "App-Services" database is one of the default MarkLogic databases, used to track configuration file timestamp information local to the cluster on which it resides. It is not recommended to restore "App-Services" databases across MarkLogic clusters.