Knowledgebase: Performance Tuning
Searching across latest version of managed documents
27 March 2014 02:01 PM

What is DLS?

The Document Library Service (DLS) enables you to create and maintain versions of managed documents in MarkLogic Server. Access to managed documents is controlled using a check-out/check-in model. You must first check out a managed document before you can perform any update operations on the document. A checked out document can only be updated by the user who checked it out; another user cannot update the document until it is checked back in and then checked out by the other user. 
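
The check-out/check-in cycle described above can be sketched as three separate transactions. This is a minimal illustration, assuming that "/stuff.xml" is already a managed document; the URI, the new content, and the annotations are placeholder values.

```xquery
xquery version "1.0-ml";
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
(: Check out the document (placeholder URI and annotation). :)
dls:document-checkout("/stuff.xml", fn:true(), "editing stuff.xml");

xquery version "1.0-ml";
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
(: Update the checked-out document; fn:true() retains the previous version. :)
dls:document-update("/stuff.xml", <test>two</test>, "second revision", fn:true());

xquery version "1.0-ml";
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
(: Check the document back in so that another user can check it out. :)
dls:document-checkin("/stuff.xml", fn:true())
```

Each statement is separated by a semicolon so that it runs as its own transaction; the update only succeeds once the checkout has been committed.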

Searching across latest version of managed documents

To track document changes, you can store versions of a document by defining a retention policy in DLS. However, it is often only the latest version of a document that most users are interested in. MarkLogic provides the function dls:documents-query, which helps you access the latest versions of the managed documents in the database. This function can carry a performance overhead: when the database holds millions of managed documents, accessing all the latest versions may be noticeably slow. This is an intrinsic cost of the large number of documents and of joining across properties.
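
For reference, a typical use of dls:documents-query is to combine it with another query so that only the latest versions are searched. The following is a minimal sketch; the search term is a placeholder.

```xquery
xquery version "1.0-ml";
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";

(: Restrict a word search to the latest version of each managed document. :)
cts:search(fn:doc(),
  cts:and-query((
    dls:documents-query(),
    cts:word-query("example")   (: placeholder search term :)
  )))
```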

How can one improve the search performance?

A simple workaround is to add your latest versions to a collection (say "latest"). Instead of calling dls:documents-query, you can then use a collection query on this "latest" collection. Two approaches are described below: the first applies to new changes (inserts/updates), and the second should be used to bring the managed documents already in the database into the collection.

1.) To add new inserts/updates to the "latest" collection

Below are two files, manage.xqy and update.xqy, that can be used for new inserts/updates.

In manage.xqy, we do an insert-and-manage and then manipulate the collections so that the numbered version document has the "historic" collection and the latest document has the "latest" collection. You have to use xdmp:document-add-collections() and xdmp:document-remove-collections() in the same request as the insert-and-manage, because the document is not really managed until after the transaction is done.

In update.xqy, we do the checkout-update-checkin with the "historic" collection (so that we don't inherit the "latest" collection from the latest document), and then add "latest" and remove "historic" from the latest document. 

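(: manage.xqy :)
xquery version "1.0-ml";
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
(: Insert and manage with "historic"; fn:false(), <test>one</test> and "created" are placeholder values. :)
dls:document-insert-and-manage("/stuff.xml", fn:false(), <test>one</test>, "created",
  (xdmp:permission("dls-user", "read"),
   xdmp:permission("dls-user", "update")),
  "historic"),
(: Move the latest document itself from "historic" to "latest". :)
xdmp:document-add-collections("/stuff.xml", "latest"),
xdmp:document-remove-collections("/stuff.xml", "historic")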

(: update.xqy :)
xquery version "1.0-ml";
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
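
The body of update.xqy is not shown above. A minimal sketch of the steps described for it might look like the following. It is an assumption rather than the original listing: the new content and annotation are placeholders, the permissions simply repeat the dls-user permissions used in manage.xqy, and the xdmp: collection functions are used by analogy with manage.xqy.

```xquery
xquery version "1.0-ml";
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";

(: Check out, update and check in, passing "historic" as the collection
   so that "latest" is not carried over from the latest document. :)
dls:document-checkout-update-checkin(
  "/stuff.xml",
  <test>two</test>,        (: placeholder content :)
  "updated",               (: placeholder annotation :)
  fn:true(),               (: retain history :)
  (xdmp:permission("dls-user", "read"),
   xdmp:permission("dls-user", "update")),
  "historic"),
(: Then move the latest document from "historic" to "latest". :)
xdmp:document-add-collections("/stuff.xml", "latest"),
xdmp:document-remove-collections("/stuff.xml", "historic")
```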

2.) To add the already existing managed documents to the "latest" collection

To add the latest version of documents already existing in your database to the "latest" collection you can do the following in batches.

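xquery version "1.0-ml";
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
(: 1-based positions of the first and last document in this batch :)
declare variable $start external;
declare variable $end   external;
(: Add each latest version in the batch to the "latest" collection. :)
for $uri in cts:search(fn:collection(), dls:documents-query())[$start to $end]/fn:document-uri(.)
return xdmp:document-add-collections($uri, "latest")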
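
One way to drive the batches is to invoke the module above repeatedly with successive $start and $end values. The sketch below assumes the module has been saved as "/add-latest.xqy" (a hypothetical path) and uses a batch size of 1000, which is an arbitrary choice to tune for your environment.

```xquery
xquery version "1.0-ml";
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";

(: Hypothetical module path and batch size; adjust both as needed. :)
declare variable $module := "/add-latest.xqy";
declare variable $batch-size := 1000;

(: Estimate how many latest versions there are to process. :)
let $total := xdmp:estimate(cts:search(fn:collection(), dls:documents-query()))
for $i in 0 to xs:integer(fn:ceiling($total div $batch-size)) - 1
let $start := $i * $batch-size + 1
let $end := $start + $batch-size - 1
return
  (: Each invocation receives its own $start and $end external variables. :)
  xdmp:invoke($module,
    (xs:QName("start"), $start, xs:QName("end"), $end))
```

Keeping each batch small limits the number of documents touched by a single update transaction.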

This way you can segregate the historical and latest versions of the managed documents and then, instead of using dls:documents-query, use the "latest" collection to search across the latest versions of managed documents.
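
As an illustration, a search that previously relied on dls:documents-query could be written with a collection query instead. This is a minimal sketch; the search term is a placeholder, and it assumes the "latest" collection has been maintained as described above.

```xquery
xquery version "1.0-ml";

(: Search only documents in the "latest" collection. :)
cts:search(fn:doc(),
  cts:and-query((
    cts:collection-query("latest"),
    cts:word-query("example")   (: placeholder search term :)
  )))
```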

Note: Although this workaround helps when you want to search across the latest versions of managed documents, it does not cover every case. dls:documents-query is used internally by many functions in dls.xqy, so not all DLS functionality will see an improvement.
