Knowledgebase:
Duplicate Documents
30 January 2023 11:21 AM

Introduction

In the more recent versions of MarkLogic Server, there are checks in place to prevent the loading of invalid documents (such as documents with multiple root nodes).  However, documents loaded in earlier versions of MarkLogic Server can now result in duplicate URI or duplicate document errors being reported.

Additionally, under normal operating conditions, a document/URI is saved in a single forest. If somehow the load process gets compromised, then user may see issues like duplicate URI (i.e. same URI in different forests) and duplicate documents (i.e. same document/URI in same forest).

Resolution

If the XDMP-DBDUPURI (duplicate URI) error is encountered, refer to our KB article "Handling XDMP-DBDUPURI errors" for procedures to resolve.

If one doesn't see XDMP-DBDUPURI errors but running fn:doc() on a document returns multiple nodes then it could be a case of duplicate document in same forest.

To check that the problem is actually duplicate documents, one can either do an xdmp:describe(fn:doc(...)) or fn:count(fn:doc((...)). If these commands return more than 1 e.g. xdmp:describe(fn:doc("/testdoc.xml")) returns (fn:doc("/testdoc.xml"), fn:doc("/testdoc.xml")) or fn:count(fn:doc("/testdoc.xml")) returns 2 then the problem is of duplicate documents in the same forest (and not duplicate URIs).

To fix duplicate documents, the document will need to be reloaded.

Before reloading, you can take a look at the two version to see if there is a difference.  Check fn:doc("/testdoc.xml")[1] versus fn:doc("/testdoc.xml")[2] to see if there is a difference, and which one you want to reload.

If there is a difference, that may also that may point the operation that created the situation.

(8 vote(s))
Helpful
Not helpful

Comments (0)