Solutions

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Learn

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Community

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Company

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

 
Knowledgebase:
Potential Pitfall of Encoded Binary in Documents
30 May 2013 01:32 PM

Summary

There are index settings that may be problematic if your documents contain encoded binary data (such as Base64 encoded binary).  This article identifies a couple of these index settings and explains the potential pitafall.

Details

When word lexicons or string range indexes are enabled, each stand in the database's forest will contain a file called the 'atom data' file.  The contents of this file includes all of the relevant unique tokens.  This could include all the unique tokens in the forest (stand).  If your documents contain encoded binary data, all of the encode binary may be replicated as atom data and stored in the atom data file.

Pitfall: There is an undocumented limit on the size of the atom data file of 4GB.  If this limit is exceeded for the content of a forest, then stand merges will begin to fail with the error

    "XDMP-FORESTERR: Error in merge of forest forest-nameSVC-MAPBIG: Mapped file too large to map: NNN bytes: '\path\Forests\forest-name\stand-id\AtomData'"

Workarounds

There are a few options that you can pursue to get around these problems

1. Do not include encoded binary data in your documents.  An alternative is to store the binary content seperately using MarkLogic Server support for binary documents and to include a reference to the binary document in the original.

2. If word lexicons are required, and the encoded binary data is limited to a finite number of elements in your documents, then you can create word query exclusions for those elements. In the MarkLogic Server Admin UI, word query element exclusions can be configured by navigating to -> Configure -> Databases -> {database-name} -> Word Query -> Exclude tab. 

3. If a string range index is defined on an element that contains encoded binary, then you can either remove the string range index or change the document data model so that the element containing the encoded binary is not shared with an element that requires a string range index. 

 

 

(0 vote(s))
Helpful
Not helpful

Comments (0)