Termlist will discard positions at 256MB
31 October 2016 05:49 PM
For terms stored in the index, the positions list tracks where each term appears within a document. Positions are used to resolve queries where the distance between terms matters (for example, near queries, where a term can appear within n words of another term or phrase within a given element or set of search criteria). There are a number of index options involving the positions of document terms. When these indexes are enabled, MarkLogic records positions in a positions list for each term in the universal index. When positions lists get large, MarkLogic may take a long time to load them from disk. To minimize the impact of large positions lists, MarkLogic imposes a maximum size for these lists per term.
MarkLogic 7 and above
Each stand in a forest maintains its own index and its own positions lists. The smaller the stands, the less likely you are to reach the maximum positions list size for a term, as smaller stands are likely to result in smaller term lists. A maximum stand size was introduced in MarkLogic 7. By default, it restricts individual stands to 32768 MB (32 GB).
If you are seeing the warning message "Termlist will discard positions at 256MB", you may need to manage your data and forests to ensure the index sizes remain manageable.
The positions-list-max-size configuration parameter (default 256MB, maximum 512MB) sets the size at which a term list is considered too large and unwieldy. For example, a 512MB term list would take about 25 seconds to load from disk at 20MB/sec, so although increasing this value from the default may be a quick fix, it is probably not the best change that could be made.
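The load-time trade-off can be checked with a little arithmetic. The following is a minimal sketch, assuming a purely sequential read at a sustained rate; the function name `load_time_seconds` is hypothetical, and real load times will also depend on caching and concurrent I/O.

```python
def load_time_seconds(list_size_mb: float, throughput_mb_per_sec: float) -> float:
    """Estimate the seconds needed to read a term list of the given size from disk."""
    if throughput_mb_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return list_size_mb / throughput_mb_per_sec


# Compare the default limit (256MB) with the maximum limit (512MB) at 20MB/sec.
for size_mb in (256, 512):
    seconds = load_time_seconds(size_mb, 20)
    print(f"{size_mb}MB at 20MB/sec: {seconds:.1f} seconds")
```

Doubling the limit doubles the worst-case read time for a single term, which is why raising it is best treated as a stopgap.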
Scenario: Understanding what messages to look for
In the ErrorLog file, you may notice messages appearing at the "Info" level which look like:
This message lets you know that a term list is getting large and is approaching the limit for a particular stand ("Y"). Term lists are managed per stand in all of your forests, so each stand is allowed a maximum size (256MB by default). If a positions list exceeds this maximum size, MarkLogic will discard positions for that database.
The value is set at the database level:
Configure -> Databases -> [Database] -> positions list max size
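The same setting could also be scripted rather than changed through the Admin Interface. The sketch below is illustrative only: it assumes the Management REST API is reachable on port 8002, that the database properties endpoint accepts a JSON property named `positions-list-max-size`, and that digest authentication is in use. The function names, host, and credentials are hypothetical, so verify the endpoint and property name against the documentation for your MarkLogic version before relying on it.

```python
import json
import urllib.parse
import urllib.request

MAX_ALLOWED_MB = 512  # upper bound for the setting, as stated in this article


def build_positions_limit_request(host: str, database: str, size_mb: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets positions-list-max-size."""
    # The lower bound of 1 is an assumption made for this sketch.
    if not 1 <= size_mb <= MAX_ALLOWED_MB:
        raise ValueError(f"size_mb must be between 1 and {MAX_ALLOWED_MB}")
    # Assumed endpoint: /manage/v2/databases/{name}/properties on port 8002.
    name = urllib.parse.quote(database, safe="")
    url = f"http://{host}:8002/manage/v2/databases/{name}/properties"
    body = json.dumps({"positions-list-max-size": size_mb}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT", headers={"Content-Type": "application/json"}
    )


def send_with_digest_auth(request: urllib.request.Request, user: str, password: str) -> int:
    """Send the request using HTTP digest authentication and return the status code."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, request.full_url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(manager))
    with opener.open(request) as response:
        return response.status
```

Building the request separately from sending it makes it easy to inspect the URL and payload before anything is changed on a live cluster.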
Steps to resolving the issue
If it becomes necessary to stay within the positions-list-max-size limit, you have two choices: increase the positions list max size, or manage the maximum size of on-disk stands for your database.
If you start to see the value getting closer to 256MB, the fastest resolution is to increase the positions list max size so that positions are not discarded, and then to think about managing the maximum size of on-disk stands for your database.
ErrorLog message escalation: understanding the risks
If these messages only ever remain at Info level, they can be safely ignored. However, the log messages have been designed to escalate in severity so that you can watch for warning signs: the severity level of the logging will escalate twice, first to Notice and then to Warning.
At the very least, keeping tabs on when these messages start to appear at Notice level should give you plenty of advance warning.
Monitoring for Warning level messages should also catch this issue before it becomes critical and starts to affect search results.
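Watching for these messages can be automated with a small log scan. The following is a rough sketch, assuming each ErrorLog line has the layout "date time Level: message" and that the relevant messages contain the phrase "discard positions" (taken from the message quoted in this article). The function names and the matching pattern are hypothetical and may need adjusting for your log format.

```python
import re
from collections import Counter

# Assumed ErrorLog line layout: "<date> <time> <Level>: <message>".
LINE_PATTERN = re.compile(r"^\S+ \S+ (?P<level>[A-Za-z]+): (?P<message>.*)$")

# Assumed marker: a phrase from the message text quoted in this article.
MESSAGE_MARKER = "discard positions"

# Severity order for this message, most severe first.
ESCALATION_ORDER = ("Warning", "Notice", "Info")


def count_by_level(lines):
    """Count positions-related messages per log level across the given lines."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match and MESSAGE_MARKER in match.group("message"):
            counts[match.group("level")] += 1
    return counts


def highest_level(counts):
    """Return the most severe level seen, or None if no messages matched."""
    for level in ESCALATION_ORDER:
        if counts.get(level):
            return level
    return None
```

For example, passing an open ErrorLog file to `count_by_level` and then checking whether `highest_level` returns "Notice" or "Warning" gives a simple trigger for an alert.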