Knowledgebase:
Termlist will discard positions at 256MB
31 October 2016 05:49 PM

Introduction

For terms stored in the index, the positions list tracks where they appear within the document. Positions are used to resolve queries where the distance between terms matters (for example, near queries, where a term can appear within n words of another term or phrase within a given element or set of search criteria). There are a number of index options involving positions of document terms. When these indexes are enabled, MarkLogic records positions in a positions list for each term in the universal index. When positions lists get large, MarkLogic may take a long time to load them from disk. To minimize the impact of large positions lists, MarkLogic imposes a maximum size for these lists per term.

MarkLogic 7 and above

Each stand in a forest maintains its own index and its own positions lists. The smaller the stands, the less likely you are to reach the maximum positions list size for a term, as smaller stands are likely to result in smaller term lists. A maximum stand size was introduced in MarkLogic 7. By default, it restricts the size of individual stands to 32768 MB (32 GB).

If you are running into the warning message "Termlist will discard positions at 256MB", you may need to manage your data and forests to ensure the index sizes remain manageable.

The positions-list-max-size configuration parameter (default 256MB, maximum 512MB) sets the size at which a term list is considered too large and unwieldy. For example, a 512MB term list would take roughly 26 seconds to load from disk at 20MB/sec, so increasing this value from the default may offer a fast fix, but it carries a load-time cost and is probably not the best change that could be made.
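The load-time reasoning above is list size divided by sustained read throughput. A minimal sketch for checking that arithmetic, assuming the 20MB/sec figure quoted in the text (real storage varies widely); the helper name `load_seconds` is hypothetical and not part of MarkLogic:

```python
def load_seconds(list_size_mb: float, throughput_mb_per_sec: float = 20.0) -> float:
    """Estimate the seconds needed to read a positions list of the given size."""
    if throughput_mb_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return list_size_mb / throughput_mb_per_sec


# Compare the default limit (256 MB) with the maximum (512 MB).
for size_mb in (256, 512):
    print(f"{size_mb} MB at 20 MB/sec: about {load_seconds(size_mb):.1f} s")
```

Substituting the throughput you actually measure on your own storage gives a rough sense of what raising the limit could cost per affected term.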

In MarkLogic 7 and MarkLogic 8, the default maximum stand size restricts individual stands to 32768 MB (32 GB). With this setting in place, we expect new customers to be less likely to run into the large positions list problem: for a 32GB stand, a single 256MB term list would take up almost 1% of all the disk space used by that stand, which is unlikely.
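The "almost 1%" figure can be checked directly from the two defaults mentioned in the text. A small sketch (the constant names are illustrative only):

```python
MERGE_MAX_SIZE_MB = 32768        # default maximum stand size from the text (32 GB)
POSITIONS_LIST_MAX_MB = 256      # default positions list max size from the text

# Fraction of a full-size stand that one maximum-size term list would occupy.
share = POSITIONS_LIST_MAX_MB / MERGE_MAX_SIZE_MB
print(f"One {POSITIONS_LIST_MAX_MB} MB term list is {share:.2%} of a full stand")
```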

Scenario: Understanding what messages to look for

In the ErrorLog file, you may notice messages appearing at the "Info" level which look like:

2016-04-13 03:02:17.951 Info: Termlist for X in Y is 151 MB; will discard positions at 256 MB

This message is letting you know that the term list for a term ("X") is getting large and is approaching the limit in a particular stand ("Y"). Term lists are managed per stand across all your forests, so a maximum size (256MB by default) applies within each stand. If a positions list exceeds this maximum size, MarkLogic will discard the positions for that term in that stand.
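To keep an eye on these messages, the sizes can be pulled out of the ErrorLog with a regular expression. This is a rough sketch that assumes every message follows exactly the format of the sample line above; the pattern and the function name `parse_termlist_line` are illustrative, and real log lines may differ between releases:

```python
import re

# Pattern modeled only on the sample line shown above (an assumption).
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>\w+): Termlist for (?P<term>.+?) in (?P<stand>.+?) "
    r"is (?P<size_mb>\d+) ?MB; will discard positions at (?P<limit_mb>\d+) ?MB"
)


def parse_termlist_line(line: str):
    """Return the fields of a termlist message as a dict, or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["size_mb"] = int(info["size_mb"])
    info["limit_mb"] = int(info["limit_mb"])
    # How close this term list is to the discard threshold (1.0 = at the limit).
    info["fraction"] = info["size_mb"] / info["limit_mb"]
    return info


sample = ("2016-04-13 03:02:17.951 Info: Termlist for X in Y is 151 MB; "
          "will discard positions at 256 MB")
print(parse_termlist_line(sample))
```

Lines that do not match return `None`, so the function can be applied to every line of a log file and the non-matches filtered out.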

Settings

The value is set at the database level: 

Configure -> Databases -> [Database] -> positions list max size

This value can be increased by changing this database-level setting, although we do not recommend exceeding 512MB. The main reason is performance: larger positions lists take longer to load.

Newer releases of MarkLogic Server set the maximum size of a given stand for new databases to a default of 32 GB (32768 MB). This setting is governed by the Merge Policy for the database:

Configure -> Databases -> [Database] -> Merge Policy -> merge max size

As each stand maintains its own positions lists, one way to avoid hitting the maximum size is to keep your on-disk stands smaller. However, having more stands has its own performance implication, because any query needs to traverse all stands to compute the result set of fragments that answers the query.
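Both settings can also be changed without the Admin UI. The sketch below only builds (and does not send) a request for the Management REST API, assuming the database properties endpoint `PUT /manage/v2/databases/{name}/properties` on the default Manage port 8002 and the property names `positions-list-max-size` and `merge-max-size`; verify these against the documentation for your release. The host, the database name "Documents", and the function name are placeholders:

```python
import json
import urllib.request

MANAGE_PORT = 8002  # assumption: default port of the Manage app server


def build_properties_request(host, database,
                             positions_list_max_size=None, merge_max_size=None):
    """Build, but do not send, a PUT request that updates database properties."""
    props = {}
    if positions_list_max_size is not None:
        # The article recommends not exceeding 512 MB for this setting.
        if not 0 < positions_list_max_size <= 512:
            raise ValueError("positions list max size should be between 1 and 512 MB")
        props["positions-list-max-size"] = positions_list_max_size
    if merge_max_size is not None:
        props["merge-max-size"] = merge_max_size
    if not props:
        raise ValueError("no properties to change")
    url = f"http://{host}:{MANAGE_PORT}/manage/v2/databases/{database}/properties"
    return urllib.request.Request(
        url,
        data=json.dumps(props).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_properties_request("localhost", "Documents", positions_list_max_size=512)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Actually sending the request would additionally need credentials for a user with sufficient privileges (for example via `urllib.request.HTTPDigestAuthHandler`, if your server uses digest authentication), which is deliberately left out here.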

In our example message above, the positions list is 151MB, which is about 59% of the default per-stand limit of 256MB, so there is no immediate concern.

Steps to resolving the issue

If it becomes necessary to act in order to stay on the right side of the positions-list-max-size limit, you have two choices:

1. You can increase the positions list max size to allow these lists to grow larger (remember that the recommended upper limit is 512MB). This is a single configuration change made at the database level.

2. You can reduce the merge max size to ensure each of your on-disk stands is smaller.

Either approach will have a performance impact, so we suggest that neither setting be changed unless you really need to.

Given the above example, if the largest value you are seeing is 151MB, there is still a decent amount of headroom for all the stands in this database.

If you start to see the value getting closer to 256MB, the fastest resolution would be to increase the positions list max size to ensure that positions are not discarded and then to think about managing the maximum size of on-disk stands for your database.

ErrorLog message escalation: understanding the risks

If these messages only ever remain at Info level, they can be safely ignored. However, the log messages have been designed to escalate in severity so that you know when to watch for warning signs. The severity level of the logging will escalate twice:

  1. When you reach 2/3 of the discard threshold, these messages will appear at Notice level in the ErrorLogs.
  2. When you reach 3/4 of the discard threshold, these messages will appear at Warning level in the ErrorLogs.
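The escalation rule above can be expressed as a small function for use in monitoring scripts. This is a sketch only: it assumes each threshold is inclusive and uses exact fractions to avoid rounding surprises; the function name is hypothetical, and the text does not say which level applies once the limit itself is reached:

```python
from fractions import Fraction


def expected_log_level(size_mb: int, limit_mb: int = 256) -> str:
    """Map a term list size to the log level described in the list above."""
    if limit_mb <= 0:
        raise ValueError("limit must be positive")
    ratio = Fraction(size_mb, limit_mb)
    if ratio >= Fraction(3, 4):
        return "Warning"
    if ratio >= Fraction(2, 3):
        return "Notice"
    return "Info"


# With the default 256 MB limit, Notice starts near 171 MB and Warning at 192 MB.
for size_mb in (151, 171, 192):
    print(size_mb, expected_log_level(size_mb))
```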

At the very least, keeping tabs on when these messages start to appear at Notice level should give you plenty of advance warning.

Monitoring for Warning level messages should also catch this problem before it becomes critical and starts to affect search results.
