Community

MarkLogic 10 and Data Hub 5.0

Latest MarkLogic releases provide a smarter, simpler, and more secure way to integrate data.

Read Blog →

Company

Stay On Top Of Everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up →

 
Knowledgebase:
"Repairing out of order string range index messages" in your ErrorLog
21 May 2020 06:07 AM

Introduction

After upgrading to MarkLogic 10.x from any of the previous versions of MarkLogic, examples of the following Warning and Notice level messages may be observed in the ErrorLogs:

Warning: Lexicon '/var/opt/MarkLogic/Forests/Documents/00000006/c4ea1b602ee84a34+Lexicon' collation='http://marklogic.com/collation/zh-Hant' out of order


Notice: Repairing out of order lexicon /var/opt/MarkLogic/Forests/Documents/00000006/c4ea1b602ee84a34+Lexicon collation 'http://marklogic.com/collation/zh-Hant' version 0 to 602

Warning: String range index /space/Forests/Documents/0006ef0e/c0dc932d1b4bcaae-37c6e3905909f64e+string collation 'http://marklogic.com/collation/' out of order. 

Notice: Repairing out of order string range index /space/Forests/Documents/0006ef0e/c0dc932d1b4bcaae-37c6e3905909f64e+string collation 'http://marklogic.com/collation/' version 0 to 602

Starting with MarkLogic 10.0, the server now automatically checks for any lexicons or string range indexes that may be in need of repair.  Lexicon and Range indexes perform "self-heal" in non-readonly stands whenever a lexicon/range index is opened within the stand.

Reason

This is due to changes introduced to the behavior of MarkLogic's root collation.

Starting with MarkLogic 10.0, the root collation has been modified, as do all collations that derive from it; which means there may be some subtle differences in search ordering.

For more information on the specifics of these changes, please refer to http://www.unicode.org/Public/UCA/6.0.0/CollationAuxiliary.html

This helps the server to support newer collation features, such as reordering entire blocks of script characters (for example: Latin, Greek, and others) with respect to each other. 

Implementing these changes has, under some circumstances, improved the performance of wildcard matching by more effectively limiting the character ranges that search scans (and returns) for wildcard-based matching.

Based on our testing, we believe this new ordering yields better performance in a number of circumstances, although it does create the need to perform full reindexing of any lexicon or string range index using the root collation.

MarkLogic Server will now check Lexicons and String Range Indexes and will try to repair them where necessary.  During the evauation, MarkLogic Server will skip making further changes if any of the following conditions apply:

(a) They are already ordered according to the latest specification provided by ICU (1.8 at the time of writing)

(b) MarkLogic Server has already checked the stand and associated lexicons and indexes

(c) The indexes use codepoint collation (in which case, MarkLogic Server will be unable to change the ordering).

Whenever MarkLogic performs any repairs, it will always log a message at Notice level to inform users of the changes made.  If for any reason, MarkLogic Server is unable to make changes (e.g. a forest is mounted as read-only), MarkLogic will skip the repair process and nothing will be logged.

As these changes have been introduced from MarkLogic 10 onwards, you will most likely observe these messages in cases where recent upgrades (from prior releases of the product) have just taken place.

Repairs are performed on a stand by stand basis, so if a stand does not contain any values that require ordering changes, you will not see any messages logged for that stand.

Also, if we any ordering issues are encountered during the process of a merge of multiple stands, there will only be one message logged for the merge, not one for each individual stand inolved in that merge.

Summary

  • Repairs will take place for any stand that has been found to have a lexicon or string index that has an out-of-order and out-of-date (e.g. utilising a collation described by an earlier version of ICU) collation, unless that stand is mounted as read only.
  • Any repair will generate Notice messages when maintenance takes place.
  • Whenever a lexicon or string Range index is opened, this check/repair will take place for any string range index; lexicon call (e.g. cts:values); range query (e.g. cts:element-range-query) and during merges merges.
  • The check looking for ICU version mismatches plus items that are out-of-order, so any lexicon / string range index with older ordering (and which requires no further changes), no further action will be taken for that stand.

Known side effects

If size of the String range index or Lexicon is very large, repairing can cause some performance overhead and may impact search performance during the repair process.

Solution

These messages can be avoided by issuing a full reindex of your databases immediately after performing your upgrade to MarkLogic 10.

(0 vote(s))
Helpful
Not helpful

Comments (0)