Compatibility of stemmed searches and generic language support | MarkLogic Support

Compatibility of stemmed searches and generic language support 11 September 2014 02:02 PM
Summary

In MarkLogic Server v7.0-2, the tokenizer keys for languages where MarkLogic provides only generic language support were removed, so these languages now all share the same key. Greek, for example, falls into this class of languages. This change was made as part of an optimization for languages in which MarkLogic Server has advanced stemming and tokenization support.

As a result, stemmed searches that include characters from languages without advanced language support may not return the expected results when they are run on MarkLogic Server v7.0-2 or later against content that was loaded on a version earlier than v7.0-2.

Resolution

To run these stemmed searches successfully, you can either:

- Reindex the database; or
- Reinsert the affected documents (i.e. the documents that contain characters in languages for which MarkLogic Server only has generic language support).

If neither option is possible in your environment, you can run the query unstemmed.

An Example

The following example demonstrates the issue:

1. On MarkLogic Server version 7.0-1, insert a document (test.xml) that contains the Greek character 'ε'.
2. Run this query: `xdmp:estimate( cts:search( doc('test.xml'), 'ε')),` `cts:contains( doc('test.xml'), 'ε')`
   The query will return the correct results: `1, true`
3. Upgrade MarkLogic Server to version 7.0-3 or later and run the query again.
   The query will return incorrect results: `0, false`
4. Reindex the database and re-run the query.
   The query will return the correct results once again.
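
For the unstemmed fallback mentioned under Resolution, the example query could be rewritten roughly as follows. This is a minimal sketch rather than a verified fix: it assumes the same test.xml document as in the example, and it assumes the database has the "word searches" index setting enabled, which unstemmed word queries generally rely on. The 'unstemmed' option of cts:word-query is the only thing changed from the original query.

```xquery
xquery version "1.0-ml";

(: Same two checks as in the example, but with an explicit
   unstemmed word query instead of the default (stemmed) string query.
   Assumption: test.xml exists and word searches are enabled. :)
xdmp:estimate(
  cts:search(doc('test.xml'), cts:word-query('ε', 'unstemmed'))),
cts:contains(doc('test.xml'), cts:word-query('ε', 'unstemmed'))
```

Because an unstemmed query matches only the exact word form, it can return fewer matches than a stemmed search would for languages where stemming matters, so it is best treated as a workaround until a reindex or reinsert is possible.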