Knowledgebase: MarkLogic Server
Case Sensitive Search with Stemming
04 June 2012 10:38 AM


Stemming in MarkLogic Server is a case-sensitive operation.

Stemmed, Case Insensitive

When you run a stemmed, case-insensitive search, MarkLogic maps all the words to lowercase and then calculates the stems.

In English, this works fairly well because words are generally lowercase. For other languages, such as German, where nouns are capitalized, this doesn't always work as well.

Stemmed, Case Sensitive

When a search is case-sensitive, the stems are different depending on the case of the word.

In English, case-sensitive searches with stemming specified are effectively not stemmed searches, because English words containing uppercase letters stem to themselves. You would not expect proper names or acronyms to be stemmed to something else. For example, "Mr. Mark Cutting" should not match "marks cuts".
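
One way to observe this behavior is with the built-in cts:stem function, which returns the stem(s) of a word for a given language. The following is a minimal sketch, assuming English stemming is available on the server; the sample words are chosen for illustration and are not part of the original article.

```xquery
xquery version "1.0-ml";

(: Sketch: compare the stems of a lowercase word with those of words
   that contain uppercase letters. Per the article, English words with
   uppercase letters are expected to stem to themselves.
   The sample words are assumptions chosen for illustration. :)
for $word in ("cuts", "Cutting", "TESTS")
return
  <stem word="{$word}">{ cts:stem($word, "en") }</stem>
```

Running this in Query Console shows, for each word, the stem or stems that MarkLogic computes, which makes it easy to check how a particular mixed-case term will behave before relying on it in a search.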

For German and other languages where stems exist for mixed-case words, a case-sensitive search with stemming is recommended.


These example queries demonstrate stemmed searches:

xquery version "1.0-ml";
xdmp:document-insert("1.xml", <a>This is test.</a>),
xdmp:document-insert("2.xml", <a>This is TESTING.</a>), 
xdmp:document-insert("3.xml", <a>This is TESTS.</a>), 
xdmp:document-insert("4.xml", <a>This is TEST.</a>);

Case insensitive with stemming

    <options xmlns="">

Matches: test, TESTS, TESTING, and TEST.

Case sensitive with stemming

    <options xmlns="">

Matches: TESTS
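
Because the option blocks shown above are incomplete, the two searches can also be sketched with cts:word-query, whose options include "stemmed", "case-sensitive", and "case-insensitive". This is a hedged sketch rather than the article's original queries: the search terms ("test" and "TESTS") are assumptions, and it presumes that the four sample documents have been inserted and that stemmed searches are enabled on the database.

```xquery
xquery version "1.0-ml";

(: Case insensitive with stemming.
   The search term "test" is an assumption, not taken from the article. :)
let $insensitive :=
  cts:search(fn:doc(),
    cts:word-query("test", ("case-insensitive", "stemmed")))

(: Case sensitive with stemming.
   The search term "TESTS" is an assumption, not taken from the article. :)
let $sensitive :=
  cts:search(fn:doc(),
    cts:word-query("TESTS", ("case-sensitive", "stemmed")))

(: Return the URIs of the matching documents for each search. :)
return (
  <case-insensitive>{
    for $doc in $insensitive return xdmp:node-uri($doc)
  }</case-insensitive>,
  <case-sensitive>{
    for $doc in $sensitive return xdmp:node-uri($doc)
  }</case-sensitive>
)
```

If these assumptions hold, the URIs returned for each search should correspond to the matches listed above.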
