Relevance Scores and Stemmed Searches

12 March 2018 06:36 PM

Stemming:

MarkLogic Server supports stemming in English and other languages. If stemmed searches are enabled in the database configuration, MarkLogic Server automatically searches for words that share a stem with the word specified in the query, not just the exact string specified in the query. A stemmed search for a word finds the exact term as well as terms that derive from the same meaning and part of speech as the search term.

For example, in a stemmed search, a query for 'running' matches 'running', 'run' and 'ran', because they all stem to 'run'. The query term is itself stemmed before the query is resolved, so queries for 'running' and 'ran' are both performed as queries for 'run', and they return similar results.
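To see which stems the server assigns to a word, the built-in cts:stem function can be called directly. The following is a minimal sketch, assuming English ("en") as the language; the word list is taken from the example above, and the exact stems returned depend on the stemming dictionaries installed.

```xquery
xquery version "1.0-ml";

(: For each sample word, list the stem(s) the server reports.
   "en" is an assumed language code; change it to match the content language. :)
for $word in ("running", "run", "ran")
return element stems {
  attribute word { $word },
  for $stem in cts:stem($word, "en")
  return element stem { $stem }
}
```

If the words share a stem, a stemmed query for any one of them should be resolved as the same underlying query.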

Relevance score for stemmed searches:

Search results in MarkLogic Server are returned in relevance order; that is, the result that is most relevant to the cts:query expression in the search is the first item in the returned sequence, and the least relevant is the last. (The documentation at http://docs.marklogic.com/guide/search-dev/relevance#chapter gives detailed information on how the relevance score is computed.)

However, in a stemmed search, the original query term and its stemmed matches are ranked equally. That is, an exact match of the word does not receive a higher relevance score.

For example, consider the following 3 documents:

run.xml

<root>
</root>

running.xml

<root>
  <text>running out of time</text>
</root>

ran.xml

<root>
</root>

The following search query for "running" returns all 3 documents with equal scores.

let $query := cts:word-query("running")
for $hit in cts:search(doc(), $query, "relevance-trace")
return element hit {
  attribute score { cts:score($hit) },
  xdmp:node-uri($hit)
}

==>

<hit score="2048">running.xml</hit>
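The query above passes the "relevance-trace" option but does not read the trace. A minimal sketch of how the trace might be inspected is shown here, using cts:relevance-info on each hit; the uri attribute name is chosen for illustration only, and the shape of the trace output varies by server version.

```xquery
xquery version "1.0-ml";

(: Run the stemmed query with relevance tracing enabled and emit,
   for each hit, its score plus the score computation details. :)
let $query := cts:word-query("running")
for $hit in cts:search(fn:doc(), $query, "relevance-trace")
return element hit {
  attribute uri { xdmp:node-uri($hit) },
  attribute score { cts:score($hit) },
  (: Only populated when the search ran with "relevance-trace". :)
  cts:relevance-info($hit)
}
```

Comparing the trace for an exact match with the trace for a stemmed match is one way to confirm that both are scored from the same stemmed term.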

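This behavior is desirable in most search applications. However, to give a higher score to the original query term, so that exact matches come first in the search results, combine stemmed and unstemmed word queries in an or-query. A document that contains the exact term matches both sub-queries and therefore scores higher than a document that matches only the stemmed query.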

let $query :=
  cts:or-query((
    cts:word-query("running", "stemmed"),
    cts:word-query("running", "unstemmed")
  ))
for $hit in cts:search(doc(), $query)
return element hit {
  attribute score { cts:score($hit) },
  xdmp:node-uri($hit)
}

==>

<hit score="11264">running.xml</hit>
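If the exact match needs to stand out further, the unstemmed sub-query can also be given a larger weight through the optional third argument of cts:word-query. This is a sketch of a variation and is not part of the original example; the weight of 4.0 is an arbitrary illustrative value, and the resulting scores depend on the data.

```xquery
xquery version "1.0-ml";

(: Same or-query as above, with an assumed extra weight on the
   unstemmed (exact) match. The default weight is 1.0. :)
let $query :=
  cts:or-query((
    cts:word-query("running", "stemmed"),
    cts:word-query("running", "unstemmed", 4.0)
  ))
for $hit in cts:search(fn:doc(), $query)
return element hit {
  attribute score { cts:score($hit) },
  xdmp:node-uri($hit)
}
```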

Note that the above cts:or-query requires the 'word searches' option to be enabled for the database; otherwise the query returns an XDMP-WORDSEARCH error.
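The 'word searches' option can be set in the Admin Interface or scripted through the Admin API. The following is a minimal sketch of the scripted route, assuming a database named "Documents" (a placeholder; substitute the actual database name) and a user with administrative privileges. Changing an index setting may trigger a reindex of existing content if reindexing is enabled.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is an assumed database name used for illustration. :)
let $config := admin:get-configuration()
let $db-id  := xdmp:database("Documents")
(: Enable unstemmed word searches on the database. :)
let $config := admin:database-set-word-searches($config, $db-id, fn:true())
return admin:save-configuration($config)
```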
