Effects of case sensitivity of search term on search score | MarkLogic Support


Effects of case sensitivity of search term on search score 22 September 2015 01:12 PM
Introduction

This article describes how the case sensitivity of a search term affects the search score, and therefore the final order of search results, when a secondary query uses cts:boost-query with a weight.

A case-insensitive word term is treated as the lowercase word term. As a result, there is no difference in term frequencies or scores between a case-insensitive search term in any case and a lowercase search term, whether the lowercase term is used with the "case-sensitive" option or with neither "case-sensitive" nor "case-insensitive" present. If neither "case-sensitive" nor "case-insensitive" is present, the text of the search term determines case sensitivity.

Understanding relevance score

In MarkLogic, search results are returned in relevance order: the most relevant results come first in the result sequence and the least relevant come last. More details on the relevance score and its calculation are available at https://docs.marklogic.com/guide/search-dev/relevance

One of the many ways to control the relevance score is to use a secondary query to boost it: https://docs.marklogic.com/guide/search-dev/relevance#id_30927

This article uses examples of such secondary queries to show how the case of the search term text (uppercase, lowercase, or unspecified) affects the relevance score and the order of the returned results.

A few examples to understand this scenario

In the scenarios below, each query uses cts:boost-query with a weight on the word "washington" to move certain results up in the returned order.
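To experiment with scenarios like these, a small data set is enough. The following is a minimal sketch, assuming a scratch database; the URIs and the document text are hypothetical and are not the data behind the results shown in this article. Scores depend on the term statistics of the whole database, so the numbers you see will differ from the ones quoted here.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample documents for experimentation only.
   The URIs and the text are assumptions, not the article's data set. :)
(
  (: Contains "Washington" with an uppercase W. :)
  xdmp:document-insert(
    "/example/case-score/upper.xml",
    <test>Washington, George was the first President of the United States of America.</test>
  ),
  (: Contains "washington" in lowercase only. :)
  xdmp:document-insert(
    "/example/case-score/lower.xml",
    <test>George washington was the first President of the United States of America.</test>
  )
)
```

Both documents use a test root element, so they are picked up by a search scoped with fn:doc()/test.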
Example 1: Search with lowercase search term and option for case not specified

Query1:

    xquery version "1.0-ml";
    declare namespace html = "http://www.w3.org/1999/xhtml";

    for $hit in (
      cts:search(
        fn:doc()/test,
        cts:boost-query(
          cts:element-word-query(xs:QName("test"), "George"),
          cts:element-word-query(xs:QName("test"), "washington", (), 10.0)
        )
      )
    )
    return
      element hit {
        attribute score { cts:score($hit) },
        attribute fit { cts:fitness($hit) },
        attribute conf { cts:confidence($hit) },
        $hit
      }

Results for Query1:

    <hit score="28276" fit="0.9393904" conf="0.2769644">
      <test>Washington, George... </test>
    </hit>
    ...
    ...
    <hit score="16268" fit="0.7125317" conf="0.2100787">
      <test>George washington was the first President of the United States of America...</test>
    </hit>
    ...

Example 2: Search with lowercase search term and case-sensitive option

Query2:

    xquery version "1.0-ml";
    declare namespace html = "http://www.w3.org/1999/xhtml";

    for $hit in (
      cts:search(
        fn:doc()/test,
        cts:boost-query(
          cts:element-word-query(xs:QName("test"), "George"),
          cts:element-word-query(xs:QName("test"), "washington", ("case-sensitive"), 10.0)
        )
      )
    )
    return
      element hit {
        attribute score { cts:score($hit) },
        attribute fit { cts:fitness($hit) },
        attribute conf { cts:confidence($hit) },
        $hit
      }

Results for Query2:

    <hit score="28276" fit="0.9393904" conf="0.2769644">
      <test>Washington, George... </test>
    </hit>
    ...
    ...
    <hit score="16268" fit="0.7125317" conf="0.2100787">
      <test>George washington was the first President of the United States of America...</test>
    </hit>
    ...

Example 3: Search with uppercase search term and case-insensitive option. Only the cts:boost-query is shown; the rest of the query is the same as in the queries above.

Query3:

    cts:boost-query(
      cts:element-word-query(xs:QName("test"), "George"),
      cts:element-word-query(xs:QName("test"), "Washington", ("case-insensitive"), 10.0)
    )

Results for Query3:

    <hit score="28276" fit="0.9393904" conf="0.2769644">
      <test>Washington, George... </test>
    </hit>
    ...
    ...
    <hit score="16268" fit="0.7125317" conf="0.2100787">
      <test>George washington was the first President of the United States of America...</test>
    </hit>
    ...

The queries above all produce the same scores, with the same fitness and confidence values. This is because a case-insensitive word term is treated as the lowercase word term, so the frequencies of the two terms (any-case/case-insensitive and lowercase/case-sensitive) cannot differ, and neither can the scoring. That is why the results of Query3 and Query2 have identical scores. When case sensitivity is not specified, the text of the search term determines it: in Query1 the search term contains no uppercase, so it is treated as "case-insensitive".

Now let us take a look at a query that combines a search term containing uppercase with the case-sensitive option.

Example 4: Search with uppercase search term and case-sensitive option. Only the cts:boost-query is shown; the rest of the query is the same as in the queries above.

Query4:

    cts:boost-query(
      cts:element-word-query(xs:QName("test"), "George"),
      cts:element-word-query(xs:QName("test"), "Washington", ("case-sensitive"), 10.0)
    )

Results for Query4:

    <hit score="44893" fit="0.9172696" conf="0.3489831">
      <test>Washington, George was the first... </test>
    </hit>
    ...
    ...
    <hit score="256" fit="0.0692672" conf="0.0263533">
      <test>George washington was the first President of the United States of America...</test>
    </hit>
    ...

As the results show, the scores for Query4 are different, and so the final order of results changes as well.

Conclusion:

When using a secondary query with cts:boost-query and a weight to boost certain search results, it is important to understand how the case of the search text affects the result sequence.
A case-insensitive word term is treated as the lowercase word term, so the frequencies of any-case/case-insensitive and lowercase/case-sensitive search terms cannot differ, and neither can their scores. For a search term that contains uppercase characters and is used with the "case-sensitive" option, scores are boosted as expected in comparison with a "case-insensitive" search.

If neither "case-sensitive" nor "case-insensitive" is present, the text of the search term determines case sensitivity. If the text contains no uppercase, it specifies "case-insensitive". If the text contains uppercase, it specifies "case-sensitive".
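The defaulting rule can be checked in isolation, without scoring, by testing a single in-memory node with cts:contains. This is a minimal sketch: the node text is an assumption chosen for illustration, and cts:word-query is used here in place of the cts:element-word-query of the examples only to keep the sketch short.

```xquery
xquery version "1.0-ml";

(: Illustrative in-memory node; the text is an assumption, not data from the article.
   It contains "washington" in lowercase only. :)
let $node := <test>George washington was the first President</test>
return (
  (: No case option, no uppercase in the text: resolved as "case-insensitive". :)
  cts:contains($node, cts:word-query("washington")),

  (: No case option, uppercase in the text: resolved as "case-sensitive". :)
  cts:contains($node, cts:word-query("Washington")),

  (: An explicit option takes precedence over what the text would imply. :)
  cts:contains($node, cts:word-query("Washington", "case-insensitive")),

  (: Lowercase text with an explicit "case-sensitive" option. :)
  cts:contains($node, cts:word-query("washington", "case-sensitive"))
)
```

If the rule applies as described, the second check would be expected to be the only one that does not match, because it is the only one that requires an uppercase "W" in the node. Setting the option explicitly on every boosting query avoids depending on the case of the search text.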