Stemming and element-value-query
28 November 2017 11:59 AM
Stemming is handled differently by a word-query and a value-query: the keys for a value-query are indexed using basic stemming only.
A word may have more than one stem. For example,
To see how this works with a word-query we can use
on a database with basic stemming returns
Since basic stemming uses only the first/shortest stem, this is searching just for the stem 'place'.
will match 'a place of my own' ('placing' and 'place' both stem to 'place') but not 'new placings' ('placings' stems to just 'placing').
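The basic-stemming behaviour described above can be sketched in a few lines of Python. This is only an illustration, not MarkLogic code: the `FIRST_STEM` table, `basic_stem`, and `word_matches` are hypothetical names, the table holds just the stems stated in the text, and unlisted words are assumed to stem to themselves.

```python
# Hypothetical stand-in for a stemming dictionary; it holds only the
# first (basic) stem of each word, as stated in the text above.
FIRST_STEM = {
    "placing": "place",
    "place": "place",
    "placings": "placing",
}


def basic_stem(word):
    """Return the single stem used by basic stemming (assumption: unknown words stem to themselves)."""
    return FIRST_STEM.get(word, word)


def word_matches(query_word, text):
    """True if any word in text shares its basic stem with query_word."""
    target = basic_stem(query_word)
    return any(basic_stem(w) == target for w in text.split())


print(word_matches("placing", "a place of my own"))  # True
print(word_matches("placing", "new placings"))       # False
```

The second call returns False because 'placings' stems to 'placing', which is not the stem 'place' being searched for.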
However, on a database with advanced stemming the plan is
Here you can see that there are two term queries OR-ed together (note the two different key values). The result is that the same
However, a search with
cts:element-value-query(xs:QName('title'), 'new placing')
whether the database has basic or advanced stemming, showing that multiple stems are not used.
The reason for this is that MarkLogic only does basic stemming when indexing the keys for a value, so there is a single key for each value. If MarkLogic Server were designed to support multiple stems for values (which it is not), this would expand the indexes dramatically and slow down indexing, merging, and querying. Suppose each word had two stems: a value of N words would then need 2^N keys, and the size would grow even faster with additional stems.
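The 2^N growth can be checked with a small Python sketch. It is a rough model under stated assumptions, not a description of MarkLogic's actual key format: `value_keys` and `two_stem_value` are hypothetical helpers, a key is modelled as one space-joined combination of stems, and every word is assumed to have exactly two stems.

```python
from itertools import product


def value_keys(stems_per_word):
    """Return one candidate key per combination of stems (assumed model of a value key)."""
    return [" ".join(combo) for combo in product(*stems_per_word)]


def two_stem_value(n):
    """Build a hypothetical value of n words, each with exactly two placeholder stems."""
    return [[f"w{i}a", f"w{i}b"] for i in range(n)]


for n in (1, 2, 3, 10):
    # The number of keys doubles with every extra word: 2, 4, 8, ..., 1024.
    print(n, len(value_keys(two_stem_value(n))))
```

With basic stemming each word contributes exactly one stem, so the same function returns a single key regardless of the number of words.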
More information on value-queries is available at Understanding Search: value queries.