Knowledgebase: MarkLogic Server
Understanding Search: value queries
28 June 2022 01:38 PM

Value queries

Summary

Here we summarize some characteristics of value queries and compare them to other approaches.

Discussion

Characteristics

Punctuation and space tokens are not indexed as words in the universal index. Therefore, word queries involving whitespace or punctuation will not use that whitespace or punctuation in index resolution, regardless of space or punctuation sensitivity.

Punctuation and space tokens are not generally indexed as words in the universal index for value queries either. As a special exception, however, there are terms in the universal index for "exact" value queries ("exact" is shorthand for "case-sensitive", "diacritic-sensitive", "punctuation-sensitive", "whitespace-sensitive", "unstemmed", and "unwildcarded"). "exact" value queries should be resolvable properly from the index, but only if you have fast-case-sensitive-searches and fast-diacritic-sensitive-searches enabled in the database.
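
As a minimal sketch of turning on the two settings named above, the following uses the Admin API. The database name "Documents" is an assumption for illustration; substitute your own. Changing index settings can trigger reindexing, so treat this as an outline rather than a recommended procedure.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is an assumed database name, used only for illustration :)
let $db := xdmp:database("Documents")
let $config := admin:get-configuration()
(: both settings are needed for "exact" value queries to resolve from the index :)
let $config := admin:database-set-fast-case-sensitive-searches($config, $db, fn:true())
let $config := admin:database-set-fast-diacritic-sensitive-searches($config, $db, fn:true())
return admin:save-configuration($config)
```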

For field-word or field-value queries you can modify what counts as punctuation or whitespace via tokenizer overrides. This can turn what would have been a phrase into a single word.
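
A tokenizer override of this kind might be configured as sketched below. The database name "Documents" and the field name "exact-x" are hypothetical, and the sketch assumes the field has already been created; it only adds an override that classifies the hyphen as a word character within that field.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" and "exact-x" are assumed names; the field must already exist :)
let $db := xdmp:database("Documents")
let $config := admin:get-configuration()
(: treat "-" as part of a word, so "value-1" becomes a single token in this field :)
let $override := admin:database-tokenizer-override("-", "word")
let $config :=
  admin:database-add-field-tokenizer-override($config, $db, "exact-x", $override)
return admin:save-configuration($config)
```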

Outside of the special case given for exact value queries, all queries involving space or punctuation are phrase queries. Word and value search is not string matching.

Space-insensitive and punctuation-insensitive do not mean tokenization-insensitive. "foo-bar" will not match "foobar" as a value query or a word query, regardless of your punctuation sensitivity. Word and value search is not string matching.
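
One way to see why these strings differ is to look at how each one tokenizes. The sketch below uses cts:tokenize with the default language and labels each token by its type; it is an illustration of the point above, not a description of how the indexes are built.

```xquery
xquery version "1.0-ml";

(: label each token of each sample string by its token type :)
for $s in ("foo-bar", "foobar", "foo bar")
return
  fn:concat($s, " => ",
    fn:string-join(
      for $t in cts:tokenize($s)
      return
        if ($t instance of cts:punctuation) then fn:concat("punctuation[", $t, "]")
        else if ($t instance of cts:space) then fn:concat("space[", $t, "]")
        else if ($t instance of cts:word) then fn:concat("word[", $t, "]")
        else fn:concat("other[", $t, "]"),
      " "))
```

A string that tokenizes into two words separated by punctuation is a phrase of two words, whereas "foobar" is a single word, which is why dropping punctuation sensitivity does not make them match.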

Stemming is handled differently for a word query and a value query; a value query is indexed using basic stemming only.

String range queries are about string matching. Whether there is a match depends on the collation, but there is no tokenization and no stemming happening.

Exact matches

If you want to do exact queries, you can:

  • Enable fast-case-sensitive-searches and fast-diacritic-sensitive-searches on your database and run them as value queries.

or

  • Create a field with custom overrides for the significant punctuation or whitespace and run them as field word or field value queries.

or

  • Create a string range index with the appropriate collation (codepoint, most likely) and run them as string range equality queries.
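
The three options above might look like the following queries. This is a sketch under assumptions: the element name "x" and the value "value-1" are taken from the example used in this article, the field name "exact-x" is hypothetical, and the second and third queries will raise errors unless the corresponding field and string range index (with the codepoint collation) have been configured.

```xquery
xquery version "1.0-ml";

(
  (: option 1: "exact" value query; needs the two fast-*-sensitive-searches settings :)
  xdmp:estimate(
    cts:search(fn:doc(),
      cts:element-value-query(xs:QName("x"), "value-1", "exact"))),

  (: option 2: field value query against an assumed field "exact-x"
     whose tokenizer overrides make the hyphen significant :)
  xdmp:estimate(
    cts:search(fn:doc(),
      cts:field-value-query("exact-x", "value-1"))),

  (: option 3: string range equality query; assumes a string range index on x
     with the codepoint collation :)
  xdmp:estimate(
    cts:search(fn:doc(),
      cts:element-range-query(xs:QName("x"), "=", "value-1",
        "collation=http://marklogic.com/collation/codepoint")))
)
```

xdmp:estimate reports what index resolution alone returns, so it is a convenient way to compare the three approaches without filtering.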

Looking deeper

As with all queries, xdmp:plan can be helpful: it will show you the questions asked of the indexes. If information from a query is not reflected in the plan, that is a case where there might be false positives from index resolution (i.e., in an unfiltered search).

For example, the plan for cts:search(/, cts:element-value-query(xs:QName("x"), "value-1", "exact")) should include the hyphen if you do have fast-case-sensitive-searches and fast-diacritic-sensitive-searches enabled in the database.
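
To check this on your own database, you might compare the plan for the exact query with the plan for a less strict variant. The second query below is an added illustration, not part of the original example.

```xquery
xquery version "1.0-ml";

(
  (: plan for the exact value query from the example above :)
  xdmp:plan(
    cts:search(/, cts:element-value-query(xs:QName("x"), "value-1", "exact"))),

  (: plan for a punctuation-insensitive variant, for comparison :)
  xdmp:plan(
    cts:search(/, cts:element-value-query(xs:QName("x"), "value-1",
      "punctuation-insensitive")))
)
```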

JSON

For purposes of indexing, a JSON property (name-value pair) is roughly equivalent to an XML element. See the following for more details:

    Creating Indexes and Lexicons Over JSON Documents

    How Field Queries Differ Between JSON and XML
