Understanding Search: value queries | MarkLogic Support

Understanding Search: value queries 28 June 2022 01:38 PM
Value queries

Summary

Here we summarize some characteristics of value queries and compare them to other approaches.

Discussion

Characteristics

- Punctuation and space tokens are not indexed as words in the universal index. Therefore, word queries involving whitespace or punctuation will not make use of that whitespace or punctuation in index resolution, regardless of space or punctuation sensitivity.
- Punctuation and space tokens are generally not indexed as words in the universal index for value queries either. However, as a special exception, there are terms in the universal index for "exact" value queries ("exact" is shorthand for "case-sensitive", "diacritic-sensitive", "punctuation-sensitive", "whitespace-sensitive", "unstemmed", and "unwildcarded"). "exact" value queries should be resolvable properly from the index, but only if you have fast-case-sensitive-searches and fast-diacritic-sensitive-searches enabled in the database.
- For field-word or field-value queries, you can modify what counts as punctuation or whitespace via tokenizer overrides. This can turn what would have been a phrase into a single word.
- Outside of the special case given for exact value queries, all queries involving space or punctuation are phrase queries.
- Word and value search is not string matching. Space insensitive and punctuation insensitive do not mean tokenization insensitive: "foo-bar" will not match "foobar" as a value query or a word query, regardless of your punctuation sensitivity.
- Stemming is handled differently between a word-query and a value-query; a value-query is indexed using basic stemming only.
- String range queries are about string matching. Whether there is a match depends on the collation, but no tokenization and no stemming take place.

Exact matches

If you want to do exact queries, you can do one of the following:

- Enable fast-case-sensitive-searches and fast-diacritic-sensitive-searches on your database and run them as value queries.
- Create a field with custom overrides for the significant punctuation or whitespace and run them as field word or field value queries.
- Create a string range index with the appropriate collation (codepoint, most likely) and run them as string range equality queries.

Looking deeper

As with all queries, `xdmp:plan` can be helpful: it will show you the questions asked of the indexes. If information from a query is not reflected in the plan, index resolution on its own (i.e., an unfiltered search) might return false positives for that query.

For example, the plan for `cts:search(/, cts:element-value-query(xs:QName("x"), "value-1", "exact"))` should include the hyphen if you do have fast-case-sensitive-searches and fast-diacritic-sensitive-searches enabled in the database.

JSON

For purposes of indexing, a JSON property (name-value pair) is roughly equivalent to an XML element. See the following for more details:

- Creating Indexes and Lexicons Over JSON Documents
- How Field Queries Differ Between JSON and XML

References

- Stemming and element-value-query
- cts:field-value-query
- cts:element-value-query
- Using xdmp:plan to View the Evaluation Plan
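
Example

The three exact-match options described under "Exact matches" can be sketched roughly as follows. This is an illustration under assumptions, not a definitive recipe: the field name "x-exact" is hypothetical, and the sketch assumes that the field, its tokenizer overrides, and a string range index on `x` with the codepoint collation have already been configured on the database.

```xquery
xquery version "1.0-ml";

(: Option 1: "exact" value query. Assumes fast-case-sensitive-searches and
   fast-diacritic-sensitive-searches are enabled on the database. :)
let $exact-value :=
  cts:element-value-query(xs:QName("x"), "value-1", "exact")

(: Option 2: field value query. "x-exact" is a hypothetical field over <x>
   whose tokenizer overrides treat "-" as a word character. :)
let $field-value :=
  cts:field-value-query("x-exact", "value-1")

(: Option 3: string range equality. Assumes a string element range index
   on <x> with the codepoint collation. :)
let $range-equality :=
  cts:element-range-query(
    xs:QName("x"), "=", "value-1",
    "collation=http://marklogic.com/collation/codepoint")

return (
  (: Inspect what each query asks of the indexes :)
  xdmp:plan(cts:search(/, $exact-value)),
  xdmp:plan(cts:search(/, $field-value)),
  xdmp:plan(cts:search(/, $range-equality))
)
```

If the hyphen in "value-1" is not reflected in a plan, that query is a candidate for false positives when run unfiltered.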