Knowledgebase:
Queries constrained to elements
27 April 2016 04:01 AM

Introduction

This Knowledgebase article describes a technique for scoping a query so that it matches only within a given parent element.

Details

cts:element-query

Consider a scenario where you have an XML document structured in this way:

<rootElement>
  <id>7635940284725382398</id>
  <parentElement>
    <childElement1>valuea</childElement1>
    <childElement2>false</childElement2>
  </parentElement>
  <parentElement>
    <childElement1>valuea</childElement1>
    <childElement2>truthy</childElement2>
  </parentElement>
  <parentElement>
    <childElement1>valueb</childElement1>
    <childElement2>true</childElement2>
  </parentElement>
  <childElement1>valuec</childElement1>
</rootElement>

And you want to find the document where a parentElement has a childElement1 with a value of 'valuec'.

A search like

cts:search (/,
    cts:element-value-query(xs:QName('childElement1'), 'valuec', 'exact')
)

will return the above document, because it does not consider where the childElement1 value occurs: the only childElement1 with a value of 'valuec' is a direct child of rootElement, not a child of a parentElement. This isn't what you want. Search queries perform matching per fragment, so nothing constrains childElement1 to any particular location in the fragment.

Wrapping a cts:element-query around a subquery will constrain the subquery to exist within an instance of the named element. Therefore,

cts:search (/,
    cts:element-query (
        xs:QName ('parentElement'),
        cts:element-value-query(xs:QName('childElement1'), 'valuec', 'exact')
    )
)

will not return the above document since there is no childElement1 with a value of 'valuec' inside a parentElement.

This applies to more complicated subqueries too. For example, searching for a document that has a childElement1 with a value of 'valuea' AND a childElement2 with a value of 'true' using

cts:search (/, 
    cts:and-query ((
        cts:element-value-query(xs:QName('childElement1'), 'valuea', 'exact'),
        cts:element-value-query(xs:QName('childElement2'), 'true', 'exact')
    ))
)

will return the above document, even though the two values occur in different parentElement instances. You may instead want both child element values to occur inside the same parentElement. This can be accomplished with

cts:search (/, 
    cts:element-query (
        xs:QName ('parentElement'),
        cts:and-query ((
            cts:element-value-query(xs:QName('childElement1'), 'valuea', 'exact'),
            cts:element-value-query(xs:QName('childElement2'), 'true', 'exact')
        ))
    )
)

This gives the expected result: the above document is not returned, because the two child element-value queries never match inside the same parentElement instance.

Filtering and indexes

Investigating a bit further, if you run the query with xdmp:query-meters you will see (depending on your database settings) 

    <qm:filter-hits>0</qm:filter-hits>
    <qm:filter-misses>1</qm:filter-misses>

With the current indexes, the query can only determine that there is a fragment containing a parentElement, a childElement1 with a value of 'valuea', and a childElement2 with a value of 'true'. After retrieving the document and filtering, it finds that the document is not a complete match and so does not return it (thus filter-misses = 1).

(To learn more about filtering, refer to the Understanding the Search Process section of our Query Performance and Tuning Guide.)
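As a minimal sketch of how these counters might be inspected, the scoped query from the previous section can be run in the same request as xdmp:query-meters. This assumes the sample document has been loaded into the database the query runs against; the exact counts depend on your database settings and content.

```xquery
xquery version "1.0-ml";

(: Run the scoped search, then report the meters gathered for this request. :)
cts:search(/,
  cts:element-query(
    xs:QName('parentElement'),
    cts:and-query((
      cts:element-value-query(xs:QName('childElement1'), 'valuea', 'exact'),
      cts:element-value-query(xs:QName('childElement2'), 'true', 'exact')
    ))
  )
),
(: The filter-hits and filter-misses elements appear in this output. :)
xdmp:query-meters()
```

A non-zero filter-misses value suggests that the indexes selected candidate fragments which the filter then rejected.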

At scale you may find this filtering slow, or the query may hit Expanded Tree Cache limits if it retrieves many false positives to filter through.

If you have the correct positions enabled, the indexes can resolve this query without retrieving the document and filtering. In this case, after setting both

element-word-positions

and

element-value-positions

to true on the database and reindexing, xdmp:query-meters now shows

<qm:filter-hits>0</qm:filter-hits>
<qm:filter-misses>0</qm:filter-misses>

(To track element-value-queries inside element-queries you need element-word-positions and element-value-positions enabled. The former is for element-query and the latter is for element-value-query.)
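These settings can be changed in the Admin Interface; as one possible alternative, the sketch below uses the Admin API. The database name "Documents" is a placeholder, not something taken from this article, and the request needs to run as a user with administrative privileges.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder; substitute your own database name. :)
let $db := xdmp:database("Documents")
let $config := admin:get-configuration()
(: Needed so that cts:element-query can be resolved using positions. :)
let $config := admin:database-set-element-word-positions($config, $db, fn:true())
(: Needed so that cts:element-value-query can be resolved using positions. :)
let $config := admin:database-set-element-value-positions($config, $db, fn:true())
return admin:save-configuration($config)
```

If reindexing is enabled on the database, saving the configuration will normally trigger a reindex, so consider the timing carefully on a production system.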

Now this query can be run without filtering. However, if a document contains a lot of relationship instances, the position calculations can become quite expensive.
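A sketch of the same search run unfiltered, assuming both position settings are enabled and reindexing has completed. With the "unfiltered" option the results are only as accurate as the index resolution, so it is worth confirming that filter-misses is 0 for your own data before relying on it.

```xquery
xquery version "1.0-ml";

let $query :=
  cts:element-query(
    xs:QName('parentElement'),
    cts:and-query((
      cts:element-value-query(xs:QName('childElement1'), 'valuea', 'exact'),
      cts:element-value-query(xs:QName('childElement2'), 'true', 'exact')
    ))
  )
(: "unfiltered" skips the filtering step and returns what the indexes select. :)
return cts:search(/, $query, "unfiltered")
```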

Position details

Empty elements make positions problematic. Positions are word positions: the position of an element runs from the word position of the first word after the element starts to the word position of the first word after the element ends. Attributes take the positions of their element. If every element is empty, the document contains no words, so everything has the same position and positions cannot discriminate between elements.
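To check whether this affects your own content, one approach is to compare the index-only count with the filtered count for the same query. The sketch below is illustrative only: the child element and its name and flag attributes are hypothetical and do not come from the sample document in this article.

```xquery
xquery version "1.0-ml";

(: Hypothetical document shape, with no text content anywhere:
   <rootElement>
     <parentElement><child name="a"/></parentElement>
     <parentElement><child flag="true"/></parentElement>
   </rootElement>
:)
let $query :=
  cts:element-query(
    xs:QName('parentElement'),
    cts:and-query((
      cts:element-attribute-value-query(xs:QName('child'), xs:QName('name'), 'a'),
      cts:element-attribute-value-query(xs:QName('child'), xs:QName('flag'), 'true')
    ))
  )
return (
  (: Count resolved from the indexes alone, without filtering. :)
  xdmp:estimate(cts:search(/, $query)),
  (: Count after filtering the candidate fragments. :)
  fn:count(cts:search(/, $query))
)
```

If the first number is larger than the second, the indexes are selecting fragments that filtering later rejects, which is what the description above would lead you to expect for documents made up of empty elements.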

Reindexing

Note that if you change these settings you will need to reindex your database, and the usual tradeoffs apply (larger indexes and slower indexing). For guidance on adding an index and on reindexing in general, see also:

Reindexing impact
Adding an index in production
