cts:highlight with overlapping matches
06 December 2016 07:18 PM

 

Problem:

When highlighting matches of OR'ed word queries where the matches overlap (that is, the text of one query is contained in the text of another), the output of cts:highlight is not what you might expect.

 

For example:

let $p := <p>From the memoirs of an accomplished artist</p>
let $query :=
  cts:or-query((
    cts:word-query("accomplished artist"),
    cts:word-query("memoirs of an accomplished artist")))
return cts:highlight($p, $query, <m>{$cts:text}</m>)

 

The desired output is:

    <p>From the <m>memoirs of an accomplished artist</m></p>

However, the actual output is:

    <p>From the <m>memoirs of an </m> <m>accomplished artist</m></p>

 

This behavior is by design: cts:highlight breaks overlapping matches into separate regions and evaluates the highlight expression once for each region, so a single phrase can end up split across several <m> elements.

Two of the variables that cts:highlight binds while evaluating the highlight expression, $cts:queries and $cts:action, help explain how this works and can also be used to work around it:

  $cts:queries --> the queries that matched the current text ($cts:text).

  $cts:action --> can be set with xdmp:set to specify what should happen next:

  • "continue" - (default) Walk the next match. If there are no more matches, return all evaluation results.
  • "skip" - Skip walking any more matches and return all evaluation results.
  • "break" - Stop walking matches and return all evaluation results.
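As a small illustration of $cts:action, the sketch below sets it to "break" while handling the first match, so, assuming the semantics listed above, only the first matched region ("memoirs of an") should be wrapped in <m> and cts:highlight should stop walking the remaining matches. The xquery version "1.0-ml" prolog is an addition; $p and $query are the same as in the example above.

```xquery
xquery version "1.0-ml";

let $p := <p>From the memoirs of an accomplished artist</p>
let $query :=
  cts:or-query((
    cts:word-query("accomplished artist"),
    cts:word-query("memoirs of an accomplished artist")))
return cts:highlight($p, $query,
  (
    (: ask cts:highlight to stop walking matches after this one :)
    xdmp:set($cts:action, "break"),
    (: xdmp:set returns the empty sequence, so only <m> is emitted here :)
    <m>{$cts:text}</m>
  ))
```

Replacing "break" with "skip" or "continue" in the same expression is a quick way to compare the three actions on a small input.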

For example, replace the return clause of the original query with the following, which reports how many queries matched each highlighted region and which queries they were:

return cts:highlight($p, $query,
  <m>{
    $cts:text,
    <number-of-matches>{count($cts:queries)}</number-of-matches>,
    <matched-by>{$cts:queries}</matched-by>
  }</m>)

 

==>

<p>From the
  <m>memoirs of an
    <number-of-matches>1</number-of-matches>
    <matched-by>
      <cts:word-query xmlns:cts="http://marklogic.com/cts">
        <cts:text xml:lang="en">memoirs of an accomplished artist</cts:text>
      </cts:word-query>
    </matched-by>
  </m>
  <m>accomplished artist
    <number-of-matches>2</number-of-matches>
    <matched-by>
      <cts:word-query xmlns:cts="http://marklogic.com/cts">
        <cts:text xml:lang="en">memoirs of an accomplished artist</cts:text>
      </cts:word-query>
      <cts:word-query xmlns:cts="http://marklogic.com/cts">
        <cts:text xml:lang="en">accomplished artist</cts:text>
      </cts:word-query>
    </matched-by>
  </m>
</p>

 

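These results show how the text is matched. "memoirs of an" is matched only by the word query "memoirs of an accomplished artist", while "accomplished artist" is matched by both word queries, "accomplished artist" and "memoirs of an accomplished artist". Because the two matches overlap, cts:highlight treats "accomplished artist" as a region of its own, which is why the output differs from what was expected.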

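To work around this, the highlight expression can inspect $cts:queries. In the version below, a region matched by more than one query produces no output, and a region matched by a single query is replaced by the full text of that query, wrapped in <m>: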

 

let $p := <p>From the memoirs of an accomplished artist</p>
let $query :=
  cts:or-query((
    cts:word-query("accomplished artist"),
    cts:word-query("memoirs of an accomplished artist")))
return cts:highlight($p, $query,
  if (count($cts:queries) gt 1)
  (: matched by more than one query: emit nothing for this region;
     xdmp:set returns the empty sequence and "continue" keeps walking :)
  then xdmp:set($cts:action, "continue")
  (: matched by a single query: emit that query's full text in <m> :)
  else
    let $matched-text := <x>{$cts:queries}</x>/cts:word-query/cts:text/data(.)
    return <m>{$matched-text}</m>)

 

==>

 

<p>From the <m>memoirs of an accomplished artist</m></p>

 

 

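Please note that this workaround relies on assumptions about what is inside the or-query: every query is a cts:word-query, and the text of the query that matches a region on its own (here, the longer phrase) also covers the regions it shares with other queries. Because the output is built from the query text rather than from $cts:text, it also assumes the query text is how the matched text should be displayed (for example, in the same case as the document). The example can be modified to handle other overlapping situations.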
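The same workaround can be written a little more compactly with cts:word-query-text, which returns the text of a cts:word-query, instead of serializing $cts:queries into a temporary element and walking the XML. This is a sketch under the same assumptions as the workaround; the fallback to <m>{$cts:text}</m> for a region matched by a single query that is not a word query is an addition of this sketch, not part of the original article.

```xquery
xquery version "1.0-ml";

let $p := <p>From the memoirs of an accomplished artist</p>
let $query :=
  cts:or-query((
    cts:word-query("accomplished artist"),
    cts:word-query("memoirs of an accomplished artist")))
return cts:highlight($p, $query,
  if (count($cts:queries) gt 1)
  (: overlapping region: emit nothing, as in the workaround above :)
  then ()
  else if ($cts:queries instance of cts:word-query)
  (: single word query: emit its full text rather than $cts:text :)
  then <m>{cts:word-query-text($cts:queries)}</m>
  (: assumption of this sketch: any other single query keeps its own text :)
  else <m>{$cts:text}</m>)
```

Returning () in the first branch should behave like the original xdmp:set($cts:action, "continue") call, since "continue" is already the default action and xdmp:set itself returns the empty sequence.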

 

   

 


