Knowledgebase: Errors
Range index type casting and invalid values
03 July 2023 04:13 PM

Range indexes and invalid values

We will discuss range index type casting and the behavior based on the invalid-values setting.

Casting values

We can cast a string to an unsignedLong as
xs:unsignedLong('4235234')
and the return is 4235234 as an unsignedLong.  However, if we try
xs:unsignedLong('4235234x')
it returns an error
XDMP-CAST: (err:FORG0001) xs:unsignedLong("4235234x") -- Invalid cast: "4235234x" cast as xs:unsignedLong
Similarly,
xs:unsignedLong('')
returns an error
XDMP-CAST: (err:FORG0001) xs:unsignedLong("") -- Invalid cast: "" cast as xs:unsignedLong
This same situation can arise when a document contains invalid values.  The invalid-values setting on the range index determines what happens in the case of a value that can't be cast to the type of the range index.
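
A value can be checked before casting with the standard XQuery castable as expression, which returns false instead of raising XDMP-CAST.  The following is a minimal sketch and not part of the original tests; the $values variable is only for illustration:

```xquery
xquery version "1.0-ml";
(: Illustrative values: one valid and two invalid for xs:unsignedLong :)
let $values := ('4235234', '4235234x', '')
for $value in $values
return
    if ($value castable as xs:unsignedLong)
    then xs:unsignedLong($value)                     (: safe: the cast cannot fail here :)
    else fn:concat('not castable: "', $value, '"')   (: report instead of raising an error :)
```

Only the first value is cast; the other two are reported as not castable, matching the two errors shown above.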

Range indexes: values and types

Understanding Range Indexes discusses range indexes in general, and Defining Element Range Indexes discusses typed values.
Regarding the invalid-values parameter of a range index:
In the invalid values field, choose whether to allow insertion of documents that contain elements or JSON properties on which range index is configured, but the value of those elements cannot be coerced to the index data type. You can choose either ignore or reject. By default, the server rejects insertion of such documents. However, if you choose ignore, these documents can be inserted. This setting does not change the behavior of queries on invalid values after documents are inserted into the database. Performing an operation on an invalid value at query time can still result in an error.

Behavior with invalid values

Create a range index

First, create a range index of type unsignedLong on the id element in the Documents database:
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
let $config := admin:get-configuration()
let $dbid := xdmp:database('Documents')
let $rangespec := admin:database-range-element-index('unsignedLong', '', 'id', '', fn:false())
return
     admin:save-configuration (admin:database-add-range-element-index($config, $dbid, $rangespec))

Insert a document with a valid id value

We can insert a document with a valid value:
xdmp:document-insert ('test.xml', <doc><id>4235234</id></doc>)
Now if we check the values in the index as
cts:values (cts:element-reference (xs:QName ('id')))
we get the value 4235234 with type unsignedLong.
We can search for the document with that value as
cts:search (/, cts:element-range-query (xs:QName ('id'), '=', 4235234), 'filtered')
and the document is correctly returned.

Insert a document with an invalid id value

With the range index still set to reject invalid values, we can try to insert a document with a bad value:
xdmp:document-insert ('test.xml', <doc><id>4235234x</id></doc>)
That gives an error as expected:
XDMP-RANGEINDEX: xdmp:eval("xquery version &quot;1.0-ml&quot;;&#10;xdmp:document-insert ('te...", (), <options xmlns="xdmp:eval"><database>16363513930830498097</database>...</options>) -- Range index error: unsignedLong fn:doc("test.xml")/doc/id: XDMP-LEXVAL: Invalid lexical value "4235234x"

and the document is not inserted.

Setting invalid-values to ignore and inserting an invalid value

Now we use the Admin UI to set the invalid-values setting on the range index to ignore.  Inserting a document with a bad value as
xdmp:document-insert ('test.xml', <doc><id>4235234x</id></doc>)
now succeeds.  But remember, as mentioned above, "... if you choose ignore, these documents can be inserted. This setting does not change the behavior of queries on invalid values after documents are inserted into the database. Performing an operation on an invalid value at query time can still result in an error."
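
The setting can also be supplied when an index specification is built with the Admin API.  As a sketch, assuming the optional sixth argument of admin:database-range-element-index carries the invalid-values setting (please confirm against the Admin API documentation for your server version), a specification with ignore could be built as follows.  This only constructs and returns the specification; it does not save anything to the configuration.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";
(: Assumption: the sixth argument is the invalid-values setting ('ignore' or 'reject'). :)
(: The collation argument is the empty string because the index is not a string index. :)
admin:database-range-element-index('unsignedLong', '', 'id', '', fn:false(), 'ignore')
```

The returned element can be inspected to confirm the setting before it is passed to admin:database-add-range-element-index as in the earlier example.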

Values.  Checking the values in the index as
cts:values (cts:element-reference (xs:QName ('id')))
does not return anything.
Unfiltered search.  Searching unfiltered for a value of 7 as
cts:search (/, cts:element-range-query (xs:QName ('id'), '=', xs:unsignedLong (7)), 'unfiltered')
returns our document (<doc><id>4235234x</id></doc>).  This is a false positive.  When you insert a document with an invalid value, that document is returned by any unfiltered search that uses the index.
Filtered search.  We can try the same search filtered, to see whether filtering removes the false positive from the results.  However,
cts:search (/, cts:element-range-query (xs:QName ('id'), '=', xs:unsignedLong (7)), 'filtered')
throws an error:

XDMP-CAST: (err:FORG0001) cts:search(fn:collection(), cts:element-range-query(fn:QName("","id"), "=", xs:unsignedLong("7")), "filtered") -- Invalid cast: xs:untypedAtomic("4235234x") cast as xs:unsignedLong

That's because when the document is used in filtering, the invalid value is cast to match the query and it throws an error as in the earlier cast test.

Adding a new index and reindexing

If you have documents already in the database, and add an index, the reindexer will automatically reindex the documents.

If a document contains an invalid value for one of your indexes, the reindexer will still reindex the document but will issue a Debug-level message about the problem:

2023-06-26 16:44:28.646 Debug: IndexerEnv::putRangeIndex: XDMP-RANGEINDEX: Range index error: unsignedLong fn:doc("/test.xml")/doc/id: XDMP-LEXVAL: Invalid lexical value "4235234x"

The reindexer will not reject or delete the document.  You can use the URI given in the message to find the document and correct the issue.

Finding documents with invalid values

Since documents with invalid values are always returned by unfiltered searches on the index, you can find them with an and-query of two range queries that are normally mutually exclusive.  For the document with the invalid value,

cts:uris ((), (),
    cts:and-query ((
        cts:element-range-query (xs:QName ('id'), '=', 7),
        cts:element-range-query (xs:QName ('id'), '=', 8)
    ))
)

returns /test.xml.
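
To see which values need correcting, the URIs found this way can be used to fetch the documents and list the id values that cannot be cast.  This is a minimal sketch under a few assumptions: the URI lexicon is enabled (cts:uris requires it), the id elements are in no namespace, and the values 7 and 8 are arbitrary as above.

```xquery
xquery version "1.0-ml";
(: Candidate documents: those matching two normally exclusive range queries :)
for $uri in cts:uris((), (),
    cts:and-query((
        cts:element-range-query(xs:QName('id'), '=', xs:unsignedLong(7)),
        cts:element-range-query(xs:QName('id'), '=', xs:unsignedLong(8))
    )))
for $id in fn:doc($uri)//id
(: Keep only the id values that cannot be cast to the index type :)
where fn:not(fn:string($id) castable as xs:unsignedLong)
return fn:concat($uri, ': "', fn:string($id), '"')
```

The castable as check also screens out a valid document that happens to contain two id elements with the values 7 and 8, which would match the and-query without being invalid.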
