Timeout during large collection delete
11 March 2022 02:57 PM


When deleting a large collection, a query that calls xdmp:collection-delete may time out. Although xdmp:collection-delete is efficient by itself, if individual fragments in the collection are locked by another user query, xdmp:collection-delete must wait for those locks to be released and may time out.


Generally xdmp:collection-delete is fast, but its performance can be degraded by other processes taking and holding locks on documents in the collection.

To give a really simple (and contrived) illustration of this:

1. Create 250,000 sample documents in the collection:

xquery version "1.0-ml";

for $pos in 1 to 250000
return xdmp:document-insert("/" || $pos || ".xml",
  element test { xdmp:random() },
  <options xmlns="xdmp:document-insert">
    <collections><collection>/my/collection</collection></collections>
  </options>)

2. In one Query Console tab, run a query that deletes a document and then sleeps, holding its write lock for the duration of the transaction (the URI and sleep duration are illustrative):

xquery version "1.0-ml";

xdmp:document-delete("/1.xml"),
xdmp:sleep(60000)

3. While that query is still sleeping, run xdmp:collection-delete in a separate Query Console tab.

The delete in step 2 takes a write lock and then calls xdmp:sleep, so its transaction has to run to completion before xdmp:collection-delete can acquire the lock, and the collection delete will time out.

This is a deliberately simple example that creates a lock via xdmp:sleep, simulating any other user query that could be holding locks on documents in the collection.
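To reproduce the timeout quickly in the third tab, the request time limit can be lowered before issuing the delete; a minimal sketch (the 5-second limit is illustrative):

xquery version "1.0-ml";

(: illustrative: a low request time limit makes the blocked delete time out quickly :)
xdmp:set-request-time-limit(5),
xdmp:collection-delete("/my/collection")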

You can also enable the Lock Trace diagnostic trace event to identify write locks.
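One way to enable that event is through the Admin API; a sketch, assuming the group is named "Default":

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: assumes the group is named "Default" :)
let $config := admin:get-configuration()
let $group := admin:group-get-id($config, "Default")
let $config := admin:group-add-trace-event($config, $group,
  admin:group-trace-event("Lock Trace"))
return admin:save-configuration($config)

Note that the group's trace-events-activated setting must also be on (via the Admin UI or admin:group-set-trace-events-activated) for the event to produce output.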


1. Instead of trying to delete an entire collection in a single transaction, you can delete documents individually in spawned tasks, limiting each run to a fixed number of URIs (for example, 5000):

xquery version "1.0-ml";

for $i in cts:uris((), ("limit=5000"), cts:collection-query("/my/collection"))
return xdmp:spawn-function(function() { xdmp:document-delete($i) })

In the query above, the URI lexicon returns a subset of the URIs in the collection (5000 per run for testing), and xdmp:spawn-function spawns a separate task that deletes each of those documents by URI.

Depending on the total number of documents in the collection, spawning one task per URI can fill the Task Server queue (the Task Server has a set queue size, which can be increased if/when necessary). Within the default limits you should still be able to create tasks to delete 100,000 documents in one go.
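The current queue size can be read (and raised) with the Admin API; a sketch, again assuming the group is named "Default" and with an illustrative new size of 200,000:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $group := admin:group-get-id($config, "Default")
return admin:taskserver-get-queue-size($config, $group)
(: to raise it:
   admin:save-configuration(
     admin:taskserver-set-queue-size($config, $group, 200000)) :)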

It is also important to note that a document can be part of a large number of collections, so use collections (and cts:collection-query) to scope your searches.
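To see which collections a given document belongs to before deleting it, xdmp:document-get-collections can be used (the URI here matches the sample documents created above):

xquery version "1.0-ml";

(: returns every collection URI the document belongs to :)
xdmp:document-get-collections("/1.xml")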

2. This solution deletes a batch of documents in each task, as opposed to the solution above, which deletes one document per task. The code below achieves a batched delete of the documents in a collection instead of spawning an individual delete per document.

xquery version "1.0-ml";

let $batch_size := 10
let $coll_name := "/my/collection"
let $uris := cts:uris((), (), cts:collection-query($coll_name))
let $total_uris := fn:count($uris)
let $total_batches := xs:unsignedLong(math:ceil($total_uris div $batch_size)) - 1
for $batch_index in 0 to $total_batches
return xdmp:spawn-function(function() {
  fn:subsequence($uris, ($batch_index * $batch_size) + 1, $batch_size)
    ! xdmp:document-delete(.)
})
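Once the spawned tasks have drained from the Task Server queue, the collection can be confirmed empty with a quick index-only check:

xquery version "1.0-ml";

(: returns 0 once all spawned batch deletes have completed :)
xdmp:estimate(fn:collection("/my/collection"))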