MarkLogic Fundamentals - How does MarkLogic Server use storage bandwidth?
30 July 2021 12:06 PM
MarkLogic Server is optimized for query performance - if you're coming from a relational database background, you might be surprised by how much storage and storage bandwidth it uses. To better understand this behavior, it's important to keep the following in mind:
Speed over storage savings - Minimizing storage footprint makes sense from a storage utilization perspective, but MarkLogic Server trades space for time, taking advantage of rapidly falling storage prices.
Lazy Deletes - To prioritize query performance, MarkLogic Server handles record deletions as "lazy deletes": the record (or "document") is first marked as "obsolete" and thereby hidden from query results. The work of actually deleting any one record is deferred to a later time, when multiple obsolete documents can be removed, and your remaining data optimized, in bulk during a merge operation.
Index on ingest - MarkLogic Server indexes documents as they're ingested. If your data model and index configuration are where they need to be, your data is ready to be queried as soon as it's in a MarkLogic Server database. If your index configuration isn't quite where you want it, however, revising it means reindexing your entire database, which creates lots of obsolete documents and can result in multiple large merge operations. This is why it's always better in MarkLogic Server to optimize your index settings in smaller environments before propagating them to your bigger environments, and why you'll want to make fewer, bigger index configuration changes instead of many small ones. Each index configuration change - regardless of size - will trigger a reindex, so you'll want to minimize the number of reindexes you perform rather than the number of changes in any one reindex.
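To make the points above concrete, here is a minimal Python sketch of how lazy deletes, reindexing and merges interact. It is a toy model only: the ToyForest class, its methods and the byte sizes are all hypothetical names and numbers invented for illustration, and it does not reflect MarkLogic Server's actual on-disk format or merge policy. It simply assumes that a delete flags a document as obsolete, that a reindex rewrites every live document and leaves the old copy obsolete, and that a merge rewrites the live documents and drops the obsolete ones.

```python
from dataclasses import dataclass, field


@dataclass
class ToyForest:
    """Toy model of lazy deletes (hypothetical, not MarkLogic's real storage layout)."""
    # Each fragment is [uri, size_in_bytes, obsolete_flag].
    fragments: list = field(default_factory=list)
    bytes_written: int = 0  # running total of bytes written to "disk"

    def insert(self, uri, size):
        self.fragments.append([uri, size, False])
        self.bytes_written += size

    def delete(self, uri):
        # Lazy delete: only flag the document; no space is reclaimed yet.
        for frag in self.fragments:
            if frag[0] == uri and not frag[2]:
                frag[2] = True

    def reindex(self):
        # Assumed behavior: every live document is rewritten and the
        # previous copy becomes obsolete until the next merge.
        live = [f for f in self.fragments if not f[2]]
        for frag in live:
            frag[2] = True
            self.insert(frag[0], frag[1])

    def merge(self):
        # Rewrite the live documents and drop all obsolete ones in bulk.
        live = [f for f in self.fragments if not f[2]]
        self.bytes_written += sum(f[1] for f in live)
        self.fragments = live

    def visible(self):
        return [f[0] for f in self.fragments if not f[2]]

    def disk_usage(self):
        return sum(f[1] for f in self.fragments)


forest = ToyForest()
for uri in ("/a.xml", "/b.xml", "/c.xml"):
    forest.insert(uri, 100)

forest.delete("/b.xml")
print(forest.visible(), forest.disk_usage())      # ['/a.xml', '/c.xml'] 300

forest.reindex()
print(forest.disk_usage(), forest.bytes_written)  # 500 500

forest.merge()
print(forest.disk_usage(), forest.bytes_written)  # 200 700
```

In this toy model the deleted document keeps occupying space until the merge runs, and each reindex-plus-merge cycle rewrites the live data twice. Two separate index changes would therefore cost two such cycles, while one combined change would cost a single cycle, which is the intuition behind batching index configuration changes.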
In addition to reindexing, other aspects of MarkLogic Server that take up significant storage bandwidth include: