Using collection lexicon for tiered storage
28 February 2017 05:13 PM

Tiered Storage

MarkLogic Server allows you to manage your data at different tiers of storage and computation environments, with the top-most tier providing the fastest access to your most critical data and the lowest tier providing the slowest access to your least critical data.

MarkLogic Server tiered storage manages data in partitions. Each partition consists of a group of database forests that share the same name prefix and the same partition range.

The range of a partition defines the scope of values, such as element or attribute values, that the documents stored in the partition must have. The source of these values is called the partition key, and it is based on a range index, the collection lexicon, or a field set on the database. The partition key is set on the database and the partition range is set on each partition, so a database can contain several partitions with different ranges.

MarkLogic Server documentation covers a detailed example of how to use a range index as the partition key for tiered storage.

This article provides a generic and simple example of using a collection lexicon as the partition key.

Collection Lexicon with Tiered Storage

Consider a database 'test-db' with 4 forests grouped into 2 partitions. The following settings are required to set up this database for tiered storage. They can be configured on the database configuration page of the Admin UI (Admin UI -> Databases -> {database-name}):

- set 'rebalancer enable' to true
- set 'Locking' to strict
- set 'Rebalancer Assignment Policy' to range
- set 'Collection lexicon' to true

Under the assignment policy, choose 'Collection Lexicon' as the 'Range index type'. This sets the collection lexicon as the partition key.
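
As a quick pre-flight check, the settings above can be captured as a checklist. The sketch below is a plain Python illustration, not a MarkLogic API: the dictionary keys are hypothetical names that mirror the Admin UI labels used in this article, and the `current` configuration is made up for the example. It only compares values; the settings themselves still have to be changed in the Admin UI or through one of MarkLogic's APIs.

```python
# Hypothetical setting names that mirror the Admin UI labels in this article;
# they are not MarkLogic's internal property names.
REQUIRED_SETTINGS = {
    "rebalancer enable": True,
    "locking": "strict",
    "rebalancer assignment policy": "range",
    "range index type": "collection lexicon",
    "collection lexicon": True,
}


def settings_to_change(config: dict) -> dict:
    """Map each setting whose value in `config` differs from the required
    value to a (current, required) pair. Missing settings show up as None."""
    return {
        name: (config.get(name), required)
        for name, required in REQUIRED_SETTINGS.items()
        if config.get(name) != required
    }


if __name__ == "__main__":
    # A made-up starting configuration for illustration.
    current = {
        "rebalancer enable": True,
        "locking": "fast",
        "rebalancer assignment policy": "bucket",
        "collection lexicon": False,
    }
    for name, (have, want) in settings_to_change(current).items():
        print(f"set '{name}' to {want} (currently {have})")
```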

Partitions are based on forest naming conventions. A forest's partition name prefix and the rest of the forest name are separated by a dash (-). For our example, consider the following forest names and the partitions they will be grouped into:

tier1-forest1
tier1-forest2

tier2-forest1
tier2-forest2


As the forest names indicate, all forests with the same prefix are grouped into one partition. In this case, forests with the prefix tier1 form the first partition and forests with the prefix tier2 form the second partition.

Note that all of the forests in a database configured for tiered storage must be part of a partition.
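
The naming convention can be illustrated with a short Python sketch that groups forest names into partitions. It assumes the partition name is the text before the first dash, and it reports forests without such a prefix separately, since every forest in a tiered-storage database must belong to a partition. The function `group_into_partitions` is hypothetical and only mimics the grouping that MarkLogic performs itself.

```python
from collections import defaultdict


def group_into_partitions(forest_names):
    """Group forests by partition name prefix (assumed to be the text before
    the first dash). Forests without a prefix are returned separately because
    they would not belong to any partition."""
    partitions = defaultdict(list)
    unpartitioned = []
    for name in forest_names:
        prefix, sep, _rest = name.partition("-")
        if sep and prefix:
            partitions[prefix].append(name)
        else:
            unpartitioned.append(name)
    return dict(partitions), unpartitioned


if __name__ == "__main__":
    forests = ["tier1-forest1", "tier1-forest2", "tier2-forest1", "tier2-forest2"]
    groups, orphans = group_into_partitions(forests)
    print(groups)
    print(orphans)
```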

The partition range determines which partition ingested data is placed in. All the forests in a partition share a common range, which is defined on the forest configuration page (Admin UI -> Forests -> {forest-name} -> range).

For this example, since we are using the collection lexicon as the partition key, consider the following ranges for the two partitions:

Tier1
lower bound - accounts
upper bound - files

Tier2
lower bound - journals
upper bound - magazines

Alternatively, partitions can be created using the REST Management API or the XQuery/JavaScript APIs.

Once this is done, a document ingested with the collection "books" will be placed in one of the forests of the Tier1 partition, because "books" falls between the Tier1 bounds "accounts" and "files".
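
The range check that places a single-collection document can be sketched in Python as follows. This is an illustration, not MarkLogic's implementation: it assumes the lower bound is inclusive and the upper bound exclusive, and it uses Python's code-point string comparison as a stand-in for the collection lexicon's ordering. `Partition` and `partition_for` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Partition:
    """A hypothetical model of one partition and its collection range."""
    name: str
    lower: str  # lower bound, assumed inclusive
    upper: str  # upper bound, assumed exclusive

    def contains(self, collection: str) -> bool:
        # Python compares strings by Unicode code point, used here as a
        # stand-in for the collection lexicon's ordering.
        return self.lower <= collection < self.upper


# The example ranges from this article.
PARTITIONS = [
    Partition("Tier1", "accounts", "files"),
    Partition("Tier2", "journals", "magazines"),
]


def partition_for(collection: str) -> Optional[str]:
    """Return the name of the partition whose range contains `collection`, or None."""
    for partition in PARTITIONS:
        if partition.contains(collection):
            return partition.name
    return None


if __name__ == "__main__":
    for collection in ["books", "journals", "newspapers"]:
        print(collection, "->", partition_for(collection))
```

Running it prints `books -> Tier1`, `journals -> Tier2` and `newspapers -> None`; the last case is a document that falls outside every defined range.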

However, there is one caveat with using the collection lexicon for tiered storage: it works well only if each document has a single collection. If a document has more than one collection, its placement becomes unpredictable and it can be placed in any of the partitions.

It is also best practice to create a 'default partition' (a partition without a range), so that any documents that fall outside the defined ranges go to the default partition. Without a default partition, the placement of such documents is again unpredictable.
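
Both caveats can be expressed in a small Python sketch. The hypothetical `place` function uses the example ranges, assuming inclusive lower bounds and exclusive upper bounds, and returns None wherever this article says placement is unpredictable: when a document has more than one collection, or when no range matches and no default partition exists. The default partition name "default" is made up, and treating a document with no collections as out of range is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Example ranges from this article: (lower bound, upper bound), with the lower
# bound assumed inclusive and the upper bound assumed exclusive.
RANGES: Dict[str, Tuple[str, str]] = {
    "Tier1": ("accounts", "files"),
    "Tier2": ("journals", "magazines"),
}


def place(collections: List[str], default_partition: Optional[str] = None) -> Optional[str]:
    """Return the partition a document lands in, or None when its placement
    is unpredictable."""
    if len(collections) > 1:
        # Several collections: placement is unpredictable, so do not guess.
        return None
    for name, (lower, upper) in RANGES.items():
        if any(lower <= c < upper for c in collections):
            return name
    # Outside every range (assumed to include documents with no collection):
    # use the default partition if there is one; None means there is no
    # default partition and placement is unpredictable.
    return default_partition


if __name__ == "__main__":
    print(place(["books"]))
    print(place(["books", "journals"]))
    print(place(["newspapers"], default_partition="default"))
    print(place(["newspapers"]))
```

Returning None instead of picking a partition keeps the sketch explicit about the two cases where placement is not deterministic.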

Related Documentation

MarkLogic Administrator's Guide: Tiered Storage
