Using collection lexicon for tiered storage | MarkLogic Support

Using collection lexicon for tiered storage 28 February 2017 05:13 PM
Tiered Storage

MarkLogic Server allows you to manage your data at different tiers of storage and computation environments, with the top-most tier providing the fastest access to your most critical data and the lowest tier providing the slowest access to your least critical data.

MarkLogic Server tiered storage manages data in partitions. Each partition consists of a group of database forests that share the same name prefix and the same partition range. The range of a partition defines the scope of element or attribute values for the documents to be stored in the partition. This element or attribute is called the partition key. The partition key is based on a range index, collection lexicon, or field set on the database. The partition key is set on the database and the partition range is set on the partition, so there can be several partitions in a database with different ranges.

The MarkLogic Server documentation covers a detailed example of how to use a range index as the partition key for tiered storage. This article provides a generic and simple example of using a collection lexicon as the partition key.

Collection Lexicon with Tiered Storage

Consider a database 'test-db' with 4 forests that are grouped into 2 partitions. The following settings are required to set up this database for tiered storage. They can be configured on the Admin UI database configuration page (Admin UI -> databases -> {database-name}):

- set 'rebalancer enable' to true
- set 'Locking' to strict
- set 'Rebalancer Assignment Policy' to range
- set 'Collection lexicon' to true

Under the assignment policy, choose 'Collection Lexicon' as the 'Range index type'. This sets the collection lexicon as the partition key.

Partitions are based on forest naming conventions. A forest's partition name prefix and the rest of the forest name are separated by a dash (-).
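The naming convention can be illustrated with a short Python sketch. It is not MarkLogic code: the helper `group_by_partition` is hypothetical, and splitting at the first dash is an assumption made for illustration. It uses the forest names from the example that follows.

```python
from collections import defaultdict


def group_by_partition(forest_names):
    """Group forest names by the prefix before the dash.

    Assumption for this sketch: the partition prefix is everything
    before the FIRST dash in the forest name.
    """
    partitions = defaultdict(list)
    for name in forest_names:
        prefix, sep, rest = name.partition("-")
        if not sep or not prefix or not rest:
            # A name without a "prefix-rest" shape cannot be mapped to a partition.
            raise ValueError(f"forest name {name!r} has no partition prefix")
        partitions[prefix].append(name)
    return dict(partitions)


forests = ["tier1-forest1", "tier1-forest2", "tier2-forest1", "tier2-forest2"]
for partition, members in group_by_partition(forests).items():
    print(partition, members)
```

Raising an error for a name without a prefix mirrors the requirement that every forest in a database configured for tiered storage belongs to a partition.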
For our example, consider the following forest names and the partitions they will be grouped into:

- `tier1-forest1`
- `tier1-forest2`
- `tier2-forest1`
- `tier2-forest2`

All forests with the same prefix are grouped under one partition. So, in this case, forests with the prefix tier1 form the first partition and forests with the prefix tier2 form the second. *Note that all of the forests in a database configured for tiered storage must be part of a partition.*

The defined partition range determines which partition ingested data is placed in. All the forests in one partition have a common range. Ranges are defined on the forest configuration page (Admin UI -> forests -> {forest-name} -> range). Since this example uses the collection lexicon as the partition key, consider the following ranges for the two partitions:

- Tier1: lower bound 'accounts', upper bound 'files'
- Tier2: lower bound 'journals', upper bound 'magazines'

Alternatively, partitions can be created using the REST Management API or the XQuery/JavaScript APIs.

Once this is done, a document ingested with, for example, the collection "books" will be placed into one of the forests in Tier1, because "books" sorts between "accounts" and "files".

However, there is one caveat with using the collection lexicon for tiered storage: *it works well only if the documents have a single collection. If a document has more than one collection, then the ordering can become random and it can be placed in any of the partitions.*

Also, it is best practice to create a 'default partition' (a partition without a range), so that any documents that fall outside the defined ranges go to the default partition. In the absence of a default partition, the ordering can again be random.

Related Documentation

MarkLogic Administrator's Guide: Tiered Storage
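As a worked illustration of the range-based placement described in this article, the Python sketch below maps a single collection name to a partition. It is a simulation, not MarkLogic code. The function `partition_for` is hypothetical, and two points are assumptions of the sketch: the lower bound is treated as inclusive and the upper bound as exclusive, and names are compared by plain code point order rather than by a database collation.

```python
from typing import Optional

# Partition ranges from the article's example: (name, lower bound, upper bound).
PARTITIONS = [
    ("tier1", "accounts", "files"),
    ("tier2", "journals", "magazines"),
]


def partition_for(collection: str, default: Optional[str] = None) -> Optional[str]:
    """Return the partition whose range contains the collection name.

    Assumptions: lower bound inclusive, upper bound exclusive, and
    plain string comparison. Returns `default` when no range matches,
    which models a default partition (or None if there is none).
    """
    for name, lower, upper in PARTITIONS:
        if lower <= collection < upper:
            return name
    return default


print(partition_for("books"))                     # inside the tier1 range
print(partition_for("kites"))                     # inside the tier2 range
print(partition_for("games", default="default"))  # outside both ranges
```

The sketch handles only documents with a single collection, which is the case the article says works well. A collection such as "games" sorts between the two ranges, so it matches neither and shows why a default partition is useful.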