Knowledgebase: MarkLogic Server
MarkLogic Support FAQ
07 March 2022 04:41 PM

What are the maximum and minimum number of nodes a MarkLogic Cluster can have?

Minimum: 1 node (3 nodes if you want high availability)

Optimum: ~64 nodes

Maximum: 256 nodes

Are all nodes created equal in MarkLogic?

In MarkLogic, how a node is configured, provisioned, and scaled depends on the type of that node and what roles it might serve:

  • A single node can act as an e-node, d-node, or both ("e/d-node")
  • With respect to high availability/failover, any one node serves as both primary host (for its assigned data forests) and failover host (for its assigned failover forests)
  • With respect to disaster recovery/replication, nodes can serve as either hosts for primary data forests in the primary cluster, or as hosts for replica forests in the replica cluster
  • Bootstrap hosts are used to establish an initial connection to foreign clusters during database replication. Only the nodes hosting your security forests (both the primary security forests and their local disk failover copies) need to be bootstrap hosts

Can I have nodes with mixed specifications within a cluster?

  • Queries in MarkLogic Server use every node in the cluster
  • Fast nodes will wait for slow nodes - especially slow d-nodes
  • Therefore, all nodes - especially all d-nodes - should be of the same hardware specification

Does MarkLogic support Horizontal Scaling or Vertical Scaling?

  • Both horizontal (more nodes) and vertical scaling (bigger nodes) are possible with MarkLogic Server
  • Do note that high availability (HA) in MarkLogic Server requires at least some degree of horizontal scaling with a minimum of three nodes in a cluster
  • Given the choice between one big node and three smaller nodes, most deployments would be better off with three smaller nodes to take advantage of HA

I'm confused about high availability (HA) vs. disaster recovery (DR). How does MarkLogic do HA? How does MarkLogic do DR?

  • High Availability (HA) in MarkLogic Server involves automatic forest failover, which maintains database availability in the face of host failure. Failing back is a manual operation
  • Disaster Recovery (DR) in MarkLogic Server involves a separate copy of the data, kept in sync with either smaller data deltas (database replication) or larger ones (backup/restore). Switching to a DR copy and switching back from it are both manual operations

How many forests can a MarkLogic cluster have?

  • There is a design limit of 1024 forests (including Local Disk Failover forests)
  • If you need more than 1024 forests, look into super-clusters and super-databases
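
As a rough illustration of how the 1024-forest budget is consumed, the sketch below counts data forests plus their Local Disk Failover copies. The function names and the example cluster sizes are hypothetical, and the count deliberately ignores the forests of default databases, so treat it as a back-of-the-envelope check rather than a sizing tool.

```python
# Design limit quoted above; it includes Local Disk Failover (LDF) forests.
FOREST_DESIGN_LIMIT = 1024


def total_forests(hosts, data_forests_per_host, ldf_copies_per_forest=1):
    """Count data forests plus their LDF copies across the cluster.

    Simplification: forests belonging to default databases are not counted.
    """
    return hosts * data_forests_per_host * (1 + ldf_copies_per_forest)


def within_design_limit(hosts, data_forests_per_host, ldf_copies_per_forest=1):
    """Return True if the forest count stays at or under the design limit."""
    total = total_forests(hosts, data_forests_per_host, ldf_copies_per_forest)
    return total <= FOREST_DESIGN_LIMIT


if __name__ == "__main__":
    # Hypothetical clusters: 20 data forests per host, one LDF copy each.
    for hosts in (16, 32):
        print(hosts, total_forests(hosts, 20), within_design_limit(hosts, 20))
```

Under these assumptions, a 16-host cluster uses 640 forests, while a 32-host cluster would need 1280 and would exceed the limit.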

How do I calculate the I/O bandwidth of a MarkLogic node?

  • The I/O bandwidth of a node can be calculated with the following formula:
    • (# of forests per node * I/O bandwidth per forest)
  • For example, if your node has 10 TB of disk capacity:
    • # of forests per node: (disk space / max forest size)
      • Disk space: 10 TB
      • Recommended max forest size in MarkLogic: 512 GB
      • Recommended # of forests for this node: 20 (disk space / max forest size)
    • I/O bandwidth per forest: 20 MB/sec read, 20 MB/sec write
    • Total I/O bandwidth: 20 * 20 MB/sec (# of forests * I/O bandwidth per forest)
  • So, if your disk capacity is 10 TB, the I/O bandwidth will be:
    • 400 MB/sec read, 400 MB/sec write
  • Similarly, if your disk capacity is 20 TB, the I/O bandwidth will be:
    • 800 MB/sec read, 800 MB/sec write
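
The arithmetic above can be sketched in a few lines of Python. The function name is hypothetical, and the conversion of 1 TB to 1024 GB is an assumption chosen so that 10 TB divided by 512 GB gives the 20 forests used in the example.

```python
GB_PER_TB = 1024  # assumption: binary terabytes, so 10 TB / 512 GB = 20


def node_io_bandwidth(disk_tb, max_forest_gb=512, per_forest_mb_per_sec=20):
    """Return (forests per node, required MB/sec) for one direction of I/O.

    The same figure applies separately to reads and to writes.
    """
    # Whole forests of the recommended maximum size that fit on the disk.
    forests = int(disk_tb * GB_PER_TB // max_forest_gb)
    return forests, forests * per_forest_mb_per_sec


if __name__ == "__main__":
    for disk_tb in (10, 20):
        forests, bandwidth = node_io_bandwidth(disk_tb)
        print(f"{disk_tb} TB: {forests} forests, "
              f"{bandwidth} MB/sec read, {bandwidth} MB/sec write")
```

The defaults mirror the rule-of-thumb values quoted above (512 GB per forest, 20 MB/sec per forest); pass different values if your deployment uses other targets.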

What is the maximum size for a forest in MarkLogic?

  • The rule-of-thumb maximum size for a forest is 512GB
  • It's almost always better to have more small forests instead of one very large forest
  • It's important to keep in mind that forests have hard maximums for:
    • Number of stands
    • Number of fragments

How many documents per forest/database?

While MarkLogic Server does not have a practical or effective limit on the number of documents in a forest or database, you'll want to watch out for:

  • Size of forests - as bigger forests require more time and computational resources to maintain
  • Maximum number of stands per forest (64) is a hard stop and difficult to unwind - so it's important that your database is merging often enough to stay well under that limit. Most deployments don't come close to this maximum unless they're underprovisioned and therefore merging too slowly or too infrequently
  • Maximum number of fragments per stand (on the order of tens or hundreds of millions). Most deployments typically scale horizontally to more forests (and therefore more stands) well before needing to worry about the number of fragments in a specific stand

How should I configure my default databases (like security)?

  • The recommended number of local disk failover (LDF) forests for default databases is one for each primary forest
  • For example - each default database (including security) should have one data forest and one LDF forest
  • More LDF copies are not recommended as they're almost never worth the additional administrative complexity and dedicated hardware resources

What is the recommended record or document size?

100 KB +/- two orders of magnitude (1 KB to 10 MB)

What is the recommended number of range indexes for a database?

  • On the order of 100 or so
  • If you need many more, revise your data model to take advantage of Template Driven Extraction (TDE)

Does it help to do concurrent MLCP jobs in terms of performance?

  • Each MLCP job, starting in version 10.0-4.2, uses the maximum number of threads available on the server as the default thread count
  • Since a single job already uses all the available threads, running concurrent MLCP jobs won't improve performance

Should we back up default databases?

  • We recommend regular backups for the Security database
  • If actively used, regular backups are recommended for Schemas, Modules, Triggers and other default databases

Backup/restore best practices?

  • Backups can be CPU/RAM intensive
  • Incremental backups minimize storage, not necessarily time
  • Unless your cluster is over-provisioned compared to most, concurrent backup jobs are not recommended
  • The "Include Replica" setting allows for backup if failed over - but also doubles your backup footprint in terms of storage
  • The "Max Backups" setting is applicable only for full backups

Do we need to mirror configuration between primary and replica databases? If so, how do we do it?

  • Yes - primary and replica databases should have mirrored configurations. If the replica database's configuration is different, query results from the replica database will also be different
  • Configurations can be mirrored with Configuration Manager (deprecated in 10.0-3), ml-gradle, or the Configuration Management API (CMA)

What should I consider when configuring the -thread_count option for MLCP export?

  • By default, -thread_count is 4 (if -thread_count is not specified)
  • For best performance, you can set this option to the maximum number of threads supported by the app server in the group (maximum number of server threads allowed on each host in the group * the number of hosts in the group)
    • E.g.: For a 3-node cluster, this number will be 96 (32 * 3) where:
      • 32 is the max number of threads allowed on each host
      • 3 is the number of hosts in the cluster

Note: If -thread_count is set to the maximum number of server threads, running concurrent jobs is strongly discouraged
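
The thread-count arithmetic above can be expressed as a small helper. The function name is hypothetical; the only MLCP detail it relies on is the -thread_count option named in the text.

```python
def max_export_thread_count(max_threads_per_host, hosts_in_group):
    """Upper bound for -thread_count: per-host server threads times hosts."""
    if max_threads_per_host < 1 or hosts_in_group < 1:
        raise ValueError("both values must be positive")
    return max_threads_per_host * hosts_in_group


if __name__ == "__main__":
    # Values from the example above: 32 threads per host, 3 hosts.
    threads = max_export_thread_count(32, 3)
    print(f"-thread_count {threads}")
```

The printed value is what would be passed to the export job's -thread_count option for the 3-node example.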
