MarkLogic Content Pump (MLCP) FAQ
23 September 2022 08:51 PM

Question

Answer

Further Reading

What is MLCP? MarkLogic Content Pump (MLCP) is an open-source, Java-based command-line tool for importing data into, exporting data from, and copying data between MarkLogic databases.

Documentation:

How do I install MLCP? Refer to our documentation and tutorial for this.

What software does MLCP require?

  • MarkLogic Server with XDBC App Server (MarkLogic 8 and later versions come with an XDBC App Server pre-configured on port 8000).
  • Oracle/Sun Java JRE 1.8 or later.

Documentation:

Can MLCP connect to MarkLogic through a load balancer?

Yes. You can configure MLCP to connect to a load balancer that sits in front of the MarkLogic Server cluster.
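
As a minimal sketch of such a setup, the wrapper below runs an import through a load balancer and sets -restrict_hosts so that MLCP connects only to the address given in -host. It assumes your MLCP version supports -restrict_hosts. The hostname lb.example.com, the port, and the MLCP_BIN, ML_USER and ML_PASSWORD variables are placeholders, not values from this article.

```shell
# Hypothetical wrapper: import through a load balancer address only.
# lb.example.com, MLCP_BIN, ML_USER and ML_PASSWORD are placeholders.
import_via_lb() {
  # $1 = directory or file to import
  "${MLCP_BIN:-mlcp.sh}" import \
    -host lb.example.com \
    -port 8000 \
    -username "$ML_USER" \
    -password "$ML_PASSWORD" \
    -restrict_hosts true \
    -input_file_path "$1"
}
```

Example call: import_via_lb /data/in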

Documentation:

What permissions are needed for MLCP operations?
  • The 'admin' role, or
  • The privileges listed in the documentation, plus any additional privileges the job needs (e.g., read/update privileges on the database)

Documentation:

Does MLCP offer a way to export triples? MLCP currently does not export triples directly. If exporting them as XML files is acceptable, you can export the documents that contain the triples as files, selecting them by collection name (for managed triples, the graph name can be used as the collection name).
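
The collection-based export described above could look roughly like the sketch below, which uses the -collection_filter option of the export command. The host, port, and the MLCP_BIN, ML_USER and ML_PASSWORD variables are placeholders, and the graph name is whatever collection holds your triple documents.

```shell
# Hypothetical wrapper: export the documents in one collection
# (for managed triples, the graph name) as files.
export_graph_docs() {
  # $1 = collection (graph) name, $2 = output directory
  "${MLCP_BIN:-mlcp.sh}" export \
    -host localhost \
    -port 8000 \
    -username "$ML_USER" \
    -password "$ML_PASSWORD" \
    -collection_filter "$1" \
    -output_file_path "$2"
}
```

Example call: export_graph_docs http://example.org/my-graph /data/triples-out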

KB articles:

Can I configure MLCP to use SSL? Yes, please refer to our "Connecting to MarkLogic Using SSL" documentation for details.

Can I configure Kerberos with MLCP? Yes. Please check "Using MLCP With Kerberos" for additional details.

How do I ingest data in Data Hub Framework using MLCP? Check the "Ingest Using MLCP" section in our Data Hub documentation for more details.

Can we use MLCP to read from Amazon S3?

MLCP does not currently support Amazon S3 directly. However, you can use s3fs-fuse (https://github.com/s3fs-fuse/s3fs-fuse) to mount the S3 bucket as a local filesystem and then run MLCP against the mount point.
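
A rough sketch of that workflow follows: mount the bucket with s3fs, then import from the mount point. The s3fs invocation follows the form shown in the s3fs-fuse README and assumes credentials are stored in ~/.passwd-s3fs. The bucket name, mount point, host, port, and the S3FS_BIN, MLCP_BIN, ML_USER and ML_PASSWORD variables are placeholders.

```shell
# Hypothetical wrapper: mount an S3 bucket with s3fs-fuse, then import
# the mounted files with MLCP. All names here are placeholders.
import_from_s3() {
  # $1 = bucket name, $2 = existing local mount point
  "${S3FS_BIN:-s3fs}" "$1" "$2" -o passwd_file="$HOME/.passwd-s3fs" &&
  "${MLCP_BIN:-mlcp.sh}" import \
    -host localhost \
    -port 8000 \
    -username "$ML_USER" \
    -password "$ML_PASSWORD" \
    -input_file_path "$2"
}
```

Example call: import_from_s3 my-bucket /mnt/s3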

Can I filter data by column values while importing CSV via MLCP? No, MLCP does not support this, but you can use other tools such as CORB.

Documentation:

How do I debug/troubleshoot MLCP issues? Check our MLCP Troubleshooting documentation.

Can I export large files in compressed format? Yes, use the -compress option with MLCP's export command.
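
For illustration, a compressed export might be invoked as in the sketch below. The host, port, output path, and the MLCP_BIN, ML_USER and ML_PASSWORD variables are placeholders.

```shell
# Hypothetical wrapper: export documents as compressed output.
export_compressed() {
  # $1 = output directory
  "${MLCP_BIN:-mlcp.sh}" export \
    -host localhost \
    -port 8000 \
    -username "$ML_USER" \
    -password "$ML_PASSWORD" \
    -compress true \
    -output_file_path "$1"
}
```

Example call: export_compressed /data/export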

Documentation:

What is the -fastload option and when should I use it? The -fastload option can significantly speed up ingestion during import and copy operations, but it can also cause problems if not used properly. Please check the documentation for the tradeoffs and other considerations.

Documentation:

How does MLCP handle failover?
Failover support in MLCP is only available when running against MarkLogic 9 or later. With older MarkLogic versions, the job will fail if MLCP is connected to a host that becomes unavailable.

Documentation:

Does MLCP support concurrent jobs? No. Refer to our knowledge base article for details.

What should I consider when configuring the -thread_count option for MLCP export?
  • By default, -thread_count is 4 (when the option is not specified)
  • For best performance, you can set this option to the maximum number of threads supported by the app server in the group (the maximum number of server threads allowed on each host in the group * the number of hosts in the group)
    • E.g., for a 3-node cluster, this number is 96 (32 * 3), where:
      • 32 is the maximum number of threads allowed on each host
      • 3 is the number of hosts in the cluster
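
The calculation above can be sketched as a small helper whose result is passed to -thread_count. The function names, host, port, and the MLCP_BIN, ML_USER and ML_PASSWORD variables are placeholders. Substitute the thread limit and host count of your own cluster.

```shell
# Upper bound from the rule above: threads per host * number of hosts.
max_thread_count() {
  # $1 = max app server threads per host, $2 = number of hosts
  echo $(( $1 * $2 ))
}

# Hypothetical wrapper: export with an explicit thread count.
export_with_threads() {
  # $1 = output directory, $2 = thread count
  "${MLCP_BIN:-mlcp.sh}" export \
    -host localhost \
    -port 8000 \
    -username "$ML_USER" \
    -password "$ML_PASSWORD" \
    -thread_count "$2" \
    -output_file_path "$1"
}
```

Example call for the 3-node cluster above: export_with_threads /data/export "$(max_thread_count 32 3)"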

KB Articles:

Documentation:

What are the differences between MLCP and CORB? Check this MarkLogic Stack Overflow discussion for more details.

How do I handle white space in URIs/folders while loading data in MLCP? Check our "Handling Whitespace in URIs" blog for details.

How can I use a delimiter in MLCP?

Please check these links for details:

  • Creating Documents from Delimited Text Files
  • Ingesting Delimited Text with MLCP
  • Loading tab delimited files
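
As a minimal sketch of a delimited-text import, the wrapper below sets -input_file_type to delimited_text and passes a custom -delimiter (MLCP uses a comma when none is given). The host, port, file path, and the MLCP_BIN, ML_USER and ML_PASSWORD variables are placeholders. Tab-delimited input needs extra care when passing the tab character, which the linked resources cover.

```shell
# Hypothetical wrapper: import a delimited text file with a custom delimiter.
import_delimited() {
  # $1 = input file or directory, $2 = single-character delimiter
  "${MLCP_BIN:-mlcp.sh}" import \
    -host localhost \
    -port 8000 \
    -username "$ML_USER" \
    -password "$ML_PASSWORD" \
    -input_file_type delimited_text \
    -delimiter "$2" \
    -input_file_path "$1"
}
```

Example call for a pipe-delimited file: import_delimited /data/records.txt '|'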

Does MLCP support distributed (Hadoop) mode? No. Distributed mode in MLCP has been deprecated since MarkLogic 10.0-4.

How can I invoke MLCP from a Gradle task?

Check the GitHub "MarkLogic Content Pump (mlcp) and Gradle" documentation for details.