Solutions

MarkLogic Data Hub Service

Fast data integration + improved data governance and security, with no infrastructure to buy or manage.

Learn More

Learn

Stay On Top Of Everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Community

Stay On Top Of Everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Company

Stay On Top Of Everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

 
Knowledgebase:
Content Pump Import loading tab delimited files
29 August 2013 01:16 PM

Summary

CSV files are a very common data exchange format. It is often used as an export format for spreadsheets, databases or any other application. Depending on the application, you might be able to change the delimiter character to a #hash or *asterix etc. One of the default delimiter definitions is a tab character. Content Pump supports reading and loading such CSV files.

Detail

The Content Pump -delimiter option defines which delimiter will be used to split the columns. Defining a tab as a value for the delimiter option on the command line isn't straight forward.

Loading tab delimited data files with content pump can result in an error massage like the following:

mlcp>bin/mlcp.sh IMPORT -host localhost -port 9000 -username admin -password secret -input_file_path sample.csv -input_file_type delimited_text -delimiter '    ' -mode local
13/08/21 15:10:20 ERROR contentpump.ContentPump: Error parsing command arguments: 
13/08/21 15:10:20 ERROR contentpump.ContentPump: Missing argument for option: delimiter
usage: IMPORT [-aggregate_record_element <QName>]
... 

Depending on the command line shell, a tab needs to be escaped to be understand from the shell script: 

On bash shell, this should work: -delimiter $'\t'
On Bourne shell, this should work: -delimiter 'Ctrl+V followed by tab' 
Alternative way would be to use: -delimiter \x09 

If none of these work, another approach you can try is to use the -options_file /path/to/options-file parameter. The options file can contains all of the same parameters as the command line does. The benefit of using an option file is that the command line is simpler and characters are interpreted as intended. The options file will contain multiple lines where the first line is always the action like IMPORT,  EXPORT etc. followed by a pair of lines. The first line is the option parameter and second the value for the option.

A sample could look like the following:

IMPORT
-host
localhost
-port
9000
-username
admin
-password
secret
-input_file_path
/path/to/sample.csv
-delimiter
' '
-input_file_type
delimited_text


Make sure the file is saved in UTF-8 format to avoid any parsing problems. To define a tab as delimiter, place a real tab between single quotes (i.e. '<tab>')

To use this option file with mlcp execute the following command:

Linux, Mac, Solaris:

mlcp>bin/mlcp.sh -options_file /path/to/sample.options

Windows:

mlcp>bin/mlcp.bat -options_file /path/to/sample.options

The options file can take any paramter which mlcp understands. It is important that the action command is defined on the first line. It is also possible to use both command line parameters and the option file. Command line parameters take precedence over those defined in the options file.

(2 vote(s))
Helpful
Not helpful

Comments (0)