Knowledgebase:
Valid characters in a MarkLogic Document URI
19 March 2015 01:15 PM

Introduction

A document uniform resource identifier (URI) is a string of characters that identifies a document stored in MarkLogic Server. This article describes which characters MarkLogic 8 supports in a document URI.

ASCII

MarkLogic 8 allows every printable ASCII character (decimal range 32-126) to be used in a document URI.

List of allowed special characters within ASCII range

<space> ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~

Note that the ASCII space character (decimal 32) can be used, but it should not appear at the start or the end of a URI.
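The ASCII rule above can be sketched as a small client-side pre-check. This is only an illustration in Python, not a MarkLogic API: the function name is hypothetical, rejecting an empty string is an assumption, and the check covers only the printable ASCII range (decimal 32 to 126), so it will also reject non-ASCII characters that MarkLogic otherwise accepts.

```python
def is_printable_ascii_uri(uri):
    """Return True if uri uses only printable ASCII (decimal 32-126)
    and does not start or end with a space.

    Hypothetical helper for illustration; not part of any MarkLogic API.
    """
    if not uri:
        # Assumption: an empty string is not a useful document URI.
        return False
    if uri[0] == " " or uri[-1] == " ":
        # A space is allowed inside the URI, but not as a prefix or suffix.
        return False
    return all(32 <= ord(ch) <= 126 for ch in uri)


if __name__ == "__main__":
    for candidate in ["/docs/my file.xml", " /docs/a.xml", "/docs/a\tb.xml"]:
        print(repr(candidate), is_printable_ascii_uri(candidate))
```

A check like this is most useful before a bulk load, where a stray tab or trailing space in a generated URI is easier to catch on the client than to clean up afterwards.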

Other Character Sets

MarkLogic Server supports UTF-8 encoding. In addition to the ASCII characters listed above, any valid UTF-8 character can be used in a document URI in MarkLogic Server.

Examples include decimal range 384-591, which represents Latin Extended-B, and decimal range 880-1023, which represents Greek and Coptic.
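To make the decimal ranges concrete, the following Python sketch maps characters to their decimal code points and UTF-8 bytes. The sample URI and the helper name are hypothetical, and the ranges are labeled only by the decimal bounds quoted above; nothing here calls MarkLogic.

```python
# Decimal code point ranges quoted in the text above (upper bounds inclusive).
EXAMPLE_RANGES = {
    "384-591": range(384, 592),
    "880-1023": range(880, 1024),
}


def classify(ch):
    """Return the label of the example range containing ch, or None."""
    code_point = ord(ch)
    for label, bounds in EXAMPLE_RANGES.items():
        if code_point in bounds:
            return label
    return None


# Hypothetical URI containing U+03BB (decimal 955) and U+0192 (decimal 402).
uri = "/docs/\u03bb-\u0192.xml"
encoded = uri.encode("utf-8")

if __name__ == "__main__":
    for ch in uri:
        if ord(ch) > 126:
            # Show the decimal code point, its range, and its UTF-8 bytes.
            print(ord(ch), classify(ch), ch.encode("utf-8").hex())
```

Each of the two non-ASCII characters occupies two bytes in UTF-8, which matters when a URI is passed through tools that count bytes rather than characters.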

External Considerations

A few interfaces (such as XCC/J) and datatypes may place additional restrictions on the characters allowed in a MarkLogic document URI. For example, the xs:anyURI datatype is more restrictive and does not allow & (decimal code 38) or < (decimal code 60). Consider the following scenario.

A schema is loaded into the database, and validation is applied before an XML document is inserted into the database,