Encoding and Escaping

This page covers escaping and encoding of paths, names, and values in the context of JCR-based web applications.

Why?

JCR node names allow a very broad character set: almost all of Unicode except a few special characters such as /, [, ], |, : and *, which are used to build paths, address same-name siblings, etc. in JCR. In addition, a name cannot be "." or "..".

For XPath queries, the underlying model treats the JCR repository as an XML document, so every path step in the XPath is seen as an XML name. XML names are more restrictive than JCR node names; most importantly, they cannot start with a digit. Such names can still be used in queries if they are escaped according to ISO9075.

Furthermore, XPath queries support full-text search through "jcr:contains()", which has its own query string format; in Jackrabbit, that format is the one used by Lucene.

JCR is also often used for web applications that map URLs to JCR paths. Note that JCR node names allow more characters than URLs do, most notably the space.

There are utility methods for escaping/encoding in the org.apache.jackrabbit.util.ISO9075 and org.apache.jackrabbit.util.Text classes. Although developed under Jackrabbit, they are part of the JCR Commons module (jackrabbit-jcr-commons) which only depends on the JCR API.

Escaping paths

If you're building a path from user-supplied names, you need to escape illegal JCR characters (e.g. "item:1" becomes "item%3A1"):

String path = "/foo/" + Text.escapeIllegalJcrChars(name);

Such paths are useful for JCR methods like Node.addNode(...) and Session.getItem(...), but the escaping is usually only needed when you create nodes in the first place. Once a node exists, its name can simply be passed around; no further escaping should happen when accessing the node, since the name is already in the right form.
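
To make the escaping rule concrete, here is a minimal, self-contained sketch of what such an escape does. It is a simplified stand-in written for illustration, not the Jackrabbit implementation: the class name JcrNameEscapeSketch and the exact set of escaped characters are assumptions, and real code should call Text.escapeIllegalJcrChars(name) instead.

```java
// Simplified stand-in for illustration only; real code should call
// org.apache.jackrabbit.util.Text.escapeIllegalJcrChars(name).
final class JcrNameEscapeSketch {

    // Assumed set of characters to escape: '%' itself (so that the result
    // stays unambiguous) plus the JCR special characters listed earlier.
    private static final String ILLEGAL = "%/:[]*|";

    static String escape(String name) {
        // "." and ".." are not valid node names, so their dots are escaped too.
        boolean dotsOnly = name.equals(".") || name.equals("..");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (dotsOnly || ILLEGAL.indexOf(c) >= 0) {
                // Percent sign followed by two upper-case hex digits.
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, JcrNameEscapeSketch.escape("item:1") yields "item%3A1", matching the example above, and ordinary names such as "report 2013" pass through unchanged.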

Encoding paths in queries

If you want to use paths in XPath queries, though, you need to escape according to ISO9075 rules (e.g. "1hr0" becomes "_x0031_hr0"):

String query = "/jcr:root" + ISO9075.encodePath(node.getPath()) + "/" + ISO9075.encode(name);

For a user-supplied string, this could lead to something like ISO9075.encode(Text.escapeIllegalJcrChars(name)). In most cases, however, the path given to a query comes from a known node, so Text.escapeIllegalJcrChars(name) is unnecessary and only the ISO9075 encoding is required.
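
The ISO9075 rule can be illustrated with a small stand-alone sketch. This is an assumption-laden simplification, not the Jackrabbit code: the class Iso9075Sketch is hypothetical, it only recognises ASCII letters, digits, '_', '-' and '.' as XML name characters, and it does not handle namespace prefixes or same-name-sibling indexes. Real code should use ISO9075.encode(...) and ISO9075.encodePath(...).

```java
import java.util.regex.Pattern;

// Simplified, ASCII-only illustration of ISO9075-style name encoding.
final class Iso9075Sketch {

    // A literal "_xHHHH_" in a name would be mistaken for an escape,
    // so its leading underscore has to be encoded as well.
    private static final Pattern ESCAPE = Pattern.compile("_x[0-9a-fA-F]{4}_");

    static String encode(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean valid = (i == 0) ? isNameStart(c) : isNameChar(c);
            boolean looksLikeEscape =
                ESCAPE.matcher(name).region(i, name.length()).lookingAt();
            if (valid && !looksLikeEscape) {
                sb.append(c);
            } else {
                // Encode as _xHHHH_ with four lower-case hex digits.
                sb.append(String.format("_x%04x_", (int) c));
            }
        }
        return sb.toString();
    }

    // Encodes every step of a slash-separated path separately.
    static String encodePath(String path) {
        String[] steps = path.split("/", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < steps.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(encode(steps[i]));
        }
        return sb.toString();
    }

    private static boolean isNameStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    private static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }
}
```

Here Iso9075Sketch.encode("1hr0") gives "_x0031_hr0", and Iso9075Sketch.encodePath("/content/2013/my node") gives "/content/_x0032_013/my_x0020_node". Only the leading digit of "2013" is encoded, because digits are valid everywhere except at the start of an XML name.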

Escaping values in queries

Values inserted into queries should be escaped to prevent incorrect values and query injection. Generally, if you enclose values in single quotes, you just need to replace any literal single quote character with '' (two consecutive single quote characters). There is also a Text.escapeIllegalXpathSearchChars(...) method that you should use for calls to jcr:contains(...) (see also JCR-1248).

// Escape the full-text term for jcr:contains(), then double any single quotes in both values.
String q =
  "/jcr:root/foo/element(*, foo)" +
  "[jcr:contains(@title, '" + Text.escapeIllegalXpathSearchChars(searchTerm).replace("'", "''") + "')]" +
  "[@itemID = '" + itemID.replace("'", "''") + "']";

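The quote-doubling rule for plain comparison values is easy to wrap in a helper so that it cannot be forgotten. The following is a minimal sketch using only the JDK; the class XPathValueSketch and its method names are hypothetical, and the query shape is taken from the example above. It does not cover the additional escaping needed for jcr:contains(...).

```java
// Hypothetical helper for quoting string literals in JCR XPath queries.
final class XPathValueSketch {

    // Wraps a value in single quotes and doubles any embedded single quote.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds the @itemID comparison from the example query with a safely
    // quoted value.
    static String itemQuery(String itemId) {
        return "/jcr:root/foo/element(*, foo)[@itemID = " + quote(itemId) + "]";
    }
}
```

For instance, XPathValueSketch.quote("O'Reilly") returns 'O''Reilly'. A hostile value such as x' or @secret = '1 stays inside a single string literal instead of changing the structure of the query.
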
Escaping/encoding in URIs

There are further encoding/decoding methods in the Text class for dealing with URIs in a webapp. The set of characters allowed in JCR names contains the URI set plus a few others (e.g. spaces), so the URI set is actually the more constrained one. Therefore, if you have a valid URI, you can map it directly onto a JCR path without having to worry about escaping (this is by design). If you go the other way, i.e. you have a JCR path and want to create a URI for it, you simply apply plain URI escaping. To keep things simple in the context of URIs, one suggestion is to only create JCR nodes with names that are valid in URIs.
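
As an illustration of "plain URI escaping" for the JCR-path-to-URI direction, the sketch below uses java.net.URI from the JDK rather than the Text class methods mentioned above; that substitution, the class name JcrUriSketch, and the sample paths are assumptions made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative mapping between JCR paths and URI paths using only the JDK.
final class JcrUriSketch {

    // JCR path -> URI path: characters that are not legal in a URI path
    // (for example the space) are percent-encoded.
    static String pathToUriPath(String jcrPath) {
        try {
            return new URI(null, null, jcrPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode path: " + jcrPath, e);
        }
    }

    // URI path -> JCR path: percent-escapes are decoded again.
    static String uriPathToPath(String uriPath) {
        try {
            return new URI(uriPath).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI path: " + uriPath, e);
        }
    }
}
```

With this sketch, JcrUriSketch.pathToUriPath("/content/my node") returns "/content/my%20node", and JcrUriSketch.uriPathToPath("/content/my%20node") returns the original path.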

EncodingAndEscaping (last edited 2013-03-06 14:22:47 by AlexanderKlimetschek)