Unicode Composition

This page gathers design notes related to Unicode Composition.

Unicode Composition for Paths

The problem was originally described in the note Unicode Composition for Filenames and has since been discussed a number of times on the mailing list.

Different solutions to the issue are described below:


There is now a branch open for the client-side implementation, generally following these design discussions. It embeds the utf8proc library into libsvn_subr instead of using ICU, but otherwise follows the same general pattern.

The plan is to provide the following extensions for SQLite:

Since columns in the database will use non-standard collations, we'll also create a SQLite extension module svnwcdb.sqlext that defines the same collations and operators. A new command-line tool svnwcdb will launch a SQLite shell with the extension loaded and all other required parameters set.
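To make the effect of such a collation concrete, here is a minimal sketch using Python's standard sqlite3 and unicodedata modules instead of the planned C extension. The collation name example_nfc, the one-column table, and the choice of NFC as the normalization form are assumptions made for illustration; they are not taken from the branch.

```python
import sqlite3
import unicodedata


def nfc_compare(a: str, b: str) -> int:
    """Compare two strings after normalizing both to NFC (assumed form)."""
    na = unicodedata.normalize("NFC", a)
    nb = unicodedata.normalize("NFC", b)
    return (na > nb) - (na < nb)


conn = sqlite3.connect(":memory:")
# "example_nfc" is a hypothetical collation name used only in this sketch.
conn.create_collation("example_nfc", nfc_compare)
conn.execute(
    "CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY COLLATE example_nfc)"
)

composed = "caf\u00e9.txt"      # e-acute as a single code point
decomposed = "cafe\u0301.txt"   # 'e' followed by a combining acute accent

conn.execute("INSERT INTO nodes VALUES (?)", (composed,))
try:
    # Same path in a different composition: the collated key already exists.
    conn.execute("INSERT INTO nodes VALUES (?)", (decomposed,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
```

With the collation attached to the column, the two byte-wise different spellings count as the same key, so the second insert is rejected. A stock SQLite shell would not know the collation at all, which is why the extension module has to be loaded before the database can be inspected.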

N.B.: LIKE and GLOB are not and should not be used by libsvn_wc because they cannot use indexes. However, for completeness, the svnwcdb.sqlext SQLite extension must override them; otherwise, inspecting the working copy database using command-line SQLite tools would not be reliable.
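SQLite implements the expression X LIKE Y by calling the SQL function like(Y, X), so registering a two-argument function named like replaces the operator. The following is a rough sketch of that override in Python rather than in the extension itself. The helper like_normalized is hypothetical, NFC is an assumed normalization form, and the matcher is simplified: it has no ESCAPE support, assumes text values, and uses Unicode-wide case folding where the built-in LIKE folds ASCII only.

```python
import re
import sqlite3
import unicodedata


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def like_normalized(pattern, value):
    """Hypothetical replacement for like(pattern, value) on normalized text."""
    if pattern is None or value is None:
        return None  # keep SQL NULL semantics
    parts = []
    for ch in nfc(pattern):
        if ch == "%":
            parts.append(".*")   # any run of characters
        elif ch == "_":
            parts.append(".")    # exactly one code point
        else:
            parts.append(re.escape(ch))
    regex = "".join(parts)
    matched = re.fullmatch(regex, nfc(value), re.DOTALL | re.IGNORECASE)
    return 1 if matched else 0


conn = sqlite3.connect(":memory:")
# Overriding the 2-argument like() changes what the LIKE operator does.
conn.create_function("like", 2, like_normalized)
conn.execute("CREATE TABLE nodes (local_relpath TEXT)")
conn.execute("INSERT INTO nodes VALUES (?)", ("docs/cafe\u0301.txt",))

rows = conn.execute(
    "SELECT local_relpath FROM nodes WHERE local_relpath LIKE ?",
    ("docs/caf\u00e9%",),
).fetchall()
```

Here a composed pattern finds the row stored in decomposed form, which the built-in LIKE would miss. A complete override would also need the three-argument form for ESCAPE and a matching replacement for GLOB.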

N.B.: With a bit of magic we'll make svnwcdb work with the amalgamated SQLite. This is a very good idea if we use the amalgamation to override a too-old or broken installed version.

Working Copy Database

Identified issues

Every SQL statement currently used that returns information about the node, e.g., STMT_SELECT_NODE_INFO, must be modified to also return the actual local_relpath it found, since there's no guarantee that the search key will be byte-for-byte identical to the row key. Consequently, functions such as svn_wc__db_read_info must return that column along with all the others. It's an open question whether these changes will have to propagate all the way to the public svn_wc API.

Alternative: These functions already return repos-relpath, which is what should be communicated to the repository and should ideally never come from local disk, except for locally added files.
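The first point, that a lookup can succeed with a key that is not byte-for-byte identical to the stored one, can be shown with a small Python sketch. The function read_node_info, the collation name example_nfc, and the two-column table are hypothetical stand-ins; the real statement and schema are STMT_SELECT_NODE_INFO and the working copy database, which are not reproduced here.

```python
import sqlite3
import unicodedata


def nfc_compare(a: str, b: str) -> int:
    """Compare two strings after normalizing both to NFC (assumed form)."""
    na = unicodedata.normalize("NFC", a)
    nb = unicodedata.normalize("NFC", b)
    return (na > nb) - (na < nb)


conn = sqlite3.connect(":memory:")
conn.create_collation("example_nfc", nfc_compare)  # hypothetical name
conn.execute(
    "CREATE TABLE nodes ("
    " local_relpath TEXT PRIMARY KEY COLLATE example_nfc,"
    " kind TEXT)"
)

stored_key = "cafe\u0301.txt"  # decomposed form, as recorded in the database
conn.execute("INSERT INTO nodes VALUES (?, 'file')", (stored_key,))


def read_node_info(db: sqlite3.Connection, search_key: str):
    """Return (local_relpath, kind) so callers see the key actually stored."""
    return db.execute(
        "SELECT local_relpath, kind FROM nodes WHERE local_relpath = ?",
        (search_key,),
    ).fetchone()


search_key = "caf\u00e9.txt"   # composed form, e.g. as typed by the user
found_relpath, kind = read_node_info(conn, search_key)
```

The row is found, but found_relpath is the stored decomposed string and not the composed search key. A caller that keeps using its own search key for later byte-wise comparisons or file system access would therefore be working with a different string than the one in the database.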

UnicodeComposition (last edited 2013-01-21 21:58:20 by Thomas Åkesson)