Differences between revisions 6 and 7
Revision 6 as of 2012-11-16 05:38:59
Size: 2931
Editor: brane
Revision 7 as of 2013-01-21 21:58:20
Size: 3205
Comment: Extracting the "column approach" into separate page.
Line 12 (revision 6):
  * UnicodeCollation - experimental approach leveraging the Sqlite ICU extension.
Line 12 (revision 7):
  * UnicodeClientColumns - approach with additional/redefined columns that contain path in the client's composition. (extracted from NonNormalizingUnicodeCompositionAwareness)
  * UnicodeCollation - experimental approach leveraging a Sqlite collation, e.g. the Sqlite ICU extension or a Subversion collation.
Line 14 (revision 6):
  * NormalizationOfUnicodeComposition - (could be drafted as a competing proposition)
Line 15 (revision 7):
  * NormalizationOfUnicodeComposition - (could be drafted as a competing proposition) normalization of all paths in the repository

Unicode Composition

This page gathers design notes related to Unicode Composition.

Unicode Composition for Paths

The problem was originally described in the note Unicode Composition for Filenames and has since been discussed a number of times on the mailing list.

Different solutions to the issue are described below:


There is now a branch open for the client-side implementation, generally following these design discussions. It embeds the utf8proc library into libsvn_subr instead of using ICU, but otherwise follows the same general pattern.

The plan is to provide the following extensions for SQLite:

  • A collation for paths that normalizes to NFD before comparing keys
  • A similar replacement for the LIKE and GLOB operators

    • this will remove the need to specify PRAGMA case_sensitive_like=1 since this LIKE operator will always be case-sensitive.
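To make the first bullet concrete, here is a minimal sketch of a collation that normalizes both keys to NFD before comparing them. It uses Python's sqlite3 and unicodedata modules rather than utf8proc and the C API the branch actually uses, and the collation name nfd_path and the one-column nodes table are hypothetical, chosen only for illustration.

```python
import sqlite3
import unicodedata


def nfd_compare(a, b):
    """Compare two paths by the UTF-8 bytes of their NFD forms."""
    ka = unicodedata.normalize("NFD", a).encode("utf-8")
    kb = unicodedata.normalize("NFD", b).encode("utf-8")
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
# "nfd_path" is a hypothetical collation name, not one defined by Subversion.
conn.create_collation("nfd_path", nfd_compare)
# Simplified, hypothetical table; the real wc.db schema has many more columns.
conn.execute(
    "CREATE TABLE nodes (local_relpath TEXT COLLATE nfd_path PRIMARY KEY)"
)

composed = "\u00c5kesson.txt"      # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "A\u030akesson.txt"   # 'A' followed by U+030A COMBINING RING ABOVE

# Store the decomposed form, then look it up with the composed form.
conn.execute("INSERT INTO nodes VALUES (?)", (decomposed,))
row = conn.execute(
    "SELECT local_relpath FROM nodes WHERE local_relpath = ?", (composed,)
).fetchone()
```

Because the collation is attached to the column, the primary-key index also uses it, so inserting the composed form as a second row would be rejected as a duplicate key.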

Since columns in the database will use non-standard collations, we'll also create a SQLite extension module, svnwcdb.sqlext, that defines the same collation and operators. A new command-line tool, svnwcdb, will launch a SQLite shell with the extension loaded and all other required parameters set.

N.B.: LIKE and GLOB are not and should not be used by libsvn_wc because they cannot use indexes. However, for completeness, the svnwcdb.sqlext SQLite extension must override them; otherwise, inspecting the working copy database using command-line SQLite tools would not be reliable.
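As a rough illustration of what overriding LIKE could look like, the sketch below registers a two-argument SQL function named like, which SQLite then uses for the LIKE operator. It is written with Python's sqlite3 module rather than as a loadable C extension, it ignores the ESCAPE clause, and the helper name like_nfd is hypothetical. GLOB could be replaced the same way through a function named glob.

```python
import re
import sqlite3
import unicodedata


def like_nfd(pattern, value):
    """Case-sensitive LIKE that compares the NFD forms of both operands.

    SQLite calls like(pattern, value) for the expression `value LIKE pattern`.
    """
    if pattern is None or value is None:
        return None  # SQL NULL propagates
    parts = []
    for ch in unicodedata.normalize("NFD", pattern):
        if ch == "%":
            parts.append(".*")   # any run of characters
        elif ch == "_":
            parts.append(".")    # exactly one code point (after NFD)
        else:
            parts.append(re.escape(ch))
    regex = "".join(parts)
    target = unicodedata.normalize("NFD", value)
    return 1 if re.fullmatch(regex, target, flags=re.DOTALL) else 0


conn = sqlite3.connect(":memory:")
# Registering a function named "like" overrides the built-in LIKE operator.
conn.create_function("like", 2, like_nfd)


def sql_like(value, pattern):
    return conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()[0]


# Decomposed path matched against a composed prefix pattern.
matches_across_forms = sql_like("A\u030a/file.txt", "\u00c5/%")
# Unlike the built-in default, ASCII case differences do not match.
matches_ignoring_case = sql_like("abc", "ABC")
```

With this override in place there is no need for PRAGMA case_sensitive_like=1, because the replacement never folds case.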

N.B.: With a bit of magic we'll make svnwcdb work with the SQLite amalgamation, which happens to be an amazingly good idea if we use the amalgamation to override a too-old or broken installed version.

Working Copy Database

Identified issues

Every SQL statement currently used that returns information about the node, e.g., STMT_SELECT_NODE_INFO, must be modified to also return the actual local_relpath it found, since there's no guarantee that the search key will be byte-for-byte identical to the row key. Consequently, functions such as svn_wc__db_read_info must return that column along with all the others. It's an open question whether these changes will have to propagate all the way to the public svn_wc API.
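The following sketch shows why the stored key has to be returned. It assumes a normalizing collation is attached to local_relpath. The table layout is a heavily simplified, hypothetical stand-in for the real NODES table, and read_node_info is an illustrative helper, not svn_wc__db_read_info.

```python
import sqlite3
import unicodedata


def nfd_compare(a, b):
    """Compare two paths by the UTF-8 bytes of their NFD forms."""
    ka = unicodedata.normalize("NFD", a).encode("utf-8")
    kb = unicodedata.normalize("NFD", b).encode("utf-8")
    return (ka > kb) - (ka < kb)


def read_node_info(conn, wc_id, search_relpath):
    """Look up a node and return the local_relpath actually stored in the row."""
    row = conn.execute(
        "SELECT local_relpath, kind FROM nodes"
        " WHERE wc_id = ? AND local_relpath = ?",
        (wc_id, search_relpath),
    ).fetchone()
    if row is None:
        return None
    stored_relpath, kind = row
    return {
        "local_relpath": stored_relpath,
        "kind": kind,
        # True when the caller's key is equal under the collation
        # but not byte-for-byte identical to the row key.
        "key_differs": stored_relpath != search_relpath,
    }


conn = sqlite3.connect(":memory:")
conn.create_collation("nfd_path", nfd_compare)  # hypothetical collation name
conn.execute(
    "CREATE TABLE nodes ("
    " wc_id INTEGER,"
    " local_relpath TEXT COLLATE nfd_path,"
    " kind TEXT,"
    " PRIMARY KEY (wc_id, local_relpath))"
)
# The row key is stored in decomposed form.
conn.execute("INSERT INTO nodes VALUES (1, ?, 'file')", ("docs/A\u030a.txt",))

# The search key arrives in composed form and still finds the row.
info = read_node_info(conn, 1, "docs/\u00c5.txt")
```

A caller that keeps using its own search key after the lookup would be working with a path that differs from the one recorded in the database, which is why the statement returns the column explicitly.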

Alternative: These functions already return repos-relpath, which is what should be communicated to the repository and should ideally never come from the local disk, except for locally added files.

UnicodeComposition (last edited 2013-01-21 21:58:20 by Thomas Åkesson)