Unicode Normalization for Path, Mergeinfo and Lock Lookup

This specification is the result of a number of ongoing discussions, including issue #2464 and the various discussions gathered on the UnicodeComposition page. It has also been strongly influenced by this blog post, which discusses the solution adopted by the ZFS filesystem.


Any solution to the normalization problem must maintain strict backwards compatibility between clients and servers. This implies that:

The solution must also not drastically affect the performance of the server or working copy. For example, the working copy database cannot use a normalization-independent collation for indexing paths, because that would limit SQLite's ability to optimize queries.
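To make the underlying problem concrete, the following is a minimal Python sketch, not part of the Subversion code base. It shows that two visually identical paths can differ code point by code point, and that comparing a normalized form of each makes them match while the stored spelling stays untouched. The names `lookup_key` and `index` are hypothetical, and the choice of NFC here is only an assumption for illustration.

```python
import unicodedata


def lookup_key(path: str) -> str:
    """Return a normalization-independent key for a path.

    Assumption: NFC is used purely for illustration; any single
    normalization form applied consistently would behave the same way.
    """
    return unicodedata.normalize("NFC", path)


# The same visible path, once with precomposed characters (NFC)
# and once with base letters followed by combining accents (NFD).
composed = "caf\u00e9/r\u00e9sum\u00e9.txt"
decomposed = "cafe\u0301/re\u0301sume\u0301.txt"

# A plain comparison treats them as different paths.
plain_equal = composed == decomposed

# Comparing the normalized keys treats them as the same path.
key_equal = lookup_key(composed) == lookup_key(decomposed)

# Hypothetical index: normalized key -> path exactly as it was stored.
# The stored spelling is preserved; only the lookup is normalized.
index = {lookup_key(decomposed): decomposed}
found = index.get(lookup_key(composed))
```

In this sketch the lookup succeeds for either spelling, yet the value returned is the originally stored (decomposed) path, so nothing about the stored data changes.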

For repositories that use the FSFS backend, the solution must not affect the layout of the revision files or directory contents. The repository administrator should be given the choice of whether to enable the solution, regardless of format version.

All of the above boils down to:

FSX should incorporate the solution as a mandatory feature. BDB will not support it, ever.


Repository and FS API Implementation

In the FSFS back-end, we use paths as keys in three distinct ways:


Client Implementation

UnicodeNormalization (last edited 2013-12-09 09:37:47 by brane)