Differences between revisions 1 and 2
Revision 1 as of 2013-12-02 13:36:23
Size: 1478
Editor: brane
Comment:
Revision 2 as of 2013-12-09 09:37:47
Size: 2235
Editor: brane
Comment:
Unicode Normalization for Path, Mergeinfo and Lock Lookup

This specification is the result of a number of ongoing discussions, including issue #2464 and the various discussions gathered on the UnicodeComposition page. It has also been strongly influenced by this blog post, which discusses the solution adopted by the ZFS filesystem.

Constraints

Any solution to the normalization problem must maintain strict backwards compatibility between clients and servers. This implies that:

  • we cannot change the network protocol to require that all paths are normalized;
  • the server cannot store paths, or return them to clients, in a different representation than the one they were originally created with.
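To make the constraint concrete, the following minimal sketch shows two representations of the same visible file name that differ at the byte level. The file name is a made-up example, and Python's standard unicodedata module stands in for whatever normalization routine the implementation would actually use.

```python
import unicodedata

# Hypothetical file name, written in two canonically equivalent ways.
composed = "caf\u00e9.txt"      # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301.txt"   # 'e' followed by a combining acute accent (U+0301)

# The two strings render identically but are different byte sequences,
# so a byte-wise path comparison treats them as different paths.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
differ_as_bytes = composed_bytes != decomposed_bytes

# Normalizing both to the same form makes them compare equal.
equal_after_nfc = (unicodedata.normalize("NFC", composed)
                   == unicodedata.normalize("NFC", decomposed))
```

Because existing clients may have created either form, the server has to keep whichever one it was given rather than rewrite it.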

The solution also must not drastically affect the performance of the server or working copy. For example, the working copy database cannot use a normalization-independent collation for indexing paths, because that limits SQLite's ability to optimize queries.

For repositories that use the FSFS backend, the solution must not affect the layout of the revision files or directory contents. The repository administrator should be given the choice whether to enable the solution, regardless of format version.

All of the above boils down to:

  • The server must accept paths in any representation that can be normalized to the same byte sequence as the normalized representation of the stored path (or mergeinfo entry or lock path).
  • The client must send paths (and mergeinfo entries and lock tokens) in exactly the same representation as it received them from the server.
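The two requirements above can be sketched as a lookup table keyed by a normalized form that still returns the path exactly as it was created. This is only an illustration under stated assumptions: the PathIndex class and normalized_key helper are hypothetical names, NFC is chosen arbitrarily because the specification does not name a normalization form here, and keeping the first-created path on a collision is a simplification rather than a stated policy.

```python
import unicodedata
from typing import Dict, Optional


def normalized_key(path: str) -> str:
    """Return the lookup key for a path (assumption: NFC normalization)."""
    return unicodedata.normalize("NFC", path)


class PathIndex:
    """Hypothetical in-memory index: normalized key -> path as originally created."""

    def __init__(self) -> None:
        self._stored: Dict[str, str] = {}

    def add(self, path: str) -> None:
        # Store the path unchanged; only the key is normalized.
        # Simplification: if an equivalent path already exists, keep the first one.
        self._stored.setdefault(normalized_key(path), path)

    def lookup(self, path: str) -> Optional[str]:
        # Accept any equivalent representation, return the stored one.
        return self._stored.get(normalized_key(path))


index = PathIndex()
index.add("cafe\u0301.txt")                 # created in decomposed form
found = index.lookup("caf\u00e9.txt")       # requested in composed form
```

Here the lookup succeeds even though the request uses a different representation, and the value returned is the decomposed path that was originally stored, which is the form a client would then be expected to send back.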

FSX should incorporate the solution as a mandatory feature. BDB will not support it, ever.

Solution

Repository and FS API Implementation

In the FSFS back-end, we use paths as keys in three distinct ways:

  • during lookup of directory entries physically stored on disk;
  • for writing and reading entries in the node cache;
  • when writing directory entries for new and changed nodes in transactions.

...

Client Implementation

UnicodeNormalization (last edited 2013-12-09 09:37:47 by brane)