Differences between revisions 6 and 7
Revision 6 as of 2012-11-09 16:05:46
Size: 6522
Editor: JulianFoad
Comment: Link to the source code
Revision 7 as of 2012-11-09 17:01:49
Size: 6564
Editor: JulianFoad
Comment: Add a class diagram.
Deletions are marked like this. Additions are marked like this.
Line 43: Line 43:



This page introduces new API. The central part of the API is an abstraction of a generic tree of (usually) versioned nodes -- where each node is a file or directory, with properties.

We already have a generic tree-editing API (svn_delta_editor_t, svn_editor_t). It will be beneficial to have a generic tree-access API as well. The idea is that code can use the same interface to examine the WC base, the WC working state, revision X in a repository, or any other concrete or virtual tree that we may decide to implement. Use of a common API can reduce code duplication, improving consistency of behaviour, simplifying maintenance and reducing bugs. It should be possible to write operations that combine generic trees, such as traversing two trees in parallel, diffing and patching trees. Then we could, for example, extend 'svn diff' to be able to compare any two arbitrary trees, related or not, such as the unversioned tree '/tmp/saved-tests' against the WC tree 'wc/subversion/tests'. Instead of doing this by writing several more variants of the current three subroutines (WC/WC, WC/repos, repos/repos), in principle we would need just one diff subroutine that takes two Tree objects as input. Another goal here is to facilitate writing a three-way merge that fully supports moves and renames.

Key ideas:

  • Provide a generic reference to a tree and to a tree-node.
  • Read a node's basic attributes: kind, properties, text, children.
  • Traverse a tree, visiting each node.
  • Traverse two trees in parallel.
  • Enable the user to associate a node-id with each node in a tree.
  • Traverse two trees, matching up nodes by their node-id; thus enable tracking of renames between related trees.
  • Diff two trees, producing a tree edit (svn_editor_t?).
  • Patch a tree with a tree edit (svn_editor_t?), producing a tree.
  • Allow write access to a tree.

Source Code






Implementations of RA trees, WC trees, disk trees:




Push vs Pull

This is a "pull" interface -- the flow of control pulls data from a tree (when reading, that is), in contrast to using svn_editor_t to read a tree, the data is pushed and therefor the producer is in control of the order of traversal etc.

Push not suitable for diffing etc... ### ...

Tree Classes



Abstract representation of a tree of svn_tree_node_t.

This object presents an interface for referring to, accessing and traversing a tree, usually a versioned tree, of the kind of nodes that Subversion versions.

A tree always has a root node: a completely empty tree is not allowed.

An implementation of this interface could represent, among other trees:

  • a tree in some revision of a repository,
  • the base or working version of a Working Copy tree,
  • an unversioned tree on disk (presumably without properties),
  • a temporary tree in memory, constructed on the fly.


  • find node by relpath


Abstract representation of a tree node.

  • Each node is a file, a directory or a symbolic link, and has a set of properties.


  • relpath (relative to root of the tree)
  • kind (dir, file, symlink)


  • read props
  • read children -> list of svn_tree_node_t

  • read text

Operations On Trees

  • Walk a tree -- visit each directory or each node
  • Walk two trees in parallel
  • Diff two trees, generating an edit stream (svn_editor_t?)
  • Patch a tree with an edit stream (svn_editor_t?), generating a new tree

Tree Walking

Let's use the style of Python's os.walk(dir_path). That is:

  • Visit each directory rather than every node.
  • At each visit, provide both the list of children that are subdirectories and the list of non-directory children ("files").
  • If walking in depth-first (top-down) order, potentially allow the callback to modify the list of subdirectories before we recurse into them.

The Python style is that the walker acts as a "generator" of (dir_path, subdirs[], files[]) tuples. Implementing in C for Subversion, we'll use a callback "visitor" function instead, receiving those things as arguments, and also receiving a baton for closure.

Benefits of a per-dir walk:

  • Efficient -- certain operations can be once per directory rather than once per node.
  • Extensible -- For example, if the callback can modify the list of subdirs to be walked, it can control the depth of recursion and the order of visiting, and even to do things like notify the walker about directories that it has removed or new directories that it has added.
  • Easy to create a wrapper that provides per-node visiting on top of this, whereas wrapping in the opposite direction is not easy.

Open-dir / close-dir:

  • This kind of walking with a single "visitor" callback can't easily provide "open-dir" and "close-dir" notifications.
  • In top-down per-dir walk, "open-dir" is implicit: each visit implies an open-dir; in a bottom-up walk, each visit implies a close-dir.
  • It would be possible to add specific notifications of both "open" and "close" if necessary, with additional callbacks.

Traversal order:

  • Hierarchical (top-down or bottom-up) order makes sense when traversing a single tree.
  • For parallel traversal of two (or more) trees, when moves/renames involved, it doesn't: at most one of the trees can be traversed in hierarchical order, and the other(s) need "random access" to follow moves of corresponding nodes.
  • For parallel traversal, other options include:
    • node-id order

TreeApi (last edited 2012-11-09 17:01:49 by JulianFoad)