How to Add Moves to Svn

Summary

Subversion needs to handle moves and renames better than in version 1.8. This paper presents the rationale and a plan for doing so.

To achieve the behaviour that users expect, it is necessary to distinguish a move from a copy-and-delete, and thus necessary to track moves explicitly. We therefore introduce moves into Subversion as a new semantic operation which preserves node identity and is distinct from copy-and-delete.

We will preserve backward compatibility between move-aware clients and move-unaware repositories, and between move-aware repositories and move-unaware clients. In all cases, simple compatibility will be available by falling back to copy-and-delete. In some cases, heuristic detection of moves may be offered as an option.

Move vs. Rename

We say “move” or “rename” interchangeably for most purposes. Their essential similarities include the concept of a preserved node identity. It can be useful sometimes to draw a distinction. When merging a rename-only (A/foo → A/bar) with a move-only (A/foo → B/foo) we can suggest that the most likely merge resolution would be to apply both the move and the rename (→ B/bar).

Why Not Just Copy and Delete?

I believe we need explicit move semantics, although it is difficult to succinctly explain why.

In many simple cases, a copy and a delete are adequate to represent and convey a move. Clearly, Subversion has worked well enough in this way for many users. However, for many others the current level of support for moves simply does not work.

The big question is: do we definitely need to represent moves as a distinct operation, or should we continue to represent moves as copy and delete but make the interpretation of these cleverer? Is the deficiency in the semantics, or is the deficiency something we can rectify by making the merge algorithm and the conflict resolution smarter?

When is copy-and-delete not adequate? It mainly shows up in merging. We want to merge all the changes destined for node 'foo' in the source branch into the corresponding node in the target branch, even though the corresponding node has been renamed to 'bar'. We need to be able to calculate a tree difference between two or three trees, in which each node in one tree is matched against the corresponding node in another tree, and the correspondence needs to follow renames.

How should the semantics of “move” differ from those of copy and delete? After all, it would not matter whether we distinguish a move from a copy and delete internally if we end up applying the same behaviour for both.

…?

Some previous ideas about how much we need to track moves explicitly:

Arguments against treating any copy in the same way as the copy half of a move:

Combining Changes

The problems with copy-and-delete boil down to various kinds of ambiguity, inconsistency or non-determinism. Many of these are related to the problem of representing a sequence of changes as a single change. It is fundamental in a version control system to be able to update, merge or diff between two widely separated revisions without having to step through all the intermediate revisions in sequence, and so it is necessary to have an unambiguous way of combining successive changes. If we attempt to interpret copy-and-delete as a move, that leads to ambiguous or context-dependent results when combining changes.

In one context, a certain copy and delete can be paired uniquely and thus interpreted as a move, while in another context the same copy and delete are not unique or are not both visible.

Move Semantics

This section specifies the logical semantics of the versioned move operation that is the basis of move tracking, independent of any implementation.

Node-Line-Id

Assume that each versioned path in a revision has an identifier that we will call its node-line-id. The node-line-id need not physically exist: it is a concept used in the definition of moves but not necessarily in the implementation.

The node-line-id is preserved when the content of a node is modified. A new node-line-id is assigned to every new node that is created by addition or by copying, including when it replaces a previous node at the same path. This new node-line-id is unique within the whole repository. Within any given revision, each node-line-id is unique among all the paths.

The node-line-id is similar to the (node-id, copy-id) tuple in the existing Subversion filesystems, except that the lazy-copy mechanism does not assign a new copy-id to a child of a copy until that child (or one of its descendants) is modified. Therefore an unmodified child of a copy has the same (node-id, copy-id) as the corresponding child path of the copy source, whereas (by definition) it has a new node-line-id.

Let the term node-line refer to the set of PATH@REV locations that have a given node-line-id.

Definition of Move

Given a node-line N and two revisions rX and rY (X < Y), the definition of N being moved in rY with respect to rX is:

Properties of Move

Some properties of the move relationship are:

In the notation A@X: A represents a parent directory node-line-id and child name (rather than a full path), A != B, B != C, C != A; and X represents a revision number, X < Y < Z.

Move versus Rename

In the versioned data model semantics, “move” refers to a change of parent node and/or a change of name.

At a higher level of semantics, for example when resolving conflicts during merge, it can be useful to distinguish between renaming and moving to a different parent node.

Can't Move a Child of a Copy

Moving a child of a copy, within the same revision, is not tracked: it is an unversioned operation.

A versioned "move" takes a node that existed in the previous revision and places it in a new location. A copy, however, always creates new nodes, conceptually, even if the internal representation is a "lazy copy" pointer to the old node. Moving a child node therefore is a rearrangement of the new content. It is semantically the same as deleting the child node and creating a copy of it somewhere else. Compare with copying a node and then moving that copy somewhere else.

If we perform a copy and then move a child of it, either in a WC or in a repository, this should create a copy with a deleted child, and then another copy somewhere else which is the "moved" child in its new location. We can describe the relationship between the initial and final states perfectly well without saying "move", in the form "the subtree at path P@X is copied to path P1@Y, except for its child C which is copied to path P2/C@Y instead". Thus there is no loss of semantic information despite the absence of “move” in the result.

Resurrecting a Deleted Node

A possible extension to the move semantics would be to allow a previously deleted node to be resurrected at the same location or at a different location.

The mental model is this. When a node is deleted, it is unlinked from the versioned tree, but its content continues to exist in the repository. When the node is resurrected, a link to that last version of its content is put into the versioned tree, like undoing the delete. The new link can be made anywhere in the versioned tree, like undoing the delete and moving the node at the same time.

This is not merely a blue-sky curiosity: it may necessary in order to ensure logical completeness. For example, let's say the node initially at path A/foo is moved to B/foo and then back to A/foo. If we create a branch C from A, and are continuously merging all changes under A into C, then C/foo will be deleted and will later be recreated. Or if 'svnsync' is replicating only subtree 'A' of repository R1 into repository R2, then in repository R2 we will see A/foo disappear and then reappear. In cases such as these, the 'reappearance' should be modeled as a resurrection; if it is modeled as a plain add or as a copy, it will not have the correct semantics of being the 'same' node that it was before.


System Overview

Move support can be added in phases. The “core components”, outlined in yellow in the following diagrams. must be upgraded to get a basic level of support in which commits and updates support moves. The other components, including merge, can be supported later.

Client side:

SystemBlockDiag-p1.png

Server side:

SystemBlockDiag-p2.png

Core Components

Client ↔ RA ↔ Repos

The only Client → Repos op affected is Commit. (There are also simple commit actions, of which 'move URL URL' is probably the only relevant one.) Commit will use a move-aware delta-editor. We already have local moves in the WC, so we just need to describe those to the editor as moves.

gstein sez: RA_local's Ev2 interface (see svn_ra__get_commit_ev2) will transfer a move from the RA editor all the way to the FS API. See libsvn_fs/editor.c:move_cb(). When the FS grows an API, then RA_local will drive it.

The Repos → Client ops affected are Diff, Update, Switch, Status. These all work in a very similar way. Each one sends a Report and receives a delta-edit. They will use a move-aware delta-editor.

Commit

WC descibes each move from the WC DB to the move-aware editor.

When not using a move-aware editor, WC describes each move from the WC DB as copy & delete in the old way.

gstein sez: why would you ever have a non-move-aware editor? See the ev2-export branch for a commit process that drives an Ev2 editor (meaning: a move-aware editor)

Update/Switch/Status/Diff

WC receives each move from the move-aware editor.

WC performs the move. (non trivial)

Moves do not conflict with edits (of a file and/or of a tree).

When not being driven by a move-aware editor:

Repos ↔ FS ↔ FSFS

TODO...

gstein sez: Repos has an Ev2 commit editor, which ra_local can use. This drives the FS Ev2 commit editor. Currently, the moves are transformed into copy/delete, but that can be fixed "trivially" by adding an FS API (which does copy/delete under the covers) and converting fs/editor.c:move_cb() over to using it. Then the move issue is completely within FS.

Within FSFS

Introduce 'move' as a new, distinct operation. Add move-tracking APIs.

In existing FSFS (format 6)

Alter the node id and copy-id assignment rules.

Adjust implementation of existing APIs to see those moves as copies (for back-compat).

Add new APIs as detailed below.

In a new FSFS

Move will be a first-class operation.

New FS APIs

Provide new APIs that see moves as moves:

TODO...

Reporter

Needs no changes. (The reporter does not report changes, it just reports a base state.)

Tree Editor

For a move-aware editor, svn_editor_t (“Ev2”) is the goal. It was designed to support moves, and to have several other advantages over the old editor.

I have some difficulties with the description of move support in it, which we need to resolve.

There is also a possibility of transparently sending move information over the old svn_delta_editor_t. This can be a useful short-term tool for getting some client and server components upgraded before tackling the protocols and intermediate APIs.

Moves in svn_editor_t

...

Ways to Transmit Moves over Ev1 and similar

Here are some notes on ways to transparently send move information over the old svn_delta_editor_t, or into a similar API such as the FS vtable.

Using these methods, a move-aware producer will drive the existing editor interface in a way that is (more or less) backward compatible with existing consumers.

Add a 'move' operation and negotiate the use of it

Introduce a 'move' method in the vtable.

Use feature negotiation, when driving or receiving an edit over the network to a potentially older client or server), to decide whether the 'move' operation is allowed. When driving an edit, we must send copy & delete instead of move. When receiving an edit, we cannot always assume copy & delete means move because, when a move-aware edit driver sends or receives copy & delete, it does not mean move.

Entry-props method for Ev1 (backward-compatible)

Augment the existing 'delete' and 'copy' ops with move info that is ignored by old consumers. Src path and Dst path of a move are transmitted in entry-props before (or during) the Del and Add.

A move-aware consumer will process a pair of 'add' and 'delete' ops as a move if the additional move info is present, or in the old way if not. Thus the scheme is backward compatible in both directions.

The entry-props can be attached to any convenient path. A convenient choice is the parent dirs of the two operative paths.

Copy-from-rev = -1 method (not compatible)

For certain interfaces, including certain instances of Ev1, we can define a particular consumer will perform a move when it receives

Add(copy-from-path = X, copy-from-rev = -1)

which means “move from X to here”.

The consumer has to handle each Delete in such a way that it can later be converted to a move if and when a move arrives.

This is not backward-compatible with existing editor transports nor with existing consumers. Consumers obviously would fail, and editor transports often assert that the copy-from-rev is valid when copy-from-path is valid.

This method can be used where back-compat is not needed, such as in the RA-local interfaces, if it has any benefit over the entry-props method.

Both methods: careful with the paths

Define the frame of reference for the paths carefully. In a nested move, the delta editor will see the 'move from' path in one frame of reference during the Del operation, and in another frame of reference during the Add op.

In a sane editor drive [1], once a path has been added it is not subsequently moved, so any Add path corresponds directly to a path in the final state of the tree. However, after a Del (mv-away), any of the implicitly deleted children of that subtree may subsequently be the source of a move.

Other Components

Merge

Merge is too big a topic to discuss here.

Diff

Enhance the diff output formats to show moves:

Backward Compatibility

We can and will preserve backward compatibility between move-aware clients and move-unaware repositories, and between move-aware repositories and move-unaware clients. There are two complementary parts to this:

Heuristic Detection of Moves

The server could perform heuristic detection of moves when an old client is committing.

The client could perform heuristic detection of moves when an old server is sending an update (or diff or merge etc.).

We could offer to perform heuristic move detection when upgrading an old repository, almost certainly as an off-line operation. We could potentially implement this in any of: svndumpfilter, svnsync, svnadmin load, svnadmin upgrade, etc.

Repo Format Bump

It is essential that the repository filesystem 'knows' whether move semantics are enabled, because copy-and-delete must then no longer be interpreted heuristically as a move. This could be indicated by bumping the FS format number, if it applies to the whole repository, or potentially we could mark that all revisions after a certain point have move semantics enabled whereas prior revisions don't.

MoveDev/MoveDev (last edited 2013-09-05 11:38:42 by JulianFoad)