Design notes, plans and ideas.

This is not part of the official documentation of Subversion. It is aimed at the developers of Subversion, and may not reflect reality or the project community's plans. For official documentation, see http://subversion.apache.org/docs/ .


What Merges Does Svn’s Merge Tracking Support?

We need to be able to say, “If you do all your merging like <this>, it will work like you expect.” So what is <this>? What are the supported scenarios, limitations and rules, and what can the user expect within and outside those scenarios?

The information in this document is not intended to repeat the functional documentation of the “merge” command itself, but rather to explain the ways in which that command can be used effectively.

Merge Tracking vs. Merging

Merging is a broad term. Here we’re talking about merging changes that have been made on one branch into another branch (1). The whole process of merging includes several steps which may be manual or automated:

Merge tracking is a mechanism by which Subversion remembers which changes you've merged from one branch to another and uses this information to honour a request to merge "all the unmerged changes" from one branch to another. Merge tracking is about deciding which changes to merge, and about recording which changes have been merged. As far as merge tracking is concerned, a "change" means the change that was committed in a particular revision to a particular node or tree.

A related but different issue is how to apply the selected changes to the target branch. The task of applying the changes to the target working copy can be called tree-merging: that is, taking a diff between two trees and applying this diff, node-wise and text-wise, to a third tree. In other words, looking at a given change on branch A, how should we edit branch B so as to make the "same" change? Observe that the current state of branch B is probably not quite the same as branch A was when the change was made on branch A, so we cannot use precisely the same edits. For this task we currently use a built-in algorithm at the tree structure level (deleting, adding and replacing nodes), and a three-way merge algorithm on the content of text files (deleting, adding and replacing lines of text).
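To make the text-level part concrete, here is a toy three-way merge. It is a minimal sketch, not Subversion's implementation: it assumes the three versions are already aligned line by line (no inserted or deleted lines), which a real merge tool would first establish with a diff. In the terms used above, `base` is the file on branch A before the change, `theirs` is the file on branch A after the change, and `ours` is the current file on branch B; the function name `merge3_aligned` is hypothetical.

```python
from typing import List, Tuple

def merge3_aligned(base: List[str], ours: List[str],
                   theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Toy line-wise three-way merge.

    Assumes the three versions are already aligned line by line (equal length,
    lines only modified, never inserted or deleted). Real merge tools first
    align the versions with a diff algorithm.
    Returns the merged lines and the indices of conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles aligned, equal-length inputs")
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:       # both sides agree, or neither side changed the line
            merged.append(o)
        elif o == b:     # only the incoming change touched this line
            merged.append(t)
        elif t == b:     # only the target branch touched this line
            merged.append(o)
        else:            # both changed it differently: leave a conflict marker
            merged.append(f"<<<<<<< {o} ======= {t} >>>>>>>")
            conflicts.append(i)
    return merged, conflicts
```

For example, `merge3_aligned(["x = 1", "y = 2"], ["x = 1", "y = 3"], ["x = 10", "y = 2"])` returns `(["x = 10", "y = 3"], [])`: the change from branch A is applied on top of branch B's own edit, which is exactly the "same change, different edits" situation described above.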

So merge tracking is a layer on top of tree-merging, with a clear separation of tasks.

This document is concerned primarily with the kinds of merges in which Subversion’s merge tracking takes effect -- mainly the automatic merges (also known as sync or catch-up merges, and reintegrate merges) which merge all of the unmerged changes from the source branch, but also including cherry-pick merges where merge tracking takes effect.

Identifying Logical Changes vs. Commits, and Why Conflict Resolution is Part of Merging

To discuss the nuances of what exactly is tracked in Merge Tracking, we need to think about the difference between a raw change in the repository (a commit) and a concept that we can call a logical change.

This subsection describes a theoretical basis for understanding merging. It does not directly describe the behaviour of Subversion's merge tracking at the moment.

Whenever I commit a change that’s not a merge, we can regard that commit as introducing a new logical change into my project, on a given branch. But when I merge that change to another branch and commit the result, that commit is not creating a new logical change, rather it is creating a new physical representation (in the target branch) of the same logical change. In simple cases the physical change often looks “the same” when viewed as a diff, perhaps with the exception of line numbers and surrounding context lines. In trivial cases, such as a catch-up merge to a target branch that does not yet contain any modifications of its own, the physical change is identical. In general, however, the physical change on the target branch is necessarily different from the source branch change. It is adapted automatically (by Subversion or the user's configured 3-way merge tool) and/or manually (through conflict resolution), to fit the current state of the target branch. (2) (3)

The distinction between a commit and a logical change is important for merge tracking. The whole purpose of merge tracking is to decide whether to select a given change on the source branch to be merged onto the target branch or whether that logical change has already been put there. The question is about whether the same logical change is already present on the target branch. The question is not about the physical representation of that change, since we don't expect the change on the target branch to consist of exactly the same physical edits as it did on the source branch. Nor do we care whether the change arrived on the target branch by being merged directly from the original revision on the original branch A where that logical change was first committed to the repository, or whether it arrived in the middle of a merge of a batch of changes from branch C, if branch C received the change in a merge from branch B which was merged from branch A. When we want to merge "everything" from branch C to branch A, the merge tracking needs to be able to consider the possible changes to be merged from branch C and realize that the change at revision 130 on branch C is just a merge from branch B which was itself a merge from branch A, and thus we must not try to merge revision 130 of C to A because we already have the logical equivalent of that change in branch A.
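The C:130 example can be modelled by following merge links back to the original commit. This is a sketch of the theoretical "logical change" view, not of what Subversion currently does; the `Commit` record, the `merged_from` links and the revision numbers A:100 and B:120 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

Change = Tuple[str, int]  # (branch, revision), e.g. ("A", 3) for A:3

@dataclass
class Commit:
    # Hypothetical: changes this commit merged in; empty for an original commit.
    merged_from: Set[Change] = field(default_factory=set)

def logical_changes(change: Change,
                    history: Dict[Change, Commit]) -> FrozenSet[Change]:
    """Return the original commits whose logical changes `change` carries.

    Follows merge links recursively, so a merge of a merge resolves to the
    original commit. Assumes the history contains no merge cycles.
    """
    commit = history[change]
    if not commit.merged_from:
        return frozenset({change})  # an original commit is its own logical change
    result: Set[Change] = set()
    for source in commit.merged_from:
        result |= logical_changes(source, history)
    return frozenset(result)

def already_present(change: Change, target_logical: Set[Change],
                    history: Dict[Change, Commit]) -> bool:
    """True if every logical change carried by `change` is already on the target."""
    return logical_changes(change, history) <= target_logical

# Hypothetical history: A:100 is original, B:120 merged it, C:130 merged B:120.
history = {
    ("A", 100): Commit(),
    ("B", 120): Commit({("A", 100)}),
    ("C", 130): Commit({("B", 120)}),
}
```

Here `logical_changes(("C", 130), history)` resolves to `{("A", 100)}`, so a merge from C to A that already contains A:100 should skip C:130.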

In order to make sense of this, I consider the definition of a tracked change to be:

The meaning or purpose of a logical change is typically described in the log message of the original commit. A realistic example is "Delete function foo() and change all callers to use the similar function bar() instead".

Merge tracking should treat every original commit as that kind of "logical change". This definition doesn't presuppose that every commit is *actually* relevant to every other branch in your project. If and when you want to merge this change to a branch that doesn't have a function foo() and code that calls it, then clearly you have a logical conflict between the states of the two branches. This change and almost certainly a series of other changes will either be inapplicable or will need some minor or major editing to resolve the logical conflict to make them applicable. Resolving such conflicts, including omitting changes by hand that the system says logically should be merged, is a normal and expected part of merging.

The merge algorithm cannot possibly "know" by inspection whether the physical change that was committed as a merge is a faithful representation of the set of logical changes that the mergeinfo claims it is. As far as merge tracking is concerned, if the mergeinfo says a given logical change was merged, it was merged. User interfaces should try to assist the user in understanding this.

The UI may also provide the user with a way to tell Subversion that he/she has merged a given logical change into the WC manually (not using the 'merge' command), or has merged a change into the WC and then decided to remove it before committing and not record it as merged. The command-line interface for these is "svn merge --record-only".

IMPLEMENTATION

The merge tracking implementation is written in terms of tracking physical changes (commits), not logical changes. In simple cases that gives exactly the same result. In particular, it gives the same result when merging from branch A to branch B if we have only ever been merging from branch A to branch B. When we introduce a third branch C, however, and wish that merge tracking would continue to work when we merge from A to B to C to A to C to B to ..., that's when we need to make the distinction.

Since Subversion 1.5 (through 1.8 at least), the merge tracking stores mergeinfo transitively, so after merging a change from A to B to C, the mergeinfo on branch C will list both the change on branch B that was merged directly to C, and also the change(s) in branch A that were merged directly to branch B and therefore indirectly to branch C. This partly serves to make the merge tracking act as if it is tracking logical changes. A subsequent merge from A to C will not repeat the changes that are recorded as already merged (indirectly) from A to C, and so in that case it correctly avoids re-merging the logical changes from A that are already on C. But if we do things the other way round (first merge from A to C, and then from A to B to C), the last merge (B to C) will wrongly repeat the changes that already arrived on C in the first merge (directly from A).
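The order dependence can be reproduced with a toy model of transitive mergeinfo. This is a sketch under simplifying assumptions, not Subversion's data model: each branch keeps a set of recorded `(branch, revision)` changes, a catch-up merge considers only the immediate source branch's own commits, and the class `ToyRepo` with its fields is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Change = Tuple[str, int]  # (branch, revision), e.g. ("A", 3) for A:3

class ToyRepo:
    """Hypothetical toy model of transitive mergeinfo; not Subversion's real model."""

    def __init__(self) -> None:
        self.next_rev = 1
        self.revs: Dict[str, List[int]] = {}          # branch -> revisions committed on it
        self.carried: Dict[Change, Set[Change]] = {}  # commit -> changes it merged in
        self.mergeinfo: Dict[str, Set[Change]] = {}   # branch -> changes recorded as merged

    def commit(self, branch: str, carried: Iterable[Change] = ()) -> Change:
        change = (branch, self.next_rev)
        self.next_rev += 1
        self.revs.setdefault(branch, []).append(change[1])
        self.carried[change] = set(carried)
        self.mergeinfo.setdefault(branch, set())
        return change

    def catch_up(self, src: str, tgt: str) -> Tuple[Set[Change], Set[Change]]:
        """Merge every commit of `src` not yet recorded on `tgt`.

        Returns (applied, repeated): the source commits that get applied, and
        the changes they carry that `tgt` had already recorded.
        """
        recorded = self.mergeinfo.setdefault(tgt, set())
        # Candidates are only the immediate source branch's own commits.
        applied = {(src, r) for r in self.revs.get(src, [])} - recorded
        if not applied:
            return set(), set()
        carried: Set[Change] = set(applied)
        for ch in applied:
            carried |= self.carried[ch]
        repeated = carried & recorded
        self.commit(tgt, carried)
        # Transitive recording: the applied commits plus the source's own mergeinfo.
        recorded |= applied | self.mergeinfo.get(src, set())
        return applied, repeated

# A -> B -> C, then A -> C: nothing is re-merged.
r1 = ToyRepo()
r1.commit("A")
r1.catch_up("A", "B")
r1.catch_up("B", "C")
first_order = r1.catch_up("A", "C")

# A -> C, then A -> B -> C: the merge commit B:3 is applied and repeats A:1.
r2 = ToyRepo()
r2.commit("A")
r2.catch_up("A", "C")
r2.catch_up("A", "B")
second_order = r2.catch_up("B", "C")
```

In the first order `first_order` is `(set(), set())`; in the second, `second_order` is `({("B", 3)}, {("A", 1)})`, i.e. the B-to-C merge re-applies the logical change A:1 that C already has.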

Merging Scenarios for Subversion 1.5 through 1.7

This section aims to describe the restrictions that should be observed in order for merge tracking to remain effective in Subversion 1.5 through 1.7.

Notation:

    • A, B, C, … -- branches
    • A:3 -- the change in branch A that was committed as revision 3
    • A ⇒ B -- a high-level relationship assumed between branches A and B, in the indicated direction
    • A → B -- a merge from branch A to branch B

General Concepts

What’s a “Change”?

The unit of change that Subversion tracks is the change that was committed in a specific revision, scoped pathwise to a specific subtree.
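Subversion records these units in the svn:mergeinfo property of the merge target, as lines of the form "source path:revision ranges". The parser below is a simplified sketch for illustration: it accepts ranges such as `3-5` and single revisions, strips the trailing `*` that marks non-inheritable ranges without modelling that distinction, and does no validation. The function name `parse_mergeinfo` and the example paths are hypothetical.

```python
from typing import Dict, Set

def parse_mergeinfo(prop: str) -> Dict[str, Set[int]]:
    """Parse a simplified svn:mergeinfo property value.

    Each line is "<source path>:<rangelist>", e.g. "/branches/A:3-5,9".
    A trailing "*" (non-inheritable range) is accepted but ignored here.
    """
    result: Dict[str, Set[int]] = {}
    for line in prop.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at the last colon so the path part may itself contain colons.
        path, _, ranges = line.rpartition(":")
        revs = result.setdefault(path, set())
        for item in ranges.split(","):
            item = item.strip().rstrip("*")
            if "-" in item:
                lo, hi = item.split("-")
                revs.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
            else:
                revs.add(int(item))
    return result
```

For instance, `parse_mergeinfo("/branches/A:3-5,9\n/branches/Z:7*")` gives `{"/branches/A": {3, 4, 5, 9}, "/branches/Z": {7}}`: each entry is a set of revisions scoped to one source path.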

Tracked Merge

  1. When the “merge” command updates the mergeinfo of the target branch to record the merge, the result is the kind of merge that Subversion is able to track. In this document, “a merge” generally refers to this kind of merge.
  2. A tracked merge is transitive. In merging A:10 from A to B, if A:10 was itself a merge that brought in one or more recorded changes from elsewhere (say Z:9), then as well as recording A:10 on B, we also record Z:9 on B.

Non-Tracked Merge

  1. You can use the “merge” command to merge changes in such a way that the committed result is not recorded as a merge. In terms of merge tracking, Subversion sees this commit as an original change, not as a merge.
  2. When would a non-tracked merge be useful? Matt Phipps wrote to dev@ to suggest one case: A merge like that would make sense if you wanted to do a Git-style rebase for whatever reason. You would create a new project branch off your development branch, cherry-pick all the non-merge changes from the old project branch onto it (possibly grouping or reordering or deleting commits), then delete the old branch. The new branch shouldn't have mergeinfo from the old branch since the old branch is getting deleted and replaced.
  3. ### WHICH MERGE COMMAND VARIANTS PERFORM A NON-TRACKED MERGE? WHEN DO THEY HAPPEN SILENTLY? One example of a change that is "silently" untracked is a reverse merge from the branch's own history.

Record-Only Merge

  1. A “record-only” merge acts like a normal merge except that it does not make edits to the target branch; it only updates the mergeinfo on the target branch as if that merge had happened. After a record-only merge has been committed, although only the mergeinfo changed in that commit, the merge tracking logic sees that commit as the time when the merge (the one that is now recorded) was performed. It doesn’t notice that there is no physical change in the commit. It doesn’t care whether the physical change was in fact merged into this target branch in the past or will be merged in the future. If and when the physical change is merged as a non-tracked merge, that commit will be regarded as an original change and not as a merge.
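The difference between a normal and a record-only merge can be shown with a tiny model in which a branch is just file contents plus a set of recorded changes. This is a hypothetical sketch (the `Branch` type, `merge` and `eligible` are illustrative names, not Subversion APIs); on the command line the corresponding operation is "svn merge --record-only".

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Change = Tuple[str, int]  # (branch, revision), e.g. ("A", 10) for A:10

@dataclass
class Branch:
    content: Dict[str, str] = field(default_factory=dict)  # file -> text
    mergeinfo: Set[Change] = field(default_factory=set)    # recorded merges

def merge(target: Branch, change: Change, edits: Dict[str, str],
          record_only: bool = False) -> None:
    """Apply `edits` (file -> new text) for `change` to `target` and record it.

    With record_only=True the edits are skipped; only the mergeinfo changes.
    """
    if not record_only:
        target.content.update(edits)
    target.mergeinfo.add(change)

def eligible(source_changes: Set[Change], target: Branch) -> Set[Change]:
    """Changes a later automatic merge would still select."""
    return source_changes - target.mergeinfo

b = Branch({"foo.c": "old"})
merge(b, ("A", 10), {"foo.c": "new"}, record_only=True)
```

After this, `b.content` is unchanged, yet A:10 is no longer eligible: from merge tracking's point of view it has been merged.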

The Immediate Source Branch

  1. Merge tracking only notices what changes have been merged from the immediate source branch. Using automatic (catch-up) merges, say we merged A:10 into B as B:13, and A:10 into C as C:14, and now we are about to merge “all unmerged changes” from B to C. When considering what to merge from B, we only look at what revisions on the source branch B are recorded as having been merged into C, and let’s say we find all changes from B are recorded there except B:13. Subversion will try to merge the commit B:13 from B to C, and it will conflict, at least logically and probably physically, because it is just an adjusted version of the original change A:10 which has already been merged into C directly. This is a limitation of the merge tracking algorithm.

The Feature Branch Scenario

This scenario is a way of using merges with a branch that has a limited lifetime and is going to be merged back into the branch that it is based on. A typical usage is to develop a new feature or a non-trivial bug fix in a software project.

Suppose the purpose of branch B is to develop some changes based on branch A. From time to time we will bring all the latest changes from branch A onto B, so that the development branch B is always based on a recent snapshot of the state of branch A. When finished, we will merge the changes that were developed on branch B into branch A. At that time, we may have finished with branch B, or we may want to continue with some further development on it and merge that second phase of development to branch A, and so on, but eventually we expect to finish with branch B.

We call this relationship A ⇒ B the Feature Branch relationship.

  1. Branches A and B must be ancestrally related. Usually, B is created as a copy of A, but any history in which the nodes A and B share a common ancestor is acceptable.
  2. Merges are performed from and to the root of a branch, unless otherwise stated. (See Subtree Merges.)

Catch-Up, aka Sync

  1. An “automatic” merge A → B, at any time, brings B “up to date” with A by merging all revisions from A that are not already (recorded as being) in B, into B. The revisions are not merged individually but in batches, where each batch is as big as possible.
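The batching can be sketched as splitting the source's revision range into maximal runs that are not yet recorded on the target. This is a simplified illustration assuming the only thing that breaks a batch is a revision already recorded as merged; `catch_up_batches` is a hypothetical helper, not part of Subversion.

```python
from typing import List, Optional, Set, Tuple

def catch_up_batches(first: int, last: int,
                     merged: Set[int]) -> List[Tuple[int, int]]:
    """Split revisions first..last into maximal runs not yet recorded as merged.

    Returns inclusive (start, end) pairs; each pair would be merged as one batch.
    """
    batches: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for rev in range(first, last + 1):
        if rev in merged:
            if start is not None:          # a recorded revision ends the batch
                batches.append((start, rev - 1))
                start = None
        elif start is None:                # first unmerged revision of a batch
            start = rev
    if start is not None:
        batches.append((start, last))
    return batches
```

With revisions 1 to 10 on A and revisions 4, 5 and 8 already recorded on B, `catch_up_batches(1, 10, {4, 5, 8})` yields `[(1, 3), (6, 7), (9, 10)]`: three batches instead of seven single-revision merges.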

Reintegrate

  1. Reintegrate B → A, when B is sufficiently up to date with A.
  2. After a reintegrate, before any further catch-up or reintegrate merges between A and B, it is necessary to do a record-only merge from A to B of the revision in which the reintegrate was committed. See the “Keeping a Reintegrated Branch Alive” section in the Subversion Book for details.
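The reason for that record-only step can be shown with a few sets. This is a hypothetical sketch: B:5 stands for the feature work on B, A:6 for the commit that reintegrated B into A, and eligibility is computed only from the immediate source's own commits, as described under The Immediate Source Branch.

```python
from typing import List, Set, Tuple

Change = Tuple[str, int]  # (branch, revision)

def eligible(source: str, source_revs: List[int],
             target_mergeinfo: Set[Change]) -> List[Change]:
    """Source commits not yet recorded on the target (immediate source only)."""
    return [(source, r) for r in source_revs if (source, r) not in target_mergeinfo]

# Hypothetical history: B was branched from A; B:5 is feature work on B;
# A:6 is the commit that reintegrated B into A.
a_revs = [6]                         # A's commits since B was branched
b_mergeinfo: Set[Change] = set()     # what B has recorded from A so far
carried_by = {("A", 6): {("B", 5)}}  # A:6 merely re-applies B's own work

# Without the record-only step, the next catch-up would pick A:6 and try to
# apply B's own changes to B a second time.
pending = eligible("A", a_revs, b_mergeinfo)
assert pending == [("A", 6)] and carried_by[("A", 6)] == {("B", 5)}

# A record-only merge of A:6 into B (on the command line, something like
# "svn merge --record-only -c 6 ^/A", with a hypothetical URL) only updates
# B's mergeinfo...
b_mergeinfo.add(("A", 6))
# ...so the next catch-up from A to B no longer selects it.
assert eligible("A", a_revs, b_mergeinfo) == []
```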

Cherry-Pick

  1. A → B. You can cherry-pick any individual revision or revision range from A that is not already recorded on B.
  2. B → A. You can cherry-pick any individual revision or revision range from B that is not already recorded on A and is not itself the result of a merge from A. (If it is the result of a merge from A, then ###.)
  3. You should NOT cherry-pick a change from A to B or from B to A that is already recorded on the target branch. (Subversion attempts the merge but does not record it.) (### Are we sure?)
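The cherry-pick rules above can be expressed as a small check. This hypothetical helper encodes only what the list states: a change already recorded on the target is refused in either direction, and for B → A (source is the child branch) a change that is itself the result of a merge from A is also refused. The `mergeinfo` and `carried` maps are assumed inputs, not Subversion structures.

```python
from typing import Dict, Optional, Set, Tuple

Change = Tuple[str, int]  # (branch, revision)

def cherry_pick_problem(change: Change, target: str,
                        mergeinfo: Dict[str, Set[Change]],
                        carried: Dict[Change, Set[Change]],
                        source_is_child: bool) -> Optional[str]:
    """Return None if the cherry-pick follows the rules, else the reason.

    `mergeinfo[b]` is what branch b has recorded as merged into it;
    `carried[c]` lists the changes that commit c itself merged in.
    """
    if change in mergeinfo.get(target, set()):
        return "already recorded on the target branch"
    if source_is_child and any(b == target for b, _ in carried.get(change, ())):
        return "is itself the result of a merge from the target branch"
    return None

mergeinfo = {"A": {("B", 7)}}
carried = {("B", 8): {("A", 3)}}  # B:8 was a catch-up merge of A:3
```

With these inputs, cherry-picking B:7 or B:8 to A is refused, while an original change such as B:9 is accepted.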

Multiple Branches

A rough summary of the multiple-branch rules would be: "You must not merge changes between any two branches A and B through more than one route."

  1. The Feature Branch relationship can be daisy-chained.

    • A ⇒ B, B ⇒ C, C ⇒ D, …
  2. The Feature Branch relationship can be applied one-to-many.

    • A ⇒ B, A ⇒ B2, A ⇒ B3, …
  3. The Feature Branch relationship cannot be applied many-to-one.

    • NO: A ⇒ B, A2 ⇒ B, A3 ⇒ B, …

    (That would be a very strange and probably nonsensical branching pattern.)
  4. The Feature Branch relationship cannot be applied with cycles in the graph of relationships.

    • NO: A ⇒ B, B ⇒ A

    • NO: A ⇒ B, B ⇒ C, C ⇒ A

  5. The Feature Branch relationship cannot be applied with multiple paths in the graph of relationships.

    • NO: A ⇒ B, B ⇒ C, A ⇒ C

    • NO: A ⇒ B, A ⇒ C, B ⇒ D, C ⇒ D
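Taken together, rules 3 to 5 say the Feature Branch relationships must form a forest: every branch has at most one parent and there are no cycles. The checker below is a sketch of that reading, using hypothetical names. Two distinct routes between branches always converge on some branch with two parents, so the parent-count check covers rules 3 and 5, and a topological sort (Kahn's algorithm) detects the cycles of rule 4.

```python
from typing import Dict, List, Set, Tuple

def check_feature_graph(rels: List[Tuple[str, str]]) -> List[str]:
    """Check (parent, child) Feature Branch relationships against rules 3-5.

    Returns a list of problems; an empty list means the graph is a forest.
    """
    edges: Set[Tuple[str, str]] = set(rels)
    nodes = {b for edge in edges for b in edge}
    indeg: Dict[str, int] = {n: 0 for n in nodes}
    for _, child in edges:
        indeg[child] += 1
    problems: List[str] = []
    # Rules 3 and 5: a branch with several parents is reachable by several routes.
    for n in sorted(nodes):
        if indeg[n] > 1:
            problems.append(f"{n} has more than one parent")
    # Rule 4: Kahn's topological sort; nodes left over mean there is a cycle.
    remaining = dict(indeg)
    ready = [n for n in nodes if remaining[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for parent, child in edges:
            if parent == n:
                remaining[child] -= 1
                if remaining[child] == 0:
                    ready.append(child)
    if removed < len(nodes):
        problems.append("relationships contain a cycle")
    return problems
```

For example, the daisy chain A ⇒ B, B ⇒ C, C ⇒ D passes, while A ⇒ B, B ⇒ C, A ⇒ C is reported because C has two parents.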

Reverse Merge

  1. A → B. You can cherry-pick A → B, the reverse of a change “c1” on A that was previously merged (and recorded as merged) to B. Subversion records that the change is no longer present on B, just as if that change had not previously been merged from A to B, and so “c1” is eligible for being included in a catch-up merge again.
  2. A → A. In any branch A you can reverse cherry-pick a change “c1” from A’s own history to undo that change. If the change “c1” was an original change in A, this is recorded as a manual edit and is not tracked as a merge. If the change “c1” was a merge (say change “c0” from branch Z) into A, or included such a merge, ### ? that’s fine too, as long as “c0” is currently recorded on A. Subversion records that the change is no longer present on A, just as if that change had not previously been merged, and so “c1” is eligible for being included in a catch-up merge again. But if “c0” is not currently recorded on A (for example, if it has already been reverse-merged once and not subsequently forward-merged), that is not valid; Subversion will perform but not record the merge.
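The bookkeeping side of a reverse merge can be sketched as removing an entry from the target's recorded set, or recording nothing if the entry is not there. This hypothetical helper only models the case where the reversed change is itself the recorded entry; it does not model the content edit, nor a merge commit that carried other changes.

```python
from typing import Set, Tuple

Change = Tuple[str, int]  # (branch, revision)

def reverse_merge(target_mergeinfo: Set[Change], change: Change) -> bool:
    """Record the undo of `change` on the target; return True if it was recorded.

    If `change` is recorded as merged, the record is removed, making the change
    eligible for a later catch-up merge again. Otherwise the edit would still be
    applied, but this sketch leaves the mergeinfo untouched and returns False.
    """
    if change in target_mergeinfo:
        target_mergeinfo.discard(change)
        return True
    return False

b_mergeinfo = {("A", 5), ("A", 6)}
undone = reverse_merge(b_mergeinfo, ("A", 5))
```

After the call, A:5 is no longer recorded on B, so a catch-up merge from A would select it again; reversing it a second time returns False, matching the "performed but not recorded" case.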

Subtree Merges

  1. ### ?

The Release Branch Scenario

In this scenario, a branch is expected to have occasional edits and occasional cherry-pick merges to and/or from the branch it’s based on, and there is no special merging to be done at its end of life. The set of rules is much simpler than for the feature branch scenario.

Multiple Branches

Cherry-Pick

  1. A → B. You can cherry-pick any individual revision or revision range from A that is not already recorded on B.
  2. B → A. You can cherry-pick any individual revision or revision range from B that is not already recorded on A.
  3. You should NOT cherry-pick a change from A to B or from B to A that is already recorded on the target branch. (Subversion attempts the merge but does not record it.)


Notes

Note 1

Other possible kinds of merging include taking any arbitrary difference between two files or directories -- perhaps on different branches or without any branching history connecting them -- and merging that difference into some other file or directory (perhaps unrelated to either side of that diff). Merge tracking does not try to address such cases, and so we do not consider them here.

Note 2

Although what we mean by the physical manifestation of a change is kind of obvious, in fact it is subtly hard to pin down, because it depends entirely on how you represent the physical change.

Note 3

Also, if this change is being merged into a target WC into which other changes have also been merged, and it touches some of the same nodes, then this change is likewise adapted (automatically and/or through manual conflict resolution) to combine with those other changes.


This page was written initially by JulianFoad and may be edited by others.

SupportedMergeScenarios (last edited 2014-06-27 10:00:02 by JulianFoad)