Design notes, plans and ideas by JulianFoad
Merge Tracking Ideas
Ways in which we could improve on the 1.6 mergeinfo and merge tracking scheme. Experimental thoughts, not fully thought out.
Logical Change Tracking
This idea came out of a discussion on IRC.
The idea is to define a more powerful form of merge tracking, as an upgrade from the current (1.6) merge tracking. The additional power is in tracking “logical changes” as they are merged from one branch to another to another in arbitrary ways, including back-and-forth and circular patterns, and still being able to say what changes we “need” to merge from branch B to branch C.
This is, I think, going some way towards what people sometimes call changeset-based merging.
- Support more flexible branching topologies. It doesn’t matter whether changes have already been merged to the target branch directly from this source branch or via any other chain of branches.
- Enable the “reintegrate” purpose to be served by the same automatic merge algorithm as is used for catch-up.
The emphasis is on being able to detect and describe what logical changes are needed, without necessarily being able to perform the merge automatically in all cases. The current state of thinking is that Subversion would be able to perform the merge when a candidate revision on the source branch is a merge of logical changes that are either all needed on the target or all not needed. But if this candidate revision is a merge and only some of the logical changes in it are needed, then Subversion would, we suppose, stop and print a helpful description of the situation. Even that level of capability is a big improvement on 1.6, and helpful diagnostics for unhandled situations are entirely the sort of assistance a user should be entitled to expect from a VCS.
We track logical changes.
User-facing goals defined in terms of merging the right set of logical changes.
Each commit is either a logical change or a merge of logical changes.
We introduce the concept of a logical change as the fundamental unit of change that is tracked. A logical change starts life as a committed change that is not part of a merge. When that tree-content change is merged to another branch (adapted if necessary to accommodate any physical and/or semantic differences between the branches), the resulting commit is not a new logical change but rather is a merge. A merge is defined as a committed change that includes a mergeinfo change that brings in one or more logical changes.
A logical change has a unique identifier (let’s say the branch and revision in which it was originally committed) and is always identified by that same identifier, no matter what branches it has been merged through or whether it has been merged together with other logical changes in a single merge.
We must be able to identify the logical changes in the system. To identify the logical changes in existing 1.6 mergeinfo, we will classify each commit as a logical change if it is a change without any mergeinfo change, or else as a merge if it includes a mergeinfo change. If it’s a merge, then it brings in some pre-existing logical changes and/or merges. By scanning recursively into mergeinfo history, we can identify all the original logical changes brought in by a merge.
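The classification and recursive scan described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Subversion code: the `Commit` record, its `mergeinfo_added` field and the `history` dictionary are hypothetical names, and the sketch assumes every commit named in a mergeinfo change is present in `history`.

```python
from dataclasses import dataclass

# Illustrative identifier for a commit: (branch, revision), e.g. ("A", 10).
ChangeId = tuple[str, int]


@dataclass(frozen=True)
class Commit:
    branch: str
    rev: int
    # Hypothetical field: commits newly recorded as merged by this commit's
    # mergeinfo change. Empty when the commit has no mergeinfo change.
    mergeinfo_added: frozenset = frozenset()


def logical_changes(commit_id, history, _seen=None):
    """Return the original logical changes that a commit is, or brings in."""
    seen = set() if _seen is None else _seen
    if commit_id in seen:
        return set()  # already visited via another path
    seen.add(commit_id)
    added = history[commit_id].mergeinfo_added
    if not added:
        # No mergeinfo change: the commit is itself a logical change.
        return {commit_id}
    # Otherwise it is a merge: scan recursively into what it brought in.
    found = set()
    for source_id in added:
        found |= logical_changes(source_id, history, seen)
    return found


history = {
    ("A", 10): Commit("A", 10),
    ("A", 11): Commit("A", 11),
    ("B", 13): Commit("B", 13, frozenset({("A", 10), ("A", 11)})),
    ("C", 20): Commit("C", 20, frozenset({("B", 13)})),
}
print(logical_changes(("C", 20), history))
```

Here C:20 is a merge of B:13, which is itself a merge of A:10 and A:11, so the scan resolves C:20 to those two original logical changes.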
The user-facing goals of merging are defined in terms of getting the right set of logical changes onto the target branch. This is in contrast to the 1.6 scheme which is defined in terms of getting the complete set of commits on the source branch onto the target branch. The difference is that we will select a commit in the source branch only if it is a logical change or if it is a merge that brings in logical changes that we don’t have; and not if it is a merge that brings in logical changes that we do already have.
Each candidate revision to merge from the source branch is merged iff it is, or is a merge that brings in, logical changes that we don’t yet have on the target.
If the candidate revision is a logical change, then we merge it iff we don’t have that logical change on the target branch (as determined by the target branch’s mergeinfo).
If the candidate revision is a merge, then we merge it if all the logical changes it brings in are ones we don’t already have, or we skip it if all the logical changes it brings in are ones we do already have.
If the candidate revision is a merge that brings in some new logical changes that we don’t have and some that we do have, then Subversion can at least bail out, telling the user which changes are present and which are to be merged. At the moment we don’t anticipate being able to untangle the relevant parts of the physical edit from the source branch, nor to fetch the required logical changes from their origin or from some other branch.
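The three cases above can be sketched as a single decision function. This is a minimal sketch, assuming that the set of logical changes a candidate revision is or brings in, and the set already on the target, have both been computed from mergeinfo. The names `Action` and `decide` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    MERGE = "merge"
    SKIP = "skip"
    BAIL_OUT = "bail out"


def decide(brought_in, on_target):
    """Decide what to do with one candidate revision from the source branch.

    brought_in: logical changes the candidate is, or brings in (non-empty).
    on_target:  logical changes already present on the target branch.
    Returns (action, needed, already_present) so a caller can report details.
    """
    needed = brought_in - on_target
    present = brought_in & on_target
    if not needed:
        return Action.SKIP, needed, present      # target has all of them
    if not present:
        return Action.MERGE, needed, present     # target has none of them
    return Action.BAIL_OUT, needed, present      # mixed: describe and stop


# Cyclic pattern A => B => A: A:10 was merged to B as B:13, and B:14 is an
# original change on B. Merging B back to A, where A:10 already exists:
on_a = {("A", 10)}
print(decide({("A", 10)}, on_a)[0])               # candidate B:13
print(decide({("B", 14)}, on_a)[0])               # candidate B:14
print(decide({("A", 10), ("C", 7)}, on_a)[0])     # a mixed merge
```

A candidate that is an original logical change is just the case where `brought_in` contains only the candidate itself, so one function covers both kinds of commit.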
The reintegrate purpose is to bring all logical changes from the source branch that are not already in the target branch. That is exactly the same as for a catch-up merge, and so the same algorithm can be used. Users might still want to specify the “--reintegrate” option because of the additional checks that it performs before merging, but that would be optional and for the user’s benefit not for the system’s benefit. A plain automatic merge would still work in that direction even if the old reintegration constraints are not met.
Migration from 1.6
See above about recursive scanning of 1.6 mergeinfo.
Retro-fitting the principle of logical changes onto an existing 1.6 merge history would seem to be a good fit, as it is already common mantra and practice to separate new logical changes from merges. The consequences if this principle has occasionally not been followed in the past would seem to be predictable and relatively straightforward to recover from. Where the history has been altered by record-only merges or direct editing or removal of mergeinfo, however, this method of classifying old commits may be untenable or need augmenting with user input. (Investigate?)
Other issues to explore / define
Reverse merges — first need to define basic semantics before can contemplate supporting.
Subtree merges — semantics?
Merging into a mixed-rev WC — what special considerations apply? Does it help to remember that a WC being merged into is commonly acting as a proto-revision?
MI storage — semantics and format. per branch? whole history in one place?
Rules (differences from 1.6)
###? The Feature Branch relationship can be applied with cycles in the graph of relationships.
- A ⇒ B, B ⇒ A
- A ⇒ B, B ⇒ C, C ⇒ A
###? The Feature Branch relationship can be applied with multiple paths in the graph of relationships.
- A ⇒ B, B ⇒ C, A ⇒ C
- A ⇒ B, A ⇒ C, B ⇒ D, C ⇒ D
- Editable merge history. (Because it increases reliance on the correctness of mergeinfo, and especially mergeinfo changes, which in the current 1.6 scheme is fragile.)
- Quick(ish) traversal of mergeinfo history. This suggests a new storage model in which all the historic mergeinfo (of a given branch?) is in one place.
A Worked Example
Extension idea: The Multi-Commit Merge
We might want to allow multiple revisions to be recorded as being components of the same merge. For example, sometimes a user will choose to commit the initial result of a merge first, and then do further conflict resolution and commit again. Without this extension, the first of these commits would be tracked as a merge, and the second would wrongly be classified as a new logical change. We could design a way to be able to track these two commits as a single logical merge.
The main benefit of linking the related commits through metadata is so that a multi-path merge (A->B, A->C, B->C) or a cyclic merge (A->B->A) can automatically include or avoid merging the follow-ups depending on whether it is including or avoiding the first commit of the group. With 1.7, the user can tell Subversion to avoid merging those follow-ups to any such branch by performing a record-only merge, but the user can only do this as and when such branches are known, not in advance. If the user does not do that record-only merge, Subversion attempts to merge those follow-ups unconditionally, and the user has to notice and edit the result (which the merge tool might or might not flag as a conflict).
(A minor benefit is it provides a standard way to annotate such follow-ups for display purposes, more formally than using comments in the log messages or other arbitrary means.)
At present, the user can emulate this concept by creating a short-lived side branch for the merge. Commit the initial result of the merge onto the branch, make multiple follow-up commits for conflict resolution and bug fixes, and then reintegrate the branch. Or, if the need for follow-up commits is not anticipated in advance, commit the original merge on its main branch, and then create a side branch from that revision as and when any follow-ups are needed. The advantage of using a side branch, rather than just committing follow-ups in head, is that the related changes are linked by metadata (through copy-history).
Disadvantages: It would require user awareness and tool support to make use of it. I don't think that's avoidable: this isn't something that Subversion could know automatically. Anyway the current record-only merge solution requires user awareness and the tool support for it is clumsy.
Example: Merge A:10 to B, committing the result initially as B:13, then doing some more conflict resolution in B:14 and B:16. Arrange somehow (by user input, for example) for B:14 and B:16 also to be recorded as part of the “same” merge: branch B revs 13, 14, 16 jointly comprise the merge of A:10. In a subsequent merge from B to C, assuming A:10 is already on C, that would prevent B:13, B:14 and B:16 from being merged to C.
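The example above could be modelled roughly like this. It is a sketch only: the `group_leader` mapping stands in for whatever metadata would link follow-up commits to the first commit of a merge, which is not designed yet, and the mixed case (only some brought-in changes present on the target) is ignored here.

```python
def effective_changes(rev, group_leader, brought_in):
    """Logical changes a revision counts as, after applying merge grouping."""
    # A follow-up commit is treated like the first commit of its merge group.
    leader = group_leader.get(rev, rev)
    # A revision with no recorded merge is an original logical change.
    return brought_in.get(leader, {leader})


def revisions_to_merge(candidates, on_target, group_leader, brought_in):
    """Keep candidates that bring in something the target does not have."""
    return [
        rev for rev in candidates
        if not effective_changes(rev, group_leader, brought_in) <= on_target
    ]


# B:13 is the merge of A:10; B:14 and B:16 are recorded as follow-ups of B:13.
group_leader = {("B", 14): ("B", 13), ("B", 16): ("B", 13)}
brought_in = {("B", 13): {("A", 10)}}
candidates = [("B", 13), ("B", 14), ("B", 15), ("B", 16)]

# Merging B to C when C already has A:10: only the original change B:15 is left.
print(revisions_to_merge(candidates, {("A", 10)}, group_leader, brought_in))
```

Without the `group_leader` entries, B:14 and B:16 would each be classified as new logical changes and would be selected for merging to C.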
### If we design a new merge tracking model, I wonder if it would be worth designing in this capability. At first thought it sounds like something that could be unused and unimplemented at first and then implemented later. Until the merge algorithm pays attention to it and somebody populates it, those follow-ups B:14 and B:16 will simply be merged to C and will conflict (physically and/or semantically) just like happens today.
- ### What are the semantics exactly? Does it matter whether A:10 is an original change or a merge? What gets complex when the merge has multiple source changes?
Distinguish Operative and No-op Source Revs
At present the revs we record are ones that have been “considered” from the source branch, regardless of whether they contained an original change or a merge or nothing at all.
- ### My overall impression is this is not a useful avenue, but here’s the thought anyway.
The aim of a catch-up merge is to reach a state in which a single continuous revision range (including all operative and no-op revs) is recorded as having been merged from the immediate source branch. If there are gaps but all the gaps are no-op, the merge algorithm searches those gaps and finds that there is nothing to do, and then (potentially, and in practice actually) fills in those gaps in the recorded mergeinfo.
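As a rough illustration of that gap-filling behaviour, the sketch below represents recorded mergeinfo as a plain set of source revision numbers instead of revision ranges. The function name `catch_up` and its parameters are hypothetical, and the set of operative revisions is assumed to be known in advance.

```python
def catch_up(recorded, operative, start, youngest):
    """Simulate one catch-up merge over source revisions start..youngest.

    recorded:  source revs already recorded as merged.
    operative: source revs that actually change the source branch.
    Returns (revs whose changes must be applied, new recorded set).
    """
    gaps = set(range(start, youngest + 1)) - recorded
    to_merge = sorted(gaps & operative)
    # After the merge the whole range is recorded, no-op revs included.
    return to_merge, recorded | gaps


to_merge, recorded = catch_up({5, 6, 9}, {5, 6, 8, 9}, start=5, youngest=10)
print(to_merge)
print(sorted(recorded))
```

In this run only r8 carries a change to apply, yet r7 and r10 also end up in the recorded set, giving a single continuous range from r5 to r10.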
If we were to distinguish between operative and no-op revs, that would help in displaying mergeinfo in a more user-friendly way. ### Specifics?
This info is already discoverable, it’s just not fast.
This distinction would come “for free” if we start recording logical changes rather than physical changes. But then the question, “Are there any eligible changes to merge?” might be harder to answer.