Location History of a Node

The merge code frequently deals with the location history of a node (a branch root, typically). At the moment it mainly uses svn_mergeinfo_t to represent the location history. But svn_mergeinfo_t isn't ideally suited for this task:

So we should introduce a new data structure that captures the location history of a node. I have made a start on such a thing by introducing a branch_history_t structure and some functions:

At present it uses svn_mergeinfo_t (and additional information) internally, but it might be better to change it to use an array of svn_repos_location_t or something else instead. At least it's wrapped a bit better for now.

A terrible example of the current mix of representations is the function combine_range_with_segments(). It intersects an svn_merge_range_t with a list of svn_location_segment_t and produces a list of merge_source_t: three different types in a single operation.
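
As a rough illustration of the single-type idea, here is a minimal Python sketch in which intersecting a location history with a revision range yields another location history. The names LocationSegment, LocationHistory, path_at and intersect are hypothetical and are not part of Subversion; the sketch also assumes inclusive revision ranges for simplicity, which is not how svn_merge_range_t defines its start.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LocationSegment:
    """Hypothetical type: the node lived at `path` for revisions start..end inclusive."""
    path: str
    start: int
    end: int


@dataclass
class LocationHistory:
    """Hypothetical type: segments ordered oldest first, not overlapping."""
    segments: List[LocationSegment]

    def path_at(self, rev: int) -> Optional[str]:
        """Return the path the node had at `rev`, or None if it did not exist."""
        for seg in self.segments:
            if seg.start <= rev <= seg.end:
                return seg.path
        return None

    def intersect(self, start: int, end: int) -> "LocationHistory":
        """Return the part of this history within start..end inclusive."""
        clipped = []
        for seg in self.segments:
            lo, hi = max(seg.start, start), min(seg.end, end)
            if lo <= hi:  # the segment overlaps the requested range
                clipped.append(LocationSegment(seg.path, lo, hi))
        return LocationHistory(clipped)


# A branch created from /trunk at r10 (made-up paths and revisions).
history = LocationHistory([
    LocationSegment("/trunk", 1, 9),
    LocationSegment("/branches/feature", 10, 25),
])
part = history.intersect(5, 12)
```

The input and the result have the same type here, so a caller never needs to convert between range, segment and source representations.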

Prototype new merge code in Python

It would be extremely handy to be able to write new client-layer merge code in Python. The easiest way to start doing this would probably be to write a Python command-line program that implements the equivalent of "svn merge" and uses either the swig-Python bindings or the ctypes-Python bindings to access Subversion library APIs. (In particular, don't try to embed a Python merge module inside the 'svn' C program, unless somebody can show us how to do that quickly and easily.)

Single-file merge should be less of a special case

I'm concerned that the present "single-file merge" code doesn't seem to include all the same logic as the directory merge code. It would be obviously correct if a single "merge a node" function were called regardless of whether the node is a file or a directory.

Some things can be simpler for a single file, of course. It might seem obvious that it doesn't need to think about subtrees, as a file can't have subtrees. Even an assumption like that, however, only holds if we don't allow a merge that replaces a file with a directory or replaces a directory with a file. I think we don't allow such a merge, on the basis that two nodes of different kinds are by definition not ancestrally related, but that decision is not self-evident. We might want to structure the code such that the "merge" function at this level can merge any arbitrary change that is encountered at a child level inside a (directory) merge, including replacement of one node with another. I'm not suggesting we should re-structure the code that way now; I'm just exploring a line of thought.

Use svn_client__pathrev_t more widely

(Difficulty: several quite simple sub-tasks, and some harder bits)

A particular concern is code that gathers a URL and a revision which at first glance should refer to the same location but on closer inspection may not. For example, at the beginning of filter_self_referential_mergeinfo() there are calls to svn_client_url_from_path2() and svn_wc__node_get_base_rev(). The latter is clearly asking about the WC "base" version, but the former is not, and may return a URL different from the base URL if the node is locally copied, added, etc.

Here's an example of how useful working with path-revs can be, beyond mere notational convenience. I thought I saw (but can't find it at the moment) a call to svn_mergeinfo__get_range_endpoints() or similar, where one of the resulting revision numbers was passed up the call stack a bit, and then some function was called to trace the history of the branch back to that revision to find the corresponding URL. If get_range_endpoints() returned the full location, that subsequent look-up wouldn't be needed.
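
To make that concrete, here is a small Python sketch of an endpoints function that returns full locations rather than bare revision numbers. The PathRev class and range_endpoints() are hypothetical stand-ins, only loosely modelled on the idea behind svn_client__pathrev_t, and the URLs are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PathRev:
    """Hypothetical pairing of a URL with the revision it refers to."""
    url: str
    rev: int


def range_endpoints(
    history: List[Tuple[str, int, int]]
) -> Tuple[PathRev, PathRev]:
    """Return the oldest and youngest locations of a location history.

    `history` is a non-empty list of (url, first_rev, last_rev) segments,
    oldest first. Each endpoint carries its URL, so the caller does not
    have to trace history again to find where a revision number lives.
    """
    if not history:
        raise ValueError("history must contain at least one segment")
    oldest_url, oldest_rev, _ = history[0]
    youngest_url, _, youngest_rev = history[-1]
    return PathRev(oldest_url, oldest_rev), PathRev(youngest_url, youngest_rev)


segments = [
    ("https://svn.example.org/repo/trunk", 1, 9),
    ("https://svn.example.org/repo/branches/feature", 10, 25),
]
oldest, youngest = range_endpoints(segments)
```

Because the URL and revision travel together, they cannot drift apart as they are passed up the call stack.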

Immediate candidates:

Functions that should go away:

A note from Greg:

One way to store the resulting mergeinfo into the WC

(Difficulty: straightforward in theory, moderately complex in practice)

Several functions support two ways of storing the resulting mergeinfo: writing it straight to the WC, or returning it to the caller and letting the caller write it to the WC. For example, see the "result_catalog" parameter of do_merge(). It is not clear that the two implementations currently produce equivalent results, as presumably they should. We should have just one way. The latter seems the one to choose, because it would support all existing usage and would also seem to be a good way to enable better kinds of "dry run" implementation.
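
The "return it to the caller" approach might look roughly like the following Python sketch. Everything here is an assumption for illustration: the function names, the dictionary-based catalog, and the in-memory wc_props dictionary standing in for the WC are not Subversion APIs.

```python
from typing import Dict, List, Tuple

RevRange = Tuple[int, int]
Mergeinfo = Dict[str, List[RevRange]]  # merge source path -> merged ranges
Catalog = Dict[str, Mergeinfo]         # WC path -> mergeinfo to record


def compute_merge_result(target: str, source: str,
                         ranges: List[RevRange]) -> Catalog:
    """Compute the mergeinfo a merge would record, without writing anything."""
    return {target: {source: list(ranges)}}


def record_catalog(wc_props: Dict[str, Mergeinfo], catalog: Catalog) -> None:
    """The single place that writes mergeinfo into the (simulated) WC."""
    for wc_path, mergeinfo in catalog.items():
        stored = wc_props.setdefault(wc_path, {})
        for source, ranges in mergeinfo.items():
            stored.setdefault(source, []).extend(ranges)


def merge(wc_props: Dict[str, Mergeinfo], target: str, source: str,
          ranges: List[RevRange], dry_run: bool = False) -> Catalog:
    """Always return the catalog; write it only when not a dry run."""
    catalog = compute_merge_result(target, source, ranges)
    if not dry_run:
        record_catalog(wc_props, catalog)
    return catalog


wc = {}  # simulated WC mergeinfo store, initially empty
preview = merge(wc, "wc/dir", "/trunk", [(10, 12)], dry_run=True)
wc_after_dry_run = dict(wc)  # still empty: a dry run records nothing
merge(wc, "wc/dir", "/trunk", [(10, 12)])
```

With a single writer, a dry run and a real merge share the same computation and differ only in whether the catalog is recorded.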

MergeCodeImprovements (last edited 2012-05-11 08:18:24 by JulianFoad)