Location History of a Node
The merge code frequently deals with the location history of a node (a branch root, typically). At the moment it mainly uses svn_mergeinfo_t to represent the location history. But svn_mergeinfo_t isn't ideally suited for this task:
It includes a non-inheritable flag on each revision range, which is meaningless in this context.
Because of the non-inheritable flags, all the algebra functions require the caller to choose what to do about inheritability, and this option is hard to understand. Since inheritability is irrelevant in this context, this just makes it hard to choose the right option.
It requires an additional has_revision_zero_history flag to fully capture an arbitrary path's history. Branches used in merging are very unlikely to extend to revision zero in real life, so any code intended to make use of this flag probably isn't being tested and may not work.
- It doesn't identify the repository; that information has to be carried externally.
So we should introduce a new data structure that captures the location history of a node. I have made a start on such a thing by introducing a branch_history_t structure and some functions:
location_on_branch_at_rev(branch_history, rev) => pathrev
branch_history_intersect_range(branch_history, oldest_rev, youngest_rev) => branch_history
branch_history_get_endpoints(branch_history) => oldest_pathrev, youngest_pathrev
At present it uses svn_mergeinfo_t (and additional information) internally, but it might be better to change it to use an array of svn_repos_location_t or something else instead. At least it's wrapped a bit better for now.
A terrible example is the function combine_range_with_segments(). It intersects a svn_merge_range_t with a list of svn_location_segment_t and produces a list of merge_source_t. Three different types.
Prototype new merge code in Python
It would be extremely handy to be able to write new client-layer merge code in Python. The easiest way to start doing this would probably be to write a Python command-line program that implements the equivalent of "svn merge" and uses either the swig-Python bindings or the ctypes-Python bindings to access Subversion library APIs. (In particular, don't try to embed a Python merge module inside the 'svn' C program, unless somebody can show us how to do that quickly and easily.)
Single-file merge should be less of a special case
I'm concerned that the present "single-file merge" code doesn't seem to have all the same stuff in it that the directory merge code has. It would be obviously correct if a single "merge a node" function were called regardless whether the node is a file or a dir.
Some things can be simpler for a single file, of course. It might seem obvious that it doesn't need to think about subtrees, as a file can't have subtrees. Even an assumption like that, however, only holds if we don't allow a merge that replaces a file with a directory or replaces a directory with a file. I think we don't allow such a merge, on the basis that two different kinds by definition are not ancestrally related, but that decision is not self-evident. We might want to structure the code such that the "merge" function at this level can merge any arbitrary change that is encountered at a child level inside a (directory) merge, including replacement of one node with another. But I'm not suggesting we should re-structure the code that way now, I'm just exploring a line of thought.
Use svn_client__pathrev_t more widely
(Difficulty: several quite simple sub-tasks, and some harder bits)
A particular concern is where a URL and a rev are currently being gathered, that at first glance should be the URL and rev of the same location, but on closer inspection may not be. For example, at the beginning of filter_self_referential_mergeinfo() there are calls to svn_client_url_from_path2() and svn_wc__node_get_base_rev(), but while the latter is clearly asking about the WC "base" version, the former is not, and may return a URL different from the base URL if the node is locally copied/added/etc.
Here's an example of how useful working with path-revs can be, beyond mere notational convenience. I thought I saw (but can't find it at the moment) a call to svn_mergeinfo__get_range_endpoints() or similar, and one of the resulting revision numbers was passed up the call stack a bit, and then some function was called to trace the history of the branch back to that revision to find the corresponding URL. If get_range_endpoints() would return the full location then that subsequent look-up wouldn't be needed.
use svn_client__ra_session_from_path2 instead of svn_client__ra_session_from_path
Functions that should go away:
svn_client_url_from_path2 -- deprecate, maybe
I'd recommend being wary in your work about the struct being an *input* param in public APIs. There are seriously heavy internal constraints on the struct members. (eg. pass a struct with a NULL uuid, or even a non-matching one). I don't have a recommendation right now for what happens if the struct is made public. I just wanted to raise a yellow flag. It seems best to keep it very private because of the difficult constraints/preconditions in its members. [...] It may be as simple as only returning const structs in the API. Or opaque. But certainly warnings like, "seriously, don't mess around in here. tweak at your own peril."
One way to store the resulting mergeinfo into the WC
(Difficulty: straightforward in theory, moderately complex in practice)
Several functions support two ways: write mergeinfo straight to the WC, or return it to the caller and let the caller write it to the WC. For example, see "result_catalog" parameter of do_merge(). It is not clear that the two implementations currently produce equivalent results, as presumably they should. We should have just one way. The latter seems the one to choose because that would support all existing usage and also would seem to be a good way to enable better kinds of "dry run" implementation.