There are many client-side features that people have wanted in Subversion that require the ability to store the state of a working copy without committing to the repository. This document exists to flesh out a design for the save point feature on which we can build such features.
Before we look at the design, it's helpful to describe in detail the features we want to support.
- Destructive command rollback. We have a variety of commands that either permanently destroy uncommitted local modifications or make it rather inconvenient to put them back the way they were. It would be nice if we could store our state before executing these commands and then provide a command to restore that state if the user decided that the state is undesirable.
- Stash. Git has functionality where you can store your working copy state locally in something it calls the stash and remove all local modifications from the working copy. This allows you, for instance, to temporarily set aside a piece of work to fix something else and then, after you have committed, restore your working copy to its previous state. Users can somewhat simulate this now by taking a diff, reverting, making the other change, committing, and then applying the patch, except that svn patch is not fully capable of applying all modifications the working copy can represent. Something more automatic, which doesn't require you to keep track of patch files, would be far more convenient.
- Workspaces. Otherwise known as local commit/branching: the ability to work on changes and save your state at various points along the way. These states could then be committed as a series of commits or squashed into a single commit. Eventually this local history could even be manipulated in more complex ways, such as rebasing.
What is a SavePoint
A save point is the stored state of a working copy at a given point in time. It consists of the following pieces of information:
- primary key for the save point, unique across all save points, integer
- group of related savepoints this savepoint belongs to, string
- optional user-defined name of the savepoint, string
- the savepoint's order within the namespace, integer
- boolean, true if the savepoint was created automatically
- string describing the modifications in the save point
- set of wc db nodes table entries for the nodes (stored in the nodes table or another table, TBD)
- data for locally modified files and properties (stored in pristine files or something similar, or in some cases stored in the wc db)
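The scalar fields listed above could be modeled roughly as follows. This is only a minimal sketch: the field names (`id`, `workspace`, `order`, `automatic`, `description`, `name`) are assumptions made for illustration, since the design does not fix any names, and the node entries and blob data are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: the record cannot be changed after creation
class SavePoint:
    id: int                     # primary key, unique across all save points
    workspace: str              # group of related save points (hypothetical name)
    order: int                  # position within its group
    automatic: bool             # True if created automatically
    description: str            # describes the modifications in the save point
    name: Optional[str] = None  # optional user-defined name


# Example record; every value here is made up for illustration.
sp = SavePoint(id=1, workspace="Default", order=0, automatic=False,
               description="work in progress on parser")
```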
SavePoints will always belong to a workspace. Within the workspace the member SavePoints will be ordered. Workspace names beginning with "svn:" are reserved for internal Subversion use. Stash will use the "svn:stash" workspace. There will be a current workspace, whose default name is not yet decided (possibly "Default"). Save points for rolling back destructive commands will be created in the current workspace with the automatic flag set to true.
Save Points are considered immutable once created. Workspaces may be mutable, but we will likely always modify them by creating a new temporary workspace, copying the save points that are unmodified, and creating fresh ones for the changed states. Treating them as immutable and recreating them to modify them means that if the modification fails for some reason, we have not lost data.
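The copy-then-replace approach to modifying a workspace might look something like the sketch below. It assumes a save point can be represented as an immutable `(id, state)` tuple and that workspaces live in a dictionary; `rewrite_workspace`, `change` and `next_id` are hypothetical names, not part of any existing Subversion API.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical shape: an immutable (id, state) pair stands in for a save point.
SavePointRef = Tuple[int, str]


def rewrite_workspace(
    workspaces: Dict[str, List[SavePointRef]],
    name: str,
    change: Callable[[SavePointRef], Optional[str]],
    next_id: int,
) -> None:
    """Rebuild workspace `name` in a temporary list, then swap it in.

    `change` returns the new state for a save point that must change,
    or None to keep the save point as it is.
    """
    rebuilt: List[SavePointRef] = []
    for sp in workspaces[name]:
        new_state = change(sp)  # may raise; the original list is untouched
        if new_state is None:
            rebuilt.append(sp)  # unmodified: copy the existing save point
        else:
            rebuilt.append((next_id, new_state))  # changed: fresh save point
            next_id += 1
    workspaces[name] = rebuilt  # replace only after every step succeeded


workspaces = {"Default": [(1, "a"), (2, "b"), (3, "c")]}
rewrite_workspace(workspaces, "Default",
                  lambda sp: "B" if sp[0] == 2 else None, next_id=4)
```

If `change` raises partway through, the temporary list is simply discarded and the original workspace is still intact.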
The nodes table of the wc db will have a new column added which optionally specifies a savepoint id. When creating a SavePoint, the wc db nodes table data without a savepoint id (that is, the current state of the nodes in the working copy) will be duplicated with the savepoint id set. It may be desirable to use a separate table for this, e.g. savepoint_nodes. Duplicating the entire nodes table may not be as fast as we desire; in general we want the creation of save points to be fast since users will likely be doing this often. However, we have decided not to optimize the representation at this time and instead to first implement a full copy, since it will be simpler for the initial implementation and will give us some idea of this implementation's performance.
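The full-copy approach can be sketched with SQLite as below. The table here is a deliberately simplified stand-in (the real nodes table has many more columns), and the `savepoint_id` column name and the `create_savepoint` helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the wc db nodes table (assumed columns).
conn.execute(
    "CREATE TABLE nodes (local_relpath TEXT, checksum TEXT, savepoint_id INTEGER)"
)
# Rows with savepoint_id NULL represent the current working copy state.
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, NULL)",
    [("trunk/a.c", "sha1-aaa"), ("trunk/b.c", "sha1-bbb")],
)


def create_savepoint(conn: sqlite3.Connection, savepoint_id: int) -> None:
    """Duplicate every current row and tag the copy with the save point id."""
    with conn:  # run the copy in a single transaction
        conn.execute(
            "INSERT INTO nodes (local_relpath, checksum, savepoint_id) "
            "SELECT local_relpath, checksum, ? FROM nodes "
            "WHERE savepoint_id IS NULL",
            (savepoint_id,),
        )


create_savepoint(conn, 1)
```

Using a separate savepoint_nodes table instead would only change the target of the `INSERT`.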
We have not made a final decision on how to store blob data. There are two general locations and one that may be used alongside them:
- Use the existing pristine store as is. The advantage of this option is that storage is shared with the pristines. The disadvantage is that it mixes pristine data with data that is not pristine. This trades a risk of bugs against using less storage.
- Add a new place to store uncommitted blobs on disk. This keeps the stored state data more discrete from the pristines but increases the overall storage required.
- Some smaller files/properties stored directly in the wc db.
The stored data should not be stored in repository normalized form as this would make it impossible to restore the exact file that the user had on disk (e.g. mixed line endings with svn:eol-style set to native, user directly modified data in keyword sections). We feel that it is critical that we are able to restore the precise file data that the user had in their working copy (e.g. user changes to mixed line endings on a file with native svn:eol-style, creates a savepoint and then comes back later intending to change the svn:eol-style property and commit). This may present issues with moving working copies between Windows and *nix platforms due to the native line ending differences (but we may not be that concerned about supporting that).
However, we are not in agreement about what should be stored. One possibility is to simply store the full text. Another possibility is to store the delta. I think that if we store deltas the delta should always be to a pristine and not to another blob we have stored (more on why later).
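To illustrate why the exact on-disk bytes matter, here is a small sketch that takes the full-text option and keys each blob by a hash of its raw content. The in-memory `blobs` dictionary and the `store_blob` and `normalize_eol` helpers are hypothetical and only stand in for whatever storage is eventually chosen.

```python
import hashlib
from typing import Dict

blobs: Dict[str, bytes] = {}  # hypothetical store for uncommitted blobs


def store_blob(data: bytes) -> str:
    """Store the bytes exactly as they are on disk, keyed by their SHA-1."""
    key = hashlib.sha1(data).hexdigest()
    blobs[key] = data
    return key


def normalize_eol(data: bytes) -> bytes:
    """Convert all line endings to LF, as a normalized form would."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


on_disk = b"one\r\ntwo\nthree\r\n"  # mixed line endings
restored = blobs[store_blob(on_disk)]  # identical to on_disk

# Going through a normalized form and back to CRLF loses the original mix.
lossy = normalize_eol(on_disk).replace(b"\n", b"\r\n")
```

Here `restored` is byte-for-byte what the user had, while `lossy` has had its lone LF rewritten to CRLF.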
SavePoints as Local Branch
Workspaces are ordered lists of save points. In essence you have a commit every time a save point is created. Thus you can treat the creation of a new workspace and its initial state as the creation of a local branch. Additional save points added to the workspace are effectively "commits" to that branch. Each following save point in the workspace will be treated as though it is based on the preceding one. We'll then be able to push the save points to the repository as a single commit (effectively squashing them into a single change) or replay them one by one (using the log messages stored on the save points or prompting when they are missing?). It may be desirable to skip automatic save points when replaying them one by one (user configurable?).
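The two ways of pushing a workspace could be sketched as follows, under the assumption that each save point holds a full working copy state, so that skipping an entry loses nothing. `WorkspaceEntry`, `squash` and `replay` are hypothetical names, and a plain string stands in for the stored state.

```python
from typing import List, NamedTuple, Optional, Tuple


class WorkspaceEntry(NamedTuple):
    order: int          # position within the workspace
    automatic: bool     # True if created automatically
    log: Optional[str]  # log message, if any
    state: str          # stand-in for the full stored working copy state


def squash(entries: List[WorkspaceEntry]) -> str:
    """Single commit: only the state of the last save point is committed."""
    return max(entries, key=lambda e: e.order).state


def replay(entries: List[WorkspaceEntry],
           skip_automatic: bool = True) -> List[Tuple[Optional[str], str]]:
    """One commit per save point, in order, optionally skipping automatic ones."""
    ordered = sorted(entries, key=lambda e: e.order)
    return [(e.log, e.state) for e in ordered
            if not (skip_automatic and e.automatic)]


workspace = [
    WorkspaceEntry(0, False, "add parser", "state-0"),
    WorkspaceEntry(1, True, None, "state-1"),
    WorkspaceEntry(2, False, "fix parser bug", "state-2"),
]
commits = replay(workspace)  # the automatic entry at order 1 is skipped
```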
Stash will be presented to users as creating something like a Save Point in a workspace but will have a distinct UI. Creating a new stash entry will also revert the working copy back to an unmodified state. Stashes can be created, renamed, deleted, listed, applied and popped (apply+delete). While the save points in the stash are ordered (to allow pop), they are not considered to be related in any way.
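The stash operations described above can be pictured with a toy model like the one below. The `WorkingCopy` class and its method names are purely hypothetical; local modifications are reduced to a path-to-change dictionary, and how applying interacts with existing local changes is left open by the design.

```python
from typing import Dict, List, Tuple


class WorkingCopy:
    """Toy model: local modifications plus an ordered list of stash entries."""

    def __init__(self) -> None:
        self.modifications: Dict[str, str] = {}  # path -> local change
        self.stash: List[Tuple[str, Dict[str, str]]] = []  # (name, saved state)

    def stash_create(self, name: str) -> None:
        # Save the current state, then revert to an unmodified working copy.
        self.stash.append((name, dict(self.modifications)))
        self.modifications.clear()

    def stash_apply(self, index: int = -1) -> None:
        # Restore a saved state without removing the stash entry.
        _, saved = self.stash[index]
        self.modifications.update(saved)

    def stash_pop(self) -> None:
        # Pop is apply followed by delete of the most recent entry.
        self.stash_apply(-1)
        del self.stash[-1]


wc = WorkingCopy()
wc.modifications["trunk/a.c"] = "local edit"
wc.stash_create("wip")  # working copy is now unmodified
wc.stash_pop()          # the edit is back and the stash is empty
```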
Squashing a range of sequential save points should be just a matter of deleting save points in the workspace, if we implement our storage correctly. Other scenarios would require rebasing.
1. Implement Save Points and the associated UI to create and restore to them (workspaces not exposed to user, always default).
2. Implement automatic save points in destructive commands.
3. Optimize save point wc db representation if necessary.
4. Workspaces (including the ability to submit the save points as commits).
5. Implement diff, merge, squash, rebase on workspaces.
6. Optimize storage for blobs if we didn't do so in 1.
1-3 (and maybe 4) would likely be a good place to be for the first release with save points. Workspaces, while part of the initial design, are delayed in being made available since they'll require a lot more functionality to be useful.
There are many terms we could use for some of the concepts. At this point the terms aren't necessarily final and comments about the terminology are welcome. The terms we're using are intended to generally be user-visible terms. Save Point was chosen instead of Check Point (as we've called this functionality in past conversations) since it seems to be more obvious to non-native English speakers what it does. An svn savepoint command can also be abbreviated to svn sp, while svn checkpoint could not be abbreviated to svn cp since that's an existing alias for svn copy. We have avoided calling Work Spaces (local) branches since we'd rather not conflate them with the branches we have in repositories. If Subversion ends up with branches as first-class objects, using the term would only add to the confusion.
Julian Foad asks:
How does 'stash' behave in relation to checkpoints? It would help to draw a diagram showing sets of workspaces, checkpoints and stashes and the possible transitions between them.
For example, if you're in WS 'bug10' and have made five 'local commits' (savepoints) in it, how do you start a 'quick fix' for bug20 -- do you run just 'svn stash' or have to supply a new name for the new work as in 'svn stash bug20'? What happens to your current WS and what stash is created -- is it called 'bug20', 'bug10', something else? Does 'stash' copy your changes into a new savepoint which is in the 'svn:stash' workspace, but leave you in the workspace you were already in, and revert your changes? Where are your savepoints for bug10 -- five in WS 'bug10' and one in stash 'foo'? Is there any way that stash 'foo' is known to be related to WS 'bug10'? Can you now run 'stash' again (before stash-pop) and if so what happens? Can you now run 'checkpoint' (before stash-pop) and if so what happens? Can you run 'stash pop' from a different WS and if so what happens? Can you 'switch workspace' to the 'svn:stash' workspace?
What aspects of WC state should be preserved and restored by each savepoint? Incompleteness/depth/sparseness? Repo root URLs? Conflicts, including the conflict 'artifact' files? Unversioned files?
What should happen when you switch the WC (or part of it) to another branch and then make a checkpoint? Then what should a 'squashed' commit mean -- just commit the last savepoint no matter what branch it was on, or commit the last-on-each-branch, or something else? Should a multi-commit then commit each savepoint in turn, thus committing to multiple branches?