Design: SavePoints

There are many client-side features that people have wanted from Subversion that require the ability to store the state of a working copy without committing to the repository. This document exists to try to flesh out a design for the save point feature from which we can build such features.

Expected Features

Before we can look at the design, it's helpful to have an idea of what features we want to support in detail.

What is a SavePoint

A save point is the stored state of a working copy at a given point in time. It consists of the following pieces of information:

SavePoints will always belong to a workspace. Within the workspace, the member SavePoints will be ordered. Workspace names beginning with "svn:" are reserved for internal Subversion use. Stash will use the "svn:stash" workspace.

There will be a current workspace, whose name will default to something, possibly "Default". Save points for rolling back destructive commands will be created in the current workspace with the automatic flag set to true.

Save points are considered immutable once created. Workspaces may be mutable, but we will likely always want to modify them by creating a new temporary workspace, copying over the save points that are unmodified, and creating fresh ones for the changed states. Treating them as immutable and recreating them to modify them means that if the modification fails for some reason, we have not lost data.
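The rules above can be sketched in a few lines of Python. This is only an illustration of the intended behaviour, not a proposed API: the names SavePoint, Workspace, create_user_workspace and rewrite_workspace, and the fields on them, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

RESERVED_PREFIX = "svn:"  # names with this prefix are reserved for Subversion


@dataclass(frozen=True)
class SavePoint:
    """Immutable save point record (fields are illustrative assumptions)."""
    savepoint_id: int
    automatic: bool = False
    log_message: Optional[str] = None


@dataclass(frozen=True)
class Workspace:
    """An ordered sequence of save points."""
    name: str
    savepoints: Tuple[SavePoint, ...] = ()

    def append(self, sp: SavePoint) -> "Workspace":
        # Never mutate in place: return a new workspace value instead.
        return Workspace(self.name, self.savepoints + (sp,))


def create_user_workspace(name: str) -> Workspace:
    """Create a user workspace, rejecting the reserved "svn:" namespace."""
    if name.startswith(RESERVED_PREFIX):
        raise ValueError(f"workspace name {name!r} is reserved")
    return Workspace(name)


def rewrite_workspace(
    ws: Workspace, transform: Callable[[SavePoint], SavePoint]
) -> Workspace:
    """Build a replacement workspace from a temporary copy.

    transform(sp) returns either the same save point (copied unchanged) or a
    fresh SavePoint for a changed state. If it raises part-way through, the
    original workspace is untouched, so no data is lost.
    """
    temp = Workspace(ws.name)
    for sp in ws.savepoints:
        temp = temp.append(transform(sp))
    return temp
```

A caller would swap in the workspace returned by rewrite_workspace only once it has been built completely, and keep the original otherwise.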

Nodes data

The nodes table of the wc db will have a new column added which optionally specifies a savepoint id. When creating a SavePoint, the wc db nodes table data without a savepoint id (that is, the current state of the nodes in the working copy) will be duplicated with the savepoint id set. It may be desirable to use a separate table for this, e.g. savepoint_nodes. Duplicating the entire nodes table may not be as fast as we desire; in general we want the creation of save points to be fast, since users will likely be doing this often. However, we have decided not to optimize the representation at this time and to implement a full copy first, since it will be simpler for the initial implementation and will give us some idea of this implementation's performance.
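As a rough sketch of the full-copy approach, the following Python uses the standard sqlite3 module against a deliberately tiny stand-in table. The real wc db nodes table has many more columns; the column names local_relpath and checksum, the savepoint_id column name, and the create_savepoint helper are assumptions made only for this example.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for the wc db "nodes" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE nodes (
           local_relpath TEXT NOT NULL,
           checksum      TEXT,
           savepoint_id  INTEGER   -- NULL means "current working copy state"
       )"""
)
conn.executemany(
    "INSERT INTO nodes (local_relpath, checksum) VALUES (?, ?)",
    [("trunk/a.c", "sha1-aaa"), ("trunk/b.c", "sha1-bbb")],
)


def create_savepoint(conn, savepoint_id):
    """Duplicate every current row, tagging the copies with savepoint_id."""
    with conn:  # one transaction: either all rows are copied or none are
        conn.execute(
            """INSERT INTO nodes (local_relpath, checksum, savepoint_id)
               SELECT local_relpath, checksum, ?
               FROM nodes
               WHERE savepoint_id IS NULL""",
            (savepoint_id,),
        )


create_savepoint(conn, 1)
```

With a separate savepoint_nodes table, the same INSERT ... SELECT would simply target that table instead.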

Blob data

We have not made a final decision on how to store blob data. There are two general locations and one that may be used alongside them:

The data should not be stored in repository-normalized form, as this would make it impossible to restore the exact file that the user had on disk (e.g. mixed line endings with svn:eol-style set to native, or data the user modified directly in keyword sections). We feel that it is critical that we are able to restore the precise file data that the user had in their working copy (e.g. a user changes a file with native svn:eol-style to mixed line endings, creates a savepoint, and then comes back later intending to change the svn:eol-style property and commit). This may present issues with moving working copies between Windows and *nix platforms due to the native line ending differences (but we may not be that concerned about supporting that).
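To make the line-ending concern concrete, here is a small Python illustration. The normalize_eol function is a simplified stand-in written for this example, not Subversion's actual translation code.

```python
def normalize_eol(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF (illustrative only)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mixed = b"one\r\ntwo\nthree\r\n"  # what the user actually has on disk
unix = b"one\ntwo\nthree\n"       # a different file with the same content

# Two different on-disk files collapse to the same normalized bytes, so the
# normalized form alone cannot tell us which one the user had.
same_normalized = normalize_eol(mixed) == normalize_eol(unix)

# Storing the raw bytes keeps the distinction and restores the exact file.
stored = {"savepoint-1": mixed}
```

Because normalization maps several inputs to one output, it cannot be reversed, which is why the save point needs the unnormalized bytes.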

However, we are not in agreement about what should be stored. One possibility is to simply store the full text. Another possibility is to store a delta. I think that if we store deltas, each delta should always be against a pristine and not against another blob we have stored (more on why later).
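The delta option can be sketched as follows. This is not Subversion's delta format; it is a toy copy/data encoding built on Python's difflib, and make_delta and apply_delta are hypothetical names. It only illustrates one property of always taking the delta against the pristine: each save point can be rebuilt from the pristine alone, without reading any other save point.

```python
from difflib import SequenceMatcher


def make_delta(pristine: bytes, working: bytes):
    """Encode working as copy-from-pristine and literal-data operations."""
    ops = []
    matcher = SequenceMatcher(None, pristine, working, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse pristine[i1:i2]
        else:
            ops.append(("data", working[j1:j2]))  # new bytes (may be empty)
    return ops


def apply_delta(pristine: bytes, ops) -> bytes:
    """Rebuild the working text from the pristine and a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += pristine[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


pristine = b"int main() { return 0; }\n"
text_1 = b"int main() { return 1; }\n"
text_2 = b"int main(void) { return 1; }\n"

# Both deltas are taken against the pristine, not against each other.
deltas = {1: make_delta(pristine, text_1), 2: make_delta(pristine, text_2)}

# Dropping save point 1 does not affect our ability to restore save point 2.
del deltas[1]
restored = apply_delta(pristine, deltas[2])
```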

SavePoints as Local Branch

Workspaces are ordered lists of save points. In essence you have a commit every time a save point is created. Thus you can treat the creation of a new workspace and its initial state as the creation of a local branch. Additional save points added to the workspace are effectively "commits" to that branch. Each following save point in the workspace will be treated as though it is based on the preceding one. We'll then be able to push the save points to the repository as a single commit (effectively squashing them into a single change) or replay them one by one (using the log messages stored on the save points or prompting when they are missing?). It may be desirable to skip automatic save points when replaying them one by one (user configurable?).
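The two ways of pushing a workspace can be sketched as a small planning function. Everything here is hypothetical: the SavePoint tuple, the plan_commits name and the skip_automatic parameter stand in for decisions that the text above still leaves open.

```python
from typing import List, NamedTuple, Optional


class SavePoint(NamedTuple):
    savepoint_id: int
    automatic: bool
    log_message: Optional[str]


def plan_commits(
    savepoints: List[SavePoint], squash: bool, skip_automatic: bool = True
) -> List[int]:
    """Return the ids of the save points that would each become one commit."""
    if not savepoints:
        return []
    if squash:
        # A single commit carrying the state of the last save point.
        return [savepoints[-1].savepoint_id]
    # Replay one by one, optionally leaving out automatic save points.
    return [
        sp.savepoint_id
        for sp in savepoints
        if not (skip_automatic and sp.automatic)
    ]


workspace = [
    SavePoint(1, False, "start bug10"),
    SavePoint(2, True, None),  # created automatically by a destructive command
    SavePoint(3, False, "finish bug10"),
]
```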

Stash

Stash will be presented to users as creating something like a Save Point in a workspace but will have a distinct UI. Creating a new stash entry will also revert the working copy back to an unmodified state. Stashes can be created, renamed, deleted, listed, applied and popped (apply+delete). While the save points in the stash are ordered (to allow pop), they are not considered to be related in any way.
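The stash operations listed above can be modelled with a short Python sketch. The Stash class and its method names are hypothetical, and the working copy is reduced to a dict of local modifications, with dict.update standing in for whatever merge logic applying a stash would really need.

```python
class Stash:
    """Ordered collection of unrelated stash entries (illustrative only)."""

    def __init__(self):
        self._entries = []  # list of (name, changes), newest last

    def _index(self, name=None):
        if not self._entries:
            raise LookupError("stash is empty")
        if name is None:
            return len(self._entries) - 1  # default to the newest entry
        for i in range(len(self._entries) - 1, -1, -1):
            if self._entries[i][0] == name:
                return i
        raise LookupError(f"no stash entry named {name!r}")

    def create(self, name, working_copy):
        """Save the local changes, then revert the working copy."""
        self._entries.append((name, dict(working_copy)))
        working_copy.clear()  # stand-in for reverting to an unmodified state

    def rename(self, old, new):
        i = self._index(old)
        self._entries[i] = (new, self._entries[i][1])

    def delete(self, name):
        del self._entries[self._index(name)]

    def names(self):
        return [name for name, _ in self._entries]

    def apply(self, working_copy, name=None):
        """Re-apply an entry without removing it from the stash."""
        working_copy.update(self._entries[self._index(name)][1])

    def pop(self, working_copy, name=None):
        """Apply an entry and delete it (apply+delete)."""
        i = self._index(name)
        working_copy.update(self._entries[i][1])
        del self._entries[i]


wc = {"a.c": "edit for bug10"}
stash = Stash()
stash.create("bug10-wip", wc)  # wc is now unmodified
wc["b.c"] = "quick fix"
stash.create("bug20-fix", wc)
stash.pop(wc)                  # restores the newest entry, "bug20-fix"
```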

Squash

This should be just a matter of deleting save points in the workspace for a range of sequential save points, if we implement our storage correctly. Other scenarios would require rebasing.
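A minimal sketch of the simple case, assuming each save point records its complete state rather than a change relative to its predecessor. The squash function and the (id, state) tuples are hypothetical modelling choices for this example.

```python
def squash(savepoints, first, last):
    """Squash savepoints[first..last] (inclusive) by keeping only the last.

    Assumes every save point holds its complete state, so deleting the
    earlier ones in the range does not change what the survivors contain.
    """
    if not 0 <= first <= last < len(savepoints):
        raise IndexError("invalid save point range")
    return savepoints[:first] + savepoints[last:]


# Each save point is modelled as (id, full working copy state).
workspace = [
    (1, {"a.c": "v1"}),
    (2, {"a.c": "v2"}),
    (3, {"a.c": "v2", "b.c": "new"}),
    (4, {"a.c": "v3", "b.c": "new"}),
]

# Squash save points 2 and 3 into one: save point 2 is simply deleted.
squashed = squash(workspace, 1, 2)
```

If save points instead stored changes relative to the preceding save point, deleting one would require rewriting its successors.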

Implementation Plan

  1. Implement Save Points and the associated UI to create and restore to them (workspaces not exposed to user, always default).
  2. Implement automatic save points in destructive commands.
  3. Optimize save point wc db representation if necessary.
  4. Stash
  5. Workspaces (including the ability to submit the save points as commits)
  6. Implement diff, merge, squash, rebase on work spaces.
  7. Optimize storage for blobs if we didn't do so in 1.

Steps 1-3 (and maybe 4) would likely be a good place to be for the first release with save points. Workspaces, while an initial part of the design, are delayed in being made available since they'll require a lot more functionality to be useful.

Terminology

There are many terms we could use for some of the concepts. At this point the terms aren't necessarily final and comments about the terminology are welcome. The terms we're using are intended to generally be user-visible terms. Save Point was chosen instead of Check Point (as we've called this functionality in past conversations) since it seems to be more obvious what it does to non-native English speakers. An svn savepoint command can also be abbreviated to svn sp, while svn checkpoint could not be abbreviated to svn cp since that's an existing alias for svn copy. We have avoided calling Work Spaces (local) branches since we'd rather not conflate them with the branches we have in repositories. If Subversion ends up with branches as first-class objects, using the term would only add to the confusion.


Comments

Julian Foad asks:

How does 'stash' behave in relation to checkpoints? It would help to draw a diagram showing sets of workspaces, checkpoints and stashes and the possible transitions between them.

For example, if you're in WS 'bug10' and have made five 'local commits' (savepoints) in it, how do you start a 'quick fix' for bug20 -- do you run just 'svn stash' or have to supply a new name for the new work as in 'svn stash bug20'? What happens to your current WS and what stash is created -- is it called 'bug20', 'bug10', something else? Does 'stash' copy your changes into a new savepoint which is in the 'svn:stash' workspace, but leave you in the workspace you were already in, and revert your changes? Where are your savepoints for bug10 -- five in WS 'bug10' and one in stash 'foo'? Is there any way that stash 'foo' is known to be related to WS 'bug10'? Can you now run 'stash' again (before stash-pop) and if so what happens? Can you now run 'checkpoint' (before stash-pop) and if so what happens? Can you run 'stash pop' from a different WS and if so what happens? Can you 'switch workspace' to the 'svn:stash' workspace?

What aspects of WC state should be preserved and restored by each savepoint? Incompleteness/depth/sparseness? Repo root URLs? Conflicts, including the conflict 'artifact' files? Unversioned files?

What should happen when you switch the WC (or part of it) to another branch and then make a checkpoint? Then what should a 'squashed' commit mean -- just commit the last savepoint no matter what branch it was on, or commit the last-on-each-branch, or something else? Should a multi-commit then commit each savepoint in turn, thus committing to multiple branches?

SavePoints (last edited 2014-07-14 08:09:39 by JulianFoad)