Differences between revisions 9 and 10
Revision 9 as of 2018-02-09 13:40:27
Size: 2601
Editor: brane
Comment:
Revision 10 as of 2018-02-09 13:42:34
Size: 2590
Editor: brane
Comment:
Deletions are marked like this. Additions are marked like this.
Line 11: Line 11:
'''Definition:''' A ''data configuration space'' is a [[http://en.wikipedia.org/wiki/Metric_space|metric space]] (𝔹, 𝒹) where: '''Definition:''' A ''data configuration space'' is a [[http://en.wikipedia.org/wiki/Metric_space|metric space]] (𝔹, D) where:
Line 13: Line 13:
 * 𝒹: 𝔹 × 𝔹 ⟶ ℝ is a metric on 𝔹.  * D: 𝔹 × 𝔹 ⟶ ℝ is a metric on 𝔹.
Line 15: Line 15:
'''Definition:''' A ''configuration history'' is a sequence of elements of 𝔹; in other words, it is a ''curve'' in (𝔹, 𝒹). '''Definition:''' A ''configuration history'' is a sequence of elements of 𝔹; in other words, it is a ''curve'' in (𝔹, D).
Line 26: Line 26:
  * from wich it follows that: ∀c ∈ 𝔹: L(𝑒, c) = ''S'',,c,,.   * from which it follows that: ∀c ∈ 𝔹: L(C, c) = ''S'',,c,,.

Configuration Spaces and Histories

Before we dive into versioned filesystem design and implementation details, let's first take a look at the concepts we're talking about. We'll begin by discussing what a configuration management system actually operates on.

N.B.: Although the following text uses mathematical concepts and notation, it's not intended to be a rigorous dissertation on configuration theory. We'll often make statements and draw conclusions without providing proof. Therefore, dear reader, you can either trust we're right, or do your own homework to prove (or disprove) our assertions. We will, however, try to conscientiously flag⚑ all leaps of faith in the text.

Definition: A data configuration is a unique pattern of data that can be represented as exactly one distinct sequence of binary digits.

  • The representation of the empty configuration E is the zero-length sequence.

  • The size , Sc, of a configuration c is the length of its representative sequence of binary digits.

Definition: A data configuration space is a metric space (𝔹, D) where:

  • 𝔹 is the set of all data configurations;
  • D: 𝔹 × 𝔹 ⟶ ℝ is a metric on 𝔹.

Definition: A configuration history is a sequence of elements of 𝔹; in other words, it is a curve in (𝔹, D).

Defining the Metric

We'll now show that we can indeed define a metric for 𝔹. One example is the Levenshtein distance,

  • L: 𝔹 × 𝔹 ⟶ ℕ0

applied to the binary-digit sequence representation of data configurations. It is easy to show⚑ that L has all the required properties of a metric. In addition, it has the following properties:

  • The minimum distance between any two distinct configurations is 1.
  • The maximum distance between any two distinct configurations is the greater of their sizes: ∀a,b ∈ 𝔹: L(a, b) ≤ max(Sa, Sb);

    • from which it follows that: ∀c ∈ 𝔹: L(C, c) = Sc.

  • All distances are non-negative integers.

In the rest of this text, we'll specifically talk about (𝔹, L) — that is, the data configuration space with a Levenshtein distance metric; also referred to as edit distance. Even though the term "edit distance" is less precise in general, it fits the concept of edits (or edit transformations), which we'll discuss later on.

The History Curve

FS2/Theory (last edited 2018-02-09 13:42:34 by brane)