Differences between revisions 4 and 5
Revision 4 as of 2012-07-18 18:01:22
Size: 3701
Editor: JensAlfke
Comment: Added lots of explanatory text.
Revision 5 as of 2013-07-19 00:24:31
Size: 3699
Comment: Move statement outside of code block

The _purge operation removes all references to the specified revisions -- and their parents -- from the database. (This is very different from a normal delete, which actually adds a "tombstone" revision.) In a sense it edits history, much like a Git "reset": the revisions will no longer appear in the revision tree. It's as though the database had never heard of them at all.

Reasons To (And Not To) Purge

So, you've included your credit card details, your mother's maiden name and the PINs to all your major credit cards in a CouchDB document by mistake. You'd like to undo this. Usually, you can simply update the document, removing the confidential data, and then compact the database. However, let's say you really messed up and included this secret information in the document's id field. You remember that CouchDB keeps the latest {id, rev} pair for every document it has ever seen (so that replication can make all replicas eventually consistent). Are you paddling down an unpleasant stream with no means of steering? Fortunately not! You can purge.

If you are using _purge to recover space, you are almost certainly using CouchDB inappropriately. The most common inappropriate use of _purge is managing short-lived data (log entries, message queues, etc.). A better remedy is to periodically switch to a new database and delete the old one (once the entries in it have all expired).
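
As a rough illustration of switching databases periodically, the sketch below assumes one database per calendar month. The `logs-YYYY-MM` naming scheme, the function names and the `keep_months` window are all hypothetical choices made for this example, not anything CouchDB prescribes.

```python
from datetime import date


def db_name_for(day, prefix="logs"):
    """Name of the (hypothetical) monthly database for entries written on `day`."""
    return f"{prefix}-{day.year:04d}-{day.month:02d}"


def expired_dbs(existing, today, keep_months=2, prefix="logs"):
    """Return the monthly databases that fall outside the keep window.

    `existing` is a list of database names; names that do not follow the
    assumed `prefix-YYYY-MM` pattern are ignored.
    """
    current = today.year * 12 + (today.month - 1)
    expired = []
    for name in existing:
        if not name.startswith(prefix + "-"):
            continue
        try:
            year, month = name[len(prefix) + 1:].split("-")
            index = int(year) * 12 + (int(month) - 1)
        except ValueError:
            continue  # not one of our monthly databases
        if current - index >= keep_months:
            expired.append(name)
    return expired
```

Each name returned by `expired_dbs` could then be removed with a plain `DELETE /dbname` request, which frees the space without any purging.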

Eligibility For Purging

A revision parameter to _purge must be a leaf in the revision tree. This means it must be the current revision, or one of the current conflicting revisions. This is because a revision that has already been replaced by another is not a leaf node of the revision tree, so removing it would break the integrity of the tree.

When a revision is purged, its ancestors are purged if possible. Ancestors will be kept if necessary to preserve the integrity of the tree; this only happens if there have been conflicts and they are either unresolved or haven't yet been compacted away.
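
The two rules above can be sketched with a toy model of a revision tree. This is only an illustration: the tree is represented as a dict mapping each revision to its parent, which is an assumption made for this example and not how CouchDB stores revisions internally. The function names are likewise hypothetical.

```python
def leaves(parents):
    """Revisions that no other revision names as its parent."""
    with_children = {p for p in parents.values() if p is not None}
    return {rev for rev in parents if rev not in with_children}


def purge(parents, rev):
    """Purge leaf `rev` and every ancestor that no other branch still needs.

    Returns (remaining_tree, removed_revisions). Raises ValueError if `rev`
    is not a leaf, mirroring the eligibility rule.
    """
    if rev not in leaves(parents):
        raise ValueError(f"{rev} is not a leaf revision")
    remaining = dict(parents)
    removed = []
    current = rev
    while current is not None:
        parent = remaining.pop(current)
        removed.append(current)
        # Stop climbing at the root, or when another branch still
        # descends from this parent (an unresolved conflict).
        if parent is None or parent in remaining.values():
            break
        current = parent
    return remaining, removed
```

With a linear history `{"1-a": None, "2-b": "1-a", "3-c": "2-b"}`, purging `"3-c"` removes all three revisions. With a conflict `{"1-a": None, "2-b": "1-a", "2-c": "1-a"}`, purging `"2-b"` removes only `"2-b"`, because `"1-a"` is still needed by `"2-c"`.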

The _purge Command

To perform a purge operation you must send a JSON object, where the keys are the IDs to purge and each value is a list of the revisions to purge. Typically you'd just specify the current revision ID, which will purge the entire document unless there are conflicts. To purge an entire document while it's in conflict, you need to send each conflicting revision ID.

For example:

POST /mydb/_purge
Content-Type: application/json

{
  "c7a59f0d08d28928a62124fa16000ea7" : [
    "2-02be3a9b23b4402160ad678e208afb8e"
  ]
}
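
A minimal Python sketch of building the same request follows. It assumes the document was fetched with `?conflicts=true`, so that any conflicting revisions are listed in its `_conflicts` field. The helper names and the server URL are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def purge_body(doc):
    """Map the document ID to its current revision plus any conflicting ones."""
    revs = [doc["_rev"]] + list(doc.get("_conflicts", []))
    return {doc["_id"]: revs}


def purge_request(server, db, body):
    """Build (but do not send) the POST to the database's _purge resource."""
    return urllib.request.Request(
        url=f"{server}/{db}/_purge",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: the document from the request above, with no conflicts.
doc = {
    "_id": "c7a59f0d08d28928a62124fa16000ea7",
    "_rev": "2-02be3a9b23b4402160ad678e208afb8e",
}
request = purge_request("http://localhost:5984", "mydb", purge_body(doc))
```

Passing `request` to `urllib.request.urlopen` would perform the purge, so that step is deliberately left out of the sketch.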

The response will contain the purge sequence number, and a list of the document IDs and revisions successfully purged.

{
   "purged" : {
      "c7a59f0d08d28928a62124fa16000ea7" : [
         "2-02be3a9b23b4402160ad678e208afb8e"
      ]
   },
   "purge_seq" : 1
}
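
Because the response lists only the revisions that were successfully purged, it can be worth comparing it against what was requested. The helper below is a hypothetical sketch of that check; it assumes the response has already been parsed from JSON into a dict of the shape shown above.

```python
def unpurged(requested, response):
    """Return {doc_id: [revs]} for revisions requested but not reported as purged."""
    purged = response.get("purged", {})
    missing = {}
    for doc_id, revs in requested.items():
        done = set(purged.get(doc_id, []))
        left = [rev for rev in revs if rev not in done]
        if left:
            missing[doc_id] = left
    return missing
```

An empty result means every requested revision was purged. A non-empty one most likely points at a revision that was not a leaf.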

The purge sequence number is simply a persistent per-database counter that is incremented every time a _purge operation is performed. It's used internally to invalidate view indexes.

Side Effects

If more than one purge has been performed since a view was last queried, the view will be rebuilt from scratch. This is because the purges have removed the information needed to perform a correct incremental update.
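
The rebuild decision can be pictured with a simplified model in which a view index remembers the purge sequence number it last saw and compares it with the database's current one. The function below is a hypothetical sketch of that comparison, not CouchDB's actual implementation, and the returned labels are made up for the example.

```python
def index_update_plan(db_purge_seq, index_purge_seq):
    """Decide how a view index catches up, given the two purge counters."""
    gap = db_purge_seq - index_purge_seq
    if gap < 0:
        raise ValueError("index is ahead of the database")
    if gap == 0:
        return "incremental"             # nothing purged since the last update
    if gap == 1:
        return "drop-purged-then-incremental"  # the single purge can be accounted for
    return "rebuild"                     # too much history is gone
```

Under this model, querying the views after each purge keeps the gap at one and avoids a full rebuild.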

If the purged revisions still exist in another replica of the database, replicating with that database will pull them over again and restore them. To remove the revisions globally, perform the purge on every replica, and stop replication while you do so, so that the revisions are not replicated back.

Purge_Documents (last edited 2013-07-19 00:24:31 by RussellBranca)