
HTTP Bulk Document API

Fetch Multiple Documents With a Single Request

_all_docs implements the HTTP_view_API, where each key is the doc _id and each value is a JSON object containing the rev. This means the usual view query options apply to it, as the examples below show.

Simple example to fetch the keys bar and baz and include the complete document in the result set:

curl -d '{"keys":["bar","baz"]}' -X POST http://127.0.0.1:5984/foo/_all_docs?include_docs=true

Result:

{"total_rows":3,"offset":0,"rows":[
{"id":"bar","key":"bar","value":{"rev":"1-4057566831"},"doc":{"_id":"bar","_rev":"1-4057566831","name":"jim"}},
{"id":"baz","key":"baz","value":{"rev":"1-2842770487"},"doc":{"_id":"baz","_rev":"1-2842770487","name":"trunky"}}
]}
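The same multi-key fetch can be scripted. What follows is a minimal Python sketch using only the standard library. The helper names (keys_request_body, docs_from_rows, fetch_docs) are illustrative and not part of CouchDB, and the database URL is assumed to be the one used in the curl example.

```python
import json
import urllib.request


def keys_request_body(keys):
    """Serialize the JSON body that _all_docs expects for a multi-key fetch."""
    return json.dumps({"keys": list(keys)}).encode("utf-8")


def docs_from_rows(result):
    """Map each returned id to its document, skipping rows without a "doc" value."""
    return {row["id"]: row["doc"] for row in result["rows"] if row.get("doc")}


def fetch_docs(db_url, keys):
    """POST the keys to _all_docs?include_docs=true (needs a running server)."""
    request = urllib.request.Request(
        db_url + "/_all_docs?include_docs=true",
        data=keys_request_body(keys),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return docs_from_rows(json.load(response))
```

Calling fetch_docs("http://127.0.0.1:5984/foo", ["bar", "baz"]) against the database above should yield a dictionary keyed by "bar" and "baz". Rows that carry no document are left out instead of raising an error.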

Example with startkey and endkey (the values are JSON-encoded strings, so be aware of correct quoting and URL encoding!):

curl 'http://127.0.0.1:5984/foo/_all_docs?include_docs=true&startkey="ba"&endkey="bb"'

Result:

{"total_rows":3,"offset":0,"rows":[
{"id":"bar","key":"bar","value":{"rev":"1-4057566831"},"doc":{"_id":"bar","_rev":"1-4057566831","name":"jim"}},
{"id":"baz","key":"baz","value":{"rev":"1-2842770487"},"doc":{"_id":"baz","_rev":"1-2842770487","name":"trunky"}}
]}
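Because startkey and endkey are JSON values, a script has to JSON-encode them first and then percent-encode the result. The Python sketch below shows one way to build such a URL with the standard library; the function name all_docs_range_url is hypothetical.

```python
import json
from urllib.parse import urlencode


def all_docs_range_url(db_url, startkey, endkey, include_docs=True):
    """Build an _all_docs URL with JSON-encoded, percent-encoded range keys."""
    params = {
        "include_docs": "true" if include_docs else "false",
        # json.dumps adds the double quotes that CouchDB expects around string keys.
        "startkey": json.dumps(startkey),
        "endkey": json.dumps(endkey),
    }
    # urlencode percent-encodes the quotes (") as %22.
    return db_url + "/_all_docs?" + urlencode(params)


print(all_docs_range_url("http://127.0.0.1:5984/foo", "ba", "bb"))
```

This prints http://127.0.0.1:5984/foo/_all_docs?include_docs=true&startkey=%22ba%22&endkey=%22bb%22, which is the encoded form of the curl request above.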

Modify Multiple Documents With a Single Request

CouchDB provides a bulk insert/update feature. To use this, you make a POST request to the URI /{dbname}/_bulk_docs, with the request body being a JSON document containing a list of new documents to be inserted or updated.

For example (with curl):

$ DB="http://127.0.0.1:5984/mydb"
$ curl -d '{"docs":[{"key":"baz","name":"bazzel"},{"key":"bar","name":"barry"}]}' -X POST $DB/_bulk_docs
$ curl -d @your_file.json -X POST $DB/_bulk_docs

Doc formats below are as per CouchDB 0.9.x.

{
  "docs": [
    {"_id": "0", "integer": 0, "string": "0"},
    {"_id": "1", "integer": 1, "string": "1"},
    {"_id": "2", "integer": 2, "string": "2"}
  ]
}

If you omit the per-document _id specification, CouchDB will generate unique IDs for you, as it does for regular POST requests to the database URI.

The response to such a bulk request would look as follows (reformatted for clarity):

[
    {"id":"0","rev":"1-62657917"},
    {"id":"1","rev":"1-2089673485"},
    {"id":"2","rev":"1-2063452834"}
]

Updating existing documents requires setting the _rev member to the revision being updated. To delete a document set the _deleted member to true.

{
  "docs": [
    {"_id": "0", "_rev": "1-62657917", "_deleted": true},
    {"_id": "1", "_rev": "1-2089673485", "integer": 2, "string": "2"},
    {"_id": "2", "_rev": "1-2063452834", "integer": 3, "string": "3"}
  ]
}
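An update body like the one above can be derived from the response to an earlier bulk request. The following Python sketch copies each returned rev onto the matching document; attach_revs is an illustrative helper name, not a CouchDB function.

```python
def attach_revs(docs, bulk_results):
    """Return a _bulk_docs body whose docs carry _rev values from an earlier response."""
    # Only successful entries have a "rev" member; errors are ignored here.
    revs = {entry["id"]: entry["rev"] for entry in bulk_results if "rev" in entry}
    updated = []
    for doc in docs:
        doc = dict(doc)  # copy, so the caller's documents are not modified
        if doc.get("_id") in revs:
            doc["_rev"] = revs[doc["_id"]]
        updated.append(doc)
    return {"docs": updated}
```

Documents whose _id does not appear in the earlier response are passed through unchanged, so CouchDB treats them as new documents.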

Note that CouchDB will return in the response an id and revision for every document passed as content to a bulk insert, even for those that were just deleted.

If the _rev does not match the current version of the document, then that particular document will not be saved and will be reported as a conflict, but this does not prevent other documents in the batch from being saved.

[
    {"id":"0","error":"conflict","reason":"Document update conflict."},
    {"id":"1","rev":"2-1579510027"},
    {"id":"2","rev":"2-3978456339"}
]
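Since one response can mix successes and conflicts, a client usually has to separate them. A minimal Python sketch, assuming the response has already been parsed from JSON into a list; split_bulk_results is a hypothetical helper name.

```python
def split_bulk_results(results):
    """Split a _bulk_docs response into saved revisions and failed entries."""
    saved = {}   # document id -> new revision
    failed = []  # entries that carry an "error" member, e.g. conflicts
    for entry in results:
        if "error" in entry:
            failed.append(entry)
        else:
            saved[entry["id"]] = entry["rev"]
    return saved, failed
```

For the response above, saved would hold the new revisions of documents 1 and 2, and failed would hold the conflict entry for document 0, which the application can refetch and retry.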

Transactional Semantics with Bulk Updates

In previous releases of CouchDB, bulk updates were transactional: all requests in a bulk update failed if any request failed or was in conflict. This approach had a couple of problems.

With release 0.9 of CouchDB, bulk update semantics have been changed so that a CouchDB server behaves consistently in a single-node, replicated, and/or partitioned environment. Note that this change makes explicit the fact that CouchDB is not a relational store and does not guarantee relational consistency between documents. As a developer you need to be aware of these semantics and design your data model and your application with this in mind.

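There are now two bulk update models supported: the default model described above, in which each document is saved or reported as a conflict independently of the others, and an all-or-nothing model, which is requested by adding "all_or_nothing": true to the request body: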

{
  "all_or_nothing": true,
  "docs": [
    {"_id": "0", "_rev": "1-62657917", "integer": 10, "string": "10"},
    {"_id": "1", "_rev": "2-1579510027", "integer": 2, "string": "2"},
    {"_id": "2", "_rev": "2-3978456339", "integer": 3, "string": "3"}
  ]
}

In this case, all three documents will be saved, and the response will show success for all of them. However, if the document with id 0 had a conflict, both versions will be present in the database, with an arbitrary choice made as to which appears in views. You can check for this status using a GET with ?conflicts=true.

If any update fails validation, all updates will fail.

All or nothing transactions should not be used to enforce referential integrity, as some or all updated documents might become losing conflicts during the update. The transaction should be used to make sure all information is captured in an atomic operation, but conflicts may need to be addressed later. Applications that rely on this functionality should be able to tolerate some documents missing or being in a conflicted state until conflict resolution can occur.

Bulk updates work independently of replication, meaning document revisions originally saved as part of an all or nothing transaction will be replicated individually, not as part of a bulk transaction. This means other replica instances may only have a subset of the transaction, and if an update is rejected by the remote node during replication (e.g. not authorized error) the remote node may never have the complete transaction.

Note that POSTing a single document with "all_or_nothing":true behaves completely differently from a regular PUT, since it will save conflicting versions rather than rejecting a conflict.

$ DB="http://127.0.0.1:5984/tstconf"
$ curl -X PUT "$DB"
$ curl -X PUT -d '{"name":"fred"}' "$DB/person"
$ curl -X POST -H 'Content-Type: application/json' -d '{"all_or_nothing":true,"docs":[{"_id":"person","_rev":"1-877727288","name":"jim"}]}' "$DB/_bulk_docs"
$ curl -X POST -H 'Content-Type: application/json' -d '{"all_or_nothing":true,"docs":[{"_id":"person","_rev":"1-877727288","name":"trunky"}]}' "$DB/_bulk_docs"
$ curl "$DB/person?conflicts=true"

Result:

{"ok":true}
{"ok":true,"id":"person","rev":"1-877727288"}
[{"id":"person","rev":"2-3595405"}]
[{"id":"person","rev":"2-2835283254"}]
{"_id":"person","_rev":"2-3595405","name":"jim","_conflicts":["2-2835283254"]}
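A document fetched with ?conflicts=true carries the winning revision in _rev and the losing ones in _conflicts. The Python sketch below lists all of them and builds a _bulk_docs body that deletes the losing revisions. Deleting the losers is one common way to resolve a conflict, not the only one, and both helper names are hypothetical.

```python
def leaf_revisions(doc):
    """Return the winning revision followed by any conflicting revisions."""
    return [doc["_rev"]] + list(doc.get("_conflicts", []))


def conflict_deletions(doc):
    """Build a _bulk_docs body that deletes every revision listed in _conflicts."""
    return {
        "docs": [
            {"_id": doc["_id"], "_rev": rev, "_deleted": True}
            for rev in doc.get("_conflicts", [])
        ]
    }
```

An application would normally merge any data it needs from the losing revisions into the winner before posting the deletions.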

Posting Existing Revisions

The replicator uses a special mode of _bulk_docs. The documents it writes to the destination database already have revision IDs that need to be preserved for the two databases to be in sync (otherwise it would not be possible to tell that the two represent the same revision). To prevent the database from assigning them new revision IDs, a "new_edits":false property is added to the JSON request body.

Note that this changes the interpretation of the _rev parameter in each document: rather than being the parent revision ID to be matched against, it's the existing revision ID that will be saved as-is into the database. And since it's important to retain revision history when adding to the database, each document body in this mode should have a _revisions property that lists its revision history; the format of this property is described on the HTTP_Document_API page under "Accessing Previous Revisions". For example:

$ curl -X POST -d '{"new_edits":false,"docs":[{"_id":"person","_rev":"2-3595405","_revisions":{"start":2,"ids":["3595405","877727288"]},"name":"jim"}]}' "$OTHER_DB/_bulk_docs"

This command will replicate one of the revisions created above into a separate database, $OTHER_DB. It will have the same revision ID as in $DB, "2-3595405", and it will be known to have a parent revision with ID "1-877727288". (Even though $OTHER_DB will not have the body of that revision, the history will help it detect conflicts in future replications.)
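The _revisions property can be built from a list of full revision IDs: "start" is the generation number of the newest revision, and "ids" holds the part after the dash of each revision, newest first. The following Python sketch assumes the caller supplies an unbroken history, newest first; revisions_property and replication_body are illustrative names.

```python
def revisions_property(rev_ids):
    """Turn ["2-3595405", "1-877727288"] (newest first) into a _revisions object."""
    if not rev_ids:
        raise ValueError("at least one revision ID is required")
    start = int(rev_ids[0].split("-", 1)[0])
    ids = []
    for offset, rev in enumerate(rev_ids):
        generation, digest = rev.split("-", 1)
        # Each older revision must be exactly one generation lower.
        if int(generation) != start - offset:
            raise ValueError("revision history is not contiguous: " + rev)
        ids.append(digest)
    return {"start": start, "ids": ids}


def replication_body(doc_id, rev_ids, fields):
    """Build a new_edits=false _bulk_docs body for one existing revision."""
    doc = dict(fields)
    doc["_id"] = doc_id
    doc["_rev"] = rev_ids[0]
    doc["_revisions"] = revisions_property(rev_ids)
    return {"new_edits": False, "docs": [doc]}
```

For example, replication_body("person", ["2-3595405", "1-877727288"], {"name": "jim"}) produces the same request body as the curl command above.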

As with all_or_nothing, this mode can create conflicts; in fact, this is where the conflicts created by replication come from.

HTTP_Bulk_Document_API (last edited 2012-05-05 17:45:52 by ThomasVanderStichele)