
The official documentation has moved to http://docs.couchdb.org. The transition is not 100% complete, but http://docs.couchdb.org should be treated as having the latest information. In some cases, the wiki still has additional or older information on certain CouchDB topics.


HTTP Bulk Document API

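See also the official documentation for the read (http://docs.couchdb.org/en/latest/api/database.html#get-db-all-docs) and write (http://docs.couchdb.org/en/latest/api/database.html#post-db-bulk-docs) portions of this topic.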

Fetch Multiple Documents With a Single Request

_all_docs implements the HTTP_view_API where each key is the doc _id, and each value is a JSON object containing the rev. This means that:

  • By adding ?include_docs=true you can get the documents themselves, not just their id and rev

  • By adding ?startkey="xx"&endkey="yy" you can get the documents with keys in a certain range

  • By POSTing to _all_docs you can get a set of documents with arbitrary keys

Simple example to fetch the keys bar and baz and include the complete document in the result set:

curl -X POST -H 'Content-Type: application/json' -d '{"keys":["bar","baz"]}' 'http://127.0.0.1:5984/foo/_all_docs?include_docs=true'

Result:

{"total_rows":3,"offset":0,"rows":[
{"id":"bar","key":"bar","value":{"rev":"1-4057566831"},"doc":{"_id":"bar","_rev":"1-4057566831","name":"jim"}},
{"id":"baz","key":"baz","value":{"rev":"1-2842770487"},"doc":{"_id":"baz","_rev":"1-2842770487","name":"trunky"}}
]}

  • The rows are returned in the same order as the supplied "keys" array.
  • The row for a deleted document will have the revision ID of the deletion and an extra key "deleted":true in the "value" property.
  • The row for a nonexistent document will contain only its "key" and an "error" property with the value "not_found".
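
The three row shapes described above can be told apart mechanically. The following is a minimal Python sketch, not part of CouchDB or of any client library: classify_rows is a hypothetical helper, and the sample rows for the keys "qux" and "nope" (including the revision ID of the deletion) are invented for illustration.

```python
def classify_rows(rows):
    """Split _all_docs rows into found docs, deleted keys, and missing keys."""
    found, deleted, missing = {}, [], []
    for row in rows:
        if row.get("error") == "not_found":
            # Nonexistent document: the row carries no "id" or "value".
            missing.append(row["key"])
        elif row.get("value", {}).get("deleted"):
            # Deleted document: "value" holds the rev of the deletion.
            deleted.append(row["key"])
        else:
            # "doc" is only present when include_docs=true was requested.
            found[row["key"]] = row.get("doc")
    return found, deleted, missing


# Hypothetical rows, shaped like the "rows" array of an _all_docs response.
sample_rows = [
    {"id": "bar", "key": "bar", "value": {"rev": "1-4057566831"},
     "doc": {"_id": "bar", "_rev": "1-4057566831", "name": "jim"}},
    {"id": "qux", "key": "qux",
     "value": {"rev": "2-1111111111", "deleted": True}, "doc": None},
    {"key": "nope", "error": "not_found"},
]
found, deleted, missing = classify_rows(sample_rows)
```

Because the rows come back in the order of the supplied "keys" array, the lists of deleted and missing keys keep that order as well.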

Example with startkey and endkey (be aware of correct url encoding!):

curl 'http://127.0.0.1:5984/foo/_all_docs?include_docs=true&startkey="ba"&endkey="bb"'

Result:

{"total_rows":3,"offset":0,"rows":[
{"id":"bar","key":"bar","value":{"rev":"1-4057566831"},"doc":{"_id":"bar","_rev":"1-4057566831","name":"jim"}},
{"id":"baz","key":"baz","value":{"rev":"1-2842770487"},"doc":{"_id":"baz","_rev":"1-2842770487","name":"trunky"}}
]}
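
Since startkey and endkey must be JSON values (hence the double quotes) and must also be URL-encoded, it can be safer to build the URL programmatically. This is a minimal sketch using only the Python standard library; the function name all_docs_range_url is hypothetical.

```python
import json
from urllib.parse import quote, urlencode


def all_docs_range_url(db_url, startkey, endkey, include_docs=True):
    """Build an _all_docs URL with JSON-encoded, percent-encoded range keys."""
    params = {
        "startkey": json.dumps(startkey),  # "ba" becomes the JSON string "\"ba\""
        "endkey": json.dumps(endkey),
    }
    if include_docs:
        params["include_docs"] = "true"
    # quote_via=quote encodes spaces as %20 rather than "+".
    return db_url.rstrip("/") + "/_all_docs?" + urlencode(params, quote_via=quote)


url = all_docs_range_url("http://127.0.0.1:5984/foo", "ba", "bb")
```

Here url addresses the same range as the curl example above, with each double quote encoded as %22.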

Modify Multiple Documents With a Single Request

CouchDB provides a bulk insert/update feature. To use this, you make a POST request to the URI /{dbname}/_bulk_docs, with the request body being a JSON document containing a list of new documents to be inserted or updated.

For example (with curl):

$ DB="http://127.0.0.1:5984/mydb"
$ curl -X POST -H 'Content-Type: application/json' -d '{"docs":[{"key":"baz","name":"bazzel"},{"key":"bar","name":"barry"}]}' "$DB/_bulk_docs"
$ curl -X POST -H 'Content-Type: application/json' -d @your_file.json "$DB/_bulk_docs"

Doc formats below are as per CouchDB 0.9.x.

{
  "docs": [
    {"_id": "0", "integer": 0, "string": "0"},
    {"_id": "1", "integer": 1, "string": "1"},
    {"_id": "2", "integer": 2, "string": "2"}
  ]
}

If you omit the per-document _id specification, CouchDB will generate unique IDs for you, as it does for regular POST requests to the database URI.

The response to such a bulk request would look as follows (reformatted for clarity):

[
    {"id":"0","rev":"1-62657917"},
    {"id":"1","rev":"1-2089673485"},
    {"id":"2","rev":"1-2063452834"}
]

The results are returned in the same order as the supplied "docs" array.
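
Because the ordering is preserved, a client can pair each result with the document that produced it, which matters most when CouchDB generated the IDs. The following is a minimal sketch; apply_bulk_results is a hypothetical helper, not a CouchDB or library function.

```python
def apply_bulk_results(docs, results):
    """Return copies of docs stamped with the _id and _rev CouchDB reported."""
    if len(docs) != len(results):
        raise ValueError("expected one result per submitted document")
    updated = []
    for doc, result in zip(docs, results):
        new_doc = dict(doc)  # shallow copy; the caller's dicts stay untouched
        if "rev" in result:  # entries without "rev" were not saved
            new_doc["_id"] = result["id"]
            new_doc["_rev"] = result["rev"]
        updated.append(new_doc)
    return updated


docs = [
    {"_id": "0", "integer": 0, "string": "0"},
    {"_id": "1", "integer": 1, "string": "1"},
    {"_id": "2", "integer": 2, "string": "2"},
]
results = [
    {"id": "0", "rev": "1-62657917"},
    {"id": "1", "rev": "1-2089673485"},
    {"id": "2", "rev": "1-2063452834"},
]
saved = apply_bulk_results(docs, results)
```

The stamped copies in saved carry the _rev values needed for a later update or deletion.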

Updating an existing document requires setting its _rev member to the revision being updated. To delete a document, set its _deleted member to true.

{
  "docs": [
    {"_id": "0", "_rev": "1-62657917", "_deleted": true},
    {"_id": "1", "_rev": "1-2089673485", "integer": 2, "string": "2"},
    {"_id": "2", "_rev": "1-2063452834", "integer": 3, "string": "3"}
  ]
}

Note that the response contains an id and a revision for every document passed to a bulk request, even for documents that were just deleted.
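
A request body that mixes deletions and updates can be assembled as follows. This is a minimal sketch: deletion_stub and bulk_docs_body are hypothetical helper names, and the document values are taken from the example above.

```python
import json


def deletion_stub(doc_id, rev):
    """Smallest document that deletes doc_id at revision rev."""
    return {"_id": doc_id, "_rev": rev, "_deleted": True}


def bulk_docs_body(docs):
    """Serialize a list of documents as a _bulk_docs request body."""
    return json.dumps({"docs": list(docs)})


body = bulk_docs_body([
    deletion_stub("0", "1-62657917"),
    {"_id": "1", "_rev": "1-2089673485", "integer": 2, "string": "2"},
])
```

The resulting string is what would be sent as the body of the POST to /{dbname}/_bulk_docs.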

If the _rev does not match the current version of the document, then that particular document will not be saved and will be reported as a conflict, but this does not prevent other documents in the batch from being saved.

[
    {"id":"0","error":"conflict","reason":"Document update conflict."},
    {"id":"1","rev":"2-1579510027"},
    {"id":"2","rev":"2-3978456339"}
]

Other possible values for error include forbidden (403) and unauthorized (401), both likely caused by a validation failure.
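
Since a failure affects only its own document, a client typically splits the response into saved and failed entries before deciding what to retry. The following is a minimal sketch; partition_bulk_results is a hypothetical helper, and the response is the one shown above.

```python
def partition_bulk_results(results):
    """Split a _bulk_docs response into {id: rev} and {id: (error, reason)}."""
    saved, failed = {}, {}
    for entry in results:
        if "error" in entry:
            failed[entry["id"]] = (entry["error"], entry.get("reason"))
        else:
            saved[entry["id"]] = entry["rev"]
    return saved, failed


response = [
    {"id": "0", "error": "conflict", "reason": "Document update conflict."},
    {"id": "1", "rev": "2-1579510027"},
    {"id": "2", "rev": "2-3978456339"},
]
saved, failed = partition_bulk_results(response)
conflicted_ids = [doc_id for doc_id, (error, _) in failed.items() if error == "conflict"]
```

For the conflicted IDs, one possible next step is to fetch the current revisions, reapply the change, and submit those documents again.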

If "new_edits":false is set in the request body (to push existing revisions instead of creating new ones), the response will not include entries for the successful revisions, since their rev IDs are already known to the sender; it lists only the ones that had errors. Also, the conflict error will never appear, since conflicts are allowed in this mode.

Transactional Semantics with Bulk Updates

In short, there are none (by design). However, you can ask CouchDB to check that all the documents in your _bulk_docs request pass all your validation functions. If even one fails, none of the documents are written. You can select this mode by including "all_or_nothing":true in your request. With this mode, if all documents pass validation, then all documents will be updated, even if that introduces a conflict for some or all of the documents.
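
As an illustration, the request for this mode differs from a normal bulk request only by one extra top-level member. This is a minimal sketch that assumes a CouchDB version that accepts the "all_or_nothing" option; all_or_nothing_request is a hypothetical helper, and the request object is only constructed here, not sent.

```python
import json
import urllib.request


def all_or_nothing_request(db_url, docs):
    """Build (but do not send) a _bulk_docs POST in all_or_nothing mode."""
    body = json.dumps({"all_or_nothing": True, "docs": list(docs)})
    return urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = all_or_nothing_request(
    "http://127.0.0.1:5984/mydb",
    [{"_id": "1", "_rev": "1-2089673485", "integer": 2, "string": "2"}],
)
# urllib.request.urlopen(req) would submit it to a running server.
```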

Bulk updates work independently of replication: the documents updated in a _bulk_docs request will not be replicated as a group, and will not even necessarily be replicated in the same order as they were in the request.

Posting Existing Revisions

The replicator uses a special mode of _bulk_docs. The documents it writes to the destination database already have revision IDs that need to be preserved for the two databases to be in sync (otherwise it would not be possible to tell that the two represent the same revision.) To prevent the database from assigning them new revision IDs, a "new_edits":false property is added to the JSON request body.

Note that this changes the interpretation of the _rev parameter in each document: rather than being the parent revision ID to be matched against, it's the existing revision ID that will be saved as-is into the database. And since it's important to retain revision history when adding to the database, each document body in this mode should have a _revisions property that lists its revision history; the format of this property is described on the HTTP_Document_API page under "Accessing Previous Revisions". For example:

$ curl -X POST -H 'Content-Type: application/json' -d '{"new_edits":false,"docs":[{"_id":"person","_rev":"2-3595405","_revisions":{"start":2,"ids":["3595405","877727288"]},"name":"jim"}]}' "$OTHER_DB/_bulk_docs"

This command replicates a single revision into a separate database, $OTHER_DB. The revision keeps the same ID it has in $DB, "2-3595405", and is known to have a parent revision with ID 1-877727288. (Even though $OTHER_DB will not have the body of that parent revision, the history will help it detect conflicts in future replications.)

As with all_or_nothing, this mode can create conflicts; in fact, this is where the conflicts created by replication come from.
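
The _rev and _revisions members used in this mode can be derived from a list of full revision IDs. The following is a minimal sketch that reproduces the body of the curl example above; with_revision_history and new_edits_false_body are hypothetical helpers, and the sketch assumes the history is given newest first with no gaps.

```python
import json


def with_revision_history(doc, revs):
    """Attach _rev and _revisions to doc from full revision IDs, newest first."""
    positions = [int(rev.split("-", 1)[0]) for rev in revs]
    hashes = [rev.split("-", 1)[1] for rev in revs]
    start = positions[0]
    # _revisions stores one start position, so the positions must count
    # down by one from the newest revision.
    if positions != list(range(start, start - len(revs), -1)):
        raise ValueError("revision history must be contiguous, newest first")
    stamped = dict(doc)
    stamped["_rev"] = revs[0]
    stamped["_revisions"] = {"start": start, "ids": hashes}
    return stamped


def new_edits_false_body(docs):
    """Serialize docs as a _bulk_docs body that preserves their revision IDs."""
    return json.dumps({"new_edits": False, "docs": list(docs)})


person = with_revision_history(
    {"_id": "person", "name": "jim"},
    ["2-3595405", "1-877727288"],
)
body = new_edits_false_body([person])
```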

HTTP_Bulk_Document_API (last edited 2013-09-19 19:30:39 by NathanVanderWilt)