Differences between revisions 25 and 26
Revision 25 as of 2013-09-19 19:30:39
Size: 7372
Editor: 97
Comment: fix official documentation link to bulk *write* API
Revision 26 as of 2018-04-13 05:42:32
Size: 0
Editor: JoanTouzet
Comment: migrated
<<Include(EditTheWiki)>>

= HTTP Bulk Document API =

See also the official documentation for the [[http://docs.couchdb.org/en/latest/api/database.html#get-db-all-docs|read]] and [[http://docs.couchdb.org/en/latest/api/database.html#post-db-bulk-docs|write]] portions of this topic.

<<TableOfContents()>>

== Fetch Multiple Documents With a Single Request ==
{{{_all_docs}}} implements the [[HTTP_view_API]] where each ''key'' is the doc _id, and each ''value'' is a JSON object containing the rev. This means that:

 * By adding {{{?include_docs=true}}} you can get the documents themselves, not just their id and rev
 * By adding {{{?startkey="xx"&endkey="yy"}}} you can get the documents with keys in a certain range
 * By POSTing a JSON body with a {{{keys}}} array to {{{_all_docs}}} you can get a set of documents with arbitrary keys

Simple example to fetch the keys {{{bar}}} and {{{baz}}} and include the complete document in the result set:

{{{
curl -X POST -H 'Content-Type: application/json' -d '{"keys":["bar","baz"]}' 'http://127.0.0.1:5984/foo/_all_docs?include_docs=true'
}}}
Result:

{{{#!highlight javascript
{"total_rows":3,"offset":0,"rows":[
{"id":"bar","key":"bar","value":{"rev":"1-4057566831"},"doc":{"_id":"bar","_rev":"1-4057566831","name":"jim"}},
{"id":"baz","key":"baz","value":{"rev":"1-2842770487"},"doc":{"_id":"baz","_rev":"1-2842770487","name":"trunky"}}
]}
}}}

 * The rows are returned in the same order as the supplied "keys" array.
 * The row for a deleted document will have the revision ID of the deletion, and an extra key "deleted":true in the "value" property.
 * The row for a nonexistent document will just contain an "error" property with the value "not_found".
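The row shapes listed above can be handled in client code roughly as follows. This is a minimal Python sketch using only the standard library. The function names {{{build_keys_request}}} and {{{classify_rows}}} are hypothetical, and the sketch relies only on the ordering guarantee and the "deleted" and "error" properties described above.

```python
import json


def build_keys_request(keys):
    """Return the JSON body for a POST to /{dbname}/_all_docs."""
    return json.dumps({"keys": list(keys)})


def classify_rows(keys, response):
    """Sort the requested keys into found, deleted and missing.

    Rows come back in the same order as the requested keys, so each
    key is paired with its row by position.
    """
    found, deleted, missing = [], [], []
    for key, row in zip(keys, response["rows"]):
        if row.get("error") == "not_found":
            missing.append(key)       # nonexistent document
        elif (row.get("value") or {}).get("deleted"):
            deleted.append(key)       # row carries the deletion's rev
        else:
            found.append(key)
    return found, deleted, missing
```

Pairing by position avoids depending on any property of the error row other than "error".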

Example with {{{startkey}}} and {{{endkey}}}. The keys are JSON strings, so keep the double quotes, and '''be aware''' of correct URL encoding:

{{{
curl 'http://127.0.0.1:5984/foo/_all_docs?include_docs=true&startkey="ba"&endkey="bb"'
}}}
Result:

{{{#!highlight javascript
{"total_rows":3,"offset":0,"rows":[
{"id":"bar","key":"bar","value":{"rev":"1-4057566831"},"doc":{"_id":"bar","_rev":"1-4057566831","name":"jim"}},
{"id":"baz","key":"baz","value":{"rev":"1-2842770487"},"doc":{"_id":"baz","_rev":"1-2842770487","name":"trunky"}}
]}
}}}
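When the range query is built in code rather than typed by hand, the encoding can be left to a library. The following is a minimal sketch in Python. The helper name {{{all_docs_range_url}}} is hypothetical. It JSON-encodes each key first and then percent-encodes the result.

```python
import json
from urllib.parse import urlencode


def all_docs_range_url(db_url, startkey, endkey, include_docs=True):
    """Build an _all_docs URL for the key range startkey..endkey."""
    params = {
        "include_docs": "true" if include_docs else "false",
        # Keys are JSON values, so a string key keeps its double quotes.
        "startkey": json.dumps(startkey),
        "endkey": json.dumps(endkey),
    }
    # urlencode percent-encodes the quotes and any reserved characters.
    return db_url.rstrip("/") + "/_all_docs?" + urlencode(params)
```

For example, {{{all_docs_range_url("http://127.0.0.1:5984/foo", "ba", "bb")}}} yields the same query as the curl command above, with each double quote encoded as {{{%22}}}.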

== Modify Multiple Documents With a Single Request ==
CouchDB provides a bulk insert/update feature. To use this, you make a ''POST'' request to the URI ''/{dbname}/_bulk_docs'', with the request body being a JSON document containing a list of new documents to be inserted or updated.

For example (with curl):

{{{
$ DB="http://127.0.0.1:5984/mydb"
$ curl -X POST -H 'Content-Type: application/json' -d '{"docs":[{"_id":"baz","name":"bazzel"},{"_id":"bar","name":"barry"}]}' "$DB/_bulk_docs"
$ curl -X POST -H 'Content-Type: application/json' -d @your_file.json "$DB/_bulk_docs"
}}}
The document formats below are as of CouchDB 0.9.x. The request body is a JSON object whose {{{docs}}} member lists the documents:

{{{#!highlight javascript
{
  "docs": [
    {"_id": "0", "integer": 0, "string": "0"},
    {"_id": "1", "integer": 1, "string": "1"},
    {"_id": "2", "integer": 2, "string": "2"}
  ]
}
}}}
If you omit the per-document ''_id'' specification, CouchDB will generate unique IDs for you, as it does for regular ''POST'' requests to the database URI.

The response to such a bulk request would look as follows (reformatted for clarity):

{{{#!highlight javascript
[
    {"id":"0","rev":"1-62657917"},
    {"id":"1","rev":"1-2089673485"},
    {"id":"2","rev":"1-2063452834"}
]
}}}

The results are returned in the same order as the supplied "docs" array.
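Because the results come back in the same order as the submitted documents, a client can copy the assigned id and rev back onto its local copies by position. The following is a minimal Python sketch. The function names {{{build_bulk_body}}} and {{{apply_bulk_result}}} are hypothetical.

```python
import json


def build_bulk_body(docs):
    """Return the JSON body for a POST to /{dbname}/_bulk_docs."""
    return json.dumps({"docs": list(docs)})


def apply_bulk_result(docs, results):
    """Return copies of docs updated with the ids and revs from results.

    Entries without a "rev" (errors) are returned unchanged.
    """
    updated = []
    for doc, result in zip(docs, results):
        merged = dict(doc)
        if "rev" in result:
            merged["_id"] = result["id"]    # also covers generated ids
            merged["_rev"] = result["rev"]
        updated.append(merged)
    return updated
```

The updated copies then carry the {{{_rev}}} values needed for a later update or delete.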

Updating existing documents requires setting the ''_rev'' member to the revision being updated. To delete a document, set the ''_deleted'' member to true.

{{{#!highlight javascript
{
  "docs": [
    {"_id": "0", "_rev": "1-62657917", "_deleted": true},
    {"_id": "1", "_rev": "1-2089673485", "integer": 2, "string": "2"},
    {"_id": "2", "_rev": "1-2063452834", "integer": 3, "string": "3"}
  ]
}
}}}
Note that the response contains an id and revision for every document passed to a bulk request, even for those that were just deleted.
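Update and delete entries like the ones above can be derived from documents the client already holds. The following is a minimal Python sketch. The helper names {{{deletion_entry}}} and {{{update_entry}}} are hypothetical.

```python
def deletion_entry(doc):
    """Return the minimal _bulk_docs entry that deletes doc."""
    return {"_id": doc["_id"], "_rev": doc["_rev"], "_deleted": True}


def update_entry(doc, **changes):
    """Return a copy of doc with changes applied, keeping _id and _rev."""
    entry = dict(doc)
    entry.update(changes)
    return entry
```

Both helpers assume that {{{doc}}} already contains the {{{_id}}} and {{{_rev}}} returned by an earlier request.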

If the _rev does not match the current version of the document, then that particular document will ''not'' be saved and will be reported as a conflict, but this does not prevent other documents in the batch from being saved.

{{{#!highlight javascript
[
    {"id":"0","error":"conflict","reason":"Document update conflict."},
    {"id":"1","rev":"2-1579510027"},
    {"id":"2","rev":"2-3978456339"}
]
}}}

Other possible values for ''error'' include ''forbidden'' (403) and ''unauthorized'' (401), both likely caused by a validation failure.
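Since a failure affects only its own entry, a client usually needs to separate the saved documents from the failed ones. The following is a minimal Python sketch. The function name {{{split_bulk_results}}} is hypothetical.

```python
def split_bulk_results(results):
    """Split a _bulk_docs response into saved revs and per-document errors."""
    saved = {}    # id -> new rev
    failed = {}   # id -> (error, reason)
    for result in results:
        if "error" in result:
            failed[result["id"]] = (result["error"], result.get("reason"))
        else:
            saved[result["id"]] = result["rev"]
    return saved, failed
```

Applied to the response above, {{{saved}}} holds the new revisions of documents 1 and 2, and {{{failed}}} holds the conflict for document 0, which the client can fetch again and retry.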

If the request body includes {{{"new_edits":false}}} (to push existing revisions instead of creating new ones; see "Posting Existing Revisions" below), the response will not include entries for successfully saved revisions, since the sender already knows their rev IDs. It lists only the ones that had errors. The ''conflict'' error will also never appear, because conflicts are allowed in this mode.


=== Transactional Semantics with Bulk Updates ===

In short, there are none (by design). However, you can ask CouchDB to check that all the documents in your {{{_bulk_docs}}} request pass all your validation functions. If even one fails, none of the documents are written. You can select this mode by including {{{"all_or_nothing":true}}} in your request. With this mode, if all documents pass validation, then all documents will be updated, even if that introduces a conflict for some or all of the documents.
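A request body for this mode could be assembled as in the following minimal Python sketch. The function name {{{build_all_or_nothing_body}}} is hypothetical, and whether a given CouchDB release honors this property should be checked against its own documentation.

```python
import json


def build_all_or_nothing_body(docs):
    """Return a _bulk_docs body that requests all-or-nothing validation."""
    return json.dumps({"all_or_nothing": True, "docs": list(docs)})
```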

Bulk updates work independently of replication: the documents updated in a {{{_bulk_docs}}} request will not be replicated as a group, and will not even necessarily be replicated in the same order as they were in the request.

=== Posting Existing Revisions ===

The replicator uses a special mode of {{{_bulk_docs}}}. The documents it writes to the destination database already have revision IDs that need to be preserved for the two databases to be in sync (otherwise it would not be possible to tell that the two represent the same revision). To prevent the database from assigning them new revision IDs, a {{{"new_edits":false}}} property is added to the JSON request body.

Note that this changes the interpretation of the {{{_rev}}} parameter in each document: rather than being the ''parent'' revision ID to be matched against, it's the ''existing'' revision ID that will be saved as-is into the database. And since it's important to retain revision history when adding to the database, each document body in this mode should have a {{{_revisions}}} property that lists its revision history; the format of this property is described on the [[HTTP_Document_API]] page under "Accessing Previous Revisions". For example:

{{{
$ curl -X POST -H 'Content-Type: application/json' -d '{"new_edits":false,"docs":[{"_id":"person","_rev":"2-3595405","_revisions":{"start":2,"ids":["3595405","877727288"]},"name":"jim"}]}' "$OTHER_DB/_bulk_docs"
}}}

This command pushes revision {{{2-3595405}}} of the document {{{person}}} into a separate database {{{$OTHER_DB}}}. The revision keeps the ID it has in {{{$DB}}}, and it will be known to have a parent revision with ID {{{1-877727288}}}. (Even though {{{$OTHER_DB}}} will not have the body of that parent revision, the history will help it detect conflicts in future replications.)
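The {{{_revisions}}} property in the command above can be derived from a list of full revision IDs. The following is a minimal Python sketch. The function names are hypothetical, and the input is assumed to be the document's revision history ordered newest first.

```python
import json


def revisions_from_history(revs):
    """Convert ["2-3595405", "1-877727288"] into the _revisions format.

    revs must be ordered newest first; "start" is the generation
    number of the newest revision.
    """
    start = int(revs[0].split("-", 1)[0])
    ids = [rev.split("-", 1)[1] for rev in revs]
    return {"start": start, "ids": ids}


def build_new_edits_false_body(docs_with_history):
    """Build a _bulk_docs body from (doc, revs) pairs, preserving rev IDs."""
    docs = []
    for doc, revs in docs_with_history:
        entry = dict(doc)
        entry["_rev"] = revs[0]    # existing rev, saved as-is
        entry["_revisions"] = revisions_from_history(revs)
        docs.append(entry)
    return json.dumps({"new_edits": False, "docs": docs})
```

Calling {{{build_new_edits_false_body}}} with the {{{person}}} document and the history {{{["2-3595405", "1-877727288"]}}} produces the same body as the curl command above.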

As with {{{all_or_nothing}}}, this mode can create conflicts; in fact, this is where the conflicts created by replication come from.