HTTP Document API

See also the official documentation (http://docs.couchdb.org/en/latest/api/documents.html) for this topic.

This is an introduction to the CouchDB HTTP document API.

Naming/Addressing

Documents stored in a CouchDB database have a DocID. DocIDs are case-sensitive string identifiers that uniquely identify a document. Two documents cannot have the same identifier in the same database; if they do, they are considered the same document.

http://localhost:5984/test/some_doc_id
http://localhost:5984/test/another_doc_id
http://localhost:5984/test/BA1F48C5418E4E68E5183D5BD1F06476

The above URLs point to some_doc_id, another_doc_id and BA1F48C5418E4E68E5183D5BD1F06476 in the database test.

Documents

A CouchDB document is simply a JSON object. You can use any JSON structure with nesting. You can fetch the document's revision information by adding ?revs_info=true to the GET request.

Here are two simple examples of documents:

{
 "_id":"discussion_tables",
 "_rev":"D1C946B7",
 "Sunrise":true,
 "Sunset":false,
 "FullHours":[1,2,3,4,5,6,7,8,9,10],
 "Activities": [
   {"Name":"Football", "Duration":2, "DurationUnit":"Hours"},
   {"Name":"Breakfast", "Duration":40, "DurationUnit":"Minutes", "Attendees":["Jan", "Damien", "Laura", "Gwendolyn", "Roseanna"]}
 ]
}

{
 "_id":"some_doc_id",
 "_rev":"D1C946B7",
 "Subject":"I like Plankton",
 "Author":"Rusty",
 "PostedDate":"2006-08-15T17:30:12-04:00",
 "Tags":["plankton", "baseball", "decisions"],
 "Body":"I decided today that I don't like baseball. I like plankton."
}

Special Fields

Note that any top-level fields within a JSON document containing a name that starts with a _ prefix are reserved for use by CouchDB itself. Also see Reserved_words. Currently (0.10+) reserved fields are:

Field Name          Description
_id                 The unique identifier of the document (mandatory and immutable)
_rev                The current MVCC-token/revision of this document (mandatory and immutable)
_attachments        If the document has attachments, _attachments holds a (meta-)data structure (see the Attachments section)
_deleted            Indicates that this document has been deleted and previous revisions will be removed on next compaction run
_revisions          Revision history of the document
_revs_info          A list of revisions of the document, and their availability
_conflicts          Information about conflicts
_deleted_conflicts  Information about deleted conflicting revisions
_local_seq          Sequence number of the revision in the database (as found in the _changes feed)

To request a special field be returned along with the normal fields you get when you request a document, add the desired field as a query parameter without the leading underscore in a GET request:

curl -X GET 'http://localhost:5984/my_database/my_document?conflicts=true'

This request will return a document that includes the special field '_conflicts' which contains all the conflicting revisions of "my_document".

[Exception: The query parameter for the _revisions special field is 'revs', not 'revisions'.]
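The naming rule above can be sketched in a few lines of Python. This is only an illustration: the helper name special_field_query and the PARAM_EXCEPTIONS table are made up for this example, and the only exception it knows about is the _revisions/revs one mentioned in the text.

```python
from urllib.parse import urlencode

# The only exception named in the text: _revisions is requested with "revs".
PARAM_EXCEPTIONS = {"_revisions": "revs"}


def special_field_query(*fields):
    """Return a query string asking for the given special fields."""
    params = {}
    for field in fields:
        # Default rule: drop the leading underscore from the field name.
        name = PARAM_EXCEPTIONS.get(field, field.lstrip("_"))
        params[name] = "true"
    return urlencode(params)


print(special_field_query("_conflicts", "_revisions"))
```

The resulting string is appended after a ? to the document URL of a GET request.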

Document IDs

Document IDs have no restrictions on which characters can be used. Special characters should work, but plain characters are recommended; if you do use special characters, make sure they are properly URL-encoded and decoded. Documents whose IDs are prefixed with _ are special documents:

Document ID prefix  Description
_design/            are DesignDocuments
_local/             are not replicated (local documents) and are used for Replication checkpointing.

You can have / as part of the document ID, but if you refer to a document in a URL you must always encode it as %2F. One special case is _design/ documents: they accept either / or %2F for the / after _design, although / is preferred, and %2F is still needed for any / in the rest of the DocID.
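As a rough sketch of the encoding rule, the following Python helper builds a document URL. The function name doc_url and its parameters are hypothetical; it assumes the database name needs no special handling beyond percent-encoding.

```python
from urllib.parse import quote


def doc_url(base, db, doc_id):
    """Build a document URL, percent-encoding the DocID."""
    if doc_id.startswith("_design/"):
        # Keep the preferred literal / after _design, encode the rest.
        rest = doc_id[len("_design/"):]
        encoded = "_design/" + quote(rest, safe="")
    else:
        # safe="" makes quote() encode / as %2F as well.
        encoded = quote(doc_id, safe="")
    return f"{base}/{quote(db, safe='')}/{encoded}"


print(doc_url("http://localhost:5984", "test", "a/b/c"))
```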

Working With Documents Over HTTP

GET

To retrieve a document, simply perform a GET operation at the document's URL:

GET /somedatabase/some_doc_id HTTP/1.0

Here is the server's response:

HTTP/1.1 200 OK
Etag: "946B7D1C"
Date: Thu, 17 Aug 2006 05:39:28 +0000GMT
Content-Type: application/json
Content-Length: 256
Connection: close

{
 "_id":"some_doc_id",
 "_rev":"946B7D1C",
 "Subject":"I like Plankton",
 "Author":"Rusty",
 "PostedDate":"2006-08-15T17:30:12-04:00",
 "Tags":["plankton", "baseball", "decisions"],
 "Body":"I decided today that I don't like baseball. I like plankton."
}

Accessing Previous Revisions

See DocumentRevisions for additional notes on revisions.

The above example gets the current revision. You can request a specific revision with the following syntax, provided its content is still stored in the database:

GET /somedatabase/some_doc_id?rev=946B7D1C HTTP/1.0

To find out what revisions are available for a document, you can do:

GET /somedatabase/some_doc_id?revs_info=true HTTP/1.0

This returns the current revision of the document, but with an additional _revs_info field, whose value is an array of objects, one per revision. For example:

{
  "_revs_info": [
    {"rev": "3-ffffff", "status": "available"},
    {"rev": "2-eeeeee", "status": "missing"},
    {"rev": "1-dddddd", "status": "deleted"}
  ]
}

Here, available means the revision content is stored in the database and can still be retrieved. The other values indicate that the content of that revision is not available.

Alternatively, the _revisions field, used by the replicator, can return an array of revision IDs more efficiently. The numeric prefixes are removed, with a "start" value indicating the prefix for the first (most recent) ID:

{
  "_revisions": {
    "start": 3,
    "ids": ["fffff", "eeeee", "ddddd"]
  }
}
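To turn the compact _revisions form back into full revision IDs, the numeric prefix counts down from "start". A minimal Python sketch (the function name expand_revisions is made up for this example):

```python
def expand_revisions(revisions):
    """Rebuild full revision IDs from a _revisions object."""
    start = revisions["start"]
    # The first ID is the most recent one and carries the prefix `start`;
    # each older ID has a prefix one lower than the one before it.
    return [f"{start - i}-{rev_id}" for i, rev_id in enumerate(revisions["ids"])]


print(expand_revisions({"start": 3, "ids": ["fffff", "eeeee", "ddddd"]}))
```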

You can fetch the bodies of multiple revisions at once using the parameter open_revs=["rev1","rev2",...], or you can fetch all leaf revisions using open_revs=all (see Replication_and_conflicts). The response is a JSON array of objects, each with either an "ok" key pointing to the document or a "missing" key pointing to the rev string.

[
{"missing":"1-fbd8a6da4d669ae4b909fcdb42bb2bfd"},
{"ok":{"_id":"test","_rev":"2-5bc3c6319edf62d4c624277fdd0ae191","hello":"foo"}}
]
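A small Python sketch of both sides of an open_revs request follows. The helper names open_revs_query and split_open_revs are hypothetical; the sketch assumes the response has already been parsed from JSON into a list like the one shown above.

```python
import json
from urllib.parse import urlencode


def open_revs_query(revs):
    """Build the query string for open_revs; revs is "all" or a list of IDs."""
    if revs == "all":
        value = "all"
    else:
        # The list is sent as a JSON array, which urlencode percent-encodes.
        value = json.dumps(revs, separators=(",", ":"))
    return urlencode({"open_revs": value})


def split_open_revs(results):
    """Separate returned documents from the revision IDs that were missing."""
    found = [entry["ok"] for entry in results if "ok" in entry]
    missing = [entry["missing"] for entry in results if "missing" in entry]
    return found, missing


print(open_revs_query(["1-abc", "2-def"]))
```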

A HEAD request returns basic information about the document, including its current revision.

HEAD /somedatabase/some_doc_id HTTP/1.0

HTTP/1.1 200 OK
Etag: "946B7D1C"
Date: Thu, 17 Aug 2006 05:39:28 +0000GMT
Content-Type: application/json
Content-Length: 256

PUT

To create a new document you can use either a POST operation or a PUT operation. To create or update a named document using the PUT operation, the URL must point to the document's location.

The following is an example HTTP PUT. It will cause the CouchDB server to generate a new revision ID and save the document with it.

PUT /somedatabase/some_doc_id HTTP/1.0
Content-Length: 245
Content-Type: application/json

{
  "Subject":"I like Plankton",
  "Author":"Rusty",
  "PostedDate":"2006-08-15T17:30:12-04:00",
  "Tags":["plankton", "baseball", "decisions"],
  "Body":"I decided today that I don't like baseball. I like plankton."
}

Here is the server's response.

HTTP/1.1 201 Created
Etag: "946B7D1C"
Date: Thu, 17 Aug 2006 05:39:28 +0000GMT
Content-Type: application/json
Connection: close

{"ok": true, "id": "some_doc_id", "rev": "946B7D1C"}

To update an existing document, you also issue a PUT request. In this case, the JSON body must contain a _rev property, which lets CouchDB know which revision the edits are based on. If the revision of the document currently stored in the database doesn't match, then a 409 conflict error is returned.

If the revision number does match what's in the database, a new revision number is generated and returned to the client.

For example:

PUT /somedatabase/some_doc_id HTTP/1.0
Content-Length: 245
Content-Type: application/json

{
  "_id":"some_doc_id",
  "_rev":"946B7D1C",
  "Subject":"I like Plankton",
  "Author":"Rusty",
  "PostedDate":"2006-08-15T17:30:12-04:00",
  "Tags":["plankton", "baseball", "decisions"],
  "Body":"I decided today that I don't like baseball. I like plankton."
}

Here is the server's response if what is stored in the database is revision 946B7D1C of document some_doc_id.

HTTP/1.1 201 Created
Etag: "2774761002"
Date: Thu, 17 Aug 2006 05:39:28 +0000GMT
Content-Type: application/json
Connection: close

{"ok":true, "id":"some_doc_id", "rev":"2774761002"}

And here is the server's response if there is an update conflict (what is currently stored in the database is not revision 946B7D1C of document some_doc_id).

HTTP/1.1 409 Conflict
Date: Thu, 17 Aug 2006 05:39:28 +0000GMT
Content-Length: 33
Connection: close

{"error":"conflict","reason":"Document update conflict."}

If for some reason you need to insert a conflicting revision, PUT supports the new_edits=false query option, which is described in detail under HTTP_Bulk_Document_API#Posting_Existing_Revisions.

There is a query option batch=ok which can be used to achieve higher throughput at the cost of lower guarantees. When a PUT (or a document POST as described below) is sent using this option, it is not immediately written to disk. Instead it is stored in memory on a per-user basis for a second or so (or until the number of docs in memory reaches a certain point). After the threshold has passed, the docs are committed to disk. Instead of waiting for the doc to be written to disk before responding, CouchDB sends an HTTP 202 Accepted response immediately.

batch=ok is not suitable for crucial data, but it is ideal for applications like logging which can accept the risk that a small proportion of updates could be lost due to a crash. Docs in the batch can also be flushed manually using the _ensure_full_commit API.
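The PUT requests in this section could be assembled in Python roughly as follows. This is a sketch under assumptions: build_put is a made-up helper, the server address is the localhost one used elsewhere on this page, and the request is only built, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_put(base, db, doc_id, doc, batch=False):
    """Build (but do not send) a PUT that creates or updates doc_id."""
    url = f"{base}/{db}/{quote(doc_id, safe='')}"
    if batch:
        url += "?batch=ok"  # lower guarantees; the server answers 202 Accepted
    return Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# An update: the body carries the _rev that the edit is based on.
req = build_put("http://localhost:5984", "somedatabase", "some_doc_id",
                {"_rev": "946B7D1C", "Subject": "I like Plankton"})
# Sending it with urllib.request.urlopen(req) raises
# urllib.error.HTTPError with code 409 on an update conflict.
```

On a 409, the usual approach is to GET the document again, reapply the change to the fresh revision, and retry the PUT.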

POST

The POST operation can be used to create a new document with a server generated DocID. To do so, the URL must point to the database's location. To create a named document, use the PUT method instead.

It is recommended that you avoid POST when possible, because proxies and other network intermediaries will occasionally resend POST requests, which can result in duplicate document creation. If your client software is not capable of guaranteeing uniqueness of generated UUIDs, use a GET to /_uuids?count=100 to retrieve a list of document IDs for future PUT requests. Please note that the /_uuids-call does not check for existing document ids; collision-detection happens when you are trying to save a document.
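If the client can generate its own unique IDs, a PUT with a client-side ID avoids the duplicate-POST problem. One possible way to produce such an ID in Python is shown below; using a random UUID here is an assumption of this sketch, not something CouchDB requires, and new_doc_id is a made-up name.

```python
import uuid


def new_doc_id():
    """Return a random 32-character hexadecimal string to use as a DocID."""
    return uuid.uuid4().hex


doc_id = new_doc_id()
print(doc_id)  # use it as the last path segment of a PUT request
```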

The following is an example HTTP POST. It will cause the CouchDB server to generate a new DocID and revision ID and save the document with it.

POST /somedatabase/ HTTP/1.0
Content-Length: 245
Content-Type: application/json

{
  "Subject":"I like Plankton",
  "Author":"Rusty",
  "PostedDate":"2006-08-15T17:30:12-04:00",
  "Tags":["plankton", "baseball", "decisions"],
  "Body":"I decided today that I don't like baseball. I like plankton."
}

Here is the server's response:

HTTP/1.1 201 Created
Date: Thu, 17 Aug 2006 05:39:28 +0000GMT
Content-Type: application/json
Connection: close

{"ok":true, "id":"123BAC", "rev":"946B7D1C"}

As of 0.11 CouchDB supports handling of multipart/form-data encoded updates. This is used by Futon and not considered a public API. All such requests must contain a valid Referer header.

DELETE

To delete a document, perform a DELETE operation at the document's location, passing the rev parameter with the document's current revision. If successful, it will return the revision id for the deletion stub.

DELETE /somedatabase/some_doc?rev=1582603387 HTTP/1.0

As an alternative, you can submit the revision in the If-Match header field.

DELETE /somedatabase/some_doc HTTP/1.0
If-Match: "1582603387"

And the response:

HTTP/1.1 200 OK
Etag: "2839830636"
Date: Thu, 17 Aug 2006 05:39:28 +0000GMT
Content-Type: application/json
Connection: close

{"ok":true,"rev":"2839830636"}

Note: Deleted documents remain in the database forever, even after compaction, to allow eventual consistency when replicating. If you delete using the DELETE method above, only the _id, _rev and a deleted flag are preserved. If you delete a document by adding "_deleted":true, then all the fields of the document are preserved. This is to allow, for example, recording the time you deleted a document, or the reason you deleted it.
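Both ways of passing the revision to DELETE can be sketched in Python. The helper build_delete and its use_if_match flag are hypothetical; the requests are built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_delete(doc_url, rev, use_if_match=False):
    """Build a DELETE request carrying the document's current revision."""
    if use_if_match:
        # The header value is the revision wrapped in double quotes.
        return Request(doc_url, method="DELETE",
                       headers={"If-Match": f'"{rev}"'})
    return Request(f"{doc_url}?{urlencode({'rev': rev})}", method="DELETE")


by_query = build_delete("http://localhost:5984/somedatabase/some_doc", "1582603387")
by_header = build_delete("http://localhost:5984/somedatabase/some_doc",
                         "1582603387", use_if_match=True)
```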

COPY

Note that this is a non-standard extension to HTTP.

You can copy documents by sending an HTTP COPY request. This allows you to duplicate the contents (and attachments) of a document to a new document under a different document id without first retrieving it from CouchDB. Use the Destination header to specify the document that you want to copy to (the target document).

It is not possible to copy documents between databases and it is not (yet) possible to perform bulk copy operations.

COPY /somedatabase/some_doc HTTP/1.1
Destination: some_other_doc

If you want to overwrite an existing document, you need to specify the target document's revision with a rev parameter in the Destination header:

COPY /somedatabase/some_doc HTTP/1.1
Destination: some_other_doc?rev=rev_id

The response in both cases includes the target document's revision:

HTTP/1.1 201 Created
Server: CouchDB/0.9.0a730122-incubating (Erlang OTP/R12B)
Etag: "355068078"
Date: Mon, 05 Jan 2009 11:12:49 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 41
Cache-Control: must-revalidate

{"ok":true,"id":"some_other_doc","rev":"355068078"}
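A COPY request differs from the other methods mainly in its Destination header. A minimal Python sketch, with build_copy as a made-up helper and the request only constructed, not sent:

```python
from urllib.request import Request


def build_copy(source_url, target_id, target_rev=None):
    """Build a COPY request; pass target_rev to overwrite an existing target."""
    destination = target_id
    if target_rev is not None:
        destination += f"?rev={target_rev}"
    return Request(source_url, method="COPY",
                   headers={"Destination": destination})


req_copy = build_copy("http://localhost:5984/somedatabase/some_doc",
                      "some_other_doc", target_rev="rev_id")
```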

MOVE

For a period of about six months, between versions 0.8 and 0.9, CouchDB trunk included the nonstandard MOVE method. Since MOVE is really just COPY & DELETE and CouchDB cannot reasonably guarantee atomicity between the COPY & DELETE operations on a single or on multiple nodes, it was removed before the release of CouchDB 0.9.

Bulk Docs

For information about editing multiple documents at the same time, see HTTP_Bulk_Document_API

All Documents

_all_docs

To get a listing of all documents in a database, use the special _all_docs URI. This is a specialized View so the Querying Options of the HTTP_view_API apply here.

GET /somedatabase/_all_docs HTTP/1.0

Will return a listing of all documents and their revision IDs, ordered by DocID (case sensitive):

HTTP/1.1 200 OK
Date: Thu, 17 Aug 2006 05:39:28 +0000GMT
Content-Type: application/json
Connection: close

{
  "total_rows": 3, "offset": 0, "rows": [
    {"id": "doc1", "key": "doc1", "value": {"rev": "4324BB"}},
    {"id": "doc2", "key": "doc2", "value": {"rev":"2441HF"}},
    {"id": "doc3", "key": "doc3", "value": {"rev":"74EC24"}}
  ]
}

Use the query argument descending=true to reverse the order of the output table:

GET /somedatabase/_all_docs?descending=true HTTP/1.0

Will return the same as before but in reverse order:

HTTP/1.1 200 OK
Date: Thu, 17 Aug 2006 05:39:28 +0000GMT
Content-Type: application/json
Connection: close

{
  "total_rows": 3, "offset": 0, "rows": [
    {"id": "doc3", "key": "doc3", "value": {"rev":"74EC24"}},
    {"id": "doc2", "key": "doc2", "value": {"rev":"2441HF"}},
    {"id": "doc1", "key": "doc1", "value": {"rev": "4324BB"}}
  ]
}

The query string parameters startkey, endkey and limit may also be used to limit the result set. For example:

GET /somedatabase/_all_docs?startkey="doc2"&limit=2 HTTP/1.0

Will return:

HTTP/1.1 200 OK
Date: Thu, 17 Aug 2006 05:39:28 +0000GMT
Content-Type: application/json
Connection: close

{
  "total_rows": 3, "offset": 1, "rows": [
    {"id": "doc2", "key": "doc2", "value": {"rev":"2441HF"}},
    {"id": "doc3", "key": "doc3", "value": {"rev":"74EC24"}}
  ]
}

Use endkey if you are interested in a specific range of documents:

GET /somedatabase/_all_docs?startkey="doc2"&endkey="doc3" HTTP/1.0

This will get all keys between doc2 and doc3, including doc2 and doc3 themselves; e.g. doc2-b and doc234.

Both approaches can be combined with descending:

GET /somedatabase/_all_docs?startkey="doc2"&limit=2&descending=true HTTP/1.0

Will return:

HTTP/1.1 200 OK
Date: Thu, 17 Aug 2006 05:39:28 +0000GMT
Content-Type: application/json
Connection: close

{
  "total_rows": 3, "offset": 1, "rows": [
    {"id": "doc2", "key": "doc2", "value": {"rev":"2441HF"}},
    {"id": "doc1", "key": "doc1", "value": {"rev": "4324BB"}}
  ]
}

If you add include_docs=true to a request to _all_docs, not only the metadata but also the documents themselves are returned.
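Note that startkey and endkey are JSON values, so string keys keep their double quotes and must be URL-encoded. The following Python sketch builds such query strings; all_docs_query is a hypothetical helper covering only the parameters discussed in this section.

```python
import json
from urllib.parse import urlencode


def all_docs_query(startkey=None, endkey=None, limit=None,
                   descending=False, include_docs=False):
    """Build the query string for a GET on /db/_all_docs."""
    params = {}
    if startkey is not None:
        params["startkey"] = json.dumps(startkey)  # "doc2" -> "\"doc2\""
    if endkey is not None:
        params["endkey"] = json.dumps(endkey)
    if limit is not None:
        params["limit"] = str(limit)
    if descending:
        params["descending"] = "true"
    if include_docs:
        params["include_docs"] = "true"
    return urlencode(params)


print("/somedatabase/_all_docs?" + all_docs_query(startkey="doc2", limit=2))
```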

_changes

This allows you to see all the documents that were updated and deleted, in the order these actions are done:

GET /somedatabase/_changes HTTP/1.0

Will return something of the form:

HTTP/1.1 200 OK
Date: Fri, 8 May 2009 11:07:02 +0000GMT
Content-Type: application/json
Connection: close

{"results":[
{"seq":1,"id":"fresh","changes":[{"rev":"1-967a00dff5e02add41819138abb3284d"}]},
{"seq":3,"id":"updated","changes":[{"rev":"2-7051cbe5c8faecd085a3fa619e6e6337"}]},
{"seq":5,"id":"deleted","changes":[{"rev":"2-eec205a9d413992850a6e32678485900"}],"deleted":true}
],
"last_seq":5}

Several view parameters, such as include_docs, limit and descending, also work on _changes. The starting point, however, is set with the since parameter, which takes a sequence number and is exclusive. This allows for a usage pattern where since is set to the last_seq returned by the previous query. As since is exclusive, the same change won't be processed twice.

See /database/_changes for more details.
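A client consuming the response shown above typically separates updates from deletions and remembers last_seq as the point to resume from on the next request. A minimal Python sketch, assuming the response has already been parsed from JSON (summarize_changes is a made-up name):

```python
def summarize_changes(response):
    """Return (updated_ids, deleted_ids, last_seq) for a _changes response."""
    updated, deleted = [], []
    for row in response["results"]:
        # Rows for deleted documents carry "deleted": true.
        if row.get("deleted"):
            deleted.append(row["id"])
        else:
            updated.append(row["id"])
    return updated, deleted, response["last_seq"]


sample = {
    "results": [
        {"seq": 1, "id": "fresh", "changes": [{"rev": "1-967a00dff5e02add41819138abb3284d"}]},
        {"seq": 3, "id": "updated", "changes": [{"rev": "2-7051cbe5c8faecd085a3fa619e6e6337"}]},
        {"seq": 5, "id": "deleted", "changes": [{"rev": "2-eec205a9d413992850a6e32678485900"}], "deleted": True},
    ],
    "last_seq": 5,
}
print(summarize_changes(sample))
```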

Attachments

Documents can have attachments just like email. There are two ways to use attachments: the first one is a REST API that addresses individual attachments by URLs; the second is inline with your document.

A note on attachment names: Attachments may have embedded / characters that are sent unescaped to CouchDB. You can use this to provide a subtree of attachments under a document. A DocID must have any / escaped as %2F. So if you have document a/b/c with an attachment d/e/f.txt, you can access it at http://couchdb/db/a%2fb%2fc/d/e/f.txt.
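The difference between the two escaping rules can be sketched in Python as follows. The helper attachment_url is hypothetical; note that quote() emits %2F in upper case, which is equivalent to the %2f shown above.

```python
from urllib.parse import quote


def attachment_url(base, db, doc_id, name):
    """Build an attachment URL: escape / in the DocID, keep it in the name."""
    doc_part = quote(doc_id, safe="")   # a/b/c -> a%2Fb%2Fc
    name_part = quote(name, safe="/")   # d/e/f.txt stays d/e/f.txt
    return f"{base}/{db}/{doc_part}/{name_part}"


print(attachment_url("http://couchdb", "db", "a/b/c", "d/e/f.txt"))
```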

Standalone Attachments

CouchDB lets you create, change and delete attachments without touching the actual document.

You need to specify a MIME type using the Content-Type header. CouchDB will serve the attachment with the specified Content-Type when asked.

To create an attachment on an existing document:

PUT /somedatabase/document/attachment?rev=123 HTTP/1.0
Content-Length: 245
Content-Type: image/jpeg

<JPEG data>

Note: You'll need to include the "rev" query parameter if you're adding (or updating) an attachment on an existing document. This is true any time you update a document. If you don't know the value of _rev on the document, you can retrieve it with a lightweight HEAD request or by GETting the whole document.

CouchDB replies:

{"ok": true, "id": "document", "rev": "765B7D1C"}

Note that you can do this on a non-existing document. The document and attachment will be created implicitly for you. A revision id must not be specified in this case.

To change an attachment:

PUT /somedatabase/document/attachment?rev=765B7D1C HTTP/1.0
Content-Length: 245
Content-Type: image/jpeg

<JPEG data>

CouchDB replies:

{"ok": true, "id": "document", "rev": "766FC88G"}

To delete an attachment:

DELETE /somedatabase/document/attachment?rev=766FC88G HTTP/1.0

CouchDB replies:

{"ok":true,"id":"document","rev":"519558700"}

To retrieve an attachment:

GET /somedatabase/document/attachment HTTP/1.0

CouchDB replies:

Content-Type: image/jpeg

<JPEG data>

Inline Attachments

On creation, attachments go into a special _attachments attribute of the document. They are encoded in a JSON structure that holds the name, the content_type and the base64 encoded data of an attachment. A document can have any number of attachments.

When retrieving documents, the attachment's actual data is not included by default, only the metadata. The actual data can be fetched by a GET to the attachment's URL as described in the previous section.

Creating a document with an attachment:

{
  "_id":"attachment_doc",
  "_attachments":
  {
    "foo.txt":
    {
      "content_type":"text\/plain",
      "data": "VGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHRleHQ="
    }
  }
}

Please note that any base64 data you send has to be on a single line of characters, so pre-process your data to remove any carriage returns and newlines.
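In Python, base64.b64encode produces the required single-line output (it adds no line breaks). A short sketch that builds a document like the example above; inline_attachment is a made-up helper name.

```python
import base64
import json


def inline_attachment(content_type, data):
    """Return the _attachments entry for one inline attachment (data is bytes)."""
    return {
        "content_type": content_type,
        # b64encode never inserts newlines, so the value is a single line.
        "data": base64.b64encode(data).decode("ascii"),
    }


doc = {
    "_id": "attachment_doc",
    "_attachments": {
        "foo.txt": inline_attachment("text/plain", b"This is a base64 encoded text"),
    },
}
print(json.dumps(doc))
```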

Requesting said document:

GET /database/attachment_doc

CouchDB replies:

{
  "_id":"attachment_doc",
  "_rev":"1589456116",
  "_attachments":
  {
    "foo.txt":
    {
      "stub":true,
      "content_type":"text\/plain",
      "length":29
    }
  }
}

Note that the "stub":true attribute denotes that this is not the complete attachment. Also, note the length attribute added automatically. When you update the document you must include the attachment stubs or CouchDB will delete the attachment.

Requesting the attachment:

GET /database/attachment_doc/foo.txt

CouchDB returns:

This is a base64 encoded text

Automatically decoded!

Multiple Attachments

Creating a document with two attachments:

{
  "_id":"attachment_doc",
  "_attachments":
  {
    "foo.txt":
    {
      "content_type":"text\/plain",
      "data": "VGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHRleHQ="
    },
    "bar.txt":
    {
      "content_type":"text\/plain",
      "data": "VGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHRleHQ="
    }
  }
}

Alternatively, you can upload a document with attachments more efficiently in MIME multipart/related format. This avoids having to Base64-encode the attachments, saving CPU and bandwidth. To do this, set the "Content-Type" header of the PUT request to "multipart/related".

  • The first MIME body is the document itself, which should have its own Content-Type of "application/json". It should include an _attachments metadata object in which each attachment object has a key follows with value true.

  • The subsequent MIME bodies are the attachments. As of this writing (Dec. 2011) CouchDB ignores headers and identifies the attachments only by their order (corresponding to the order in which the metadata objects appear in the _attachments object.) This can be problematic as few non-Erlang JSON encoders allow you to specify the order in which keys are written out.

Here's an HTTP request that uploads a document with two attachments:

PUT /test_suite_db/multipart HTTP/1.1
Content-Type: multipart/related;boundary="abc123"

--abc123
content-type: application/json

{"body":"This is a body.",
"_attachments":{
  "foo.txt": {
    "follows":true,
    "content_type":"text/plain",
    "length":21
    },
  "bar.txt": {
    "follows":true,
    "content_type":"text/plain",
    "length":20
    }
  }
}
  
--abc123

this is 21 chars long
--abc123

this is 20 chars lon
--abc123--
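A request body like the one above could be assembled in Python roughly as shown here. This is a sketch, not a reference implementation: multipart_related is a made-up helper, the boundary is the one from the example, and it relies on Python dicts and json.dumps preserving insertion order so that the metadata order matches the order of the MIME bodies.

```python
import json


def multipart_related(doc, attachments, boundary="abc123"):
    """Return (content_type, body) for a multipart/related PUT.

    attachments is a list of (name, content_type, data_bytes) tuples; the
    list order is used both for the metadata and for the MIME bodies.
    """
    body_doc = dict(doc)
    body_doc["_attachments"] = {
        name: {"follows": True, "content_type": ctype, "length": len(data)}
        for name, ctype, data in attachments
    }
    delimiter = b"--" + boundary.encode("ascii")
    # First part: the JSON document with its own Content-Type header.
    lines = [delimiter, b"Content-Type: application/json", b"",
             json.dumps(body_doc).encode("utf-8")]
    # Following parts: raw attachment bodies without headers, in order.
    for _name, _ctype, data in attachments:
        lines += [delimiter, b"", data]
    lines.append(delimiter + b"--")
    body = b"\r\n".join(lines) + b"\r\n"
    return f'multipart/related;boundary="{boundary}"', body


content_type, body = multipart_related(
    {"body": "This is a body."},
    [("foo.txt", "text/plain", b"this is 21 chars long"),
     ("bar.txt", "text/plain", b"this is 20 chars lon")],
)
```

The returned content_type goes into the Content-Type header of the PUT, and body is sent as the request body.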

Getting Attachments With a Document

If you need to fetch a document and all its attachments in one request, add an ?attachments=true URL parameter to the GET request. The resulting JSON will include the base64-encoded contents in the "data" property of each attachment, just as in an inline upload.

A more efficient way is to ask for a MIME multipart/related response. The benefit is that the attachments are not base64-encoded, and they aren't in the middle of the JSON so they're easier to stream directly to disk (especially if your platform has a MIME parsing library). To get this response format, just add an "Accept:" header to the request with value "multipart/related". (You still need the ?attachments=true parameter too.)

Unfortunately, at present (Dec. 2011) the multipart response has no headers on any of the attachment bodies, so the only way to tell which one is which is to match them with the order of the keys in the JSON "_attachments" object. This may not be easy in languages other than Erlang, as most JSON parsers don't preserve the order of keys.

Needless to say, this type of request can lead to arbitrarily large responses and can be expensive for the server. Don't use it unless you need to; if you only want one or two attachments, use individual GETs to fetch them.

Getting Only Changed Attachments

If you already have a local copy of an earlier revision of a document and its attachments, you may want to fetch only the attachments that have changed since a particular revision. To do this, add a ?atts_since parameter to the URL, whose value is a (URL-encoded) JSON array of one or more revision IDs. The response will include the content of only those attachments that changed since the given revision(s). (You can ask for this either in the default JSON or as multipart/related, as previously described.)

Incidentally, you can always find out in which revision an attachment last changed by looking at its metadata's revpos property, whose value is the numeric generation number (not the full revision ID) of the revision that introduced that version of the attachment.

Compression of Attachments

As of version 0.11, CouchDB will by default automatically compress certain attachment types, based on the Content-Type header of the request. This is done to reduce the amount of data being shuffled around during replication, and in most cases it's probably what you want. However, if you upload large files (e.g. a 200 MB CSV), you may want to tweak this configuration to avoid compression and thereby reduce the latency of the request.

This setting can be found in the attachments section of the configuration.

ETags/Caching

CouchDB sends an ETag Header for document requests. The ETag Header is simply the document's revision in quotes.

For example, a GET request:

GET /database/123182719287

Results in a reply with the following headers:

Cache-Control: must-revalidate
Date: Wed, 12 Jan 2011 15:46:41 GMT
ETag: "1-f5db357d8a173066a562eee57eee0532"
Server: CouchDB/1.0.1 (Erlang OTP/R13B)
Content-Length: 126
Content-Type: text/plain;charset=utf-8

PUT and POST requests also return an ETag header for either newly created or updated documents. The If-None-Match request header is supported for GET and HEAD.
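A client that has cached a document can use its revision for a conditional GET. The Python sketch below only builds the request; build_conditional_get is a hypothetical helper and the URL is the example one from this section.

```python
from urllib.request import Request


def build_conditional_get(doc_url, cached_rev):
    """Build a GET that asks for the body only if the revision has changed."""
    # The ETag is the revision in double quotes, so the header uses that form.
    return Request(doc_url, headers={"If-None-Match": f'"{cached_rev}"'})


cond = build_conditional_get("http://localhost:5984/database/123182719287",
                             "1-f5db357d8a173066a562eee57eee0532")
# Sending it with urllib.request.urlopen(cond) raises
# urllib.error.HTTPError with code 304 when the cached copy is still current.
```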

HTTP_Document_API (last edited 2013-06-05 21:58:30 by NathanVanderWilt)