Update semantics

<!> Solr4.0 (I hope)

Please note that the features described here have not yet been committed.

Motivation

Solr lacks some advanced features that you need when using it as a NoSQL database and not just as a search index. By "using it as a NoSQL database" I primarily mean cases where (potentially many) threads concurrently insert, update and delete documents. Using Solr as a search index is more about first indexing your entire world into Solr using one thread (or many threads that cannot interfere with data indexed by another thread), and afterwards using it solely for searching.

Some of the features missing are:

INSERT INTO docs (id, column2, column3,...) VALUES (1234, value2, value3,...)

UPDATE docs SET column2=value2, column3=value3, ... WHERE id=1234 AND version=5678

Solution

Description

The above features could have been implemented by giving you new ways of "updating" documents in Solr besides the "update-add-docs" operation. Instead, "update-add-docs" is still the only operation you have for inserting/updating documents in Solr, but you now have a way of controlling the exact semantics Solr applies behind the scenes. First of all you need to decide which semantics-mode you want to use. You have the following options: classic, consistency and classic-consistency-hybrid.

Errors

When using consistency or the consistency-features of classic-consistency-hybrid (setting _version_ to something other than 0), the update of a single document can fail in a non-fatal way while the updates of other documents in the same request succeed. Errors are sent back to the Solr client in the response, and it is up to the client to react in a reasonable way. The information in a single error consists of an error-code, an error-type, an error-msg and a "part reference".

A request can result in zero, one or many errors.

To link errors in the response properly with documents in the request, you also need to add a "part reference" to each of your documents in the request. If you don't explicitly provide a "part reference" for a document in a multi-document request, and the handling of that document results in an error, the "part reference" in the response will just be a random UUID, and you will not be able to match that error with a document in the request.
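The matching of errors to documents can be sketched in plain Java, without any Solr classes. Everything named here (PartRefMatching, PartialError, assignPartRefs, failedDocIds, the "ref0", "ref1", ... naming scheme) is hypothetical and invented for this illustration; only the idea of tagging each document with a client-chosen "part reference" comes from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: none of these types are part of Solr or SolrJ.
final class PartRefMatching {

    /** One error from a response, reduced to the fields needed here. */
    static final class PartialError {
        final int code;
        final String type;
        final String partRef;

        PartialError(int code, String type, String partRef) {
            this.code = code;
            this.type = type;
            this.partRef = partRef;
        }
    }

    /** Gives every document id its own part reference: ref0, ref1, ... */
    static Map<String, String> assignPartRefs(List<String> docIds) {
        Map<String, String> docIdByPartRef = new LinkedHashMap<>();
        for (int i = 0; i < docIds.size(); i++) {
            docIdByPartRef.put("ref" + i, docIds.get(i));
        }
        return docIdByPartRef;
    }

    /** Returns the ids of the documents whose part reference appears in an error. */
    static List<String> failedDocIds(Map<String, String> docIdByPartRef,
                                     List<PartialError> errors) {
        List<String> failed = new ArrayList<>();
        for (PartialError error : errors) {
            String docId = docIdByPartRef.get(error.partRef);
            if (docId != null) { // an unknown ref (e.g. a random UUID) cannot be matched
                failed.add(docId);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, String> refs = assignPartRefs(Arrays.asList("doc-a", "doc-b", "doc-c"));
        List<PartialError> errors = Arrays.asList(
            new PartialError(409, "DocumentAlreadyExists", "ref0"),
            new PartialError(409, "DocumentDoesNotExist", "ref2"));
        System.out.println(failedDocIds(refs, errors)); // prints [doc-a, doc-c]
    }
}
```

Keeping the map from part reference to document on the client side is what makes the lookup possible; an error carrying a reference the client never assigned simply drops out of the result.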

In the context of "update-add-docs" requests, the following error-codes/error-types are relevant

Using it

This section describes how to use the features as a Solr user.

Configuration of Solr server

You control the semantics-mode by adding a semanticsMode tag inside your DirectUpdateHandler2-based updateHandler. In solrconfig.xml:

  <updateHandler class="solr.DirectUpdateHandler2">
    ...
    <semanticsMode>put classic, consistency or classic-consistency-hybrid here</semanticsMode>
    ...
  </updateHandler>

classic is the default if you don't add a semanticsMode tag.

To use consistency or the consistency-features of classic-consistency-hybrid, there are a few additional requirements on your Solr schema (schema.xml) and configuration (solrconfig.xml). There is really no reason to configure classic-consistency-hybrid if you plan never to use the consistency-features, that is, if you always send 0 as _version_.

 <field name="id" type="string" indexed="true" stored="true" required="true"/>
 <uniqueKey>id</uniqueKey>

 <field name="_version_" type="long" indexed="true" stored="true" />

 <updateHandler class="solr.DirectUpdateHandler2">
    ...
    <updateLog class="solr.FSUpdateLog">
      <str name="dir">${solr.data.dir:}</str>
    </updateLog>
    ...
 </updateHandler>

Raw HTTP requests and responses

This subsection describes the relevant parts of the content and structure of the raw HTTP requests and responses. This is especially interesting if you are not using a Java-based client.

Constructing requests

XML

Provide "part references" and _version_-field-values in your XML documents (see more here) like this:

<add>
  <doc partref="refA">
    ... set doc fields ...
    <field name="_version_">-1</field>
  </doc>
  ... add other docs ...
  <doc partref="refN">
    ... set doc fields ...
    <field name="_version_">1234567890</field>
  </doc>
</add>

JSON

Provide "part references" and _version_-field-values in your JSON documents (see more here) like this:

[
 { "nonfield.partref" : "refA", ... set doc fields ..., "_version_" : -1},
 ... add other docs ...
 { "nonfield.partref" : "refN", ... set doc fields ..., "_version_" : 1234567890}
]

The nonfield.partref can be placed anywhere among the other fields - not just as the first entry.

CSV

Provide "part references" and _version_-field-values in your CSV documents (see more here) like this:

nonfield.partref,... set doc field names ...,_version_
refA,... set doc field values ...,-1
... add other docs ...
refN,... set doc field values ...,1234567890

The nonfield.partref can be placed anywhere among the other fields - not just as the first entry.

Checking responses for partial errors

Status line and response header

General

Of course, if the update of all documents in the request succeeds, the HTTP response status line will look like this

HTTP/1.1 200 OK

If errors occur the HTTP response status line will in general look like this

HTTP/1.1 <error-code> <error-message>

and a response header like this will be included

X-solr-error-type: <error-type>

Single document updates

If you sent only one document for update, the error which occurred while handling that single document will be encoded in the HTTP response status line (code and message) and the "X-solr-error-type" header. Example:

HTTP/1.1 409 Attempt to update (_version_ > 0 specified explicitly in document) document failed. Document does not exist
...
X-solr-error-type: org.apache.solr.common.partialerrors.update.DocumentDoesNotExist

Multi document updates

If you sent multiple documents for update and the handling of some (possibly all) of them resulted in errors, the HTTP response status line and "X-solr-error-type" header will look like this

HTTP/1.1 412 Some parts of the request resulted in errors. Need to check response for partial errors. Documents sent for update with no corresponding partial error succeeded.
...
X-solr-error-type: org.apache.solr.common.partialerrors.PartialErrors
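A client that talks raw HTTP could classify a response from its status code and "X-solr-error-type" header roughly as follows. This is only a sketch: UpdateOutcome, Kind and classify are hypothetical names, and the only facts taken from the text are the 200 status for full success and the PartialErrors header value for multi-document requests.

```java
// Hypothetical client-side helper; not part of Solr or SolrJ.
final class UpdateOutcome {

    enum Kind { SUCCESS, PARTIAL_ERRORS, REQUEST_ERROR }

    // Header value taken from the multi-document example above.
    static final String PARTIAL_ERRORS_TYPE =
        "org.apache.solr.common.partialerrors.PartialErrors";

    /**
     * @param statusCode HTTP status code of the response
     * @param errorType  value of the X-solr-error-type header, or null if absent
     */
    static Kind classify(int statusCode, String errorType) {
        if (statusCode == 200) {
            return Kind.SUCCESS; // every document in the request succeeded
        }
        if (PARTIAL_ERRORS_TYPE.equals(errorType)) {
            return Kind.PARTIAL_ERRORS; // inspect the "partialerrors" list in the body
        }
        // The status line and header themselves describe the failure,
        // for example the error of a single-document request.
        return Kind.REQUEST_ERROR;
    }

    public static void main(String[] args) {
        System.out.println(classify(200, null));                // SUCCESS
        System.out.println(classify(412, PARTIAL_ERRORS_TYPE)); // PARTIAL_ERRORS
        System.out.println(classify(409,
            "org.apache.solr.common.partialerrors.update.DocumentDoesNotExist")); // REQUEST_ERROR
    }
}
```

The check keys on the header value rather than on the 412 status code, since the header is the more specific of the two signals.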

Body

In this subsection, let's pretend we sent a multi-document request like the one shown (several times in different formats) above, and that all documents succeeded except the two with "part reference" refA and refN. The body of the HTTP response will look as shown in the following subsections.

XML

<?xml version="1.0" encoding="UTF-8"?>
<response>
  ... responseHeader etc. ...
  <arr name="partialerrors">
    <lst>
      <int name="error-code">409</int>
      <str name="error-type">org.apache.solr.common.partialerrors.update.DocumentAlreadyExists</str>
      <str name="error-msg">Attempt to insert (_version_ &lt;= 0 specified explicitly in document) document failed. Document already exists</str>
      <str name="partRef">refA</str>
    </lst>
    <lst>
      <int name="error-code">409</int>
      <str name="error-type">org.apache.solr.common.partialerrors.update.DocumentDoesNotExist</str>
      <str name="error-msg">Attempt to update (_version_ &gt; 0 specified explicitly in document) document failed. Document does not exist</str>
      <str name="partRef">refN</str>
    </lst>
  </arr>
</response>

JSON

{
  ... responseHeader etc. ...,
  "partialerrors":[
    {"error-code":409,
     "error-type":"org.apache.solr.common.partialerrors.update.DocumentAlreadyExists",
     "error-msg":"Attempt to insert (_version_ <= 0 specified explicitly in document) document failed. Document already exists",
     "partRef":"refA"
    },{
     "error-code":409,
     "error-type":"org.apache.solr.common.partialerrors.update.DocumentDoesNotExist",
     "error-msg":"Attempt to update (_version_ > 0 specified explicitly in document) document failed. Document does not exist",
     "partRef":"refN"
    }
  ]
}
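Picking the partial errors out of such a body takes only a few lines. The sketch below does it for the XML form of the response shown earlier, using only the DOM parser that ships with the JDK. PartialErrorsXml and errorTypeByPartRef are hypothetical names; the element and attribute names ("arr", "lst", "str", "partialerrors", "partRef", "error-type") are the ones from the example response.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper; not part of Solr or SolrJ.
final class PartialErrorsXml {

    /** Maps each partRef found in the response body to its error-type. */
    static Map<String, String> errorTypeByPartRef(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse response body", e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        NodeList arrays = doc.getElementsByTagName("arr");
        for (int i = 0; i < arrays.getLength(); i++) {
            Element arr = (Element) arrays.item(i);
            if (!"partialerrors".equals(arr.getAttribute("name"))) {
                continue; // some other array in the response
            }
            NodeList errors = arr.getElementsByTagName("lst");
            for (int j = 0; j < errors.getLength(); j++) {
                Element error = (Element) errors.item(j);
                result.put(strValue(error, "partRef"), strValue(error, "error-type"));
            }
        }
        return result;
    }

    /** Returns the text of the str element with the given name attribute, or null. */
    private static String strValue(Element parent, String name) {
        NodeList strs = parent.getElementsByTagName("str");
        for (int i = 0; i < strs.getLength(); i++) {
            Element str = (Element) strs.item(i);
            if (name.equals(str.getAttribute("name"))) {
                return str.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<response><arr name=\"partialerrors\"><lst>"
            + "<int name=\"error-code\">409</int>"
            + "<str name=\"error-type\">org.apache.solr.common.partialerrors.update.DocumentAlreadyExists</str>"
            + "<str name=\"partRef\">refA</str>"
            + "</lst></arr></response>";
        System.out.println(errorTypeByPartRef(xml));
    }
}
```

A JSON body carries the same four entries per error, so the same lookup applies there with whatever JSON library the client already uses.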

Other types

By now you should have gotten the picture and be able to figure out how partial errors will be encoded in the HTTP response body if you requested the response as Ruby, PHP, Python or another format.

SolrJ requests and responses

If you are using a Java-based client, you do not need to know that much about the details of the raw HTTP communication as described above. The SolrJ client framework is there to help you.

Constructing requests

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.common.SolrInputDocument;

  List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
  SolrInputDocument docA = new SolrInputDocument();
  //... set docA fields ...
  docA.addField(SolrInputDocument.VERSION_FIELD, -1); // _version_ <= 0: insert
  docs.add(docA);
  //... setup other docs ...
  SolrInputDocument docN = new SolrInputDocument();
  //... set docN fields ...
  docN.addField(SolrInputDocument.VERSION_FIELD, 1234567890); // _version_ > 0: update
  docs.add(docN);

Note that it is not normal (as in the example above) to create a new SolrInputDocument with the _version_ (SolrInputDocument.VERSION_FIELD) field set to a hardcoded number above 0 (indicating update and not insert). A SolrInputDocument with _version_ above 0 will usually have been populated from a document fetched (by search or realtime-get) from Solr for modification and storing again (update). There is no need to deal explicitly with "part references": SolrInputDocument will handle them automatically for you.
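The fetch, modify and store cycle can be illustrated with a small in-memory stand-in for the index. This is not Solr code: VersionedStore, Stored, Result and the simple version counter are hypothetical, and the checks are an assumption reconstructed from the error messages and exception names on this page (_version_ at or below 0 means insert, _version_ above 0 means update of exactly that version).

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for an index; hypothetical, it only mirrors the checks described in the text.
final class VersionedStore {

    static final class Stored {
        final String value;
        final long version;

        Stored(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    enum Result { OK, DOCUMENT_ALREADY_EXISTS, DOCUMENT_DOES_NOT_EXIST, VERSION_CONFLICT }

    private final Map<String, Stored> docs = new HashMap<>();
    private long nextVersion = 1; // assumption: a plain counter, unlike real version numbers

    synchronized Stored get(String id) {
        return docs.get(id);
    }

    synchronized Result add(String id, String value, long version) {
        Stored current = docs.get(id);
        if (version <= 0) { // insert requested
            if (current != null) {
                return Result.DOCUMENT_ALREADY_EXISTS;
            }
        } else { // update requested
            if (current == null) {
                return Result.DOCUMENT_DOES_NOT_EXIST;
            }
            if (current.version != version) {
                return Result.VERSION_CONFLICT; // someone else updated the document first
            }
        }
        docs.put(id, new Stored(value, nextVersion++));
        return Result.OK;
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        System.out.println(store.add("1234", "first", -1));               // OK (insert)
        Stored fetched = store.get("1234");                               // fetch for modification
        System.out.println(store.add("1234", "second", fetched.version)); // OK (update)
        System.out.println(store.add("1234", "stale", fetched.version));  // VERSION_CONFLICT
    }
}
```

The last call fails because the version it carries was fetched before the second write; a client in that situation would fetch the document again and reapply its change.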

Sending requests

  SolrServer server = ...; // somehow you have a SolrJ client (instance of SolrServer)

  UpdateResponse response = server.add(docs, ... your SolrParams ...);

Checking responses for partial errors

If you send multiple documents in your request, it is possible that the insert/update operation will fail for some documents (due to "unique key constraints", "version checking" etc.) while it succeeds for others. Therefore you need to deal with partial errors:

  UpdateResponse response;
  try {
      response = server.add(docs, ... your SolrParams ...);
  } catch (org.apache.solr.common.partialerrors.PartialErrors e) {
      response = (UpdateResponse)e.getSpecializedResponse();
      DocumentUpdatePartialError err;
      err = response.getPartialError(docA);
      //... if and only if err is not null the insert/update of docA failed ...
      //... check for errors for other docs ...
      err = response.getPartialError(docN);
      //... if and only if err is not null the insert/update of docN failed ...
  }

The possible classes (subclasses of DocumentUpdatePartialError) of err are Java exception classes corresponding to the error-types mentioned above, where the package of the class corresponds to the error-type-namespace and the name of the class corresponds to the error-type-name.

If you send only one document in your request, you can catch the Java exception corresponding to the error-type directly:

  try {
      UpdateResponse response = server.add(docA, ... your SolrParams ...);
  } catch (org.apache.solr.common.partialerrors.update.DocumentDoesNotExist e) {
      //... do something ...
  } catch (org.apache.solr.common.partialerrors.update.DocumentAlreadyExists e) {
      //... do something ...
  } catch (org.apache.solr.common.partialerrors.update.VersionConflict e) {
      //... do something ...
  } catch (org.apache.solr.common.partialerrors.WrongUsage e) {
      //... do something ...
  }

Per Steffensen/Update semantics (last edited 2012-07-05 14:14:47 by Per Steffensen)