Update semantics

Targeted for Solr 4.0 (I hope)

Please note that the features described here have not yet been committed.

Motivation

Solr lacks some advanced features that are needed when it is used as a NoSQL database rather than just as a search index. By "NoSQL database" I primarily mean setups where (potentially many) threads concurrently insert, update and delete documents. Using Solr as a search index is more about first indexing your entire world into Solr with one thread (or with many threads that cannot interfere with data indexed by another thread), and afterwards using it solely for searching.

Some of the features missing are:

  • Insert semantics as we know them from RDBMSs: do not insert a document if it already exists in Solr. A document is considered to exist in Solr if a document with the same value in the uniqueKey field already exists. This is very much like the following SQL, which does NOT insert (it fails with a unique-key-constraint error instead) if there is a unique key constraint on column "id" and a row with id=1234 already exists:
INSERT INTO docs (id, column2, column3,...) VALUES (1234, value2, value3,...)
  • Update semantics with version control (optimistic locking) as we know them from RDBMSs: do not add the document if it does not already exist, and do not update it if it has been changed since the client doing the update loaded it. This is very much like the following SQL, which does NOT update the row with id=1234 if the version of that row in the RDBMS at the time of the update is not (or no longer) 5678. Popular O/R-mappers (like Hibernate) use this feature to raise a version-conflict error when you try to store an object (row/document) that has changed since you loaded it.
UPDATE docs SET column2=value2, column3=value3, ... WHERE id=1234 AND version=5678

Solution

Description

The above features could have been implemented by giving you new ways of "updating" documents in Solr, besides the "update-add-docs" operation. Instead, "update-add-docs" remains the only operation for inserting/updating documents in Solr, but you now control the exact semantics Solr applies behind the scenes. First you need to decide which semantics mode to use. You have the following options:

  • classic: Solr uses the same update semantics as it always has, without failing on "unique key conflicts" during create/insert and without failing on "version conflicts" during update. This is the default, so out of the box Solr works as always.
  • consistency: You are forced to (indirectly) state whether your intent is to insert or to update. If your intent is to insert, the "update-add-docs" operation will fail if a document with the same uniqueKey value already exists. If your intent is to update, the "update-add-docs" operation will fail if the document (a document with the same uniqueKey value) does not already exist, or if the value of the version field does not match the value in the existing document. You state your intent through the value of the version field:
    • version <= 0 (or not set): Intent is to insert
    • version > 0: Intent is to update
  • classic-consistency-hybrid: A hybrid between classic and consistency. The only difference from consistency is that you get classic semantics if you set version to 0 (or don't set it).
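The three modes amount to a small decision table for each document. The following Java sketch encodes those rules as they are described above. SemanticsMode, Outcome and UpdateSemantics.decide are hypothetical names chosen for illustration, not classes from the patch. The WrongUsage case (no uniqueKey) is deliberately left out.

```java
import java.util.OptionalLong;

// Illustrative only: hypothetical names, not Solr classes. WrongUsage is not modelled.
enum SemanticsMode { CLASSIC, CONSISTENCY, CLASSIC_CONSISTENCY_HYBRID }

enum Outcome { OK, DOCUMENT_ALREADY_EXISTS, DOCUMENT_DOES_NOT_EXIST, VERSION_CONFLICT }

final class UpdateSemantics {
    private UpdateSemantics() {}

    /**
     * Decides what happens to one document in an "update-add-docs" request.
     *
     * @param mode            the configured semanticsMode
     * @param requestVersion  version value sent with the document, or null if not set
     * @param existingVersion version of the stored document with the same uniqueKey,
     *                        or empty if no such document exists
     */
    static Outcome decide(SemanticsMode mode, Long requestVersion, OptionalLong existingVersion) {
        long v = (requestVersion == null) ? 0L : requestVersion;

        // classic never fails on unique-key or version conflicts.
        if (mode == SemanticsMode.CLASSIC) {
            return Outcome.OK;
        }
        // The hybrid falls back to classic semantics when version is 0 or not set.
        if (mode == SemanticsMode.CLASSIC_CONSISTENCY_HYBRID && v == 0L) {
            return Outcome.OK;
        }
        if (v <= 0L) {
            // Intent is insert: fail if a document with the same uniqueKey exists.
            return existingVersion.isPresent() ? Outcome.DOCUMENT_ALREADY_EXISTS : Outcome.OK;
        }
        // Intent is update: the document must exist and its version must match.
        if (!existingVersion.isPresent()) {
            return Outcome.DOCUMENT_DOES_NOT_EXIST;
        }
        return existingVersion.getAsLong() == v ? Outcome.OK : Outcome.VERSION_CONFLICT;
    }
}
```

For example, decide(SemanticsMode.CONSISTENCY, 5678L, OptionalLong.of(5679L)) yields VERSION_CONFLICT. This is the counterpart of the SQL UPDATE ... WHERE version=5678 matching no row.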

Errors

When using consistency, or the consistency features of classic-consistency-hybrid (setting version to something other than 0), the update of single documents can fail in a non-fatal way while the update of other documents in the same request succeeds. Errors are sent back to the Solr client in the response, and it is up to the client to react in a reasonable way. A single error consists of:

  • A code: Corresponding to an HTTP response status code
  • A type: The type of error that occurred. It consists of
    • A namespace: The context of the type
    • A name: The name of the type, uniquely identifying the error (at least within the namespace)
  • A message: Some additional text describing details about the error
  • A part reference: A reference to the document in the update request to which this error relates. It is called "part reference" instead of "document reference" because the error propagation method is designed to report all kinds of partial errors during the handling of requests. The "part reference" is only present in the error if the request contained multiple parts (multiple documents).

A request can result in zero, one or many errors.

As you know by now, to link errors in the response to the documents in the request, you also need to add a "part reference" to each document in the request. If you don't explicitly provide a "part reference" for a document in a multi-document request, and handling that document results in an error, the "part reference" in the response will just be a random UUID. You will then not be able to match that error with a document in the request.

In the context of "update-add-docs" requests, the following error codes/error types are relevant:

  • Error-code=409, error-namespace=org.apache.solr.common.partialerrors.update, error-name=DocumentDoesNotExist: Indicating that the document you tried to consistency-update does not exist (anymore)
  • Error-code=409, error-namespace=org.apache.solr.common.partialerrors.update, error-name=DocumentAlreadyExists: Indicating that the document you tried to consistency-create already exists (or at least a document with the same uniqueKey value)
  • Error-code=409, error-namespace=org.apache.solr.common.partialerrors.update, error-name=VersionConflict: Indicating that the document you tried to consistency-update has changed since you fetched it for update (version number has changed)
  • Error-code=422, error-namespace=org.apache.solr.common.partialerrors, error-name=WrongUsage: Indicating that you are using the features in a wrong way - e.g. if you try to do a consistency-insert/update but there is no uniqueKey defined in your Solr schema or no value for the uniqueKey-field of the document is specified in the request.
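A client that branches on these errors needs the code, namespace and name together. The sketch below models the four listed types as a Java enum. The enum and its fromFullType lookup are hypothetical. The only assumption about the wire format is that the fully qualified type is namespace + "." + name, which matches the error-type values in the response examples on this page.

```java
// Hypothetical client-side model of the error types listed above.
// Codes, namespaces and names are copied from the list; the enum is illustrative.
enum PartialErrorType {
    DOCUMENT_DOES_NOT_EXIST(409, "org.apache.solr.common.partialerrors.update", "DocumentDoesNotExist"),
    DOCUMENT_ALREADY_EXISTS(409, "org.apache.solr.common.partialerrors.update", "DocumentAlreadyExists"),
    VERSION_CONFLICT(409, "org.apache.solr.common.partialerrors.update", "VersionConflict"),
    WRONG_USAGE(422, "org.apache.solr.common.partialerrors", "WrongUsage");

    final int code;
    final String namespace;
    final String typeName;

    PartialErrorType(int code, String namespace, String typeName) {
        this.code = code;
        this.namespace = namespace;
        this.typeName = typeName;
    }

    /** Fully qualified type, e.g. as sent in error-type or X-solr-error-type. */
    String fullType() {
        return namespace + "." + typeName;
    }

    /** Looks up a type by its fully qualified string; returns null if unknown. */
    static PartialErrorType fromFullType(String fullType) {
        for (PartialErrorType t : values()) {
            if (t.fullType().equals(fullType)) {
                return t;
            }
        }
        return null;
    }
}
```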

Using it

This section describes how to use the features as a Solr user.

Configuration of Solr server

You control the semantics-mode by adding a semanticsMode tag inside your DirectUpdateHandler2-based updateHandler. In solrconfig.xml:
{{{#!xml
<updateHandler class="solr.DirectUpdateHandler2">
...
<semanticsMode>put classic, consistency or classic-consistency-hybrid here</semanticsMode>
...
</updateHandler>
}}}
classic is the default if you don't add a semanticsMode tag.

To use consistency, or the consistency features of classic-consistency-hybrid, there are a few additional requirements for your Solr schema (schema.xml) and configuration (solrconfig.xml). There is really no reason to configure classic-consistency-hybrid if you never plan to use its consistency features, i.e. if you always send 0 as version.

  • You need to have a uniqueKey field in your schema. In schema.xml, e.g.:
    {{{#!xml
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <uniqueKey>id</uniqueKey>
    }}}

  • If you want to do consistency updates including version control (sending version values greater than 0), you need a "_version_" field in your schema. In schema.xml:
    {{{#!xml
    <field name="_version_" type="long" indexed="true" stored="true" />
    }}}

  • You need to enable updateLog in your DirectUpdateHandler2-based updateHandler. In solrconfig.xml:
    {{{#!xml
    <updateHandler class="solr.DirectUpdateHandler2">
    ...
    <updateLog class="solr.FSUpdateLog">
    <str name="dir">${solr.data.dir:}</str>
    </updateLog>
    ...
    </updateHandler>
    }}}

Raw HTTP requests and responses

This subsection describes the relevant parts of the content and structure of the raw HTTP requests and responses. This is especially relevant if you are not using a Java-based client.

Constructing requests

XML

Provide "part references" and version field values in your XML documents like this:

{{{#!xml
<add>
<doc partref="refA">
... set doc fields ...
<field name="_version_">-1</field>
</doc>
... add other docs ...
<doc partref="refN">
... set doc fields ...
<field name="_version_">1234567890</field>
</doc>
</add>
}}}
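A non-Java client just needs to produce text in this shape. As a sketch, the following Java helper renders such a body and escapes XML special characters. XmlAddRequest and its Doc class are hypothetical names. The version field is assumed to be called _version_, as in the schema example above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper rendering an <add> body with partref attributes and _version_.
final class XmlAddRequest {
    private XmlAddRequest() {}

    /** One document: its part reference, its fields and its _version_ value. */
    static final class Doc {
        final String partRef;
        final long version;
        final Map<String, String> fields = new LinkedHashMap<>();

        Doc(String partRef, long version) {
            this.partRef = partRef;
            this.version = version;
        }

        Doc field(String name, String value) {
            fields.put(name, value);
            return this;
        }
    }

    /** Escapes characters that are special in XML text and double-quoted attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String render(List<Doc> docs) {
        StringBuilder sb = new StringBuilder("<add>\n");
        for (Doc d : docs) {
            sb.append("<doc partref=\"").append(escape(d.partRef)).append("\">\n");
            for (Map.Entry<String, String> f : d.fields.entrySet()) {
                sb.append("<field name=\"").append(escape(f.getKey())).append("\">")
                  .append(escape(f.getValue())).append("</field>\n");
            }
            // <= 0 states the intent to insert, > 0 the intent to update that version.
            sb.append("<field name=\"_version_\">").append(d.version).append("</field>\n");
            sb.append("</doc>\n");
        }
        return sb.append("</add>\n").toString();
    }
}
```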

JSON

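Provide "part references" and version field values in your JSON documents like this:

{{{#!json
[
{"nonfield.partref":"refA", ... set doc fields ..., "_version_":-1},
... add other docs ...
{"nonfield.partref":"refN", ... set doc fields ..., "_version_":1234567890}
]
}}}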

The nonfield.partref can be placed anywhere among the other fields - not just as the first entry.
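As a sketch of producing the JSON form without a JSON library, the helper below renders one document object with nonfield.partref first and _version_ last. It also joins rendered documents into an array. JsonAddRequest is a hypothetical name. The sketch assumes string-valued fields, the array-of-documents request form shown above, and the field name _version_.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper rendering JSON update documents with a part reference.
final class JsonAddRequest {
    private JsonAddRequest() {}

    /** Quotes a string as a JSON string literal, escaping as required by JSON. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Renders one document object; only string-valued fields, for brevity. */
    static String renderDoc(String partRef, Map<String, String> fields, long version) {
        StringBuilder sb = new StringBuilder("{");
        sb.append(quote("nonfield.partref")).append(':').append(quote(partRef));
        for (Map.Entry<String, String> f : fields.entrySet()) {
            sb.append(',').append(quote(f.getKey())).append(':').append(quote(f.getValue()));
        }
        sb.append(',').append(quote("_version_")).append(':').append(version);
        return sb.append('}').toString();
    }

    /** Wraps already rendered documents in a JSON array. */
    static String renderArray(List<String> renderedDocs) {
        return "[" + String.join(",", renderedDocs) + "]";
    }
}
```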

CSV

Provide "part references" and version field values in your CSV documents like this:

nonfield.partref,... set doc field names ...,_version_
refA,... set doc field values ...,-1
... add other docs ...
refN,... set doc field values ...,1234567890

The nonfield.partref can be placed anywhere among the other fields - not just as the first entry.
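The CSV form can be generated the same way. This Java sketch builds the header and data lines shown above. CsvAddRequest is a hypothetical name. Quoting follows RFC 4180 conventions: values containing commas, quotes or line breaks are wrapped in double quotes, and embedded quotes are doubled. That the CSV loader accepts exactly this quoting by default is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper producing CSV update lines with a nonfield.partref column.
final class CsvAddRequest {
    private CsvAddRequest() {}

    /** RFC 4180-style cell: quote only when needed, double embedded quotes. */
    static String cell(String s) {
        boolean needsQuotes = s.contains(",") || s.contains("\"")
                || s.contains("\n") || s.contains("\r");
        return needsQuotes ? "\"" + s.replace("\"", "\"\"") + "\"" : s;
    }

    /** Header line: part reference first, then field names, then _version_. */
    static String header(List<String> fieldNames) {
        List<String> cols = new ArrayList<>();
        cols.add("nonfield.partref");
        for (String name : fieldNames) {
            cols.add(cell(name));
        }
        cols.add("_version_");
        return String.join(",", cols);
    }

    /** One data line matching the column order produced by header(...). */
    static String row(String partRef, List<String> values, long version) {
        List<String> cols = new ArrayList<>();
        cols.add(cell(partRef));
        for (String v : values) {
            cols.add(cell(v));
        }
        cols.add(Long.toString(version));
        return String.join(",", cols);
    }
}
```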

Checking responses for partial errors

Status line and response header

====== General ======

Of course, if the update of all documents in the request succeeds, the HTTP response status line will look like this:

HTTP/1.1 200 OK

If errors occur, the HTTP response status line will in general look like this:

HTTP/1.1 <error-code> <error-message>

and a response header like this will be included:

X-solr-error-type: <error-type>

====== Single document updates ======

If you only sent one document for update, the error that occurred while handling that single document will be encoded in the HTTP response status line (code and message) and in the "X-solr-error-type" header. Example:

HTTP/1.1 409 Attempt to update (_version_ > 0 specified explicitly in document) document failed. Document does not exist
...
X-solr-error-type: org.apache.solr.common.partialerrors.update.DocumentDoesNotExist

====== Multi document updates ======

If you sent multiple documents for update, and the handling of some (possibly all) of them resulted in errors, the HTTP response status line and "X-solr-error-type" header will look like this:

HTTP/1.1 412 Some parts of the request resulted in errors. Need to check response for partial errors. Documents sent for update with no corresponding partial error succeeded.
...
X-solr-error-type: org.apache.solr.common.partialerrors.PartialErrors
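A client can decide how to read a response from just the status code and the X-solr-error-type header. The Java sketch below applies the rules from this subsection and the two before it. UpdateStatus and its Kind values are hypothetical names. The caller is assumed to obtain the two values from its HTTP library, for example via HttpURLConnection.getResponseCode() and getHeaderField("X-solr-error-type").

```java
// Hypothetical helper: interprets the status code and X-solr-error-type header
// of an update response according to the rules described above.
final class UpdateStatus {
    private UpdateStatus() {}

    enum Kind { SUCCESS, PARTIAL_ERRORS, SINGLE_ERROR }

    static final String PARTIAL_ERRORS_TYPE =
            "org.apache.solr.common.partialerrors.PartialErrors";

    static Kind classify(int statusCode, String errorTypeHeader) {
        if (statusCode == 200) {
            return Kind.SUCCESS;
        }
        if (statusCode == 412 && PARTIAL_ERRORS_TYPE.equals(errorTypeHeader)) {
            // Multi-document request: some (possibly all) documents failed.
            // Documents without an entry in "partialerrors" succeeded.
            return Kind.PARTIAL_ERRORS;
        }
        // Otherwise the status line and header describe one error directly,
        // e.g. 409 with ...update.DocumentDoesNotExist for a single document.
        return Kind.SINGLE_ERROR;
    }
}
```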

======= Body =======

In this subsection, let's assume we sent a multi-document request like the one shown (in several formats) above, and that all documents succeeded except the two with "part reference" refA and refN. The body of the HTTP response will then look as shown in the following subsections.

======== XML ========

{{{#!xml
<?xml version="1.0" encoding="UTF-8"?>
<response>
... responseHeader etc. ...
<arr name="partialerrors">
<lst>
<int name="error-code">409</int>
<str name="error-type">org.apache.solr.common.partialerrors.update.DocumentAlreadyExists</str>
<str name="error-msg">Attempt to insert (_version_ &lt;= 0 specified explicitly in document) document failed. Document already exists</str>
<str name="partRef">refA</str>
</lst>
<lst>
<int name="error-code">409</int>
<str name="error-type">org.apache.solr.common.partialerrors.update.DocumentDoesNotExist</str>
<str name="error-msg">Attempt to update (_version_ > 0 specified explicitly in document) document failed. Document does not exist</str>
<str name="partRef">refN</str>
</lst>
</arr>
</response>
}}}
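To act on such a body, a client needs the mapping from part reference to error type. The following sketch extracts it from the XML response with the JDK's built-in DOM parser (javax.xml.parsers and org.w3c.dom). PartialErrorsXml is a hypothetical name. The sketch assumes the element and attribute names shown in the example above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: partRef -> error-type from <arr name="partialerrors">.
final class PartialErrorsXml {
    private PartialErrorsXml() {}

    static Map<String, String> errorTypesByPartRef(String xml) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XML response", e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        NodeList arrs = dom.getElementsByTagName("arr");
        for (int i = 0; i < arrs.getLength(); i++) {
            Element arr = (Element) arrs.item(i);
            if (!"partialerrors".equals(arr.getAttribute("name"))) {
                continue;
            }
            NodeList errors = arr.getElementsByTagName("lst");
            for (int j = 0; j < errors.getLength(); j++) {
                String partRef = null;
                String type = null;
                // Each <lst> holds named children: error-code, error-type, error-msg, partRef.
                NodeList children = errors.item(j).getChildNodes();
                for (int k = 0; k < children.getLength(); k++) {
                    Node n = children.item(k);
                    if (n.getNodeType() != Node.ELEMENT_NODE) {
                        continue;
                    }
                    String name = ((Element) n).getAttribute("name");
                    if ("partRef".equals(name)) {
                        partRef = n.getTextContent();
                    } else if ("error-type".equals(name)) {
                        type = n.getTextContent();
                    }
                }
                // partRef is only present for multi-document requests.
                if (partRef != null) {
                    result.put(partRef, type);
                }
            }
        }
        return result;
    }
}
```

Any document you sent whose part reference is missing from the returned map succeeded.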

======== JSON ========

{{{#!json
{
... responseHeader etc. ...,
"partialerrors":[
{"error-code":409,
"error-type":"org.apache.solr.common.partialerrors.update.DocumentAlreadyExists",
"error-msg":"Attempt to insert (_version_ <= 0 specified explicitly in document) document failed. Document already exists",
"partRef":"refA"
},{
"error-code":409,
"error-type":"org.apache.solr.common.partialerrors.update.DocumentDoesNotExist",
"error-msg":"Attempt to update (_version_ > 0 specified explicitly in document) document failed. Document does not exist",
"partRef":"refN"
}
]
}
}}}

======== Other types ========

By now you should have the picture and be able to figure out how partial errors will be encoded in the HTTP response body if you request the response as Ruby, PHP, Python or ...

SolrJ requests and responses

If you are using a Java-based client, you do not need to know much about the details of the raw HTTP communication described above. The SolrJ client framework is there to help you.

Constructing requests

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;

List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
SolrInputDocument docA = new SolrInputDocument();
//... set docA fields ...
docA.addField(SolrInputDocument.VERSION_FIELD, -1);
docs.add(docA);
//... setup other docs ...
SolrInputDocument docN = new SolrInputDocument();
//... set docN fields ...
docN.addField(SolrInputDocument.VERSION_FIELD, 1234567890);
docs.add(docN);

Note that it is unusual to do what the example above does, namely create a new SolrInputDocument with the version field (SolrInputDocument.VERSION_FIELD) set to a hardcoded number above 0 (indicating update rather than insert). A SolrInputDocument with version above 0 will usually have been populated from a document fetched from Solr (via search or realtime-get) for modification and storing again (update). There is no need to deal explicitly with "part references"; SolrInputDocument handles them automatically for you.

Sending requests

SolrServer server = //... somehow you have a SolrJ client (instance of SolrServer) ...

UpdateResponse response = server.add(docs, ... your SolrParams ...);

Checking responses for partial errors

If you send multiple documents in your request, the insert/update operation may fail for some documents (due to "unique key constraints", "version checking", etc.) while succeeding for others. Therefore you need to deal with partial errors:

UpdateResponse response;
try {
    response = server.add(docs, ... your SolrParams ...);
} catch (org.apache.solr.common.partialerrors.PartialErrors e) {
    response = (UpdateResponse)e.getSpecializedResponse();
    DocumentUpdatePartialError err;
    err = response.getPartialError(docA);
    //... if and only if err is not null the insert/update of docA failed ...
    //... check for errors for other docs ...
    err = response.getPartialError(docN);
    //... if and only if err is not null the insert/update of docN failed ...
}

The possible classes of err (subclasses of DocumentUpdatePartialError) are Java Exception classes corresponding to the error types mentioned above. The package of each class corresponds to the error-type namespace, and the name of the class corresponds to the error-type name.

If you only send one document in your request, you can catch the Java Exception corresponding to the error type directly:

try {
    UpdateResponse response = server.add(docA, ... your SolrParams ...);
} catch (org.apache.solr.common.partialerrors.update.DocumentDoesNotExist e) {
    //... do something ...
} catch (org.apache.solr.common.partialerrors.update.DocumentAlreadyExists e) {
    //... do something ...
} catch (org.apache.solr.common.partialerrors.update.VersionConflict e) {
    //... do something ...
} catch (org.apache.solr.common.partialerrors.WrongUsage e) {
    //... do something ...
}
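A VersionConflict handler usually reloads the document, reapplies the change and tries again. This is the load-modify-store loop that O/R-mappers build on top of optimistic locking. To keep the example self-contained and runnable, the sketch below does not use SolrJ. It uses a hypothetical in-memory store that mimics the consistency semantics. InMemoryStore, VersionedDoc, VersionConflictException and OptimisticUpdate are all illustrative names, not SolrJ classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins (not SolrJ classes) mimicking "consistency" semantics.
final class VersionConflictException extends RuntimeException {
    VersionConflictException(String message) {
        super(message);
    }
}

final class VersionedDoc {
    final String id;
    final String body;
    final long version;

    VersionedDoc(String id, String body, long version) {
        this.id = id;
        this.body = body;
        this.version = version;
    }
}

final class InMemoryStore {
    private final Map<String, VersionedDoc> docs = new HashMap<>();
    private long lastVersion = 0;

    synchronized VersionedDoc get(String id) {
        return docs.get(id);
    }

    /** Insert intent: fails if a document with the same id already exists. */
    synchronized VersionedDoc insert(String id, String body) {
        if (docs.containsKey(id)) {
            throw new IllegalStateException("DocumentAlreadyExists: " + id);
        }
        VersionedDoc d = new VersionedDoc(id, body, ++lastVersion);
        docs.put(id, d);
        return d;
    }

    /** Update intent: the document must exist and still carry expectedVersion. */
    synchronized VersionedDoc update(String id, String body, long expectedVersion) {
        VersionedDoc current = docs.get(id);
        if (current == null) {
            throw new IllegalStateException("DocumentDoesNotExist: " + id);
        }
        if (current.version != expectedVersion) {
            throw new VersionConflictException("VersionConflict: " + id);
        }
        VersionedDoc d = new VersionedDoc(id, body, ++lastVersion);
        docs.put(id, d);
        return d;
    }
}

final class OptimisticUpdate {
    private OptimisticUpdate() {}

    /** Load, modify and store; on a version conflict, reload and try again. */
    static VersionedDoc updateWithRetry(InMemoryStore store, String id,
                                        UnaryOperator<String> change, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        VersionConflictException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            VersionedDoc loaded = store.get(id);
            if (loaded == null) {
                throw new IllegalStateException("DocumentDoesNotExist: " + id);
            }
            try {
                // Send the version we loaded; a concurrent writer makes this fail.
                return store.update(id, change.apply(loaded.body), loaded.version);
            } catch (VersionConflictException e) {
                last = e;
            }
        }
        throw last;
    }
}
```

Bounding the number of attempts keeps a heavily contended document from looping forever. The caller still sees the final VersionConflictException if every attempt loses the race.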