XML Messages for Updating a Solr Index

Solr accepts XML messages POSTed to the /update URL. These messages can add or replace documents, commit, delete by ID, and delete by query (there is also a CSV interface). The XML syntax that Solr expects is as follows:

The Update Schema

(Not to be confused with schema.xml.)

add/replace documents

Simple Example:

Subversion contains many complex examples of <add> document messages.

Note: multiple documents may be specified in a single <add> command.
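As a rough illustration of the message shape, the following Python sketch builds an <add> message containing two documents. It only produces the XML string and does not contact Solr. The field names employeeId and office are taken from the curl example later on this page; the second document's values are made up for illustration, and the helper name build_add is hypothetical.

```python
import xml.etree.ElementTree as ET

def build_add(docs):
    """Return an <add> message as a string.

    docs is a list of documents; each document is a list of
    (field_name, value) pairs. A field name may repeat for
    multi-valued fields.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields:
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")

# Two documents in a single <add> command (second one is illustrative).
add_xml = build_add([
    [("employeeId", "05991"), ("office", "Bridgewater")],
    [("employeeId", "05992"), ("office", "Blackwater")],
])
print(add_xml)
```

The resulting string can be used as the body of a POST to /update with Content-Type text/xml.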

Optional attributes for "add"

Optional attributes on "doc"

Optional attributes for "field"

Examples of adding docs with various optional attributes

Example of "add" with optional boost attribute:

Example of "add" with optional update attribute:

Example of "add" with optional update attribute to set multiple values on a multi-valued field:

Example of "add" with optional update attribute to set a field to null (i.e. delete a field):
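The cases listed above can be sketched together in Python. This is an approximation under stated assumptions, not a definitive reference: the boost attributes mirror the curl example later on this page, while the update values ("set", "add") and the null="true" attribute are assumed from Solr's atomic update support and should be checked against the Solr version in use. The field names skills and manager and the helper add_field are hypothetical.

```python
import xml.etree.ElementTree as ET

def add_field(doc, name, text=None, **attrs):
    """Append a <field> element to doc with optional extra attributes."""
    el = ET.SubElement(doc, "field", {"name": name, **attrs})
    el.text = text
    return el

# 1. Optional boost attribute on <doc> and on a single <field>.
add = ET.Element("add")
doc = ET.SubElement(add, "doc", boost="2.5")
add_field(doc, "employeeId", "05991")
add_field(doc, "office", "Bridgewater", boost="2.0")
boost_xml = ET.tostring(add, encoding="unicode")

# 2-4. Optional update attribute (assumed atomic-update semantics).
add = ET.Element("add")
doc = ET.SubElement(add, "doc")
add_field(doc, "employeeId", "05991")                   # uniqueKey identifies the doc
add_field(doc, "office", "Walla Walla", update="set")   # replace a single value
add_field(doc, "skills", "Python", update="set")        # set several values on a
add_field(doc, "skills", "Java", update="set")          # multi-valued field (hypothetical field)
add_field(doc, "manager", None, update="set", null="true")  # set to null, i.e. delete the field
update_xml = ET.tostring(add, encoding="unicode")

print(boost_xml)
print(update_xml)
```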

"commit" and "optimize"

A commit operation makes index changes visible to new search requests. A hard commit also calls fsync on the index files to ensure they have been flushed to stable storage and no data loss will result from a power failure.

A soft commit is much faster since it only makes index changes visible and does not fsync index files or write a new index descriptor. If the JVM crashes or there is a loss of power, changes that occurred after the last hard commit will be lost. Search collections that have near-real-time requirements (that want index changes to be quickly visible to searches) will want to soft commit often but hard commit less frequently.

An optimize is like a hard commit except that it forces all of the index segments to be merged into a single segment first. Depending on the use cases, this operation should be performed infrequently (like nightly), if at all, since it is very expensive and involves reading and re-writing the entire index. Segments are normally merged over time anyway (as determined by the merge policy), and optimize just forces these merges to occur immediately.

Example:

Optional attributes for "commit" and "optimize"

Optional attributes for "commit"

Optional attributes for "optimize"

Example of "commit" and "optimize" with optional attributes
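A minimal Python sketch of these messages follows. The attributes waitFlush and waitSearcher appear in the curl example later on this page, and maxSegments appears as a URL parameter below; using maxSegments as an XML attribute on <optimize> is an assumption here, and the helper name command is hypothetical.

```python
import xml.etree.ElementTree as ET

def command(tag, **attrs):
    """Return a single empty command element such as <commit/> as a string."""
    return ET.tostring(ET.Element(tag, attrs), encoding="unicode")

# Plain commands without attributes.
plain_commit = command("commit")
plain_optimize = command("optimize")

# Commands with optional attributes.
fast_commit = command("commit", waitFlush="false", waitSearcher="false")
partial_optimize = command("optimize", waitSearcher="false", maxSegments="10")

for msg in (plain_commit, plain_optimize, fast_commit, partial_optimize):
    print(msg)
```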

Passing commit and commitWithin parameters as part of the URL

Update handlers can also receive commit-related parameters as part of the update URL. This example adds a small test document and causes an explicit commit to happen immediately afterwards:

curl 'http://localhost:8983/solr/update?commit=true' -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>'

This example will cause the index to be optimized down to at most 10 segments, but won't wait around until it's done (waitFlush=false):

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

Solr 3.4: This example adds a small test document with a commitWithin instruction, which tells Solr to make sure the document is committed within 10 seconds (this method is generally preferred over explicit commits):

curl 'http://localhost:8983/solr/update?commitWithin=10000' -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>'
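The same requests can be prepared programmatically. The Python sketch below builds, but does not send, a POST request with commit-related URL parameters. It assumes a Solr instance at localhost:8983 as in the curl examples; the helper name update_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

SOLR_UPDATE = "http://localhost:8983/solr/update"  # assumed local instance

def update_request(body, **params):
    """Build a POST request for /update with optional URL parameters."""
    url = SOLR_UPDATE
    if params:
        url += "?" + urlencode(params)
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )

doc_xml = '<add><doc><field name="id">testdoc</field></doc></add>'
req = update_request(doc_xml, commitWithin=10000)
print(req.full_url)

# Sending requires a running Solr instance:
# from urllib.request import urlopen
# urlopen(req)
```

Passing commit="true" instead of commitWithin=10000 would correspond to the explicit-commit example.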

"delete" documents by ID and by Query

Delete by id deletes the document with the specified ID. (ID here means the value of the uniqueKey field declared in the schema; in these examples, employeeId.)

Delete by query deletes all the documents that match the specified query.

Example:

Note: The "delete by query" uses the Lucene query parser by default, so if you're trying to understand the results of delete by query, you might submit a URL like this:
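One way to construct such a URL is sketched below in Python. It assumes the standard /select handler on a local Solr instance and the debugQuery parameter; the query office:Bridgewater is only an illustration, and the helper name debug_query_url is hypothetical.

```python
from urllib.parse import urlencode

def debug_query_url(query, base="http://localhost:8983/solr/select"):
    """Return a search URL that runs the query with debugging output enabled."""
    return base + "?" + urlencode({"q": query, "debugQuery": "true"})

debug_url = debug_query_url("office:Bridgewater")
print(debug_url)
```

Running the same query string through a search first shows which documents a delete by query would match.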

In Solr 1.2, delete by query is much less efficient than delete by id, because Solr has to perform much of the commit logic each time it receives a delete by query request. In Solr 1.3, most of this overhead has been removed.

Solr 1.4: Both delete by id and delete by query can be specified at the same time.

Example:
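A minimal Python sketch of the delete forms follows: by id, by query, and both combined in one message. The <id> and <query> elements match the curl and GET examples elsewhere on this page; the helper name build_delete and the second id are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_delete(ids=(), queries=()):
    """Return a <delete> message with any mix of <id> and <query> children."""
    delete = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(delete, "id").text = doc_id
    for query in queries:
        ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")

by_id = build_delete(ids=["05991"])
by_query = build_delete(queries=["office:Bridgewater"])
combined = build_delete(ids=["05991", "06000"], queries=["office:Bridgewater"])

print(by_id)
print(by_query)
print(combined)
```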

Optional attributes for "delete"

"rollback"

Solr 1.4, expert:

Example:

The rollback command rolls back all adds and deletes made to the index since the last commit. It neither calls any event listeners nor creates a new searcher. This is an expert-level API that should only be used if the application is taking complete responsibility for update concurrency, replication, and sharding.

"prepareCommit"

Solr 4.0, expert:

The prepareCommit command is an expert-level API that calls Lucene's IndexWriter.prepareCommit().

Example:

curl 'http://localhost:8983/solr/update?prepareCommit=true'

Updating a Data Record via curl

You can use curl to send any of the above commands. For example:

curl http://<hostname>:<port>/solr/update -H "Content-Type: text/xml" --data-binary '<add>
<doc boost="2.5"> <field name="employeeId">05991</field>
<field name="office" boost="2.0">Bridgewater</field> </doc> </add>'

curl http://<hostname>:<port>/solr/update -H "Content-Type: text/xml" --data-binary '<commit waitFlush="false" waitSearcher="false"/>'

Until a commit has been issued, you will not see any of the data in searches either on the master or the slave. After a commit has been issued, you will see the results on the master, then after a snapshot has been pulled by the slave, you will see it there also.

Updating via GET

Short update requests can also be sent using a GET request (the message must be URL-encoded), for example:

(delete docs matching a query)
http://localhost:8983/solr/update?stream.body=%3Cdelete%3E%3Cquery%3Eoffice:Bridgewater%3C/query%3E%3C/delete%3E 

(delete all docs)
http://localhost:8983/solr/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E  

(commit)
http://localhost:8983/solr/update?stream.body=%3Ccommit/%3E
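The encoding step can be sketched in Python with the standard library. The helper name stream_body_url is hypothetical; the set of characters left unencoded ("/", ":" and "*") is chosen only to reproduce the URLs shown above, and encoding them as well would be equally valid.

```python
from urllib.parse import quote

def stream_body_url(xml, base="http://localhost:8983/solr/update"):
    """Return a GET URL that carries the XML message in stream.body."""
    return base + "?stream.body=" + quote(xml, safe="/:*")

delete_all_url = stream_body_url("<delete><query>*:*</query></delete>")
commit_url = stream_body_url("<commit/>")

print(delete_all_url)
print(commit_url)
```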

Add and delete in a single batch

Mixing add and delete elements in a single batch will throw an "Illegal to have multiple roots (start tag in epilog?)" exception (SOLR-2277):

curl 'http://127.0.0.1:8983/solr/update/?commit=true' -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">17</field></doc></add><delete><id>1234</id></delete>'

Instead, the add and delete elements must be enclosed within an update element:

curl 'http://127.0.0.1:8983/solr/update/?commit=true' -H "Content-Type: text/xml" --data-binary '<update><add><doc><field name="id">17</field></doc></add><delete><id>1234</id></delete></update>'
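The underlying issue is that an XML document may have only one root element. The Python sketch below checks this locally with the standard library parser. It verifies XML well-formedness only and says nothing about how a particular Solr version handles the wrapped message; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml):
    """Return True if xml parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False

batch = (
    '<add><doc><field name="id">17</field></doc></add>'
    '<delete><id>1234</id></delete>'
)
wrapped = "<update>" + batch + "</update>"

print(is_well_formed(batch))    # two root elements, so parsing fails
print(is_well_formed(wrapped))  # single <update> root
```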

UpdateXmlMessages (last edited 2013-09-26 20:56:56 by MichaelLabib)