It can be confusing to work out how to use IndexReader, IndexWriter, and IndexSearcher together. The semantics of some of their methods are tricky because, in a sense, creating one of these objects starts a transaction: the object is isolated from changes made through other objects.

If you know you're going to update multiple documents, then the fastest approach is to batch the deletions and the additions, e.g.:

  1. Open reader;
  2. Delete all old documents;
  3. Close reader;
  4. Open writer;
  5. Add all new documents;
  6. Close writer.

If you open another IndexReader before step 1, you can continue to use it for searches while the update is in progress. If you then open a new IndexReader for searches only after step 6, no search will ever see the intermediate state in which documents have been deleted but not yet re-added.
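The isolation described above can be illustrated with a small toy model. This is only a sketch and does not use the Lucene API: `ToyIndex`, `openReader`, `deleteAll`, and `addAll` are hypothetical names, and a "reader" here is simply a read-only copy of the committed state taken at open time.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for an index (NOT the Lucene API): maps a document id to its text. */
class ToyIndex {
    private Map<String, String> committed = new HashMap<>();

    /** Opening a "reader" takes a point-in-time, read-only copy of the committed state. */
    synchronized Map<String, String> openReader() {
        return Collections.unmodifiableMap(new HashMap<>(committed));
    }

    /** Models steps 1-3: delete a whole batch, then commit it in one go. */
    synchronized void deleteAll(List<String> ids) {
        Map<String, String> next = new HashMap<>(committed);
        for (String id : ids) {
            next.remove(id);
        }
        committed = next;
    }

    /** Models steps 4-6: add a whole batch, then commit it in one go. */
    synchronized void addAll(Map<String, String> docs) {
        Map<String, String> next = new HashMap<>(committed);
        next.putAll(docs);
        committed = next;
    }
}

class BatchUpdateSketch {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addAll(Collections.singletonMap("doc1", "old text"));

        // Opened before step 1: keeps serving searches during the update.
        Map<String, String> searchReader = index.openReader();

        index.deleteAll(Collections.singletonList("doc1"));          // steps 1-3
        index.addAll(Collections.singletonMap("doc1", "new text"));  // steps 4-6

        // The old reader still sees the document as it was before the update.
        System.out.println("old reader: " + searchReader.get("doc1"));

        // Only a reader opened after step 6 sees the re-added document.
        searchReader = index.openReader();
        System.out.println("new reader: " + searchReader.get("doc1"));
    }
}
```

A reader opened between the delete batch and the add batch would see the document as missing, which is why the search reader is replaced only after step 6.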

If you're doing updates (as opposed to just additions) then you probably want to do something like:

  1. Keep a single open IndexReader used by all searches.
  2. Every few minutes, process updates as follows:
    1. Open a second IndexReader.
    2. Delete all documents that will be updated.
    3. Close this IndexReader, to flush the deletions.
    4. Open an IndexWriter.
    5. Add all documents that are updated.
    6. Close the IndexWriter.
    7. Replace the IndexReader used for searches (1, above) with a newly opened one.
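The final step, replacing the reader shared by all searches, could be sketched as an atomic reference swap. This is a minimal illustration rather than Lucene code: `ReaderHolder`, `get`, and `refresh` are hypothetical names, and the type parameter `R` stands in for whatever reader or searcher object the application shares.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical holder for the single reader shared by all searches. */
class ReaderHolder<R> {
    private final AtomicReference<R> current;

    ReaderHolder(R initial) {
        current = new AtomicReference<>(initial);
    }

    /** Searches call this once per query and use the returned reader throughout. */
    R get() {
        return current.get();
    }

    /**
     * Opens a fresh reader and publishes it atomically.
     * Returns the retired reader so the caller can close it once
     * searches that are still using it have finished.
     */
    R refresh(Supplier<R> openFresh) {
        R fresh = openFresh.get();
        return current.getAndSet(fresh);
    }

    public static void main(String[] args) {
        // Strings stand in for real reader objects in this sketch.
        ReaderHolder<String> holder = new ReaderHolder<>("reader opened at startup");
        String inFlight = holder.get();  // a search that started before the swap
        String retired = holder.refresh(() -> "reader opened after update");
        System.out.println("in-flight search uses: " + inFlight);
        System.out.println("new searches use:      " + holder.get());
        System.out.println("to close later:        " + retired);
    }
}
```

The periodic update job would call `refresh` only after the IndexWriter has been closed, so searches move directly from the old view to the fully updated one. Deciding when it is safe to close the retired reader is left to the caller in this sketch.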

These ideas came from the following lucene-user mailing list messages:

http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-user@jakarta.apache.org&msgNo=7191

http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-user@jakarta.apache.org&msgId=1190557

http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-user@jakarta.apache.org&msgNo=3206

UpdatingAnIndex (last edited 2009-09-20 21:47:55 by localhost)