Versions Compared

Comment: cleaned up some of the language, added docValues=false to the last section.

...

(warning) Using Solr as a data source to build a new index is only possible if the index meets the requirements for Atomic Update.  Please see the newest reference guide for full details on exactly what that means.

If you have no other choice but to use a Solr index as the data source for another index, and you have stored every field except perhaps copyField destinations, you have a few possible options:

...

Above we said "don't use Solr itself as a datasource" ... but one way to deal with data availability problems is to set up a completely separate Solr install with a config designed ONLY for data storage, not searching.  The schema would have stored="true", indexed="false", and docValues="false" for all fields, and would only use basic types like int, long, and string. It would not have any copyFields. You would need to make sure that the separate Solr install receives all new documents and changes that are applied to your primary Solr installation.
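As a rough illustration of the storage-only field settings described above, the following Python sketch builds a JSON payload of the kind accepted by Solr's Schema API "add-field" command. The helper names, the example field names, and the fieldType names ("string", "int", "long") are assumptions for illustration; the types must already exist in your own schema, and the uniqueKey field is deliberately left out because it usually needs its own definition.

```python
import json


def storage_only_field(name, field_type, multi_valued=False):
    """Return one field definition: stored, but neither indexed nor docValues."""
    return {
        "name": name,
        "type": field_type,
        "stored": True,
        "indexed": False,
        "docValues": False,
        "multiValued": multi_valued,
    }


def build_add_field_payload(fields):
    """Wrap field definitions in a Schema API "add-field" command."""
    return {"add-field": [storage_only_field(*f) for f in fields]}


if __name__ == "__main__":
    # Hypothetical fields; replace with the fields of your own documents.
    example = [("title", "string"), ("page_count", "int"), ("byte_size", "long")]
    print(json.dumps(build_add_field_payload(example), indent=2))
```

The printed JSON could be POSTed to the schema endpoint of the storage-only install, or the same attributes could be written by hand in the schema file.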

This is the approach used by a large and very well-known library organization for their Solr installation, because getting access to the source databases for the individual entities within the organization is very difficult. This way they can reindex the online Solr at any time without having to get special permission from all those entities. When they index new content, it goes into both their primary installation and the installation configured for storage only, not in-depth searching.

In this scenario you may have to re-index *twice* in order to utilize that method: once to your intermediate Solr server(s), then from there to the server(s) that you're using for search.
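The second step, copying from the intermediate install to the search install, might look roughly like the Python sketch below. It pages through the source with Solr's cursorMark parameter and posts each batch to the target's update handler. The function names and URLs are hypothetical. The sketch also assumes the uniqueKey field is named `id` and is sortable in the storage-only index, which goes beyond the "all fields" description above, since cursorMark needs a sort on the uniqueKey. The `_version_` field is dropped from each document so the target assigns its own versions.

```python
import json
import urllib.parse
import urllib.request


def http_select(base_url, params):
    """GET <base_url>/select and return the parsed JSON response."""
    url = base_url + "/select?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def http_update(base_url, docs):
    """POST a batch of documents as a JSON array to <base_url>/update."""
    req = urllib.request.Request(
        base_url + "/update",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def copy_all(source_url, target_url, rows=500,
             select=http_select, update=http_update):
    """Copy every document from source to target; return the number copied."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    copied = 0
    while True:
        params = {"q": "*:*", "sort": "id asc", "rows": rows,
                  "cursorMark": cursor, "wt": "json"}
        result = select(source_url, params)
        docs = result["response"]["docs"]
        for doc in docs:
            # Remove the source's version so the target does not treat it
            # as an optimistic-concurrency constraint.
            doc.pop("_version_", None)
        if docs:
            update(target_url, docs)
            copied += len(docs)
        next_cursor = result["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
    return copied
```

A call such as `copy_all("http://storage-host:8983/solr/archive", "http://search-host:8983/solr/main")` (both URLs are placeholders) would still need a commit on the target afterwards, either explicit or through autoCommit, before the copied documents become searchable.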

...