<!> Solr4.2

(warning) NOT verified in SolrCloud! This code has not been tested in SolrCloud setups! (warning)

Overview

Considerable work has been done to make Solr suitable for a large number of homogeneous cores that must be loaded and unloaded quickly and frequently. This page describes the state of the code as of Solr 4.4/5.0.

The requirements of such a system are:

  1. Very efficient loading of cores - Solr would be more efficient if it did not have to read, parse, and create Schema and SolrConfig objects every time a core is loaded (deferred).
  2. Lazy loading of cores - Provide a way to START/STOP a core.
  3. Automatic loading of cores - Start a core automatically if a request comes in for a "stopped" core.
  4. LRU Core Loading/Unloading - As there are a large number of cores, all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores.
  5. Allowing cores to be defined in a tree structure - If the number of cores is too high, all the cores' dataDirs cannot live in the same directory. There is an upper limit on the number of directories you can create in a single directory without affecting performance.
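
Requirements 3 and 4 together amount to a bounded cache that loads cores on demand and unloads the least recently used one when the limit is exceeded. The following is a minimal sketch of that idea in Python, not Solr's actual implementation; the class name TransientCoreCache and the load_core/close_core callbacks are hypothetical and exist only for illustration.

```python
from collections import OrderedDict


class TransientCoreCache:
    """Keeps at most `max_size` cores loaded, evicting the least recently used."""

    def __init__(self, max_size, load_core, close_core):
        self.max_size = max_size
        self._load_core = load_core    # hypothetical callback: core name -> core object
        self._close_core = close_core  # hypothetical callback: core object -> None
        self._cores = OrderedDict()    # ordered from least to most recently used

    def get(self, name):
        if name in self._cores:
            # Already loaded: mark it as most recently used.
            self._cores.move_to_end(name)
            return self._cores[name]
        # "Stopped" core: load it automatically on request (requirement 3).
        core = self._load_core(name)
        self._cores[name] = core
        # Over the limit: unload least recently used cores (requirement 4).
        while len(self._cores) > self.max_size:
            _, evicted = self._cores.popitem(last=False)
            self._close_core(evicted)
        return core

    def loaded(self):
        """Names of the loaded cores, least recently used first."""
        return list(self._cores)


# Small demonstration with stand-in callbacks and a limit of two cores.
closed = []
cache = TransientCoreCache(
    2,
    load_core=lambda name: {"name": name},
    close_core=lambda core: closed.append(core["name"]),
)
for requested in ["core0", "core1", "core0", "core2"]:
    cache.get(requested)
```

In the demonstration, requesting core2 pushes the cache past its limit, so core1 (the least recently used) is the one that gets closed.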

Configuration

Cores are now defined through "core discovery" mode; see the page Solr.xml 4.4 and beyond. The basic idea is that the <cores> and individual <core> tags are removed from solr.xml; those tags will be unsupported as of 5.0. Instead, Solr starts from <solr_home> and walks the directory tree looking for "core.properties" files, each of which defines the location of a core. The parameters that control this feature are therefore in different places depending on the mode.
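
The directory walk can be pictured with a short Python sketch. It only illustrates the idea and is not Solr's code; the function name discover_cores is hypothetical, and the choice to stop descending once a core.properties file is found is an assumption of this sketch.

```python
import os


def discover_cores(solr_home):
    """Return the directories under solr_home that contain a core.properties file."""
    core_dirs = []
    for dirpath, dirnames, filenames in os.walk(solr_home):
        if "core.properties" in filenames:
            core_dirs.append(dirpath)
            # Assumption: a core's own subdirectories are not searched further.
            dirnames[:] = []
    return sorted(core_dirs)
```

Because the walk follows the whole tree, cores can be grouped into nested subdirectories instead of all living directly under one parent directory.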

There are two new attributes of a core, loadOnStartup and transient, and one new attribute, transientCacheSize, for controlling how many transient cores are loaded at once.

There is really no reason to tie "lazy loading" to whether the core can be swapped out, so splitting these into two options gives the user control over how each is handled. Use cases below:

old-style

The following configuration applies to the patch given in SOLR-1293.

<?xml version='1.0' encoding='UTF-8'?>
<solr persistent='true'>
  <cores adminPath="/admin/cores"
          transientCacheSize="4"
          adminHandler="org.apache.solr.handler.admin.LotsOfCoresAdminHandler"
          shareSchema="true">
    <core name="core0" instanceDir="/opt/solr" loadOnStartup="false" transient="true"/>
  </cores>
</solr>

new-style (core discovery)

<?xml version='1.0' encoding='UTF-8'?>
<solr persistent='true'>
  <!-- much omitted -->
  <int name="transientCacheSize">64</int>
</solr>

Then, in an individual core.properties file:

loadOnStartup=true|false
transient=true|false
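
To make the four combinations of these two settings concrete, here is a hedged Python sketch that reads them from the text of a core.properties file. The helper names are hypothetical, and the defaults used when a key is missing (loadOnStartup=true, transient=false) are an assumption of this sketch rather than something stated on this page.

```python
def parse_core_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def core_flags(props):
    """Return (load_on_startup, transient) as booleans."""
    # Assumed defaults (not stated on this page): loadOnStartup=true, transient=false.
    load_on_startup = props.get("loadOnStartup", "true").lower() == "true"
    transient = props.get("transient", "false").lower() == "true"
    return load_on_startup, transient


def describe(load_on_startup, transient):
    """Summarize how a core with these flags would be treated."""
    when = "loaded at startup" if load_on_startup else "loaded on first request"
    life = "may be unloaded when the transient cache is full" if transient else "stays loaded"
    return f"{when}, {life}"
```

For example, a file containing loadOnStartup=false and transient=true describes a core that is loaded on first request and may be unloaded again once more than transientCacheSize transient cores are in use.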

From the original discussion

Hmmm, haven't thought about the various status commands very deeply. There is an update to the 'status' command: adding a parameter 'verbose=false' will return a minimal status report of the cores. The default status command uses Luke on the core's index to get very detailed information, which is expensive if the status is queried very frequently.
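
As a rough illustration, a minimal status request could be built as follows. The 'verbose' parameter name is taken from the discussion above and may differ in released versions; the base URL http://localhost:8983/solr and the helper name core_status_url are assumptions, while /admin/cores matches the adminPath shown in the old-style configuration.

```python
from urllib.parse import urlencode


def core_status_url(base_url, core=None, verbose=False):
    """Build a core admin STATUS URL; `verbose` follows the discussion above."""
    params = {"action": "STATUS", "verbose": "true" if verbose else "false"}
    if core is not None:
        params["core"] = core  # restrict the report to a single core
    return f"{base_url}/admin/cores?{urlencode(params)}"


# Assumed local Solr address, for illustration only.
status_url = core_status_url("http://localhost:8983/solr", core="core0")
```

A monitoring job that polls many cores frequently would use the minimal form and request the detailed report only when needed.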

Further work

Issues for reference; the work is done:

Other features which may be needed for such a system include: