Solr4.2

Overview

Solr is currently not well suited to a large number of homogeneous cores that must be loaded and unloaded quickly and frequently. In such a system, a core often needs to be loaded only to serve a single search query or to index a single document.

The requirements of such a system are:

  1. Very efficient loading of cores - Solr cannot afford to read, parse, and create the Schema and SolrConfig objects for each core every time the core has to be loaded.

  2. Lazy loading of cores - Provide a way to START/STOP a core.
  3. Automatic loading of cores - Start a core automatically if a request comes in for a "stopped" core.
  4. LRU Core Loading/Unloading - Because there are a large number of cores, they cannot all be kept loaded at once. There has to be an upper limit beyond which a few cores are unloaded.
  5. Automatic allotment of dataDir for cores - If the number of cores is too high, the cores' dataDirs cannot all live in the same directory, because there is an upper limit on the number of directories you can create in a single directory without affecting performance. Erick Erickson claims that this is taken care of by file walking: the resulting tree structure can be as deep as required to limit the number of files in any particular directory, so all cores live under <solr_home>.
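
Requirements 2 through 4 can be illustrated together with a small LRU cache. The sketch below is not Solr's implementation; it is a minimal Python illustration under stated assumptions. The names `TransientCoreCache`, `load_core`, `close_core`, and `max_loaded` are all hypothetical: `load_core` stands in for whatever actually opens a core, and `max_loaded` plays the role of the upper limit on loaded cores.

```python
from collections import OrderedDict


class TransientCoreCache:
    """Keeps at most max_loaded cores open, evicting the least recently used."""

    def __init__(self, max_loaded, load_core, close_core=lambda name, core: None):
        if max_loaded < 1:
            raise ValueError("max_loaded must be at least 1")
        self.max_loaded = max_loaded
        self._load_core = load_core    # hypothetical: opens a core by name
        self._close_core = close_core  # hypothetical: releases a core's resources
        self._cores = OrderedDict()    # ordered least recently used first

    def get(self, name):
        """Return the core, loading it on demand if it is currently stopped."""
        if name in self._cores:
            self._cores.move_to_end(name)  # mark as most recently used
            return self._cores[name]
        core = self._load_core(name)       # automatic load on first request
        self._cores[name] = core
        while len(self._cores) > self.max_loaded:
            old_name, old_core = self._cores.popitem(last=False)  # evict the LRU core
            self._close_core(old_name, old_core)
        return core

    def loaded(self):
        """Names of currently loaded cores, least recently used first."""
        return list(self._cores)


if __name__ == "__main__":
    closed = []
    cache = TransientCoreCache(
        max_loaded=2,
        load_core=lambda name: {"name": name},
        close_core=lambda name, core: closed.append(name),
    )
    for name in ["core0", "core1", "core0", "core2"]:
        cache.get(name)
    print(cache.loaded())  # ['core0', 'core2']
    print(closed)          # ['core1']
```

A request for a core that is not loaded goes through the same `get` call as any other request, so callers never need to know whether the core was "stopped". Touching `core0` a second time is what saves it from eviction when `core2` arrives.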

Issues

Other features which may be needed for such a system include:

Configuration

As I'm digging into this, things are changing. What follows is fluid and may change as this progresses.

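There are two new attributes on the <core> tag and one new attribute on <cores>. In the example configuration further down, these appear as loadOnStartup and transient on <core> and transientCacheSize on <cores>.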

The idea is that there's really no reason to tie "lazy loading" to whether the core can be swapped out or not, so splitting the two into separate options gives the user control over how each is handled. Use cases below:

The following configuration applies to the patch given in SOLR-1293.

<?xml version='1.0' encoding='UTF-8'?>
<solr persistent='true'>
  <cores adminPath="/admin/cores"
          transientCacheSize="4"
          adminHandler="org.apache.solr.handler.admin.LotsOfCoresAdminHandler"
          shareSchema="true"
          shareConfig="true">
    <core name="core0" instanceDir="/opt/solr" loadOnStartup="false" transient="true"/>
  </cores>
</solr>
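
To make the interaction of the two per-core attributes concrete, here is a small Python sketch that reads a trimmed copy of the configuration above and reports how each core would behave. It is an illustration only: `describe_cores` and `as_bool` are hypothetical helpers, `core1` is an invented second core, and the defaults used (loadOnStartup defaulting to true, transient defaulting to false) are assumptions rather than something this page states.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the configuration above. "core1" is a hypothetical second
# core, added only to show the assumed defaults being applied.
SOLR_XML = """
<solr persistent='true'>
  <cores adminPath="/admin/cores" transientCacheSize="4">
    <core name="core0" instanceDir="/opt/solr" loadOnStartup="false" transient="true"/>
    <core name="core1" instanceDir="/opt/solr1"/>
  </cores>
</solr>
"""


def as_bool(value, default):
    """Interpret an XML attribute as a boolean, falling back to a default."""
    return default if value is None else value.strip().lower() == "true"


def describe_cores(xml_text):
    """Return (transientCacheSize, {core name: behaviour description})."""
    cores = ET.fromstring(xml_text.strip()).find("cores")
    summary = {}
    for core in cores.findall("core"):
        # Assumed defaults: loadOnStartup=true, transient=false.
        load_on_startup = as_bool(core.get("loadOnStartup"), True)
        transient = as_bool(core.get("transient"), False)
        when = "at startup" if load_on_startup else "on first request"
        after = ("may be unloaded when the transient cache is full"
                 if transient else "stays loaded")
        summary[core.get("name")] = f"loaded {when}, {after}"
    return cores.get("transientCacheSize"), summary


if __name__ == "__main__":
    size, summary = describe_cores(SOLR_XML)
    print("transientCacheSize =", size)
    for name, behaviour in summary.items():
        print(f"{name}: {behaviour}")
```

Because the two attributes are independent, all four combinations are possible: a core can be loaded eagerly or lazily, and in either case it can be pinned or eligible for unloading.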

Persistence

This is a sticky wicket. As currently written, the solr.xml file has a global 'persistent="true|false"' option. The base problem is maintenance.

From the original page, under discussion

Hmmm, haven't thought about the various status commands very deeply. There is an update to the 'status' command: adding the parameter 'verbose=false' returns a minimal status report of the cores. The default status command uses Luke on the core's index to get very detailed information, which is expensive if the status is queried very frequently.
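
As a rough illustration of what such a request might look like, the Python sketch below only builds the URL and does not contact a server. The host, port, and the helper name `core_status_url` are assumptions; `/admin/cores` is the adminPath from the configuration above, and `verbose` is the parameter described here, which may not be accepted by every Solr build.

```python
from urllib.parse import urlencode


def core_status_url(base_url, admin_path="/admin/cores", core=None, verbose=False):
    """Build a core STATUS request URL; verbose=False asks for the minimal report."""
    params = {"action": "STATUS", "verbose": "true" if verbose else "false"}
    if core is not None:
        params["core"] = core  # restrict the report to a single core
    return base_url.rstrip("/") + admin_path + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local instance; adjust the base URL for a real deployment.
    print(core_status_url("http://localhost:8983/solr", core="core0"))
```

A monitoring script that polls frequently would use the minimal form and fall back to the detailed report only when it actually needs index-level information.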

Further work

Status

As I mentioned, this is still very fluid. Please feel free to make comments, either on the dev list or via the JIRAs above.

LotsOfCores (last edited 2013-04-14 15:35:21 by ErickErickson)