<!> Solr1.5

Overview

Solr is currently not well suited to deployments with a large number of homogeneous cores that must be loaded and unloaded quickly and frequently. In such deployments, a core often has to be loaded just to serve a single search query or to index a single document.

The requirements of such a system are:

  1. Very efficient loading of cores - Solr cannot afford to read, parse, and create Schema and SolrConfig objects for each core every time the core has to be loaded.
  2. Lazy loading of cores - Provide a way to START/STOP a core.
  3. Automatic loading of cores - Start a core automatically if a request comes in for a "stopped" core.
  4. LRU Core Loading/Unloading - As there are a large number of cores, not all of them can be kept loaded at all times. There has to be an upper limit on loaded cores; once it is exceeded, the least recently used cores are unloaded.
  5. Automatic allotment of dataDir for cores - If the number of cores is too high, the dataDirs of all the cores cannot live in the same directory. There is an upper limit on the number of directories you can create in a single directory without affecting performance.
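
Requirements 2 to 4 can be illustrated with a small sketch. This is not the code from the patch, only a minimal Python model of the idea: cores are loaded on first use, and once more than a fixed number are loaded, the least recently used one is unloaded. The class name and the load_core/unload_core callbacks are hypothetical.

```python
from collections import OrderedDict


class LruCoreCache:
    """Keeps at most max_cores cores loaded; evicts the least recently used."""

    def __init__(self, max_cores, load_core, unload_core):
        self.max_cores = max_cores
        self.load_core = load_core      # hypothetical callback: name -> core object
        self.unload_core = unload_core  # hypothetical callback: (name, core) -> None
        self.loaded = OrderedDict()     # least recently used entry comes first

    def get(self, name):
        """Return the core for name, loading it on demand if it is stopped."""
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return self.loaded[name]
        core = self.load_core(name)
        self.loaded[name] = core
        # Enforce the upper limit by unloading the oldest entries.
        while len(self.loaded) > self.max_cores:
            old_name, old_core = self.loaded.popitem(last=False)
            self.unload_core(old_name, old_core)
        return core

    def stop(self, name):
        """Explicitly unload a core (the STOP side of requirement 2)."""
        core = self.loaded.pop(name, None)
        if core is not None:
            self.unload_core(name, core)


# Demo with stand-in callbacks: a limit of 2 loaded cores.
unloaded = []
cache = LruCoreCache(2, load_core=lambda n: {"name": n},
                     unload_core=lambda n, c: unloaded.append(n))
for name in ["core0", "core1", "core0", "core2"]:
    cache.get(name)
```

In the demo, core0 is touched again before core2 arrives, so core1 is the one that gets unloaded.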

Issues

Other features which may be needed for such a system include:

Configuration

The following configuration applies to the patch given in SOLR-1293.

<?xml version='1.0' encoding='UTF-8'?>
<solr persistent='true'>
  <cores adminPath="/admin/cores"
          maxCores="4"
          adminHandler="org.apache.solr.handler.admin.LotsOfCoresAdminHandler"
          shareSchema="true"
          shareConfig="true"
          baseDataDir="/opt/solr/data"
          numBuckets="4"
          commonInstanceDir="/opt/solr"
          cleanOnUnload="true">
    <core name="core0" instanceDir="/opt/solr" loadOnStart="false"/>
  </cores>
</solr>
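
As a quick way to inspect such a file, the sketch below reads a trimmed copy of the configuration above with Python's standard library. It only extracts attribute values; how the patch interprets each attribute is not modelled here, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the solr.xml shown above.
SOLR_XML = """<?xml version='1.0' encoding='UTF-8'?>
<solr persistent='true'>
  <cores adminPath="/admin/cores"
         maxCores="4"
         shareSchema="true"
         shareConfig="true"
         baseDataDir="/opt/solr/data"
         numBuckets="4">
    <core name="core0" instanceDir="/opt/solr" loadOnStart="false"/>
  </cores>
</solr>
"""

root = ET.fromstring(SOLR_XML.encode("utf-8"))
cores = root.find("cores")

max_cores = int(cores.get("maxCores"))
num_buckets = int(cores.get("numBuckets"))
base_data_dir = cores.get("baseDataDir")

# Cores declared with loadOnStart="false" are not loaded at startup.
lazy_cores = [c.get("name") for c in cores.findall("core")
              if c.get("loadOnStart") == "false"]
```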

Common Properties

With the above configuration, the only parameter required for creating a core is the core name.
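
One way a dataDir could be allotted from the core name alone is to hash the name into one of numBuckets subdirectories under baseDataDir. The sketch below assumes that scheme purely for illustration; the actual layout used by the patch in SOLR-1293 may differ, and data_dir_for is a hypothetical helper.

```python
import posixpath
import zlib


def data_dir_for(core_name, base_data_dir, num_buckets):
    """Pick a bucket directory for a core from a stable hash of its name."""
    # crc32 gives the same value on every run, unlike Python's built-in
    # hash() for strings, so a core always maps to the same directory.
    bucket = zlib.crc32(core_name.encode("utf-8")) % num_buckets
    return posixpath.join(base_data_dir, str(bucket), core_name)


# Values taken from the configuration above.
path = data_dir_for("core0", "/opt/solr/data", 4)
```

Spreading cores across buckets keeps the number of entries in any single directory roughly at the total number of cores divided by numBuckets.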

Per-Core Properties

New CoreAdmin Commands

LotsOfCoresAdminHandler supports two new core admin commands:

Example: http://host:80/admin/cores?action=start

The 'status' command has also been updated: adding the parameter 'verbose=false' returns a minimal status report for the cores. The default status command uses Luke on each core's index to gather very detailed information, which is expensive if the status is queried very frequently.
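
The request URLs for these commands can be assembled as in the sketch below. The host and port follow the example above. The 'core' parameter on the start request is an assumption borrowed from the standard CoreAdmin commands, and core_admin_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def core_admin_url(base, action, **params):
    """Build a CoreAdmin request URL under the /admin/cores adminPath."""
    query = urlencode({"action": action, **params})
    return f"{base}/admin/cores?{query}"


# 'core' is assumed here; the example above only shows action=start.
start_url = core_admin_url("http://host:80", "start", core="core0")

# Minimal status report that skips the expensive Luke-based details.
status_url = core_admin_url("http://host:80", "status", verbose="false")
```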

Further work

LotsOfCores (last edited 2009-10-29 12:16:42 by ShalinMangar)