Overview
Solr is currently not well suited to a large number of homogeneous cores that must be loaded and unloaded quickly and frequently. Often a core has to be loaded only to run a single search query or to index a single document.
The requirements of such a system are:
- Very efficient loading of cores - Solr cannot afford to read and parse the configuration files and create new IndexSchema and SolrConfig objects every time a core has to be loaded.
- Lazy loading of cores - Provide a way to START/STOP a core.
- Automatic loading of cores - Start a core automatically if a request comes in for a "stopped" core.
- LRU core loading/unloading - As there are a large number of cores, not all of them can be kept loaded at once. There has to be an upper limit beyond which some cores must be unloaded.
- Automatic allotment of dataDir for cores - If the number of cores is very high, the dataDirs of all cores cannot live in the same directory, because there is a limit on how many directories can be created inside a single directory without affecting performance.
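The lazy-loading, automatic-loading and LRU requirements above can be pictured as a registry that starts a core on first access and stops the least recently used core once more than maxCores cores are running. The following is a minimal Java sketch under that assumption, not the SOLR-1293 implementation: the names LruCoreRegistry, LoadedCore and LruCoreDemo are hypothetical, and the start/stop methods only record what a real core container would do. It relies on java.util.LinkedHashMap in access order, whose removeEldestEntry hook runs after each insertion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LruCoreDemo {
    public static void main(String[] args) {
        LruCoreRegistry registry = new LruCoreRegistry(2); // maxCores = 2
        registry.getCore("core0");
        registry.getCore("core1");
        registry.getCore("core0"); // core0 is now the most recently used core
        registry.getCore("core2"); // a third core exceeds maxCores, so core1 (LRU) is stopped
        System.out.println(registry.stoppedCores());    // [core1]
        System.out.println(registry.isLoaded("core1")); // false
        registry.getCore("core1"); // request for a stopped core: it is started again automatically
        System.out.println(registry.stoppedCores());    // [core1, core0]
    }
}

// Hypothetical stand-in for a running core (a real SolrCore holds searchers, caches, writers).
class LoadedCore {
    final String name;

    LoadedCore(String name) {
        this.name = name;
    }
}

// Hypothetical LRU registry of running cores; not the SOLR-1293 code.
class LruCoreRegistry {
    private final int maxCores;
    private final List<String> stopLog = new ArrayList<>();
    private final Map<String, LoadedCore> loaded;

    LruCoreRegistry(int maxCores) {
        this.maxCores = maxCores;
        // accessOrder = true: iteration runs from least to most recently accessed entry
        this.loaded = new LinkedHashMap<String, LoadedCore>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, LoadedCore> eldest) {
                if (size() > LruCoreRegistry.this.maxCores) {
                    stopCore(eldest.getValue()); // stop the least recently used core
                    return true;                 // and drop it from the map
                }
                return false;
            }
        };
    }

    // Returns the named core, starting it first if it is currently stopped.
    synchronized LoadedCore getCore(String name) {
        LoadedCore core = loaded.get(name); // a hit also marks the core as most recently used
        if (core == null) {
            core = startCore(name);
            loaded.put(name, core); // may trigger removeEldestEntry and stop the LRU core
        }
        return core;
    }

    synchronized boolean isLoaded(String name) {
        return loaded.containsKey(name); // containsKey does not change the access order
    }

    synchronized List<String> stoppedCores() {
        return new ArrayList<>(stopLog);
    }

    private LoadedCore startCore(String name) {
        // A real container would open the index here, reusing the shared schema and config.
        return new LoadedCore(name);
    }

    private void stopCore(LoadedCore core) {
        // A real container would close the core's searchers and writer here.
        stopLog.add(core.name);
    }
}
```

A single synchronized registry keeps the sketch simple and correct, but every request goes through one lock; with many cores and frequent requests that lookup path becomes a contention point, which is what SOLR-1416 addresses.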
Issues
SOLR-1293 - Support for large number of cores and faster loading/unloading of cores. This issue has many child issues focusing on individual changes:
SOLR-919 - Cache and reuse SolrConfig
SOLR-920 - Cache and reuse IndexSchema
SOLR-921 - SolrResourceLoader must cache short name vs fully qualified name
SOLR-880 - SolrCore should have a STOP option and a lazy startup option
SOLR-1028 - Automatic core loading unloading for multicore
SOLR-943 - Make it possible to specify dataDir in solr.xml
SOLR-1306 - Support pluggable persistence/loading of solr.xml details
SOLR-1106 - Pluggable CoreAdminHandler (Action ) architecture that allows for custom handler access to CoreContainer / request-response
SOLR-1416 - Reduce contention in CoreContainer#getCore()
SOLR-1530 - Open IndexSearcher lazily
SOLR-1531 - Provide an option to remove the data directory on core unload
SOLR-1533 - Partition data directories into multiple "bucket" directories
Other features which may be needed for such a system include:
- Changes to SolrJ for new start/stop commands and better error codes/messages.
Configuration
The following configuration applies to the patch attached to SOLR-1293.
<?xml version='1.0' encoding='UTF-8'?>
<solr persistent='true'>
  <cores adminPath="/admin/cores"
         maxCores="4"
         adminHandler="org.apache.solr.handler.admin.LotsOfCoresAdminHandler"
         shareSchema="true"
         shareConfig="true"
         baseDataDir="/opt/solr/data"
         numBuckets="4"
         commonInstanceDir="/opt/solr"
         cleanOnUnload="true">
    <core name="core0" instanceDir="/opt/solr" loadOnStart="false"/>
  </cores>
</solr>
Common Properties
maxCores - Maximum number of cores that can be loaded at any given time. If this limit would be exceeded, the least recently used core is stopped to make room for the new one.
adminHandler - Must be set to the value shown in the example above. The admin handler is now pluggable in Solr (SOLR-1106).
shareSchema - Ensures that only one IndexSchema instance is created in Solr and shared by the cores, instead of one per core.
shareConfig - Ensures that only one SolrConfig instance is created in Solr and shared by the cores, instead of one per core.
baseDataDir - The directory under which the indexes are created. There is no need to pass dataDir as a request parameter: Solr automatically assigns each core a data directory under this base directory.
numBuckets - The number of bucket directories created under 'baseDataDir'. Each core is assigned to one of the buckets at random. Set it to '0' or omit this attribute if buckets are not required.
commonInstanceDir - The default instanceDir for all cores. If this attribute is specified in solr.xml, the 'instanceDir' parameter can be omitted when creating a core.
cleanOnUnload - Clean up (delete) the index when a core is unloaded.
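The effect of shareSchema and shareConfig can be pictured as a cache that parses each configuration file once and hands the same object to every core that asks for it. The following Java sketch assumes the cache is keyed by file path; how SOLR-919 and SOLR-920 actually key and invalidate the shared objects is not described here. ParsedConfig, SharedConfigCache and the /opt/solr/conf/schema.xml path are hypothetical placeholders.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class SharedConfigDemo {
    public static void main(String[] args) {
        SharedConfigCache cache = new SharedConfigCache();
        // Two cores asking for the same (illustrative) schema file.
        ParsedConfig forCore0 = cache.get("/opt/solr/conf/schema.xml");
        ParsedConfig forCore1 = cache.get("/opt/solr/conf/schema.xml");
        System.out.println(forCore0 == forCore1); // true: both cores share one parsed object
        System.out.println(cache.parseCount());   // 1: the file was parsed only once
    }
}

// Hypothetical placeholder for a parsed IndexSchema or SolrConfig.
final class ParsedConfig {
    final String path;

    ParsedConfig(String path) {
        this.path = path;
    }
}

// Hypothetical cache that parses each configuration file at most once.
class SharedConfigCache {
    private final Map<String, ParsedConfig> byPath = new ConcurrentHashMap<>();
    private final AtomicInteger parses = new AtomicInteger();

    ParsedConfig get(String path) {
        // computeIfAbsent applies the parse function at most once per key, even under concurrency
        return byPath.computeIfAbsent(path, this::parse);
    }

    int parseCount() {
        return parses.get();
    }

    private ParsedConfig parse(String path) {
        parses.incrementAndGet(); // stands in for reading and parsing the XML file
        return new ParsedConfig(path);
    }
}
```

Since all cores in this setup are homogeneous, loading one more core then costs a map lookup instead of a full parse of the schema and config.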
With the above configuration, the only parameter required for creating a core is the core name.
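How a core created with only a name could get its directories from commonInstanceDir, baseDataDir and numBuckets is sketched below in Java. This is an illustration under stated assumptions, not the patch's code: the class CoreDirResolver and the "bucket-<n>" directory naming are hypothetical, and only the random choice of bucket comes from the description above. Because a random choice cannot be recomputed from the core name, a real implementation has to record the chosen dataDir somewhere (for instance in the persisted solr.xml); hashing the core name would be a deterministic alternative.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

class CoreDirsDemo {
    public static void main(String[] args) {
        // Values taken from the example solr.xml above.
        CoreDirResolver resolver = new CoreDirResolver("/opt/solr", "/opt/solr/data", 4, new Random());
        System.out.println(resolver.instanceDir(null)); // /opt/solr
        System.out.println(resolver.dataDir("core7"));  // e.g. /opt/solr/data/bucket-2/core7
    }
}

// Hypothetical resolver for the directories of a core created with only a name.
class CoreDirResolver {
    private final String commonInstanceDir;
    private final Path baseDataDir;
    private final int numBuckets;
    private final Random random;

    CoreDirResolver(String commonInstanceDir, String baseDataDir, int numBuckets, Random random) {
        this.commonInstanceDir = commonInstanceDir;
        this.baseDataDir = Paths.get(baseDataDir);
        this.numBuckets = numBuckets;
        this.random = random;
    }

    // An explicit instanceDir from the create request wins; otherwise use commonInstanceDir.
    String instanceDir(String requestedInstanceDir) {
        if (requestedInstanceDir == null || requestedInstanceDir.isEmpty()) {
            return commonInstanceDir;
        }
        return requestedInstanceDir;
    }

    // numBuckets <= 0 means no buckets: <baseDataDir>/<core>.
    // Otherwise pick a bucket at random: <baseDataDir>/bucket-<n>/<core>, 0 <= n < numBuckets.
    Path dataDir(String coreName) {
        if (numBuckets <= 0) {
            return baseDataDir.resolve(coreName);
        }
        int bucket = random.nextInt(numBuckets);
        return baseDataDir.resolve("bucket-" + bucket).resolve(coreName);
    }
}
```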
Per-Core Properties
loadOnStart - (boolean true/false) Specifies whether the core should be started when Solr starts up. This parameter can also be passed when creating a core.
New CoreAdmin Commands
LotsOfCoresAdminHandler supports two new core admin commands:
- start - Starts a core that is currently stopped.
- stop - Stops a core that is currently running.
Example: http://host:80/admin/cores?action=start
The 'status' command has also been updated: adding the parameter 'verbose=false' returns a minimal status report of the cores. By default, the status command uses Luke on each core's index to gather very detailed information, which is expensive if the status is queried frequently.
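The admin requests described in this section are plain HTTP GET URLs, so they can be produced by a small formatter like the Java sketch below (it builds the URLs but does not send them, and is not a SolrJ API). Assumptions: the helper names CoreAdminUrls and CoreAdminUrlDemo are hypothetical, and the 'core' parameter used to name the target of start and stop is a guess, since the example URL above does not show how the core is identified. The 'name' parameter for create follows the standard CoreAdmin CREATE action, and 'verbose=false' comes from the description above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

class CoreAdminUrlDemo {
    public static void main(String[] args) {
        String base = "http://host:80/admin/cores"; // host and port as in the example above
        // "core" as the target parameter for start/stop is an assumption
        System.out.println(CoreAdminUrls.build(base, "stop", "core", "core0"));
        // http://host:80/admin/cores?action=stop&core=core0
        System.out.println(CoreAdminUrls.build(base, "start", "core", "core0"));
        // http://host:80/admin/cores?action=start&core=core0
        System.out.println(CoreAdminUrls.build(base, "status", "verbose", "false"));
        // http://host:80/admin/cores?action=status&verbose=false
        System.out.println(CoreAdminUrls.build(base, "create", "name", "core5"));
        // http://host:80/admin/cores?action=create&name=core5
    }
}

// Hypothetical helper that formats CoreAdmin request URLs.
final class CoreAdminUrls {
    private CoreAdminUrls() {}

    // Builds <base>?action=<action>&k1=v1&k2=v2..., URL-encoding every key and value.
    static String build(String base, String action, String... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("keyValues must be key/value pairs");
        }
        StringJoiner query = new StringJoiner("&");
        query.add("action=" + encode(action));
        for (int i = 0; i < keyValues.length; i += 2) {
            query.add(encode(keyValues[i]) + "=" + encode(keyValues[i + 1]));
        }
        return base + "?" + query;
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Charset overload needs Java 10+
    }
}
```

With the example solr.xml, the create URL needs nothing beyond the core name, because instanceDir and dataDir are derived from commonInstanceDir and baseDataDir.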
Further work
- Alias/Unalias commands are not fully tested with this patch. In particular, aliases are not persisted for cores.
- We strongly recommend not using the 'alias' feature in Solr, because of the heavy synchronization overhead it introduces.
- Alternatively, work should be done to reduce the synchronization involved.