Overview
Solr is currently not well suited to a large number of homogeneous cores that must be loaded and unloaded quickly and frequently. In such a system, a core often has to be loaded just to run a single search query or to index a single document.
The requirements of such a system are:
- Very efficient loading of cores - Solr cannot afford to read, parse, and create Schema and SolrConfig objects every time a core has to be loaded.
- Lazy load cores - Provide a way to START/STOP core.
- Automatic loading of cores - Start a core automatically if a request comes in for a "stopped" core.
- LRU Core Loading/Unloading - As there are a large number of cores, all the cores cannot be kept loaded always. There has to be an upper limit beyond which we need to unload a few cores.
- Automatic allotment of dataDir for cores - If the number of cores is very high, the cores' dataDirs cannot all live in the same directory, because there is an upper limit on the number of directories you can create in one directory without affecting performance. Erick Erickson claims that this is taken care of by file walking: the resulting tree structure can be as deep as required to limit the number of files in any particular directory, so all cores can live under <solr_home>.
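To make the dataDir point concrete, here is a minimal sketch of one way to spread core directories over a tree so that no single directory gets too many children. This is only an illustration and is not what Solr does; the class name DataDirAllocator, the "bucketN" directory names, and the bucketsPerLevel and depth parameters are all made up for this example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Solr: derives a nested data directory
// from the core name so that siblings per directory stay bounded.
class DataDirAllocator {
    static Path dataDirFor(String solrHome, String coreName, int bucketsPerLevel, int depth) {
        Path dir = Paths.get(solrHome);
        int hash = coreName.hashCode();
        for (int level = 0; level < depth; level++) {
            // floorMod keeps the bucket index non-negative for negative hashes
            int bucket = Math.floorMod(hash, bucketsPerLevel);
            dir = dir.resolve("bucket" + bucket);
            // consume the part of the hash used for this level
            hash = Math.floorDiv(hash, bucketsPerLevel);
        }
        return dir.resolve(coreName).resolve("data");
    }
}
```

The same core name always maps to the same path, so nothing extra has to be stored to find the directory again. Raising depth makes the tree deeper instead of making any one directory wider.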
Issues
SOLR-1293 - Support for large number of cores and faster loading/unloading of cores. This issue has many child issues focusing on individual changes:
- Open issues
SOLR-919 - Cache and reuse SolrConfig
SOLR-4478 - Specify configuration sets
- Fixed/Closed issues
SOLR-880 - SolrCore should have a STOP option and a lazy startup option (part of SOLR-1028)
SOLR-920 - Cache and reuse IndexSchema
SOLR-921 - SolrResourceLoader must cache short name vs fully qualified name
SOLR-943 - Make it possible to specify dataDir in solr.xml
SOLR-1028 - Automatic core loading unloading for multicore
SOLR-1106 - Pluggable CoreAdminHandler (Action) architecture that allows for custom handler access to CoreContainer / request-response
SOLR-1306 - Support pluggable persistence/loading of solr.xml details. Wound up discovering cores instead
SOLR-1416 - Reduce contention in CoreContainer#getCore(). Closed, haven't seen evidence we need to do this.
SOLR-1530 - Open IndexSearcher lazily
SOLR-1531 - Provide an option to remove the data directory on core unload. Already done in SOLR-2610 (https://issues.apache.org/jira/browse/SOLR-2610).
SOLR-1533 - Partition data directories into multiple "bucket" directories. Will be handled by SOLR-1306.
SOLR-3980 - list not loaded (lazily loaded) cores for clients. See SOLR-4196
SOLR-4083 - Move to a directory-based configuration. Part of other JIRAs
SOLR-4196 - While it looks unrelated, "untangling solr.xml" turns out to contain, very probably, a huge amount of the actual work.
SOLR-4401 - Add stress test to JUnit tests
Other features which may be needed for such a system include:
- Changes to SolrJ for new start/stop commands and better error codes/messages.
Configuration
As I'm digging into this, things are changing. What follows is fluid; it may change as this progresses.
There are two new attributes on the <core> tag and one new attribute on <cores>.
<cores> has one new attribute:
- transientCacheSize=[NNN]. If this limit is crossed, old cores marked 'transient="true"' are removed to make room on an LRU basis.
If this is absent, the default is Integer.MAX_VALUE, an unbounded cache. Only cores with transient="true" are put in this cache, so specifying this attribute without marking any cores as transient has no effect; it just wastes a LinkedHashMap<String, SolrCore> of the specified size that will never be used.
Having this size be less than the number of cores marked 'transient="true"' AND 'loadOnStartup="true"' should work, but it's wasteful since a bunch of cores will be loaded on startup then immediately unloaded after the cache fills up.
- NOTE: when solr.xml is read, the information for all transient cores is put in a separate list. So having more transient cores than the size of the cache will be handled correctly. The "list of cores" is unbounded.
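The LRU behaviour described for transientCacheSize can be sketched with an access-ordered LinkedHashMap. This is a minimal illustration under assumptions, not the patch's code: the class name TransientCoreCache is made up, and the value type is left generic where the text uses SolrCore.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a bounded LRU cache for transient cores.
class TransientCoreCache<V> extends LinkedHashMap<String, V> {
    private final int maxSize;

    TransientCoreCache(int maxSize) {
        // accessOrder=true: entries are ordered from least to most recently accessed
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Invoked after each put(); returning true drops the least recently used entry.
        // A real container would also have to close the evicted core at this point.
        return size() > maxSize;
    }
}
```

With a maxSize of 2, putting core0 and core1, reading core0, and then putting core2 evicts core1, because core0 was touched more recently.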
<core> has two new attributes:
loadOnStartup=["true"|"false"]. Whether the core should be completely loaded upon startup.
transient=["true"|"false"]. Whether the core is allowed to be swapped out or not.
So the idea is that there's really no reason to tie in "lazy loading" with whether the core can be swapped out or not, so by splitting up the two options we give the user control over how these are handled. Use cases below:
- loadOnStartup=true transient=false: Current case. Spend all the time necessary to fully load the cores on startup.
- loadOnStartup=true transient=true: There are some cores you want loaded when the server first starts up, but that you'll allow to be swapped out. It's wasteful to specify more cores like this than your transientCacheSize value.
- loadOnStartup=false transient=false: Probably the least useful combination, but it naturally falls out of the code. You'd specify this combination if, for some reason, starting Solr up quickly was more important than the inconvenience of having to wait randomly for cores to be loaded when a request was made.
- loadOnStartup=false transient=true: This is really the use-case. There are a large number of cores in your system that are short-duration use. You want Solr to load them as necessary, but unload them when the cache gets full on an LRU basis.
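The four combinations above can be sketched as a toy container. Everything here is an assumption made for illustration: LazyCoreContainer and Descriptor are invented names, a String stands in for a loaded core, and loadLog exists only so the example can show how often the expensive load step runs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Solr code: how the two <core> attributes interact.
class LazyCoreContainer {
    // Mirrors the two new <core> attributes.
    static class Descriptor {
        final String name;
        final boolean loadOnStartup;
        final boolean isTransient;

        Descriptor(String name, boolean loadOnStartup, boolean isTransient) {
            this.name = name;
            this.loadOnStartup = loadOnStartup;
            this.isTransient = isTransient;
        }
    }

    private final Map<String, Descriptor> descriptors = new HashMap<>();
    private final Map<String, String> permanent = new HashMap<>();
    private final LinkedHashMap<String, String> transients;
    final List<String> loadLog = new ArrayList<>();

    LazyCoreContainer(final int transientCacheSize, List<Descriptor> all) {
        // Access-ordered map that evicts the least recently used transient core
        transients = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > transientCacheSize;
            }
        };
        // The list of known cores is unbounded; only loaded transient cores are capped
        for (Descriptor d : all) {
            descriptors.put(d.name, d);
        }
        for (Descriptor d : all) {
            if (d.loadOnStartup) {
                getCore(d.name);
            }
        }
    }

    // Returns the core, loading it on demand if it is not currently loaded.
    String getCore(String name) {
        Descriptor d = descriptors.get(name);
        if (d == null) {
            return null;
        }
        Map<String, String> home = d.isTransient ? transients : permanent;
        String core = home.get(name);
        if (core == null) {
            core = "loaded:" + name; // stand-in for the expensive load
            loadLog.add(name);
            home.put(name, core);
        }
        return core;
    }

    boolean isLoaded(String name) {
        return permanent.containsKey(name) || transients.containsKey(name);
    }
}
```

With a cache size of 1 and two cores marked loadOnStartup=true transient=true, the first one is loaded and then immediately evicted by the second during startup, which is the waste the second use case warns about.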
The following configuration applies to the patch given in SOLR-1293.
<?xml version='1.0' encoding='UTF-8'?>
<solr persistent='true'>
  <cores adminPath="/admin/cores"
         transientCacheSize="4"
         adminHandler="org.apache.solr.handler.admin.LotsOfCoresAdminHandler"
         shareSchema="true"
         shareConfig="true">
    <core name="core0" instanceDir="/opt/solr" loadOnStartup="false" transient="true"/>
  </cores>
</solr>
Persistence
This is a sticky wicket. As currently written, the solr.xml file has a global 'persistent="true|false"' option. The base problem is maintenance.
From the original page, under discussion
START/STOP commands. Actually, it doesn't seem like there's anything that could be done with these that isn't accomplished by CREATE/UNLOAD. Perhaps alias START->CREATE and STOP->UNLOAD with suitable defaults? I.e. STOP would never delete the index.
shareSchema - Ensures that only one instance of IndexSchema is created in the Solr. Given the recent additions that allow one to specify a schema file on a per-core basis, does this make sense any more?
shareConfig - Ensures that only one instance of SolrConfig is created in the Solr. Given the recent additions that allow one to specify a config file on a per-core basis, does this make sense any more?
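One way to reconcile sharing with per-core schema or config files is to cache parsed objects by resource name rather than keeping a single global instance. The following is only a sketch of that idea, not how shareSchema or shareConfig are implemented; SharedSchemaCache, ParsedSchema, and parseCount are invented names, and the "parse" step is a placeholder.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: share one parsed object per distinct schema resource.
class SharedSchemaCache {
    // Stand-in for an expensive-to-build object such as a parsed schema
    static class ParsedSchema {
        final String resource;

        ParsedSchema(String resource) {
            this.resource = resource;
        }
    }

    private final ConcurrentMap<String, ParsedSchema> cache = new ConcurrentHashMap<>();
    // Counts how often the expensive parse actually runs
    final AtomicInteger parseCount = new AtomicInteger();

    ParsedSchema schemaFor(String schemaResource) {
        // computeIfAbsent parses at most once per key, even under concurrent access
        return cache.computeIfAbsent(schemaResource, key -> {
            parseCount.incrementAndGet();
            return new ParsedSchema(key);
        });
    }
}
```

Cores that point at the same schema file get the same instance, while a core with its own schema file gets its own, so the per-core option and sharing do not have to conflict.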
cleanOnUnload - Clean up (delete) the index when a core is unloaded. Not implemented yet, and for my particular use case it probably won't be. I can see the utility though, and it doesn't seem very hard code-wise.
Hmmm, haven't thought about the various status commands very deeply. There is an update to the 'status' command. Adding a parameter 'verbose=false' will return a minimal status report of the cores. The default status command uses Luke on the core's index to get very detailed information which is expensive if the status is queried very frequently.
Further work
- Alias/Unalias commands are not fully tested with this patch. In particular, aliases are not persisted for cores.
- We highly recommend that the 'alias' feature in Solr not be used due to the high synchronization overhead it brings.
- Alternatively, we should work towards reducing the synchronization involved
Status
As I mentioned, this is still very fluid. Please feel free to make comments, either on the dev list or via the JIRAs above.