This is a list of ideas for improving Solr.
All users should feel free to add new ideas to this page, or add links to other wiki pages containing more involved designs.
Users should also feel free to open "New Feature", "Improvement", or "Wish" issues in
Jira -- particularly if they already have code that makes progress towards the idea.
Many of the ideas on this page have been discussed on the Solr mailing lists; search there for more information.
Simple Non-Invasive Tasks
This section is for ideas that are relatively straightforward or don't involve major changes to the Solr codebase. People who are eager to "give back" to the Solr community but don't have a lot of familiarity with the Solr codebase may be interested in taking on these tasks...
Build a development tool that can run against the lucene jar (and contrib jars) and generate (stub) Factories for any Tokenizers or TokenFilters found that don't already exist in the solr code base ... this can be used anytime we update the version of lucene used, and the stubs could then be manually tweaked if the classes need additional options in their constructors.
the bf param of dismax should be smarter than just splitting on whitespace (ie: don't split in the middle of parens)
make delete support a list of ids ... perhaps let it support a mix of multiple ids or queries? ie: <delete><id>42</id><query>foo:bar</query><query>bax:yak</query><id>65</id><id>78</id></delete>
"Build Instructions" should probably be moved out of README.txt and into a BUILD.txt and expanded on.
Get someone smarter than hoss to review our jar/war MANIFEST file...
is it as valid as it can be?
should we include Class-Path info specifying the lib jars, or is it better to leave it out?
Create a Powered By Solr icon that people can include in their applications if they so choose.
all source code in "tgz" based releases should be processed by ant's <fixcrlf> in case someone prepares a release on a Windows box
alternate ways of indexing (it currently requires an HTTP POST of an XML document)
a DateTime field (or Query Parser extension) that allows flexible input for easier human entered queries
allow alternate format for date output to ease client creation of date objects?
[DONEish?] good multi-field querying support integrated with standard request handler, or as a separate handler
see: DisMaxRequestHandler
support for max disjunction and minNrShouldMatch in query parser (really a Lucene item)
UnitsFilter... 17" => 17 inch, etc
Admin query interface: add highlighting options, query writer options, facet options (see also SOLR-67)
Documentation
example schema.xml should showcase fields inheriting/overriding options from fieldtypes, with comments explaining what's going on. SchemaXml should make it clear which "core" attributes of all FieldTypes can be overridden by individual fields
Add to the existing tutorial, or write new tutorials discussing out of the box features...
highlighting
dismax handler
the various output formats
simple facet counts
result XML format - needed, or self-explanatory?
Java Docs
good overview.html
package.html for every package
class level documentation for every class
detailed method javadocs for every method in all of the "pluggable" classes and every method in a key class used when writing a request handler...
SolrCache ... in progress
SolrEventListener
UpdateHandler
FieldType ... in progress
SchemaField
SolrQueryParser Configuration in schema.xml
make more options configurable via schema.xml besides operator (discussion)
refactor option setting into a utility (possibly in IndexSchema) so people constructing a SolrQueryParser instance directly get the built in defaults. (discussion)
Live demo server or application (perhaps host on apache lucene zone)
Mailing List Index?
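As a starting point for the bf item in the task list above (splitting on whitespace without breaking inside parens), here is a minimal sketch. It is not taken from the Solr codebase: the class name ParenAwareSplitter and its split method are hypothetical, and it assumes that tracking paren depth is enough (quoted strings and escaped parens are not handled).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an existing Solr class.
final class ParenAwareSplitter {

    // Splits on whitespace, but only when we are outside all parentheses.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of currently open parens

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--; // ignore unbalanced closing parens
            }

            if (Character.isWhitespace(c) && depth == 0) {
                // Top-level whitespace ends the current piece, if any.
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }
}
```

With an illustrative value such as "recip(rord(date), 1, 1000, 1000)^0.5 ord(popularity)", this keeps the first function and its boost together as one piece and returns ord(popularity) as the second.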
Big Ideas for The Future
This section should be used for ideas that are more involved and may require major changes to the Solr codebase, and definitely should involve a lot of discussion among developers about the appropriate way to tackle them...
Robust and configurable field aliasing and globbing support: FieldAliasesAndGlobsInParams
Alternate replication strategy that can work on Windows?
NTFS w/ WinXP or later does support hard links for files (and cygwin "ln" works for files). The current replication scripts could be ported to cygwin.
Support for IndexPartitioning within a single solr webapp instance
A more powerful query language allowing one to express complicated logic without resorting to a custom Java query handler plugin.
Make use of HiveMind or Spring for configuration & dependency injection
[DONE] utilize Lucene's new field selector / lazy field loading mechanism to speed up requests that select only a few stored fields out of many. Beware interaction with the DocCache... it may need to be modified or bypassed. (see also SOLR-52)
[DONE] Implement flexible autoCommit/autoOptimize criteria (see SOLR-65 for some discussion)
Implement some ideas for ComplexFacetingBrainstorming
Implement some ideas to MakeSolrMoreSelfService
[DONE via defaults and DateMath] support for an optional "timestamp" style field in schemas which always want every doc to include the datetime the document was added to the index. This might be a special case (like the uniqueKey field) or it could involve more general "default" support for fields and fieldtypes (ie: <field> and <fieldtype> declarations could include a default="..." attribute that gets put into any document that doesn't already have a value for that field, with the underlying FieldType parsing the text each time it's used, so that the DateField class can convert default="now" to the current time).
support for composite keys ... either with some explicit change to the <uniqueKey> declaration or perhaps just copyField with some hidden magic that concats the resulting terms into a single key Term
investigate this "SynonymQuery" and how SynonymFilter might be modified to set token types to trigger its use automatically in SolrQueryParser
A "user query" parameter for standard request handler, much like what dismax handler has, for unstructured queries entered from a search box
[DONEish?] ... the QParser abstraction and the components framework have basically made dismax and standard the same now
refactor and separate update XML parsing from update handling... possibly implement support for JSON updates.
[DONEish?] the refactoring is done, but still no JSON update support
[IN PROGRESS] refactor all of the JSP pages into request handler so a JDK/JSP compiler isn't needed (the current JSPs are very sparse on presentation, and use no custom tags, so there is almost no advantage to them being JSPs)
Better handling of arbitrary XML charsets: see SOLR-96
Better support for tagging: UserTagDesign
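For the composite key idea in the list above (concatenating several field values into a single key), one possible shape is sketched below. Everything here is an assumption for illustration: the class CompositeKeySketch, the '|' separator, and the use of a plain Map as the document are hypothetical and not part of Solr. The separator is escaped so that different value combinations cannot collapse into the same key.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not an existing Solr class.
final class CompositeKeySketch {

    // Assumed separator between key parts; any character would do.
    static final char SEP = '|';

    // Builds one key string from the named fields, in the given order.
    static String compositeKey(Map<String, String> doc, List<String> keyFields) {
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < keyFields.size(); i++) {
            String field = keyFields.get(i);
            String value = doc.get(field);
            if (value == null) {
                throw new IllegalArgumentException("missing key field: " + field);
            }
            if (i > 0) {
                key.append(SEP);
            }
            // Escape the separator and the escape character itself so that
            // ("a|b", "c") and ("a", "b|c") produce different keys.
            for (char c : value.toCharArray()) {
                if (c == SEP || c == '\\') {
                    key.append('\\');
                }
                key.append(c);
            }
        }
        return key.toString();
    }
}
```

Whether such a key is better built from an explicit <uniqueKey> declaration or from copyField is left open, as in the item itself; the sketch only shows the concatenation step.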