This is the Official Solr FAQ. There is also a SolrTerminology document that may be useful for understanding what some documentation means; as well as a Relevancy FAQ for addressing questions specific to how Relevancy Scoring works in Solr.
- General
- Using
- Do my applications have to be written in Java to use Solr?
- What are the Requirements for running a Solr server?
- How can I get started playing with Solr?
- Solr Comes with Jetty, is Jetty the recommended Servlet Container to use when running Solr?
- How do I change the logging levels/files/format ?
- I POSTed some documents, why don't they show up when I search?
- How can I delete all documents from my index?
- How can I rebuild my index from scratch if I change my schema?
- How can I update a specific field of an existing document?
- How do I use copyField with wildcards?
- Why does the request time out sometimes when doing commits?
- Why don't International Characters Work?
- Solr started, and I can POST documents to it, but the admin screen doesn't work
- The Solr admin pages suddenly stop working and give a 404 error
- Searching
- How to make the search use AND semantics by default rather than OR?
- How do I add full-text summaries to my search results?
- I have set `hl=true` but no summaries are being output
- I want to add basic category counts to my search results
- How can I figure out why my documents are being ranked the way they are?
- Why Isn't Sorting Working on my Text Fields?
- My search returns too many / too little / unexpected results, how to debug?
- How can I get ALL the matching documents back? ... How can I return an unlimited number of rows?
- Performance
- Developing
- Where can I find the latest and Greatest Code?
- Where can I get the javadocs for the classes?
- How can I help?
- How can I submit bug reports, bug fixes or new features?
- How do I apply patches from JIRA issues?
- I can't compile Solr, ant says "JUnit not found" or "Could not create task or type of type: junit"
- How can I start the example application in Debug mode?
- Tagging using SOLR
General
What is Solr?
Solr is a stand alone enterprise search server which applications communicate with using XML and HTTP to index documents, or execute searches. Solr supports a rich schema specification that allows for a wide range of flexibility in dealing with different document fields, and has an extensive search plugin API for developing custom search behavior.
For more information, please read this overview of Solr features.
Are there Mailing lists for Solr?
Yes, there are several Solr email lists.
How do you pronounce Solr?
It's pronounced the same as you would pronounce "Solar".
What does Solr stand for?
Arguably, it stands for "Searching On Lucene w/Replication" -- but it should not be considered an acronym.
Where did Solr come from?
"Solar" (with an A) was initially developed by CNET Networks as an in-house search platform beginning in late fall 2004. By summer 2005, CNET's product catalog was powered by Solar, and several other CNET applications soon followed. In January 2006 CNET granted the existing code base to the ASF to become the "Solr" project. On January 17, 2007 Solr graduated from the Apache Incubator to become a Lucene subproject.
Is Solr Stable? Is it "Production Quality?"
Solr is currently being used to power search applications on several high traffic publicly accessible websites.
Using
Do my applications have to be written in Java to use Solr?
No.
Solr itself is a Java Application, but all interaction with Solr is done by POSTing XML messages over HTTP (to index documents) and GETing search results back as XML, or a variety of other formats (JSON, Python, Ruby, etc...)
What are the Requirements for running a Solr server?
Solr requires Java 1.5 and an Application server (such as Tomcat) which supports the Servlet 2.4 standard.
The Replication features of Solr currently require an OS with the ability to create hard links and rsync.
How can I get started playing with Solr?
There is an online tutorial as well as a demonstration configuration in SVN.
Solr Comes with Jetty, is Jetty the recommended Servlet Container to use when running Solr?
The Solr example app has Jetty in it just because at the time we set it up, Jetty was the simplest/smallest servlet container we found that could be run easily in a cross platform way (ie: "java -jar start.jar"). That does not imply that Solr runs better under Jetty, or that Jetty is only good enough for demos -- it's just that Jetty made our demo setup easier.
Users should decide for themselves which Servlet Container they consider the easiest/best for their use cases based on their needs/experience. For high traffic scenarios, investing time for tuning the servlet container can often make a big difference.
How do I change the logging levels/files/format ?
Solr uses JDK standard logging (i.e., the java.util.logging.* package). An overview of how it can be configured at the JVM level can be found here:
http://java.sun.com/j2se/1.5.0/docs/guide/logging/overview.html
Many servlet containers also provide alternate log configuration options in their configuration files. Consult your servlet container's documentation to see what options are available.
The Solr Admin console has a screen for toggling the logging level globally. This is a transient setting that is good for diagnostic work, but it does not persist after reboot.
If you don't know much about JDK logging or servlet containers and want a quick recipe for modifying the logging settings for the default Solr example/ setup, see LoggingInDefaultJettySetup.
I POSTed some documents, why don't they show up when I search?
Documents that have been added to the index don't show up in search results until a <commit/> message has been POSTed as well. This allows you to POST many documents in succession and know that none of them will be visible to search clients until you have finished. (Note however, that in some circumstances users may see your changes even though you haven't POSTed a <commit/>. For instance, when Solr is restarted, some or even all of the docs added since the last <commit/> may now show up in search results.)
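As a rough illustration of the add-then-commit flow described above, the following sketch builds the XML messages a client would POST. The class name and the field names ("id", "title") are hypothetical, and the sketch only constructs the message bodies; it does not send them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SolrUpdateMessages {
    /** The message that makes previously added documents visible to searches. */
    static final String COMMIT = "<commit/>";

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Renders one document as an <add> message; field order follows the map's iteration order. */
    static String add(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(f.getKey())).append("\">")
               .append(escapeXml(f.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");               // hypothetical field names
        doc.put("title", "Tom & Jerry");
        System.out.println(add(doc));     // POST this first (repeat for more documents)
        System.out.println(COMMIT);       // POST this last to make the documents searchable
    }
}
```

A client would POST any number of add messages and then a single COMMIT once it wants the new documents to become visible.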
How can I delete all documents from my index?
Use the "match all docs" query in a delete by query command: <delete><query>*:*</query></delete>
This has been optimized to be more efficient than deleting by some arbitrary query that happens to match all docs because of the nature of the data.
How can I rebuild my index from scratch if I change my schema?
The most efficient/complete way is to...
1. Stop your application server
2. Change your schema.xml file
3. Delete the index directory in your data directory
4. Start your application server (Solr will detect that there is no existing index and make a new one)
5. Re-Index your data
If the permission scheme of your server does not allow you to manually delete the index directory an alternate technique is...
1. Stop your application server
2. Change your schema.xml file
3. Start your application server
4. Use the "match all docs" query in a delete by query command: <delete><query>*:*</query></delete>
5. Send an <optimize/> command.
6. Re-Index your data
It's very important to send the "optimize" command before re-indexing. Even though you've deleted all the documents, some low level Lucene metadata about the fields those documents had will still be there, and will influence how future documents with the same field names are indexed.
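The delete and optimize steps of the alternate technique can be sketched with the JDK's own HTTP client (java.net.http, Java 11 or later). This is only an illustration: the class name is hypothetical, and the update URL used in main is an assumption based on the default example setup. The sketch builds the two requests in the required order without sending them.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class ClearIndexRequests {
    /** Builds a POST request carrying one XML update message. */
    static HttpRequest post(String updateUrl, String xml) {
        return HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "text/xml; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(xml, StandardCharsets.UTF_8))
                .build();
    }

    /** Returns the requests in the order they should be sent: delete everything, then optimize. */
    static List<HttpRequest> clearIndex(String updateUrl) {
        return List.of(
                post(updateUrl, "<delete><query>*:*</query></delete>"),
                post(updateUrl, "<optimize/>"));
    }

    public static void main(String[] args) {
        // Assumed URL of the update handler in the default example configuration.
        for (HttpRequest r : clearIndex("http://localhost:8983/solr/update")) {
            System.out.println(r.method() + " " + r.uri());
        }
    }
}
```

Each request could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) before re-indexing begins.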
How can I update a specific field of an existing document?
I want to update a specific field in a document; is that possible? I only need to index one field for a specific document. Do I have to re-index all the documents for this?
No, just the one document. Say you have a CMS and you edit one document: you only need to re-index that document, by sending an add message containing the whole document (not just the one field).
In Lucene, updating a document is really a delete followed by an add. You need to add the complete document, because Lucene has no "update only a field" semantics.
How do I use copyField with wildcards?
The <copyField> directive allows wildcards in the source, so that several fields can be copied into one destination field without having to specify them all individually. A common use case is something like:
<copyField source="*_t" dest="text" />
This tells Solr to copy the contents of any field that ends in "_t" to the "text" field. This is particularly useful when you have a large, and possibly changing, set of fields you want to index into a single field. With the example above, you could start indexing fields like "description_t", "editorial_review_t", and so on, and all their content would be indexed in the "text" field.
Note that you can use the wildcard copyField syntax with or without similar dynamicField declarations. Thus you could choose to index the "description_t", "editorial_review_t" fields individually with a dynamicField like
<dynamicField name="*_t" type="text" indexed="true" stored="false" />
but you don't have to if you don't want to. You could even mix and match across different dynamic fields by doing something like
<dynamicField name="*_i_t" type="text" indexed="true" stored="false" />
<copyField source="*_t" dest="text" />
Now, as you add fields, you can give them names ending in "_i_t" if you want them indexed separately as well as in the main "text" field, and names ending in "_t" without the "_i" if you just want them indexed in "text" but not individually.
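To make the matching rule concrete, here is a small sketch of how a leading-wildcard pattern such as "*_t" selects field names. It only illustrates the behavior described above and is not Solr's own implementation; the class name and the field name "summary_i_t" are hypothetical.

```java
final class CopyFieldGlob {
    /**
     * Returns true if fieldName matches the pattern. Only a single leading "*"
     * is handled, as in the examples above; any other pattern must match exactly.
     */
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        return pattern.equals(fieldName);
    }

    public static void main(String[] args) {
        String[] fields = {"description_t", "editorial_review_t", "summary_i_t", "price_i"};
        for (String f : fields) {
            System.out.println(f
                    + " copied to text: " + matches("*_t", f)
                    + ", indexed individually: " + matches("*_i_t", f));
        }
    }
}
```

Note that a name ending in "_i_t" also ends in "_t", which is why such fields are matched by both directives in the mixed example.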
Why does the request time out sometimes when doing commits?
Internally, Solr does nothing to time out any requests -- it lets both updates and queries take however long they need to take to be processed fully. However, the servlet container being used to run Solr may impose arbitrary timeout limits on all requests. Please consult the documentation for your servlet container if you find that this limit is too low.
(In Jetty, the relevant setting is "maxIdleTime" which is in milliseconds)
Why don't International Characters Work?
Solr can index any characters expressed in the UTF-8 charset (see SOLR-96). There are no known bugs with Solr's character handling, but there have been some reported issues with the way different application servers (and different versions of the same application server) treat incoming and outgoing multibyte characters. In particular, people have reported better success with Tomcat than with Jetty:
"International Charsets in embedded XML" (Jetty 5.1)
"Problem with surrogate characters in utf-8" (Jetty 6)
If you notice a problem with multibyte characters, the first step to ensure that it is not a true Solr bug would be to write a unit test that bypasses the application server directly using the AbstractSolrTestCase.
The most important points are:
The document has to be indexed as UTF-8 encoded on the solr server. For example, if you send an ISO encoded document, then the special ISO characters get a byte added (screwing up the final encoding, only reindexing with UTF-8 can fix this).
The client needs UTF-8 URL encoding when forwarding the search request to the solr server.
If you just forward the request by doing:
String value = request.getParameter("q");
to get the query string, q may have been decoded using an ISO charset, and then Solr will not return a search result.
One possible solution is:
String encoding = request.getCharacterEncoding();
if (null == encoding) {
    // Set your default encoding here
    request.setCharacterEncoding("UTF-8");
} else {
    request.setCharacterEncoding(encoding);
}
...
String value = request.getParameter("q");
Another possibility is to use java.net.URLDecoder/URLEncoder to transform all parameter values to UTF-8.
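A minimal sketch of the URLDecoder/URLEncoder approach, assuming Java 10 or later for the Charset overloads. The class and method names are hypothetical; the point is that the raw, percent-encoded parameter is decoded as UTF-8, and re-encoded as UTF-8 before it is forwarded to Solr.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryParamCharset {
    /** Decodes a raw, percent-encoded parameter value as UTF-8. */
    static String decodeUtf8(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    /** Percent-encodes a value as UTF-8 so it can be placed in the URL sent to Solr. */
    static String encodeUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "caf%C3%A9";                 // "café" as a browser would send it in UTF-8
        String right = decodeUtf8(raw);
        // Decoding the same bytes with an ISO charset produces two wrong characters instead of one.
        String wrong = URLDecoder.decode(raw, StandardCharsets.ISO_8859_1);
        System.out.println(right.length() + " vs " + wrong.length());
        System.out.println(encodeUtf8(right));    // value to forward to Solr
    }
}
```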
Solr started, and I can POST documents to it, but the admin screen doesn't work
The admin screens are implemented using JSPs which require a JDK (instead of just a JRE) to be compiled on the fly. If you encounter errors trying to load the admin pages, and the stack traces of these errors seem to relate to compilation of JSPs, make sure you have a JDK installed, and make sure it is the instance of java being used.
NOTE: Some Servlet Containers (like Tomcat5.5 and Jetty6) don't require a JDK for JSPs.
The Solr admin pages suddenly stop working and give a 404 error
See SOLR-118. This happens when using the default Jetty config provided by Solr, and having Jetty's work files in /tmp purged by the operating system.
Restarting Solr after creating a $(jetty.home)/work directory for Jetty's work files should solve the problem.
This might also be caused by starting two Solr instances on the same port and killing one; see Hoss's comment in SOLR-118.
Searching
How to make the search use AND semantics by default rather than OR?
In schema.xml:
<solrQueryParser defaultOperator="AND"/>
How do I add full-text summaries to my search results?
Basic highlighting/summarization can be added by adding hl=true to the query parameters. More advanced highlighting is described in HighlightingParameters.
I have set `hl=true` but no summaries are being output
For a field to be summarizable it must be both stored and indexed. Note that this can significantly increase the index size for large fields (e.g. the main content field of a document). Consider storing the field using compression (compressed=true in the schema.xml fieldType definition).
I want to add basic category counts to my search results
Solr provides support for "facets" out-of-the-box. See SimpleFacetParameters.
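As a hedged illustration, a basic facet request adds facet=true and a facet.field parameter to the normal query parameters. The sketch below only builds the query string; the class name, the query "ipod", and the field name "category" are hypothetical, and SimpleFacetParameters remains the reference for the available options.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class FacetQuery {
    /** Joins parameters into a URL query string, encoding names and values as UTF-8. */
    static String queryString(Map<String, String> params) {
        StringJoiner qs = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            qs.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return qs.toString();
    }

    /** Builds the parameters for a search that also returns counts per value of one field. */
    static String facetOn(String query, String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("facet", "true");
        params.put("facet.field", field);
        return queryString(params);
    }

    public static void main(String[] args) {
        // "category" is a hypothetical field; it would need to exist in your schema.
        System.out.println("/select?" + facetOn("ipod", "category"));
    }
}
```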
How can I figure out why my documents are being ranked the way they are?
Solr uses Lucene for ranking. A detailed summary of the ranking calculation can be obtained by adding `debugQuery=true` to the query parameter list. The output takes some getting used to if you are not familiar with Lucene's ranking model.
The SolrRelevancyFAQ has more information on understanding why documents rank the way they do.
Why Isn't Sorting Working on my Text Fields?
Lucene Sorting requires that the field you want to sort on be indexed, but it cannot contain more than one "token" per document. Most Analyzers used on Text fields result in more than one token, so the simplest thing to do is to use copyField to index a second version of your field using the StrField class.
If you need to do some processing on the field value using TokenFilters, you can also use the KeywordTokenizer, see the Solr example schema for more information.
My search returns too many / too little / unexpected results, how to debug?
The best way to debug such problems is with the analyzer admin tool, which is at
http://localhost:8983/solr/admin/analysis.jsp if using the default configuration.
That page will show you how your field is processed while indexing and while querying, and if a particular query matches.
See also the Solr tutorial and the xml.com article about Solr, listed in the SolrResources.
How can I get ALL the matching documents back? ... How can I return an unlimited number of rows?
This is impractical in most cases. People typically only want to do this when they know they are dealing with an index whose size guarantees the result sets will always be small enough that they can feasibly be transmitted in a manageable amount -- but if that's the case, just specify what you consider a "manageable amount" as your rows param and get the best of both worlds (all the results when your assumption is right, and a sanity cap on the result size if it turns out your assumptions are wrong).
Performance
How fast is indexing?
Indexing performance varies considerably depending on the size of the documents, the analysis requirements, and cpu and io performance of the machine. Rates between 10 and 150 docs/s have been reported.
How can indexing be accelerated?
A few ideas:
Include multiple documents in a single <add> operation. Note: there is no advantage in trying to post a huge number of docs in a single go. I'd suggest going no further than 10 (full-size docs) to 100 (tiny docs).
Ensure you are not performing <commit/> until you need to see the updated index.
If you are reindexing every document in your index, completely removing the index first can substantially reduce the required time and disk space.
Solr can do some, but not all, parts of indexing in parallel. Indexing on multiple threads can be a boon, particularly if you have multiple cpus and your analysis requirements are considerable.
Experiment with different mergeFactor and maxBufferedDocs settings (see http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html).
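The first idea in the list, grouping several documents into each <add> message, can be sketched as follows. The class name is hypothetical, the documents are assumed to be already rendered as <doc> elements, and the batch size is up to the caller.

```java
import java.util.ArrayList;
import java.util.List;

final class AddBatcher {
    /** Groups pre-rendered <doc> elements into <add> messages of at most batchSize documents each. */
    static List<String> batch(List<String> docXml, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < docXml.size(); i += batchSize) {
            StringBuilder xml = new StringBuilder("<add>");
            for (String doc : docXml.subList(i, Math.min(i + batchSize, docXml.size()))) {
                xml.append(doc);
            }
            messages.add(xml.append("</add>").toString());
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "<doc><field name=\"id\">1</field></doc>",
                "<doc><field name=\"id\">2</field></doc>",
                "<doc><field name=\"id\">3</field></doc>");
        // Each message would be POSTed separately, with a single <commit/> sent at the very end.
        batch(docs, 2).forEach(System.out::println);
    }
}
```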
How can I speed up facet counts?
Performance problems can arise when faceting on fields/queries with many unique values. If you are faceting on a tokenized field, consider making it untokenized (field class solr.StrField, or using solr.KeywordTokenizerFactory).
Also, keep in mind that Solr must construct a filter for every unique value on which you request faceting. This only has to be done once, and the results are stored in the filterCache. If you are experiencing slow faceting, check the cache statistics for the filterCache in the Solr admin. If there is a large number of cache misses and evictions, try increasing the capacity of the filterCache.
What does "PERFORMANCE WARNING: Overlapping onDeckSearchers=X" mean in my logs?
This warning means that at least one searcher hadn't yet finished warming in the background when a commit was issued and another searcher started warming. This can not only eat up a lot of RAM (as multiple on-deck searchers warm caches simultaneously), but it can also create a feedback cycle, since more searchers warming in parallel means each searcher might take longer to warm.
Typically the way to avoid this error is to either reduce the frequency of commits, or reduce the amount of warming a searcher does while it's on deck (by reducing the work in newSearcher listeners, and/or reducing the autowarmCount on your caches).
See also the <maxWarmingSearchers/> option in SolrConfigXml.
Developing
Where can I find the latest and Greatest Code?
In the Solr Version Control Repository.
Where can I get the javadocs for the classes?
There are currently nightly Solr javadocs.
How can I help?
Joining and participating in discussion on the developers email list is the best way to get your feet wet with Solr development.
There is also a TaskList containing all of the ideas people have had about ways to improve Solr. Feel free to add your own ideas to this page, or investigate possible implementations of existing ideas. When you are ready, submit a patch with your changes.
How can I submit bug reports, bug fixes or new features?
Bug reports and patch submissions should be entered in Solr's Bug Tracking Queue.
How do I apply patches from JIRA issues?
Information about testing patches can be found on the How To Contribute wiki page.
I can't compile Solr, ant says "JUnit not found" or "Could not create task or type of type: junit"
As of September 21, 2007, JUnit's JAR is now included in Solr's source repository, so there is no need to install it separately to run Solr's unit tests. If ant generates a warning that it doesn't understand the junit task, check that you have an "ant-junit.jar" in your ANT_LIB directory (it should be included when you install apache-ant).
If you are attempting to compile the Solr source tree from prior to September 21, 2007 (including Solr 1.2), you will need to include junit.jar in your ant classpath. Please see the Ant documentation of JUnit for notes about where Ant expects to find the JUnit JAR and Ant task JARs.
How can I start the example application in Debug mode?
You can start the example application in debug mode to debug your Java classes with your favorite IDE (such as Eclipse).
java -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n -jar start.jar
Then connect to port 8000 and debug.
Tagging using SOLR
There is a wiki page on some brainstorming on how to implement tagging within Solr [UserTagDesign].