SolrPerformanceProblems

See also: SolrPerformanceFactors, SolrPerformanceData, BenchmarkingSolr

This page will attempt to answer questions like the following:

  • Why is Solr performance so bad?
  • Why does Solr take so long to start up?
  • Why is SolrCloud acting like my servers are failing when they are fine?

This is an attempt to give basic information only. For a full understanding of the issues involved, read the included links.

General information

A major driving factor for Solr performance is RAM. Solr requires sufficient memory for two separate things: one is the Java heap, the other is "free" memory for the OS disk cache. It is strongly recommended that Solr run on 64-bit hardware with a 64-bit OS and a 64-bit Java.

SolrCloud can be very unstable if you have underlying performance issues. Increasing the zkClientTimeout can help, but you'll also want to address those performance issues.
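
For example, with the legacy solr.xml format used by Solr 4.x, the timeout can be raised with the zkClientTimeout attribute on the cores element. This is a hedged sketch, not a drop-in file; adjust it to match your own solr.xml:

<solr persistent="true">
  <!-- zkClientTimeout is in milliseconds; 30000 = 30 seconds -->
  <cores adminPath="/admin/cores" zkClientTimeout="30000">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>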

RAM

OS Disk Cache

Solr relies on disk caching for good performance. See Uwe's blog entry for a lot of good Lucene/Solr specific information.

In a nutshell, you want to have enough memory available in the OS disk cache so that the important parts of your index, or ideally your entire index, will fit into the cache. Let's say that you have a Solr index size of 8GB. If your OS, Solr's Java heap, and all other running programs require 4GB of memory, then an ideal memory size for that server is at least 12GB. The exact minimum requirements are highly variable and depend on things like your schema, your index contents, and your queries.
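
On Linux, you can see how much memory the OS is currently devoting to the disk cache with the free command. A minimal example; the output layout varies by distribution:

# report memory usage in megabytes; the "cached" column is the OS disk cache
free -m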

Java Heap

The Java heap is the memory that Solr requires in order to actually run. Certain things will require a lot of heap memory. In no particular order, these include:

  • A large index.
  • Frequent updates.
  • Super large documents.
  • Extensive use of faceting.
  • Very large Solr caches.
  • A large RAMBufferSizeMB (see the snippet after this list).
  • Use of Lucene's RAMDirectoryFactory.
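
RAMBufferSizeMB is set in the indexConfig section of solrconfig.xml. A minimal sketch, assuming the 100MB value that ships with the Solr 4.x example config:

<indexConfig>
  <!-- maximum RAM Lucene may use to buffer new documents before flushing a segment to disk -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
</indexConfig>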

How much heap space do I need?

The short version: This is one of those questions that has no generic answer. You want a heap that's large enough so that you don't have OutOfMemory (OOM) errors and problems with constant garbage collection, but small enough that you're not wasting memory or running into huge garbage collection pauses.

The long version: You'll have to experiment. Java ships with two GUI tools, jconsole and jvisualvm, that can connect to a running Solr instance and show how much heap is used over time.
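
jconsole can attach to a local JVM by process id. A minimal sketch, assuming the jps tool from the same JDK is used to find the Solr pid (substitute the real pid for <pid>):

# list running JVMs to find the one running Solr
jps -v
# attach the monitoring GUI to that process
jconsole <pid>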

The chart in this jconsole example shows a typical sawtooth pattern - memory usage climbs to a peak, then garbage collection frees up some memory. Figuring out how many collections is too many will depend on your query/update volume. One possible rule of thumb: Look at the number of queries per second Solr is seeing. If the number of garbage collections per minute exceeds that value, your heap MIGHT be too small.

If you let your Solr server run with a high query and update load, the low points in the sawtooth pattern will represent the absolute minimum required memory. Try setting your max heap between 125% and 150% of this value, then repeat the monitoring to see if the low points in the sawtooth pattern are noticeably higher than they were before, or if the garbage collections are happening very frequently. If they are, repeat the test with a higher max heap.
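
The heap limits are set with JVM options when Solr is started. A hedged example for the Jetty start.jar that ships with the Solr 4.x example, assuming the observed sawtooth low point was around 3GB, so a 4GB max heap lands in the 125% to 150% range:

java -Xms4g -Xmx4g -jar start.jar

Setting the minimum (-Xms) equal to the maximum (-Xmx) avoids pauses for heap resizing, but both values here are assumptions to be tested against your own measurements.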

GC pause problems

When you have a large heap (about 4GB or larger), garbage collection pauses can be a major problem. There are two main solutions. One is to use a commercial low-pause JVM like Zing, which does come with a price tag. The other is to tune the JVM you've already got. GC tuning is a precise art form, and what works for one person may not work for you. Important note: Anecdotal evidence suggests that the G1 collector available in recent Java 6 and all Java 7 versions is not a good fit for Solr as of Java 7u21. The CMS collector seems to be the right choice for Solr. Here are some ideas that hopefully you will find helpful:

Normal Solr operation creates a lot of short-lived objects, so having a young generation (eden) that's larger than the Java default is important. Making eden too large can be a problem as well - the old (tenured) generation is also important.
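
As an illustration only (every value here is an assumption, not a recommendation), selecting CMS and explicitly sizing the young generation might look like this on the startup command line:

java -Xms4g -Xmx4g \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
  -Xmn1g \
  -jar start.jar

Monitor the results with jconsole or GC logging and adjust; what works for one index and query mix may not work for another.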

If your max heap is just a little bit too small, you may end up with a slightly different garbage collection problem, and this one is usually worse than the problems associated with a large heap: every time Solr needs to allocate memory for an operation, it must first do a full garbage collection to free up enough space to complete the allocation.

SSD

If you put your index on solid state disks, performance will increase, and most of the time the increase will be enormous. There is one very small caveat: despite the incredible speed of SSDs, RAM (the OS disk cache) is still significantly faster, and RAM still plays a big role in the performance of SSD-based systems.

Slow startup

The most common reason for this problem is the updateLog feature introduced in Solr 4.0. This feature adds a transaction log for all updates. The transaction log is a good thing, and it is required for SolrCloud. The same version also introduced the concept of a soft commit.

If you send a large number of document updates to your index without doing any commits, or while doing only soft commits, the transaction log will get very large. When Solr starts up, the entire transaction log is replayed to ensure that index updates are not lost. With large logs, this goes very slowly. The problem can also be caused by a large import using the DataImportHandler, which optionally issues a single hard commit at the end, leaving the entire import in one transaction log.

To fix the slow startup, you need to keep your transaction log size down. The only way to do this is by sending a hard commit, which closes the current transaction log and starts a new one. Solr only keeps a few of these logs, so by frequently creating new ones, the total transaction log size will be small. Replaying small transaction logs goes quickly.

Turning on autoCommit in your solrconfig.xml update handler definition is the solution:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
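    <!-- hard commit after 25000 docs or after five minutes (maxTime is in milliseconds), whichever comes first -->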
    <maxDocs>25000</maxDocs>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <updateLog />
</updateHandler>

Important: One reason that people will send a large number of updates without doing any commits is that they don't want their deletes or updates to be visible until they are all completed. This requirement is maintained by the openSearcher=false setting in the above config. If you have this requirement, you will need to send an explicit hard or soft commit to make the changes visible.
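
An explicit commit can be sent as an ordinary update request. A minimal sketch using curl, assuming a core named collection1 on the default port:

# hard commit: closes the current transaction log and makes all changes visible
curl 'http://localhost:8983/solr/collection1/update?commit=true'
# soft commit: makes changes visible without the full cost of a hard commit
curl 'http://localhost:8983/solr/collection1/update?softCommit=true'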

You'll want to adjust the maxDocs and maxTime parameters in your autoCommit configuration to fit your requirements.

Slow commits

The major causes of slow commit times include:

  • Large autowarmCount values on Solr caches.
  • Extremely frequent commits.
  • Not enough RAM, discussed above.

If you have large autowarmCount values on your Solr caches, it can take a very long time to do that cache warming. The filterCache is particularly slow to warm. The solution is to reduce the autowarmCount, reduce the complexity of your queries, or both.
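
Autowarming is configured per cache in the query section of solrconfig.xml. A hedged sketch that keeps the filterCache autowarmCount low; the sizes are placeholders, not recommendations:

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="32"/>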

If you commit very frequently, you may send a new commit before the previous commit is finished. If you have cache warming enabled as just discussed, this is more of a problem. If you have a high maxWarmingSearchers in your solrconfig.xml, you can end up with a lot of new searchers warming at the same time, which is very I/O intensive, so the problem compounds itself.
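
maxWarmingSearchers also lives in solrconfig.xml, and the stock example configs keep it low for exactly this reason. A sketch:

<!-- a commit that would create a third concurrent warming searcher will fail rather than stack up -->
<maxWarmingSearchers>2</maxWarmingSearchers>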
