Solr1.4

This document describes the Java implementation of index replication that works over HTTP and was introduced in Solr1.4. For information on the ssh/rsync-based replication available since Solr1.1, please consult CollectionDistribution.

Features

This feature has been committed to trunk. It is available in the nightly builds. SOLR-561 tracks the development of this feature.

Configuration

The new Java-based replication feature is implemented as a RequestHandler. Configuring replication is therefore similar to any normal RequestHandler.

Master

<requestHandler name="/replication" class="solr.ReplicationHandler" >
    <lst name="master">
        <!--Replicate on 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries of this config string-->
        <str name="replicateAfter">optimize</str>

        <!--Create a backup after 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries of this config string.  Note that this is just for backup, replication does not require this. -->
        <!-- <str name="backupAfter">optimize</str> -->

        <!--If configuration files need to be replicated give the names here, separated by comma -->
        <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
        <!--The default value of reservation is 10 secs. See the documentation below. Normally, you should not need to specify this -->
        <str name="commitReserveDuration">00:00:10</str>
    </lst>
</requestHandler>

Note:

Replicating solrconfig.xml

In the configuration file on the master server, include a line like the following:

<str name="confFiles">solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml</str>

This ensures that the local configuration 'solrconfig_slave.xml' will be saved as 'solrconfig.xml' on the slave. All other files will be saved with their original names.

On the master server, the slave configuration file can have any name, as long as that name is correctly identified in the "confFiles" string; on the slave it is saved under whatever file name appears after the colon ':'.
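The naming rule above can be illustrated with a minimal sketch. It is not the ReplicationHandler's own parsing code; the function name parse_conf_files is hypothetical, and the sketch only assumes the comma-separated list and the 'source:alias' convention described in this section.

```python
def parse_conf_files(conf_files):
    """Split a confFiles value into (name_on_master, name_on_slave) pairs.

    Hypothetical helper for illustration only.
    """
    pairs = []
    for entry in conf_files.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries such as a trailing comma
        # "source:alias" means: save 'source' under the name 'alias' on the slave.
        source, _, alias = entry.partition(":")
        pairs.append((source, alias if alias else source))
    return pairs


for master_name, slave_name in parse_conf_files(
        "solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml"):
    print(master_name, "->", slave_name)
```

Entries without a colon map to themselves, which matches the statement that all other files keep their original names.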

Slave

<requestHandler name="/replication" class="solr.ReplicationHandler" >
    <lst name="slave">

        <!--Fully qualified URL of the master's replication handler. It can also be passed as a request param to the fetchindex command-->
        <str name="masterUrl">http://localhost:port/solr/corename/replication</str>  

        <!--Interval at which the slave should poll the master. Format is HH:mm:ss. If this is absent, the slave does not poll automatically,
         but a fetchindex can still be triggered from the admin page or the HTTP API -->
        <str name="pollInterval">00:00:20</str>  
        <!-- THE FOLLOWING PARAMETERS ARE USUALLY NOT REQUIRED-->
        <!--Use compression while transferring the index files. The possible values are internal|external.
         If the value is 'external', make sure that your master Solr is set up to honour the accept-encoding header.
         See http://wiki.apache.org/solr/SolrHttpCompression for details.
         If it is 'internal', everything is taken care of automatically.
         USE THIS ONLY IF YOUR BANDWIDTH IS LOW. THIS CAN ACTUALLY SLOW DOWN REPLICATION IN A LAN-->
        <str name="compression">internal</str>
        <!--The following values are used when the slave connects to the master to download the index files.
         The defaults are implicitly set to 5000ms and 10000ms respectively. You DO NOT need to specify
         these unless the bandwidth is extremely low or the latency is extremely high-->
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">10000</str>

        <!-- If HTTP Basic authentication is enabled on the master, then the slave can be configured with the following -->
        <str name="httpBasicAuthUser">username</str>
        <str name="httpBasicAuthPassword">password</str>

     </lst>
</requestHandler>

Note: If you are not using cores, simply omit the "corename" part of the masterUrl above. To check that the URL is correct, open it in a browser; you should get a status OK response.
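As a small illustration of the note above, the following sketch assembles a masterUrl with and without a core name. The helper name master_url and its parameters are hypothetical; only the URL shape is taken from the configuration examples in this document.

```python
def master_url(host, port, core=None, webapp="solr"):
    """Build the replication handler URL of a master (hypothetical helper).

    When 'core' is None, the core segment is left out, as for a
    single-core setup.
    """
    parts = [webapp]
    if core:
        parts.append(core)
    parts.append("replication")
    return "http://%s:%d/%s" % (host, port, "/".join(parts))


print(master_url("localhost", 8983))
print(master_url("localhost", 8983, core="corename"))
```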

Setting up a Repeater

A master may be able to serve only so many slaves without affecting performance. Some organizations have deployed slave servers across multiple data centers. If each slave downloads the index from a remote data center, the resulting download may consume too much network bandwidth. To avoid performance degradation in cases like this, you can configure one or more slaves as repeaters. A repeater is simply a node that acts as both a master and a slave.

Example configuration of a repeater:

<requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://master.solr.company.com:8080/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>
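The pollInterval values in the slave and repeater examples use the HH:mm:ss format. The sketch below shows one way such a value could be converted to seconds. It is an assumption about the arithmetic, not a copy of Solr's parser, and the function name poll_interval_seconds is hypothetical. Under this reading, a value such as 00:00:60 is simply 60 seconds.

```python
def poll_interval_seconds(value):
    """Convert an 'HH:mm:ss' string to a number of seconds.

    Hypothetical helper; raises ValueError if the value does not consist
    of exactly three colon-separated integers.
    """
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


print(poll_interval_seconds("00:00:20"))
print(poll_interval_seconds("00:00:60"))
```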

Replication Dashboard

This shows the following information

The following actions can be performed from the dashboard

How does it work?

This feature relies on the IndexDeletionPolicy feature of Lucene. Through this API, Lucene exposes IndexCommits as callbacks for each commit/optimize. An IndexCommit exposes the files associated with each commit. This enables us to identify the files that need to be replicated.

True to the tradition of Solr, all operations are performed over a REST API. The ReplicationHandler exposes a REST API for all the operations it supports.

What happens when I commit or optimize?

When a commit/optimize is done on the master, ReplicationHandler reads the list of file names associated with each commit point. The 'replicateAfter' parameter in the configuration decides when these file names are fetched from Lucene and stored.
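The role of 'replicateAfter' can be pictured with the following sketch. It is a simplified model, not the ReplicationHandler implementation: the class MasterState, its method on_event, and the example file names are all hypothetical. It only shows the idea that an event updates the stored file list when its type is listed in 'replicateAfter'.

```python
class MasterState:
    """Toy model of the commit point a master makes available to slaves."""

    def __init__(self, replicate_after):
        # One or more of 'commit', 'optimize', 'startup', as in the config.
        self.replicate_after = set(replicate_after)
        self.version = None
        self.files = []

    def on_event(self, event, version, files):
        """Record the file list if this event type is configured.

        Returns True when the stored commit point was updated.
        """
        if event not in self.replicate_after:
            return False
        self.version = version
        self.files = list(files)
        return True


master = MasterState(["optimize"])
print(master.on_event("commit", 1, ["_0.cfs", "segments_1"]))    # not configured
print(master.on_event("optimize", 2, ["_1.cfs", "segments_2"]))  # configured
```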

How does the slave replicate?

The master is totally unaware of the slaves. The slave keeps polling the master (at the interval set by the 'pollInterval' parameter) to check the current index version of the master. If the slave finds that the master has a newer version of the index, it initiates a replication process. The steps are as follows,

How are configuration files replicated?

What if I add documents to the slave or if slave index gets corrupted?

If docs are added to the slave, then the slave is no longer in sync with the master. However, the slave does nothing to restore sync until the master has a newer index. When a commit happens on the master, the index version of the master becomes different from that of the slave. The slave fetches the list of files and finds that some of the files (same name) exist in the local index with a different size/timestamp. This means that the master and slave have incompatible indexes. The slave then copies all the files from the master (there may be scope to optimize this, but this is a rare case and may not be worth it) to a new index dir and asks the core to load the fresh index from the new directory.
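The decision described above can be sketched as follows. This is a simplified illustration, not Solr's code. The function name plan_fetch is hypothetical, files are compared by size only (the text also mentions timestamps), and the "incremental" branch, which fetches only files missing locally, is an assumption about the normal case that this section does not spell out.

```python
def plan_fetch(master_files, local_files):
    """Decide between a full copy and fetching only missing files.

    Both arguments map file name -> size in bytes. Returns a tuple
    (mode, names), where mode is 'full' or 'incremental'.
    """
    conflicts = [name for name, size in master_files.items()
                 if name in local_files and local_files[name] != size]
    if conflicts:
        # Same name but different size: the indexes are incompatible,
        # so every file is copied into a new index directory.
        return "full", sorted(master_files)
    # Assumed normal case: fetch only what the slave does not have yet.
    missing = sorted(name for name in master_files if name not in local_files)
    return "incremental", missing


master = {"_0.cfs": 1000, "_1.cfs": 400, "segments_3": 60}
print(plan_fetch(master, {"_0.cfs": 1000}))  # slave is merely behind
print(plan_fetch(master, {"_0.cfs": 1234}))  # slave index has diverged
```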

HTTP API

These commands can be invoked over HTTP to the ReplicationHandler
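As an illustration of invoking a command over HTTP, here is a minimal sketch that builds the request URL for the fetchindex command mentioned in the slave configuration comments, optionally passing masterUrl as a request parameter. The helper name replication_command_url and the host names are hypothetical, and the 'command' query parameter name is an assumption not stated in this section. The sketch only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def replication_command_url(handler_url, command, **params):
    """Build a URL that invokes a command on a replication handler.

    'handler_url' is the full URL of the /replication handler; extra
    keyword arguments are appended as request parameters.
    """
    query = {"command": command}
    query.update(params)
    return handler_url + "?" + urlencode(query)


slave = "http://localhost:8983/solr/replication"
print(replication_command_url(slave, "fetchindex"))
print(replication_command_url(
    slave, "fetchindex", masterUrl="http://master:8983/solr/replication"))
```

The resulting URL could then be opened in a browser or requested with any HTTP client.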

enable/disable master/slave in a node

If a server needs to be turned from a slave into a master, or if you wish to use the same solrconfig.xml for both master and slave, configure it as follows:

<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://localhost:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

When the master is started, pass in -Denable.master=true, and when the slave is started, pass in -Denable.slave=true. Alternatively, these values can be stored in a solrcore.properties file as follows

#solrcore.properties in master
enable.master=true
enable.slave=false

and in slave

#solrcore.properties in slave
enable.master=false
enable.slave=true
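To make the ${name:default} placeholders in the configuration above concrete, the following sketch resolves them against a set of properties. It mimics the behaviour described here (use the supplied value, otherwise fall back to the default after the colon) and is not Solr's own substitution code. The helper names load_properties and resolve are hypothetical.

```python
import re

# Matches ${name} or ${name:default}.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def load_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(text, props):
    """Replace each placeholder with its property value or its default."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError("no value or default for " + name)
    return _PLACEHOLDER.sub(replace, text)


master_props = load_properties(
    "#solrcore.properties in master\nenable.master=true\nenable.slave=false")
print(resolve("${enable.master:false}", master_props))  # value from the file
print(resolve("${enable.slave:false}", {}))             # falls back to default
```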

Admin Page for Replication

replication.png

Performance numbers

The Chart

transfer_time.png

Data

data.PNG

SolrReplication (last edited 2009-09-28 21:03:36 by YonikSeeley)