This document describes the ssh/rsync based replication available since Solr1.1. This mechanism only works on systems that support removing open hard links.

Warning: these scripts were superseded by the ReplicationHandler, a Java implementation of index replication that works over HTTP and was introduced in Solr1.4. They are no longer actively maintained.

All Solr collection distribution scripts and configuration files are bundled within a Solr release.

All scripts reside in the directory solr/bin of each instance of Solr. Configuration files reside in the directory solr/conf of each instance of Solr.

Snapshot directories are named snapshot.yyyymmddHHMMSS. In addition to installing a snapshot, snapinstaller also triggers Solr to open a new Searcher after it has put the latest snapshot into place.

An rsyncd daemon must first be enabled and then started on the master Solr server to handle collection distribution requests from the slaves. The file solr/logs/rsyncd-enabled determines whether the rsyncd daemon is enabled or disabled. Use the scripts rsyncd-enable and rsyncd-disable to enable and disable the rsyncd daemon, and the scripts rsyncd-start and rsyncd-stop to start and stop it. The rsyncd daemon is configured by the file solr/conf/rsyncd.conf, which is generated by rsyncd-start at startup time and removed by rsyncd-stop at shutdown time. The process ID of the rsyncd daemon is written into the file solr/logs/rsyncd.pid, and its output is written into the file solr/logs/rsyncd.log. You may also want to install an init script on your system that starts the rsyncd daemon at system boot time if the daemon has been enabled. Otherwise, you will have to start the rsyncd daemon manually after each system reboot.

On the slaves, all the files in the index directory are hard links to the latest snapshot. This way we can keep multiple snapshots on each slave without keeping multiple copies of index files that have not changed. In addition, under this design the index directory looks similar on the master and the slaves. On the master, all the files in the index directory are hard links to the latest snapshot until further updates come in.

The file solr/logs/snappuller-enabled determines if the snappuller is enabled or disabled. The scripts snappuller-enable and snappuller-disable should be used to enable and disable the snappuller.

The optional configuration file solr/conf/scripts.conf is sourced by all scripts to set the following environment variables:

| variable | description | required? | default |
|---|---|---|---|
| user | User ID under which the scripts will run. If set and the invoking user ID is different, the scripts will use sudo to execute themselves under the specified user ID. Scripts must run under an ID with write permission to the data directory in order to create and install snapshots. In addition, scripts also need write permission to solr/conf and solr/logs. | optional | |
| solr_hostname | hostname used by scripts to connect to Solr server | optional | localhost |
| solr_port | port number used by scripts to connect to Solr server | required | |
| rsyncd_port | rsyncd port number used for distribution | optional | 10000+<Solr port number> |
| data_dir | directory containing Lucene index used by local Solr server | optional | solr/data |
| webapp_name | webapp name used by scripts to connect to Solr server | optional | solr |
| master_host | Solr master server hostname | required for distribution | |
| master_data_dir | directory containing Lucene index on the master Solr server | required for distribution | |
| master_status_dir | directory containing distribution related status on the master Solr server | required for distribution | |

Required variables must be set either in solr/conf/scripts.conf or by command line arguments. The values set by command line arguments will override default values and values set in solr/conf/scripts.conf.
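As an illustration, a scripts.conf for a slave might look like the sketch below. The hostname, port, and paths are hypothetical values chosen for this example, not settings shipped with Solr. The file is written to a temporary directory and then sourced, as the text above describes.

```shell
# Sketch: write a sample scripts.conf to a temporary directory and source it.
# All values below are hypothetical examples.
conf_dir=$(mktemp -d)
cat > "$conf_dir/scripts.conf" <<'EOF'
user=solr
solr_hostname=localhost
solr_port=8983
data_dir=/opt/solr/data
webapp_name=solr
master_host=master.example.com
master_data_dir=/opt/solr/data
master_status_dir=/opt/solr/logs
EOF

# The file is plain shell variable assignments, so sourcing it sets the variables.
unset rsyncd_port
. "$conf_dir/scripts.conf"

# rsyncd_port is not set above, so fall back to the documented default.
rsyncd_port=${rsyncd_port:-$((10000 + solr_port))}
echo "rsyncd_port=$rsyncd_port master_host=$master_host"
```

Because the file is ordinary shell, a variable left out of it simply stays unset, and the documented default applies.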

snapshooter

snapshooter takes a snapshot of the current Lucene index by using hard links. Snapshot directories are named snapshot.yyyymmddHHMMSS, where yyyymmddHHMMSS is the timestamp of when the snapshot was taken. All relative paths specified with the -d option start at the Solr root directory solr.
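The idea of a hard-link snapshot can be sketched in a few lines of shell. This is an illustration of the technique only, not the code of snapshooter itself. The temporary directory and the file name segments_1 are stand-ins invented for the example.

```shell
# Sketch: take a hard-link "snapshot" of a stand-in index directory.
work=$(mktemp -d)
mkdir "$work/index"
echo "segment data" > "$work/index/segments_1"   # stand-in for a Lucene file

# Name the snapshot after the current time: snapshot.yyyymmddHHMMSS
snap="$work/snapshot.$(date +%Y%m%d%H%M%S)"
mkdir "$snap"

# Hard-link every index file into the snapshot; no file data is copied.
for f in "$work/index"/*; do
  ln "$f" "$snap/"
done

# -ef is true when both names refer to the same file on disk.
if [ "$work/index/segments_1" -ef "$snap/segments_1" ]; then
  echo "snapshot shares storage with the index"
fi
```

Since both names point at the same data on disk, a snapshot costs almost no extra space until the index files it refers to are replaced.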

rsyncd-enable

rsyncd-enable enables the starting of the rsyncd daemon by creating the file solr/logs/rsyncd-enabled. Please note that this script will not actually start the rsyncd daemon.

rsyncd-disable

rsyncd-disable disables the starting of the rsyncd daemon by removing the file solr/logs/rsyncd-enabled. Please note that this script will not actually stop the rsyncd daemon if it is already running.
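The enable and disable scripts rely on the presence or absence of a flag file. A minimal sketch of that pattern follows. The function names and the temporary directory are hypothetical, and the real scripts may do more, such as logging.

```shell
# Sketch: an "enabled" flag implemented as the presence of an empty file.
logs=$(mktemp -d)                 # stand-in for solr/logs
flag="$logs/rsyncd-enabled"

rsyncd_enable()     { touch "$flag"; }      # create the flag file
rsyncd_disable()    { rm -f "$flag"; }      # remove the flag file
rsyncd_is_enabled() { [ -f "$flag" ]; }     # succeed only if the flag exists

rsyncd_enable
if rsyncd_is_enabled; then echo "enabled"; fi
rsyncd_disable
if ! rsyncd_is_enabled; then echo "disabled"; fi
```

Neither function touches a running process. This matches the note above that enabling does not start the daemon and disabling does not stop it.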

rsyncd-start

Starts the rsyncd daemon on the master Solr server. If the environment variable rsyncd_port is not set in solr/conf/scripts.conf and the -p command line argument is not specified, the rsyncd daemon uses the port number of the Solr server incremented by 10000. For example, if the Solr server runs at port 7000, then its rsyncd daemon runs at port 17000. The start script is synchronous. After starting the rsyncd daemon, it will attempt to connect to it for up to 15 seconds. The start script will exit with error code 2 if it fails to connect to the rsyncd daemon.
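The port selection rule can be sketched as a small function. The function name and argument order are hypothetical. The precedence follows the text: the command line first, then scripts.conf, then the default.

```shell
# Sketch: choose the rsyncd port.
# $1 = Solr port, $2 = rsyncd_port from scripts.conf (may be empty),
# $3 = value of the -p command line argument (may be empty).
resolve_rsyncd_port() {
  solr_port=$1
  conf_port=$2
  cli_port=$3
  if [ -n "$cli_port" ]; then
    echo "$cli_port"                 # command line wins
  elif [ -n "$conf_port" ]; then
    echo "$conf_port"                # then scripts.conf
  else
    echo $((10000 + solr_port))      # default: Solr port + 10000
  fi
}

resolve_rsyncd_port 7000 "" ""        # prints 17000
resolve_rsyncd_port 7000 18000 ""     # prints 18000
resolve_rsyncd_port 7000 18000 19000  # prints 19000
```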

Before starting the rsyncd daemon, this script generates the configuration file solr/conf/rsyncd.conf which is in turn removed by rsyncd-stop at shutdown time. The rsyncd.conf specifies a module that exports a directory tree via the daemon, which limits rsync clients to the solr.home/data directory. A side effect is that rsync clients connect to a directory named solr, which is a virtual path. See the rsyncd.conf man page for more information on the format of this file.
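For orientation, the sketch below generates a minimal rsyncd.conf with a single module named solr. It is an assumption about what such a file could contain, not a copy of what rsyncd-start writes. The paths live in a temporary directory that stands in for the Solr root.

```shell
# Sketch: generate a minimal rsyncd.conf with one module named "solr".
# Paths are hypothetical; the real file is generated by rsyncd-start.
solr_root=$(mktemp -d)            # stand-in for the Solr root directory
mkdir -p "$solr_root/conf" "$solr_root/logs" "$solr_root/data"

cat > "$solr_root/conf/rsyncd.conf" <<EOF
use chroot = no
list = no
pid file = $solr_root/logs/rsyncd.pid
log file = $solr_root/logs/rsyncd.log

[solr]
    path = $solr_root/data
    comment = Solr
    read only = yes
EOF

cat "$solr_root/conf/rsyncd.conf"
```

With a module like this, a client addresses the data directory through the module name, for example rsync://host:port/solr/, rather than through the real filesystem path. That is the virtual path mentioned above.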

The process ID of the rsyncd daemon is written into the file solr/logs/rsyncd.pid. Output of the rsyncd daemon is written into the file solr/logs/rsyncd.log. All relative paths specified with the -d option start at the Solr root directory solr.

rsyncd-stop

Stops the rsyncd daemon on the master Solr server. The stop script is synchronous. After stopping the rsyncd daemon, it makes sure that the daemon has exited by trying to connect to it for up to 300 seconds. The stop script will exit with error code 2 if it fails to stop the rsyncd daemon. After the rsyncd daemon has been stopped, it removes the configuration file solr/conf/rsyncd.conf which is generated by rsyncd-start at startup time.
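The synchronous wait can be pictured as a polling loop with a timeout. The sketch below is a generic version of that pattern, not the script's own code. The function name is hypothetical, and `true` and `false` stand in for a real connection check.

```shell
# Sketch: poll until a check stops succeeding, or give up after a time limit.
# $1 = limit in seconds; remaining arguments = command used as the check.
# Returns 0 once the check fails (daemon gone), 2 if it still succeeds at the limit.
wait_until_gone() {
  limit=$1
  shift
  elapsed=0
  while "$@"; do
    if [ "$elapsed" -ge "$limit" ]; then
      return 2
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Stand-in checks: "false" behaves like a daemon that is already gone,
# "true" like one that never exits.
wait_until_gone 300 false && echo "daemon gone"
wait_until_gone 1 true || echo "gave up with status $?"
```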

snappuller-enable

snappuller-enable enables the snappuller by creating the file solr/logs/snappuller-enabled.

snappuller-disable

snappuller-disable disables the snappuller by removing the file solr/logs/snappuller-enabled.

snappuller

If snappuller has been disabled, it will log an appropriate message in its log file, and then exit without pulling any snapshot from the master Solr server.

If the name of the snapshot to be pulled is not specified with the "-n" option, snappuller will use ssh to determine the name of the most recent snapshot available on the master Solr server and pull it over if it does not already exist on the slave Solr server.
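Because snapshot names embed a fixed-width timestamp, the most recent one can be found by sorting the names. The sketch below shows the selection on a local temporary directory. The real script runs its lookup on the master over ssh, and the directory names here are invented for the example.

```shell
# Sketch: pick the most recent snapshot name from a directory listing.
data=$(mktemp -d)                       # stand-in for the master's data directory
mkdir "$data/index" \
      "$data/snapshot.20120101120000" \
      "$data/snapshot.20120924185412" \
      "$data/snapshot.20120301093000"

# Timestamps are fixed-width, so reverse lexical order is newest-first.
latest=$(ls "$data" | grep '^snapshot\.' | sort -r | head -n 1)
echo "$latest"    # prints snapshot.20120924185412

# Pull only if the slave does not already have it.
slave=$(mktemp -d)                      # stand-in for the slave's data directory
if [ -d "$slave/$latest" ]; then
  echo "already have $latest"
else
  echo "would pull $latest"
fi
```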

The status and statistics of the current or most recent rsync operation performed by snappuller are kept in the file solr/logs/snappuller.status. Whenever snappuller updates this file, it copies it back to the master Solr server with scp. See SolrCollectionDistributionStatusStats for more details.

All relative paths specified with the -D and -d options start at the Solr root directory solr.

snapinstaller

After a snapshot has been installed, snapinstaller writes its name into the file solr/logs/snapshot.current and copies this file back to the master Solr server with scp. See SolrCollectionDistributionStatusStats for more details. All relative paths specified with the -d option start at the Solr root directory solr.

snapcleaner

Please note that the argument to the command line option -D is not inclusive. In addition, a snapshot counts as one day old only once it is more than 24 hours old at the time the command is invoked. For example, -D 1 will cause snapcleaner to remove snapshots that are more than, but not including, 1 day old, i.e. 2 days and older. Therefore, snapshots more than 48 hours old will be removed. All relative paths specified with the -d option start at the Solr root directory solr.
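The -D rule is easy to misread, so here is a small worked sketch. It assumes that age is counted in whole 24-hour periods with fractions dropped, and that a snapshot is removed only when that count is strictly greater than the argument. The function name is hypothetical.

```shell
# Sketch: decide whether a snapshot of a given age is removed by "-D days".
# Assumption: age is counted in whole 24-hour periods (fractions dropped),
# and a snapshot is removed only when that count is strictly greater than days.
would_remove() {
  age_hours=$1
  days=$2
  [ $((age_hours / 24)) -gt "$days" ]
}

for h in 23 25 47 49 72; do
  if would_remove "$h" 1; then
    echo "$h hours old: removed"
  else
    echo "$h hours old: kept"
  fi
done
```

With -D 1, the snapshots aged 23, 25, and 47 hours are kept, and those aged 49 and 72 hours are removed.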

SolrCollectionDistributionScripts (last edited 2012-09-24 18:54:12 by HossMan)