Deployment of Solr Cores with Zookeeper

Architecture

Zookeeper is used as a distributed filesystem that records which Solr servers should be running which cores. Ephemeral nodes are intentionally not used, except for the core status updates. Zookeeper serves here as a transactional, redundant filesystem, not as a system for maintaining connections to various servers; that is best left to dedicated monitoring services. GSON is the JSON library used to serialize and deserialize objects to and from JSON.

Code

The Jira issue for this project is located at: https://issues.apache.org/jira/browse/SOLR-1724

The Solr Cloud branch is used in conjunction with this issue. The Solr Cloud SVN location is: http://svn.apache.org/repos/asf/lucene/solr/branches/cloud

Supported File Types

Zipped cores are the standard because they are easier to manage, download, and transfer across the network.

Zookeeper Filesystem

Cores

Each "cores" file is written to Zookeeper and is of the form cores_N, where N is the version number. This is purposefully similar to the segment infos files written by Lucene. The cores file is stored in the JSON format.
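
As a minimal sketch of the cores_N naming scheme, the following Python parses the version out of a filename and picks the newest cores file from a directory listing. The helper names are hypothetical and not part of the SOLR-1724 patch; the only thing taken from the text is the cores_N form. Versions are compared as numbers so that cores_10 sorts after cores_9.

```python
import re
from typing import Iterable, Optional

# Matches names of the form cores_N, where N is a non-negative integer.
_CORES_NAME = re.compile(r"cores_(\d+)")

def parse_cores_version(name: str) -> Optional[int]:
    """Return N for a name of the form cores_N, or None for any other name."""
    match = _CORES_NAME.fullmatch(name)
    return int(match.group(1)) if match else None

def latest_cores_file(names: Iterable[str]) -> Optional[str]:
    """Pick the cores file with the highest numeric version, if any exist."""
    versions = [v for v in map(parse_cores_version, names) if v is not None]
    return f"cores_{max(versions)}" if versions else None
```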

Contents of the cores file:

Name       Type    Description
name       string  The name of the core, which is the same as the ZK filename
version    long    Version of the core, which should match the version in the filename
coresinfo  array   An array of core infos

Each core info contains:

Name         Type    Description
host         string  The host
name         string  The core name
instanceDir  string  Optional instanceDir
configFile   string  Optional solrconfig.xml path
schemaFile   string  Optional schema.xml path
dataDir      string  Optional data directory path
url          string  Optional URL from which to download and install the core. It will be unzipped to the instance dir
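
To make the layout concrete, here is a hedged Python sketch that builds a cores document with the field names listed above, round-trips it through JSON, and checks that the name and version agree with the Zookeeper filename. The exact JSON that GSON emits in the patch may differ. The host and core names are borrowed from the sample directory layout, the URL and instance directory are made-up placeholders, and treating host and name as required is an assumption based on the other fields being marked optional.

```python
import json

def build_cores_file(version, core_infos):
    """Assemble a cores document using the field names from the tables above."""
    return {"name": f"cores_{version}", "version": version, "coresinfo": core_infos}

def validate_cores_file(zk_filename, doc):
    """Return a list of problems; an empty list means the document is consistent."""
    problems = []
    if doc.get("name") != zk_filename:
        problems.append("name does not match the ZK filename")
    if f"cores_{doc.get('version')}" != zk_filename:
        problems.append("version does not match the version in the filename")
    for i, info in enumerate(doc.get("coresinfo", [])):
        # Assumption: host and name are required, every other field is optional.
        for field in ("host", "name"):
            if not info.get(field):
                problems.append(f"coresinfo[{i}] is missing {field}")
    return problems

# Placeholder values for illustration only.
sample = build_cores_file(2, [
    {"host": "servera", "name": "julyblogs",
     "url": "http://example.com/cores/julyblogs.zip"},
    {"host": "serverc", "name": "septblogs",
     "instanceDir": "/hypothetical/solr/septblogs"},
])

# Serialize to JSON text and parse it back, as a cores file would be stored and read.
roundtrip = json.loads(json.dumps(sample, indent=2))
problems = validate_cores_file("cores_2", roundtrip)
```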

Host

Each Solr server (aka host or CoreContainer) must report to Zookeeper which cores it has installed. Each host file is named in the form host_version. It is the responsibility of each Solr host/server to match the state of the cores_N file: each Solr server must install the cores defined for it and, after a successful install, write its host file out to Zookeeper.

Contents of a host file:

Name      Type    Description
name      string  Same as the filename
version   long    Version encoded into the filename
hostinfo  array   Array of host infos

Each host info contains:

Name          Type    Description
name          string  Core name
instanceDir   string  The instance directory
configFile    string  The solrconfig.xml path
schemaFile    string  The schema.xml path
dataDir       string  The data directory path
size          long    The size in bytes of the core
lastModified  long    The last modified time of the core
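
The requirement that each host match the state of the cores_N file can be illustrated with a small Python check. This is a sketch under assumptions, not the patch's logic: it treats a host file as matching when its name is host_version, its version equals the cores file version, and the set of core names it reports equals the set assigned to that host. The function names are hypothetical.

```python
def cores_for_host(cores_doc, host):
    """Names of the cores that the cores file assigns to the given host."""
    return sorted(info["name"] for info in cores_doc["coresinfo"]
                  if info["host"] == host)

def host_file_matches(cores_doc, host_doc, host):
    """True if host_doc reports exactly the cores assigned to host at this version."""
    expected_name = f"{host}_{cores_doc['version']}"
    if host_doc["name"] != expected_name:
        return False
    if host_doc["version"] != cores_doc["version"]:
        return False
    installed = sorted(info["name"] for info in host_doc["hostinfo"])
    return installed == cores_for_host(cores_doc, host)
```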

Sample Directory Layout

There are two cores files in this sample directory layout. Under /production/hosts, all of the necessary host files have been written, indicating that the operations defined by cores_1 and cores_2 have completed on each Solr server.

The /production/hosts/status directory contains ephemeral node files for each core operation currently in progress. The status directory may be read by the cores Zookeeper manager user interface to inform the administrative user of the state of the operations.

/production/cores_1
/production/cores_2
/production/hosts/servera_1
/production/hosts/serverb_1
/production/hosts/serverc_1
/production/hosts/serverd_1
/production/hosts/servera_2
/production/hosts/serverb_2
/production/hosts/serverc_2
/production/hosts/serverd_2
/production/hosts/status/serverc_2_septblogs
/production/hosts/status/servera_2_julyblogs
/production/hosts/status/serverd_2_augustblogs
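
A listing like the one above can be summarized per version, which is roughly what a management user interface would need. The Python below is only a sketch: it assumes host files are named host_version, status nodes are named host_version_core, and host names contain no underscores. None of these parsing rules are confirmed by the patch.

```python
from collections import defaultdict

def summarize(paths, root="/production"):
    """Group host files by version and list the core operations still in progress."""
    hosts_prefix = f"{root}/hosts/"
    status_prefix = f"{root}/hosts/status/"
    completed = defaultdict(set)  # version -> hosts that wrote a host file
    in_progress = []              # (host, version, core) per status node
    for path in paths:
        if path.startswith(status_prefix):
            # Assumed form: host_version_core, with no underscore in the host name.
            host, version, core = path[len(status_prefix):].split("_", 2)
            in_progress.append((host, int(version), core))
        elif path.startswith(hosts_prefix):
            # Assumed form: host_version.
            host, version = path[len(hosts_prefix):].rsplit("_", 1)
            completed[int(version)].add(host)
    return dict(completed), in_progress
```

Paths outside the hosts directory, such as the cores files themselves, are ignored by this function.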

CoreController

The CoreController is the core deploy client that lives inside a CoreContainer. It listens for events on a given path and finds its hostname in the latest cores file, chosen by version. Each cores file is like Lucene's segment infos file, which describes the set of segments that make up the current index: the cores file defines the set of cores that should be installed on a given Solr host.
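
The selection step can be sketched in Python as follows. This is an illustration only, not the CoreController implementation: it assumes the cores documents have already been read from Zookeeper and parsed, it leaves out the event listening entirely, and the function names are hypothetical.

```python
def latest_cores_doc(cores_docs):
    """Return the parsed cores document with the highest version, or None."""
    return max(cores_docs, key=lambda doc: doc["version"], default=None)

def cores_to_install(cores_docs, hostname, installed):
    """Core infos assigned to hostname in the latest cores file, minus installed ones."""
    latest = latest_cores_doc(cores_docs)
    if latest is None:
        return []
    return [info for info in latest["coresinfo"]
            if info["host"] == hostname and info["name"] not in installed]
```

For example, if the latest cores file assigns julyblogs and augustblogs to servera and julyblogs is already installed, only the augustblogs core info is returned.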

A default root path must be defined; for the unit tests, /production is used.

Libraries

TODO

DeploymentofSolrCoreswithZookeeper (last edited 2010-03-01 05:19:50 by JasonRutherglen)