Abstract

When a ZooKeeper server loses contact with over half of the other servers in an ensemble ('loses a quorum'), it stops responding to client requests. For some applications, it would be beneficial if a server still responded to read requests when the quorum is lost, but caused an error condition when a write request was attempted.

This project will implement a 'read-only' mode for ZooKeeper servers that allows read requests to be served as long as the client can contact a server

Detailed description

Client-side

Protocol changes

To make server distinguish these two types of clients, "am-i-readonly-client" field is added to a packet client sends to a server during connection handshake. If a server in r-o mode receives connection request from not r-o client, it rejects the client. This is the only protocol change, so traffic overhead is sizeof boolean per session.
This will involve changes in both Java and C clients.

Server-side

Server-side activity in r-o mode is handled by a subclass of ZooKeeperServer, ReadOnlyZooKeeperServer. Its chain of request processors is similar to leader's chain, but at the beginning it has ReadOnlyRequestProcessor which passes read operations but throws exceptions to state-changing operations.

When server, namely QuorumPeer, loses a quorum it destroys whichever ZooKeeperServer was running and goes to LOOKING state (this is a current logic which doesn't need to be altered), and creates ReadOnlyZooKeeperServer (new logic). Then, when some client connects to this peer, if running server sees this is a read-only client, it starts handling its requests; if it's a usual client, server drops the connection, as it does currently.

When the peer reconnects to the majority, it acts similarly to other state changes: shutdowns zk server (which will cause notification of all read-only clients about state change), and switches to another state.

Recovering from partitioning

Backwards compatibility

Usage

Application developers will decide which client to use -- r-o enabled or not -- i.e. they'll choose whether to have guaranteed consistent view of system, or agree to sometimes have outdated view in return for read access.

Enabling read-only mode in current applications will involve changing session handling logic (they will have to detect new "mode changed" notifications), but since interface remains unchanged transition should be very smooth.

It's worth repeating, despite server side will be (heavily?) changed, behavior for usual clients remains the same, so there will be no backwards incompatibility issues introduced.

Random notes

Benefits of this design: transparent usage of a new client, backwards compatible.
Drawbacks: more coupling between server and client (but this seems unavoidable in any case).

Jira ticket: ZOOKEEPER-704

My GSoC proposal