Topic management in Hedwig

ZooKeeper data structure

Metadata about topics, subscribers, and hubs will be stored in ZooKeeper. For a given Hedwig region, we will store the following structure:

The rectangles in this diagram are znodes; rectangles with dashed borders are ephemeral znodes.

Topic creation

Topic creation and assignment to a hub is a lazy process. Topics are created on demand (e.g., when there is a subscriber) and assigned to a hub on demand (e.g., when there is a new subscriber or a message is published). When a hub responsible for a topic fails, we reassign the topic on demand, e.g., when the connected subscribers reconnect.
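The lazy create-and-assign behavior can be sketched in Python against a small in-memory stand-in for ZooKeeper. This is an illustration under assumptions, not Hedwig's code: the `FakeZooKeeper` class, the `/hedwig/topics/<topic>/hub` path layout, and the helper `find_or_claim_owner` are all hypothetical. The sketch relies on two ZooKeeper properties: creating a znode fails if it already exists, so at most one hub can win ownership, and an ephemeral znode disappears when the session that created it ends.

```python
class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper; hypothetical, not the real client API."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session, or None if persistent)

    def create(self, path, data, ephemeral_owner=None):
        """Create a znode; return False if it already exists (like NodeExists)."""
        if path in self.nodes:
            return False
        self.nodes[path] = (data, ephemeral_owner)
        return True

    def get(self, path):
        """Return a znode's data, or None if it does not exist."""
        entry = self.nodes.get(path)
        return None if entry is None else entry[0]

    def expire_session(self, session):
        """Remove every ephemeral znode created by `session` (e.g. a failed hub)."""
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}


def topic_path(topic):
    return "/hedwig/topics/" + topic  # hypothetical layout


def find_or_claim_owner(zk, topic, candidate_hub):
    """Return the hub that owns `topic`, creating and assigning it on demand."""
    # Lazy creation: the persistent topic znode appears on first use.
    zk.create(topic_path(topic), "")
    # Lazy assignment: try to claim the topic with an ephemeral znode.
    # If another hub already holds it, create() fails and its claim stands.
    owner_path = topic_path(topic) + "/hub"
    zk.create(owner_path, candidate_hub, ephemeral_owner=candidate_hub)
    return zk.get(owner_path)


zk = FakeZooKeeper()
first = find_or_claim_owner(zk, "T", "H1")   # "H1": topic created and claimed
second = find_or_claim_owner(zk, "T", "H2")  # "H1": H2 loses the race
zk.expire_session("H1")                      # H1 fails; its ephemeral claim vanishes
third = find_or_claim_owner(zk, "T", "H2")   # "H2": reassigned on demand
```

Because the ownership znode is ephemeral, a failed hub's claim disappears with its session, while the persistent topic znode survives; the next request simply runs the same claim again.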

Subscription process

The subscribe call takes three parameters:

When a client C subscribes to a topic T, it will contact one of the hubs (say, H1) and send a subscribe(C,T,false) message. If the client receives a redirect message from a hub, it will retry its subscription at the hub named in the message (e.g., H2) by sending a subscribe(C,T,true) message to H2. The flow is similar to the false case, except that H2 knows it should try to become the owner of the topic itself, instead of choosing a random hub.
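The redirect handshake can be sketched as a toy Python model. Everything here is hypothetical: the `Cluster` class, a shared dictionary standing in for the ownership znodes in ZooKeeper, and the functions `handle_subscribe` (hub side) and `subscribe` (client side). The sketch assumes that a hub receiving subscribe(C,T,false) for an unowned topic picks a random hub and, unless it picked itself, redirects the client there, while a hub receiving subscribe(C,T,true) claims the topic itself.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Cluster:
    hubs: list
    owners: dict = field(default_factory=dict)       # topic -> hub (stands in for ZooKeeper)
    subscribers: dict = field(default_factory=dict)  # (hub, topic) -> set of clients


def handle_subscribe(cluster, hub, client, topic, is_redirect, rng):
    """Hub side of subscribe(C, T, is_redirect); returns ("ok", hub) or ("redirect", hub)."""
    owner = cluster.owners.get(topic)
    if owner is None:
        # Unowned topic: a redirected request makes this hub try to become the owner;
        # a first-contact request picks a random hub instead.
        owner = hub if is_redirect else rng.choice(cluster.hubs)
        if owner == hub:
            cluster.owners[topic] = hub
    if owner != hub:
        return ("redirect", owner)
    cluster.subscribers.setdefault((hub, topic), set()).add(client)
    return ("ok", hub)


def subscribe(cluster, client, topic, first_hub, rng, max_attempts=3):
    """Client side: send subscribe(C, T, false), then follow redirects with true."""
    hub, is_redirect = first_hub, False
    for _ in range(max_attempts):
        status, target = handle_subscribe(cluster, hub, client, topic, is_redirect, rng)
        if status == "ok":
            return target
        hub, is_redirect = target, True
    raise RuntimeError("gave up after too many redirects")


rng = random.Random(42)
cluster = Cluster(hubs=["H1", "H2", "H3"])
owner = subscribe(cluster, "C1", "T", "H1", rng)
same = subscribe(cluster, "C2", "T", "H3", rng)  # redirected to the same owner
```

Capping the number of attempts keeps a client from looping forever if ownership keeps moving; in this single-threaded toy, one redirect is always enough.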

Upon receiving a subscribe message for topic T, the hub H1 will follow these steps:

Notes:

Re-subscription process

A client may become disconnected from a hub for many reasons, including:

When this happens, the client can simply resubscribe to the topic. Using the same subscription process as above, Hedwig will direct the client to the appropriate hub: either the old hub, if it still owns the topic, or a new hub that has taken the topic over.
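The two outcomes of a resubscription can be seen in a toy Python model in which the client needs no special reconnect logic: it just calls subscribe again. The `Region` class and its `fail_hub` and `subscribe` methods are hypothetical; the ownership dictionary models ephemeral znodes, so a hub's entries vanish when that hub fails, and an unowned topic is assigned to a random live hub.

```python
import random


class Region:
    """Toy region: live hubs plus topic ownership modeled as ephemeral entries."""

    def __init__(self, hubs, seed=0):
        self.live_hubs = set(hubs)
        self.owner = {}  # topic -> hub; an entry vanishes when its hub fails
        self.rng = random.Random(seed)

    def fail_hub(self, hub):
        """A hub crashes: its session ends and its ephemeral ownership entries go away."""
        self.live_hubs.discard(hub)
        self.owner = {t: h for t, h in self.owner.items() if h != hub}

    def subscribe(self, client, topic):
        """Ordinary subscription: return the current owner, else assign a random live hub.

        `client` is unused in this toy; it only mirrors the subscribe(C, T, ...) shape.
        """
        if topic not in self.owner:
            self.owner[topic] = self.rng.choice(sorted(self.live_hubs))
        return self.owner[topic]


region = Region(["H1", "H2", "H3"])
first = region.subscribe("C", "T")
# Transient disconnect (e.g. a network blip): the old hub still owns T.
again = region.subscribe("C", "T")
# Hub failure: the client runs the very same subscribe call and lands on a new hub.
region.fail_hub(first)
after_failure = region.subscribe("C", "T")
```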

Publish process

When a client C publishes to a topic T, C contacts a hub (say, H1) and tries to publish. The publish call takes four parameters:

When the client C sends a publish call to H1 to publish a message on topic T, H1 follows these steps:

Notes:
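The publish steps are not spelled out in this text, so the following Python sketch rests on an assumption: that a hub treats a publish the way it treats a subscribe (serve it if it owns the topic, assign an unowned topic lazily, otherwise redirect) and appends accepted messages to an in-memory list standing in for BookKeeper. The `Hub` class, the `is_redirect` argument, the `client_publish` helper, and the returned sequence number are all hypothetical and are not meant to describe the four publish parameters.

```python
import random


class Hub:
    """Toy hub; `owners` is a dict shared by all hubs, standing in for ZooKeeper."""

    def __init__(self, name, all_hubs, owners, rng):
        self.name, self.all_hubs, self.owners, self.rng = name, all_hubs, owners, rng
        self.log = {}  # topic -> list of (client, message); stands in for BookKeeper

    def publish(self, client, topic, message, is_redirect=False):
        owner = self.owners.get(topic)
        if owner is None:
            # Lazy assignment, mirroring subscribe: a redirected request claims
            # the topic here; a first-contact request picks a random hub.
            owner = self.name if is_redirect else self.rng.choice(self.all_hubs)
            if owner == self.name:
                self.owners[topic] = self.name
        if owner != self.name:
            return ("redirect", owner)
        entries = self.log.setdefault(topic, [])
        entries.append((client, message))
        return ("ok", len(entries))  # hypothetical per-topic sequence number


def client_publish(hubs, first_hub, client, topic, message):
    """Client side: follow one redirect (always enough in this single-threaded toy)."""
    status, result = hubs[first_hub].publish(client, topic, message)
    if status == "redirect":
        status, result = hubs[result].publish(client, topic, message, is_redirect=True)
    return status, result


rng = random.Random(1)
names = ["H1", "H2", "H3"]
owners = {}
hubs = {n: Hub(n, names, owners, rng) for n in names}
first = client_publish(hubs, "H1", "C", "T", "m1")   # ("ok", 1)
second = client_publish(hubs, "H2", "C", "T", "m2")  # ("ok", 2)
```

Only the owning hub ever appends to a topic's log, which is what lets a hub later shed a topic simply by refusing publishes and releasing ownership.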

Topic redistribution

Occasionally, we should shuffle topics between hubs to balance load. For example, when a new hub joins, we want topics to be assigned to it. Similarly, if some topics are hotter than others, the hub that owns them should be able to shed load. Since all of the persistent state about a topic is in ZooKeeper or BookKeeper, shuffling a single topic is straightforward: the owning hub just stops accepting publishes and deletes its ephemeral znode for the topic. The next time a client tries to subscribe or publish to the topic, it will be assigned to a random hub.
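The two-step hand-off can be sketched as a toy Python model. The `Region` class, its `lookup_or_assign` and `abandon` methods, and the `accepting` bookkeeping are hypothetical; a dictionary models the ephemeral ownership znodes, and the persistent topic state is assumed to live elsewhere (ZooKeeper or BookKeeper) and is therefore never touched when a topic is shed.

```python
import random


class Region:
    """Toy region where a hub can shed a topic by releasing its ephemeral ownership."""

    def __init__(self, hubs, seed=0):
        self.hubs = list(hubs)
        self.owner_znodes = {}  # topic -> hub; models ephemeral ownership znodes
        self.accepting = {h: set() for h in self.hubs}  # hub -> topics it serves
        self.rng = random.Random(seed)

    def lookup_or_assign(self, topic):
        """What the next subscribe/publish does: reuse the owner or pick a random hub."""
        if topic not in self.owner_znodes:
            hub = self.rng.choice(self.hubs)
            self.owner_znodes[topic] = hub
            self.accepting[hub].add(topic)
        return self.owner_znodes[topic]

    def abandon(self, hub, topic):
        """Shed one topic: stop accepting publishes, then delete the ephemeral znode."""
        if self.owner_znodes.get(topic) != hub:
            return False  # only the current owner can release a topic
        self.accepting[hub].discard(topic)  # step 1: refuse further publishes
        del self.owner_znodes[topic]        # step 2: release ownership
        return True


region = Region(["H1", "H2", "H3"])
old = region.lookup_or_assign("T")
released = region.abandon(old, "T")   # True: T is released
new = region.lookup_or_assign("T")    # a random hub, possibly `old` again
```

Because the choice is random, the topic can land back on the hub that just abandoned it; this sketch does not try to prevent that.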

When should a hub abandon a topic? It should do so at least under the following conditions:

Continually shuffling topics in this way should help keep load evenly balanced across hubs without human intervention. Moreover, abandoning topics lazily lets the shuffling happen incrementally, spread out over time.

Open questions