This page is for design notes and information relating to operations effecting token/range ownership. See also:

Requirements

  1. Offsetting ownership ratios for heterogeneous nodes

  2. Correcting imbalances created by random token selection

  3. Randomizing ranges after a migration

Heterogeneous Nodes

When running a cluster of heterogeneous nodes, (i.e. differing amounts of storage, memory, cores, etc), it may be desirable to place a greater or less portion of the keyspace on one or more nodes.

Imbalance

By default, nodes tokens are randomly generated with the expectation that an even distribution of the namespace will result. However, variations of as much as 7% have been reported on small clusters when using the num_tokens default of 256.

These randomly generated tokens are MD5 sums, so entropy isn't the problem here, at least not in the sense that using a better RNG would create a more even distribution of ranges. Increasing the token count (either by increasing num_tokens, or the number of nodes) will improve this, (the more tokens, the more the distribution will even out).

This anecdotal worst-case is probably Good Enough, especially when considering that key distribution is subject to the same properties, or that many data sets are skewed on their own, (i.e. optimal ownership is not necessary optimal anyway).

That said, our history is one where random token selection produced completely unacceptable results, and manual intervention was required. The typical (expected) result of manual token selection is near perfect balance of ownership, and it will likely be some time before people are comfortable seeing otherwise.

Shuffling

When migrating a legacy cluster with one-token-per-node to virtual nodes, the existing range is carved up into num_tokens new ranges. These new ranges are still contiguous however, and a means of randomizing their placement is needed.

Implementation (Draft)

Considerations

Nodes / Cluster

The most straightforward method of effecting ownership is a token move (i.e. relocating a range from one node to another). Exposing this with JMX would allow implementing all of the required operations client-side.

User Interface

TODO

stats

VirtualNodes/Balance (last edited 2013-11-15 18:49:20 by GehrigKunz)