Anti-entropy Overview

AntiEntropyService generates MerkleTrees for column families during major compactions. These trees are then exchanged with remote nodes via a TreeRequest/TreeResponse conversation, and when ranges in the trees disagree, the 'org.apache.cassandra.streaming' package is used to repair those ranges.

Tree comparison and repair triggering occur in the single-threaded AE_SERVICE_STAGE.
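To make the tree exchange concrete, here is a minimal, hypothetical sketch of a Merkle tree over a token range and of the kind of comparison a Differencer performs. It is not Cassandra's MerkleTree API. The class MerkleSketch, the fixed depth, the splitmix64-style mixing function, and XOR-combining row hashes within a leaf are all illustrative assumptions. At each level the range is split at its midpoint, echoing the 'midpoint()' partitioning the current MerkleTree uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified Merkle tree over the token range [0, rangeEnd);
// not Cassandra's MerkleTree. Compared trees must share rangeEnd and depth.
final class MerkleSketch {
    final long rangeEnd;  // exclusive upper bound of the token range
    final int depth;      // the tree has 2^depth leaves
    final long[] nodes;   // heap layout: node i has children 2i+1 and 2i+2

    MerkleSketch(long rangeEnd, int depth) {
        this.rangeEnd = rangeEnd;
        this.depth = depth;
        this.nodes = new long[(1 << (depth + 1)) - 1];
    }

    // splitmix64 finalizer: a bijective 64-bit mixing function.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Fold a row into the leaf whose range holds its token, descending by
    // splitting each range at its midpoint. XOR makes the leaf hash
    // independent of the order in which rows are added.
    void add(long token, long rowHash) {
        long left = 0, right = rangeEnd;
        int node = 0;
        for (int d = 0; d < depth; d++) {
            long mid = left + (right - left) / 2;
            if (token < mid) { right = mid; node = 2 * node + 1; }
            else { left = mid; node = 2 * node + 2; }
        }
        nodes[node] ^= mix(token ^ mix(rowHash));
    }

    // Compute inner hashes bottom-up; call once after all rows are added.
    void build() {
        for (int i = (1 << depth) - 2; i >= 0; i--) {
            nodes[i] = mix(nodes[2 * i + 1] * 31 + nodes[2 * i + 2]);
        }
    }

    // Leaf ranges [left, right) whose hashes disagree with the other tree.
    List<long[]> difference(MerkleSketch other) {
        List<long[]> out = new ArrayList<>();
        compare(other, 0, 0, rangeEnd, 0, out);
        return out;
    }

    private void compare(MerkleSketch other, int node, long left, long right,
                         int d, List<long[]> out) {
        if (nodes[node] == other.nodes[node]) return;  // subtree agrees: skip it
        if (d == depth) { out.add(new long[] {left, right}); return; }
        long mid = left + (right - left) / 2;
        compare(other, 2 * node + 1, left, mid, d + 1, out);
        compare(other, 2 * node + 2, mid, right, d + 1, out);
    }
}
```

Only leaf ranges whose hashes disagree are reported, and any subtree whose root hash matches is skipped entirely, which keeps the comparison cheap when replicas mostly agree.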

The steps taken to enact a repair are as follows:

1. A major compaction is triggered either via nodeprobe or automatically.

2. The compaction process validates the column family.

3. When a node receives a TreeResponse, it passes the tree to rendezvous(), which checks for a tree to rendezvous with (that is, to compare against).

4. Differencers are executed in AE_SERVICE_STAGE to compare the two trees.
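Steps 3 and 4 can be sketched roughly as follows. This is a hypothetical illustration, not Cassandra's code: RendezvousSketch, its method names, pairing a local tree with remote trees per column family, and the use of a single-thread ExecutorService as a stand-in for AE_SERVICE_STAGE are all assumptions. Because every task runs on the same thread, the maps of waiting trees need no locking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

// Hypothetical sketch of tree rendezvous; not Cassandra's implementation.
// T is the tree type; the differencer runs once per (local, remote) pair.
final class RendezvousSketch<T> {
    // Stand-in for the single-threaded AE_SERVICE_STAGE: every task runs on
    // this one thread, so the maps below are never accessed concurrently.
    private final ExecutorService stage = Executors.newSingleThreadExecutor();
    private final Map<String, T> localTrees = new HashMap<>();                // cf -> local tree
    private final Map<String, Map<String, T>> remoteTrees = new HashMap<>(); // cf -> endpoint -> tree
    private final BiConsumer<T, T> differencer;

    RendezvousSketch(BiConsumer<T, T> differencer) {
        this.differencer = differencer;
    }

    // Called when this node finishes validating a column family.
    Future<?> localTree(String cf, T tree) {
        return stage.submit(() -> {
            localTrees.put(cf, tree);
            // Compare against every remote tree already waiting for this cf.
            Map<String, T> waiting = remoteTrees.remove(cf);
            if (waiting != null) {
                waiting.values().forEach(remote -> differencer.accept(tree, remote));
            }
        });
    }

    // Called when a TreeResponse arrives from a neighbor.
    Future<?> remoteTree(String cf, String endpoint, T tree) {
        return stage.submit(() -> {
            T local = localTrees.get(cf);
            if (local != null) {
                differencer.accept(local, tree);  // partner found: compare now
            } else {
                // No local tree yet: keep the remote one until it arrives.
                remoteTrees.computeIfAbsent(cf, k -> new HashMap<>()).put(endpoint, tree);
            }
        });
    }

    void shutdown() {
        stage.shutdown();
    }
}
```

A real implementation would also need to expire trees that never find a partner; that bookkeeping is omitted here.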


Repairs currently require 2 major compactions: one to validate a column family, and then another to send the disagreeing ranges.

One possible fix for this problem would be to use something like a Linear Bloom Filter to store a summary of every SSTable on disk, where each sub-bloom is partitioned using 'midpoint()' like the current MerkleTree. Then, to validate a column family, you could OR together the bloom filters of its SSTables and send the result to neighbors without performing a compaction.
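The proposal might look roughly like this. The sketch is hypothetical: LinearBloomSketch, the fixed split depth, the bit and hash counts, double hashing, and hashing each row's token together with a row hash are assumptions, not an existing Cassandra API. Each midpoint-derived sub-range gets its own small Bloom filter; per-SSTable summaries are OR-ed together, and sub-ranges whose filters differ between neighbors are flagged for repair.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical "linear" Bloom filter summary; not a Cassandra API.
// The token range [0, rangeEnd) is split by repeated midpoints into
// 2^depth sub-ranges, each with its own small Bloom filter.
final class LinearBloomSketch {
    final long rangeEnd;   // exclusive; assumed divisible by 2^depth
    final int depth;
    final int bitsPerSub;  // bits in each sub-filter
    final int hashes;      // bit positions set per row
    final BitSet[] subs;

    LinearBloomSketch(long rangeEnd, int depth, int bitsPerSub, int hashes) {
        this.rangeEnd = rangeEnd;
        this.depth = depth;
        this.bitsPerSub = bitsPerSub;
        this.hashes = hashes;
        this.subs = new BitSet[1 << depth];
        for (int i = 0; i < subs.length; i++) subs[i] = new BitSet(bitsPerSub);
    }

    // splitmix64 finalizer, used to scramble tokens and row hashes.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Find the sub-range holding token by descending through midpoints.
    int subIndex(long token) {
        long left = 0, right = rangeEnd;
        int idx = 0;
        for (int d = 0; d < depth; d++) {
            long mid = left + (right - left) / 2;
            if (token < mid) { right = mid; idx = 2 * idx; }
            else { left = mid; idx = 2 * idx + 1; }
        }
        return idx;
    }

    // Record one row. Hashing the token with a row hash (an assumption)
    // lets a changed row, not only a missing one, set different bits.
    void add(long token, long rowHash) {
        long h = mix(token ^ mix(rowHash));
        int h1 = (int) h, h2 = (int) (h >>> 32);
        BitSet bits = subs[subIndex(token)];
        for (int i = 0; i < hashes; i++) {
            // Double hashing: the i-th position is (h1 + i*h2) mod bitsPerSub.
            bits.set(Math.floorMod(h1 + i * h2, bitsPerSub));
        }
    }

    // OR another SSTable's summary into this one (same layout assumed).
    void orWith(LinearBloomSketch other) {
        for (int i = 0; i < subs.length; i++) subs[i].or(other.subs[i]);
    }

    // Sub-ranges [left, right) whose filters differ between two summaries.
    List<long[]> differingRanges(LinearBloomSketch other) {
        List<long[]> out = new ArrayList<>();
        long width = rangeEnd >> depth;
        for (int i = 0; i < subs.length; i++) {
            if (!subs[i].equals(other.subs[i])) out.add(new long[] {i * width, (i + 1) * width});
        }
        return out;
    }
}
```

Identical sets of rows always produce identical bits, so a differing sub-filter reliably means the rows differ. The converse does not hold: because of Bloom filter false positives, different rows can occasionally produce the same bits and go undetected.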


ArchitectureAntiEntropy (last edited 2013-11-12 23:53:30 by 50)