Reading into Apache Cassandra AntiEntropyService

AntiEntropyService encapsulates "validating" (hashing) individual column families,
exchanging MerkleTrees with remote nodes via a TreeRequest/Response conversation,
and then triggering repairs for disagreeing ranges.
Every Tree conversation has an 'initiator', where valid trees are sent after generation
and where the local and remote tree will rendezvous in rendezvous(cf, endpoint, tree).
Once the trees rendezvous, a Differencer is executed and the service can trigger repairs
for disagreeing ranges.
Tree comparison and repair triggering occur in the single threaded Stage.ANTIENTROPY.
The steps taken to enact a repair are as follows:
1. A major compaction is triggered via nodeprobe:
   * Nodeprobe sends TreeRequest messages to all neighbors of the target node: when a node
     receives a TreeRequest, it will perform a readonly compaction to immediately validate
     the column family.
2. The compaction process validates the column family by:
   * Calling Validator.prepare(), which samples the column family to determine key distribution,
   * Calling Validator.add() in order for every row in the column family,
   * Calling Validator.complete() to indicate that all rows have been added.
     * Calling complete() indicates that a valid MerkleTree has been created for the column family.
     * The valid tree is returned to the requesting node via a TreeResponse.
3. When a node receives a TreeResponse, it passes the tree to rendezvous(), which checks for trees to
   rendezvous with / compare to:
   * If the tree is local, it is cached, and compared to any trees that were received from neighbors.
   * If the tree is remote, it is immediately compared to a local tree if one is cached. Otherwise,
     the remote tree is stored until a local tree can be generated.
   * A Differencer object is enqueued for each comparison.
4. Differencers are executed in Stage.ANTIENTROPY, to compare the two trees, and perform repair via the streaming api.
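
To make step 3 more concrete, here is a minimal sketch of the rendezvous logic. It is not the Cassandra implementation: the class RendezvousSketch, its fields, and the idea of reducing a whole tree to a single String hash are assumptions made for illustration, and a null endpoint is used to mark the local tree.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of step 3: each "tree" is reduced to one hash string.
class RendezvousSketch {
    private String localTree; // cached local tree, null until it has been generated
    private final Map<String, String> waitingRemotes = new LinkedHashMap<>(); // endpoint -> remote tree
    final List<String> comparisons = new ArrayList<>(); // stands in for enqueued Differencers

    // Called for every tree that arrives; endpoint is null for the local tree (an assumption).
    void rendezvous(String endpoint, String tree) {
        if (endpoint == null) {
            // Local tree: cache it and compare it to every remote tree received so far.
            localTree = tree;
            for (Map.Entry<String, String> remote : waitingRemotes.entrySet())
                enqueue(remote.getKey(), remote.getValue());
            waitingRemotes.clear();
        } else if (localTree != null) {
            // Remote tree with a cached local tree: compare immediately.
            enqueue(endpoint, tree);
        } else {
            // Remote tree arrived first: store it until the local tree exists.
            waitingRemotes.put(endpoint, tree);
        }
    }

    private void enqueue(String endpoint, String remoteTree) {
        String verdict = localTree.equals(remoteTree) ? "consistent" : "needs repair";
        comparisons.add(endpoint + ": " + verdict);
    }
}
```

Calling rendezvous("10.0.0.2", "abc") before the local tree exists enqueues nothing; the comparison only happens once rendezvous(null, "abc") delivers the local tree.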

That is definitely a lot of operations involved in AntiEntropyService. Let's first identify all the classes:

Validator

ValidatorSerializer

TreeRequestVerbHandler

TreeResponseVerbHandler

CFPair

TreeRequest

TreeRequestSerializer

RepairSession

RepairJob

Differencer

TreeResponse

RepairFuture

RequestCoordinator

Order

SequentialOrder

ParallelOrder

There are 16 classes in total, and we can see that they match what the javadoc above describes.

AntiEntropyService is a singleton service with four statuses: started, session_success, session_failed and finished. An important method is submitRepairSession:
    /**
     * Requests repairs for the given table and column families, and blocks until all repairs have been completed.
     *
     * @return Future for asynchronous call or null if there is no need to repair
     */
    public RepairFuture submitRepairSession(Range<Token> range, String tablename, boolean isSequential, boolean isLocal, String... cfnames)
    {
        RepairSession session = new RepairSession(range, tablename, isSequential, isLocal, cfnames);
        if (session.endpoints.isEmpty())
            return null;
        RepairFuture futureTask = session.getFuture();
        executor.execute(futureTask);
        return futureTask;
    }

Here a new repair session is created and run by the executor; if the session has no endpoints, there is nothing to repair and the method returns null. Another static method, getNeighbors(), returns the neighbors that share the given range.
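
The "return null or hand a future to an executor" shape of submitRepairSession can be tried out in isolation. The sketch below is only an illustration of that pattern using plain java.util.concurrent classes; SubmitSketch, its submit() method, and the String result are made-up names, not part of Cassandra.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

// Hypothetical stand-in that mirrors the control flow of submitRepairSession.
class SubmitSketch {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Returns null when there is nothing to repair, otherwise a future that is already scheduled.
    FutureTask<String> submit(List<String> endpoints) {
        if (endpoints.isEmpty())
            return null;
        FutureTask<String> futureTask =
            new FutureTask<String>(() -> "repaired with " + endpoints);
        executor.execute(futureTask);
        return futureTask;
    }

    // Stop the worker thread once no more sessions will be submitted.
    void shutdown() {
        executor.shutdown();
    }
}
```

A caller therefore has to check for null before calling get() on the returned future, and get() is where the blocking actually happens.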

Next, the static Validator class, which implements the Runnable interface, has the following javadoc description:

A Strategy to handle building and validating a merkle tree for a column family.
Lifecycle:
1. prepare() - Initialize tree with samples.
2. add() - 0 or more times, to add hashes to the tree.
3. complete() - Enqueues any operations that were blocked waiting for a valid tree.
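
The prepare() / add() / complete() lifecycle can be sketched as follows. This is a toy model, not the real Validator: ValidatorSketch is a hypothetical class, the bucket count is passed in rather than derived from sampling, and int hash codes stand in for the hashes of a MerkleTree.

```java
import java.util.Arrays;

// Hypothetical stand-in for Validator: leaves are int hashes over fixed key buckets.
class ValidatorSketch {
    private int[] leaves;  // one hash per key range
    private boolean done;  // true once complete() has been called

    // prepare(): size the tree. The real class samples keys; here the bucket count is given.
    void prepare(int buckets) {
        leaves = new int[buckets];
        done = false;
    }

    // add(): fold one row's hash into the leaf that owns its key. Order matters.
    void add(String key, String value) {
        if (leaves == null || done)
            throw new IllegalStateException("call prepare() first, and add() only before complete()");
        int bucket = Math.floorMod(key.hashCode(), leaves.length);
        leaves[bucket] = 31 * leaves[bucket] + (key + "=" + value).hashCode();
    }

    // complete(): no more rows will be added; return a root hash over all leaves.
    int complete() {
        done = true;
        return Arrays.hashCode(leaves);
    }
}
```

Two replicas that add the same rows in the same order end up with the same root hash, while a single differing value changes it, which is what lets the trees be compared cheaply later.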

Then we read that there is an inner static class, ValidatorSerializer, with two important methods, serialize and deserialize, which mainly serialize (or deserialize) the tree request and the merkle tree. The next two classes, TreeRequestVerbHandler and TreeResponseVerbHandler, are pretty trivial: they handle requests and responses from remote nodes.

Then comes another simple class, CFPair, followed by another important class, TreeRequest, with the method createMessage(). Like ValidatorSerializer, TreeRequestSerializer has two methods, serialize and deserialize.

The next class, RepairSession, is pretty important. Its javadoc reads:
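
As a rough illustration of such a serialize/deserialize pair, here is a small round trip over java.io data streams. The classes RequestSketch and RequestSerializerSketch, and the two fields they carry, are assumptions for this example; the real TreeRequest and its wire format are not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical request with two fields; chosen only to have something to serialize.
class RequestSketch {
    final String sessionId;
    final String columnFamily;

    RequestSketch(String sessionId, String columnFamily) {
        this.sessionId = sessionId;
        this.columnFamily = columnFamily;
    }
}

class RequestSerializerSketch {
    // Write the fields in a fixed order.
    static void serialize(RequestSketch request, DataOutputStream out) throws IOException {
        out.writeUTF(request.sessionId);
        out.writeUTF(request.columnFamily);
    }

    // Read the fields back in exactly the same order.
    static RequestSketch deserialize(DataInputStream in) throws IOException {
        String sessionId = in.readUTF();
        String columnFamily = in.readUTF();
        return new RequestSketch(sessionId, columnFamily);
    }

    // Convenience for trying it out: serialize to bytes, then deserialize again.
    static RequestSketch roundTrip(RequestSketch request) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            serialize(request, new DataOutputStream(bytes));
            return deserialize(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The only real rule in a pair like this is that deserialize reads the fields in the order serialize wrote them.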

Triggers repairs with all neighbors for the given table, cfs and range.
Typical lifecycle is: start() then join(). Executed in client threads.

The next two classes, nested in RepairSession, are RepairJob and Differencer. They contain the details of calculating the difference between trees and then performing a repair. The rest is pretty trivial.
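
To get a feel for what a Differencer has to compute, here is a minimal sketch of comparing two hash trees and collecting the leaf ranges that disagree. It is an assumption-laden toy: MerkleDiffSketch is a hypothetical class, both trees must be built from the same power-of-two number of leaves, and int arithmetic stands in for a real hash function, so collisions are possible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree diff; not Cassandra's MerkleTree or Differencer.
class MerkleDiffSketch {
    // Build a complete binary tree in heap layout: node i has children 2i+1 and 2i+2.
    // leaves.length must be a power of two; leaves occupy the last leaves.length slots.
    static int[] build(int[] leaves) {
        int n = leaves.length;
        int[] tree = new int[2 * n - 1];
        System.arraycopy(leaves, 0, tree, n - 1, n);
        for (int i = n - 2; i >= 0; i--)
            tree[i] = 31 * tree[2 * i + 1] + tree[2 * i + 2];
        return tree;
    }

    // Collect indexes of differing leaves, descending only into subtrees whose hashes differ.
    static List<Integer> difference(int[] local, int[] remote) {
        if (local.length != remote.length)
            throw new IllegalArgumentException("trees must cover the same ranges");
        List<Integer> differing = new ArrayList<>();
        walk(local, remote, 0, differing);
        return differing;
    }

    private static void walk(int[] local, int[] remote, int node, List<Integer> differing) {
        if (local[node] == remote[node])
            return; // identical subtree: nothing to repair below this node
        int firstLeaf = local.length / 2; // index of the first leaf slot
        if (node >= firstLeaf) {
            differing.add(node - firstLeaf);
            return;
        }
        walk(local, remote, 2 * node + 1, differing);
        walk(local, remote, 2 * node + 2, differing);
    }
}
```

With leaves {1, 2, 3, 4} on one side and {1, 2, 9, 4} on the other, only the right half of the tree is visited and the result is the single leaf index 2; the disagreeing ranges found this way are what the repair step would then stream.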