Thanks Andrey. Also found this ticket regarding this issue:
https://issues.apache.org/jira/browse/CASSANDRA-2698
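
Andrey's point below about Merkle tree granularity can be sketched roughly like this. This is a hypothetical toy model (the names, leaf count, and hashing scheme are my own, not Cassandra's actual repair code): each replica hashes its rows into a fixed number of Merkle leaves, and repair streams every row that falls into a mismatching leaf, not just the row that actually differs.

```python
import hashlib

NUM_LEAVES = 4          # deliberately coarse tree for illustration
KEYSPACE = 1024         # token range 0..1023

def leaf_of(token):
    """Map a token to one of NUM_LEAVES equal-width leaf ranges."""
    return token * NUM_LEAVES // KEYSPACE

def leaf_hashes(rows):
    """Hash all rows of each leaf range together (tokens sorted so the
    hash is deterministic)."""
    buckets = [[] for _ in range(NUM_LEAVES)]
    for token, value in rows.items():
        buckets[leaf_of(token)].append((token, value))
    hashes = []
    for bucket in buckets:
        h = hashlib.sha256()
        for token, value in sorted(bucket):
            h.update(f"{token}:{value};".encode())
        hashes.append(h.hexdigest())
    return hashes

# Two replicas that differ in exactly ONE row (token 96).
replica_a = {t: "v" for t in range(0, KEYSPACE, 8)}
replica_b = dict(replica_a)
replica_b[96] = "v2"    # a single divergent row

mismatched = [i for i, (a, b) in
              enumerate(zip(leaf_hashes(replica_a), leaf_hashes(replica_b)))
              if a != b]

# Every row in a mismatched leaf gets streamed, though only one differs.
streamed = [t for t in replica_a if leaf_of(t) in mismatched]
print(f"rows differing: 1, rows streamed: {len(streamed)}")
```

With 128 rows and only 4 leaves, the one divergent row drags the other 31 rows of its leaf along, which matches the over-streaming we're seeing.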
On Tue, Oct 16, 2012 at 8:00 PM, Andrey Ilinykh <ailinykh@gmail.com> wrote:
>> In my experience running repair on some counter data, the amount of
>> streamed data is much larger than could be explained by lost messages
>> or by replicas snapshotting at slightly different times.
>>
>> I know the data will eventually be in sync on every repair, but I'm
>> more interested in whether Cassandra transfers excess data and how to
>> minimize this.
>>
>> Does anybody have insights on this?
>>
> The problem is the granularity of the Merkle tree. Cassandra streams
> the regions whose hash values differ, and such a region can be much
> bigger than a single row.
>
> Andrey