Description

Cluster set up:
c1 : c2 :: 10 : 10

sbucket: c1 -> c2
default: c2 -> c1

>> Replication set up with continuous front end load
>> Front end load for default = ~10K ops per sec
>> Front end load for sbucket = ~4-5K ops per sec
>> Average replication seen on c1 (for default): ~12-14K ops per sec
>> Average replication seen on c2 (for sbucket): ~15-18K ops per sec

At a particular snapshot, on C1:
{With same amount of load (mixed), on bucket "sbucket"}

Activity

Junyi Xie (Inactive)
added a comment - 15/Oct/12 4:51 PM 1. The stat "docs to replicate" is not the number of items accumulated in the changes queue. It is the stat reported by CouchDB (109M docs in your screenshot).
The stat "docs in the queue" is the real number of docs in the queue, which is 1.33M. I think I explained this in my earlier email to xdcr-eng.
Today, to reduce the memory overhead of XDCR, the queue is limited to a maximum of 4k items and 400KB per active vbucket. By some simple math, you can compute that
the number of items in the queue per cluster is about 4K * 32 * 10 = 1.3M items (4K items per queue x 32 concurrent replications x 10 nodes), which is consistent with your observation. This is expected behavior.
In your case, since you have large clusters of 10 nodes, you may try raising the number of concurrent replications from 32 to 100 to see whether it makes any difference.
2. You have expiring items in your workload, but with a very short expiration time of 60 seconds. Given the size of your workload, they will most likely expire before being replicated to the remote cluster, which is why you see high XDCR ops but low sets/deletes. I strongly suggest you exclude expiring items from your test, since they just create a lot of confusion.
Without further information, this is pretty much all I can say. Please let me know how to log onto the EC2 node and how to access the logs.
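The queue-size arithmetic in point 1 can be sketched as a small calculation. This is only an illustration: the function and parameter names are hypothetical, "4k" is assumed to mean 4000 items, and the result is an upper bound that assumes every active replication queue is full.

```python
# Back-of-the-envelope estimate of the XDCR changes-queue size per cluster.
# Names are illustrative; the numbers are the ones quoted in the comment above.

def estimated_queue_items(max_items_per_queue: int,
                          concurrent_replications: int,
                          nodes: int) -> int:
    """Upper bound on queued items, assuming every replication queue is full."""
    return max_items_per_queue * concurrent_replications * nodes


if __name__ == "__main__":
    # Current setting: 4k items x 32 concurrent replications x 10 nodes.
    print(estimated_queue_items(4000, 32, 10))   # 1280000, i.e. about 1.3M
    # Suggested experiment: 100 concurrent replications instead of 32.
    print(estimated_queue_items(4000, 100, 10))  # 4000000
```

Under the same assumptions, raising concurrent replications from 32 to 100 would raise the ceiling on "docs in the queue" to roughly 4M items.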

Abhinav Dangeti
added a comment - 15/Oct/12 5:11 PM I understand. To log into the EC2 nodes of the west coast cluster, you can use QAKey.pem:
ssh -i QAKey.pem ubuntu@ec2-50-18-140-172.us-west-1.compute.amazonaws.com
For the southeast cluster, use SingaporeQAKey.pem:
ssh -i SingaporeQAKey.pem ubuntu@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com
I will email you both keys.

Junyi Xie (Inactive)
added a comment - 15/Oct/12 5:22 PM On the southeast cluster, where there is an ongoing XDCR with high gets but low sets/deletes, I saw a large number of source docs lose conflict resolution, consistent with the stats on the UI.
Please remove the expiring items from your workload; otherwise we cannot determine whether this is a bug.
[xdcr:debug,2012-10-15T22:16:07.067, ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com :<0.24978.2227>:capi_replication:get_missing_revs:43]after conflict resolution for 500 docs, num of remote winners is 0 and number of local winners is 500.
[xdcr:debug,2012-10-15T22:16:07.071, ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com :<0.25039.2227>:capi_replication:get_missing_revs:43]after conflict resolution for 500 docs, num of remote winners is 0 and number of local winners is 500.
[xdcr:debug,2012-10-15T22:16:07.074, ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com :<0.24910.2227>:capi_replication:get_missing_revs:43]after conflict resolution for 500 docs, num of remote winners is 0 and number of local winners is 500.
[xdcr:debug,2012-10-15T22:16:07.078, ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com :<0.24916.2227>:capi_replication:get_missing_revs:43]after conflict resolution for 500 docs, num of remote winners is 0 and number of local winners is 500.
[xdcr:debug,2012-10-15T22:16:07.081, ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com :<0.25327.2227>:capi_replication:get_missing_revs:43]after conflict resolution for 500 docs, num of remote winners is 0 and number of local winners is 500.
[xdcr:debug,2012-10-15T22:16:07.225, ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com :<0.24889.2227>:capi_replication:get_missing_revs:43]after conflict resolution for 500 docs, num of remote winners is 0 and number of local winners is 500.
[xdcr:debug,2012-10-15T22:16:07.230, ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com :<0.25085.2227>:capi_replication:get_missing_revs:43]after conflict resolution for 500 docs, num of remote winners is 0 and number of local winners is 500.
[xdcr:debug,2012-10-15T22:16:14.582, ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com :<0.25015.2227>:capi_replication:get_missing_revs:43]after conflict resolution for 500 docs, num of remote winners is 0 and number of local winners is 500.
[xdcr:debug,2012-10-15T22:16:22.882, ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com :<0.25084.2227>:capi_replication:get_missing_revs:43]after conflict resolution for 500 docs, num of remote winners is 0 and number of local winners is 500.
[xdcr:debug,2012-10-15T22:16:29.451, ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com :<0.24913.2227>:capi_replication:get_missing_revs:43]after conflict resolution for 500 docs, num of remote winners is 0 and number of local winners is 500.
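
To total the conflict-resolution outcomes across many such debug lines, a small parser along these lines may help. It is a sketch only: the regular expression is written against the message text shown above, and the function name is hypothetical rather than part of any Couchbase tool.

```python
import re

# Matches the capi_replication debug message shown in the log excerpt above.
WINNERS_RE = re.compile(
    r"after conflict resolution for (\d+) docs, "
    r"num of remote winners is (\d+) and number of local winners is (\d+)"
)


def tally_winners(lines):
    """Return (docs, remote_winners, local_winners) summed over matching lines."""
    docs = remote = local = 0
    for line in lines:
        match = WINNERS_RE.search(line)
        if match is None:
            continue  # not a conflict-resolution message
        d, r, l = (int(g) for g in match.groups())
        docs += d
        remote += r
        local += l
    return docs, remote, local


if __name__ == "__main__":
    sample = [
        "[xdcr:debug,2012-10-15T22:16:07.067, ...:capi_replication:get_missing_revs:43]"
        "after conflict resolution for 500 docs, num of remote winners is 0 "
        "and number of local winners is 500.",
        "an unrelated log line",
    ]
    print(tally_winners(sample))  # (500, 0, 500)
```

The same function can be fed an open log file, since iterating over a file object yields its lines.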