incubator-couchdb-user mailing list archives

Hi folks,
I have been digging further into replication performance on
Couch 0.10.1 and have noticed a few problems when replicating
attachments. The main problem centers around continuous
replication of large attachments.
Below is a summary of two test scenarios. The tests were broken
up by attachment size, 2.6 MB and 14 KB, which represent
possible working sets in my system. Think a PDF or JAR file
versus a small text document.
I know some bug reports have been filed around replication
performance in 0.10.1, but I am offering this report in case it
helps further delineate problem cases.
For now, I am looking for guidance on how to plan around these
problems. It seems the upcoming 0.11 release would address some
of these problems, but will it address all of them? Is there an
ETA on the 0.11 release?
Matt
1. Large attachment replication test
Source database contains 100 documents, each with one 2.6 MB
attachment. Total database size is 263.3 MB. Document IDs
follow the form "1testx" where x is a number from 1 to 100.
remote := replication across two couchdb servers on separate
hosts on the LAN
local := local replication on one couchdb server
a. Failed: remote, continuous pull.
Set up a continuous "pull" replication. Source and target
are on separate CouchDB servers on the LAN. Only one document
named "1test26" was replicated. In the couch log file, GET
requests for replicated documents were seen. After a few more
minutes, the database grew to 13 MB and contained two
documents. The target CouchDB server then crashed without
replicating any more documents.
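For reference, the continuous pull in (a) was started with a
POST to _replicate on the target server. This is only a sketch;
the hostnames (couch-a, couch-b) and database names are
stand-ins, not the actual test values:

```shell
# Hypothetical hostnames: couch-a is the source server and
# couch-b the target. A pull replication is POSTed to the target.
BODY='{"source":"http://couch-a:5984/src_test1","target":"dest_test1","continuous":true}'

# Sanity-check the JSON body before sending it:
echo "$BODY" | python3 -c 'import json,sys; json.load(sys.stdin)' && echo "body ok"

# Run against the target server (couch-b) to start the pull:
# curl -X POST http://couch-b:5984/_replicate \
#      -H 'Content-Type: application/json' \
#      -d "$BODY"
```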
b. Failed: remote, continuous push.
Same as (a).
c. Failed: local, continuous, fully-qualified source and
target names.
Same as (a) but the target and source were on same Couch
server. Same result as (a).
See the attached erl crash dump,
"large_local_pull_erl_crash.dump".
In this case, the curl command specified fully-qualified
source and target URLs.
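A sketch of such a command, assuming the default port 5984 and
the database names from (d) — both assumptions, not the actual
test values:

```shell
# Hypothetical reconstruction: one local CouchDB, but source and
# target are both given as fully-qualified URLs.
BODY='{"source":"http://localhost:5984/src_test1","target":"http://localhost:5984/dest_test1","continuous":true}'

# Sanity-check the JSON body before sending it:
echo "$BODY" | python3 -c 'import json,sys; json.load(sys.stdin)' && echo "body ok"

# POST against a running CouchDB to start the replication:
# curl -X POST http://localhost:5984/_replicate \
#      -H 'Content-Type: application/json' -d "$BODY"
```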
d. Success: local, NOT continuous, only source and target names.
Source and target were on the same couch server. Only the
source and target database names were specified; a
fully-qualified URL was not used.
curl -d '{"source": "src_test1", "target": "dest_test1", . . .}'
"continuous" was set to "false".
Unlike (c), this replication was a success. All documents
were replicated.
e. Failed: local, continuous, only source and target names.
Replication was configured the same as (d) but "continuous"
was set to "true".
The result was the same as (a): initial replication of one
document was followed, a few minutes later, by a crash.
2. Small attachment replication test
Source database contains 100 documents, each with one 14 KB
attachment. Total database size is 2.3 MB.
a. Success: remote, continuous pull.
b. Success: remote, continuous push.
c. Success: local, continuous, fully-qualified source and
target names.
d. Success: local, NOT continuous, only source and target names.
e. Success: local, continuous, only source and target names.
In other words, small attachments replicated without problem
with or without continuous replication. Given the relatively
small size of the database in this test, would it be worthwhile
to gin up a larger working set (~10K documents) in order to
increase the database size?
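If that seems useful, a larger working set could be generated
with a loop like the one below. The database name "small_test",
the file paths, and the host are made up for illustration:

```shell
# Hypothetical sketch for generating a larger working set: N small
# documents, each carrying one 14 KB attachment.
N=10000

# Make a 14 KB attachment file (14 * 1024 = 14336 bytes).
dd if=/dev/urandom of=/tmp/att.bin bs=1024 count=14 2>/dev/null
wc -c < /tmp/att.bin    # should report 14336

# Against a live CouchDB, PUT the attachment once per document;
# PUTting an attachment to a missing doc ID creates the document.
# for i in $(seq 1 "$N"); do
#   curl -s -X PUT "http://localhost:5984/small_test/doc$i/att.bin" \
#        -H 'Content-Type: application/octet-stream' \
#        --data-binary @/tmp/att.bin > /dev/null
# done
```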