couchdb-dev mailing list archives

On Tue, Apr 13, 2010 at 7:28 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
> On Apr 13, 2010, at 12:39 PM, J Chris Anderson wrote:
>
>>
>> On Apr 13, 2010, at 9:31 AM, till wrote:
>>
>>> Hey devs,
>>>
>>> I'm trying to compact a production database here (in the hope of
>>> recovering some space), and made the following observations:
>>>
>>> * the set is 212+ million docs
>>> * currently 0.8 TB in size
>>> * the instance (XL) has 2 cores, one is idle, the other maybe utilized at 10%
>>> * memory - 2 of 15 GB taken, no spikes
>>> * io - well it's EBS :(
>>>
>>> When I started _compact, read operations slowed down (I'll give you 20
>>> Mississippis for something that loads instantly otherwise).
>>> Everything "eventually" worked, but it slowed down tremendously.
>>>
>>> I restarted the CouchDB process and everything is back to "snap".
>>>
>>> Does anyone have any insight on why that is the case?
>>
>> I'm guessing this is an EBS / EC2 issue. You are probably saturating the IO pipeline.
>> It's too bad there's not an easy way to 'nice' the compaction IO.
>>
>> If you got unlucky and are on a particularly bad EBS / EC2 instance, you might do
>> best to start up a new Couch in the same availability zone and replicate across to it. This
>> will accomplish more-or-less the same effect as compaction.
>>
>>>
>>> Till
>>
>
> I'm surprised it's _that_ bad. The compactor only submits one I/O to EBS at a time,
> so I wouldn't expect other reads to be starved too much. On the other hand, I'll bet compacting
> a DB that large takes at least a month, especially if you used random IDs.
>
> On the other hand, when you compact you're messing with the page cache something fierce.
> At 212M docs you need every one of those 16GB of RAM to keep the btree nodes cached. The
> compactor a) reads nodes that your client app may not have been touching and b) writes to
> a new file and the kernel starts to cache that too. So it's a fairly brutal process from
> the perspective of the page cache.
I was looking at my fancy htop when it started to slow down and
neither RAM nor CPUs were fully utilized. I mean, not even 50%. That's
what surprises me.
>
> Does anyone have a sense of how deep a btree with 212M entries will be? That is, how
> many pread calls are required to pull up a doc?
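A back-of-the-envelope way to bound the depth: if every node holds roughly the same number of entries, the depth is the smallest d with fanout^d >= 212M. The sketch below is only an estimate. The fanout values are guesses (CouchDB's real node sizes are not stated in this thread), and `btree_depth` is a made-up helper name, not anything from CouchDB.

```python
def btree_depth(n_entries: int, fanout: int) -> int:
    """Smallest depth d such that fanout**d >= n_entries.

    Assumes a perfectly balanced tree with a uniform fanout, which a
    real on-disk btree only approximates.
    """
    if n_entries < 1 or fanout < 2:
        raise ValueError("need n_entries >= 1 and fanout >= 2")
    depth = 1
    capacity = fanout  # entries reachable with a tree of this depth
    while capacity < n_entries:
        capacity *= fanout
        depth += 1
    return depth


if __name__ == "__main__":
    # Hypothetical fanouts; the true value depends on key and value sizes.
    for fanout in (10, 50, 100):
        print(f"fanout {fanout}: depth {btree_depth(212_000_000, fanout)}")
```

Under this model each level costs at least one read when nothing is cached, so the depth is a rough lower bound on pread calls per doc lookup with a cold page cache.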
>
> Till, do you have iostat numbers from the compaction run?
root@box:~# iostat
Linux 2.6.21.7-2.fc8xen (couchdb01.east1.aws.easybib.com)  04/13/2010  _x86_64_  (4 CPU)

avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            0.91    0.00    0.08    8.43    1.38   89.21

Device:      tps  Blk_read/s  Blk_wrtn/s    Blk_read     Blk_wrtn
sdb         0.00        0.00        0.00        4578         1008
sdc         0.00        0.00        0.00         776            0
sdd         0.00        0.00        0.00         776            0
sde         0.00        0.00        0.00         776            0
sda1        0.16        1.09        4.12     3552818     13408432
sdg        13.63      133.43      106.81   433818674    347266448
sdh        13.54       94.30      212.11   306595821    689630885
sdi        13.38       94.23      212.93   306366410    692284040
sdk         1.91       46.01       73.82   149575695    239999486
md0        27.04      188.53      425.04   612960367   1381916061
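
For reading the numbers above: iostat run without an interval reports averages since boot, so these rates do not isolate the compaction run. As a rough conversion, assuming each iostat "block" is a 512-byte sector (the usual case on 2.6 kernels), the md0 rates work out as in the sketch below. `blocks_per_sec_to_kib` is just an illustrative helper name.

```python
SECTOR_BYTES = 512  # assumption: one iostat "block" is a 512-byte sector


def blocks_per_sec_to_kib(blocks_per_sec: float) -> float:
    """Convert an iostat Blk_read/s or Blk_wrtn/s figure to KiB/s."""
    return blocks_per_sec * SECTOR_BYTES / 1024


if __name__ == "__main__":
    # md0 figures copied from the iostat output above.
    print(f"md0 read:  {blocks_per_sec_to_kib(188.53):.1f} KiB/s")
    print(f"md0 write: {blocks_per_sec_to_kib(425.04):.1f} KiB/s")
```

Running iostat with an interval while _compact is active would give per-interval rates instead of since-boot averages.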
>
> Best, Adam
>
>
>