couchdb-user mailing list archives

On Wed, Jan 7, 2009 at 3:47 PM, Josh Bryan <jbryan@cashnetusa.com> wrote:
> Thanks for all the replies, I'll upgrade couch and erlang to the latest and
> retest. Yes, this is a one-time import, but 70 million records at 50-60
> writes a second doesn't mean a day, it means 2 weeks or more. I don't
> mind throwing extra hardware at the problem, but I just want to make sure
> I'm throwing extra hardware in the right place and using existing hardware
> as best as I can. If writes to all DBs are serialized in a single thread,
> then if I partition the data into two DBs and fire up two copies of couch, I
> should be able to make use of another processor on the same machine?
I believe each DB gets its own updater process, so yes, this
should lead to a speedup.
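
As a rough check on the numbers in this thread, the estimate can be worked
out directly. This is only a sketch: the function name import_days is made up
for illustration, and the databases parameter assumes that each database's
updater sustains the same write rate independently, which is the optimistic
case and has not been measured here.

```python
SECONDS_PER_DAY = 86_400


def import_days(total_docs, writes_per_second, databases=1):
    """Estimate the import time in days.

    Assumes (optimistically) that every database sustains
    writes_per_second on its own, so throughput scales linearly
    with the number of databases.
    """
    total_rate = writes_per_second * databases
    return total_docs / total_rate / SECONDS_PER_DAY


if __name__ == "__main__":
    # Figures quoted in the thread: 70 million records, 50-60 writes/s.
    for rate in (50, 60):
        for dbs in (1, 2):
            days = import_days(70_000_000, rate, databases=dbs)
            print(f"{rate} writes/s, {dbs} database(s): {days:.1f} days")
```

With a single database this comes out at roughly 13.5 to 16 days, which
matches the "2 weeks or more" figure. Two databases would halve that only if
the linear-scaling assumption holds.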
> I'll
> test this tomorrow along with the newer versions.
>
> Thanks,
> Josh
>
> Paul Davis wrote:
>>
>> Erlang 5.5.5 is borked. 5.6.x should be ok.
>>
>> Also, yes, writes to the database are serialized in a single thread.
>> For reference, when storing data, are you using the _bulk_docs
>> interface?
>>
>> Also, in trunk the fsync calls are turned off by default now so you
>> should notice more speedup there.
>>
>> Also, if these are archived records, wouldn't this be a single time
>> cost? Faster is always better, but if it takes a day, is that a big
>> deal?
>>
>> HTH
>> Paul
>>
>> On Wed, Jan 7, 2009 at 2:55 PM, Josh Bryan <jbryan@cashnetusa.com> wrote:
>>
>>>
>>> Chris Anderson wrote:
>>>
>>>>
>>>> On Wed, Jan 7, 2009 at 4:37 PM, Josh Bryan <jbryan@cashnetusa.com>
>>>> wrote:
>>>>
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am looking into CouchDB as a solution to store a bunch (approx 70
>>>>> million) archived documents. While planning for the import process, I
>>>>> did some benchmarking to figure out how long the import will take. I
>>>>> get about 50-70 inserts per second on average. However, when I looked
>>>>> for the bottleneck, I couldn't figure it out. I am connected to the
>>>>> database via a fast lan and can verify that the network is not
>>>>> saturated. I can also verify that disk IO is not saturated. The only
>>>>> clue is that of the 4 cpus on the server, it seems that only one is
>>>>> getting fully loaded. Also, of the 5 erlang processes I can see
>>>>> running, only one of them seems to be getting most of the cpu time. I
>>>>> know that erlang is built with smp enabled, so if it is cpu bound, why
>>>>> can't it make use of the other 3 processors?
>>>>>
>>>>> I thought that perhaps there was some internal write lock issue per
>>>>> database that allowed only one thread to write to a db at a time, so I
>>>>> tried running the benchmarks while hitting multiple databases, but still
>>>>> got the same write rate across the databases. Is there some globally
>>>>> shared resource in couchdb that limits all writes to a single thread?
>>>>>
>>>>> Thanks,
>>>>> Josh
>>>>>
>>>>>
>>>>>
>>>>
>>>> Before we can help you diagnose the performance you're seeing, could
>>>> you tell us the version of CouchDB and the version of Erlang that you
>>>> are using? It wouldn't hurt to describe the hardware in more detail
>>>> either.
>>>>
>>>>
>>>>
>>>
>>> I am seeing similar results on two systems.
>>>
>>> System 1:
>>> Quad core Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
>>> 2 GB ram
>>> Linux 2.6.18-4 -- Debian Lenny
>>> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:4]
>>> [async-threads:0] [kernel-poll:false]
>>> couchdb - Apache CouchDB 0.8.0-incubating
>>>
>>> System 2:
>>> Intel(R) Pentium(R) D CPU 3.00GHz
>>> 3 GB ram
>>> Erlang (BEAM) emulator version 5.5.5 [source] [async-threads:0]
>>> [kernel-poll:false]
>>> couchdb - Apache CouchDB 0.9.0a724455-incubating
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>
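
For reference, a minimal sketch of batching documents through _bulk_docs and
spreading them across two databases, as discussed above. The target URLs, the
batch size, and the helper names are hypothetical; only the _bulk_docs
endpoint and its {"docs": [...]} request body come from CouchDB itself.
Hashing on _id is just one possible way to split the data, and nothing is
sent over the network until a request is passed to urllib.request.urlopen.

```python
import json
import urllib.request
import zlib


def partition(docs, targets):
    """Assign each document to one target database URL.

    Uses a stable CRC32 hash of the document's _id, so the same
    document always lands in the same database.
    """
    buckets = {target: [] for target in targets}
    for doc in docs:
        index = zlib.crc32(doc["_id"].encode("utf-8")) % len(targets)
        buckets[targets[index]].append(doc)
    return buckets


def batches(docs, size):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def bulk_request(db_url, batch):
    """Build (but do not send) a POST request to <db_url>/_bulk_docs."""
    body = json.dumps({"docs": batch}).encode("utf-8")
    return urllib.request.Request(
        f"{db_url}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical targets: two couch instances on different ports.
    targets = ["http://localhost:5984/archive_a", "http://localhost:5985/archive_b"]
    docs = [{"_id": f"record-{i}", "value": i} for i in range(2500)]
    for db_url, bucket in partition(docs, targets).items():
        for batch in batches(bucket, 1000):
            request = bulk_request(db_url, batch)
            print(request.get_method(), request.full_url, len(batch), "docs")
            # To actually send it: urllib.request.urlopen(request)
```

Running one sender per target, each in its own process, is what would let
the two updaters work in parallel. Whether that helps in practice is exactly
what the retest should show.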