Subject: Re: Please report your indexing speed
From: Jan Lehnardt
In-Reply-To: <4969999D-A6C0-469A-9120-D4C5CC2526F1@apache.org>
Date: Sun, 4 Mar 2012 18:46:58 +0100
To: dev@couchdb.apache.org
On Mar 4, 2012, at 18:40 , Jan Lehnardt wrote:
>
> On Mar 4, 2012, at 18:24 , Jan Lehnardt wrote:
>
>> I updated the Google doc with results from an EC2 cc1.4xlarge instance (details are in the spreadsheet).
>>
>> This is on EBS and Ubuntu 11.04/64.
>>
>> The results are a bit different from the previous machine, but that isn't at all unexpected.
>>
>> tl;dr: for small docs (10 bytes, 100 bytes), 1.2.x-filipe beats 1.2.x and 1.1.1; for large docs (1000 bytes), 1.2.x beats 1.2.x-filipe (6% difference).
>
> Hah, I re-read the results to make sure this is correct and found a mistake. A copy-and-paste formula error overstated the improvements of 1.2.x-filipe. This affects all my previous results.
>
> The good news is that 1.2.x-filipe is still faster than 1.1.1 and 1.2.x across the board. Still significantly so, but in all but one case by less than about 30%.
>
> The tl;dr for the EC2 run now becomes: 1.2.x-filipe beats 1.1.1 and 1.2.x for all doc sizes. It's only for large docs (1000 bytes) that 1.2.x is faster than 1.1.1, and 1.2.x-filipe is faster still.
>
>
>> So far, across the board, 1.2.x-filipe is ~16% faster (stdev 9%) than 1.1.1 for view builds.
Sorry for misquoting this line; it is new and the most significant one in this email, so I'll just repeat it :)
So far, across the board, 1.2.x-filipe is ~16% faster (stdev 9%) than 1.1.1 for view builds.
--------------------------------------------------------------------------------------------
The bigger the docs, the better the results, on both SSD and spinning disk.
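For readers who want to check a summary figure of the "~16% faster (stdev 9%)" kind, here is a minimal sketch of one way it could be computed. It assumes "X% faster" means the relative reduction in mean build time against the baseline, and that the stdev is the sample standard deviation across configurations. The spreadsheet may define either differently. The function names and all timings below are hypothetical placeholders, not values from the spreadsheet.

```python
from statistics import mean, stdev


def percent_faster(baseline_runs, candidate_runs):
    """Percent reduction in mean build time of the candidate vs. the baseline.

    Assumption: "faster" is measured as saved time relative to the baseline,
    after averaging the repeated runs of each configuration.
    """
    base = mean(baseline_runs)
    cand = mean(candidate_runs)
    return (base - cand) / base * 100.0


def summarize(configs):
    """Return (mean, sample stdev) of the per-configuration improvements."""
    gains = [percent_faster(base, cand) for base, cand in configs]
    return mean(gains), stdev(gains)


# Hypothetical placeholder timings in seconds, three runs per configuration.
# Each tuple is (baseline runs, candidate runs). These are NOT measured data.
configs = [
    ([100.0, 102.0, 98.0], [80.0, 81.0, 79.0]),      # about 20% faster
    ([200.0, 198.0, 202.0], [180.0, 181.0, 179.0]),  # about 10% faster
    ([50.0, 51.0, 49.0], [35.0, 36.0, 34.0]),        # about 30% faster
]

avg, sd = summarize(configs)
print(f"{avg:.0f}% faster (stdev {sd:.0f}%)")
```

Averaging the runs first and the percentages second keeps one noisy run from dominating a configuration's result. Comparing throughput (docs per second) instead of elapsed time would give different percentages for the same data.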
Cheers
Jan
--
>
>
> If you have any more hardware I could run this on, I'm happy to help with the setup, it isn't hard :)
>
> Cheers
> Jan
> --
>
>
>>
>> This still makes me want to include Filipe's patch into 1.2.x.
>>
>> Cheers
>> Jan
>> --
>>
>> On Mar 4, 2012, at 10:24 , Jan Lehnardt wrote:
>>
>>> Hey all,
>>>
>>> I made another run with a bit of a different scenario.
>>>
>>>
>>> # The Scenario
>>>
>>> I used a modified benchbulk.sh for inserting data (because it is an order of magnitude faster than the other methods we had). I added a command line parameter to specify the size of a single document in bytes (this was previously hardcoded in the script). Note that this script creates docs in a btree-friendly, incrementing-ID way.
>>>
>>> I added a new script, benchview.sh, which is basically the lower part of Robert Newson's script. It creates a single view and queries it, measuring the execution time of curl.
>>>
>>> And a third, matrix.sh (yay), that runs the different configurations on my system.
>>>
>>> See https://gist.github.com/1971611 for the scripts.
>>>
>>> I ran ./benchbulk.sh $size && ./benchview.sh for the following combinations, all on Mac OS X 10.7.3, Erlang R15B, SpiderMonkey 1.8.5:
>>>
>>> - Doc sizes 10, 100, 1000 bytes
>>> - CouchDB 1.1.1, 1.2.x (as of last night), 1.2.x-filipe (as of last night + Filipe's patch from earlier in the thread)
>>> - On an SSD and on a 5400rpm internal drive.
>>>
>>> I ran each individual test three times and took the average to compare numbers. The full report (see below) includes each individual run's numbers.
>>>
>>> (The gist includes the raw output data from matrix.sh for the 5400rpm run; for the SSD, I don't have the original numbers anymore. I'm happy to re-run this if you want that data as well.)
>>>
>>> # The Numbers
>>>
>>> See https://docs.google.com/spreadsheet/ccc?key=0AhESVUYnc_sQdDJ1Ry1KMTQ5enBDY0s1dHk2UVEzMHc for the full data set. It'd be great to get a second pair of eyes to make sure I didn't make any mistakes.
>>>
>>> See the "Grouped Data" sheet for comparisons.
>>>
>>> tl;dr: 1.2.x is about 30% slower and 1.2.x-filipe is about 30% faster than 1.1.1 in the scenario above.
>>>
>>>
>>> # Conclusion
>>>
>>> +1 to include Filipe's patch into 1.2.x.
>>>
>>>
>>>
>>> I'd love any feedback on methods, calculations and whatnot :)
>>>
>>> Also, I can run more variations if you like, e.g. other Erlang or SpiderMonkey versions; just let me know.
>>>
>>>
>>> Cheers
>>> Jan
>>> --
>>>
>>> On Feb 28, 2012, at 14:17 , Jason Smith wrote:
>>>
>>>> Forgive the clean new thread. Hopefully it will not remain so.
>>>>
>>>> If you can, would you please clone https://github.com/jhs/slow_couchdb
>>>>
>>>> And build whatever Erlangs and CouchDB checkouts you see fit, and run
>>>> the test. For example:
>>>>
>>>> docs=500000 ./bench.sh small_doc.tpl
>>>>
>>>> That should run the test and, God willing, upload the results to a
>>>> couch in the cloud. We should be able to use that information to
>>>> identify who you are, whether you are on SSD, what Erlang and Couch
>>>> build, and how fast it ran. Modulo bugs.
>>>
>>
>