From dev-return-11052-apmail-couchdb-dev-archive=couchdb.apache.org@couchdb.apache.org Mon Aug 02 21:55:23 2010
From: Paul Davis
Date: Mon, 2 Aug 2010 17:54:34 -0400
Subject: Re: Proposal for changes in view server/protocol
To: dev@couchdb.apache.org
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Aug 2, 2010 at 5:34 PM, Mikeal Rogers wrote:
>>
>> For the first point about CommonJS modules in Map/Reduce views I'd say
>> the goal is fine, but I don't understand how or why you'd want that
>> hash to happen in JavaScript. Unless I'm mistaken, aren't the import
>> statements executable JS? As in, is there any requirement that you
>> couldn't import a module inside your map function? In which case, JS
>> can't really hash all imported modules until after all possible code
>> paths have been traced?
>>
>> I think a better answer would be to allow CommonJS modules, but only
>> in some namespace of the design document. (IIRC, the other functions
>> can pull from anywhere, but that would make all design doc updates
>> trigger view regeneration.) Then Erlang just loads this namespace and
>> anything that could be imported is included in the hash somehow (hash
>> of sorted hashes or some such).
>>
>
> This is an interesting idea and I think I like it more than my original
> proposal. My fear with the original proposal was that it might be opaque to
> most users what will invalidate their views if we start doing fancy
> invalidation on modules they use. If we re-scope or restrict the module
> support to an attribute, that would make it very clear that changes to those
> modules will invalidate the view.
>
>
>>
>> Batching docs across the I/O might not give you as much of a
>> performance improvement as you'd think. There's a pretty nasty time
>> explosion on parsing larger JSON documents in some of the various
>> parsers I've tried. I've noticed this in various pure Erlang parsers,
>> but I wouldn't be surprised if json.js suffered as well. By this I
>> mean that parsing a one-megabyte document might be quite a bit slower
>> than parsing many smaller documents, so simply wrapping things in an
>> array could be bad.
>>
>
> The new native C parser in JavaScript is fine with anything this size and I
> believe Damien just wrote an evented JSON parser which should make this more
> acceptable on the client side. One good idea jchris had was, instead of
> having a number-of-documents threshold, to have a byte-length restriction on
> the batch we send to the view server.
Yeah, the new embedded JSON parser should be fine as long as we can
motivate people to upgrade to a recent JavaScript library. My
experience is more on the Erlang side, as that's what I've done
all of my comparisons against. I haven't done any testing on the
streaming parser, but it'd be interesting to see how it behaves in
relation to input doc size.
> The I/O time for large numbers of small documents is higher than you would
> expect. I ran some tests a while back and there was more time spent in stdio
> for simple map/reduce operations than there was in processing on the view
> server.
Did you run the experiment to try batching the updates across the
wire? I'm not surprised that the transfer can take longer than the
computation, but I'm not sure how much benefit you'd get from batching
100 or so docs. I reckon there'd be some, I just don't have any idea
how much.
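The byte-length batching idea could look something like the sketch below. Everything here is illustrative: `batchByBytes`, the `flush` callback, and the threshold value are made-up names for the sake of the example, not part of any CouchDB interface.

```javascript
// Accumulate serialized docs until adding the next one would exceed the
// byte limit, then flush the batch to the view server. A batch always
// holds at least one doc, even if that doc alone exceeds the limit.
function batchByBytes(docs, maxBytes, flush) {
  let batch = [];
  let size = 0;
  for (const doc of docs) {
    const encoded = JSON.stringify(doc);
    if (batch.length > 0 && size + encoded.length > maxBytes) {
      flush(batch);
      batch = [];
      size = 0;
    }
    batch.push(doc);
    size += encoded.length;
  }
  if (batch.length > 0) flush(batch);
}
```

The point of keying on bytes rather than doc count is that one large doc costs as much parse time as many small ones, so a byte threshold keeps the per-batch parse cost roughly constant.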
> Of course the most time spent on view generation is still writing to the
> btree but that performance has already increased quite a bit so we're
> looking for other places we can optimize.
>
>
>>
>> An alternative that I haven't seen anywhere else in this thread was an
>> idea to tag every message passed to the view engine with a uuid. Then
>> people can do all sorts of fancy things with the view engine like
>> async processing and so on and so forth. The downside being that the
>> Saturday afternoon implementation of the view engine in language X now
>> takes both Saturday and Sunday afternoon.
>>
>
> So, this gets dicey really fast. I want the external process protocol to go
> non-blocking and support this uuid style communication but I'm really
> skeptical of it in the view server.
>
> The view server should do pure functional transforms; allowing it to do I/O
> means that's no longer true. It's also not as simple as stamping the
> protocol with a uuid, because Erlang still needs to load balance any number
> of external processes. When the view server no longer solely blocks on
> processing, it becomes much harder to achieve that load balancing.
>
Well, the original proposal was that if we do asynchronous message
passing with the view server, then Erlang doesn't do the load
balancing; the view server could become threaded or use a pre-fork
server model and do the load balancing across multiple cores itself.
But you reminded me of the point that convinced me not to experiment
with the approach. If something causes the view engine to crash, you
can end up affecting a lot of unrelated things, e.g. someone gets a
500 on a _show function because a different app that happened to be
reindexing had a bug in its view handling code. With the current model
the effects of errors are more isolated.
>
>>
>> Apologies for missing this thread earlier. Better late than never I guess.
>>
>> Paul Davis
>>
>