Message-ID: <41fe564f0808190116i235cb618sabe19a2059e6289f@mail.gmail.com>
Date: Tue, 19 Aug 2008 10:16:55 +0200
From: "Ralf Nieuwenhuijsen"
To: couchdb-user@incubator.apache.org
Subject: Re: flexible filtering needed, with speed.
In-Reply-To: <76E4AFEC-7FCA-4063-9819-34150CF19E68@sankatygroup.com>
References: <76E4AFEC-7FCA-4063-9819-34150CF19E68@sankatygroup.com>
Don't take Futon as a speed measure: it may also be slow in the
rendering step if your documents are big (there is a lot of stuff
going on client-side as well).
The truth is, for most data being searched, people only care about
3-5 different types of search.
You can, of course, go nuts with indexing and just generate every
index you could possibly need.
Here is one of my favorites; it creates an index for every unique field:

function(doc) {
  for (var k in doc) {
    emit([k, 1, doc[k]], doc);  // emit the doc itself as the value
  }
}
You can query it with

  startkey=["someField",1,null]&endkey=["someField",2,null]

(the keys are JSON arrays, so they need to be URL-encoded) to get the
index for 'someField'.
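If you are hitting the view over HTTP, the array keys have to be
JSON-encoded before going into the query string. A minimal sketch of
building such a URL; the host, the database name "mydb", and the
design-document path are placeholders of mine, not anything from this
thread:

```javascript
// Build a view query URL with JSON-encoded startkey/endkey.
// "mydb" and "_design/app/_view/by_field" are illustrative placeholders.
var base = "http://localhost:5984/mydb/_design/app/_view/by_field";
var startkey = encodeURIComponent(JSON.stringify(["someField", 1, null]));
var endkey = encodeURIComponent(JSON.stringify(["someField", 2, null]));
var url = base + "?startkey=" + startkey + "&endkey=" + endkey;
```

The null in startkey sorts below every other value, and 2 in endkey is
past every ["someField",1,...] key, so the range covers the whole
per-field index.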
Of course, this baby is going to create a huge index if used with too
many or too-big documents, but I would at least try something like
that.
I use the above view function to make sure I can get the data sorted
however I want.
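For anyone who wants to sanity-check what that view emits without a
running server, here is a sketch that stubs CouchDB's emit() locally
and runs the same per-field map logic over a tiny made-up document:

```javascript
// Stub emit() to collect rows locally, then run the per-field map function.
// The example document is invented for illustration.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

var mapFn = function (doc) {
  for (var k in doc) {
    emit([k, 1, doc[k]], doc);
  }
};

var doc = { _id: "a1", state: "GA", breakfast: true };
mapFn(doc);
// one row per field, e.g. keys ["_id",1,"a1"], ["state",1,"GA"], ["breakfast",1,true]
```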
2008/8/19 Brad Anderson :
> Howdy,
>
> I have 12K docs that look like this:
>
> {
>   "_id": "000111bf7a8515da822b05ebbb8cd257",
>   "_rev": "94750440",
>   "month": 17,
>   "store": {
>     "store_num": 123,
>     "city": "Atlanta",
>     "state": "GA",
>     "zip": "30301",
>     "exterior": true,
>     "interior": true,
>     "restroom": true,
>     "breakfast": true,
>     "sunday": true,
>     "adi_name": "Atlanta, GA",
>     "adi_num": 123,
>     "ownership": "Company",
>     "playground": "Indoor",
>     "seats": 123,
>     "parking_spaces": 123
>   },
>   "raw": {
>     "Other Hourly Pay": 0.28,
>     "Workers Comp - State Funds Exp": 401.65,
>     "Rent Expense - Company": -8,
>     "Archives Expense": 82.81,
>     "Revised Hours allowed per": 860.22,
>     "Merch Standard": 174.78,
>     "Total Property Tax": 1190.91
>
>     ...
>
>   }
> }
>
> I truncated 'raw' but it's usually much longer, and avg. doc size is 5K.
>
> I'm trying to see how I will query them with views. I want to be able to
> filter down by various store subfields, i.e. all the Breakfast = true stores
> in Georgia that are owned by franchisees. However, this will differ for
> just about every query.
>
> The 'reduce' function would then be averaging each line in the 'raw' field.
>
> I have played around with views that take the store filters, but just
> returning the 'raw' field as the value from the map function is brutally
> slow in Futon. This is because the view is accessed right away, so it
> builds, takes about 3-4 mins (on a MBP with 4GB RAM, 2.2GHz dual core,
> 7200RPM disk). I understand the next time this specific store group is
> requested, it's fast... but they will all be so dynamic that this seems
> prohibitively slow.
>
> So, I thought, should I be doing this in two steps? Set up the key to be
> store and whatever else I might want to query on (Month or whatever
> timeframe), and return the doc id's as the values on the original query? I
> would then send in a complex key to do the filtering. This would require
> waiting for the _bulk_get functionality, and I'd send that list of IDs into
> a 2nd query to get the raw data to send it to 'map'.
>
> This is slow now on 12K docs... It needs to be stupid-fast at that low
> number of docs, because the plan is for *way* more data.
>
> The filtering part is tailor-made for a RDBMS, but the doc handling (all the
> 'raw' fields will be different store-by-store, industry by industry, change
> over time, and in general be free-form) is perfect for CouchDB. Thoughts?
> I want to use the right tool for the job, and that's looking like a RDBMS,
> sadly. That is, unless I'm completely misusing Couch. In which case, swift
> blows to the head are welcome.
>
> Cheers,
> BA