Hi Everyone-
I have been using CouchDB for several years and I absolutely love working
with it most of the time. Thanks to everyone who has made it such a joy to
work with. There are however a few consistent situations where I run into
trouble and that I would like to fix if possible. I have a few ideas
regarding features that would help me design my data model the way I want
and would require me to make far fewer trade-offs at the application level.
While I think I grok the public-facing API fairly well, I don't know the
internals at all so I don't know if these features are possible or not.
What I would like is some feedback on whether each one is feasible, how
difficult it would be, why it hasn't been implemented yet, etc. I know
some have been discussed
before, but haven't been implemented yet so I just want to figure out why
and what I can do about it. My CouchDB use is mostly personal thus far,
but in the spirit of contributing back (since I have no Erlang skills), I
would be willing to sponsor somebody to get these features coded up and
committed to core if everyone agreed that they were valuable features that
should be included in CouchDB.
Anyway, on to my features:
*1) Multiple Start & End Keys For CouchDB Views With Group Level Option For
Reduce Views*
This one has been discussed multiple times before. The JIRA issue is 523 I
think.
I don't think there is really much debate that this is a must have for
CouchDB. There has even been a patch or two. What is stopping this from
happening? There hasn't been much discussion on the topic lately. A
status update would be great from anyone who has the power to make this
happen/get the ball rolling again. This single feature alone would make so
many more things possible.
*2) Return "No Key" And Empty Row When POSTing Keys To Views Instead Of
Nothing If No Key Matches View*
The common scenario I have is where I can't get everything in one query
from CouchDB while viewing my data in a list format. Take, for example,
the category page of a store. I want to display a list of products, and
each product has the following related documents that need to be fetched
per view row: the product's brand doc, the product's pricing doc, the
product's current availability (a reduce view row), and any
customer-specific product documents such as the customer part number,
customer-specific pricing, etc.
So my product doc looks like this:
type: Product
brand: id_for_brand_doc_here
prices: id_for_prices_doc_here
attributes: { hash of attributes here }
categories: [array_of_category_ids_here]
So, at most I can get the product and one other doc per view row using the
linked document feature. This means that if I want to display all the
information I want in my application, I have to do multiple lookups per
product in the list view. This could easily generate hundreds of queries
to couch for one page view. Multiply this by several requests coming in at
the same time and it starts to become a problem.
Alternatively, I could issue 4 requests to couch for the entire list by
POSTing the keys to each view and then zipping the returned arrays
together. (I use Ruby at the application level...) Then I just have to
iterate over the new array one time and make no more requests to couch.
The reason this fails is that if you POST a key that is not in the view
you are querying, CouchDB doesn't return anything for that key. The
returned array is then shorter than the key array, which makes it hard to
handle in the client without iterating over the arrays multiple times:
once to join the data to its proper row and once more when displaying the
information. If CouchDB gave me back the same number of rows as keys I
requested, I could easily join the arrays together in my application and
significantly limit the number of queries I am sending to couch.
For example:
Request 1:
URL: database/_design/Product/_view/product_with_price?include_docs=true
Keys: ["product_id_1", "product_id_2", "product_id_3" ]
Now, if my view had the following:
if (doc.type == "Product" && doc.status == "enabled")
  emit(doc._id, { name: doc.name, _id: doc.prices });
I would get back all 3 products as long as all three were enabled. But
if I set a product to disabled it won't show in the view row and therefore
couch would return an array of only 2 results, which will make it hard when
joining arrays in my application.
Request 2:
URL: database/_design/Product/_view/by_stock_levels?reduce=true&group=true
Keys: ["product_id_1", "product_id_2", "product_id_3" ]
Side Note: I can't combine 1 & 2 into one reduce view even though the key
is the same, because I get a reduce overflow error.
Request 3:
URL: database/_design/Product/_view/by_customer_part_number
Keys: [["product_id_1", "customer_id"], ["product_id_2",
"customer_id"], ["product_id_3", "customer_id"] ]
If the customer doesn't have a doc that matches this view, couch won't
return an empty row, it just won't return the row. Therefore, if the
customer had matching rows for products 1 and 3 only, and I just zipped
the arrays of returned results together, I would get product 3's doc
paired with product 2 in my application. However, if couch returned "no
key found" with an empty row, the joining of arrays in my application
would still work.
Request 4:
URL: database/_design/Brand/_view/all
Keys:
["brand_id_for_product_id_1", "brand_id_for_product_id_2",
"brand_id_for_product_id_3" ]
Now, if for some reason a brand gets deleted and its ID is still on the
product, couch will return an array of rows that does not match the size
of the key array in my application, and it's conceivable that I could get
the wrong brand on the wrong product.
I could of course check that the array sizes match and only merge if they
do, and otherwise fall back to making requests on a per-product basis when
displaying results, but it just seems to me that it would be better if
couch gave me feedback that no results matched a key I requested. This
would save me a ton of requests and certainly make working with couch more
relaxing.
I could really really use this feature and I don't think it would be very
much trouble at all to send a row if no key matches with just something
like "key_not_found": null
*3) Return Multiple Linked Documents Per View Row*
I use the linked documents feature all the time. Really helps me cut down
on the number of requests I make. But I could cut down even further if I
was able to get multiple docs back per row by passing couch an array of
ids I wanted with the row instead of just a single id.
So, using the example above in #2, let's say I had this view:
if (doc.type == "Product" && doc.status == "enabled")
  emit(doc._id, { name: doc.name, _id: doc.prices });
But I also had a brand id stored that I wanted to get in the same row...
let's say I wrote this instead:
if (doc.type == "Product" && doc.status == "enabled")
  emit(doc._id, { name: doc.name, docs: [{ _id: doc.prices },
                                         { _id: doc.brand }] });
couch would respond with
docs: [prices_doc, brands_doc] plus my name field from the product doc. I
could get most everything I want in one query.
I know I can call emit multiple times, but again this just
makes everything so much harder in my application because I don't know if
there is a brand for every product or not. It essentially forces me to
loop through the array multiple times. I could also do collation with
reduce, but then I consistently run into reduce overflow errors as this is
not what reduce is really designed to handle.
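
For comparison, this is roughly what the multiple-emit approach looks like
on the client. The sketch assumes a hypothetical view that emits
[product_id, 0] for the product itself, [product_id, 1] linking the prices
doc, and [product_id, 2] linking the brand doc, queried with
include_docs=true. The slot numbering, the group_by_product helper, and the
sample docs are all made up for illustration.

```ruby
# Collect rows emitted under [product_id, slot] keys into one entry per
# product. Slots with no row stay nil, so a missing brand is visible.
def group_by_product(rows, slots = { 0 => :product, 1 => :prices, 2 => :brand })
  rows.each_with_object({}) do |row, products|
    product_id, slot = row["key"]
    entry = (products[product_id] ||= { product: nil, prices: nil, brand: nil })
    entry[slots.fetch(slot)] = row["doc"]
  end
end

# Sample rows in the order the view would return them.
rows = [
  { "key" => ["product_id_1", 0], "doc" => { "name" => "Widget" } },
  { "key" => ["product_id_1", 1], "doc" => { "list" => 10 } },
  { "key" => ["product_id_1", 2], "doc" => { "name" => "Acme" } },
  { "key" => ["product_id_2", 0], "doc" => { "name" => "Gadget" } },
  { "key" => ["product_id_2", 1], "doc" => { "list" => 20 } }
  # product_id_2 has no brand row
]

grouped = group_by_product(rows)
```

It works, but the regrouping pass is exactly the extra client-side work
that returning several linked docs in one row would avoid.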
There may be some reason why this isn't possible, but I don't know it and I
KNOW it would be useful from a user's perspective.
Combine this with the feature in #2 and I could get everything I want from
couch in 2 requests per page view that currently takes me 100 requests for
25 products.
*Final Thoughts*
Like I said in the beginning, I don't know if some of these are possible or
not, but I know that they would make my life as a user of CouchDB much more
relaxing. I would sincerely appreciate it if anyone could give feedback on
the possibility of each and what we have to do to get moving on these. I
am willing to put my cash up to anyone who can get these features included
in couch.
I appreciate everyone taking the time to read this long-ass email.
I wanted to be clear. If anyone has any other suggestions, please feel
free to contact me.
Thanks,
James Hayton