incubator-couchdb-user mailing list archives

On Thu, May 28, 2009 at 12:54:16PM -0700, Chris Anderson wrote:
> The deal is that if your reduce function's output is the same size as
> its input, the final reduce value will end up being as large as all
> the map rows put together.
>
> If your reduce function's output is 1/2 the size of its input, you'll
> also end up with quite a large amount of data in the final reduce. In
> these cases each reduction stage actually accumulates more data, as it
> is based on ever increasing numbers of map rows.
>
> If the function reduces data fast enough, the intermediate reduction
> values will stay relatively constant, even as each reduce stage
> reflects logarithmically more map rows. This is the kind of reduce
> function you want.
So actually, the requirement is that the final (root) result of the reduce
process should be of a moderate size, and so should all the intermediate
reduce values which comprise it. That makes sense.
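A rough way to see the three cases is to simulate the reduce tree. The sketch
below is a hypothetical Python model, not CouchDB's own code (real CouchDB
reduce functions are JavaScript and take a rereduce flag). It combines values
in groups of a made-up fan-out of 8, roughly the way each inner B-tree node
stores a reduction of its children, and records the largest value seen at
any level, since the intermediate values matter as much as the root.

```python
from typing import Callable, List, Tuple

Value = List[int]
ReduceFn = Callable[[List[Value]], Value]


def tree_reduce(leaves: List[Value], reduce_fn: ReduceFn, fanout: int) -> Tuple[Value, int]:
    """Combine values bottom-up in groups of `fanout` (a hypothetical stand-in
    for B-tree inner nodes). Returns the root value and the size of the
    largest value seen at any level."""
    level = leaves
    largest = max(len(v) for v in level)
    while len(level) > 1:
        level = [reduce_fn(level[i:i + fanout]) for i in range(0, len(level), fanout)]
        largest = max(largest, max(len(v) for v in level))
    return level[0], largest


def concat(values: List[Value]) -> Value:
    # Output is exactly as large as the input.
    return [x for v in values for x in v]


def keep_half(values: List[Value]) -> Value:
    # Output is roughly half the size of the input.
    return concat(values)[::2]


def count(values: List[Value]) -> Value:
    # Output is a single number no matter how many rows went in.
    return [sum(v[0] for v in values)]


if __name__ == "__main__":
    rows = [[1] for _ in range(4096)]  # one value per map row
    for fn in (concat, keep_half, count):
        root, largest = tree_reduce(rows, fn, fanout=8)
        print(f"{fn.__name__:>9}: root size {len(root)}, largest value {largest}")
```

With 4096 rows, concat's root holds all 4096 values and keep_half's root still
holds 256 (each level with 8 children doubles the size before halving it), while
count's root stays a single number at every level.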
Depending on the type of reduce function you use, the "growth" may or may
not be related to the number of documents which have been reduced together
to form the reduce value.
For example: a reduce function which returns a map of {tag: count}, where the
number of unique tags is bounded, may return a fairly large object even when
reducing across a small number of docs, but the final root reduce is no larger
than that. So all you need to do is keep the number of tags within a
'reasonable' range (e.g. tens rather than thousands).
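The tag example can be sketched the same way. In this hypothetical Python
model (again not CouchDB's API; the tag names, the pool of 20 tags and the
fan-out of 8 are made up for illustration), one function merges {tag: count}
maps and serves both for the first reduce and for rereduce. With a bounded tag
pool the root never exceeds 20 keys however many docs are reduced; with one
unique tag per doc it grows to one key per doc.

```python
from collections import Counter
from typing import Dict, List, Tuple

TagCounts = Dict[str, int]


def reduce_tags(values: List[TagCounts]) -> TagCounts:
    """Merge {tag: count} maps by summing counts. The same function works for
    the first reduce over map rows and for rereduce over partial results."""
    total: Counter = Counter()
    for v in values:
        total.update(v)  # Counter.update adds counts rather than replacing them
    return dict(total)


def tree_reduce(leaves: List[TagCounts], fanout: int = 8) -> Tuple[TagCounts, int]:
    """Reduce bottom-up in groups of `fanout`; return the root and the number
    of keys in the largest value seen at any level."""
    level, largest = leaves, max(len(v) for v in leaves)
    while len(level) > 1:
        level = [reduce_tags(level[i:i + fanout]) for i in range(0, len(level), fanout)]
        largest = max(largest, max(len(v) for v in level))
    return level[0], largest


def bounded_tags(i: int) -> TagCounts:
    # Each doc carries up to two tags drawn from a fixed pool of 20.
    return {f"tag{i % 20}": 1, f"tag{(i * 7) % 20}": 1}


def unique_tag(i: int) -> TagCounts:
    # Each doc carries a tag no other doc uses, so the tag space grows with the docs.
    return {f"tag{i}": 1}


if __name__ == "__main__":
    for n in (100, 10_000):
        b_root, _ = tree_reduce([bounded_tags(i) for i in range(n)])
        u_root, _ = tree_reduce([unique_tag(i) for i in range(n)])
        print(f"n={n}: bounded root has {len(b_root)} keys, unique root has {len(u_root)} keys")
```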
Regards,
Brian.