incubator-couchdb-user mailing list archives

I think what many people are really concerned about is how the file size
grows as the number of docs increases (space complexity).
(If it grows exponentially, then that's not a good sign.)
So is there any official or unofficial theoretical analysis or benchmark
showing this characteristic?
2011/6/30 Paul Davis <paul.joseph.davis@gmail.com>
>
> Teslan,
>
> I'm not sure where you were getting the impression that Erlang was
> frugal with disk space. In general, it's true that Erlang is pretty
> good at using a minimal amount of CPU/RAM resources while it runs,
> though as in all things, that usage will scale with load.
>
> As to disk usage, that's a direct trade off in the design of CouchDB.
> The append only b+tree is going to cause fragmentation in the database
> files. There are of course games we could play to minimize it to a
> certain extent by doing things like log-structured merge trees with
> more aggressive compaction, but then the issue becomes that we end up
> requiring more active file descriptors per database which in turn
> hurts people that are hosting a large number of databases on a single
> node (think hosting, or db per user account).
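To put a rough shape on the space complexity the original question asks about, here is a back-of-the-envelope sketch (not CouchDB's actual storage code; the doc size, node size, and fanout are made-up parameters). In an append-only B+tree, each write appends the new document revision plus a freshly written root-to-leaf path of tree nodes, so the file grows roughly linearly in the number of writes with a logarithmic node-overhead factor, not exponentially:

```python
import math

def uncompacted_bytes(n_writes, doc_size=512, node_size=4096, fanout=100):
    """Approximate file size after n_writes appends with no compaction.

    Each write appends the document body plus a rewritten
    root-to-leaf b+tree path of roughly log_fanout(n) nodes.
    (Toy model with assumed sizes, not real CouchDB numbers.)
    """
    total = 0
    for i in range(1, n_writes + 1):
        path_len = max(1, math.ceil(math.log(i + 1, fanout)))
        total += doc_size + path_len * node_size
    return total

def compacted_bytes(n_docs, doc_size=512, node_size=4096, fanout=100):
    """Approximate size after compaction: one live copy per doc plus
    roughly n/fanout leaf nodes (interior nodes ignored)."""
    return n_docs * doc_size + math.ceil(n_docs / fanout) * node_size

# Growth is linear-ish with a log factor, not exponential:
for n in (1_000, 10_000, 100_000):
    print(n, uncompacted_bytes(n), compacted_bytes(n))
```

Doubling the number of writes in this model slightly more than doubles the uncompacted file size (the extra factor is the deeper tree path), which is the fragmentation trade-off described above.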
>
> My guess is that whoever it was on IRC was just speaking with conviction.
> We by no means try to hide the fact that CouchDB uses quite a bit
> more space than people would expect at first.
>
> As to the amount of space that can be cleaned up, it really depends on
> the specific load patterns and how aggressive people are at keeping
> the database files compacted. Obviously I could write a single
> document hundreds of thousands of times without compacting, and then
> compact and have a database that is a percent or less of the
> "uncompacted" size.
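To put numbers on that single-document example (a toy model with an assumed doc size, not a real measurement): every update appends a full new revision, and compaction rewrites only the live one.

```python
def simulate(updates, doc_size=512):
    """Toy model of append-only storage (assumed doc_size, not a
    real CouchDB measurement): every update appends a full new
    revision; compaction keeps only the single live revision."""
    uncompacted = updates * doc_size  # every old revision still on disk
    compacted = doc_size              # only the latest revision survives
    return uncompacted, compacted

before, after = simulate(100_000)
print(after / before)  # → 1e-05, i.e. far below a percent of the uncompacted size
```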
>
> I'm also not sure about why someone would say that a 2GiB database
> would struggle with less than 2GiB of RAM. RAM usage is more or less
> tied to the number of concurrent clients you have accessing the
> database and the amount and type of view generations you have running.
> It's not really tied to the physical size of the database, as we
> don't cache anything. There used to be a silly benchmark floating
> around that showed CouchDB handling a couple thousand requests for a
> small doc and it was only using 9M of RAM. Granted that's a super
> idealized case, but I'd just point out that it's more about access
> patterns rather than disk usage.
>
> As to the mobile stuff, my guess would probably be "don't store a lot
> of data on the device". AFAIK the story for mobile developers revolves
> quite a bit around the fact that replicating data in and out from The
> Cloud™ makes it super easy for them to have bits and pieces of
> a much larger database.
>
> But in the end, the fact that CouchDB uses much more disk space
> than some would expect is simply the trade-off in the grand
> design. There are features we have like database snapshots, append
> only storage to simplify guarantees on consistency (also, hot backups)
> and hosting a large number of db's in a single Erlang VM that end up
> intersecting in such a way that the price we pay is using more bytes.
>
> Also, I'd like to recommend you keep an eye on development because
> this is an active area of optimization. Filipe has been doing awesome
> work integrating things like snappy compression and other improvements
> deep down at the storage layer to improve the situation. We may be frank in
> saying we use a non-trivial amount of extra space, but it's not like
> we're not working on improving that situation. :D
>
> That ended up longer than expected. Let us know if you have any other
> questions.
>
--
- sleepnova