couchdb-dev mailing list archives

On Thu, Jan 19, 2012 at 19:41, kzhang <kzhang@wayfair.com> wrote:
> Hi,
>
> If I have three nodes to set up a cluster, what difference does it make if
> 1) I set up three shards, with each of the three nodes being the primary of
> one shard, with replication running among them, so if one node goes down,
> the other two can fully take over; v.s. 2) if still the same three nodes, I
> set up six shards, still running replication among them.
>
> I know it sounds more overhead in 2). But is there any performance gain/loss
> associated with either approach?
The overhead should be minimal, especially with BigCouch, since it's
limited to the connection and request overhead. With BIgCouch, the
requests between servers are not HTTP, so there's no HTTP overhead
and I would bet the connections are persistent, so there isn't any
added overhead there. The extra overhead is additional resources for
the additional connections between servers, but that's negligible, and
making the individual shards' responses smaller might actually mean
memory usage is more stable since the JSON is split into more
manageable chunks for the serializer.
In general, over-sharding is a great tool and will make your life a
lot easier when you decide you need more than 3 nodes. Someone from
Cloudant can chime in if I'm wrong, but I wouldn't hesitate to set up
10x or 20x that many shards, depending on how quickly you expect to
scale out. As an added bonus it will increase parallelism in your view
index generation.
-Randall
>
> I guess I am not sure what is the standard/ideal config for No. of shards/
> No. of Nodes.
>
> --
> View this message in context: http://couchdb-development.1959287.n2.nabble.com/bigcouch-sharding-uestion-tp7206277p7206277.html
> Sent from the CouchDB Development mailing list archive at Nabble.com.