(I'm posting a few questions as separate topics, since I think that makes it easier to follow up on a specific topic. Drop the posts if you want me to create a single topic.)

What is the process for growing or shrinking a cluster?

I currently have a single-node installation. Here’s what I’d like to do:

Grow to 2 nodes

Grow to 5 nodes

Shrink to 3 nodes
** I’m expecting data loss here since I’m not (yet!) running with p4 but only CE
** OR downtime with some kind of dump/reload cycle

My expectation would be that:

growing can be done online

shrinking is possibly an operation where the cluster needs to be taken down
** possibly a dump/reload cycle with a maintenance window

there is some process that “rebalances” the arrays that are already in the cluster
** automatic rebalancing and/or manual rebalancing

there is some process that lets me identify arrays that are “unbalanced” (for lack of a better word).
** something that lets me identify which arrays were created before growing the cluster
** something that lets me identify arrays that were being loaded while I expanded the cluster (if that even applies)
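To make “unbalanced” concrete, here is a minimal sketch of the kind of check meant above. It assumes you can already obtain, for each array, the number of chunks stored on each instance; how to get those counts is version-dependent and not shown. The functions `skew` and `unbalanced`, the threshold of 1.5, and the sample counts are all hypothetical.

```python
def skew(counts):
    """Busiest instance's chunk count divided by the mean (1.0 means perfectly even)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0

def unbalanced(arrays, threshold=1.5):
    """Names of arrays whose per-instance chunk counts are skewed beyond threshold.

    arrays -- {array_name: [chunk count on instance 0, 1, ...]} (hypothetical input)
    Arrays with fewer chunks than instances are skipped: they cannot be spread evenly.
    """
    flagged = []
    for name, counts in arrays.items():
        if sum(counts) < len(counts):
            continue
        if skew(counts) > threshold:
            flagged.append(name)
    return sorted(flagged)

# Hypothetical counts on 5 instances: "old" looks like an array loaded while only
# 2 instances existed; "new" is spread evenly; "tiny" has too few chunks to judge.
arrays = {
    "old": [500, 500, 0, 0, 0],
    "new": [200, 198, 201, 199, 202],
    "tiny": [1, 0, 0, 0, 0],
}
print(unbalanced(arrays))  # ['old']
```

An array created before a cluster grew would show zero chunks on the newer instances, which drives its skew well above 1.0.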

At the moment (14.12, and probably into the middle of 2015), the only way to accomplish this is to unload (dump) your arrays to a file, create your new installation, and then reload the dumped arrays. Recovery provisioning (replacing a dead physical node) and resource provisioning (increasing the size of your cluster) are absolutely on the product road map. But we’re not there yet.

At the moment (14.12), we don’t have any facilities for adding a new node (we call it an instance) to a running SciDB cluster (which we call an installation).

When you initialize a SciDB installation, SciDB’s internals write down the list of instances it’s being asked to spread the data over. When you add data to an array, the internals distribute it as evenly as they can over those instances. But there’s no mechanism (for now) to introduce a new instance into a running installation; the list of instances is fixed at initialization time and cannot be changed afterwards. With the EE, we set things up so that you can lose one instance and keep reading data. But you can’t (yet) replace an instance if it’s dead.
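Why a fixed instance list makes growing hard can be seen with a toy model. This sketch assumes a simple placement rule (hash the chunk’s starting coordinates, take the result modulo the instance count); that rule is a hypothetical stand-in, not SciDB’s actual placement code. It counts how many chunks would change owner if the instance count changed from 1 to 2, 2 to 5, and 5 to 3, the path described in the question.

```python
import hashlib

def chunk_owner(chunk_coords, num_instances):
    """Toy placement rule: hash a chunk's starting coordinates, take it mod the
    instance count. Hypothetical; SciDB's real placement function is not shown."""
    key = ",".join(str(c) for c in chunk_coords).encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return h % num_instances

def per_instance_counts(chunks, num_instances):
    """How many chunks each instance would hold under the toy rule."""
    counts = [0] * num_instances
    for c in chunks:
        counts[chunk_owner(c, num_instances)] += 1
    return counts

def moved_fraction(chunks, old_n, new_n):
    """Fraction of chunks whose owner changes when the instance count changes."""
    moved = sum(1 for c in chunks if chunk_owner(c, old_n) != chunk_owner(c, new_n))
    return moved / len(chunks)

# A 2-D array laid out as a 100 x 100 grid of chunks (chunk size 1000 per dimension).
chunks = [(i * 1000, j * 1000) for i in range(100) for j in range(100)]

print(per_instance_counts(chunks, 5))
for old_n, new_n in [(1, 2), (2, 5), (5, 3)]:
    print(old_n, "->", new_n, round(moved_fraction(chunks, old_n, new_n), 2))
```

Under this toy rule a chunk keeps its owner only when `h % old_n == h % new_n`. Going from 2 to 5 instances that holds for about 20% of hash values, and likewise from 5 to 3, so roughly 80% of the data would have to move. That is why adding instances amounts to redistributing most of the data, not simply attaching a new node.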

So to bring additional compute resources to bear on your problem, or to replace a dead instance, you need to create a new installation (possibly with more instances). You need to (a) unload (save) the data from the installation you’re replacing, (b) initialize the new installation, and (c) load the previously saved data. SciDB will internally ensure that the new installation uses all of the physical resources at its disposal to hold the data.
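The (a)/(b)/(c) cycle can be scripted. The sketch below only builds the list of shell commands so you can review them before running anything. The command spellings (`iquery -aq`, the `save`/`load` operators with coordinator instance `-2` and `'opaque'` format, `create array`, and `scidb.py stopall/initall/startall`) are assumptions based on the 14.x tooling and should be checked against your version’s documentation. The installation name `mydb`, the array `A`, its schema, and `/tmp/dump` are hypothetical.

```python
def iquery(afl):
    """Wrap one AFL statement for the iquery client (assumed form: iquery -aq "...")."""
    return f'iquery -aq "{afl}"'

def dump_reload_plan(db, schemas, dump_dir):
    """Ordered shell commands for an unload / re-initialize / reload cycle.

    db       -- installation name known to scidb.py (hypothetical, e.g. "mydb")
    schemas  -- {array_name: schema}, recorded before the dump (e.g. from show())
    dump_dir -- directory on the coordinator that holds the dump files
    """
    plan = []
    # (a) unload: save each array to a single file written by the coordinator (-2)
    for name in schemas:
        plan.append(iquery(f"save({name}, '{dump_dir}/{name}.opaque', -2, 'opaque')"))
    # (b) stop, then (after editing config.ini to list the new instances)
    #     re-initialize; initall DESTROYS the old installation's data
    plan += [f"scidb.py stopall {db}", f"scidb.py initall {db}", f"scidb.py startall {db}"]
    # (c) reload: recreate each array with its old schema, then load its dump
    for name, schema in schemas.items():
        plan.append(iquery(f"create array {name} {schema}"))
        plan.append(iquery(f"load({name}, '{dump_dir}/{name}.opaque', -2, 'opaque')"))
    return plan

for cmd in dump_reload_plan("mydb", {"A": "<v:double>[i=0:999,100,0]"}, "/tmp/dump"):
    print(cmd)
```

Routing the save and load through the coordinator keeps each dump in one file that does not depend on the old instance count, at the cost of funnelling all data through one node. If the new installation runs on different machines, copy the dump files to the new coordinator before step (c).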