Today we are going to use the pre-splitting technique, which lets us create the chunks and distribute them across the shards the way we want before we start inserting data. It also lets us decide the range of each chunk ourselves.
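Deciding the range of each chunk usually comes down to choosing split points for the shard key in advance. The sketch below is one simple way to do that. It assumes, purely for illustration, an integer student_id spread uniformly over a known interval. The helper name `planSplitPoints` and the numbers are hypothetical and not part of MongoDB.

```javascript
// Hypothetical helper: evenly spaced split points for a numeric shard key.
// Assumes student_id values are integers spread uniformly over [minId, maxId).
function planSplitPoints(minId, maxId, numChunks) {
  if (numChunks < 1 || maxId <= minId || numChunks > maxId - minId) {
    throw new Error("need 1 <= numChunks <= maxId - minId");
  }
  const width = (maxId - minId) / numChunks;
  const points = [];
  // numChunks chunks need numChunks - 1 interior split points.
  for (let i = 1; i < numChunks; i++) {
    points.push(Math.round(minId + i * width));
  }
  return points;
}

// Example: 6 chunks over a made-up student_id range 0..600000.
console.log(planSplitPoints(0, 600000, 6));
// [ 100000, 200000, 300000, 400000, 500000 ]
```

The first chunk would then run from MinKey up to the first point and the last one from the last point up to MaxKey, so the whole key space stays covered. A skewed key distribution would need points chosen from real data instead of an even spacing.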

When is pre-splitting used

We use pre-splitting when we need to load a large amount of data into a collection and want to spare the database the work of splitting chunks and moving them between shards (balancing rounds). Each document is then inserted directly into the shard that owns the chunk covering its shard key value.
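The routing rule (a document goes to the shard owning the chunk whose range contains its shard key value) can be modelled in a few lines. This is a simplified, hypothetical in-memory model, not how mongos is implemented: the `chunks` table, its ranges and `findShard` are made up for illustration. It uses -Infinity and Infinity as stand-ins for MinKey and MaxKey and treats each chunk as [min, max), since chunk bounds include the min value and exclude the max value.

```javascript
// Hypothetical, simplified chunk table for a sharded collection.
// Each chunk covers [min, max) of the shard key; -Infinity/Infinity
// stand in for MinKey/MaxKey. Chunks must be sorted and contiguous.
const chunks = [
  { min: -Infinity, max: 200000, shard: "s0" },
  { min: 200000, max: 400000, shard: "s1" },
  { min: 400000, max: Infinity, shard: "s2" },
];

// Binary search for the chunk whose range contains key.
function findShard(chunkTable, key) {
  let lo = 0;
  let hi = chunkTable.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = chunkTable[mid];
    if (key < c.min) {
      hi = mid - 1;
    } else if (key >= c.max) {
      lo = mid + 1;
    } else {
      return c.shard;
    }
  }
  throw new Error("no chunk covers key " + key);
}

console.log(findShard(chunks, 150000)); // s0
console.log(findShard(chunks, 200000)); // s1 (min bound is inclusive)
console.log(findShard(chunks, 999999)); // s2
```

With pre-split chunks already spread across s0, s1 and s2, consecutive inserts land on different shards from the start instead of all hitting one chunk.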

In the following example, where pre-splitting was not used, we can see that MongoDB has distributed all chunks of the namespace “school.students” across the three shards of the cluster (s0, s1 and s2).

Without manual intervention, it is MongoDB that splits a chunk when it exceeds the maximum chunk size (64 MB by default), and it also decides which chunks must be moved and to which shard each one goes.
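Because no chunk should grow past the maximum chunk size, the data size divided by that maximum, rounded up, gives a lower bound on how many chunks a bulk load will occupy. That can help when deciding how many chunks to pre-split. The sketch below is only back-of-the-envelope arithmetic: the function name and the 10 GB load are hypothetical. Since automatic splits leave chunks well below the maximum, the real chunk count is typically higher than this bound.

```javascript
// Back-of-the-envelope lower bound: if no chunk may exceed maxChunkMB,
// a load of dataMB needs at least ceil(dataMB / maxChunkMB) chunks.
function minChunks(dataMB, maxChunkMB = 64) {
  return Math.ceil(dataMB / maxChunkMB);
}

// Hypothetical 10 GB bulk load with the 64 MB default chunk size.
const needed = minChunks(10 * 1024);
console.log(needed); // 160

// Spread evenly over the three shards s0, s1 and s2.
console.log(Math.ceil(needed / 3)); // 54 (chunks per shard, rounded up)
```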

We will use the same collection as before, “school.students”. First, we drop it, together with its chunks, so that we can start from scratch. The “drop” command removes both the data and the metadata (chunks), whereas the “remove” command would remove only the data.

mongos> db.students.drop()
true
mongos>

Now we shard the collection:

mongos> sh.shardCollection("school.students", {student_id: 1})
{"collectionsharded": "school.students", "ok": 1}
mongos>

The “sh.status()” command shows that MongoDB has created a single chunk covering the whole range of the shard key (student_id). We can also see that this chunk belongs to shard s0, the primary shard of the “school” database.
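Starting from this single chunk on s0, pre-splitting amounts to calling `sh.splitAt()` once per split point and then `sh.moveChunk()` to place the resulting chunks on the shards we choose. The sketch below only builds those mongo shell commands as strings, so the plan can be reviewed before it is run. The helper name `preSplitCommands`, the split points and the round-robin placement are assumptions for illustration. It relies on the mongo shell helpers `sh.splitAt(namespace, query)` and `sh.moveChunk(namespace, query, destination)`, and it assumes the first shard in the list is the primary shard.

```javascript
// Hypothetical helper: turn a list of split points into mongo shell
// commands for pre-splitting. All chunks are assumed to start on
// primaryShard (as the single initial chunk does after shardCollection),
// and shards[0] is assumed to be that primary shard.
function preSplitCommands(ns, key, points, shards, primaryShard) {
  const cmds = [];
  const q = (v) => JSON.stringify({ [key]: v });
  // 1. Split the initial chunk at every point.
  for (const p of points) {
    cmds.push(`sh.splitAt("${ns}", ${q(p)})`);
  }
  // 2. Assign chunks round-robin. Chunk 0 ([MinKey, points[0])) stays on
  // the primary shard; chunk i >= 1 starts at points[i - 1], so that
  // value identifies it in the moveChunk query.
  for (let i = 1; i <= points.length; i++) {
    const dest = shards[i % shards.length];
    if (dest !== primaryShard) {
      cmds.push(`sh.moveChunk("${ns}", ${q(points[i - 1])}, "${dest}")`);
    }
  }
  return cmds;
}

preSplitCommands("school.students", "student_id", [200000, 400000],
  ["s0", "s1", "s2"], "s0").forEach((c) => console.log(c));
```

For the two hypothetical split points this prints two `sh.splitAt` calls followed by two `sh.moveChunk` calls (to s1 and s2), leaving one chunk on each shard before any data is inserted.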