Description

I have a collection of 100k+ documents, about 7% of which are missing the field used for a hashed shard key. Normally you cannot shard a collection in which some documents are missing the shard key field, but sharding succeeds if you use a hashed shard key.

Prior to sharding, queries on the field (using {CITY: null}) succeed; after sharding they appear to fail. The query is directed to the shards and they appear to process it, but mongos does not return any documents. This only happens if the field with null values is the shard key.

I have reproduced this with the mongo shell and pymongo, but have not narrowed it down enough to write a JS test case.
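A rough reproduction sketch in pymongo, in case it helps narrow this down. Everything specific here is an assumption rather than the reporter's actual setup: the mongos address `localhost:27017`, the database `test`, the collection `places`, and the helper names `make_docs`, `count_null_city` and `reproduce` are hypothetical. It uses the current pymongo API (`count_documents`), which is newer than this report, and it does not claim what any particular server version will return.

```python
def make_docs(n=1000, missing_every=14):
    """Build n sample documents; every `missing_every`-th one omits CITY (roughly 7%)."""
    docs = []
    for i in range(n):
        doc = {"_id": i}
        if i % missing_every != 0:
            doc["CITY"] = "city-%d" % (i % 10)
        docs.append(doc)
    return docs


def count_null_city(coll):
    """Count documents matched by {CITY: null} (missing or explicitly null)."""
    return coll.count_documents({"CITY": None})


def reproduce(uri="mongodb://localhost:27017"):
    """Compare {CITY: null} counts before and after sharding on a hashed CITY key.

    Assumes `uri` points at a mongos of a sharded test cluster (hypothetical address).
    """
    from pymongo import MongoClient  # imported lazily so the helpers above work without pymongo

    client = MongoClient(uri)
    coll = client["test"]["places"]
    coll.drop()

    docs = make_docs()
    coll.insert_many(docs)
    expected = sum(1 for d in docs if "CITY" not in d)

    before = count_null_city(coll)

    # A non-empty collection needs the hashed index in place before it can be sharded.
    coll.create_index([("CITY", "hashed")])
    client.admin.command("enableSharding", "test")
    client.admin.command("shardCollection", "test.places", key={"CITY": "hashed"})

    after = count_null_city(coll)
    return expected, before, after
```

Calling `reproduce()` against a disposable cluster returns the expected count alongside the counts observed before and after sharding; the report above corresponds to the third value dropping to zero while the first two agree.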

Activity

This is not code I'm super familiar with, but it looks like running the shardCollection command on mongos sends out checkShardingIndex commands to the mongods. checkShardingIndex then checks for index keys where the shard key is null, which would indicate that a shard key field may be absent from a document. From CheckShardingIndex::run():

    if ( currKeyElt.type() && currKeyElt.type() != jstNULL )
        continue;

Hash indexes don't store a key of null for a missing field, but instead they store the hash of null. Missing values cannot be identified by the presence of null index keys in the current hash index implementation.
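The point above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: `toy_hash` is a stand-in built on MD5 of the value's `repr`, not MongoDB's actual hash function, and the helper names are hypothetical. It shows why a check that looks for null (or absent) index keys flags a missing field in an ordinary index but never in a hashed one.

```python
import hashlib


def toy_hash(value):
    """Stand-in hash (NOT MongoDB's real function): 64-bit int from MD5 of repr(value)."""
    digest = hashlib.md5(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little", signed=True)


def plain_index_keys(docs, field):
    """Ordinary index model: a missing field is stored as a null (None) key."""
    return [doc.get(field) for doc in docs]


def hashed_index_keys(docs, field):
    """Hashed index model: a missing field is stored as the hash of null."""
    return [toy_hash(doc.get(field)) for doc in docs]


def has_null_key(keys):
    """Mimics the quoted check: report any index key that is absent or null."""
    return any(key is None for key in keys)


docs = [{"_id": 1, "CITY": "Oslo"}, {"_id": 2}]  # second document lacks CITY

plain_flagged = has_null_key(plain_index_keys(docs, "CITY"))
hashed_flagged = has_null_key(hashed_index_keys(docs, "CITY"))
```

In this model `plain_flagged` is true and `hashed_flagged` is false: every hashed key is an integer, so the null test has nothing to catch and the missing field goes unnoticed.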

Aaron Staple (Inactive)
added a comment - Mar 02 2013 05:43:51 AM +00:00 - edited