Navicat Blog

Specifying Collation in MongoDB (Part 1)

Oct 3, 2018 by Robert Gravelle

Collation refers to a set of language-specific rules for string comparison, such as rules for lettercase and accent marks. Run-of-the-mill sorting is fine for simple entries made up of plain alphanumeric characters. Once your data includes special characters such as @, #, $, and %, or accented letters such as è, é, ê, and ö, it becomes important to specify your own collation.

MongoDB added collation support in version 3.4. You can specify a collation for a collection or a view, for an index, or for individual operations that support collation, such as find() and aggregate().

Today's blog will provide a brief introduction to the concept of collation, cover the fields that govern collation in MongoDB, and show how to specify collation using the Navicat for MongoDB GUI administration and development tool. We'll get into the specifics of the first three fields today, and the rest will be described in part 2.

Collation Document Fields

To use collation options other than the default, you specify a Collation Document, a small document whose fields control how strings are compared.

You can see the same fields represented in Navicat on the Collation tab:

Of all the collation fields, only locale is mandatory. All of the other fields are optional.
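Expressed in code, a collation document is just a small set of key/value pairs. The following Python sketch builds one as a plain dict using a hypothetical helper, `make_collation()`, that always sets the mandatory `locale` field and adds `caseLevel` and `caseFirst` only when you pass them, so anything you leave out falls back to the server's defaults. The field names are MongoDB's; the helper and its validation are assumptions for illustration, and the fields covered in part 2 are left out. A document of this shape is what you would supply as the collation for an operation such as `find()` or `aggregate()`.

```python
from typing import Optional

# caseFirst values described in this article.
CASE_FIRST_VALUES = ("upper", "lower", "off")


def make_collation(locale: str,
                   case_level: Optional[bool] = None,
                   case_first: Optional[str] = None) -> dict:
    """Hypothetical helper: build a MongoDB-style collation document.

    Only `locale` is mandatory; optional fields are added only when given,
    so anything omitted is left to the server's defaults.
    """
    if not locale:
        raise ValueError("locale is the one mandatory collation field")
    doc = {"locale": locale}
    if case_level is not None:
        doc["caseLevel"] = bool(case_level)
    if case_first is not None:
        if case_first not in CASE_FIRST_VALUES:
            raise ValueError(f"caseFirst must be one of {CASE_FIRST_VALUES}")
        doc["caseFirst"] = case_first
    return doc


print(make_collation("fr_CA"))
# {'locale': 'fr_CA'}
print(make_collation("en", case_level=True, case_first="upper"))
# {'locale': 'en', 'caseLevel': True, 'caseFirst': 'upper'}
```

Leaving optional fields out entirely, rather than writing explicit defaults, keeps the document minimal and lets the server apply its own defaults for the chosen locale.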

Now let's take a closer look at the first three fields and the values each one accepts:

Locale:

A locale identifies a specific user community, i.e., a group of individuals who share a similar culture and language idioms. In practice, a community usually corresponds to the people who speak the same language and live in the same country. For example, the French locale for France is distinct from the French locale for Canada: "fr" is the locale code for French as used in France, while "fr_CA" appends the two-letter country code for Canada. While the two locales have many similarities, there are some differences, such as currency, which is the euro (€) in France and the Canadian dollar ($) in Canada.

As you might imagine, there are numerous locales. Navicat's Locale dropdown contains many of the more common ones. The first item in the list, "simple", specifies a simple binary comparison of strings rather than any language-specific rules. You can also type your own locale code into the textbox portion of the dropdown.
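To see what a binary comparison does to mixed data, the following sketch uses Python's built-in string ordering, which compares Unicode code points. Because UTF-8 encoding preserves code-point order, sorting the encoded bytes gives the same result, so this is used here as a stand-in for the "simple" option. That is an approximation for illustration, not MongoDB's own code.

```python
# Python orders str values by Unicode code point. UTF-8 preserves that
# order, so sorting the encoded bytes gives the same result; here that
# stands in for a "simple" binary comparison.
names = ["brown", "böhm", "Bailey", "boffey"]

code_point_order = sorted(names)
utf8_byte_order = sorted(names, key=lambda s: s.encode("utf-8"))

print(code_point_order)
# ['Bailey', 'boffey', 'brown', 'böhm']
print(code_point_order == utf8_byte_order)
# True
```

Note how the capitalized "Bailey" sorts ahead of every lowercase name, and "böhm" lands after "brown" because the code point for ö (U+00F6) is greater than that of any unaccented ASCII letter.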

Sorting Differences Between Languages

With regard to sorting, every language has its own sort order, and sometimes even multiple sort orders. Here's how the same names would be sorted under different locales:

English (en): bailey, boffey, böhm, brown

German (de_DE): bailey, boffey, böhm, brown

German phonebook (de-DE_phonebook): bailey, böhm, boffey, brown

Swedish (sv_SE): bailey, boffey, brown, böhm
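The differences come down to how each locale treats ö. The Python sketch below reproduces the orderings above with hand-written sort keys: English and standard German treat ö as an accented o, the phonebook variant expands it to "oe", and Swedish places it after z as a separate letter. This is a simplified model written for illustration; MongoDB itself relies on the ICU library for collation, and the helper names here are hypothetical.

```python
import unicodedata

SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"  # å, ä, ö follow z


def strip_accents(s: str) -> str:
    # Decompose (ö -> o + combining diaeresis) and drop the combining marks.
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))


def english_key(s: str):
    # ö is an accented o: compare base letters first, and use the original
    # string only to break ties (accents as a secondary difference).
    return (strip_accents(s).lower(), s)


def german_phonebook_key(s: str):
    # Phonebook style expands umlauts: ä -> ae, ö -> oe, ü -> ue.
    s = unicodedata.normalize("NFC", s.lower())
    expanded = s.replace("ä", "ae").replace("ö", "oe").replace("ü", "ue")
    return (expanded, s)


def swedish_key(s: str):
    # Position of each letter in the Swedish alphabet. Only handles letters
    # listed in SWEDISH_ALPHABET (str.index raises ValueError otherwise).
    s = unicodedata.normalize("NFC", s.lower())
    return [SWEDISH_ALPHABET.index(c) for c in s]


names = ["brown", "böhm", "bailey", "boffey"]
for label, key in [("English / German", english_key),
                   ("German phonebook", german_phonebook_key),
                   ("Swedish", swedish_key)]:
    print(f"{label}: {', '.join(sorted(names, key=key))}")
# English / German: bailey, boffey, böhm, brown
# German phonebook: bailey, böhm, boffey, brown
# Swedish: bailey, boffey, brown, böhm
```

A real collation handles far more than a single letter, but the sketch shows why the same data can legitimately sort three different ways.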

Case Level:

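A flag that determines whether to include case comparison at strength levels 1 and 2, which otherwise ignore case (strength is one of the fields covered in part 2).

If true, include case comparison.

If false (the default), do not include case comparison.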
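Case level only has a visible effect when a comparison would otherwise ignore case, for example at the primary comparison level (strength 1), which looks only at base letters. The sketch below models that situation in Python: `primary_equal()` is a hypothetical helper that ignores accents and case, and turning on `case_level` adds a case comparison on top. It is a simplified illustration of the behavior, not ICU's actual algorithm.

```python
import unicodedata


def _letters(s: str) -> list:
    # Base characters with accents (combining marks) removed.
    return [c for c in unicodedata.normalize("NFD", s)
            if not unicodedata.combining(c)]


def base_letters(s: str) -> str:
    # Primary-level view: no accents, no case.
    return "".join(_letters(s)).lower()


def case_pattern(s: str) -> tuple:
    # One flag per base letter: True where the letter is uppercase.
    return tuple(c.isupper() for c in _letters(s))


def primary_equal(a: str, b: str, case_level: bool = False) -> bool:
    """Hypothetical strength-1 comparison, optionally with caseLevel."""
    if base_letters(a) != base_letters(b):
        return False
    # caseLevel adds a case comparison; accents are still ignored.
    return not case_level or case_pattern(a) == case_pattern(b)


print(primary_equal("Cafe", "café"))                   # True
print(primary_equal("Cafe", "café", case_level=True))  # False: C vs c
print(primary_equal("cafe", "café", case_level=True))  # True: accent ignored
```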

Case First:

A field that determines the sort order of case differences, that is, which of two strings that differ only in case sorts first. Values include:

"upper": Uppercase sorts before lowercase.

"lower": Lowercase sorts before uppercase.

"off": Default value. Similar to "lower", but with slight differences.

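The effect of caseFirst shows up when two strings differ only by case. In the simplified Python model below, strings are first compared case-insensitively, and ties are then broken by case, with uppercase or lowercase winning depending on the setting. The function name is hypothetical, and "off" is deliberately not modeled, since this sketch does not capture its slight differences from "lower".

```python
def case_first_key(s: str, case_first: str):
    """Hypothetical sort key: compare ignoring case, then break ties by case.

    Models only "upper" and "lower"; "off" is not modeled here.
    """
    if case_first == "upper":
        # False sorts before True, so uppercase letters win ties.
        tie_break = tuple(not c.isupper() for c in s)
    elif case_first == "lower":
        # Lowercase letters (isupper() is False) win ties.
        tie_break = tuple(c.isupper() for c in s)
    else:
        raise ValueError("this sketch only models 'upper' and 'lower'")
    return (s.lower(), tie_break)


words = ["banana", "Apple", "apple", "Banana"]
print(sorted(words, key=lambda w: case_first_key(w, "upper")))
# ['Apple', 'apple', 'Banana', 'banana']
print(sorted(words, key=lambda w: case_first_key(w, "lower")))
# ['apple', 'Apple', 'banana', 'Banana']
```

Because case only breaks ties, "apple" never jumps ahead of "banana" in either mode; caseFirst only decides the order within each group of case variants.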
Conclusion

In today's blog, we were introduced to the concept of collation, covered the fields that govern collation in MongoDB, and learned how to specify collation for MongoDB using Navicat for MongoDB. Having familiarized ourselves with the first three Collation Document fields, we'll move on to the last five fields in part 2.