Appendix:Unicode normalization

Wikimedia, along with most servers on the internet, stores Unicode strings in the form called NFC, or Normalization Form C (Canonical Composition). This means that several different but canonically equivalent Unicode strings are often mapped to the same normalized form.
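As a minimal illustration of this mapping, the following Python sketch uses the standard `unicodedata` module. The choice of "é" as the sample character is an assumption made for the example; it is not taken from this page.

```python
import unicodedata

# Two different code point sequences for the same visible letter.
composed = "\u00e9"      # é as a single precomposed code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

nfc_composed = unicodedata.normalize("NFC", composed)
nfc_decomposed = unicodedata.normalize("NFC", decomposed)

print(composed == decomposed)          # compares the raw strings
print(nfc_composed == nfc_decomposed)  # compares their NFC forms
print([f"U+{ord(c):04X}" for c in nfc_decomposed])
```

The raw strings differ, while both normalize to the single code point U+00E9.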

In a number of common cases, Unicode's canonical ordering of two diacritics is counterintuitive, and/or interoperates poorly with certain existing software. In other, less common cases, the problem is that the diacritics should not have a canonical ordering, because the two orderings are not actually equivalent (that is, the two diacritics should have the same value for the Canonical_Combining_Class (ccc) property, but instead they have different ones). For example, Hebrew לִַ ("lai") is mistakenly normalized to לִַ ("lia").
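The Hebrew example can be reproduced with Python's `unicodedata` module. This is a sketch that assumes the intended sequence is lamed, patah, hiriq (reading "lai"); the code points below are this example's interpretation of the characters shown above.

```python
import unicodedata

LAMED = "\u05dc"  # HEBREW LETTER LAMED
HIRIQ = "\u05b4"  # HEBREW POINT HIRIQ, the vowel "i"
PATAH = "\u05b7"  # HEBREW POINT PATAH, the vowel "a"

# The two vowel points carry different Canonical_Combining_Class values,
# so normalization sorts them into one fixed order.
ccc_hiriq = unicodedata.combining(HIRIQ)
ccc_patah = unicodedata.combining(PATAH)

typed = LAMED + PATAH + HIRIQ                     # assumed input: "lai"
normalized = unicodedata.normalize("NFC", typed)  # reordered by ccc

print(ccc_hiriq, ccc_patah)
print([unicodedata.name(c) for c in normalized])
```

Because hiriq has the lower class, it is moved in front of patah, which yields the "lia" order regardless of how the text was typed.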

Because the conversion is automatic, pages cannot exist for a non-NFC form. An explicit link to a non-NFC form, such as Å (U+212B, the angstrom sign), is always displayed as a red link, but clicking it takes the user to the NFC page Å (U+00C5).
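The effect on page titles can be approximated as follows. The function `resolve_title` is a hypothetical helper written for this sketch; it only applies NFC and does not reproduce any other title processing the wiki software performs.

```python
import unicodedata

def resolve_title(title: str) -> str:
    """Hypothetical helper: return the NFC form a title would be stored under."""
    return unicodedata.normalize("NFC", title)

angstrom_sign = "\u212b"  # ANGSTROM SIGN, a non-NFC character
resolved = resolve_title(angstrom_sign)

print(f"U+{ord(angstrom_sign):04X} -> U+{ord(resolved):04X}")
print(unicodedata.name(resolved))
```

Under this sketch, both U+212B and the sequence A plus U+030A (combining ring above) resolve to the same title, U+00C5.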

One can display non-NFC characters on a page using {{HTML char}} (for example, {{HTML char|212B}} will show Å). To note canonical equivalence between two single characters, use {{normalization}} in the caption field of the appropriate {{character info}} template on the NFC character (see Å for an example). To note that the NFC of a precomposed character is a decomposition, use {{decomposed}} in the caption field of the appropriate {{character info}} template on the NFC decomposition (see क़ for an example).
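The two cases can be told apart programmatically. The sketch below uses a hypothetical function, `nfc_relation`, whose return values borrow the template names as labels; the mapping from each case to a template is this example's reading of the paragraph above, not a documented rule.

```python
import unicodedata

def nfc_relation(char: str) -> str:
    """Hypothetical helper: classify how a single character relates to its NFC."""
    nfc = unicodedata.normalize("NFC", char)
    if nfc == char:
        return "already NFC"
    if len(nfc) == 1:
        # Canonically equivalent to one other single character.
        return "normalization"
    # The NFC of this precomposed character is a multi-character sequence.
    return "decomposed"

for sample in ("\u212b", "\u0958", "\u00c5"):
    nfc = unicodedata.normalize("NFC", sample)
    codes = " ".join(f"U+{ord(c):04X}" for c in nfc)
    print(f"U+{ord(sample):04X}: {nfc_relation(sample)} ({codes})")
```

Here U+212B (angstrom sign) falls into the first case, while U+0958 (Devanagari letter qa) falls into the second, because its NFC is the two-character sequence U+0915 U+093C.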