Monthly Archives: January 2013

Java has a built-in java.text.Normalizer class to transform Unicode text into an equivalent composed or decomposed form. Dafuq? The letter ‘Á’ can be represented in a composed form, U+00C1 LATIN CAPITAL LETTER A WITH ACUTE, and in a decomposed form, U+0041 LATIN CAPITAL LETTER A followed by U+0301 COMBINING ACUTE ACCENT. Normalizer handles this for you: import java.text.Normalizer; […]
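
To make the composed/decomposed distinction concrete, here is a minimal sketch of a round trip for ‘Á’. The class name NormalizerSketch and its two helper methods are made up for illustration; only java.text.Normalizer and its Form constants come from the JDK.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
class NormalizerSketch {

    // NFD = canonical decomposition: U+00C1 becomes U+0041 followed by U+0301.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // NFC = canonical decomposition followed by canonical composition.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00C1";              // 'Á' as a single code point
        String decomposed = decompose(composed); // 'A' plus combining acute accent

        System.out.println(composed.length());                    // 1
        System.out.println(decomposed.length());                  // 2
        System.out.println(composed.equals(decomposed));          // false
        System.out.println(compose(decomposed).equals(composed)); // true
    }
}
```

Both strings render the same on screen, but equals() compares them code unit by code unit, so it usually makes sense to normalize both sides to the same form before comparing or storing text.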

UTF-8 has always been a multi-byte encoding, but you probably only had to handle characters from the 16-bit Basic Multilingual Plane, which take at most 3 bytes in UTF-8. With the rise of emojis, 4-byte characters rose as well, so handling 4-byte UTF-8 characters is of interest not only for exotic languages but also for the needs of average users who want […]
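
As a rough sketch of what a 4-byte character looks like from Java, the snippet below measures the UTF-8 length of a few strings and checks for code points outside the Basic Multilingual Plane. The class Utf8FourByteSketch and its methods are hypothetical names chosen for this example, and it assumes Java 8 or later for String.codePoints().

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8FourByteSketch {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the string contains a code point above U+FFFF,
    // i.e. one that needs 4 bytes in UTF-8 (and a surrogate pair in Java).
    static boolean hasFourByteChars(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String accented = "\u00C1";    // 'Á', U+00C1
        String euro = "\u20AC";        // '€', U+20AC
        String emoji = "\uD83D\uDE00"; // U+1F600 written as a surrogate pair

        System.out.println(utf8Length(accented)); // 2
        System.out.println(utf8Length(euro));     // 3
        System.out.println(utf8Length(emoji));    // 4

        System.out.println(emoji.length());                           // 2 (UTF-16 code units)
        System.out.println(emoji.codePointCount(0, emoji.length()));  // 1 (actual character)
        System.out.println(hasFourByteChars(emoji));                  // true
    }
}
```

A check like hasFourByteChars can serve as a guard before handing text to a component that only copes with up to 3 bytes per character; whether to reject, strip, or pass such input through is a design decision the sketch leaves open.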