Tags with Accents are Incorrectly Sorted in Cloud

Description

By accents, I mean letters like "é", "ü", "ç", "î", etc.

Currently, in a tag cloud, the word "égypte" gets sorted AFTER the word "zimbabwe". This is incorrect alphabetical sorting. A correct sorting would consider "é" as equivalent to "e", and place "égypte" BEFORE the word "zimbabwe".

"Differences between computer numeric sorting and alphabetic sorting occur in Danish and Norwegian (aa is ordered at the end of the alphabet when it is pronounced like å, and at the start of the alphabet when it is pronounced like a), German (ß is ordered as s + s; ä, ö, ü are ordered as a + e, o + e, u + e in phone books, but as o elsewhere, and behind o in Austria), Icelandic (ð follows d), Dutch (ij is sometimes ordered as y; see IJ: Collation), English (æ is ordered as a + e), and many other languages."

Imho, the only reasonable approach is to leave this up to PHP and MySQL respectively, and file bugs on their end when natsort() and ORDER BY clauses don't work as expected on servers with a properly configured locale. There's no way we'll get this to function as end users would expect in WP without excessive amounts of number crunching.

So, couldn't WordPress read that WPLANG constant and modify the sorting accordingly? This is beyond my competence, but I found some PHP code that seems to do that, using the "Collator" class.
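
The snippet referred to above is not reproduced in this ticket. As a rough sketch only, a Collator-based sort might look like the following. It assumes the PHP intl extension is installed, and it uses 'fr_FR' as a stand-in for whatever value WPLANG holds. The function name sort_tag_names_for_locale() is hypothetical and is not part of WordPress.

```php
<?php
// Sketch only: requires the PHP intl extension (which provides Collator).
// sort_tag_names_for_locale() is a hypothetical helper, not a WordPress function.
function sort_tag_names_for_locale( array $names, $locale ) {
    $collator = new Collator( $locale );
    // Collator::sort() sorts the array in place using the locale's collation rules.
    $collator->sort( $names );
    return $names;
}

// 'fr_FR' is an assumed example of a WPLANG-style locale code.
print_r( sort_tag_names_for_locale( array( 'zimbabwe', 'égypte', 'france' ), 'fr_FR' ) );
// Expected order: égypte, france, zimbabwe
```

With French collation rules, "é" is compared as a variant of "e", so "égypte" lands before "france" and "zimbabwe".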

Sadly not. Setting the locale at the PHP level is not thread safe in PHP at the time of writing this, so it's generally not acceptable for hosts to allow end users to tweak the setting. In other words, either we implement UTF-8 collation rules in WP (which is insanely complex, since it depends on heaps of things) or we have to deal with things as they're returned by strnatcasecmp(), which is acceptable for English but, as you point out, has a few issues with other locales.
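
For reference, the strnatcasecmp() behaviour described here can be reproduced in isolation. This is a minimal sketch, assuming the tag names are UTF-8 strings. The helper name sort_tag_names_bytewise() is made up for illustration and does not reflect how WP itself is structured.

```php
<?php
// Sketch only: sort_tag_names_bytewise() is a hypothetical helper.
function sort_tag_names_bytewise( array $names ) {
    // strnatcasecmp() compares the strings byte by byte. In UTF-8, "é" is
    // encoded as 0xC3 0xA9, and 0xC3 is greater than the byte for "z" (0x7A).
    usort( $names, 'strnatcasecmp' );
    return $names;
}

print_r( sort_tag_names_bytewise( array( 'zimbabwe', 'égypte', 'france' ) ) );
// Resulting order: france, zimbabwe, égypte
```

Any name starting with an accented letter ends up after every name starting with a plain ASCII letter, which is the ordering reported in this ticket.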

I'm attaching a few PostgreSQL functions in case there's interest in trying to work around this at the MySQL level.