I’m not sure I have the best feelings about where this article is going so far. The beginning seems to mix up UTF-32 and UTF-8 in several places. Also the claim that Linux is fully UCS-4 is false, as is Linux not being locale-dependent.

Perhaps the author has their own mental model of how Unicode, transformation formats and wide characters work, but the explanation here doesn’t give me a ton of confidence.

That said, I totally agree with the advice so far about creating a width barrier at the edge of your app, and ensuring that you are consistent internally. This makes it easier to port code to systems like Windows.
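To make the "width barrier" idea concrete, here is a rough sketch in C of what the input edge might look like: bytes come in as UTF-8, get decoded once into fixed-width UTF-32 code points, and everything past that point only sees the wide form. The function name `utf8_to_utf32` and its signature are made up for illustration and are not taken from the article. It deliberately avoids `mbstowcs` so that the result does not depend on the current locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical edge-of-app helper (name and signature are illustrative).
 * Decodes in_len bytes of UTF-8 into UTF-32 code points.
 * Returns the number of code points written, or (size_t)-1 if the input
 * is malformed or out_cap is too small. */
size_t utf8_to_utf32(const unsigned char *in, size_t in_len,
                     uint32_t *out, size_t out_cap)
{
    /* Smallest code point that legitimately needs 1, 2, 3, 4 bytes. */
    static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    size_t i = 0, n = 0;

    assert(in != NULL && out != NULL);

    while (i < in_len) {
        unsigned char b = in[i++];
        uint32_t cp;
        size_t extra, k;

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;      /* stray continuation or invalid lead byte */

        if (in_len - i < extra)
            return (size_t)-1;       /* sequence truncated at end of input */

        for (k = 0; k < extra; k++) {
            unsigned char c = in[i++];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;   /* not a continuation byte */
            cp = (cp << 6) | (uint32_t)(c & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (n == out_cap)
            return (size_t)-1;       /* output buffer too small */
        out[n++] = cp;
    }
    return n;
}
```

The matching encoder would sit at the output edge. On Windows the same boundary would presumably convert to UTF-16 for the `W` APIs instead, which is why keeping the conversion in one place helps with porting.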

Also the claim that Linux is fully UCS-4 is false, as is Linux not being locale-dependent.

If I write GNU/Linux, will it make you feel better? It is a glibc fact, and AFAIK other C libraries on Linux share it: wide characters are locale-independent (though perhaps Han unification still makes it lossy).
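For anyone who wants to check this on their own system, here is a small sketch. The C standard says that if the implementation defines `__STDC_ISO_10646__`, then `wchar_t` values are ISO 10646 code points regardless of the current locale. The helper name `wchar_is_iso10646` is made up for illustration, and what it returns depends entirely on your compiler and C library.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (illustrative name): returns 1 if the implementation
 * promises that wchar_t holds ISO 10646 code points in every locale,
 * 0 if it makes no such promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Width of wchar_t in bytes; 4 is what you would expect for UCS-4. */
size_t wchar_width_bytes(void)
{
    return sizeof(wchar_t);
}
```

Note that the macro only covers how `wchar_t` values are interpreted. Conversions from narrow multibyte strings, such as `mbstowcs`, still go through the locale's encoding.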