ACME Updates

10mar2019 Unicode in JavaScript

I was doing some Unicode stuff in JavaScript today.
I needed to extract the code points from a string.
You might think that the way to get the code point
at a given position in a string is:

cp = str.codePointAt( i );

Hah hah, no.
That only works while every character fits in the 16-bit Basic Multilingual Plane.
The index you pass to codePointAt() counts UTF-16 code units, not characters,
so anything outside the BMP, such as many emoji, takes up two indexes,
and walking the string by index lands you on trailing surrogates and you get garbage.
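
Something like this shows the failure (😀 is U+1F600, which UTF-16
encodes as the surrogate pair 0xD83D 0xDE00):

str = "😀";
str.length;            // 2, because length counts UTF-16 code units
str.codePointAt( 0 );  // 128512 (0x1F600): index 0 is the lead surrogate, so this one is fine
str.codePointAt( 1 );  // 56832 (0xDE00): a lone trailing surrogate, i.e. garbage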

To handle code points above U+FFFF you instead do like so:

chars = Array.from( str );
cp = chars[i].codePointAt( 0 );
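
With that, the emoji case comes out right:

chars = Array.from( "a😀b" );   // [ "a", "😀", "b" ], three elements, not four
chars[1].codePointAt( 0 );      // 128512 (0x1F600), the real code point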

Array.from() knows how to correctly split a string into individual code points.
Why does something as generically named as Array.from() have intimate
knowledge of Unicode?
It turns out it doesn't: Array.from() just consumes any iterable, and strings
iterate by code point (that's what String.prototype[Symbol.iterator] yields),
so each element of the array is one whole character, surrogate pair and all.
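
You can see the same iterator at work in a plain for...of loop, which is
essentially what Array.from() is doing with a string:

for ( const ch of "a😀b" ) {
    console.log( ch.codePointAt( 0 ) );   // prints 97, then 128512, then 98
}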

And why does codePointAt() correctly handle high-plane code points
here but not before?
There is no invisible encoding flag: the single-character strings produced by
Array.from() are still plain UTF-16 strings, two code units long for an emoji.
The difference is that index 0 always points at the lead surrogate, so
codePointAt( 0 ) sees the whole pair and combines it, whereas indexing into
the original string can land in the middle of a pair.
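
Which is easy to check:

ch = Array.from( "😀" )[0];
ch.length;             // still 2: the same two code units, just isolated
ch.codePointAt( 0 );   // 128512, because index 0 is the lead surrogate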

Anyway, that's how you extract Unicode code points in JavaScript.
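
If you do this a lot you might wrap it all up in a small helper
(codePoints() here is just a name I made up, not anything built in):

function codePoints( str ) {
    // Array.from() takes a map function as its second argument,
    // so this splits by code point and converts in one pass.
    return Array.from( str, ch => ch.codePointAt( 0 ) );
}

codePoints( "a😀b" );   // [ 97, 128512, 98 ]
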
Thank you for coming to my TED talk.