I was curious to see if they did something special when they detect a page is using AMP (spoiler alert: they do not), so I quickly hacked together a fake AMP page that seemingly fulfilled their simple test.

<html ⚡️><body>Fake AMP</body></html>

I am a big emoji fan, so instead of the <html amp> variant, I went for the <html ⚡> variant and entered the ⚡ via the macOS emoji picker. To my surprise, Facebook logged "FBNavAmpDetect: false". Huh 🤷‍♂️?

My first reaction was: <html ⚡️> does not quite look like what the founders of HTML had in mind, so maybe hasAttribute() is specified to return false when an attribute name is invalid. But what even is a valid attribute name? I consulted the HTML spec where it says (emphasis mine):

Attribute names must consist of one or more characters other than controls, U+0020 SPACE, U+0022 ("), U+0027 ('), U+003E (>), U+002F (/), U+003D (=), and noncharacters. In the HTML syntax, attribute names, even those for foreign elements, may be written with any mix of ASCII lower and ASCII upper alphas.

I was on company chat with Jake Archibald at that moment, so I asked him to confirm my reading of the spec that ⚡ is not a valid attribute name. Turns out, it is a valid name, but the spec is formulated in an ambiguous way, so Jake filed the issue "HTML syntax" attribute names. And with that, my lead to a rational explanation was gone.
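To make the quoted rule concrete, here is a minimal sketch of it in JavaScript. The helper name isValidAttributeName is hypothetical and not part of any browser API, and I am assuming the usual definitions of "controls" (U+0000 to U+001F and U+007F to U+009F) and "noncharacters" (U+FDD0 to U+FDEF, plus every code point ending in FFFE or FFFF). Under that reading, both High Voltage variants pass.

```javascript
// Hypothetical helper: checks a name against the rule quoted from the HTML spec.
// A sketch under the assumptions stated above, not a validator.
function isValidAttributeName(name) {
  // "One or more characters": the empty string is not a valid name.
  if (name.length === 0) return false;

  // for...of iterates by code point, not by UTF-16 code unit.
  for (const ch of name) {
    const cp = ch.codePointAt(0);
    // Assumed definition of "controls": C0 controls and U+007F to U+009F.
    const isControl = cp <= 0x1f || (cp >= 0x7f && cp <= 0x9f);
    // SPACE, ", ', >, /, and = are excluded explicitly.
    const isForbiddenAscii = ' "\'>/='.includes(ch);
    // Assumed definition of "noncharacters": U+FDD0 to U+FDEF and xxFFFE/xxFFFF.
    const isNoncharacter =
      (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
    if (isControl || isForbiddenAscii || isNoncharacter) return false;
  }
  return true;
}

console.log(isValidAttributeName('amp'));          // true
console.log(isValidAttributeName('\u26A1'));       // true (High Voltage)
console.log(isValidAttributeName('\u26A1\uFE0F')); // true (with Variation Selector-16)
console.log(isValidAttributeName('a=b'));          // false
```

The sentence about ASCII lower and upper alphas does not appear in the check at all, which matches the clarified reading: it is about case, not about which characters are allowed.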

Luckily a valid AMP boilerplate example was just a quick Web search away, so I copy-pasted the code and Facebook, as expected, reported "FBNavAmpDetect: true". I reduced the AMP boilerplate example until it looked like my fake AMP page, yet Facebook still detected the modified boilerplate as AMP while failing to detect mine. Essentially my experiment looked like the code sample below. Perfect Heisenbug?

An invisible code point which specifies that the preceding character should be displayed with emoji presentation. Only required if the preceding character defaults to text presentation.

You may have seen this in effect with the Unicode snowman that appears in a textual ☃︎ as well as in an emoji representation ☃️ (depending on the device you read this on, they may both look the same). As far as I can tell, Chrome DevTools prefers to always render the textual variant, as you can see in the screenshot above. But with the help of the length property and the charCodeAt() function, the difference becomes visible.
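Here is a small sketch of that check. The two strings are written as escape sequences so that the invisible code point cannot get lost in copy and paste; the codeUnits helper is a hypothetical convenience of mine, not a built-in.

```javascript
// U+26A1 HIGH VOLTAGE SIGN on its own.
const bare = '\u26A1';
// The same character followed by U+FE0F VARIATION SELECTOR-16.
const withVs16 = '\u26A1\uFE0F';

// Hypothetical helper: list the UTF-16 code units of a string as 4-digit hex.
function codeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i).toString(16).toUpperCase().padStart(4, '0'));
  }
  return units;
}

console.log(bare.length, codeUnits(bare));         // 1, one unit: 26A1
console.log(withVs16.length, codeUnits(withVs16)); // 2, two units: 26A1 and FE0F
console.log(bare === withVs16);                    // false
```

The two strings may render identically, but they are different strings, so any exact comparison of attribute names treats them as different attributes.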

The macOS emoji picker creates the variant ⚡️, which includes the Variation Selector-16, but AMP requires the variant without, which I have also confirmed in the validator code. You can see in the screenshot below how the AMP Validator rejects one of the two High Voltage symbols.

I have filed crbug.com/1033453 against Chrome DevTools, asking that it render the characters differently depending on whether the Variation Selector-16 is present or not. Further, I have opened a feature request on the AMP Project repo asking that AMP accept ⚡️ in addition to ⚡. Same same, but different.
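For illustration only, this is roughly what a tolerant check could look like. It is not how the AMP Validator or Facebook's detection works; the function names stripVs16 and hasAmpMarker are hypothetical, and the idea of simply dropping U+FE0F before comparing is my assumption of one possible approach.

```javascript
// Hypothetical helper: remove every U+FE0F VARIATION SELECTOR-16 from a name.
function stripVs16(name) {
  return name.replace(/\uFE0F/g, '');
}

// Hypothetical check: does a list of attribute names contain "amp" or the
// High Voltage sign, with or without a trailing Variation Selector-16?
function hasAmpMarker(attributeNames) {
  return attributeNames.some((name) => {
    const normalized = stripVs16(name).toLowerCase();
    return normalized === 'amp' || normalized === '\u26A1';
  });
}

console.log(hasAmpMarker(['lang', '\u26A1']));       // true
console.log(hasAmpMarker(['lang', '\u26A1\uFE0F'])); // true
console.log(hasAmpMarker(['lang', 'amp']));          // true
console.log(hasAmpMarker(['lang']));                 // false
```

In a browser, the list could come from document.documentElement.getAttributeNames(); the sketch takes a plain array so that it also runs outside of a page.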

Webmentions

14 Replies

Note that macOS is disagreeing with UTS51 here: “only fully-qualified emoji zwj sequences should be generated by keyboards and other user input devices.” unicode.org/reports/tr51/#… (Time to change the “should” to “must”?)

The variation selectors are part of a family of invisible unicode codepoints that aren't whitespace or control codes. They're classified as "default ignorable". It might be good to show them all as red dots? unicode.org/Public/12.1.0/…

This was my initial misreading as well, but as @jaffathecake wrote in github.com/whatwg/html/is…, it should probably say the following instead: "If the document is an HTML document, attribute names are considered equivalent if they are an ASCII case-insensitive match".

Nice read! But I don't get the "Turns out, it is a valid name..." bit. "Any mix of ASCII lower and ASCII upper alphas" means that 122 (z) is the highest decimal code point allowed. Which makes 9889 (⚡) invalid, right?