
Why recommendation

On the topic of making Polyglot Markup a recommendation:

While we might not recommend that all authors (or in fact that many authors) should create Polyglot documents, we should Recommend that when authors want to create polyglot markup they do so by following the authoring requirements outlined in this specification. In fact, the introduction of the Polyglot spec does state that

"All web content need not be authored in polyglot markup."

The Polyglot spec does include normative language that can be followed precisely and a document can be measured objectively against those requirements to see if it successfully adheres to the polyglot markup rules. It is possible to build a validator that checks a document to see if it is a valid polyglot document according to the polyglot spec and to identify the normative requirements that have been violated should the validation fail.
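As a rough illustration of what such objective checking could look like, here is a minimal Python sketch. It covers only a handful of rules mentioned on this page (UTF-8, no XML declaration, the `<!DOCTYPE html>` doctype, XML well-formedness); the function name `check_polyglot` and the wording of the messages are assumptions made for this example, and a real validator would have to cover every normative requirement of the spec.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def check_polyglot(raw: bytes) -> list[str]:
    """Return the (illustrative) rules a document violates; empty if none found."""
    # Rule: the document must be encoded as UTF-8.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]

    problems = []

    # Rule: no XML declaration (ignore a leading byte order mark, if any).
    if text.lstrip("\ufeff").startswith("<?xml"):
        problems.append("XML declaration present")

    # Rule: the HTML5 doctype must be present.
    if "<!DOCTYPE html>" not in text:
        problems.append("missing <!DOCTYPE html>")

    # Rule: the document must be well-formed XML with an XHTML root element.
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
        return problems
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("root element is not html in the XHTML namespace")

    return problems


GOOD = b"""<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><meta charset="utf-8"/><title>Demo</title></head>
<body><p>Hello<br/>world</p></body>
</html>"""

if __name__ == "__main__":
    print(check_polyglot(GOOD))
    print(check_polyglot(b'<?xml version="1.0" encoding="UTF-8"?>\n' + GOOD))
```

Each returned string names one violated rule, which mirrors the idea that a failed validation should identify the requirement that was broken.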

There is precedent for guidance documents including authoring guidance to be published as Recommendations. For example:

Why normative language

On the topic of using normative language in the specification:

The purpose for using normative language in the specification is to make it clear which parts are necessary to conform to the specification and which parts provide advisory or informative content. The polyglot spec is intended to make it possible to objectively determine if a document adheres to its requirements and therefore it is appropriate to differentiate between normative and informative parts.

One lesson from XHTML 1.0’s Appendix C is that vagueness and lack of normative status hurt. While XHTML 1.0 section 5 said that Appendix C was normative, Appendix C itself was only informative, which was confusing. That, combined with vague and somewhat convoluted rules that appeared quite permissive, and perhaps a lack of clear and well-motivated principles behind the Appendix C rules, led to confusion and misunderstanding. To avoid a repeat, clear, normative and conformance-checkable rules are needed.

Authors are going to use XHTML5 syntax for text/html, hence it is of benefit that there is a normative spec for how to do it.

Why good value

On the topic of the value of promoting polyglot markup:

Subsetting is a well-known method for emphasizing, and benefiting from, the good parts of a computer language. Example: The author of “JavaScript: The Good Parts” also created ADsafe, a Safe for Advertising subset of JavaScript that intends to remove its security risks via restrictions on the permitted syntax. Polyglot Markup can be viewed from a similar angle.

Conservative markup may help authors as well as the language itself. Currently, markup best practices are defined outside the HTMLwg. Example: The above-mentioned ADsafe JavaScript profile has rules not only for JavaScript but for HTML as well, including a restriction to use UTF-8, which happens to be a Polyglot Markup requirement as well. ADsafe also forbids document.write, which is also not used by Polyglot Markup. Not only does this example show that there can be real value in a conservative spec, it also shows that there is a market for such a spec, for which the HTMLwg should offer real value. And, by the way, the effect of this does not need to be that XML gets more attention; it could just as well lead to more attention to the secure subset of the text/html serialization.

One syntax, two serializations is a feature. A tool vendor could serve two user groups via one syntax. This, in turn, has the potential to simplify the tool for its users, as it would allow the vendor to skip prompting the user to make choices about character encoding or markup format.
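To make the "one syntax" idea concrete, the following Python sketch feeds the same document string to an HTML tokenizer and to an XML parser and compares the element names each one reports. The sample document and the helper names `tags_via_html` and `tags_via_xml` are assumptions made for this example. Note also that Python's `html.parser` is a lenient tokenizer, not an HTML5-conformant parser, so this only hints at how real user agents behave.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# A small sample document written in the shared syntax (illustrative only).
DOC = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><meta charset="utf-8"/><title>Demo</title></head>
<body><p>Hello<br/>world</p></body>
</html>"""


class TagCollector(HTMLParser):
    """Record the name of every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br/> also arrive here by default.
        self.tags.append(tag)


def tags_via_html(doc: str) -> list[str]:
    collector = TagCollector()
    collector.feed(doc)
    collector.close()
    return collector.tags


def tags_via_xml(doc: str) -> list[str]:
    root = ET.fromstring(doc)
    # ElementTree names look like "{namespace}local"; keep the local part.
    return [el.tag.rsplit("}", 1)[-1] for el in root.iter()]


if __name__ == "__main__":
    print(tags_via_html(DOC))
    print(tags_via_xml(DOC))
```

For this sample, both functions report the same sequence of element names, so a tool could emit one document and hand it to either kind of consumer.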

It keeps the XHTML simple. While Polyglot Markup also adds requirements (like <!DOCTYPE html>) to XHTML5, overall the HTML-compatibility requirements keep XHTML in check and keep it simple. E.g. it forbids non-UTF-8 encodings, it forbids the XML declaration, and so on. And best of all: this is not an artificial extra but an effect of HTML5’s design. Polyglot Markup is merely picking the fruit.

It adds pedagogical value. While not something that by itself makes it worth sending Polyglot Markup for Recommendation, the single syntax highlights, in a pedagogical way, how HTML5 itself is designed to be XML-compatible and often permits the XML syntax within HTML documents, as well as the differences between the HTML DOM and the XML DOM.

Why no C risk

On the topic of Appendix C:

While some see the risk that putting Polyglot Markup on the Recommendation track would make it the new Appendix C, the Web in 2012 differs a lot from the Web in 1999.

C-mantics vs semantics: Polyglot Markup safeguards against semantic loss across various parsers (including XML vs HTML) and, by removing many choices, it might allow those authors to whom it matters to focus more on content than on code. But it does of course not directly affect semantics. When XHTML 1.0 came along, all it offered was an XML version of HTML4. Thus much attention was, perhaps naturally, drawn to its syntax. By contrast, Polyglot Markup builds directly on HTML5, which defines a shared vocabulary for both XML and HTML.

There might be little consensus around polyglot. But on the flip side, the sceptical attention ought also to help prevent history from repeating itself.

XML is a real opportunity. In the days of Appendix C, XML was not a real option, whereas today it can be a serious option, since all the common user agents support it. The ability to try it out live will help authors keep the XML real.

The normativity problem is different. Some of the problems of Appendix C were related to normativity. That Appendix C was not normative, while XHTML 1.0 as such was, was probably part of the problem. The HTML language was updated via XHTML, and thus there was a desire to use the XHTML syntax. With Polyglot Markup, there is no need to “update” HTML via a “foreign” specification from the “Land of XML”. HTML5 is already defined, and it has been defined on HTML’s own premises. Hence, the motivation behind the desire to use HTML-compatible XHTML is not the same, or as wide, as it perhaps became when XHTML 1.0 was introduced. HTML5 already defines the two serializations and includes syntactic details that are meant to help moving between XML and HTML. Thus, for Polyglot Markup, it was relatively simple to define the exact syntax, whereas Appendix C underestimated the problems caused by the lack of a parser specification.

The situation is different. When we consider HTML4’s more arcane, SGML-inherited rules, XHTML 1.0 was in many ways a simplification. And some seeds of Polyglot Markup are also present in Appendix C. But Appendix C, for whatever reasons, did not reach far enough. For instance, Appendix C section 1 and section 9 could in theory have led to the same simple decision with regard to character encoding as Polyglot Markup has made, but didn't. But XHTML 1.0 also didn't form as good a basis for a polyglot spec as HTML5 does for the same task. Simply put, XHTML 1.0 did not make as many brave, thoughtful and tested decisions as HTML5 has made. For instance, Appendix C did not forbid the XML encoding declaration. Nor did it have HTML5’s rule that limits the value of <meta charset="FOO"/> to utf-8 when used in an XHTML5 file. With HTML5, however, HTML and XHTML are well understood as different languages. And this is in fact also reflected in the spec title: Polyglot Markup.

The scope is different. Polyglot Markup is also robust markup. Polyglot Markup is also about the side effects of being polyglot. It could be said to be about discovering the beautiful and safe best-practice language within HTML: the shared subset of XHTML and HTML that promotes the best practices that we want authors to use, such as external scripts, external stylesheets, UTF-8, non-valid XML (only well-formed XML), no-quirks mode, et cetera. This subset is partly a natural common denominator of XHTML5 and HTML5, and partly a “man made” subset.