Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv2938
Modified Files:
Overview.html
Log Message:
A general editorial cleanup, primarily around how Unicode characters are presented. (whatwg r4261)
Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.3403
retrieving revision 1.3404
diff -u -d -r1.3403 -r1.3404
--- Overview.html 21 Oct 2009 11:46:12 -0000 1.3403
+++ Overview.html 21 Oct 2009 11:59:28 -0000 1.3404
@@ -4244,9 +4244,9 @@
<!-- http://www.hixie.ch/tests/adhoc/html/navigation/javascript-url/ -->
- <!-- XXX this should be tested in the case of a browsing context
- that was navigated to about:blank after having been elsewhere,
- as opposed to the about:blank used at the time of the browsing
+ <!-- this should be tested in the case of a browsing context that
+ was navigated to about:blank after having been elsewhere, as
+ opposed to the about:blank used at the time of the browsing
context's creation. -->
<p>If <var title="">fallback base url</var> is
@@ -45377,9 +45377,9 @@
"NETWORK" followed by a U+003A COLON character (:)), then set <var title="">mode</var> to "online whitelist" and jump back to the step
labeled "start of line".</li>
- <li><p>If <var title="">line</var> ends with a U+003A COLON (:)
- character, then set <var title="">mode</var> to "unknown" and jump
- back to the step labeled "start of line".</li>
+ <li><p>If <var title="">line</var> ends with a U+003A COLON
+ character (:), then set <var title="">mode</var> to "unknown" and
+ jump back to the step labeled "start of line".</li>
<li><p>This is either a data line or it is syntactically
incorrect.</li>
@@ -53511,14 +53511,14 @@
incompatible with some specifications. Including the DOCTYPE in a
document ensures that the browser makes a best-effort attempt at
following the relevant specifications.<p>A DOCTYPE must consist of the following characters, in this
- order:<ol class="brief"><li>A U+003C LESS-THAN SIGN (<code>&lt;</code>) character.</li>
- <li>A U+0021 EXCLAMATION MARK (<code>!</code>) character.</li>
+ order:<ol class="brief"><li>A U+003C LESS-THAN SIGN character (&lt;).</li>
+ <li>A U+0021 EXCLAMATION MARK character (!).</li>
<li>A string that is an <a href="#ascii-case-insensitive">ASCII case-insensitive</a> match for the string "<code title="">DOCTYPE</code>".</li>
<li>One or more <a href="#space-character" title="space character">space characters</a>.</li>
<li>A string that is an <a href="#ascii-case-insensitive">ASCII case-insensitive</a> match for the string "<code title="">HTML</code>".</li>
<li>Optionally, a <a href="#doctype-legacy-string">DOCTYPE legacy string</a> (defined below).</li>
<li>Zero or more <a href="#space-character" title="space character">space characters</a>.</li>
- <li>A U+003E GREATER-THAN SIGN (<code>&gt;</code>) character.</li>
+ <li>A U+003E GREATER-THAN SIGN character (&gt;).</li>
</ol><p class="note">In other words, <code>&lt;!DOCTYPE HTML&gt;</code>,
case-insensitively.<p>For the purposes of HTML generators that cannot output HTML
markup with the short DOCTYPE "<code title="">&lt;!DOCTYPE
@@ -53597,9 +53597,7 @@
end tag, no content can be put between the start tag and the end
tag). <a href="#foreign-elements">Foreign elements</a> whose start tag is <em>not</em>
marked as self-closing can have <a href="#syntax-text" title="syntax-text">text</a>, <a href="#syntax-charref" title="syntax-charref">character references</a>, <a href="#syntax-cdata" title="syntax-cdata">CDATA sections</a>, other <a href="#syntax-elements" title="syntax-elements">elements</a>, and <a href="#syntax-comments" title="syntax-comments">comments</a>, but the text must not
- contain the character U+003C LESS-THAN SIGN (<code>&lt;</code>) or
- an <a href="#syntax-ambiguous-ampersand" title="syntax-ambiguous-ampersand">ambiguous
- ampersand</a>.<div class="note">
+ contain the character U+003C LESS-THAN SIGN (&lt;) or an <a href="#syntax-ambiguous-ampersand" title="syntax-ambiguous-ampersand">ambiguous ampersand</a>.<div class="note">
<p>The HTML syntax does not support namespace
declarations, even in <a href="#foreign-elements">foreign elements</a>.</p>
@@ -53622,9 +53620,8 @@
specification does not define any elements called "<code title="">cdr:license</code>" in the SVG namespace.</p>
</div><p><a href="#normal-elements">Normal elements</a> can have <a href="#syntax-text" title="syntax-text">text</a>, <a href="#syntax-charref" title="syntax-charref">character references</a>, other <a href="#syntax-elements" title="syntax-elements">elements</a>, and <a href="#syntax-comments" title="syntax-comments">comments</a>, but the text must not
- contain the character U+003C LESS-THAN SIGN (<code>&lt;</code>) or
- an <a href="#syntax-ambiguous-ampersand" title="syntax-ambiguous-ampersand">ambiguous
- ampersand</a>. Some <a href="#normal-elements">normal elements</a> also have <a href="#element-restrictions">yet more restrictions</a> on what
+ contain the character U+003C LESS-THAN SIGN (&lt;) or an <a href="#syntax-ambiguous-ampersand" title="syntax-ambiguous-ampersand">ambiguous ampersand</a>. Some
+ <a href="#normal-elements">normal elements</a> also have <a href="#element-restrictions">yet more restrictions</a> on what
content they are allowed to hold, beyond the restrictions imposed by
the content model and those described in this paragraph. Those
restrictions are described below.<p>Tags contain a <dfn id="syntax-tag-name" title="syntax-tag-name">tag name</dfn>,
@@ -53637,7 +53634,7 @@
letters that, when converted to all-lowercase, matches the element's
tag name; tag names are case-insensitive.<h5 id="start-tags"><span class="secno">9.1.2.1 </span>Start tags</h5><p><dfn id="syntax-start-tag" title="syntax-start-tag">Start tags</dfn> must have the
following format:<ol><li>The first character of a start tag must be a U+003C LESS-THAN
- SIGN (<code>&lt;</code>).</li>
+ SIGN character (&lt;).</li>
<li>The next few characters of a start tag must be the element's
<a href="#syntax-tag-name" title="syntax-tag-name">tag name</a>.</li>
@@ -53655,20 +53652,20 @@
<li>Then, if the element is one of the <a href="#void-elements">void elements</a>,
or if the element is a <a href="#foreign-elements" title="foreign elements">foreign
- element</a>, then there may be a single U+002F SOLIDUS
- (<code>/</code>) character. This character has no effect on
- <a href="#void-elements">void elements</a>, but on <a href="#foreign-elements">foreign elements</a> it
- marks the start tag as self-closing.</li>
+ element</a>, then there may be a single U+002F SOLIDUS character
+ (/). This character has no effect on <a href="#void-elements">void elements</a>,
+ but on <a href="#foreign-elements">foreign elements</a> it marks the start tag as
+ self-closing.</li>
<li>Finally, start tags must be closed by a U+003E GREATER-THAN
- SIGN (<code>&gt;</code>) character.</li>
+ SIGN character (&gt;).</li>
</ol><h5 id="end-tags"><span class="secno">9.1.2.2 </span>End tags</h5><p><dfn id="syntax-end-tag" title="syntax-end-tag">End tags</dfn> must have the
following format:<ol><li>The first character of an end tag must be a U+003C LESS-THAN
- SIGN (<code>&lt;</code>).</li>
+ SIGN character (&lt;).</li>
<li>The second character of an end tag must be a U+002F SOLIDUS
- (<code>/</code>).</li>
+ character (/).</li>
<li>The next few characters of an end tag must be the element's
<a href="#syntax-tag-name" title="syntax-tag-name">tag name</a>.</li>
@@ -53676,8 +53673,8 @@
<li>After the tag name, there may be one or more <a href="#space-character" title="space
character">space characters</a>.</li>
- <li>Finally, end tags must be closed by a U+003E GREATER-THAN
- SIGN (<code>&gt;</code>) character.</li>
+ <li>Finally, end tags must be closed by a U+003E GREATER-THAN SIGN
+ character (&gt;).</li>
</ol><h5 id="attributes"><span class="secno">9.1.2.3 </span>Attributes</h5><p><dfn id="syntax-attributes" title="syntax-attributes">Attributes</dfn> for an element
are expressed inside the element's start tag.<p>Attributes have a name and a value. <dfn id="syntax-attribute-name" title="syntax-attribute-name">Attribute names</dfn> must consist of
@@ -53724,12 +53721,11 @@
character">space characters</a>, followed by the <a href="#syntax-attribute-value" title="syntax-attribute-value">attribute value</a>, which, in
addition to the requirements given above for attribute values,
must not contain any literal <a href="#space-character" title="space character">space
- characters</a>, any U+0022 QUOTATION MARK (<code>"</code>)
- characters, U+0027 APOSTROPHE (<code>'</code>) characters,
- U+003D EQUALS SIGN (<code>=</code>) characters, U+003C LESS-THAN
- SIGN (<code>&lt;</code>) characters, U+003E GREATER-THAN SIGN
- (<code>&gt;</code>) characters, or U+0060 GRAVE ACCENT (`)
- characters, and must not be the empty string.</p>
+ characters</a>, any U+0022 QUOTATION MARK characters ("),
+ U+0027 APOSTROPHE characters ('), U+003D EQUALS SIGN
+ characters (=), U+003C LESS-THAN SIGN characters (&lt;), U+003E
+ GREATER-THAN SIGN characters (&gt;), or U+0060 GRAVE ACCENT
+ characters (`), and must not be the empty string.</p>
<!-- The ` character is in this list on a temporary basis, waiting
for IE to fix its parsing bug whereby it treats ` as an
@@ -53786,11 +53782,11 @@
characters</a>, followed by a single U+003D EQUALS SIGN
character, followed by zero or more <a href="#space-character" title="space
character">space characters</a>, followed by a single U+0027
- APOSTROPHE (<code>'</code>) character, followed by the <a href="#syntax-attribute-value" title="syntax-attribute-value">attribute value</a>, which, in
+ APOSTROPHE character ('), followed by the <a href="#syntax-attribute-value" title="syntax-attribute-value">attribute value</a>, which, in
addition to the requirements given above for attribute values,
- must not contain any literal U+0027 APOSTROPHE (<code>'</code>)
- characters, and finally followed by a second single U+0027
- APOSTROPHE (<code>'</code>) character.</p>
+ must not contain any literal U+0027 APOSTROPHE characters ('), and
+ finally followed by a second single U+0027 APOSTROPHE character
+ (').</p>
<div class="example">
@@ -53816,11 +53812,11 @@
characters</a>, followed by a single U+003D EQUALS SIGN
character, followed by zero or more <a href="#space-character" title="space
character">space characters</a>, followed by a single U+0022
- QUOTATION MARK (<code>"</code>) character, followed by the <a href="#syntax-attribute-value" title="syntax-attribute-value">attribute value</a>, which, in
+ QUOTATION MARK character ("), followed by the <a href="#syntax-attribute-value" title="syntax-attribute-value">attribute value</a>, which, in
addition to the requirements given above for attribute values,
- must not contain any literal U+0022 QUOTATION MARK
- (<code>"</code>) characters, and finally followed by a second
- single U+0022 QUOTATION MARK (<code>"</code>) character.</p>
+ must not contain any literal U+0022 QUOTATION MARK characters ("),
+ and finally followed by a second single U+0022 QUOTATION MARK
+ character (").</p>
<div class="example">
@@ -53993,9 +53989,9 @@
LINE FEED (LF) characters, or pairs of U+000D CARRIAGE RETURN (CR),
U+000A LINE FEED (LF) characters in that order.<h4 id="character-references"><span class="secno">9.1.4 </span>Character references</h4><p>In certain cases described in other sections, <a href="#syntax-text" title="syntax-text">text</a> may be mixed with <dfn id="syntax-charref" title="syntax-charref">character references</dfn>. These can be used
to escape characters that couldn't otherwise legally be included in
- <a href="#syntax-text" title="syntax-text">text</a>.<p>Character references must start with a U+0026 AMPERSAND
- (<code>&amp;</code>). Following this, there are three possible kinds
- of character references:<dl><dt>Named character references</dt>
+ <a href="#syntax-text" title="syntax-text">text</a>.<p>Character references must start with a U+0026 AMPERSAND character
+ (&amp;). Following this, there are three possible kinds of character
+ references:<dl><dt>Named character references</dt>
<dd>The ampersand must be followed by one of the names given in the
<a href="#named-character-references">named character references</a> section, using the same
@@ -54006,22 +54002,22 @@
<dt>Decimal numeric character reference</dt>
<dd>The ampersand must be followed by a U+0023 NUMBER SIGN
- (<code>#</code>) character, followed by one or more digits in the
- range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), representing
- a base-ten integer that corresponds to a Unicode code point that is
- allowed according to the definition below. The digits must then be
- followed by a U+003B SEMICOLON character (;).</dd>
+ character (#), followed by one or more digits in the range U+0030
+ DIGIT ZERO (0) to U+0039 DIGIT NINE (9), representing a base-ten
+ integer that corresponds to a Unicode code point that is allowed
+ according to the definition below. The digits must then be followed
+ by a U+003B SEMICOLON character (;).</dd>
<dt>Hexadecimal numeric character reference</dt>
<dd>The ampersand must be followed by a U+0023 NUMBER SIGN
- (<code>#</code>) character, which must be followed by either a
- U+0078 LATIN SMALL LETTER X character (x) or a U+0058 LATIN CAPITAL
- LETTER X character (X), which must then be followed by one or more
- digits in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9),
- U+0061 LATIN SMALL LETTER A to U+0066 LATIN SMALL LETTER F, and
- U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F,
+ character (#), which must be followed by either a U+0078 LATIN
+ SMALL LETTER X character (x) or a U+0058 LATIN CAPITAL LETTER X
+ character (X), which must then be followed by one or more digits in
+ the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0061
+ LATIN SMALL LETTER A to U+0066 LATIN SMALL LETTER F, and U+0041
+ LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F,
representing a base-sixteen integer that corresponds to a Unicode
code point that is allowed according to the definition below. The
digits must then be followed by a U+003B SEMICOLON character
@@ -54035,8 +54031,7 @@
ampersand</dfn> is a U+0026 AMPERSAND character (&amp;) that is
followed by some <a href="#syntax-text" title="syntax-text">text</a> other than a
<a href="#space-character">space character</a>, a U+003C LESS-THAN SIGN character
- (&lt;), or another U+0026 AMPERSAND character
- (<code>&amp;</code>).<h4 id="cdata-sections"><span class="secno">9.1.5 </span>CDATA sections</h4><p><dfn id="syntax-cdata" title="syntax-cdata">CDATA sections</dfn> must start with
+ (&lt;), or another U+0026 AMPERSAND character (&amp;).<h4 id="cdata-sections"><span class="secno">9.1.5 </span>CDATA sections</h4><p><dfn id="syntax-cdata" title="syntax-cdata">CDATA sections</dfn> must start with
the character sequence U+003C LESS-THAN SIGN, U+0021 EXCLAMATION
MARK, U+005B LEFT SQUARE BRACKET, U+0043 LATIN CAPITAL LETTER C,
U+0044 LATIN CAPITAL LETTER D, U+0041 LATIN CAPITAL LETTER A, U+0054
@@ -54053,11 +54048,11 @@
MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS (<code title="">&lt;!--</code>). Following this sequence, the comment may
have <a href="#syntax-text" title="syntax-text">text</a>, with the additional
restriction that the text must not start with a single U+003E
- GREATER-THAN SIGN (&gt;) character, nor start with a U+002D
- HYPHEN-MINUS character (-) followed by a
- U+003E GREATER-THAN SIGN (&gt;) character, nor contain two
- consecutive U+002D HYPHEN-MINUS (<code title="">-</code>)
- characters, nor end with a U+002D HYPHEN-MINUS (<code title="">-</code>) character. Finally, the comment must be ended by
+ GREATER-THAN SIGN character (&gt;), nor start with a U+002D
+ HYPHEN-MINUS character (-) followed by a U+003E GREATER-THAN SIGN
+ character (&gt;), nor contain two consecutive U+002D HYPHEN-MINUS
+ characters (<code title="">--</code>), nor end with a U+002D
+ HYPHEN-MINUS character (-). Finally, the comment must be ended by
the three character sequence U+002D HYPHEN-MINUS, U+002D
HYPHEN-MINUS, U+003E GREATER-THAN SIGN (<code title="">--&gt;</code>).<div class="impl">
@@ -56536,8 +56531,8 @@
<h5 id="markup-declaration-open-state"><span class="secno">9.2.4.44 </span><dfn>Markup declaration open state</dfn></h5>
- <p>If the next two characters are both U+002D HYPHEN-MINUS (-)
- characters, consume those two characters, create a comment token
+ <p>If the next two characters are both U+002D HYPHEN-MINUS
+ characters (-), consume those two characters, create a comment token
whose data is the empty string, and switch to the <a href="#comment-start-state">comment
start state</a>.</p>
@@ -56646,8 +56641,8 @@
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
- <dd><a href="#parse-error">Parse error</a>. Append two U+002D HYPHEN-MINUS (-)
- characters and the <a href="#current-input-character">current input character</a> to the
+ <dd><a href="#parse-error">Parse error</a>. Append two U+002D HYPHEN-MINUS
+ characters (-) and the <a href="#current-input-character">current input character</a> to the
comment token's data. Switch to the <a href="#comment-end-space-state">comment end space
state</a>.</dd>
@@ -56669,8 +56664,8 @@
be treated as live code -->
<dt>Anything else</dt>
- <dd><a href="#parse-error">Parse error</a>. Append two U+002D HYPHEN-MINUS (-)
- characters and the <a href="#current-input-character">current input character</a> to the
+ <dd><a href="#parse-error">Parse error</a>. Append two U+002D HYPHEN-MINUS
+ characters (-) and the <a href="#current-input-character">current input character</a> to the
comment token's data. Switch to the <a href="#comment-state">comment
state</a>.</dd>
@@ -56679,7 +56674,7 @@
<p>Consume the <a href="#next-input-character">next input character</a>:</p>
<dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
- <dd>Append two U+002D HYPHEN-MINUS (-) characters and a U+0021
+ <dd>Append two U+002D HYPHEN-MINUS characters (-) and a U+0021
EXCLAMATION MARK character (!) to the comment token's data. Switch
to the <a href="#comment-end-dash-state">comment end dash state</a>.</dd>
@@ -56693,7 +56688,7 @@
comment in comment end state -->
<dt>Anything else</dt>
- <dd>Append two U+002D HYPHEN-MINUS (-) characters, a U+0021
+ <dd>Append two U+002D HYPHEN-MINUS characters (-), a U+0021
EXCLAMATION MARK character (!), and the <a href="#current-input-character">current input
character</a> to the comment token's data. Switch to the
<a href="#comment-state">comment state</a>.</dd>
@@ -57344,17 +57339,18 @@
error</a>. No characters are consumed, and nothing is
returned.</p>
- <p>If the last character matched is not a U+003B SEMICOLON (<code title="">;</code>), there is a <a href="#parse-error">parse error</a>.</p>
+ <p>If the last character matched is not a U+003B SEMICOLON
+ character (;), there is a <a href="#parse-error">parse error</a>.</p>
<p>If the character reference is being consumed <a href="#character-reference-in-attribute-value-state" title="character reference in attribute value state">as part of an
attribute</a>, and the last character matched is not a U+003B
- SEMICOLON character (<code title="">;</code>), and the next
- character is in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT
- NINE (9), U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL
- LETTER Z, or U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL
- LETTER Z, then, for historical reasons, all the characters that
- were matched after the U+0026 AMPERSAND character (&amp;) must be
- unconsumed, and nothing is returned.</p>
+ SEMICOLON character (;), and the next character is in the range
+ U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0041 LATIN
+ CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z, or U+0061 LATIN
+ SMALL LETTER A to U+007A LATIN SMALL LETTER Z, then, for
+ historical reasons, all the characters that were matched after the
+ U+0026 AMPERSAND character (&amp;) must be unconsumed, and nothing
+ is returned.</p>
<p>Otherwise, return a character token for the character
corresponding to the character reference name (as given by the
@@ -61439,19 +61435,18 @@
<dd>
- <p>Append a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
- character, followed by the element's tag name. (For nodes
- created by the <a href="#html-parser">HTML parser</a> or <code title="">Document.createElement()</code>, the tag name will be
+ <p>Append a U+003C LESS-THAN SIGN character (&lt;),
+ followed by the element's tag name. (For nodes created by the
+ <a href="#html-parser">HTML parser</a> or <code title="">Document.createElement()</code>, the tag name will be
lowercase.)</p>
<p>For each attribute that the element has, append a U+0020
SPACE character, the attribute's name (which, for attributes
set by the <a href="#html-parser">HTML parser</a> or by <code title="">Element.setAttributeNode()</code> or <code title="">Element.setAttribute()</code>, will be lowercase), a
- U+003D EQUALS SIGN (<code title="">=</code>) character, a
- U+0022 QUOTATION MARK (<code title="">"</code>)
- character, the attribute's value, <a href="#escapingString" title="escaping a
- string">escaped as described below</a> in <i>attribute
- mode</i>, and a second U+0022 QUOTATION MARK (<code title="">"</code>) character.</p>
+ U+003D EQUALS SIGN character (=), a U+0022 QUOTATION MARK
+ character ("), the attribute's value, <a href="#escapingString" title="escaping a string">escaped as described below</a> in
+ <i>attribute mode</i>, and a second U+0022 QUOTATION MARK
+ character (").</p>
<p>While the exact order of attributes is UA-defined, and may
depend on factors such as the order that the attributes were
@@ -61459,8 +61454,7 @@
such that consecutive invocations of this algorithm serialize an
element's attributes in the same order.</p>
- <p>Append a U+003E GREATER-THAN SIGN (<code title="">&gt;</code>)
- character.</p>
+ <p>Append a U+003E GREATER-THAN SIGN character (&gt;).</p>
<p>If <var title="">current node</var> is an
<code><a href="#the-area-element">area</a></code>, <code><a href="#the-base-element">base</a></code>, <code><a href="#basefont">basefont</a></code>,
@@ -61481,8 +61475,10 @@
<p>Append the value of running the <a href="#html-fragment-serialization-algorithm">HTML fragment
serialization algorithm</a> on the <var title="">current
node</var> element (thus recursing into this algorithm for
- that element), followed by a U+003C LESS-THAN SIGN (<code title="">&lt;</code>) character, a U+002F SOLIDUS (<code title="">/</code>) character, the element's tag name again,
- and finally a U+003E GREATER-THAN SIGN (<code title="">&gt;</code>) character.</p>
+ that element), followed by a U+003C LESS-THAN SIGN character
+ (&lt;), a U+002F SOLIDUS character (/), the element's tag name
+ again, and finally a U+003E GREATER-THAN SIGN character
+ (&gt;).</p>
</dd>
@@ -64068,7 +64064,7 @@
string "<code title="">]]&gt;</code>".</li> (these can be split)-->
<li>A <code>Comment</code> node whose data contains two adjacent
- U+002D HYPHEN-MINUS (-) characters or ends with such a
+ U+002D HYPHEN-MINUS characters (-) or ends with such a
character.</li>
<li>A <code>ProcessingInstruction</code> node whose target name is