hixie: Move a section so that the character encoding requirements are
closer together. (whatwg r6992)
http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.5584&r2=1.5585&f=h
http://html5.org/tools/web-apps-tracker?from=6991&to=6992
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.5584
retrieving revision 1.5585
diff -u -d -r1.5584 -r1.5585
--- Overview.html 13 Feb 2012 22:48:18 -0000 1.5584
+++ Overview.html 13 Feb 2012 22:50:16 -0000 1.5585
@@ -1157,8 +1157,8 @@
<ol>
<li><a href="#determining-the-character-encoding"><span class="secno">8.2.2.1 </span>Determining the character encoding</a></li>
<li><a href="#character-encodings-0"><span class="secno">8.2.2.2 </span>Character encodings</a></li>
- <li><a href="#preprocessing-the-input-stream"><span class="secno">8.2.2.3 </span>Preprocessing the input stream</a></li>
- <li><a href="#changing-the-encoding-while-parsing"><span class="secno">8.2.2.4 </span>Changing the encoding while parsing</a></ol></li>
+ <li><a href="#changing-the-encoding-while-parsing"><span class="secno">8.2.2.3 </span>Changing the encoding while parsing</a></li>
+ <li><a href="#preprocessing-the-input-stream"><span class="secno">8.2.2.4 </span>Preprocessing the input stream</a></ol></li>
<li><a href="#parse-state"><span class="secno">8.2.3 </span>Parse state</a>
<ol>
<li><a href="#the-insertion-mode"><span class="secno">8.2.3.1 </span>The insertion mode</a></li>
@@ -58895,7 +58895,59 @@
- <h5 id="preprocessing-the-input-stream"><span class="secno">8.2.2.3 </span>Preprocessing the input stream</h5>
+ <h5 id="changing-the-encoding-while-parsing"><span class="secno">8.2.2.3 </span>Changing the encoding while parsing</h5>
+
+ <p>When the parser requires the user agent to <dfn id="change-the-encoding">change the
+ encoding</dfn>, it must run the following steps. This might happen
+ if the <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> described above
+ failed to find an encoding, or if it found an encoding that was not
+ the actual encoding of the file.</p>
+
+ <ol><li>If the encoding that is already being used to interpret the
+ input stream is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, then set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+ <i>certain</i> and abort these steps. The new encoding is ignored;
+ if it was anything but the same encoding, then it would be clearly
+ incorrect.</li>
+
+ <li>If the new encoding is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, change
+ it to UTF-8.</li>
+
+ <li>If the new encoding is identical or equivalent to the encoding
+ that is already being used to interpret the input stream, then set
+ the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+ <i>certain</i> and abort these steps. This happens when the
+ encoding information found in the file matches what the
+ <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> determined to be the
+ encoding, and in the second pass through the parser if the first
+ pass found that the encoding sniffing algorithm described in the
+ earlier section failed to find the right encoding.</li>
+
+ <li>If all the bytes up to the last byte converted by the current
+ decoder have the same Unicode interpretations in both the current
+ encoding and the new encoding, and if the user agent supports
+ changing the converter on the fly, then the user agent may change
+ to the new converter for the encoding on the fly. Set the
+ <a href="#document-s-character-encoding">document's character encoding</a> and the encoding used to
+ convert the input stream to the new encoding, set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+ <i>certain</i>, and abort these steps.</li>
+
+ <li>Otherwise, <a href="#navigate">navigate</a> to the
+ document again, with <a href="#replacement-enabled">replacement enabled</a>, and using
+ the same <a href="#source-browsing-context">source browsing context</a>, but this time skip
+ the <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> and instead just set
+ the encoding to the new encoding and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+ <i>certain</i>. Whenever possible, this should be done without
+ actually contacting the network layer (the bytes should be
+ re-parsed from memory), even if, e.g., the document is marked as
+ not being cacheable. If this is not possible and contacting the
+ network layer would involve repeating a request that uses a method
+ other than HTTP GET (<a href="#concept-http-equivalent-get" title="concept-http-equivalent-get">or
+ equivalent</a> for non-HTTP URLs), then instead set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
+ <i>certain</i> and ignore the new encoding. The resource will be
+ misinterpreted. User agents may notify the user of the situation,
+ to aid in application development.</li>
+
+ </ol><h5 id="preprocessing-the-input-stream"><span class="secno">8.2.2.4 </span>Preprocessing the input stream</h5>
<p>The <dfn id="input-stream">input stream</dfn> consists of the characters pushed
into it as the <a href="#the-input-byte-stream">input byte stream</a> is decoded or from the
@@ -58952,60 +59004,7 @@
consumed. Otherwise, the "EOF" character is not a real character in
the stream, but rather the lack of any further characters.</p>
-
- <h5 id="changing-the-encoding-while-parsing"><span class="secno">8.2.2.4 </span>Changing the encoding while parsing</h5>
-
- <p>When the parser requires the user agent to <dfn id="change-the-encoding">change the
- encoding</dfn>, it must run the following steps. This might happen
- if the <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> described above
- failed to find an encoding, or if it found an encoding that was not
- the actual encoding of the file.</p>
-
- <ol><li>If the encoding that is already being used to interpret the
- input stream is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, then set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
- <i>certain</i> and abort these steps. The new encoding is ignored;
- if it was anything but the same encoding, then it would be clearly
- incorrect.</li>
-
- <li>If the new encoding is <a href="#a-utf-16-encoding">a UTF-16 encoding</a>, change
- it to UTF-8.</li>
-
- <li>If the new encoding is identical or equivalent to the encoding
- that is already being used to interpret the input stream, then set
- the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
- <i>certain</i> and abort these steps. This happens when the
- encoding information found in the file matches what the
- <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> determined to be the
- encoding, and in the second pass through the parser if the first
- pass found that the encoding sniffing algorithm described in the
- earlier section failed to find the right encoding.</li>
-
- <li>If all the bytes up to the last byte converted by the current
- decoder have the same Unicode interpretations in both the current
- encoding and the new encoding, and if the user agent supports
- changing the converter on the fly, then the user agent may change
- to the new converter for the encoding on the fly. Set the
- <a href="#document-s-character-encoding">document's character encoding</a> and the encoding used to
- convert the input stream to the new encoding, set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
- <i>certain</i>, and abort these steps.</li>
-
- <li>Otherwise, <a href="#navigate">navigate</a> to the
- document again, with <a href="#replacement-enabled">replacement enabled</a>, and using
- the same <a href="#source-browsing-context">source browsing context</a>, but this time skip
- the <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> and instead just set
- the encoding to the new encoding and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
- <i>certain</i>. Whenever possible, this should be done without
- actually contacting the network layer (the bytes should be
- re-parsed from memory), even if, e.g., the document is marked as
- not being cacheable. If this is not possible and contacting the
- network layer would involve repeating a request that uses a method
- other than HTTP GET (<a href="#concept-http-equivalent-get" title="concept-http-equivalent-get">or
- equivalent</a> for non-HTTP URLs), then instead set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
- <i>certain</i> and ignore the new encoding. The resource will be
- misinterpreted. User agents may notify the user of the situation,
- to aid in application development.</li>
-
- </ol></div><div class="impl">
+ </div><div class="impl">
<h4 id="parse-state"><span class="secno">8.2.3 </span>Parse state</h4>
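Note on the moved section: the five "change the encoding" steps relocated by this revision can be read as a small decision procedure. The following is only a rough sketch in Python, not part of the commit and not how any user agent implements it. The names ParserState, change_the_encoding, canonical and the returned action strings are hypothetical, and Python's codec registry is used as a stand-in for the spec's notion of "identical or equivalent" encodings, which it does not match exactly.

```python
import codecs
from dataclasses import dataclass

# Canonical Python codec names standing in for "a UTF-16 encoding" (assumption).
UTF16_NAMES = {"utf-16", "utf-16-le", "utf-16-be"}


def canonical(label: str) -> str:
    """Normalize an encoding label via Python's codec registry."""
    return codecs.lookup(label).name


@dataclass
class ParserState:  # hypothetical container for the state the steps consult
    encoding: str                    # encoding currently used for the input stream
    confidence: str                  # "tentative" or "certain"
    converted: bytes                 # bytes converted so far by the current decoder
    can_switch_on_the_fly: bool = True
    can_reparse_from_memory: bool = True
    request_method: str = "GET"


def same_interpretation(data: bytes, current: str, new: str) -> bool:
    """True if the bytes decode to the same characters in both encodings."""
    try:
        return data.decode(current) == data.decode(new)
    except UnicodeDecodeError:
        return False


def change_the_encoding(state: ParserState, new_label: str) -> str:
    current = canonical(state.encoding)
    new = canonical(new_label)

    # Step 1: already decoding as UTF-16, so the new encoding is ignored.
    if current in UTF16_NAMES:
        state.confidence = "certain"
        return "kept"

    # Step 2: a declared UTF-16 encoding is treated as UTF-8.
    if new in UTF16_NAMES:
        new = "utf-8"

    # Step 3: nothing to change if the encodings already agree.
    if new == current:
        state.confidence = "certain"
        return "kept"

    # Step 4: optional on-the-fly switch (this sketch always takes it when allowed).
    if state.can_switch_on_the_fly and same_interpretation(
        state.converted, current, new
    ):
        state.encoding = new
        state.confidence = "certain"
        return "switched"

    # Step 5: navigate again with the new encoding, preferably from memory;
    # refuse to repeat a non-GET request and keep the old encoding instead.
    state.confidence = "certain"
    if state.can_reparse_from_memory or state.request_method == "GET":
        state.encoding = new
        return "renavigate"
    return "ignored"
```

For instance, a windows-1252 parse that has only converted ASCII bytes and then meets a UTF-8 declaration would take the step 4 branch in this sketch, while the same parse after a non-ASCII byte such as 0xE9 would fall through to step 5.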