Abstract

This document outlines the way in which the HTML Working Group addressed the
comments submitted during the XHTML-Print Last Call Working Draft
review period.

Status of this document

During the Last Call Working Draft review period for XHTML-Print a
number of comments were received from both inside and outside of the
W3C. This document summarizes those comments and describes the ways in
which the comments were addressed by the HTML Working Group.

Note that the majority of this document is
automatically generated from the Working Group's database of comments. As
such, it may contain typographical or stylistic errors. If so, these are
contained in the original submissions, and the HTML Working Group elected to
not change these submissions.

This document is a product of the W3C's HTML Working Group.
This document may be updated,
replaced or rendered obsolete by other W3C documents at any time. It
is inappropriate to use this document as reference material
or to cite it as other than "work in progress". This document
is work in progress and does not imply endorsement by the W3C membership.

Please send detailed comments on this document to www-html-editor@w3.org.
We cannot guarantee a personal response, but
we will try when it is appropriate. Public discussion on HTML features
takes place on the mailing list
www-html@w3.org.

A list of current W3C Recommendations and other technical documents can
be found at http://www.w3.org/TR.

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
> From: Jun Fujisawa [mailto:fujisawa.jun@canon.co.jp]
Sent: Monday, July 28, 2003 3:44 AM
To: don@lexmark.com
Cc: xp@pwg.org; jim.bigelow@hp.com
Subject: Re: XP> Incorrect example in Appendix B.3 of XHTML Print
Hello Don,
At 8:15 AM -0400 03.7.25, don@lexmark.com wrote:
>The intent of this example was to show how an image can be declared
>inline with the other XHTML while the actual data for the image may
>come later.
I don't understand the intent. If you want to get actual image
data later (not at the declaration), you can just use 'img'
or 'object' element without 'declare' attribute.
>If the example provided is incorrect, can
>you provide an example that achieves this separation?
The following example shows one type of separation, but I
don't think that meets your need.
<object id="image_1" declare="declare" type="image/jpeg"
data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
</object>
. . . .
<object height="20 mm" width="20 mm"
data="#image_1" >
</object>
--
Jun Fujisawa
<mailto:fujisawa.jun@canon.co.jp>

FOLLOWUP 4:

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
Sent: Friday, August 01, 2003 8:07 AM
To: Jun Fujisawa
Cc: don@lexmark.com; jim.bigelow@hp.com; owner-xp@pwg.org; xp@pwg.org
Subject: Re: XP> Incorrect example in Appendix B.3 of XHTML Print
I see two issues here, perhaps separable.
1. Use of inline data.
This can be accomplished by adding support for the data scheme.
Examples (from Fujisawa-san):
<object height="20 mm" width="20 mm" type="image/jpeg"
data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
Example Image
</object>
or
<img height="20 mm" width="20 mm" alt="Example Image"
src="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . " />
2. Separation of the data from the reference
This is where the declare attribute comes in. I went back and read
http://www.w3.org/TR/html4/struct/objects.html#h-13.3.4
It seems to me that the declare facility would let a client supply the
content for the object before its reference, not after. If the requirement
is that the client can send the image data at the end, I'm not sure that
HTML supports that.
If there is a requirement that the client can send the data first, then
refer to it, then an example (again, thanks Fujisawa) is:
<object id="image_1" declare="declare" type="image/jpeg"
data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
</object>
. . . .
<object height="20 mm" width="20 mm"
data="#image_1" >
</object>
I think the first requirement is good to have, but we can probably drop the
second, especially since the ordering is probably not what we want.
------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Oak Technology Imaging Group
781 638-7534

FOLLOWUP 5:

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
From: BIGELOW,JIM (HP-Boise,ex1)
Sent: Friday, August 01, 2003 8:38 AM
To: 'ElliottBradshaw@oaktech.com'; Jun Fujisawa
Cc: don@lexmark.com; BIGELOW,JIM (HP-Boise,ex1);
owner-xp@pwg.org; xp@pwg.org
Subject: RE: XP> Incorrect example in Appendix B.3 of XHTML Print
Elliott wrote:
> I see two issues here, perhaps separable.
> 1. Use of inline data.
>
> This can be accomplished by adding support for the data scheme. ...
>
> 2. Separation of the data from the reference
>
> ...
>
> I think the first requirement is good to have, but we can
> probably drop the second, especially since the ordering is
> probably not what we want.
>
I'm not perfectly clear on what you think the requirements should be. The
current spec says that a printer may support in-line data via the object/img
elements, but is not required to.
Are you calling for a change to this statement?
Arguments against requiring support for in-line image data have been that:
1. it requires too much buffering
2. the image data could overflow the memory used to store element
attributes. Alternately, to avoid the possibility of exceeding the memory
set aside for storing element attributes while processing a job, a printer
must either reserve large amounts of memory to avoid problems in this one,
almost unique case, or implement a complex, dynamic memory allocation
scheme.
In any event, supporting in-line data via the object and img elements
means that the entire image is funneled through the document parser,
whereas alternate means of handling image data are possible if the image is
referenced via the cid or http schemes.
There is another method for managing image data buffering: Section B.2.1,
"In-line images", of the W3C spec provides some informative suggestions about
ways to stage the delivery of image data using the (required) multiplexed
document format. This method seeks to reduce the memory needed to store
images while processing the document, by providing enough of the image
header to determine the image's size, synchronized with the image's
reference. The remainder or bulk of the image is delivered later in the
document, hopefully, when the printer is ready to commit the image to the
page.
Jim
--

FOLLOWUP 6:

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
Sent: Friday, August 01, 2003 9:46 AM
To: BIGELOW,JIM (HP-Boise,ex1)
Cc: don@lexmark.com; Jun Fujisawa; BIGELOW,JIM (HP-Boise,ex1);
owner-xp@pwg.org; xp@pwg.org
Subject: RE: XP> Incorrect example in Appendix B.3 of XHTML Print
Sorry, I didn't mean to change the actual requirements. Section B.3 should
stay informative and just be a discussion of different things a printer may
choose to implement.
However, there is at least one case of a conditional requirement elsewhere
in the document (the Object Module) that refers to this section.
But, it is confusing what problem this section is trying to solve (in an
optional way). And, it looks like the example for use of the declare
attribute is just plain wrong.
I propose that we re-write this section to eliminate all discussion of the
declare attribute, and simply show how to use the data URL scheme to handle
inline data.
For example:
<proposal>
This section is informative.
An alternative method to include inline image data in XHTML-Print is via the
"data" URL scheme (see RFC2397). Because this method normally encodes the
binary image data using base64 encoding, a significant increase in the size
of the data transmitted will be experienced. This SHOULD be avoided over low
speed connections. Printers supporting inline data MAY support base64
encoding using the img or object element.
<object height="20 mm" width="20 mm" type="image/jpeg"
data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
Example Image
</object>
or
<img height="20 mm" width="20 mm" alt="Example Image"
src="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . " />
This method MAY be useful for very simple clients that cannot afford a
server for image downloading or for some reason cannot utilize the
Application/Multiplexed MIME type; however, it is not RECOMMENDED for
general use especially if the size of the printer's buffer is unknown.
</proposal>
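
To gauge the size increase the proposal warns about, the following is a minimal sketch and not part of the original thread. The helper names make_data_url and base64_expansion are hypothetical, and the sketch assumes only Python's standard base64 module. It builds a "data" URL of the kind shown in the examples above and computes how much larger the base64 text is than the raw image bytes.

```python
import base64


def make_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Build an RFC 2397 "data" URL carrying base64-encoded bytes."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def base64_expansion(raw_length: int) -> float:
    """Ratio of base64 text length (padding included) to raw byte length."""
    # base64 maps every group of 3 input bytes to 4 output characters.
    encoded_length = 4 * ((raw_length + 2) // 3)
    return encoded_length / raw_length


if __name__ == "__main__":
    # Four arbitrary bytes stand in for real image data (illustration only).
    print(make_data_url(b"\xff\xd8\xff\xe0"))
    # Expansion factor for a hypothetical 30,000-byte image.
    print(base64_expansion(30_000))
```

The expansion factor approaches 4/3, so roughly one third more data crosses the link than with a binary transfer, which is the cost the proposal asks senders to weigh on low speed connections.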

REPLY 1:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Fujisawa-san,
Thank you for your comment. It is recorded as issue 6492 [1] in the HTML Working
Group's issue tracking system. The working group has elected to accept this
defect and modify XHTML-Print spec by accepting Elliott Bradshaw's proposal to
change Appendix B.3 to read as shown below. If this is not acceptable, please
respond to this message with your comments.
Jim Bigelow
--
This section is informative.
An alternative method to include inline image data in XHTML-Print is via the
"data" URL scheme (see RFC2397). Because this method normally encodes the
binary image data using base64 encoding, a significant increase in the size
of the data transmitted will be experienced. This SHOULD be avoided over low
speed connections. Printers supporting inline data MAY support base64
encoding using the img or object element.
<object height="20 mm" width="20 mm" type="image/jpeg"
data="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . ">
Example Image
</object>
or
<img height="20 mm" width="20 mm" alt="Example Image"
src="data:image/jpeg;base64,aGh67Fghsapa0Hji7dfGSweTa . . . " />
This method MAY be useful for very simple clients that cannot afford a
server for image downloading or for some reason cannot utilize the
Application/Multiplexed MIME type; however, it is not RECOMMENDED for
general use especially if the size of the printer's buffer is unknown.
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6492;user=guest

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6782 [1] in the HTML
Working Group's issue tracking system.
The working group has elected to implement your suggestions.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6782;user=guest

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
To: www-html-editor@w3.org
Cc: xp@pwg.org
Subject: XHTML-Print: change of url from xhtml-print.org to w3c.org breaks current implementations.
Date: Thu, 4 Sep 2003 11:02:17 -0700
Message-ID: <020A3CF87FB5AC47AA67966B33845755050DB585@xboi22.boise.itc.hp.com>
X-Archived-At: http://www.w3.org/mid/020A3CF87FB5AC47AA67966B33845755050DB585@xboi22.boise.itc.hp.com
The W3C Last Call Working Draft of XHTML-Print [1] changes the URL in the
DOCTYPE from
"http://www.xhtml-print.org/xhtml-print/xhtml-print10.dtd" to
"http://www.w3.org/MarkUp/DTD/xhtml-print10.dtd".
This breaks compatibility with existing implementations. Can this situation
be handled by redirecting the xhtml-print.org url to the w3.org url? If so,
how is this done?
[1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/
Jim Bigelow
Hewlett-Packard

REPLY 1:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Jonny Axelsson wrote:
Just for my curiosity: How does that break backwards compatibility? The
old DTD will presumably remain at the www.xhtml-print.org location for at
least as long as is needed (for the current implementations), while new or
updated XHTML-Print implementations will use the new location. Or?
--
Jonny Axelsson,
Web Standards,
Opera Software

REPLY 2:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Elliott Bradshaw wrote:
Don is going to remind us (as well he should) that the URL is not used for a
live retrieval from that server. So a redirect doesn't work.
So I think this is, technically, an incompatible change. But I think it's one
we could live with.
--------------------------------------------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Group (formerly Oak Technology Imaging Group)
781 638-7534

REPLY 3:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Jim Bigelow wrote:
Jonny,
Thanks for the question.
If a document with the w3c DTD is sent to a printer that shipped with firmware
written using the spec saying that conforming XHTML-Print documents must have a
DTD containing a URL to the xhtml-print.org DTD, then it is possible that
the document wouldn't print correctly, even though the printer
is not validating.
In the extreme case, it is possible that the document wouldn't print at all,
since Section 2.3.1, item 1 says, "A printer MAY ignore or otherwise reject a
non-conforming XHTML-Print document."
I think we're all better off avoiding things that could make the user unhappy!
:-)
Jim

REPLY 4:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6869 [1] in the HTML
Working Group's issue tracking system.
The working group, following the reasoning of issue 6780 [2], decided that both
the DTD in Appendix C of the spec [3] and the DTD in Appendix C of XHTML-Print [4]
must be accepted. However, the DTD in Appendix C of XHTML-Print [4] is
deprecated in favor of the DTD in Appendix C of the spec [3]. Future releases of
this specification may remove the required support for the DTD in Appendix C of
XHTML-Print [4].
If you feel that this resolution of your comment is not acceptable, please
respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6869;user=guest
[2] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6780;user=guest
[3] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/
[4] http://www.pwg.org/xhtml-print/HTML-Version/XHTML-Print.html
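
The resolution above can be illustrated with a minimal sketch, not taken from the thread, of how a receiving application might tell the two DTD locations apart. The function name classify_doctype and its return labels are hypothetical. The two system identifiers are the ones quoted in the original comment. The sketch inspects only the system identifier and makes no assumption about the public identifier.

```python
import re

# System identifiers quoted in the comment above.
CURRENT_DTD = "http://www.w3.org/MarkUp/DTD/xhtml-print10.dtd"
DEPRECATED_DTD = "http://www.xhtml-print.org/xhtml-print/xhtml-print10.dtd"

# Matches <!DOCTYPE html PUBLIC "public-id" "system-id"> and captures the system id.
_DOCTYPE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"[^"]*"\s+"([^"]*)"')


def classify_doctype(document_head: str) -> str:
    """Return 'current', 'deprecated', 'unknown' or 'missing' (hypothetical labels)."""
    match = _DOCTYPE.search(document_head)
    if match is None:
        return "missing"
    system_id = match.group(1)
    if system_id == CURRENT_DTD:
        return "current"
    if system_id == DEPRECATED_DTD:
        return "deprecated"
    return "unknown"
```

Under the resolution, both "current" and "deprecated" documents would be accepted. What to do with the other two outcomes is left to the conformance rules of the specification.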

From: Henri Sivonen <hsivonen@iki.fi>
To: www-html-editor@w3.org
Subject: Scripts and Events
Date: Sun, 3 Aug 2003 22:01:47 +0300
Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi>
X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi
1.3.1 Script and Events
Since the specification requires the documents to conform to
restrictions that are not applicable to all XHTML documents, it is
unlikely that casually authored XHTML documents would happen to be
conforming XHTML-Print documents. Therefore, it is reasonable to expect
some preprocessing to take place in the application before sending a
document to the printer. That application could be required to discard
script elements without burdening the printer with that task.
Such modification would change the document tree, though, and could
change the matching of CSS selectors. If it is important to take into
account the special case that someone could use a CSS selector such as
"script + p" to style a paragraph, it would be necessary to elaborate
on what "discarding" an element on the printer means (that is, is it
discarded from the document tree or merely defaulted to display: none;).
[extracted from issue 6548]
--
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/

REPLY 1:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment. It is recorded as issue 6772 [1] in the HTML
Working Group's issue tracking system. The working group has elected to accept
your comment by clarifying that discarding an element should be equivalent to
setting its display property to "none".
If this resolution of your comment is not acceptable, please
respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6772;user=guest
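
The difference between the two readings of "discarding" raised in the comment can be sketched as follows. This is an illustration under stated assumptions, not the specification's algorithm: the helper names adjacent_matches, hide and remove are hypothetical, the sample markup is un-namespaced for brevity, and the adjacency test only approximates the CSS "script + p" selector using Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def adjacent_matches(parent, first_tag, second_tag):
    """Elements named second_tag whose immediately preceding sibling is first_tag."""
    children = list(parent)
    return [later for earlier, later in zip(children, children[1:])
            if earlier.tag == first_tag and later.tag == second_tag]


def hide(root, tag):
    """Model display: none. Elements stay in the tree but are listed as not rendered."""
    return [element for element in root.iter(tag)]


def remove(root, tag):
    """Model removal from the document tree."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)


if __name__ == "__main__":
    body = ET.fromstring("<body><script>x()</script><p>styled?</p></body>")
    hidden = hide(body, "script")
    print(len(adjacent_matches(body, "script", "p")))  # script still precedes p
    remove(body, "script")
    print(len(adjacent_matches(body, "script", "p")))  # adjacency is gone
```

With the display: none reading the paragraph still matches the selector. With removal from the tree it no longer does, which is the ambiguity the resolution settles.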

From: Henri Sivonen <hsivonen@iki.fi>
To: www-html-editor@w3.org
Subject: Document Conformance
Date: Sun, 3 Aug 2003 22:01:47 +0300
Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi>
X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi
2.1 Document Conformance
Considering that printers are allowed to ignore non-conforming
documents, requiring a particular doctype declaration and DTD validity
looks like a significant burden for applications producing XHTML-Print
documents. In particular, DTD validity requires namespaces to be
represented in a particular way even though other representations would
be semantically equivalent. This means applications producing
XHTML-Print documents cannot use any off-the-shelf XML serializer but
need a serializer specifically tailored to meet the requirements of
XHTML-Print.
Wouldn't it be enough to allow DTDless documents as long as the element
structure meets the requirements expressed in the DTD (even though this
kind of conformance can't be checked with a [DTD-]validating XML
processor)?
[extracted from issue 6548]
--
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/

REPLY 1:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6773 [1] in the HTML
Working Group's issue tracking system. The working group does
not agree that the inclusion of the required doctype element in
XHTML-Print documents would be a burden either to an application
that produced XHTML-Print documents or a printer that processed
them. Therefore, no change is planned to the specification regarding
your issue.
If you feel that this resolution of your comment is not acceptable, please
respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6773;user=guest

From: Henri Sivonen <hsivonen@iki.fi>
To: www-html-editor@w3.org
Subject: allow UTF-16 not just UTF-8
Date: Sun, 3 Aug 2003 22:01:47 +0300
Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi>
X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi
It is said that if a "charset" parameter is present for the
application/xhtml+xml MIME type, the only valid value is "utf-8". It
would make sense to allow "utf-16" as well. All XML processors are
required to support UTF-16 in addition to UTF-8, so allowing UTF-16 for
XHTML-Print doesn't cause any additional burden to implementations.
Also, the payload of Application/Vnd.pwg-multiplexed chunks is defined
as octets, so UTF-16 strings can be delivered as
Application/Vnd.pwg-multiplexed chunks without any further encoding.
[extracted from issue 6548]
--
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
> From: elliott.bradshaw@zoran.com [mailto:elliott.bradshaw@zoran.com]
> Sent: Wednesday, September 03, 2003 7:07 AM
> To: don@lexmark.com
> Cc: BIGELOW,JIM (HP-Boise,ex1); owner-xp@pwg.org; xp@pwg.org
> Subject: Re: XP> Relaxing XHTML-Print's restriction to UTF-8
> to include UTF-16
>
Or to put it another way, XHTML-Print describes a single way of doing
something, whereas HTML and its derivatives frequently support multiple
ways of getting the same effect.
In the past, we have resisted features that appear easy, unless they
actually extend the capabilities of what can be done.
Since I think a UTF-8 oriented client can get the same work done as a UTF-16
client, we should not mandate the extension.
IMHO.
E.

FOLLOWUP 4:

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
> From: Michael Sweet [mailto:mike@easysw.com]
> Sent: Wednesday, September 03, 2003 7:26 AM
> To: don@lexmark.com
> Cc: BIGELOW,JIM (HP-Boise,ex1); xp@pwg.org
> Subject: Re: XP> Relaxing XHTML-Print's restriction to UTF-8
> to include UTF-16
I'm not so worried about memory usage; converting UTF-16 to UTF-8 on the
input side is not expensive in terms of memory or processor.
However, reliably detecting UTF-16 and managing the endianness of the words
is a pain in the ass in the real world. Assuming that all UTF-16 files
start with FFFE or FEFF, the XML parser can handle the UTF-16 encoding
without difficulty, however certain large convicted software monopolies
regularly omit this important information making autodetection unreliable.
Given the limited scope of XHTML-Print and the desire for maximum
interoperability, I would recommend that we stick with UTF-8 as the only
requirement so that applications that send XHTML-Print data have to use
UTF-8 and manage whatever perversion of UTF-16 they use internally
themselves...
--
______________________________________________________________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com
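
The input-side conversion mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the payload begins with a byte order mark. The function name utf16_to_utf8 is hypothetical, and a printer would more likely convert incrementally than decode the whole payload at once.

```python
import codecs


def utf16_to_utf8(payload: bytes) -> bytes:
    """Transcode a BOM-prefixed UTF-16 payload to UTF-8 (the BOM is dropped)."""
    if payload.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        text = payload[2:].decode("utf-16-be")
    elif payload.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        text = payload[2:].decode("utf-16-le")
    else:
        # Without a BOM the byte order has to be guessed, which is the
        # reliability concern raised in the message above.
        raise ValueError("no UTF-16 byte order mark found")
    return text.encode("utf-8")
```

The sketch rejects input without a byte order mark instead of guessing, which keeps the conversion itself trivial and isolates the detection problem.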

FOLLOWUP 5:

From: don@lexmark.com
I maintain my disagreement with this decision for all the reasons
previously mentioned including:
1) There are no characters which can be represented in UTF16 that cannot be
represented in UTF8
2) Reliable detection of UTF16 has not been proven
3) High "zoot" clients can much more easily convert any UTF16 to UTF8
4) Many of the target printers will have no need to deal with generic XML
and hence no reason to support UTF16
Jim Bigelow <voyager-issues@mn.aptest.com> on 09/26/2003 03:48:41 PM
To: hsivonen@iki.fi
cc: don@lexmark.com, elliott.bradshaw@zoran.com
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6774 [1] in the HTML
Working Group's issue tracking system.
The working group agrees that since XHTML-Print is a member
of the family of XHTML 1.0 languages, document encodings cannot
be restricted to UTF-8 but must also include UTF-16. The
specification will be modified to remove the sentence,
'The only valid value for the "charset" parameter is "utf-8".'
If you feel that this resolution of your comment is not acceptable, please
respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6774;user=guest

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
To the HTML WG:
Hello,
Please help me understand this facet of XHTML-Print as a member of the
Family of Languages defined by the Modularization of XHTML 1.0 -- must an
application that processes XHTML-Print documents be a conforming XML
processor?
I'm sure that it must be able to process XHTML-Print documents as described
by the XHTML-Print specification, but are there other constraints? For
example, an xml processor is supposed to be able to process documents in
UTF-8 and UTF-16. Why does an XHTML-Print processor have to support UTF-16?
What would be the reasons for not restricting the encoding to UTF-8?
The potential benefit of only requiring support for UTF-8, rather than both
UTF-8 and UTF-16, is that more low-cost (in terms of memory and processing
power) printers could process utf-8 encoded XHTML-Print documents. Requiring
support for both UTF-8 and UTF-16 increases the memory and processing
requirements and thereby reduces the number of devices that could process
XHTML-Print documents.
One of the goals of XHTML-Print is to provide a document format for printing
from and to low-cost devices, so keeping requirements to a minimum increases
the possibilities that low-cost printers will implement support for it.
Several representatives of printer manufacturers have expressed the opinion
that support for UTF-8 and not for UTF-16 is preferred. Can you help me
understand the technical reasons why UTF-16 support should be required, so
we can judge the trade-offs in implementation costs versus capabilities?
Jim

FOLLOWUP 8:

From: elliott.bradshaw@zoran.com
Jim,
Um, seems to me like a game of semantics. Whether we make a statement
about the language or a statement about how the client generates it,
seems like it's the same thing.
I think the conflict here is:
1. PWG wanted a simple way to send print jobs. No need for multiple
ways to accomplish the same thing.
2. But there seem to be W3C rules about how one derives languages
from XHTML.
I do think that #2 is contrary to the purpose of the original
project. Just as we are able to say that XHTML-Print does not mandate
certain properties which are too hard for a printer (e.g. the caveats
on the position property) we ought to be able to exclude something
that is not appropriate to the problem at hand.
The only justification for this extension is "W3C says so." In
principle we shouldn't do it. But, as a compromise I could live with
it if I had to.
--
Elliott Bradshaw
Director, Software Engineering Zoran Imaging Division
(formerly Oak Technology Imaging Group) 781 638-7534

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
Here is Don Wright's objection to UTF-16 support.
Jim
http://oz.boi.hp.com/~jhb/
-----Original Message-----
From: don@lexmark.com [mailto:don@lexmark.com]
Sent: Wednesday, October 08, 2003 9:42 AM
To: BIGELOW,JIM (HP-Boise,ex1)
Cc: elliott.bradshaw@zoran.com; www-html@w3.org
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Jim:
So let me understand this....
Because people have poorly designed and written XML applications running on
3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the
control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden
$49 printers with code to be able to detect and interpret both.
I maintain my objection and my no vote.
**********************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances & Standards
Lexmark International
740 New Circle Rd
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
**********************************************
"BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> on 10/08/2003 10:24:45 AM
To: don@lexmark.com
cc: elliott.bradshaw@zoran.com, www-html@w3.org
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
From
http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6774;user=guest - reply #3
Date: Wed Oct 1 12:43:54 2003
Don and Elliott,
The HTML working group discussed my question of why an XHTML-Print
processor must be a conforming XML processor (in particular, why it must
support both UTF-8 and UTF-16 encodings) on October 1, 2003.
The answer is that an XHTML-Print processor must be a conforming XML processor and
support both UTF-8 and UTF-16 encodings to preserve compatibility between
xml-based applications.
If XHTML-Print processors only supported UTF-8 then an xml-based application
could not be reliably depended upon to emit an XHTML-Print document that the
XHTML-print application could process. For example, an xml-based Xforms
application's output of an XHTML-Print document cannot be restricted by the
XHTML-Print specification to UTF-8 since the application may not be able to
control the encoding.
Section 4.3.3 [1] and Appendix F [2] of the XML specification [3] give
heuristics for determining a document's encoding when the charset parameter of
the MIME type [4] is absent.
An example UTF-16 decoder is available at [5]; other encodings are at [6].
Jim Bigelow
[1] http://www.w3.org/TR/REC-xml#charencoding
[2] http://www.w3.org/TR/REC-xml#sec-guessing
[3] http://www.w3.org/TR/REC-xml
[4] http://www.ietf.org/rfc/rfc3023.txt
[5] http://interscript.sourceforge.net/interscript/doc/en_iscr_0282.html
[6] http://interscript.sourceforge.net/interscript/doc/en_iscr_0275.html
Jim
http://oz.boi.hp.com/~jhb/
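
The kind of heuristic that Appendix F of the XML specification describes can be sketched as below. This is a simplified illustration, not a complete implementation: the function name sniff_xml_encoding is hypothetical, the UCS-4 and EBCDIC cases of Appendix F are deliberately omitted, and falling back to UTF-8 stands in for reading an encoding declaration.

```python
def sniff_xml_encoding(head: bytes) -> str:
    """Guess an XML entity's encoding from its first bytes (UTF-8/UTF-16 cases only)."""
    # Cases with a byte order mark.
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if head.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if head.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # Cases without a byte order mark: look at how "<?" is laid out.
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # Anything else is treated as UTF-8 in this sketch.
    return "utf-8"
```

The second group of checks only works when the document starts with an XML declaration, so a UTF-16 document that has neither a byte order mark nor a declaration falls through to the UTF-8 default in this sketch.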

FOLLOWUP 11:

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
-----Original Message-----
From: elliott.bradshaw@zoran.com [mailto:elliott.bradshaw@zoran.com]
Sent: Thursday, October 09, 2003 2:14 PM
To: don@lexmark.com
Cc: BIGELOW,JIM (HP-Boise,ex1)
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Don,
As you know I have been skeptical of feature creep all along. But I think
this one may be different...here's why.
When we originally conceived XHTML-Print the idea was that the client code
would be essentially a hand-coded print driver. But this W3C discussion
brings up the idea that people could use XML application development tools
as well. This could be in our interest if it gives people an easy way to
write XHTML-Print aware applications. (And it seems to be pretty
fundamental to the way they defined XML.)
It seems that such tools don't like to be constrained to only one of UTF-8
vs. UTF-16...it would be "unnatural" to limit a developer in this way. It
sort of reminds me of 10baseT vs. 100baseT, in which it seems odd to support
one but not the other.
How much complexity would this add to the $49 printer? Once we know whether
or not we are in UTF-16, it would add very little (if nothing else do a
brute force conversion from UTF-16 to UTF-8). Detection of UTF-16 is also
straightforward, as described in 4.3.3 of http://www.w3.org/TR/REC-xml,
which says the special Byte Order Mark is required at the beginning of
UTF-16. (It also says very clearly that UTF-16 support is required.)
So I think the cost is low, the benefit of XML-based application tools might
be significant, and technical alignment with XML makes it worth doing.
E.
--------------------------------------------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Division (formerly Oak Technology Imaging Group) 781 638-7534
don@lexmark.co
m To: "BIGELOW,JIM
(HP-Boise,ex1)"
<jim.bigelow@hp.com>
10/08/2003 cc: elliott.bradshaw@zoran.com,
www-html@w3.org
12:41 PM Subject: Re: allow UTF-16 not
just UTF-8
(PR#6774)
Jim:
So let me understand this....
Because people have poorly designed and written XML applications running on
3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the
control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden
$49 printers with code to be able to detect and interpret both.
I maintain my objection and my no vote.
**********************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances & Standards
Lexmark International
740 New Circle Rd
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
**********************************************
"BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com> on 10/08/2003 10:24:45 AM
To: don@lexmark.com
cc: elliott.bradshaw@zoran.com, www-html@w3.org
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
From
http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6774;user=g
uest - reply #3
Date: Wed Oct 1 12:43:54 2003
Don and Elliott,
The HTML working group discussed my question of why and XHTML-Print
processor must be a conforming XML processor (in particular, why it must
support both UTF-8 and UTF-16 encodings) on October 1, 2003.
The answer is that XHTML-Print must be a conforming XML processor and
support both UTF-8 and UTF-16 encodings to preserve compatibility between
xml-based applications.
If XHTML-Print processors only supported UTF-8 then an xml-based application
could not be reliably depended upon to emit an XHTML-Print document that the
XHTML-print application could process. For example, an xml-based Xforms
application's output of an XHTML-Print document cannot be restricted by the
XHTML-Print specification to UTF-8 since the application may not be able to
control the encoding.
Section 4.3.3 [1] and Appendix F [2] of the XML specification [3] give
heuristics for determing a document's encoding when the charset parameter of
the MIME type [4] is absent.
An example UTF-16 decoder is available at [5] other encodings are at [6].
Jim Bigelow
[1] http://www.w3.org/TR/REC-xml#charencoding
[2] http://www.w3.org/TR/REC-xml#sec-guessing
[3] http://www.w3.org/TR/REC-xml
[4] http://www.ietf.org/rfc/rfc3023.txt
[5] http://interscript.sourceforge.net/interscript/doc/en_iscr_0282.html
[6] http://interscript.sourceforge.net/interscript/doc/en_iscr_0275.html
Jim
http://oz.boi.hp.com/~jhb/
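
The detection heuristics cited above (Section 4.3.3 and Appendix F of the XML
specification) can be illustrated with a minimal Python sketch. It is not taken
from the thread or from any printer implementation. The function name
sniff_xml_encoding is hypothetical, the UCS-4 and EBCDIC cases are left out for
brevity, and a real processor would still have to read the encoding declaration
when no byte order mark is present.

```python
import codecs


def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding family of an XML entity from its first bytes.

    Hypothetical helper: covers only the UTF-8 and UTF-16 cases.
    """
    # A byte order mark settles the question immediately.
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # No BOM: look at how the leading "<?" is laid out in memory.
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # Anything else is treated as UTF-8, the XML default when no
    # other information is available.
    return "utf-8"
```

The check inspects at most four bytes and keeps no state, which is the basis for
the claim elsewhere in this thread that detection is cheap.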

FOLLOWUP 12:

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
Mike,
I've neglected to update you on the discussions about UTF-8/UTF-16 support
for XHTML-Print. Please let us know your thoughts on the matter.
You can see these discussions using the following link to the W3C's HTML
Working Group issue database:
http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6774;user=guest
In summary:
HTML WG: must support UTF-8 & UTF-16 for interoperability with all other xml
and xml-derived applications and processors.
Lexmark: UTF-16 support is too expensive to implement in a low-cost printer,
and too hard to reliably detect, ...
Oak/Zoran: UTF-16 wouldn't be too expensive to implement and enables a new
class of XHTML-Print producing devices
HP: UTF-16 allows for more compact representation of Asian character
documents and would not be too much to implement.
Jim Bigelow,
Editor: XHTML-Print & CSS Print Profile
W3C HTML and CSS Working Groups
http://www.w3.org/TR/xhtml-print/
http://www.w3.org/TR/css-print/
Hewlett-Packard
208-396-2068
jim.bigelow@hp.com

FOLLOWUP 13:

From: don@lexmark.com
Steven, et al:
The real problem is that the entire XML architecture was designed assuming
high end boxes like the 3 GHz Pentium with 512 megabytes of memory. We
have already seen push back in other standards groups that consumer
electronic devices and other smaller, lighter devices cannot afford all the
luxuries demanded by an obese XML architecture. Unless the XML community
accepts subsetting, we can't expect the broadest support for XML to happen
at the low end until the price/performance ratios experience another order
or two of magnitude improvement. As recently reported in several of the trade
magazines focused on IT professionals, the deployment of XML and Web
Services is having significant negative impacts on the IT infrastructure
especially in the area of bandwidth utilization. This is just another
symptom of the same problem.
I know I will lose this argument in the W3C but the realities of the
XHTML-Print implementations will blow off UTF-16 as more fat with no
benefit and simply not support it, "interoperable" or not.
Sorry I'm not pure but practical.
*******************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances and Standards
Lexmark International
740 New Circle Rd C14/082-3
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
*******************************************
"Steven Pemberton" <Steven.Pemberton@cwi.nl> on 10/15/2003 09:18:15 AM
To: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
<w3c-html-wg@w3.org>, <don@lexmark.com>
cc: <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
<www-html@w3.org>
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
> From: don@lexmark.com [mailto:don@lexmark.com]
> So let me understand this....
>
> Because people have poorly designed and written XML applications running on
> 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the
> control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden
> $49 printers with code to be able to detect and interpret both.
No Don. It is about interoperability and conforming to standards. XML allows
documents to be encoded in either UTF-8 or UTF-16: consumers must accept
both, producers may produce either. An XHTML-Print printer will be just a
consumer of an XML byte-stream at some IP address; we don't want to burden
every program in the world that can produce XML with a switch that says
"this output is going to a poor lowly XHTML Print processor that can't deal
with UTF-16, so please produce UTF-8", especially since UTF-16 is the easy
one to implement, and can only cost a few dozen bytes at most.
If we changed this, XHTML Print would have to go back to last call, and you
can bet your boots that the XML community would rise up against us, as it
has in the past, and I can tell you we don't want to go there, and we would
have a hundred people registering objections.
Conforming to XML requirements comes with the territory of being XHTML. The
XML community will not take lightly to us messing with their standards.
Best wishes,
Steven Pemberton

FOLLOWUP 14:

From: "Steven Pemberton" <Steven.Pemberton@cwi.nl>
> From: don@lexmark.com [mailto:don@lexmark.com]
> So let me understand this....
>
> Because people have poorly designed and written XML applications running on
> 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the
> control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden
> $49 printers with code to be able to detect and interpret both.
No Don. It is about interoperability and conforming to standards. XML allows
documents to be encoded in either UTF-8 or UTF-16: consumers must accept
both, producers may produce either. An XHTML-Print printer will be just a
consumer of an XML byte-stream at some IP address; we don't want to burden
every program in the world that can produce XML with a switch that says
"this output is going to a poor lowly XHTML Print processor that can't deal
with UTF-16, so please produce UTF-8", especially since UTF-16 is the easy
one to implement, and can only cost a few dozen bytes at most.
If we changed this, XHTML Print would have to go back to last call, and you
can bet your boots that the XML community would rise up against us, as it
has in the past, and I can tell you we don't want to go there, and we would
have a hundred people registering objections.
Conforming to XML requirements comes with the territory of being XHTML. The
XML community will not take lightly to us messing with their standards.
Best wishes,
Steven Pemberton

FOLLOWUP 15:

From: Michael Sweet <mike@easysw.com>
BIGELOW,JIM (HP-Boise,ex1) wrote:
> Mike,
>
> I've neglected to update you on the discussions about UTF-8/UTF-16
> support for XHTML-Print. Please let us know your thoughts on the
> matter.
My concern has always been how to distinguish UTF-8 from
UTF-16. After looking through the archive and the current XML
spec, it does look like the BOM is required at the beginning of any
UTF-16 XML document, so any autodetection problems can safely be
blamed on Microsoft or whatever vendor is producing a non-conforming
document.
I do like the idea of recommending (a SHOULD, not a MUST) that the
XHTML-Print client use the UTF-8 encoding, and add a note that the
typical XHTML-Print device has limited CPU/memory available and
the use of UTF-8 will potentially provide faster printing, etc.
--
______________________________________________________________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com

FOLLOWUP 16:

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
From: elliott.bradshaw@zoran.com [mailto:elliott.bradshaw@zoran.com]
Sent: Thursday, October 09, 2003 2:14 PM
To: don@lexmark.com
Cc: BIGELOW,JIM (HP-Boise,ex1)
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Don,
As you know I have been skeptical of feature creep all along. But I think
this one may be different...here's why.
When we originally conceived XHTML-Print the idea was that the client code
would be essentially a hand-coded print driver. But this W3C discussion
brings up the idea that people could use XML application development tools
as well. This could be in our interest if it gives people an easy way to
write XHTML-Print aware applications. (And it seems to be pretty
fundamental to the way they defined XML.)
It seems that such tools don't like to be constrained to only one of UTF-8
vs. UTF-16...it would be "unnatural" to limit a developer in this way. It
sort of reminds me of 10baseT vs. 100baseT, in which it seems odd to support
one but not the other.
How much complexity would this add to the $49 printer? Once we know whether
or not we are in UTF-16, it would add very little (if nothing else do a
brute force conversion from UTF-16 to UTF-8). Detection of UTF-16 is also
straightforward, as described in 4.3.3 of http://www.w3.org/TR/REC-xml,
which says the special Byte Order Mark is required at the beginning of
UTF-16. (It also says very clearly that UTF-16 support is required.)
So I think the cost is low, the benefit of XML-based application tools might
be significant, and technical alignment with XML makes it worth doing.
E.
--------------------------------------------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Division (formerly Oak Technology Imaging Group) 781 638-7534

FOLLOWUP 17:

From: "Steven Pemberton" <steven.pemberton@cwi.nl>
But support for UTF 16 adds a few dozen bytes of code, and no extra memory
requirements. It is simpler than UTF 8! What's the problem?
Steven

FOLLOWUP 18:

From: don@lexmark.com
One more thing, just one more thing. Every option or alternative adds one
more thing.
I think I'll pass on that one more thin mint.
*******************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances and Standards
Lexmark International
740 New Circle Rd C14/082-3
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
*******************************************
"Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM
To: <don@lexmark.com>
cc: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
<w3c-html-wg@w3.org>, <don@lexmark.com>,
<voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
<www-html@w3.org>
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
But support for UTF 16 adds a few dozen bytes of code, and no extra memory
requirements. It is simpler than UTF 8! What's the problem?
Steven

FOLLOWUP 19:

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
Don,
Here is a new section in the Design Rationale portion of the spec:
<h3 id="s.1.3.7">1.3.7 Character Model</h3>
<p>
The W3C architectural specification <cite>Character Model for the
World Wide Web 1.0</cite> [<a href="#ref_charmod">CHARMOD</a>] gives
the <em title="RECOMMENDED in RFC 2119 context"
class="RFC2119">RECOMMENDED</em> representation of characters in
XHTML-Print.
Authors of XHTML-Print producing applications
<em title="SHOULD in RFC 2119 context" class="RFC2119">SHOULD</em>
be aware that low-cost printers might be limited in both
processing power and memory and, therefore,
that fully-normalized ([<a href="#ref_charmod">CHARMOD</a>],
<a href="http://www.w3.org/TR/charmod/#sec-FullyNormalized">4.2.3</a>)
UTF-8 encoded documents could print more quickly than documents
in other forms and encodings.
</p>
I hope that this section will help discourage UTF-16.
Jim

FOLLOWUP 20:

From: Henri Sivonen <hsivonen@iki.fi>
On Thursday, Oct 16, 2003, at 01:20 Europe/Helsinki, don@lexmark.com
wrote:
> The real problem is that the entire XML architecture was designed assuming
> high end boxes like the 3 GHz Pentium with 512 megabytes of memory.
Lesser devices can host expat. However, if a device can't host expat,
perhaps it would be better to use something other than XML to
communicate with the device.
> We have already seen push back in other standards groups that consumer
> electronic devices and other smaller, lighter devices cannot afford all the
> luxuries demanded by an obese XML architecture. Unless the XML community
> accepts subsetting, we can't expect the broadest support for XML to happen
> at the low end until the price/performance ratios experience another order
> or two of magnitude improvement.
If you subset XML, is support for the subset support for XML?
What's the point of building a language on application-specific
almost-XML? A language built on such almost-XML breaks expectations
(either in software or in the minds of people who need to deal with the
language). If you can't use tools that are based on the assumption that
the data they process is *exactly* XML and the programmers' knowledge
about XML isn't guaranteed to apply, wouldn't it be less confusing to
invent another grammar entirely and not call it XML?
A well-defined extended subset of XML (for example: UTF-8 only,
normalization form C only, no doctype, no PIs, no CDATA sections, no
epilog, all HTML character entities predefined, namespace processing
mandatory) would be more useful than having specs layered on top of XML
1.0 trying to readjust what XML 1.0 is.
XHTML-Print printers get data over HTTP which is over TCP. It would be
ludicrous to tweak the TCP header format in the XHTML-Print spec.
> I know I will lose this argument in the W3C but the realities of the
> XHTML-Print implementations will blow off UTF-16 as more fat with no
> benefit and simply not support it, "interoperable" or not.
Converting UTF-16 to UTF-8 really isn't a big deal. It's basically a
matter of shifting bits.
Considering eliminating fat, I'd much rather eliminate character
entities[1] and references to the external DTD subset[2]. Character
entities are a burden in any case. They require either processing the
external DTD subset (bad for execution speed and memory requirements)
or implementing an extra feature which doesn't belong in an XML
processor (bad for conformance and yet redundant since there are
conforming ways of representing characters).
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6776;user=guest
[2] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6773;user=guest
--
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/
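
The remark above that converting UTF-16 to UTF-8 is "basically a matter of
shifting bits" can be made concrete with a short Python sketch. This is an
illustration only, not code from any XHTML-Print implementation. The function
name utf16_units_to_utf8 is hypothetical, and the input is assumed to be a
sequence of 16-bit code units already read in the correct byte order.

```python
def utf16_units_to_utf8(units):
    """Convert an iterable of 16-bit UTF-16 code units to UTF-8 bytes."""
    out = bytearray()
    pending_high = None  # high surrogate waiting for its partner
    for u in units:
        if pending_high is not None:
            if not 0xDC00 <= u <= 0xDFFF:
                raise ValueError("high surrogate not followed by low surrogate")
            # Combine the surrogate pair into one code point.
            cp = 0x10000 + ((pending_high - 0xD800) << 10) + (u - 0xDC00)
            pending_high = None
        elif 0xD800 <= u <= 0xDBFF:
            pending_high = u
            continue
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unexpected low surrogate")
        else:
            cp = u
        # Emit one to four UTF-8 bytes using shifts and masks only.
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    if pending_high is not None:
        raise ValueError("truncated surrogate pair")
    return bytes(out)
```

The only state carried between code units is one pending surrogate, so the
conversion needs no buffer beyond the output itself.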

FOLLOWUP 21:

From: don@lexmark.com
Steven:
I think your answer proves my point that the XML community did not and
does not consider the limitations of low cost, constrained embedded
environments when developing XML.
You make the assertion that no extra memory is required yet the reality is
quite the opposite.
Please tell me if I'm wrong, but my understanding of UTF-8 and UTF-16 is
that:
1) Every XHTML tag will require twice as many bytes when represented in
UTF-16 versus UTF-8
2) Every English XHTML-Print print job will be twice as big encoded with
UTF-16 versus UTF-8
3) Every "Latin 1" print job will be larger approaching 2X in size.
When you double the data's size, buffers have to double to be able to hold
and manipulate an equivalent amount of print stream content. There are real
cost and performance penalties to be paid to deal with UTF-16 encoding
especially when dealing with western character sets. When a device is
designed to deal with the far east "characters" there are other penalties
to be paid in things like the size of the font load that mitigate the
UTF-16 versus UTF-8 encoding issue.
*******************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances and Standards
Lexmark International
740 New Circle Rd C14/082-3
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
*******************************************
"Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM
To: <don@lexmark.com>
cc: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
<w3c-html-wg@w3.org>, <don@lexmark.com>,
<voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
<www-html@w3.org>
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
But support for UTF 16 adds a few dozen bytes of code, and no extra memory
requirements. It is simpler than UTF 8! What's the problem?
Steven
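
The three size claims in Don's message above concern the encoded byte stream,
and they are easy to check. The following Python sketch is an illustration
only: the helper name encoded_sizes and the sample strings are made up for this
purpose, and the UTF-16 figure excludes the two-byte byte order mark. A
Japanese sample is included because the thread also discusses Asian text.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count without BOM)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical samples, one per kind of content discussed in the thread.
samples = {
    "markup": '<p class="note">',
    "english": "The quick brown fox jumps over the lazy dog.",
    "latin-1": "caf\u00e9",   # "cafe" with e-acute
    "japanese": "\u3041",     # Hiragana Letter Small A
}

for name, text in samples.items():
    u8, u16 = encoded_sizes(text)
    print(f"{name}: utf-8={u8} bytes, utf-16={u16} bytes")
```

ASCII-only markup and English text double in size under UTF-16, Latin-1 text
approaches double, and characters in the U+0800 to U+FFFF range take three
bytes in UTF-8 against two in UTF-16. These figures describe the bytes on the
wire, not how a device chooses to store the decoded characters.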

FOLLOWUP 22:

From: "Steven Pemberton" <steven.pemberton@cwi.nl>
Don,
I've been wondering for a long time if that was the misunderstanding, but I
was assured it wasn't.
UTF 16 and UTF 8 are *external* representations. The internal amount of
storage needed for them is identical, and it is completely up to you how you
store them.
The only extra memory needed is the couple of dozen extra bytes of code to
convert UTF 16 into whatever internal representation you use.
Best wishes,
Steven
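
The distinction drawn above between the external encoding and the internal
representation can be sketched in Python with the standard library's
incremental decoders. This is an illustration under stated assumptions, not a
description of any printer's firmware. The function name decode_stream and the
three-byte chunk size in the usage example are hypothetical.

```python
import codecs


def decode_stream(chunks, encoding):
    """Yield decoded text for each incoming chunk of bytes.

    The decoder keeps only the few bytes of an incomplete character
    between chunks, so the input buffer can stay small and fixed.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush; raises an error if the stream ended in mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

For example, feeding the same document in three-byte chunks as UTF-8 and as
UTF-16 yields identical decoded text:

```python
doc = "<p>caf\u00e9</p>"
for enc in ("utf-8", "utf-16"):
    raw = doc.encode(enc)
    chunks = [raw[i:i + 3] for i in range(0, len(raw), 3)]
    assert "".join(decode_stream(chunks, enc)) == doc
```

Whatever arrives on the wire, the decoded result is the same, so the choice of
internal storage is independent of the external encoding.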
----- Original Message -----
From: <don@lexmark.com>
To: "Steven Pemberton" <steven.pemberton@cwi.nl>
Cc: <don@lexmark.com>; "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>;
<w3c-html-wg@w3.org>; <voyager-issues@mn.aptest.com>;
<elliott.bradshaw@zoran.com>; <www-html@w3.org>
Sent: Thursday, October 16, 2003 2:51 PM
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
>
>
> Steven:
>
> I think your answer proves my point that the XML commmunity did not and
> does not consider the limitations of low cost, constrained embedded
> environments when developing XML.
>
> You make the assertion that no extra memory is required yet the reality is
> quite the opposite.
>
> Please tell me if I'm wrong, but my understanding of UTF-8 and UTF-16 is
> that:
>
> 1) Every XHTML tag will require twice as many bytes when represented in
> UTF-16 versus UTF-8
> 2) Every English XHTML-Print print job will be twice as big encoded with
> UTF-16 versus UTF-8
> 3) Every "Latin 1" print job will be larger approaching 2X in size.
>
> When you double the data's size, buffers have to double to be able to hold
> and manipulate an equivalent amount of print stream content. There is
real
> cost and performance costs to be paid to deal with UTF-16 encoding
> especially when dealing with western character sets. When a device is
> designed to deal with the far east "characters" there are other penalties
> to be paid in things like the size of the font load that mitigate the
> UTF-16 versus UTF-8 encoding issue.
>
> *******************************************
> Don Wright don@lexmark.com
>
> Chair, IEEE SA Standards Board
> Member, IEEE-ISTO Board of Directors
> f.wright@ieee.org / f.wright@computer.org
>
> Director, Alliances and Standards
> Lexmark International
> 740 New Circle Rd C14/082-3
> Lexington, Ky 40550
> 859-825-4808 (phone) 603-963-8352 (fax)
> *******************************************
>
>
>
>
>
> "Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM
>
> To: <don@lexmark.com>
> cc: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
> <w3c-html-wg@w3.org>, <don@lexmark.com>,
> <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
> <www-html@w3.org>
> Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
>
>
> But support for UTF 16 adds a few dozen bytes of code, and no extra memory
> requirements. It is simpler than UTF 8! What's the problem?
>
> Steven
>
> ----- Original Message -----
> From: <don@lexmark.com>
> To: "Steven Pemberton" <Steven.Pemberton@cwi.nl>
> Cc: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>;
> <w3c-html-wg@w3.org>;
> <don@lexmark.com>; <voyager-issues@mn.aptest.com>;
> <elliott.bradshaw@zoran.com>; <www-html@w3.org>
> Sent: Thursday, October 16, 2003 12:20 AM
> Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
>
>
> >
> > Steven, et al:
> >
> > The real problem is that the entire XML architecture was designed
> assuming
> > high end boxes like the 3 GHz Pentium with 512 megabytes of memory. We
> > have already seen push back in other standards groups that consumer
> > electronic devices and other smaller, lighter devices cannot afford all
> the
> > luxuries demand by an obese XML architecture. Unless the XML community
> > accepts subsetting, we can't expect the broadest support for XML to
> happen
> > at the low end until the price/performance ratios experience another
> order
> > or two magnitude improvement. As recently reported in several of the
> trade
> > magazines focused on IT professionals, the deployment of XML and Web
> > Services are have significant negative impacts on the IT infrastructure
> > especially in the area of bandwidth utilization. This is just another
> > symptom of the same problem.
> >
> > I know I will lose this argument in the W3C but the realities of the
> > XHTML-Print implementations will blow off UTF-16 as more fat with no
> > benefit and simply not support it, "interoperable" or not.
> >
> > Sorry I'm not pure but practical.
> >
> > *******************************************
> > Don Wright don@lexmark.com
> >
> > Chair, IEEE SA Standards Board
> > Member, IEEE-ISTO Board of Directors
> > f.wright@ieee.org / f.wright@computer.org
> >
> > Director, Alliances and Standards
> > Lexmark International
> > 740 New Circle Rd C14/082-3
> > Lexington, Ky 40550
> > 859-825-4808 (phone) 603-963-8352 (fax)
> > *******************************************
> >
> >
> >
> >
> > "Steven Pemberton" <Steven.Pemberton@cwi.nl> on 10/15/2003 09:18:15 AM
> >
> > To: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
> > <w3c-html-wg@w3.org>, <don@lexmark.com>
> > cc: <voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
> > <www-html@w3.org>
> > Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
> >
> >
> > > From: don@lexmark.com [mailto:don@lexmark.com]
> >
> > > So let me understand this....
> > >
> > > Because people have poorly designed and written XML applications running on
> > > 3 GHz Pentium 4s with 512 megabytes of real memory that do not allow the
> > > control over whether UTF-8 or UTF-16 are emitted, we are expecting to burden
> > > $49 printers with code to be able to detect and interpret both.
> >
> > No Don. It is about interoperability and conforming to standards. XML allows
> > documents to be encoded in either UTF8 or UTF 16: consumers must accept
> > both, producers may produce either. An XHTML-Print printer will be just a
> > consumer of an XML byte-stream at some IP address; we don't want to burden
> > every program in the world that can produce XML with a switch that says
> > "this output is going to a poor lowly XHTML Print processor that can't deal
> > with UTF-16, so please produce UTF-8", especially since UTF 16 is the easy
> > one to implement, and can only cost a few dozen bytes at best.
> >
> > If we changed this, XHTML Print would have to go back to last call, and you
> > can bet your boots that the XML community would rise up against us, as it
> > has in the past, and I can tell you we don't want to go there, and we would
> > have a hundred people registering objections.
> >
> > Conforming to XML requirements comes with the territory of being XHTML. The
> > XML community will not take lightly to us messing with their standards.
> >
> > Best wishes,
> >
> > Steven Pemberton
> >

FOLLOWUP 23:

From: Rowland Shaw <Rowland.Shaw@crystaldecisions.com>
...and for every Asian language, each character can take up to three bytes
(in UTF-8 vs. two in UTF-16)
Taking a completely random Japanese character (Hiragana Letter Small A),
U+3041 is encoded in UTF-8 as 0xE3 0x81 0x81 -- this assumes that you are
willing to deal with characters as a MBCS, and that you aren't going to
convert to UCS2 internally.
English has the biggest saving by saving as UTF-8 (so let it), but for most
other languages, there is no benefit or, worse, a 50% growth in sizes (vs.
UTF-16).
If UTF-16 is disallowed, it's no longer an XML application (which may be a
road to go down) by definition on the minimum bar set for XML (back in the
days of 486s and 8 MB machines). Thinking about it, my printer nowadays at
home has more RAM in it than my PC when XML was being created...
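The byte counts cited above can be checked directly. The following is a minimal sketch in Python; the helper name encoded_sizes and the sample markup string are illustrative assumptions, not part of the original message. It uses the BOM-less utf-16-be codec so that only character data is counted.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))

# HIRAGANA LETTER SMALL A, the character cited in the message
small_a = "\u3041"
print(small_a.encode("utf-8").hex(" "))  # e3 81 81
print(encoded_sizes(small_a))            # (3, 2): UTF-8 is 50% larger here
print(encoded_sizes("<p>print</p>"))     # (12, 24): ASCII markup doubles in UTF-16
```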
-----Original Message-----
From: don@lexmark.com [mailto:don@lexmark.com]
Sent: 16 October 2003 14:00
To: Steven Pemberton
Cc: don@lexmark.com; BIGELOW,JIM (HP-Boise,ex1); w3c-html-wg@w3.org;
voyager-issues@mn.aptest.com; elliott.bradshaw@zoran.com; www-html@w3.org
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Steven:
I think your answer proves my point that the XML community did not and
does not consider the limitations of low cost, constrained embedded
environments when developing XML.
You make the assertion that no extra memory is required yet the reality is
quite the opposite.
Please tell me if I'm wrong, but my understanding of UTF-8 and UTF-16 is
that:
1) Every XHTML tag will require twice as many bytes when represented in
UTF-16 versus UTF-8
2) Every English XHTML-Print print job will be twice as big encoded with
UTF-16 versus UTF-8
3) Every "Latin 1" print job will be larger approaching 2X in size.
When you double the data's size, buffers have to double to be able to hold
and manipulate an equivalent amount of print stream content. There are real
cost and performance penalties to be paid to deal with UTF-16 encoding,
especially when dealing with western character sets. When a device is
designed to deal with the far east "characters" there are other penalties
to be paid in things like the size of the font load that mitigate the
UTF-16 versus UTF-8 encoding issue.
*******************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances and Standards
Lexmark International
740 New Circle Rd C14/082-3
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
*******************************************
"Steven Pemberton" <steven.pemberton@cwi.nl> on 10/15/2003 07:26:24 PM
To: <don@lexmark.com>
cc: "BIGELOW,JIM \(HP-Boise,ex1\)" <jim.bigelow@hp.com>,
<w3c-html-wg@w3.org>, <don@lexmark.com>,
<voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
<www-html@w3.org>
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
But support for UTF 16 adds a few dozen bytes of code, and no extra memory
requirements. It is simpler than UTF 8! What's the problem?
Steven

FOLLOWUP 24:

From: elliott.bradshaw@zoran.com
Don,
I agree with the argument that a front end can convert from UTF-16 to UTF-8
or whatever internal form is used, and have essentially no impact on memory
needs.
"A couple of dozen bytes" might be a little optimistic for this logic :^),
but it's pretty straightforward:
- look at the first 16 bits to detect a UTF-16 mark
- for each double byte, emit the UTF-8 (or other) equivalent
Of course a printer could choose to store Asian data differently than
Latin, and save some space compared to native UTF-8. This decision is
orthogonal to the form of the input. But this logic may not be worth it
and is not needed for compliance.
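The two steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not code from any printer implementation: the function name to_utf8 is hypothetical, input without a UTF-16 mark is assumed to be UTF-8 already and is passed through, and surrogate pairs (which the two-step outline does not mention) are combined so that characters outside the 16-bit range survive.

```python
BOM_LE = b"\xff\xfe"
BOM_BE = b"\xfe\xff"

def to_utf8(data: bytes) -> bytes:
    """Return data as UTF-8, converting only if it starts with a UTF-16 mark."""
    # Step 1: look at the first 16 bits to detect a UTF-16 byte order mark.
    if data[:2] == BOM_LE:
        order = "little"
    elif data[:2] == BOM_BE:
        order = "big"
    else:
        return data  # assumption: no mark means the input is already UTF-8

    out = bytearray()
    high = None  # pending high surrogate, if any
    # Step 2: for each double byte, emit the UTF-8 equivalent.
    for i in range(2, len(data) - 1, 2):
        unit = int.from_bytes(data[i:i + 2], order)
        if 0xD800 <= unit <= 0xDBFF:
            if high is not None:
                raise ValueError("unpaired high surrogate")
            high = unit
            continue
        if 0xDC00 <= unit <= 0xDFFF:
            if high is None:
                raise ValueError("unpaired low surrogate")
            cp = 0x10000 + ((high - 0xD800) << 10) + (unit - 0xDC00)
            high = None
        elif high is not None:
            raise ValueError("unpaired high surrogate")
        else:
            cp = unit
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes((0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)))
        elif cp < 0x10000:
            out += bytes((0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)))
        else:
            out += bytes((0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)))
    if high is not None:
        raise ValueError("unpaired high surrogate at end of input")
    return bytes(out)
```

The sketch holds the whole input in memory for brevity; the per-unit loop itself needs only the current 16-bit unit and at most one pending surrogate, so the same logic could run over a stream.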
Frugally,
Elliott
--------------------------------------------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Division (formerly Oak Technology Imaging Group)
781 638-7534

FOLLOWUP 25:

From: don@lexmark.com
Steven:
Of course I knew this was just the external representation.
I'm trying to reduce conversions and reduce the sizes of buffers, etc.
necessary to do this work. I have no doubt it can be done; I'm just trying
to do things with smaller, less powerful processors and with less available
memory than what programmers normally expect to be available in today's
environment.
*******************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances and Standards
Lexmark International
740 New Circle Rd C14/082-3
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
*******************************************
"Steven Pemberton" <steven.pemberton@cwi.nl> on 10/16/2003 09:10:59 AM
To: <don@lexmark.com>
cc: <don@lexmark.com>, "BIGELOW,JIM \(HP-Boise,ex1\)"
<jim.bigelow@hp.com>, <w3c-html-wg@w3.org>,
<voyager-issues@mn.aptest.com>, <elliott.bradshaw@zoran.com>,
<www-html@w3.org>
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
Don,
I've been wondering for a long time if that was the misunderstanding, but I
was assured it wasn't.
UTF-16 and UTF-8 are *external* representations. The internal amount of
storage needed for them is identical, and how you store the characters is
completely up to you.
The only extra memory needed is the couple of dozen extra bytes of code to
convert UTF-16 into whatever internal representation you use.
Best wishes,
Steven

FOLLOWUP 26:

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
Don and Steven,
I want to expand on what you have said:
Don wrote:
> > 1) Every XHTML tag will require twice as many bytes when
> > represented in UTF-16 versus UTF-8
> > 2) Every English XHTML-Print print job will be twice as
> > big encoded with UTF-16 versus UTF-8
> > 3) Every "Latin 1" print job will be larger approaching
> > 2X in size.
> >
> > When you double the data's size, buffers have to double to
> > be able to hold and manipulate an equivalent amount of print
> > stream content.
This statement is only true for some print streams. See the discussion below
in "The problem space".
Steven wrote:
> UTF 16 and UTF 8 are *external* representations. The internal
> amount of storage needed for them is identical, and
> completely up to you how you store.
If a printer uses 16 bits internally to represent a character, then there
shouldn't be a difference in buffering requirements between utf-8 and utf-16
encoded files (see below for a more complete discussion). However, if a
printer uses 8 bits per character, then it has restricted itself to only
handle a subset of possible documents, those with ASCII characters. This is
a product-specific decision akin to that of whether to make a device print
in color or black & white or support landscape as well as portrait printing.
Therefore, I suggest that the spec say that a printer should support utf-16,
just as it now says it should support CSS, landscape printing, and color --
within the limits of the device. If a user buys a low-cost device that can
only print ASCII characters in portrait orientation, without color, style
sheets, or images, hopefully the price was in line with the printer's
abilities and other, more expensive, more capable devices are available as
needed.
Jim
The problem space
----------------------
There is a document composition continuum from documents with only text,
through mixed text and images, to documents that contain only images. At
the text-only end of the continuum, encoding in UTF-16 rather than UTF-8
doubles the document size. At the image-only end of the continuum, the
effect of the encoding on the document size is overshadowed by the image
data.
The table below illustrates three points on the document composition
continuum:
1. Text-only: a document that prints as one page of ASCII text (times, 10pt,
8in by 11in paper) [1]. Size, in bytes, is 6,282.
2. Text & Image: a one page document with one 3in x 5in image (166.7K bytes)
and the remainder text [2]. Size, in bytes, of document and image is
171,531.
3. Image-only: a one page document with eight 2in x 3.25in images (703.2K
bytes) and no text. [3] Size, in bytes, of document and eight images is
705,108.
Size (bytes)    utf-8    %doc    utf-16    %doc
Text-only:      6,282    100     12,566    100
Text+Image:     4,776    3.2      9,554    5.4  (9,554 / (9,554 + 166,675) * 100)
Image-only:     1,916    .27      3,834    .54
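The UTF-16 sizes in the table are consistent with every ASCII character growing from one byte to two, plus a two-byte byte order mark at the start of the file. That reading is an inference from the figures, not something the message states. A small Python check (the function name utf16_size_of_ascii is an assumption made for illustration):

```python
def utf16_size_of_ascii(utf8_size, bom_bytes=2):
    """Size in bytes of an all-ASCII document after re-encoding as UTF-16."""
    return 2 * utf8_size + bom_bytes

for name, utf8_size in [("Text-only", 6282), ("Text+Image", 4776), ("Image-only", 1916)]:
    print(name, utf16_size_of_ascii(utf8_size))
# Text-only 12566
# Text+Image 9554
# Image-only 3834

# Cross-check against Python's own codec, which also prepends a 2-byte mark.
assert len(("a" * 6282).encode("utf-16")) == utf16_size_of_ascii(6282)
```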
There is another point of variability: the characters in the text portions
of the document. This is another continuum from ASCII only at one end to
Japanese, Chinese, Korean, and Hindi at the other.
"Table 1: UTF types" of [4] gives the following average bytes per code point:
                                    utf-8    utf-16
English                             1        2
Latin-1                             1.1      2
Greek, Russian, Arabic, Hebrew      1.7      2
Japanese, Chinese, Korean, Hindi    3        2
As the language/script of the text portion of the document changes from
English-only toward other scripts and languages, the size difference between
utf-8 and utf-16 decreases.
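The trend in the table can be reproduced on short samples. This is a rough Python sketch; the function name and the sample strings are assumptions chosen for illustration, and averages over real documents will differ from these tiny inputs. Byte order marks are excluded by using the utf-16-be codec.

```python
def bytes_per_code_point(text):
    """Return (utf8_average, utf16_average) bytes per code point for text."""
    n = len(text)  # Python 3 strings are sequences of code points
    return len(text.encode("utf-8")) / n, len(text.encode("utf-16-be")) / n

samples = {
    "English":  "print",
    "Latin-1":  "caf\u00e9",              # one accented letter among ASCII
    "Greek":    "\u03b1\u03b2 \u03b3",    # three Greek letters and a space
    "Japanese": "\u3041\u3042",           # two Hiragana letters
}
for script, text in samples.items():
    print(script, bytes_per_code_point(text))
# English (1.0, 2.0)
# Latin-1 (1.25, 2.0)
# Greek (1.75, 2.0)
# Japanese (3.0, 2.0)
```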
End-to-end solution
-------------------
If you look at the end-to-end solution, from the sending application to the
printer, the stages can be thought of as:
1. Sending Device: the data as represented in the sending device (a cell
phone for example)
2. Transmission: the data combined with markup and style information as an
XHTML-Print data stream and then encoded in either UTF-8 or UTF-16
3. Receiving Device: the printer -- breaking this into two parts gives:
3.a The XHTML-Print data stream as received
3.b The data without markup and style information and before printing. How
the data is stored is implementation dependent, and how much memory is used
depends on how a character is represented (8 or 16 bits) and on how much of
the document is buffered. Each printer makes these choices; 8 bits/char
restricts the documents processed to those using Latin-1 characters.
Stage       Size    utf-8    utf-16
1. app      n       -        -
2. xmit     n       n-3n*    2n
3a. Pr      n       n-3n     2n
3b. Pr**    n       n-2n     n-2n
* n-3n shows the variable sizing depending on the characters being encoded:
English only (n), CJK (3n)
** at Stage 3b, representing a character with 8 bits restricts the characters
that can be represented to ASCII or Latin 1; 16 bits can represent all
characters.
Internal representation
-----------------------
If a printer uses 16 bits internally to represent a character, then there
shouldn't be a difference in buffering requirements between utf-8 and utf-16
encoded files. However, if a printer uses 8 bits, then it has restricted
itself to only handle a subset of documents. This is a product-specific
decision akin to that of supporting color or not. Therefore, I suggest that
the spec say that a printer should support utf-16 just as it now says it
should support CSS, landscape printing, and color -- within the limits of
the device. If a user buys a low-cost device that can only print ASCII
characters in portrait orientation, without color, images or style,
hopefully the price is in line with the printer's abilities and other, more
expensive, more capable devices are available as needed.
[1] http://www.pwg.org/xhtml-print/W3C-Version/georgeb.html
[2] http://www.pwg.org/xhtml-print/W3C-Version/text+image.html
[3] http://www.pwg.org/xhtml-print/W3C-Version/image-only.html
[4] http://www-106.ibm.com/developerworks/library/utfencodingforms/

FOLLOWUP 27:

From: Michael Sweet <mike@easysw.com>
BIGELOW,JIM (HP-Boise,ex1) wrote:
> ...
> If a printer uses 16 bits internally to represent a character, then there
> shouldn't be a difference in buffering requirements between utf-8 and utf-16
> encoded files (see below for a more complete discussion). However, if a
> printer uses 8 bits per character, then it has restricted itself to only
> handle a subset of possible documents, those with ASCII characters. This is
> ...
I suggest there is another alternative - the implementation can
simply convert UTF-16 to UTF-8 as the document is being read, so
contrary to the previous comments there is no additional buffer
memory overhead, merely a small amount of code to convert from
UTF-16 to UTF-8.
Whether the implementation chooses to limit support to "latin"
text or not is another issue, but either way the *internal*
representation can be controlled by the vendor separate from the
external UTF-8/UTF-16/whatever representation.
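Converting while reading can be sketched with an incremental decoder, which keeps only the bytes of a not-yet-complete character between reads. The Python below is an illustration under assumptions: the generator name stream_as_utf8 and the 3-byte chunk size are made up for the example, and a real device would use its own I/O layer rather than Python's codecs module.

```python
import codecs

def stream_as_utf8(chunks, encoding="utf-16"):
    """Yield UTF-8 bytes for an iterable of byte chunks given in `encoding`."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        # The decoder holds back any incomplete trailing character itself,
        # so chunk boundaries may fall anywhere in the byte stream.
        text = decoder.decode(chunk)
        if text:
            yield text.encode("utf-8")
    tail = decoder.decode(b"", final=True)  # raises if input ends mid-character
    if tail:
        yield tail.encode("utf-8")

source = "<p>\u3041</p>".encode("utf-16")  # includes a byte order mark
chunks = [source[i:i + 3] for i in range(0, len(source), 3)]  # odd-sized reads
assert b"".join(stream_as_utf8(chunks)) == "<p>\u3041</p>".encode("utf-8")
```

At no point does the sketch hold more than one input chunk and its converted output, which is the property the message describes.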
--
______________________________________________________________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com

FOLLOWUP 28:

From: "Steven Pemberton" <steven.pemberton@cwi.nl>
UTF 8 and UTF 16 are just definitions of how you send a Unicode character
stream in an interoperable way over the wire. The character set is the same,
the characters are the same, it is just the encoding that is different.
It is orthogonal to questions of how characters are stored internally. You
can do what you like internally, it is completely up to you. It has no
effect on the memory requirements of the receiving device, because you have
to convert to your internal form anyway.
Steven
----- Original Message -----
From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
To: "Steven Pemberton" <steven.pemberton@cwi.nl>; <don@lexmark.com>
Cc: <w3c-html-wg@w3.org>; <voyager-issues@mn.aptest.com>;
<elliott.bradshaw@zoran.com>; <www-html@w3.org>; <mike@easysw.com>
Sent: Friday, October 17, 2003 3:15 AM
Subject: RE: allow UTF-16 not just UTF-8 (PR#6774)
> Don and Steven,
>
> I want to expand on what you have said:
> Don wrote:
> > > 1) Every XHTML tag will require twice as many bytes when
> > > represented in UTF-16 versus UTF-8
> > > 2) Every English XHTML-Print print job will be twice as
> > > big encoded with UTF-16 versus UTF-8
> > > 3) Every "Latin 1" print job will be larger approaching
> > > 2X in size.
> > >
> > > When you double the data's size, buffers have to double to
> > > be able to hold and manipulate an equivalent amount of print
> > > stream content.
>
> This statement is only true for some print streams. See the discussion
> below in "The problem space".
>
> Steven wrote:
> > UTF 16 and UTF 8 are *external* representations. The internal
> > amount of storage needed for them is identical, and
> > completely up to you how you store.
>
> If a printer uses 16 bits internally to represent a character, then there
> shouldn't be a difference in buffering requirements between utf-8 and
> utf-16 encoded files (see below for a more complete discussion). However,
> if a printer uses 8 bits per character, then it has restricted itself to
> only handle a subset of possible documents, those with ASCII characters.
> This is a product-specific decision akin to that of whether to make a
> device print in color or black & white or support landscape as well as
> portrait printing. Therefore, I suggest that the spec say that a printer
> should support utf-16, just as it now says it should support CSS,
> landscape printing, and color -- within the limits of the device. If a
> user buys a low-cost device that can only print ASCII characters in
> portrait orientation, without color, style sheets, or images, hopefully
> the price was in line with the printer's abilities and other, more
> expensive, more capable devices are available as needed.
>
> Jim
>
>
> The problem space
> ----------------------
> There is a document composition continuum from documents with only text,
> through mixed text and images, to documents that contain only images. At
> the text-only end of the continuum, the effect on the document size of
> UTF-16 vs. UTF-8 is a doubling of document size. At the image-only end of
> the continuum, the effects on the document size of encoding in UTF-16
> versus UTF-8 are overshadowed by the image data.
>
> The table below illustrates three points on the document composition
> continuum:
> 1. Text-only: a document that prints as one page of ASCII text (times,
> 10pt, 8in by 11in paper) [1]. Size, in bytes, is 6,282.
>
> 2. Text & Image: a one page document with one 3in x 5in image (166.7K
> bytes) and the remainder text [2]. Size, in bytes, of document and image
> is 171,531.
>
> 3. Image-only: a one page document with eight 2in x 3.25in images (703.2K
> bytes) and no text [3]. Size, in bytes, of document and eight images is
> 705,108.
>
> Size (bytes)   utf-8   %doc   utf-16   %doc
> Text-only      6,282   100    12,566   100
> Text+Image     4,776   3.2     9,554   5.4  (9,554 / (9,554 + 166,675) * 100)
> Image-only     1,916   .27     3,834   .54
>
> There is another point of variability: the characters in the text portions
> of the document. This is another continuum from ASCII only at one end to
> Japanese, Chinese, Korean, and Hindi at the other.
>
> "Table 1: UTF types" of [4] gives the following average bytes per code
> point:
>
>                                    utf-8   utf-16
> English                            1       2
> Latin-1                            1.1     2
> Greek, Russian, Arabic, Hebrew     1.7     2
> Japanese, Chinese, Korean, Hindi   3       2
>
> As the language/script of the text portion of the document changes from
> English-only toward other scripts and languages, the size difference
> between utf-8 and utf-16 decreases.
>
>
> End-to-end solution
> -------------------
> If you look at the end-to-end solution, from the sending application to
> the printer, the stages can be thought of as:
> 1. Sending Device: the data as represented in the sending device (a cell
> phone for example)
> 2. Transmission: the data combined with markup and style information as
> an XHTML-Print data stream and then encoded in either UTF-8 or UTF-16
> 3. Receiving Device: the printer -- breaking this into two parts gives:
> 3.a The XHTML-Print data stream as received
> 3.b The data without markup and style information and before printing.
> How the data is stored is implementation dependent, and how much memory
> is used depends on how a character is represented (8 or 16 bits) and how
> much of the document is buffered. Each printer makes these choices;
> 8 bits/char restricts the documents processed to Latin-1 characters.
>
>
>
> Stage      Size   utf-8    utf-16
> 1. app     n      -        -
> 2. xmit    n      n-3n*    2n
> 3a. Pr     n      n-3n     2n
> 3b. Pr**   n      n-2n     n-2n
>
> * n-3n shows the variable sizing depending on the characters being
> encoded: English only (n), CJK (3n)
> ** at Stage 3b, representing a character with 8 bits restricts the
> characters that can be represented to ASCII or Latin-1; 16 bits can
> represent all characters.
>
> Internal representation
>
> If a printer uses 16 bits internally to represent a character, then there
> shouldn't be a difference in buffering requirements between utf-8 and
> utf-16 encoded files. However, if a printer uses 8 bits, then it has
> restricted itself to only handle a subset of documents. This is a
> product-specific decision akin to that of supporting color or not.
> Therefore, I suggest that the spec say that a printer should support
> utf-16 just as it now says it should support CSS, landscape printing, and
> color -- within the limits of the device. If a user buys a low-cost
> device that can only print ASCII characters in portrait orientation,
> without color, images or style, hopefully the price is in line with the
> printer's abilities and other, more expensive, more capable devices are
> available as needed.
>
>
>
> [1] http://www.pwg.org/xhtml-print/W3C-Version/georgeb.html
> [2] http://www.pwg.org/xhtml-print/W3C-Version/text+image.html
> [3] http://www.pwg.org/xhtml-print/W3C-Version/image-only.html
>
> [4] http://www-106.ibm.com/developerworks/library/utfencodingforms/
>
>
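
The size figures quoted above can be reproduced for any text with a few lines of Python. This is a minimal sketch: the helper name encoded_sizes and the sample strings are illustrative assumptions, not data from the thread. It also shows that Python's "utf-16" codec prepends a 2-byte byte order mark, which is consistent with the utf-16 sizes in the first table being exactly twice the utf-8 size plus 2.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Illustrative samples only; escapes avoid any normalization surprises.
samples = {
    "English": "print job",
    "Greek": "\u03b5\u03ba\u03c4\u03cd\u03c0\u03c9\u03c3\u03b7",
    "Japanese": "\u5370\u5237",
}

for name, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{name}: utf-8 {utf8 / len(text):.1f}, "
          f"utf-16 {utf16 / len(text):.1f} bytes per code point")

# An ASCII-only document of 6,282 bytes, encoded as UTF-16 with a BOM.
ascii_doc = "a" * 6282
utf16_with_bom = len(ascii_doc.encode("utf-16"))
```

For ASCII-only markup and text the UTF-16 form is twice the size; for scripts whose characters need three bytes in UTF-8, the UTF-16 form is the smaller of the two.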

FOLLOWUP 29:

From: don@lexmark.com
Steven:
Your perception of how this works in an embedded device, especially in a
printer that will use this in Bluetooth, UPnP and other environments, is
clearly tainted by your experience of this with the Web and PCs.
0) Of course UTF-8 versus UTF-16 is orthogonal to the internal
representation of the "printer" but not until it is in the "printer" and
off the "network".
1) As defined to be used by Bluetooth and in other environments, the data
is PUSHed to the device rather than being pulled. You have less control
over the amount of data being sent.
2) The network buffers are in the same constrained memory space as the
processor for XHTML-Print. Chunks from the network have to be buffered by
the network process until they can be dealt with by the TCP process, which
buffers them until they can be dealt with by the XHTML-Print process. All
this is done in that same limited, constrained memory space. If I'm going
to maintain the performance levels customers expect, I need to be able to
hold, in multiple buffers, equivalent amounts of CONTENT, which in English
encoded as UTF-16 is TWICE as many bytes as in UTF-8. It is unreasonable
to expect the network or TCP process within the device to convert UTF-16
to the internal format; that happens when it actually hits the "printer."
So while it might not take any more memory in the "printer" because the
content is converted to an internal format, it does take more memory
before it reaches the "printer" but while it is in the embedded physical
device called a printer.
Do you get it yet? In the PC world, the user agent doesn't have to worry
about all the underlying details necessary to have the content delivered
from the network. We don't have that luxury in the embedded space. All
that work is done by the same processor and with the same limited memory.
How else do you think we can sell printers for $29??
*******************************************
Don Wright don@lexmark.com
Chair, IEEE SA Standards Board
Member, IEEE-ISTO Board of Directors
f.wright@ieee.org / f.wright@computer.org
Director, Alliances and Standards
Lexmark International
740 New Circle Rd C14/082-3
Lexington, Ky 40550
859-825-4808 (phone) 603-963-8352 (fax)
*******************************************
"Steven Pemberton" <steven.pemberton@cwi.nl> on 10/17/2003 08:55:07 AM
To: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>, <don@lexmark.com>
cc: <w3c-html-wg@w3.org>, <voyager-issues@mn.aptest.com>,
<elliott.bradshaw@zoran.com>, <www-html@w3.org>, <mike@easysw.com>
Subject: Re: allow UTF-16 not just UTF-8 (PR#6774)
UTF-8 and UTF-16 are just definitions of how you send a Unicode character
stream in an interoperable way over the wire. The character set is the same,
the characters are the same, it is just the encoding that is different.
It is orthogonal to questions of how characters are stored internally. You
can do what you like internally, it is completely up to you. It has no
effect on the memory requirements of the receiving device, because you have
to convert to your internal form anyway.
Steven

FOLLOWUP 30:

From: Michael Sweet <mike@easysw.com>
don@lexmark.com wrote:
> ...
> 1) As defined to be used by Bluetooth and in other environments, the
> data is PUSHed to the device rather than being pulled. You have less
> control over the amount of data being sent.
> ...
The "push" model is also used for USB, parallel, and serial
printing, and the current print devices seem to have no problem
with flow control over these or network interfaces. It might
mean that customers will see slower printing with UTF-16 data,
but between the spec and any documentation you provide to
developers and customers, it shouldn't surprise anyone...
--
______________________________________________________________________
Michael Sweet, Easy Software Products mike@easysw.com
Printing Software for UNIX http://www.easysw.com

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
My reply to Don's emailed question:
>Pemberton and others in the group ceased the e-mail thread. Did
>I convince them or have they given up on me?
Don,
I think that the case for and against UTF-16 support in XHTML-Print has been
made.
We discussed UTF-8/UTF-16 and the XHTML-Print spec in the 10/22/03 HTML WG
phone conference. The group has officially voted to ask the Director to make
XHTML-Print W3C Working Draft 20 October 2003 [1] a Candidate
Recommendation, noting your dissenting opinion on required UTF-16 support.
Steven Pemberton feels that the Director will agree to make the
specification a Candidate Recommendation.
You may register a formal objection [2] concerning UTF-16 support in
XHTML-Print, if you feel that your comments on this issue haven't
sufficiently represented your position. Please continue to CC:
voyager-issues@mn.aptest.com on any further discussions, since this provides
an archive.
The Disposition of Comments for XHTML-Print is at [3].
Jim
[1] http://www.w3.org/MarkUp/Group/2003/WD-xhtml-print-20031020/
[2] http://www.w3.org/2003/06/Process-20030618/policies.html#WGArchiveMinorityViews
[3] http://www.w3.org/MarkUp/Group/2003/xhtml-print-cr-doc-20031017.html

REPLY 1:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6774 [1] in the HTML
Working Group's issue tracking system.
The working group agrees that since XHTML-Print is a member
of the family of XHTML 1.0 languages, document encodings cannot
be restricted to UTF-8 but must also include UTF-16. The
specification will be modified to remove the sentence,
'The only valid value for the "charset" parameter is "utf-8".'
If you feel that this resolution of your comment is not acceptable, please
respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6774;user=guest

REPLY 2:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Don,
What do you think of the following compromise?
1. say nothing about whether a printer supports UTF-8 or UTF-16
2. require that conforming XHTML-Print documents be encoded in UTF-8 by
requiring that conforming clients (Section 2.2) create documents that are
encoded in UTF-8. This means adding the following to item 1 of Section 2.2:
1. Clients SHALL produce a well-formed XHTML-Print document as defined in XHTML
1.0 [XHTML1] and in Document Conformance. The document SHALL be encoded using
UTF-8 [RFC2279].
Jim Bigelow

REPLY 3:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Don and Elliott,
The HTML working group discussed my question of why an XHTML-Print processor
must be a conforming XML processor (in particular, why it must support both
UTF-8 and UTF-16 encodings) on October 1, 2003.
The answer is that an XHTML-Print processor must be a conforming XML processor
and support both UTF-8 and UTF-16 encodings to preserve compatibility between
XML-based applications.
If XHTML-Print processors only supported UTF-8, then an XML-based application
could not be reliably depended upon to emit an XHTML-Print document that the
XHTML-Print application could process. For example, an XML-based XForms
application's output of an XHTML-Print document cannot be restricted by the
XHTML-Print specification to UTF-8, since the application may not be able to
control the encoding.
Section 4.3.3 [1] and Appendix F [2] of the XML specification [3] give
heuristics for determining a document's encoding when the charset parameter of
the MIME type [4] is absent.
An example UTF-16 decoder is available at [5]; other encodings are at [6].
Jim Bigelow
[1] http://www.w3.org/TR/REC-xml#charencoding
[2] http://www.w3.org/TR/REC-xml#sec-guessing
[3] http://www.w3.org/TR/REC-xml
[4] http://www.ietf.org/rfc/rfc3023.txt
[5] http://interscript.sourceforge.net/interscript/doc/en_iscr_0282.html
[6] http://interscript.sourceforge.net/interscript/doc/en_iscr_0275.html
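
The kind of heuristic that Appendix F of the XML specification describes can be sketched as follows. This is a simplified illustration restricted to UTF-8 and UTF-16 (the appendix also covers UCS-4, EBCDIC and other cases); the function name sniff_xml_encoding and the returned labels are assumptions made for the example, not part of any specification.

```python
def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding of an XML entity from its first bytes.

    Simplified sketch: checks for a byte order mark first, then for
    the byte pattern of '<?' in UTF-16 without a BOM. Anything else
    is treated as UTF-8, the default for XML when no BOM and no
    encoding declaration are present.
    """
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8"        # UTF-8 with BOM
    if head.startswith(b"\xfe\xff"):
        return "utf-16-be"    # big-endian BOM
    if head.startswith(b"\xff\xfe"):
        return "utf-16-le"    # little-endian BOM
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be"    # '<?' in big-endian, no BOM
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le"    # '<?' in little-endian, no BOM
    return "utf-8"


guess = sniff_xml_encoding("<?xml version='1.0'?>".encode("utf-16-be"))
```

A receiver needs only the first four bytes of the stream to make this decision, so the check can run before any document buffering begins.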

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6775 [1] in the HTML
Working Group's issue tracking system.
The working group agrees with your comments and will modify the
text of section 3.10 to read, "A printer must support
resources of type 'image/jpeg'."
If you feel that this resolution of your comment is not acceptable, please
respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6775;user=guest

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
To: w3c-html-editor@w3.org
Cc: xp@pwg.org
Subject: XHTML-Print: treating a missing media attribute as media="screen"
when printing not user's intent
Date: Thu, 4 Sep 2003 14:10:55 -0400
Message-ID: <020A3CF87FB5AC47AA67966B33845755050DB594@xboi22.boise.itc.hp.com>
Sections 3.13 and 3.15 of the W3C Last Call Working Draft of XHTML-Print [1]
state, "The absence of the media attribute MUST be treat[ed] as if the media
attribute had the value 'screen.'"
At the risk of being accused of mind reading, I think that most document
authors do not write style sheets for printing but would like the styles to
be applied when printing as well as browsing. Therefore changing the value
"screen" in the statement shown above to the value "all" would give more
consistent results when browsing and printing.
[1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/
Jim Bigelow
Hewlett-Packard Co.
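
The practical difference between the two defaults can be shown with a small sketch. The function name applies_to_print and its default parameter are hypothetical, introduced only to illustrate the comment above; the media attribute is treated as a comma-separated list of media types.

```python
def applies_to_print(media_attr, default="all"):
    """Return True if a style sheet with this media attribute applies when printing.

    Hypothetical helper: media_attr is the attribute value, or None
    when the attribute is absent; default is the value assumed then.
    """
    value = media_attr if media_attr is not None else default
    media_types = {m.strip().lower() for m in value.split(",")}
    return bool(media_types & {"print", "all"})


# A style sheet written without a media attribute:
used_with_all = applies_to_print(None, default="all")
used_with_screen = applies_to_print(None, default="screen")
```

With a default of "screen", a printer would drop every style sheet that omits the attribute; with "all", the same style sheet is used for both browsing and printing.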

REPLY 1:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Jonny wrote:
I am starting to believe that this error isn't a bug (yes, the default
value *is* "all"), but a virus the way it keeps replicating. Anyone
willing to guess which spec it will infect next?
--
Jonny Axelsson,
Web Standards,
Opera Software

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6870 [1] in the HTML
Working Group's issue tracking system.
The working group has elected to implement your suggestions.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6870;user=guest

From: Henri Sivonen <hsivonen@iki.fi>
To: www-html-editor@w3.org
Subject: support for character entities too expensive for low-cost printers
Date: Sun, 3 Aug 2003 22:01:47 +0300
Message-Id: <EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi>
X-Archived-At: http://www.w3.org/mid/EE667E7F-C5E4-11D7-B77B-003065B8CF0E@iki.fi
3.17 Character Entities
The specification mentions that character entities are defined but
doesn't say whether printers should support them.
I think requiring XHTML-Print implementations to support character
entities would be a very bad idea. Support for character entities is
the only feature of XHTML-Print that requires the printer to process
external entities. The burden of implementing a DTD catalog and parsing
the huge (relative to the size of the usual XHTML documents) DTD files
is significant compared to using a non-validating XML processor and not
processing external entities at all.
Since XHTML-Print is intended to be used with low-cost printers and the
overwhelmingly most likely use case is that the documents are generated
by software as opposed to being written by hand by humans, I suggest
explicitly stating that printers should not be expected to support
character entities (or any other features of XML that depend on
external entities being processed, such as attribute defaulting).
[extracted from issue 6548]
--
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/

FOLLOWUP 1:

From: Henri Sivonen <hsivonen@iki.fi>
On Saturday, Sep 27, 2003, at 00:26 Europe/Helsinki, Jim Bigelow wrote:
> The working group does not agree with you concerning
> requiring support for a set of predefined character entities.
> The group feels that the set of required character
> entities has a small memory footprint when implemented as
> a data set. Furthermore, such a data set does not require
> that a printer read the DTD. Therefore, no change to the
> specification is planned in this regard.
The problem is that implementing such a data set without reading the DTD
would mean that the parser would not be an XML processor as defined in
the XML spec. Using a modified parser would break one of XML's
benefits: the ability to use a ready-made off-the-shelf parser whose
functionality is well defined. Also, having such almost-XML processors
around could cause interoperability problems, since different parsers
would have different ideas of what the pre-defined entities were and,
therefore, what entity references rendered a document not well-formed.
--
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/

REPLY 1:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6776 [1] in the HTML
Working Group's issue tracking system.
The working group does not agree with you concerning
requiring support for a set of predefined character entities.
The group feels that the set of required character
entities has a small memory footprint when implemented as
a data set. Furthermore, such a data set does not require
that a printer read the DTD. Therefore, no change to the
specification is planned in this regard.
If you feel that this resolution of your comment is not acceptable, please
respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6776;user=guest

REPLY 2:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Henri Sivonen wrote:
> The problem is that implementing such a data set without reading the DTD
> would mean that the parser would not be an XML processor as defined in
> the XML spec. Using a modified parser would break one of XML's
> benefits: the ability to use a ready-made off-the-shelf parser whose
> functionality is well defined.
An XHTML-Print processor is only required to deal with XHTML-Print documents.
> Also, having such almost-XML processors
> around could cause interoperability problems, since different parsers
> would have different idea of what the pre-defined entities were and,
> therefore, what entity references rendered a document not well-formed.
>
The pre-defined entities that an XHTML-Print processor must support are
well defined. These entities are specified in the XHTML-Print specification in
[1]. No other entities are part of XHTML-Print and users do not have a means to
create new entities. Therefore, a conforming printer need only implement a means
to recognize the set of pre-defined entities and replace them with the required
Unicode code points. It is then up to the implementation of a conforming printer
to decide how best to process the pre-defined set of entities.
Some implementations have done this via a data table that is compiled into the
code, thereby relieving the printer of the need to redundantly access the same
information from the DTD for each XHTML-Print document.
However, the specification does not constrain how a conforming printer should
provide support for the set of pre-defined entities.
Jim Bigelow
Editor
[1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/#s_charentities
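
A compiled-in data table of the kind mentioned above might look like the following sketch. It is an illustration only, not any vendor's implementation: the names ENTITY_TABLE and expand_entities are hypothetical, the table holds just four entries for brevity where a real one would hold the full set listed in the specification, and a production parser would resolve references inside the XML processor rather than by a text substitution pass.

```python
import re

# Hypothetical excerpt of a compiled-in table: entity name -> code point.
ENTITY_TABLE = {
    "nbsp": 0x00A0,
    "copy": 0x00A9,
    "eacute": 0x00E9,
    "euro": 0x20AC,
}

# The five entities every XML parser already knows; left untouched here.
XML_BUILTINS = {"lt", "gt", "amp", "quot", "apos"}

_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_entities(text):
    """Replace named entity references using the table, without reading a DTD."""
    def replace(match):
        name = match.group(1)
        if name in XML_BUILTINS:
            return match.group(0)
        if name in ENTITY_TABLE:
            return chr(ENTITY_TABLE[name])
        raise ValueError(f"undefined entity: &{name};")
    return _ENTITY_REF.sub(replace, text)


expanded = expand_entities("caf&eacute; &amp; &copy; 2003")
```

Because the entity set is closed and users cannot define new entities, every reference is either in the table or an error, which is the property the reply above relies on.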

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6777 [1] in the HTML
Working Group's issue tracking system.
The working group made the change you suggested.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6777;user=guest

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6871 [1] in the HTML
Working Group's issue tracking system.
The working group has elected to implement your suggestions.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6871;user=guest

From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
Sent: Thursday, July 31, 2003 1:29 PM
To: BIGELOW,JIM (HP-Boise,ex1)
Cc: xp@pwg.org
Subject: Required support for script, noscript, and hidden
2. Required support for script, noscript, and hidden. I don't mind this
change, exactly. But (at the risk of re-opening a long debate) if the
assumption is that an XHTML-Print client is generating data specifically in
this language, then it should never generate these cases. So mandating
support seems redundant. On the other hand, if the intent is to gracefully
degrade when receiving data from other sources, then there are other issues
(e.g. frames) that also come up.
[extracted from issue 6536]
------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Oak Technology Imaging Group
781 638-7534

REPLY 1:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
> From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
> Sent: Thursday, July 31, 2003 1:29 PM
> To: BIGELOW,JIM (HP-Boise,ex1)
> Cc: xp@pwg.org
> Subject: Required support for script, noscript, and hidden
>
> 2. Required support for script, noscript, and hidden. I don't mind this
> change, exactly. But (at the risk of re-opening a long debate) if the
> assumption is that an XHTML-Print client is generating data specifically in
> this language, then it should never generate these cases. So mandating
> support seems redundant. On the other hand, if the intent is to gracefully
> degrade when receiving data from other sources, then there are other issues
> (e.g. frames) that also come up.
>
Adding support for <noscript> allows a document author to use a single document
and have the script execute when browsing and the content of the noscript
element be displayed when printing. The PWG version of XHTML-Print
specifically said that the content of the script element should not be printed
(Section 1.3.1); however, it doesn't indicate how a printer was to recognize the
script element and treat it differently than all other unknown elements. This
change indicates how the printer should recognize a script, that the content
should be discarded, and that the alternate content in the noscript be printed.
So, I think this change cleans up the intent already expressed in previous
versions and does not open the larger issue of graceful degradation in the face
of non-XHTML-Print documents.
Jim.
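
The behaviour described above (discard the content of script, print the content of noscript) can be sketched with Python's standard XML parser. The function name printable_text and the sample document are assumptions for illustration; a real printer would feed a layout engine rather than collect plain text.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def printable_text(xhtml_source):
    """Collect the text a printer would render, skipping script content."""
    root = ET.fromstring(xhtml_source)
    parts = []

    def walk(element):
        if element.tag == XHTML_NS + "script":
            return  # recognized as a script: its content is discarded
        if element.text:
            parts.append(element.text)
        for child in element:
            walk(child)
            if child.tail:
                parts.append(child.tail)  # text after the child is kept

    walk(root)
    return " ".join(" ".join(parts).split())


doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>Total:</p>"
    '<script type="text/javascript">showTotal();</script>'
    "<noscript><p>42 items</p></noscript>"
    "</body></html>"
)
text = printable_text(doc)
```

The noscript element needs no special case here: it is walked like any other element, so its alternate content is printed while the script body never reaches the output.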

REPLY 2:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6778 [1] in the HTML
Working Group's issue tracking system.
The working group does not agree that support for the script
element implies support for document types other than XHTML-Print.
Therefore, no changes to the specification are planned regarding
this issue.
If you feel that this resolution of your comment is not acceptable, please
respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6778;user=guest

Printers must support W3C and PWG MIME Type and DTD. PWG versions deprecated.

ORIGINAL MESSAGE:

From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
Sent: Thursday, July 31, 2003 1:29 PM
To: BIGELOW,JIM (HP-Boise,ex1)
Cc: xp@pwg.org
Subject: change of MIME type to application/xhtml+xml not compatible with UPnP
4. Section 2.1, last paragraph. Changing the MIME type makes sense. But I
assume that "application/xhtml+xml" could refer to other kinds of data
besides XHTML-Print. In other words, the receiving side can't tell that
this data is XHTML-Print. Unless he looks at the DOCTYPE...right?
I'm wondering if this change will be a problem for protocols such as UPnP
that use the MIME type to distinguish "document format" (in the Semantic
Model sense) when advertising capabilities. For example,
http://www.upnp.org/download/Service_print_v1_020808.pdf says
"All UPnP printers MUST support at least the
'application/vnd.pwg-xhtml-print' document format[XHTML-PRINT] ..."
This would have to change to something new, in a way that specifically
refers to XHTML-Print.
[extracted from issue 6536]
------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Oak Technology Imaging Group
781 638-7534

FOLLOWUP 1:

From: elliott.bradshaw@zoran.com
I am not sure that this resolution solves the problem.
Protocols such as UPnP and Bluetooth need a unique MIME type to describe
support for documents formatted as XHTML-Print.
I agree that the current type application/vnd.pwg-xhtml-print+xml should be
migrated to something more official, which would require such protocols to
make revisions that move away from the deprecated name. But they still
need a unique way to identify XHTML-Print.
Perhaps those groups have come up with another way to solve this, but to me
a unique MIME type would be the right way to go.
Can the W3C register a new MIME type for this purpose?
Best regards,
Elliott
--------------------------------------------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Group (formerly Oak Technology Imaging Group)
781 638-7534
From: Jim Bigelow <voyager-issues@mn.aptest.com>
To: ElliottBradshaw@oaktech.com
cc:
Date: 09/26/2003 06:24 PM
Subject: Re: change of MIME type to application/xhtml+xml not compatible with UPnP (PR#6780)
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6780 [1] in the HTML
Working Group's issue tracking system.
The working group decided that the MIME type
"application/vnd.pwg-xhtml-print+xml" must be recognized as referring to a
conforming XHTML-Print document, along with the MIME Type
"application/xhtml+xml". However, the "application/vnd.pwg-xhtml-print+xml"
MIME type is deprecated in favor of the MIME Type "application/xhtml+xml". Future
releases of this specification may remove the required support for the MIME type
"application/vnd.pwg-xhtml-print+xml".
If you feel that this resolution of your comment is not acceptable, please
respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6780;user=guest

REPLY 1:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
> From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
> Sent: Thursday, July 31, 2003 1:29 PM
> To: BIGELOW,JIM (HP-Boise,ex1)
> Cc: xp@pwg.org
> Subject: change of MIME type to application/xhtml+xml not compatible with UPnP
>
> 4. Section 2.1, last paragraph. Changing the MIME type makes sense. But I
> assume that "application/xhtml+xml" could refer to other kinds of data
> besides XHTML-Print. In other words, the receiving side can't tell that
> this data is XHTML-Print. Unless he looks at the DOCTYPE...right?
>
> I'm wondering if this change will be a problem for protocols such as UPnP
> that use the MIME type to distinguish "document format" (in the Semantic
> Model sense) when advertising capabilities. For example,
> http://www.upnp.org/download/Service_print_v1_020808.pdf says
>
> "All UPnP printers MUST support at least the
> 'application/vnd.pwg-xhtml-print' document format[XHTML-PRINT] ..."
>
> This would have to change to something new, in a way that specifically
> refers to XHTML-Print.
>
Your point also holds for the Bluetooth Basic Print Profile (v0.95)
(http://www.bluetooth.com/pdf/Basic_Printing_Profile_0_95a.pdf). I think that
XHTML-Print must continue to support the MIME type of
'application/vnd.pwg-xhtml-print' and support for "application/xhtml+xml" should
be optional. I'll argue for this during the working group review.
-- Jim

REPLY 2:

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6780 [1] in the HTML
Working Group's issue tracking system.
The working group decided that the MIME type
"application/vnd.pwg-xhtml-print+xml" must be recognized as referring to a
conforming XHTML-Print document, along with the MIME Type
"application/xhtml+xml". However, the "application/vnd.pwg-xhtml-print+xml"
MIME type is deprecated in favor of the MIME Type "application/xhtml+xml". Future
releases of this specification may remove the required support for the MIME type
"application/vnd.pwg-xhtml-print+xml".
If you feel that this resolution of your comment is not acceptable, please
respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6780;user=guest
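
As an illustration of the resolution above, a receiver could classify an incoming media type roughly as follows. This is a minimal sketch, not text from the specification: the function name classify_media_type and the three return labels are hypothetical. Note that it does not address the concern raised in the original comment, namely that "application/xhtml+xml" alone does not identify the data as XHTML-Print.

```python
# Media types named in the working group's resolution.
W3C_TYPE = "application/xhtml+xml"
PWG_TYPE = "application/vnd.pwg-xhtml-print+xml"  # deprecated, still recognized


def classify_media_type(content_type: str) -> str:
    """Classify a Content-Type value (hypothetical helper).

    Returns "preferred" for the W3C type, "deprecated" for the PWG type,
    and "unsupported" for anything else. Parameters such as charset are
    ignored, and the comparison is case-insensitive.
    """
    base = content_type.split(";", 1)[0].strip().lower()
    if base == W3C_TYPE:
        return "preferred"
    if base == PWG_TYPE:
        return "deprecated"
    return "unsupported"


preferred = classify_media_type("application/xhtml+xml; charset=utf-8")
deprecated = classify_media_type("Application/Vnd.pwg-xhtml-print+xml")
other = classify_media_type("text/html")
```

Both types are accepted; the distinction between "preferred" and "deprecated" only matters to a sender choosing which type to emit.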

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
To: www-html-editor@w3.org
Cc: xp@pwg.org
Subject: Relaxing XHTML-Print's restriction to UTF-8 to include UTF-16
Date: Tue, 2 Sep 2003 20:42:14 -0400
Message-ID: <020A3CF87FB5AC47AA67966B3384575504D1D0AD@xboi22.boise.itc.hp.com>
X-Archived-At: http://www.w3.org/mid/020A3CF87FB5AC47AA67966B3384575504D1D0AD@xboi22.boise.itc.hp.com
> From: Henri Sivonen [mailto:hsivonen@iki.fi]
...
> It is said that if a "charset" parameter is present for the
> application/xhtml+xml MIME type, the only valid value is "utf-8". It
> would make sense to allow "utf-16" as well. All XML processors are
> required to support UTF-16 in addition to UTF-8, so allowing
> UTF-16 for XHTML-Print doesn't cause any additional burden
> to implementations. Also, the payload of
> Application/Vnd.pwg-multiplexed chunks is defined
> as octets, so UTF-16 strings can be delivered as
> Application/Vnd.pwg-multiplexed chunks without any further encoding.
>
I tend to agree with Henri when he says that supporting UTF-16 would not be
much more expensive than UTF-8. Does anyone on this list or the PWG's
XHTML-Print list disagree?
Jim
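
To make the question concrete, here is a minimal sketch of a charset check on the Content-Type value. It is illustrative only: the function name charset_is_acceptable and its allowed parameter are hypothetical, the default reflects the rule quoted above (if a charset parameter is present, only "utf-8" is valid), and passing a larger set models Henri's proposal rather than any decision recorded here.

```python
from email.message import Message


def charset_is_acceptable(content_type: str,
                          allowed=frozenset({"utf-8"})) -> bool:
    """Check the optional charset parameter (hypothetical helper).

    An absent charset parameter is accepted; a present one must be in
    `allowed`. The standard library parser lower-cases the value.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # None when no charset is given
    return charset is None or charset in allowed


# Rule as quoted from the Last Call draft: only utf-8 is valid.
strict_utf16 = charset_is_acceptable("application/xhtml+xml; charset=UTF-16")

# Relaxed rule as proposed in the comment: utf-16 is allowed as well.
relaxed_utf16 = charset_is_acceptable(
    "application/xhtml+xml; charset=UTF-16",
    allowed=frozenset({"utf-8", "utf-16"}),
)
```

Under this sketch the relaxation amounts to one more entry in the allowed set; the decoding cost Henri refers to sits in the XML processor, which is outside the sketch.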

From: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
To: www-html-editor@w3.org
Cc: xp@pwg.org
Subject: RE: XP> FW: Last call announcement for XHTML Print
Date: Thu, 31 Jul 2003 13:53:00 -0700
Message-ID: <020A3CF87FB5AC47AA67966B3384575503C7DBE0@xboi22.boise.itc.hp.com>
X-Archived-At: http://www.w3.org/mid/020A3CF87FB5AC47AA67966B3384575503C7DBE0@xboi22.boise.itc.hp.com
Elliott,
You wrote:
>
> I reviewed the public version and here are a few comments.
>
...
>
>
> 5. Section 2.3.1, "Images" section, fourth bullet. It used
> to say "Image data within the object element need not be
> supported." and now it says "A printer MAY choose to omit
> images referenced by a URI [RFC2396] containing a scheme name
> other than cid [RFC2392] and http [RFC2616] ." I'm confused.
>
The rewording is an attempt to say, in the positive, what URI types must be
supported and by implication that support for the data URI is not required.
Perhaps it should actually say that in the positive :-). For example,
A printer must support images referenced by a URI [RFC2396] containing a
scheme name cid [RFC2392] and http [RFC2616], support for other scheme names
is optional. However, support for a URI containing the data scheme name [REF
NEEDED] is not required unless the printer chooses to implement the method
for supporting in-line data given in Appendix B.3.
Jim

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6781 [1] in the HTML
Working Group's issue tracking system.
The working group decided to change the wording of section 2.3.1 to,
"A printer must support images referenced by a URI [RFC2396] containing a
scheme name cid [RFC2392] and http [RFC2616], support for other scheme names
is optional."
If you feel that this resolution of your comment is not acceptable, please
respond to this message with your comments.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6781;user=guest
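
The revised wording of section 2.3.1 can be read as a simple rule on the URI scheme of an image reference. The following is a minimal sketch under that reading; the function name image_scheme_support and its return labels are hypothetical, and treating a reference without a scheme as "relative" (to be resolved against the document's base URI first) is an assumption of the sketch.

```python
from urllib.parse import urlsplit

# Scheme names the revised wording says a printer must support.
REQUIRED_SCHEMES = {"cid", "http"}


def image_scheme_support(uri: str) -> str:
    """Classify an image reference by URI scheme (hypothetical helper).

    "required": the printer must support the scheme (cid, http).
    "optional": support for the scheme is optional (for example, data).
    "relative": no scheme; resolve against the base URI before deciding.
    """
    scheme = urlsplit(uri).scheme  # urlsplit returns the scheme lower-cased
    if not scheme:
        return "relative"
    return "required" if scheme in REQUIRED_SCHEMES else "optional"


cid_ref = image_scheme_support("cid:image1@example.com")
http_ref = image_scheme_support("HTTP://example.com/logo.png")
data_ref = image_scheme_support("data:image/png;base64,AAAA")
```

The example.com references are placeholders chosen for the sketch, not values from the specification.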

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6783 [1] in the HTML
Working Group's issue tracking system.
The working group has elected to implement your suggestions.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6783;user=guest

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6784 [1] in the HTML
Working Group's issue tracking system.
The working group has elected to implement your suggestions.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6784;user=guest

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6785 [1] in the HTML
Working Group's issue tracking system.
The working group has elected to implement your suggestions.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6815;user=guest

From: Susan Lesch [mailto:lesch@w3.org]
These are minor editorial comments for your XHTML-Print Last Call Working
Draft [1]. Kudos to the editor and your group(s). It looks great.
It may make sense to mark up elements and attributes globally
<code>thus</code>, as they are in 1.3.1 and some other places (that
eliminates the need for quotes in the 4.1 heading).
[extracted from 6899]
[1] http://www.w3.org/TR/2003/WD-xhtml-print-20030729/
Best wishes for your project,
--
Susan Lesch http://www.w3.org/People/Lesch/
mailto:lesch@w3.org tel:+1.858.483.4819
World Wide Web Consortium (W3C) http://www.w3.org/

From: Jim Bigelow <voyager-issues@mn.aptest.com>
Thank you for your comment on the XHTML-Print Last Call
Working Draft. It is recorded as issue 6786 [1] in the HTML
Working Group's issue tracking system.
The working group has elected to implement your suggestions.
Jim Bigelow
Editor
[1] http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6786;user=guest