Microsoft's proposed solution (authoritative=true) could work as a
stop-gap measure, but I think we need to consider a significantly
different approach. For example, HTML should have its own mechanism
for setting the processing of embedded resources. I've proposed just
such a mechanism in Bugzilla[1].
I think we need to look at this with fresh eyes. The HTTP Content-Type
header was intended to serve double duty. First, it provides access to
MIME types without needing to retrieve the entire resource to perform
sniffing or otherwise examine the resource. Second, it serves as a
mechanism for authors to alter the MIME type treatment of a file.
There are problems with combining these two roles into one. There are
also problems with not including such a mechanism within HTML itself.
Some of those issues are covered in a wiki page[2] on the topic.
Ideally, agents should be able to query the intrinsic type of
resources across the network without needing to retrieve the resource.
Authors should also be able to alter the treatment of a resource
while still using the same resource and the same resource identifier.
The HTTP Content-Type header cannot serve both of these functions at
the same time. It's time to have new headers and other new mechanisms
to address all of these issues. Add to this the fact that the
Content-Type header cannot address compound document types (multiple
parts, etc.), and again it cannot meet the needs of modern resources.
What I think we need is 1) an entirely new HTTP header (and this is
probably something for the HTTP WG to consider) that can return an
array of intrinsic content types for each resource (perhaps the
sniffing code could be moved from the open source browser projects to
the open source server projects to generate this header) and 2) a
separate header for author control over the processing of a resource.
However, this second function should not be needed for HTML, since
HTML should include its own attributes for controlling the processing
of resources (as proposed in Bugzilla[1]). Together these mechanisms
address the problems identified in the wiki[2].
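
To make the two-header idea concrete, here is a rough sketch of how a
UA might combine them. The header names (Intrinsic-Content-Types,
Process-As) and the precedence order (HTML attribute, then the
author-control header, then the first intrinsic type) are my own
placeholders for illustration; nothing like them is specified anywhere.

```python
from typing import List, Mapping, Optional

# Hypothetical header names, invented for this sketch only.
INTRINSIC_HEADER = "Intrinsic-Content-Types"
PROCESS_AS_HEADER = "Process-As"


def parse_intrinsic_types(headers: Mapping[str, str]) -> List[str]:
    """Return the comma-separated intrinsic types, trimmed and lowercased."""
    raw = headers.get(INTRINSIC_HEADER, "")
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


def choose_processing(
    headers: Mapping[str, str], html_type_hint: Optional[str] = None
) -> Optional[str]:
    """Pick the type to process a resource as.

    Assumed order: an HTML attribute on the embedding element wins,
    then the author-control header, then the first intrinsic type.
    Returns None when nothing is known.
    """
    if html_type_hint:
        return html_type_hint.strip().lower()
    override = headers.get(PROCESS_AS_HEADER, "").strip().lower()
    if override:
        return override
    intrinsic = parse_intrinsic_types(headers)
    return intrinsic[0] if intrinsic else None
```

The point of keeping the two headers separate is that an author
override never destroys the information about what the resource
actually is.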
Finally, consider the problem that Apache still has a long-standing
bug that makes it impossible to configure the server to return no
Content-Type header when the content type of a file is unknown. This
is over a decade after the spec and the creation of Apache. Certainly
Apache addressed a need to handle files with no filename extension and
to permit administrators to configure the server to send text/plain
in such circumstances (as Roy Fielding has pointed out on numerous
occasions[3]). However, Apache goes further and sends "text/plain" for
every unknown (unmapped) filename extension. Basically httpd's
DefaultType should not even exist, and instead there should be a
setting to sniff extension-less files for the text/plain type.
Nevertheless, this long history created some of the need for client UA
sniffing in the first place, and I'm afraid I don't see a way back to
no sniffing given this history. The only way out now is to come up
with new replacement mechanisms to achieve the goals originally set
for the HTTP Content-Type header.
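
As a sketch of the server behaviour I have in mind (not how httpd
works today): a mapped extension gets its type, an unmapped extension
gets no Content-Type at all, and only extension-less files are sniffed
for text/plain. The extension map and the text check below are
deliberately crude and purely illustrative.

```python
import os
from typing import Optional

# Illustrative extension map; a real server would load its own.
KNOWN_TYPES = {".html": "text/html", ".png": "image/png", ".txt": "text/plain"}


def looks_like_text(sample: bytes) -> bool:
    """Crude check: no NUL bytes and the sample decodes as UTF-8."""
    if b"\x00" in sample:
        return False
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def content_type_for(filename: str, sample: bytes) -> Optional[str]:
    """Return the Content-Type to send, or None to send no header."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in KNOWN_TYPES:
        return KNOWN_TYPES[ext]
    if ext == "":
        # Extension-less file: sniff the leading bytes for plain text.
        return "text/plain" if looks_like_text(sample) else None
    # Unknown (unmapped) extension: do not guess.
    return None
```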
In summary, we need:
* an HTTP mechanism for discovery of the intrinsic type of resources,
including an array of multiple types in the case of multipart or
compound documents
* an HTML mechanism for controlling the processing of resources
* perhaps also an HTTP mechanism for controlling the processing of
resources, but not for use in HTML
Take care,
Rob
[1]: <http://www.w3.org/Bugs/Public/show_bug.cgi?id=5776>
[2]: <http://esw.w3.org/topic/HTML/ContentTypeIssues>
[3]: <http://lists.w3.org/Archives/Public/public-html/2008Jul/0038.html>
On Jul 6, 2008, at 1:40 PM, Julian Reschke wrote:
>
> Ian Hickson wrote:
>> ...
>> If you would like the document to be processed as plain text, then
>> there
>> might not be a good answer for you, sorry. Your use case is
>> incompatible
>> with the use case of the many users who want to see feeds sent as
>> text/plain handled as feeds. Enough people mislabel their feeds as
>> text/plain that in practice documents labeled as text/plain are, in
>> some
>> browsers, sniffed for feeds before being treated as plain text.
>> ...
>
> With the current text in HTML5, there's not only no "good answer"
> but no
> answer at all (except by telling users to configure their UAs to
> respect
> mime types).
>
> Sam's use case could be made compatible by making the response
> distinguishable from one sent by a misconfigured server.
>
> At this point it seems to me that you are simply not interested in
> that
> case. Is this correct?
>
> BR, Julian
>
>