Hi Leif,
On Aug 21, 2007, at 8:56 AM, Leif Halvard Silli wrote:
>
> 2007-08-21 14:21:01 +0200 Julian Reschke <julian.reschke@gmx.de>:
>
>> Leif Halvard Silli wrote:
>>> ...
>>> Later Julian Reschke replied:
>>>> I think they do.
>>>> XHTML: <http://tools.ietf.org/html/rfc3236#section-2>
>>>> Template: <http://tools.ietf.org/html/rfc4288#section-4.11>
>>> One of Karl points was probably that one actually recommend
>>> several extensions for (in this case) XHTML. By recommending
>>> only .XHTML, XHTML-files would in most cases automatically be
>>> served as 'application/xhtml+xml', and thus authors/users would
>>> experience the effects of XHTML.
>> RFC3236 mentions XHTML, XHT and HTML.
>
> Like I said.
>
>> Apache 2.2.x comes with a preconfigured mapping file (mime.types)
>> which has
>> application/xhtml+xml xhtml xht
>> so as far as I can tell, it already does what you're looking for
>> (and probably has for a long time).
>
> I am aware of this. And allthough there are more web servers than
> Apache, and more browsers than Firefox, this might serve (sic) as
> an example. (By asking Ian for examples of files.XHTML being served
> as text/html, I suspect he expects to hear that there are very
> _few_ such examples. In contrast, Ian has often been keen to
> demonstrate that things doesn't work, e.g. showing how images being
> served as text, will still being treated as an image by
> browsers ... and other such things.)
>
> The main thing that I agree very strongly with Karl in is that the
> offline and online "gap" should be bridged, and that this can
> happen through setting up clear/strict recommendations for which
> extensions to use - which all sides (authors, authoring software,
> browsers, servers) should pay attention to. This bridging should
> include official language and charset extensions, taking example
> from Apache, which also allready offer its own such extensions, and
> have done so for a very long time allready.

I'm not so sure I would characterize this as a problem between the
online and offline worlds. Mappings of filename extensions to MIME
types are already quite common in both worlds. The problem arises
with misconfigured servers, or with servers not yet configured for
new MIME types and new file extensions. As I understand it, it also
comes from servers sending a default MIME type for files they are
not sure about (instead of just admitting they don't know).
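
To make the "admit it doesn't know" point concrete, here is a minimal
sketch in Python. It assumes a mapping in the mime.types layout Julian
quoted; the text/html line and the function names parse_mime_types and
lookup_mime_type are my own illustrative assumptions, not anything
Apache ships. An unknown extension gives None rather than a default.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Sample data in the mime.types layout: a MIME type, then its extensions.
# The xhtml line is the one quoted in this thread; the html line is an
# illustrative assumption.
SAMPLE_MIME_TYPES = """\
# MIME type              extensions
application/xhtml+xml    xhtml xht
text/html                html htm
"""


def parse_mime_types(text: str) -> Dict[str, str]:
    """Build an extension -> MIME type table from mime.types-style text."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mime_type
    return mapping


def lookup_mime_type(filename: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the mapped type, or None when the extension is unknown."""
    ext = PurePosixPath(filename).suffix.lstrip(".").lower()
    return mapping.get(ext)  # deliberately no fallback default
```

A caller that gets None can then decide what to do, instead of the
lookup silently guessing a type.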

For character encodings I think things are somewhat of a mess. Most
authors are not that aware of character encodings. To me it's really
the type of thing authors should not have to worry about (if it had
been handled in a sane way from the start). Adding filename
extensions for encoding could be one approach (as a longtime Mac
user, it doesn't really appeal to me too much, but we did make the
adjustment to filename extensions for file types). However, I think
Unicode has really introduced a better approach with, well, Unicode
itself, and also with the byte order mark (BOM), which does a
fairly good job of identifying UTF-8 and UTF-16 files as being in
those encodings. A logical extension of this (outside our scope)
would be some sort of byte registry for character encodings. Each
character encoding could have its own one- or two-byte sequence that
every file in that encoding started with. Once text editors had been
updated to handle these registered bytes, authors would never have
to think about it again. Every text file would always have its
encoding tattooed on its forehead.
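
The BOM half of this already works today, and can be sketched in a few
lines of Python. This covers only the marks that exist (the registry
for other encodings is hypothetical and not shown); the function name
sniff_bom and the returned labels are my own choices, and the UTF-32
entries are an addition beyond the UTF-8 and UTF-16 cases mentioned
above.

```python
import codecs
from typing import Optional

# Longer marks come first: the UTF-32 LE mark begins with the same two
# bytes as the UTF-16 LE mark, so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None  # no mark: the bytes alone do not say
```

A file without a mark gives None, which is exactly the gap a registry
would have to close.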

Finally, for languages, it's useful for servers to have metadata
about language at their disposal to quickly deliver to clients.
However, I like the way HTML already handles that through its i18n
language features. Apache can even be configured to sniff inside the
files as they're added to the server to gather this data and index
it for quick retrieval later.
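
As a rough illustration of reading the language out of the file itself,
here is a sketch that pulls the lang (or xml:lang) attribute off the
root html element. It is not how Apache does anything; it uses Python's
lenient html.parser rather than a real XML parser, and the name
declared_language is hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class _RootLangParser(HTMLParser):
    """Remember the language declared on the first <html> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self.seen_root = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "html" and not self.seen_root:
            self.seen_root = True
            declared = dict(attrs)
            self.lang = declared.get("lang") or declared.get("xml:lang")


def declared_language(markup: str) -> Optional[str]:
    """Return the root element's declared language, or None if absent."""
    parser = _RootLangParser()
    parser.feed(markup)
    parser.close()
    return parser.lang
```

A server-side indexer could call something like this once when a file
is added, and cache the answer.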

So all of these pieces of metadata each have their own place, I think.
The safest thing is to keep the authoritative data inside the file
itself, and then extract it and index it in filesystem metadata or
elsewhere for quick retrieval. Many filesystems (and WebDAV too)
support extended filesystem attributes. Some tools have started to
store this information there. Systems like Apple's Spotlight extract
authoritative metadata from files and store it in a SQLite database
for indexing (while also making use of filesystem attributes and
extended attributes alongside the SQL). To me those approaches
represent best practice. Filenames (and their extensions) can too
easily be changed inadvertently, losing that metadata. The best
thing to do is keep it inside the file (with the exception of file
type, which by now has a long tradition of filename extension mapping).
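
A small sketch of that split between authoritative data and index, in
Python with the standard sqlite3 module. This is not Spotlight's actual
design, only an illustration under my own assumptions: a dict of
name -> bytes stands in for the filesystem, the only metadata extracted
is a BOM-declared encoding, and all function names are hypothetical.

```python
import codecs
import sqlite3
from typing import Dict, Optional


def encoding_from_content(data: bytes) -> Optional[str]:
    """The authoritative answer comes from the file's own bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    return None


def rebuild_index(files: Dict[str, bytes]) -> sqlite3.Connection:
    """Derive a throwaway lookup table from file contents."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE metadata (path TEXT PRIMARY KEY, encoding TEXT)")
    db.executemany(
        "INSERT INTO metadata (path, encoding) VALUES (?, ?)",
        [(path, encoding_from_content(data)) for path, data in files.items()],
    )
    db.commit()
    return db


def indexed_encoding(db: sqlite3.Connection, path: str) -> Optional[str]:
    """Quick retrieval from the index, without reopening the file."""
    row = db.execute(
        "SELECT encoding FROM metadata WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None
```

Because the index is derived, renaming a file loses nothing: rebuilding
it from the renamed file's contents gives the same encoding back.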
Take care,
Rob