Hello,
If you don't mind going a few years back, I would like to get your
recollections of (and opinions on) the list of character encodings
accepted by the markup validator.
http://dev.w3.org/cvsweb/validator/htdocs/config/charset.cfg
Technically speaking, we do not really need this list any more. To
know whether an encoding is technically supported, we have a small
routine with Encode::decode() that does the job just fine. The Encode
module seems to support a wide variety of encodings, too, much wider
than the list we have.
e.g. iso_8859-1 - http://qa-dev.w3.org/wmvs/HEAD/dev/tests/197-iso88591_alias.html
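For illustration, here is a minimal Perl sketch of that kind of check.
It is not the validator's actual routine: canonical_encoding is a
hypothetical helper name, and it uses Encode::find_encoding() (rather
than Encode::decode()) simply to ask whether Encode recognises a label.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not the validator's code: returns the canonical
# name Encode uses for a declared charset label, or undef if Encode
# does not recognise the label (aliases included).
sub canonical_encoding {
    my ($label) = @_;
    my $enc = Encode::find_encoding($label);
    return defined $enc ? $enc->name : undef;
}

# Report on a few sample labels; the last one is deliberately bogus.
for my $label ('iso_8859-1', 'utf-8', 'x-no-such-charset') {
    my $name = canonical_encoding($label);
    printf "%-20s %s\n", $label,
        defined $name ? "supported as $name" : 'not supported';
}
```

A check like this depends only on what the installed Encode knows, which
is exactly the "supported on a specific system" question, as opposed to
a policy list.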
I haven't yet tested whether Encode supports all IANA-listed
character sets, but if it does not, we could always pass the
declared character encoding through something like I18N::Alias, as
suggested in
http://www.w3.org/Bugs/Public/show_bug.cgi?id=197
Therefore, there is no technical reason why we should enforce the use
of a small list of accepted charsets.
However, charset.cfg describes itself as follows (since revision 1.11,
committed by Bjoern):
[[
The Validator will refuse to decode documents in an encoding
other than those listed here. The list is independent of what
is supported on a specific system but subject to the Validator
policy for acceptable encodings.
]] -- http://dev.w3.org/cvsweb/validator/htdocs/config/charset.cfg.diff?r1=1.10&r2=1.11&f=h
Sounds reasonable, but what's the policy? And where does it come from?
All I can find so far in normative documents systematically points to
the IANA registry.
http://www.iana.org/assignments/character-sets
And searching the list archives does not give me a clear lead on
whether the validator ever had a policy of favoring some charsets
over others.
Does anyone have any thoughts or recollections on this?
Thanks.
--
olivier