I think we should be very cautious about overloading the MIME charset
parameter or Accept-Charset to mean more than just a character encoding.
(I'd like to put in a word for reviving "Charset Considered
Harmful" as an informational RFC ;)
In the context of HTML, the issues of which characters can be usefully used
or rendered as glyphs seem more closely allied to SGML than to
the MIME transport layer. (Though I don't know of any SGML mechanism
to solve it all.)
Remember that I could send all these funky characters using numeric
character references, and a charset of US-ASCII.
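
To illustrate the point about numeric character references, here is a
minimal Python sketch. The helper name to_ascii_html and the sample string
are made up for the example. It relies on Python's standard
"xmlcharrefreplace" error handler, which writes each non-ASCII character
as a decimal reference, so the bytes on the wire are pure US-ASCII. It
does not escape markup characters such as "&" or "<"; that is a separate
step.

```python
import html


def to_ascii_html(text: str) -> bytes:
    """Encode text as US-ASCII bytes, writing every non-ASCII character
    as a decimal numeric character reference (&#NNN;)."""
    return text.encode("ascii", errors="xmlcharrefreplace")


# Hypothetical sample: Latin-1 and Greek characters in one string.
sample = "na\u00efve \u03b1\u03b2"
encoded = to_ascii_html(sample)
print(encoded)  # b'na&#239;ve &#945;&#946;'

# A receiver that resolves the references recovers the original text,
# even though the declared charset never went beyond US-ASCII.
decoded = html.unescape(encoded.decode("ascii"))
print(decoded == sample)  # True
```

The charset label here says nothing about which characters the document
actually contains, which is why it makes a poor place to hang anything
beyond the encoding itself.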
A true multi-lingual document is likely to have a mix of languages
and scripts. That's why we have LANG attributes, to help the
software out in choosing appropriate fonts, hyphenation methods, spelling
dictionaries, etc. We haven't really addressed all the possible cases
where one language uses multiple scripts/writing systems.
(Hindi/Urdu _may_ be one such example; I'm just guessing here.)
I'd like to see more people implement the i18n spec before we fight
about what comes next, however...