Apologies for breaking mail-client threading; I hit delete instead of
reply and had to reconstitute this from the archives.
> I haven't noticed this, but it's possible that Safari is defaulting
> to UTF-8 for me. We *absolutely* should be using UTF-8 everywhere;
> ISO-8859-1 is wrong. Changing Apache's default would only help with
> Apache; webrick and lighttpd would still be broken. So we'll need to
> add a UTF-8 content-type everywhere. That shouldn't be all that hard.
Good point.
I've noticed that at the moment the charset is being set inside some
controller methods. Perhaps that's something the dispatcher should be doing.
Until then, though, the workaround for Apache seems to be this directive:
AddDefaultCharset utf-8
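If the dispatcher did take this over, the logic could be as small as "if
nobody named a charset, name UTF-8". A minimal sketch, assuming a
hypothetical helper called with whatever Content-Type the controller set
(or nil); this is not an existing Typo or Rails API:

```ruby
# Hypothetical helper (not part of Typo or Rails): return a Content-Type
# that always names a charset, defaulting to UTF-8 when none was given.
def with_default_charset(content_type, charset = "utf-8")
  # No type at all: assume an HTML page.
  content_type = "text/html" if content_type.nil? || content_type.strip.empty?
  # Respect a charset the controller chose explicitly.
  return content_type if content_type =~ /;\s*charset=/i
  "#{content_type}; charset=#{charset}"
end

puts with_default_charset(nil)                          # text/html; charset=utf-8
puts with_default_charset("application/xml")            # application/xml; charset=utf-8
puts with_default_charset("text/html; charset=euc-jp")  # text/html; charset=euc-jp
```

Doing it in one place like this would cover WEBrick and lighttpd too,
which AddDefaultCharset can't.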
>> Interestingly, when you create a page which has multi-byte characters in
>> it, those characters encode using XML entities. This is probably a
>> problem in itself, as XML-encoding the characters results in around 8
>> bytes per character, whereas UTF-8 results in an average of 2-3.
> This is almost certainly an editor or web browser issue; I've created
> posts with UTF-8 characters using Ecto on OS X.
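To put a number on the size difference: the first character of the page
quoted below, U+3070 (hiragana "ba"), takes 8 bytes as a decimal character
reference but 3 bytes as UTF-8. A quick check:

```ruby
# Storage cost of one hiragana character (U+3070) in each form.
ncr  = "&#12400;"          # decimal numeric character reference
utf8 = [0x3070].pack("U")  # the same character as a UTF-8 string

puts ncr.bytesize   # 8
puts utf8.bytesize  # 3
```

Every BMP code point from U+2710 upward has five decimal digits, so all
CJK text pays the full 8 bytes per character this way.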
Having just applied the workaround mentioned above, I've discovered that
this second issue is fixed as a side effect of fixing the first.
The original page in the database actually had "&#12400;&#12363;" as its
content. The new one has the correct characters.
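Pages saved before the fix will still hold the entities. If we wanted to
clean them up, something along these lines might do. It's a sketch only,
using a hypothetical `decode_ncrs` helper, and it would also decode
references someone escaped on purpose (e.g. "&#60;" for "<"), so it should
only run over content known to have been mangled this way:

```ruby
# Hypothetical clean-up helper: turn decimal (&#12400;) and hex (&#x3070;)
# numeric character references back into real UTF-8 characters.
# Assumes well-formed references to valid Unicode code points.
def decode_ncrs(text)
  text.gsub(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/) do
    code = $1 ? $1.to_i(16) : $2.to_i
    [code].pack("U")  # encode the code point as UTF-8
  end
end

puts decode_ncrs("&#12400;&#12363;")  # prints the two hiragana characters
```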
My guess is, Firefox said "oh, this page isn't UTF-8, therefore the site
probably doesn't support UTF-8 comments, so let's submit this encoded."
Typo then ignored the encoding and stored it as-is.
Whereas now... the page advertises itself as UTF-8, and a newly created
page is stored correctly as the original UTF-8 content.
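For what it's worth, the guess above matches how browsers generally
submit forms: fields are encoded in the page's charset, and characters that
charset can't represent are sent as decimal character references. A rough
simulation (not Firefox's actual code), assuming the page was served as
ISO-8859-1:

```ruby
# Rough simulation of form submission from an ISO-8859-1 page: characters
# Latin-1 can represent become single bytes, anything else falls back to
# a decimal numeric character reference.
def submit_as_latin1(text)
  text.encode("ISO-8859-1", fallback: ->(ch) { "&##{ch.ord};" })
end

japanese = [0x3070, 0x304B].pack("U*")  # the two characters from the post
puts submit_as_latin1(japanese)          # &#12400;&#12363;
```

On a page served as UTF-8 there's nothing to fall back from, so the raw
UTF-8 bytes go straight through, which is what we're seeing now.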
Aren't browsers just too smart for their own good? :-)
TX