Wasn't working for me, on Chrome, and I didn't bother to mess with encodings the first time around.

Looks like it's EUC-JP. Chrome has a general autodetect feature, but doesn't seem to have an Asian-specific autodetect. Perhaps because whatever false charset it chose actually worked for some parts of the document. I could see some of the hiragana words fine, at any rate, like まえがき, but the majority of it was garbled.
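One possible explanation for kana surviving while everything else breaks, sketched in Python. This assumes the wrong charset Chrome picked was GB2312 (Simplified Chinese); that is a guess, not something confirmed for this page. GB2312 and JIS X 0208 place hiragana in the same row, so kana bytes read the same under either charset, while kanji and hanzi are ordered differently. The sample words are only illustrations.

```python
# Assumption: the page is EUC-JP and the browser guessed GB2312.
kana = "まえがき"
kanji = "日本語"

# Bytes as an EUC-JP page would send them.
kana_bytes = kana.encode("euc_jp")
kanji_bytes = kanji.encode("euc_jp")

# Misread the same bytes as GB2312. errors="replace" keeps this from raising
# if a byte pair happens to be unassigned in GB2312.
kana_as_gb = kana_bytes.decode("gb2312", errors="replace")
kanji_as_gb = kanji_bytes.decode("gb2312", errors="replace")

print(kana_as_gb == kana)    # kana occupy the same positions in both sets
print(kanji_as_gb == kanji)  # kanji and hanzi are ordered differently
print(kanji_as_gb)           # whatever hanzi share those byte values
```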

But of course, after I forced Chrome to use EUC-JP, then TJP didn't like the encoding used to write まえがき.

This is what mine was set up as... It automatically detected the right encoding from the get-go.

I still avoid Chrome because of the lingering bugs dealing with East Asian encodings. I test with it and what not... but they need to fix up that part of the code before I consider really trusting it as my main browser.

phreadom wrote:I still avoid Chrome because of the lingering bugs dealing with East Asian encodings. I test with it and what not... but they need to fix up that part of the code before I consider really trusting it as my main browser.

Well, I mean, the bug in this case is clearly with the server/web page, which doesn't properly identify the encoding it's using... browsers are _supposed_ to default to latin1 in that case (but good browsers obviously give the user more choice than that), so it's not exactly a "lingering bug" so much as a missing important feature (which, yeah, sometimes that can be considered a bug). I haven't had any problems with Asian encodings except in cases like this (actually, I think this is the only case I've had so far).
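A small Python sketch of why a Latin-1 fallback gives silent garbage rather than an error. The sample word is just an illustration, and this models only the decoding step, not anything a particular browser does internally.

```python
# A Japanese word encoded the way an EUC-JP page would send it.
text = "まえがき"
raw = text.encode("euc_jp")

# With the right charset the text comes back intact.
right = raw.decode("euc_jp")

# Latin-1 assigns a character to every byte value, so decoding never fails;
# each byte simply becomes one unrelated character.
wrong = raw.decode("latin-1")

print(right)
print(repr(wrong))
print(len(raw), len(wrong))
```

Because nothing raises, a browser that falls back this way has no signal that the result is wrong, which is why a manual encoding menu matters.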

I avoid using Chrome for intranet sites, though, because it still lacks an easy way to deal with long-term SSL security exceptions. :/

I don't think that's the problem. Even if the site does specify utf-8, the problem is that Japanese text is getting written using Chinese hanzi where there should be Japanese kanji. Even within a page full of all Japanese writing. So you end up with things like sans-serif kana, and serif hanzi mixed together and it looks terrible. And that's by the default settings.

You apparently can tell it to use a different font in your browser settings for that kind of text and get it to work acceptably... but I don't think that really addresses the underlying issue.

I think those cover what I'm talking about. (the second is just my own little summary with other examples and lots of extra links for newbies)

I think there are other ways of also forcing it to use a specific encoding by using in-line styling etc... like some of the Wikipedia pages... but I think as the bug reports on Chrome show, even that isn't necessarily the best solution. It's a bit tricky for me to wrap my head around.

phreadom wrote:..... Even if the site does specify utf-8, the problem is that Japanese text is getting written using Chinese hanzi where there should be Japanese kanji. Even within a page full of all Japanese writing. So you end up with things like sans-serif kana, and serif hanzi mixed together and it looks terrible.

Er, no. Unless you are using PRC simplified hanzi such as 话, kanji/hanzi are written *exactly* the same way in ISO10646/Unicode (i.e. UTF-8) coding. That's the whole point of the "Han Unification" which is one of the principles of Unicode. What you see on the page in the way of serifs, choice of glyphs (i.e. the way a character looks), etc. has nothing to do with the way the characters are coded. The designer of a WWW page may choose to say which fonts are to be used, e.g. via a style sheet, but that has nothing to do with the coding, or the way the "text is getting written".
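To make the coding-versus-glyph point concrete, here is a short Python sketch. The helper name `describe` and the sample characters are only illustrative. It prints the code point, UTF-8 bytes, and Unicode name for a unified character (直, which Japanese and Chinese fonts tend to draw differently) and for the 話/话 pair, which are coded separately.

```python
import unicodedata

def describe(ch):
    """Return (code point, UTF-8 bytes in hex, Unicode name) for one character."""
    return (f"U+{ord(ch):04X}", ch.encode("utf-8").hex(), unicodedata.name(ch))

# 直 is a unified character: one code point, whichever language the text is in.
# 話 (traditional/Japanese form) and 话 (PRC simplified) are separate code points.
for ch in "直話话":
    print(ch, *describe(ch))
```

Nothing in the output says "Japanese" or "Chinese". The bytes for 直 are identical either way, so which glyph appears is decided by the font the browser picks.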

And that's by the default settings.

And that is probably where your problem lies.

You apparently can tell it to use a different font in your browser settings for that kind of text and get it to work acceptably... but I don't think that really addresses the underlying issue.

The issue that got us on this topic is the failure of some browsers to recognize EUC-JP coding automatically when the page itself doesn't declare its coding (as it should). It is made worse by some WWW servers which, as a default, set the charset in the MIME header at the start of every page to ISO-8859-1 (i.e. Latin-1).
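A rough Python sketch of how a declared coding might be looked up, assuming the usual order in which the HTTP header is consulted before any meta tag in the page. The function name `declared_charset` is hypothetical and the regular expressions are deliberately simplified; real browsers follow a more involved sniffing algorithm.

```python
import re

def declared_charset(content_type, body):
    """Return the declared charset in lowercase, or None if nothing declares one."""
    # 1. charset parameter of the HTTP Content-Type header, if present.
    m = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type or "", re.I)
    if m:
        return m.group(1).lower()
    # 2. A <meta> declaration near the start of the page. Non-ASCII bytes are
    #    dropped, which is harmless since the declaration itself is ASCII.
    head = body[:1024].decode("ascii", errors="ignore")
    m = re.search(r'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', head, re.I)
    if m:
        return m.group(1).lower()
    # 3. Nothing declared: the caller has to guess or fall back to a default.
    return None

page = b'<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
print(declared_charset("text/html", page))                      # page's own declaration
print(declared_charset("text/html; charset=ISO-8859-1", page))  # server default wins
print(declared_charset("text/html", b"<html><body>"))           # nothing declared
```

The second call shows why a server-wide default is harmful under this assumed order: it overrides a page that declares itself correctly.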

Those specifically explain what the problem is in Chrome (or at least one of them), and why it doesn't work like it does in Firefox etc. And it doesn't sound to me quite like the specific problem you guys are saying it is. Even with the correct page encoding the browser still can't figure it out. That's what I was referring to.

To further illustrate the absurdity of this... let's take another example from our original page.

When I open this in Chrome, the browser tells me the page is in Japanese and asks me if I want to translate it into English, in spite of there being no encoding set to tell it this. It has to guess it from the content.

However... the text on the page is messed up because it's displaying in Simplified Chinese, because it says that's what it auto-detected it as... in spite of just telling us it auto-detected it as Japanese and asked us if we wanted to translate it.

[Attachment: chromeissue1.png]

So maybe I really am missing something here, but these really don't seem like the same issue(s) you guys are specifically referring to...

phreadom wrote:Jim, did you read over some of the comments on this bug (#28, #34, etc)?

I glanced at them, but as I don't use Chrome, it wasn't a lot of interest.

I may have taken your "the problem is that Japanese text is getting written using Chinese hanzi" a bit literally. I suspect what you meant was "Japanese text is getting displayed using Chinese hanzi", which is, of course, a different ケトル of 魚.

When I open this in Chrome, the browser tells me the page is in Japanese and asks me if I want to translate it into English, in spite of there being no encoding set to tell it this. It has to guess it from the content.

However... the text on the page is messed up because it's displaying in Simplified Chinese, because it says that's what it auto-detected it as... in spite of just telling us it auto-detected it as Japanese and asked us if we wanted to translate it.