Does anyone know of any good (free) online sources for reading Japanese stories besides TJP? I have a fairly small vocabulary, and word lists don't work well for me when I'm studying on my own. (I'm in a 101 class right now and already know pretty much everything, since the vocabulary is mostly thematic and I use it a lot in class, so it stays in my head. Having worked through a 101-level textbook on my own didn't hurt either.)

Most of the new words I come across are from songs, which aren't really accessible, at least legally, to people in America. Also, the best way for me to memorize words is to remember a context they appear in. For example, from Joe Inoue's (井上ジョー) CLOSER, I remembered 最近, 体験, 一体, and 恵む because of the line 「あなたが最近体験した幸せは一体何ですか。恵まれ過ぎていて思い出せないかも。」 Normally I'll only remember part of a line and maybe one or two words; I remembered so many from this one because some of the kanji had just been sent to me in the TJP kanji mailing list.

But I digress. Sometimes, when all else fails, I'll try to find lists, but most of them cover very basic vocabulary that I already know. Does anyone know of sites with good vocabulary lists that go beyond basic expressions, house and restaurant words, etc.?

xhilononi234 wrote:Does anyone know of any good (free) online sources for reading Japanese stories besides TJP? I have a fairly small vocabulary, and word lists don't work well for me when I'm studying on my own!

Thank you, everyone. I just need to figure out how to get Old Stories of Japan to work right. I clicked on Japanese and the characters weren't displayed properly, so I'll try messing with things! (I switched browsers: it worked with Internet Explorer, but not with Mozilla.)

xhilononi234 wrote:Thank you, everyone. I just need to figure out how to get Old Stories of Japan to work right. I clicked on Japanese and the characters weren't displayed properly, so I'll try messing with things! (I switched browsers: it worked with Internet Explorer, but not with Mozilla.)

The encoding isn't detected correctly (and technically it should be specified by the people who made the pages), so you need to go to "View" → "Character Encoding" → "Japanese (EUC-JP)". If you don't see it listed there, go to "View" → "Character Encoding" → "More Encodings" → "East Asian" → "Japanese (EUC-JP)", and then the page will display fine.

("View" being up in the menu at the top of your browser... File, Edit, View, History, etc.)

I just tested it out here.

I'd really recommend sticking with Firefox; that way you can also use Rikaichan!

The encoding isn't detected correctly (and technically it should be specified by the people who made the pages)

"Technically", nothing. If your web page uses anything other than plain ASCII text, it should declare the encoding in the HTTP header, period. Any webmaster who doesn't is clueless and asking for trouble.

Sadly, Japanese websites are often designed by such clueless people. It really isn't that hard to configure a web server to declare the proper encoding (unless you're on a service such as, oh, geocities.co.jp, but in that case it's GeoCities who's clueless).

So the page is invalid and hence doesn't work. You have to manually override this with a valid encoding for it to work. IE only works because IE was made to let people get away with horribly broken and sloppy invalid code... which is not a good thing in reality.

Anyway... maybe they meant "x-euc-jp" which might have been a valid experimental encoding years ago... I really don't know. The page is invalid for a number of reasons aside from that, so it's kind of irrelevant to discuss it further. Enough thread hijacking.

Actually, the browser isn't supposed to use the meta tag to guess the encoding anyway if the document was transmitted by HTTP (as opposed to, say, being opened on your hard drive). Some browsers probably will, but you're supposed to specify the encoding in the HTTP header, too.

While yes, it should be set at the server level, people don't always have access to their hosts, so specifying it in the document is also a requirement (well, a strong recommendation), so that regardless of the server or medium you're loading the page from, the browser will always know which encoding the document uses.

So, how does your browser actually determine the character encoding of the stream of bytes that a web server sends? I’m glad you asked. If you’re familiar with HTTP headers, you may have seen a header like this:
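For an HTML document encoded as UTF-8 (matching the description that follows), that header would read:

```http
Content-Type: text/html; charset=utf-8
```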

Briefly, this says that the web server thinks it’s sending you an HTML document, and that it thinks the document uses the UTF-8 character encoding. Unfortunately, in the whole magnificent soup of the world wide web, very few authors actually have control over their HTTP server. Think Blogger: the content is provided by individuals, but the servers are run by Google. So HTML 4 provided a way to specify the character encoding in the HTML document itself. You’ve probably seen this too:
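Namely, the conventional HTML 4 meta declaration (shown here with UTF-8, matching the description that follows):

```html
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
```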

Briefly, this says that the web author thinks they have authored an HTML document using the UTF-8 character encoding.

Both of these techniques still work in HTML5. The HTTP header is the preferred method, and it overrides the <meta> tag if present. But not everyone can set HTTP headers, so the <meta> tag is still around. In fact, it got a little easier in HTML5. Now it looks like this:
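The HTML5 short form:

```html
<meta charset="utf-8">
```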

To address server or configuration limitations, HTML documents may include explicit information about the document's character encoding; the META element can be used to provide user agents with this information.

For example, to specify that the character encoding of the current document is "EUC-JP", a document should include the following META declaration:
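That declaration, per the HTML 4.01 spec's own example:

```html
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
```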

The META declaration must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters (at least until the META element is parsed). META declarations should appear as early as possible in the HEAD element.

For cases where neither the HTTP protocol nor the META element provides information about the character encoding of a document, HTML also provides the charset attribute on several elements. By combining these mechanisms, an author can greatly improve the chances that, when the user retrieves a resource, the user agent will recognize the character encoding.

To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):

1. An HTTP "charset" parameter in a "Content-Type" field.
2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
3. The charset attribute set on an element that designates an external resource.

In addition to this list of priorities, the user agent may use heuristics and user settings. For example, many user agents use a heuristic to distinguish the various encodings used for Japanese text. Also, user agents typically have a user-definable, local default character encoding which they apply in the absence of other indicators.

User agents may provide a mechanism that allows users to override incorrect "charset" information. However, if a user agent offers such a mechanism, it should only offer it for browsing and not for editing, to avoid the creation of Web pages marked with an incorrect "charset" parameter.

Historically, the character encoding of an HTML document is specified either by the web server, via the charset parameter of the HTTP Content-Type header, or via a meta element in the document itself. In an XML document, the character encoding is specified in the XML declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?>). To portably present documents with specific character encodings, the best approach is to ensure that the web server provides the correct headers. If this is not possible, a document that wants to set its character encoding explicitly must include both an encoding declaration in the XML declaration and a meta http-equiv statement (e.g., <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" />). In XHTML-conforming user agents, the value of the encoding declaration in the XML declaration takes precedence.

So, in short, for what appears to be all modern versions of HTML, XHTML, and XML for web content, the order of precedence is:

1. The "charset" parameter of the HTTP Content-Type header.
2. The encoding declaration in the XML declaration (for XHTML/XML documents).
3. The charset value of the meta element.

As a best practice, the encoding should also be declared correctly within the document itself, both to cover cases where the server isn't configured correctly or doesn't support sending Content-Type headers, and to avoid security issues.
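Putting that together, a document covering all of these bases might look like this sketch (the doctype and title here are placeholders):

```html
<?xml version="1.0" encoding="EUC-JP"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <!-- Fallback for user agents that never see the HTTP header
         or ignore the XML declaration -->
    <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" />
    <title>...</title>
  </head>
  <body>...</body>
</html>
```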

And that's the real problem. Hosts should provide an option for setting that sort of thing. (The best way would be .htaccess, but other solutions should be possible in an environment that doesn't allow .htaccess.) The impression I have is that hosts that don't allow it are basically just too lazy to implement it.

I guess they just feel that they have other priorities, but you'd think that a reasonably correct implementation of HTTP, which is the backbone of the entire web, would have at least a little priority...

And, of course, the very fact that a web browser, even today, mangled somebody's page demonstrates that it's a real problem, not just an imaginary problem that nobody ever actually has. (Heck, I run into this sort of problem from time to time myself, even in Firefox.)
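For what it's worth, on Apache the .htaccess fix really is a one-liner, assuming the host allows these overrides (directives from Apache core and mod_mime):

```apacheconf
# Append "charset=EUC-JP" to the Content-Type header for text responses
AddDefaultCharset EUC-JP

# Or scope it to .html files only
AddCharset EUC-JP .html
```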

- Kef

I wouldn't say the browser itself mangled the page, as though it were the browser's fault. The server didn't provide a content type, and the web page itself provided an invalid one, so the browser probably tried to honor the content type it was given, and that broke things. You could argue that when an invalid content type is given, the browser should try sniffing the encoding from the page (which is in EUC-JP). But that's not the standard. The standard is to provide the content type in the HTTP headers, or in the code (in the XML declaration or the meta tag, in that order), or to fall back to UTF-8. So even if the browser had done exactly that, it would still have broken, because the page isn't in UTF-8 either. Beyond that is outside the scope of the standards.
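To see concretely why honoring the wrong encoding mangles a page, here's a quick Python sketch (the sample string is arbitrary):

```python
# EUC-JP bytes for 日本語 ("the Japanese language")
data = "日本語".encode("euc_jp")

# Decoded with the correct encoding, the text round-trips cleanly
print(data.decode("euc_jp"))  # 日本語

# Decoded as Latin-1, every byte "succeeds" but the result is mojibake
print(data.decode("latin-1"))

# Decoded as UTF-8 (the modern fallback), it fails outright:
# the byte sequence is simply invalid UTF-8
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```

A browser in the same position still has to render *something*, which is exactly where the garbage characters come from.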

I don't think the standards make that big a deal about the content type being delivered by the server (and as far as I can see, they don't).

Even though that's listed as the most correct way to do it, it's a recommendation, and they're pretty clear about providing the encoding in the code yourself as a fallback, in case you're loading the page from your hard drive, viewing it by some means that doesn't support content typing, or the server software is old and doesn't support configuring the content type, etc.

You're making too big a deal about setting it on the server; the very reason the ability to set it in the code exists in the first place is to keep it from being a big deal. Set it correctly in your code and the problem is solved. If the server also sets it (correctly), great. If not, it doesn't matter, because you set it correctly in your code as well (as you should; otherwise the code would be invalid when checked on its own, as with the direct-input mode of the W3C validator).

As a matter of fact, it's not mandatory per the HTTP spec that Content-Type be set on the server either. The spec recommends it (SHOULD) but does not require it (MUST), and there is a real difference there... so you don't exactly have grounds to argue that hosts aren't correctly implementing or following the spec either. It's best practice today for them to send it, and to give you the means of configuring it yourself, but even when servers started sending Content-Type they didn't always let you configure it, etc.

The key is that they don't have to, and they're not violating the spec if they don't.

However, if you allow your page to be sent without a content type because you refuse to set it, or you set it incorrectly, or you assume the server should have it configured when it's technically not required to do so... the fault rests squarely on your shoulders. So set the content type yourself on every document, and set it correctly. (And there ARE specs for which content types are valid, so if you break those in setting yours, the fault is still yours.)