This time John has attacked the theory that Chinese is hard. The chief reason that full (or even half-ass) mastery of Chinese is difficult is those darned Chinese characters, so that’s the focus of John’s analysis.

He provides stroke count statistics for groups of the most commonly used Chinese characters. The result is somewhat heartening. Check it out.
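
For the curious, here is a minimal sketch of how this kind of tally could be computed. It is not John’s method or his data: the ten characters and stroke counts below are a hand-entered sample of very common characters, in no authoritative frequency order, and the name `stroke_stats` is made up for illustration.

```python
from statistics import mean, median

# Hand-entered sample: (character, stroke count). Illustrative only;
# the ordering is NOT an authoritative frequency ranking.
SAMPLE = [
    ("的", 8), ("一", 1), ("是", 9), ("不", 4), ("了", 2),
    ("人", 2), ("我", 7), ("在", 6), ("有", 6), ("他", 5),
]


def stroke_stats(chars, top_n):
    """Return (mean, median, max) stroke count for the first top_n entries."""
    counts = [strokes for _, strokes in chars[:top_n]]
    return mean(counts), median(counts), max(counts)


# Summarize progressively larger groups, as a real frequency list would allow.
for n in (5, 10):
    avg, med, most = stroke_stats(SAMPLE, n)
    print(f"top {n}: mean={avg:.1f} median={med} max={most}")
```

With a real frequency list and a full stroke-count table, the same function could be pointed at the top 100, 500, or 1,000 characters.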

This kind of statistical work has certainly been done before by Chinese scholars, but the information isn’t easy to find online. I made a half-hearted attempt and didn’t find it. (Yes, that’s a challenge to you readers to prove that you’re better than me.) Plus, John offers it in English.

What I did find was some software that could be interesting: 汉字经 and HanziStatics [sic] (汉字统计程序). If anyone has some free time to check those out, let me know what you think (Chinese ability almost certainly required).


15 Comments to “Chinese Character Stroke Stats”

I was pretty shocked when I saw he’d gutted his site and wiped the archives clean. John used to have a lot of posts up that I liked. As far as the difficulty of Chinese characters is concerned, I’m torn. On one hand, I’ve often agreed with what he’s saying. On the other hand, my friend Martin recently asked a dozen native Chinese speakers (in Taiwan) if they could write the word “key” (鑰匙), and they all got it wrong. I’ve also seen educated people come up blank and not even know how to get started when writing some pretty basic words, such as names for body parts. While some educated English speakers might have a few spelling demons, I’ve never seen them just come up with a blank and be at a total loss as to how to even start writing a word.

Whoa, there! I’m just an innocent bystander, man. I didn’t come up with this madness. The Chinese decided to use things like 龠 and, god help us, 龜 as components in characters, and that’s what I’m stuck with. It’s idiotic, but then again so is the fact that “through”, “though”, and “rough” don’t even remotely rhyme. Personally, I look forward to alien cyborg colonization forces invading and civilizing us into using a reasonable language…

Mark, you say that now, but wait until you have to conjugate a verb–in binary!

(that last “that’s what you get” wasn’t actually aimed at you, just in response to your comment in more of a “that’s what happens when” sense–though I do hold you personally responsible for the madness that is traditional Chinese)

Nice post by John, but this discussion isn’t taking place in a vacuum. We’re implicitly comparing Chinese to the Roman alphabet, right? So game, set and match right there.

I’ve never run into anyone who has seriously claimed that Chinese is difficult to learn because Chinese characters have a lot of strokes. So this entire discussion is a strawman: analogous to having one Chinese student claim that English is damn difficult because some words have a lot of letters in them, with the rebuttal being that — hey — the median length of words in the OED is only around six letters and the word “the” is statistically predominant so the language is easier than you think.

The big issues as far as I’m concerned are the weak phonetic link between written and spoken Chinese, the prevalence of duoyinci, the enormous difference between formal and casual written practices, and the ease of confusing minor radicals when writing less commonly used characters. And tonal variation makes everything harder by screwing with your memory.

Chinese is much harder to learn than Spanish. I always wonder what foreigners who say otherwise are thinking. Do they really believe this, and do they have any frame of reference, or are they just being self-deprecating? I suspect that for the vast majority of people with good Chinese (i.e. those who have spent time in China), people are just being reflexively Chinese. When people say this without being self-deprecating, anyway, I start assuming they either have a family background or have a pretty ridiculous notion of what constitutes fluency.

I’m glad that we’ve established that so conclusively, and that silly things like any support whatsoever aren’t required.

I absolutely sucked at Spanish, and Chinese for me has come much more easily. I’m neither self-deprecating nor do I think I have a ridiculous notion of what constitutes fluency. Maybe I’m strange, but I don’t find Chinese that daunting.

And I have run into plenty of people, particularly people with no real knowledge of the language or who are just starting out (as my friend who introduced the question is), who claim Chinese is hard because the characters are complicated and, while by no means perfect, stroke count is a measure of character complexity. There are lots of other factors that go into making a character “easy” or “hard,” but I think in most cases a character with six strokes is a helluva lot easier than a character with 15.

Anybody more skilled at Google than I am can find charts showing how long it takes English speakers to reach proficiency at various foreign languages, or for babies to reach a basic proficiency, and I recall Spanish as being one of the easiest, and Chinese far down the list…that would be about as scientific as you could get for something that’s completely obvious.

For an English speaker, characters would still be more difficult even if every single one had six strokes, because there’s no phonetic relation and because they’re intimidatingly alien-seeming. Compare that to Spanish, where you don’t need to study the written form for more than 5 minutes. Chinese would be more difficult even if you only bothered to study pinyin, and didn’t once look at a character.

Some people may be mystically tuned to Chinese. Perhaps it’s a former life thing.

I’m neither self-deprecating nor do I think I have a ridiculous notion of what constitutes fluency.

Comment wasn’t intended as a swipe at you John. I know you’ve been in China for a while and are doing translation work, so I’d expect your Chinese to be pretty good. I expect we’ve all run into the kind of people I was referring to though.

Since your post is called “characters aren’t really that hard” rather than “Chinese is easy because characters aren’t hard”, my earlier post was probably a bit off-topic. No offense intended, anyway.

Jeff says “how long it takes an English speaker to reach proficiency”, and I think that is the point people always forget when they compare the difficulty of learning different languages: people can learn a language faster if it is similar to a language they already know. I think this is probably one of the main reasons why Western Europeans who have gone through compulsory primary and secondary school English classes tend to speak better English (on average) than Chinese and Japanese students who have learnt English for just as many years.

One of my long-term goals is to learn how to write a large number of Chinese characters, and so I have thought a lot about how to achieve this. I think identifying which characters are easier to remember and which are harder is very important. No doubt this varies from person to person, but I would expect there to be some universal (or near-universal) trends. But I certainly do not think that the difficulty of learning to read or write a character is directly related to how many strokes it is made up of. My own theory, not yet verified, is that it has more to do with whether the character can be broken into radicals or not, and whether these radicals appear in other characters that you are already familiar with. There are some characters with only a few strokes that I had a great deal of difficulty remembering how to write because they were relatively “unique” (like 麦). In his original post, John B brought up 就 and 想, saying that he had less trouble learning the latter. I suggest that this is because 想 consists of three common radicals, of which at least the top two are very easy to remember (symmetrical, etc), plus breaking it down as phonetic 相 plus semantic 心 is quite obvious. On the other hand, 就 contains two radicals which I too find more difficult to write (尤 contains only four strokes, but it is not common and its appearance is, for lack of a better word, “weird”), and the phonetic and semantic connections between 就 and its component radicals are not very clear.

In Chapter 1 “The Chinese Written Language” of Tao: The Watercourse Way, Alan Watts talks about how there are fewer strokes in many Chinese words than there are in their English equivalents:

…our customary bafflement by Chinese ideograms is really a matter of uninformed prejudice. They are supposed to be outlandish, weird, devious, and as tricky as “the mysterious East.” Although the K’ang-hsi dictionary of +1716 lists about 40,000 ideograms, a reasonably literate person needs about 5,000, and a comparably literate Westerner would know quite that many words of his own language. The difficulty of recognizing and identifying ideograms is surely no greater than with such other complex patterns as the various kinds of flowers, plants, butterflies, trees, and wild animals.
In other words, Chinese is simpler than it looks, and may, in general, be both written and read more rapidly than English. The English MAN requires ten strokes of the pen, whereas the Chinese 人 requires but two. TREE needs thirteen, but 木 only four. Water is sixteen, but 水 is five. Mountain is eighteen, but 山 is three…
To simplify matters further, Chinese makes no rigid distinctions between parts of speech. Nouns and verbs are often interchangeable, and may also do duty as adjectives and adverbs. When serving as nouns they do not require the ritual nuisance of gender, wherewith adjectives must agree, nor are they declined, and when used as verbs they are not conjugated…

Anyway, I’m not sure if I’m really convinced of his opinion, especially after studying Chinese for some time, but I do think his opinion provides an interesting contrast to what people usually think of Chinese.

That’s an interesting point, and it’s one I’ve thought about myself as well. Off the top of my head I can list a few reasons it doesn’t work, though:

English words are made up of letters, whereas Chinese characters are made up of radicals (mostly). English has 26 letters, or 52 if you count capitals separately. There are a lot more Chinese radicals than 52, and many have several variations. Furthermore, some characters have no clear radicals or somewhat unique component parts.

In English, letters are strung together in a linear fashion, so remembering the spelling of a word is a one-dimensional affair. In Chinese, assembling radicals is two-dimensional, which makes memorization much more difficult.

Based on just the text you quoted, it appears that Watts is confusing words with morphemes. Yes, the Chinese character 水 corresponds to the English “water,” but the Chinese 木 doesn’t correspond to the English “tree” because “tree” is a word, but 木 in modern Chinese is not (in Japanese it is, though). So for the English word “tree” you need to choose the Chinese word 树. (Oh, how inconvenient… it has more strokes.)

About “English letters are strung together in a linear fashion, so remembering the spelling of a word is a one-dimensional affair”.

I’ve noticed that Chinese people, for the most part (yes, there are exceptions, and then there’s the mystery of classical Chinese), can ‘guess’ what the sounds of words are, decipher handwriting that is unintelligible to me, and ‘know’ which characters can and cannot combine. I’m sure there is some link between sounds, representation, meaning and grammar, but I don’t know what it is that makes ‘it click’.

My point is this: why is it that in Engsh I oly n&d to pr#vde th# in(#als and fi#&al to be decipherable? In fact I can evrrn abd althrst unintelligible gibberish and the ‘word/meaning’ is still there. This is really magic to me. My Chinese isn’t good enough yet to know, but does something like this exist in Chinese?