Chinese Language Stack Exchange is a question and answer site for students, teachers, and linguists wanting to discuss the finer points of the Chinese language.

I have checked a frequency list and added characters from it now and then (mostly for fun; I don't think this is the best way to learn new characters). I'm now at 5400, and I think I stopped learning really useful characters at around 4000. Beyond that, there are many place names, surnames and characters that appear in only one or two words, which are themselves not very common. Knowing 3500-4000 characters will enable you to read most normal texts in modern Chinese. (Note: I'm referring to traditional characters here; I'm not sure exactly how simplified would differ.)
– Olle Linge, May 27 '14 at 1:10

@ash Nice, I wasn't aware of that Google data, and great analysis on your part!
– Cocowalla, Sep 4 '14 at 15:04


One reason not to overemphasize frequency is that you cannot become fluent by recognizing individual characters. You need to be comfortable with how they are actually used on signs and in sentences, and that means reading a lot. Start with graded readers, of course, but move on to authentic Chinese texts.
– Colin McLarty, Sep 26 '14 at 16:01

7 Answers

Depending on how "in touch" you are with the language (in terms of understanding grammatical constructions and context clues), you may be considered fairly fluent with around 80% character recognition.

Think about your vocabulary in English. If you pick up a book with many words you don't know, you may still be able to understand it from the context surrounding those unfamiliar words. The same applies to Chinese.

If you are interested in numbers and facts, here are the results of some studies:

Cumulative character frequency for the top N characters, based on two studies (Huang 1994 / Da 2004):

Top 250 characters: 64.4% / 57.1%

Top 500 characters: 79.2% / 72.1%

Top 1000 characters: 91.1% / 86.2%

Top 1500 characters: 95.7% / 92.4%

Top 2000 characters: 97.9% / 95.6%

Top 3000 characters: 99.4% / 98.3%
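If you want to see what a table like this looks like for a corpus of your own, the calculation is short. The following is a minimal Python sketch, assuming you have a plain-text corpus (it does not use the Huang or Da data) and counting only CJK unified ideographs as "characters", so punctuation, digits and Latin letters are ignored:

```python
import unicodedata
from collections import Counter


def han_characters(text: str) -> list[str]:
    """Return the CJK unified ideographs in `text`, skipping punctuation, digits and Latin letters."""
    return [ch for ch in text if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")]


def cumulative_coverage(text: str, cutoffs: list[int]) -> dict[int, float]:
    """Map each N in `cutoffs` to the share of character tokens covered by the N most frequent characters."""
    counts = Counter(han_characters(text))
    total = sum(counts.values())
    # Occurrence counts ordered from the most to the least frequent character.
    freqs = [n for _, n in counts.most_common()]
    return {n: (sum(freqs[:n]) / total if total else 0.0) for n in cutoffs}
```

Calling `cumulative_coverage(corpus_text, [250, 500, 1000, 1500, 2000, 3000])` gives one column of the table for that corpus. Expect the figures to vary with the corpus's size and genre, just as the two studies above disagree with each other.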

The above table tells us that the top 1000 characters account for between 86% and 91% of the characters occurring in real-world text. Assuming, optimistically, that there is a good correlation between the top 1000 characters found in these studies and the 1000 characters that most second-year college students are supposed to master, we can conclude that there is light at the end of the tunnel. Finally, while we wouldn't advocate studying characters solely on the basis of their frequency, we believe that studying such lists is a reasonable supplement to conventional study programs. Such lists also provide a sense of just where one is on the path to full Chinese literacy.
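A related way to gauge where you are on that path is to measure what share of a specific text you can already read. The sketch below assumes you keep your known characters as a Python set. The function names are hypothetical, and it counts only CJK unified ideographs:

```python
import unicodedata


def han_characters(text: str) -> list[str]:
    """Return the CJK unified ideographs in `text`, ignoring punctuation and non-Chinese characters."""
    return [ch for ch in text if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")]


def known_share(text: str, known: set[str]) -> float:
    """Fraction of the Chinese character tokens in `text` that are in the learner's `known` set."""
    chars = han_characters(text)
    if not chars:
        return 0.0
    return sum(ch in known for ch in chars) / len(chars)


def unknown_characters(text: str, known: set[str]) -> list[str]:
    """Distinct characters in `text` that are not yet in `known`, in order of first appearance."""
    return list(dict.fromkeys(ch for ch in han_characters(text) if ch not in known))
```

Because the share is computed over tokens rather than distinct characters, a few very common known characters can push it high even when many distinct characters in the text are still unfamiliar.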

The results from the above studies have been used to generate two different sets of flashcards.

The Top 1000 Traditional Characters is based on C. H. Tsai's combined 1993–1994 data.
The Top 1000 Simplified Characters is based on Jun Da's combined classical and modern Chinese data, which has the advantage of being based on more recent data as well as more formal publications.

I had heard various numbers over the years, so I guessed at 4000 and generated computer flashcards for reading and writing all the Chinese vocabulary I'll need for the foreseeable future. There are currently 32614 cards with 4166 characters and 18385 words. They are divided into separate files, each with about 100 cards in it. You can download the flashcards for the first 700 characters from my Google Code page. I'll publish the remaining characters after I'm happy with the current set.
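With 32614 cards split into files of about 100, you end up with roughly 327 files. The following is a minimal sketch of that splitting step. It assumes each card is a (front, back) pair and is written as one tab-separated line. The file names and format are hypothetical, so this is not the author's actual tool, and you should check Mnemosyne's import options for the exact format it accepts:

```python
from pathlib import Path

Card = tuple[str, str]  # (front, back)


def batches(cards: list[Card], size: int = 100) -> list[list[Card]]:
    """Split cards into consecutive groups of at most `size` cards."""
    if size < 1:
        raise ValueError("size must be positive")
    return [cards[i:i + size] for i in range(0, len(cards), size)]


def write_batches(cards: list[Card], out_dir: Path, size: int = 100) -> list[Path]:
    """Write each group to its own UTF-8 file with one 'front<TAB>back' line per card."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, group in enumerate(batches(cards, size), start=1):
        path = out_dir / f"cards_{number:03d}.txt"
        path.write_text("".join(f"{front}\t{back}\n" for front, back in group), encoding="utf-8")
        paths.append(path)
    return paths
```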

The cards cover English definition, simplified character, traditional character, and pinyin pronunciation of each character, as well as the most common words using those characters.
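One dictionary entry can produce both a reading card and a writing card. The following sketch assumes a simple, hypothetical `Entry` record holding the fields listed above. The card wording is illustrative only and does not reproduce the author's real card layout:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One vocabulary item: a single character or a multi-character word."""
    simplified: str
    traditional: str
    pinyin: str
    english: str

    def characters(self) -> str:
        # Show both forms only when they differ, e.g. "学生 / 學生" but just "我".
        if self.simplified == self.traditional:
            return self.simplified
        return f"{self.simplified} / {self.traditional}"


def reading_card(entry: Entry) -> tuple[str, str]:
    """Front: the characters. Back: pronunciation and meaning."""
    return entry.characters(), f"{entry.pinyin}: {entry.english}"


def writing_card(entry: Entry) -> tuple[str, str]:
    """Front: meaning and pronunciation. Back: the characters to write."""
    return f"{entry.english} ({entry.pinyin})", entry.characters()
```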

The software is Mnemosyne, which uses the spaced repetition technique to schedule when you review the cards you already know. It's not fast, but it seems to stick a lot better than manually reviewing vocabulary lists.
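Mnemosyne's scheduler is derived from SuperMemo's SM-2 algorithm, but its exact rules differ. The sketch below follows one common formulation of classic SM-2 and is not Mnemosyne's code. It shows how spaced repetition stretches the gap between reviews each time you recall a card and resets it when you forget. Grades run from 0 (complete blackout) to 5 (perfect recall):

```python
from dataclasses import dataclass


@dataclass
class ReviewState:
    repetitions: int = 0   # consecutive successful reviews
    interval: int = 0      # days until the next review
    easiness: float = 2.5  # SM-2 "E-Factor"; higher means the interval grows faster


def review(state: ReviewState, grade: int) -> ReviewState:
    """Return the card's new state after a review graded 0-5."""
    if not 0 <= grade <= 5:
        raise ValueError("grade must be between 0 and 5")
    if grade >= 3:
        # Successful recall: 1 day, then 6 days, then multiply by the E-Factor.
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval * state.easiness)
        repetitions = state.repetitions + 1
    else:
        # Failed recall: start the repetition sequence over.
        repetitions, interval = 0, 1
    easiness = state.easiness + (0.1 - (5 - grade) * (0.08 + (5 - grade) * 0.02))
    return ReviewState(repetitions, interval, max(1.3, easiness))
```

Because intervals grow multiplicatively, a card you keep recalling correctly quickly drops to one review every few weeks, so most of your review time goes to the cards you find hard.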

Here's a screenshot:

In a couple of more recent projects, I tried to build some tools to help with reading practice. I sieved through Chinese text looking for sentences that use only the characters I know. My first attempt was with Twitter updates, but finding and translating them was a slow process. More recently, I found a huge collection of translated sentences on Tatoeba. I've collected all the sentences that use only the 500 most common traditional characters and posted them on my Google Code page. They're sorted with the most common characters at the start of the page.
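The sieving step works as a filter. It keeps a sentence only if every Chinese character in it is among the N most common, then orders the survivors so that sentences built from the most common characters come first. The following is a minimal sketch under a few assumptions. The input is a plain list of sentences and a frequency-ranked character list, and nothing here reads Tatoeba's export files. Sorting by each sentence's rarest character is one plausible reading of "most common characters at the start", not necessarily the author's method:

```python
import unicodedata


def han_characters(text: str) -> list[str]:
    """Return the CJK unified ideographs in `text`, ignoring punctuation and non-Chinese characters."""
    return [ch for ch in text if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")]


def readable_sentences(sentences: list[str], ranked_chars: list[str], limit: int = 500) -> list[str]:
    """Keep sentences whose characters all fall within the `limit` most common ones.

    `ranked_chars` lists characters from most to least frequent. Results are sorted by
    the rank of each sentence's rarest character; ties keep their input order.
    """
    rank = {ch: i for i, ch in enumerate(ranked_chars[:limit])}
    kept = []
    for sentence in sentences:
        chars = han_characters(sentence)
        if chars and all(ch in rank for ch in chars):
            kept.append((max(rank[ch] for ch in chars), sentence))
    kept.sort(key=lambda pair: pair[0])
    return [sentence for _, sentence in kept]
```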

According to Zhonghua Zihai, the largest Chinese character dictionary, there are more than 85,000 Chinese characters! However, research (Huang 1994 and Da 2004) shows that the 1200 most frequently used Chinese characters account for about 90% of the characters occurring in real-world text. This is therefore roughly the number of characters a learner needs to reach an intermediate level. That is the level at which you can communicate effectively: for the average learner, carrying on a daily conversation in Chinese, and for business people, doing business in Chinese.
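A figure like "1200 characters for about 90%" reads the cumulative frequency table in reverse: it finds the smallest N whose top-N characters reach a target share of the text. The following is a minimal sketch, assuming you have raw occurrence counts per character from some corpus. The counts in the tests are toy values, not the Huang or Da data:

```python
from bisect import bisect_left
from itertools import accumulate


def characters_needed(counts: list[int], target: float) -> int:
    """Smallest N such that the N most frequent characters cover at least `target` of all tokens."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    freqs = sorted(counts, reverse=True)
    total = sum(freqs)
    if total == 0:
        raise ValueError("counts must not all be zero")
    # Running totals, e.g. [5, 3, 1, 1] -> [5, 8, 9, 10].
    cumulative = list(accumulate(freqs))
    # First position whose running total reaches the target share; +1 turns an index into a count.
    return bisect_left(cumulative, target * total) + 1
```

Since `target * total` is a float, values that land exactly on a cumulative boundary can be affected by rounding. For frequency lists with thousands of characters, this makes no practical difference.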