In a probably futile effort to stave off Alzheimer’s by torturing my brain, I have been trying for a while now to learn to read Chinese. Chinese is crazy difficult, for a number of reasons that I won’t go into here. I don’t have the time to devote to it that it would take to become actually fluent, which as best I can tell would require dying and being reincarnated as a Chinese person plus about 20 years of full-time study. So my goal is more modest, just reasonable reading comprehension, mainly for reading patents and doing patent-related text mining. I try to spend half an hour or so a day reviewing vocabulary and reading things, to reassure myself that I’m still dumber than a Chinese ten-year-old.

Trying to learn to read Chinese might seem like a waste of time. Why not just use Google Translate? It turns out that for most anything of adult reading level and complexity, the output of Google Translate for Chinese to English is essentially incomprehensible gibberish. Machine translation between Chinese and English is a seriously non-trivial undertaking because the two languages are so different on so many dimensions. Personally, I’m skeptical that the current statistically based approach to machine translation can be made to work here; I would bet we won’t have credible machine translation of Chinese until we figure out how to take grammar and semantics into account. My admittedly non-expert take is that the gains from statistically based translation fall off much faster than linearly as the corpus grows, and with a corpus the size of Google’s, I doubt that even an order of magnitude increase would bring much improvement in the quality of the translations.

Anyway, a year or so ago the U.S. Patent Office started providing an online search tool for Chinese patents and patent applications. This is quite handy as a source of practice material for reading. You can search in English, and whatever patents you find can be viewed and copied either in the original Chinese or in English machine translation (again, mostly gibberish). But in quite a few cases, patents filed in China are translated (by human translators, not by machine) and filed in the U.S. also, so it’s fairly easy to find Chinese patents and good English translations to go with them.

At this point I’ve read 30 or so (not always the entire text, they can be kind of boring and repetitious), so I thought I would go through my vocabulary list and pull out some of the patent-related terms. A source that I use a lot is the cedict open source Chinese-English dictionary. In the course of reading all those patents I ran into some patent-related terms that weren’t in cedict or in the other dictionary that I mainly rely on (WenLin). I had to figure those out through a combination of translating short phrases in Google Translate, checking other online tools like iciba.com, nciku.com, and Baidu, and inferring meanings from usage based on experience with patent phraseology in English.
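For anyone who would rather script the "is it in cedict?" check than look terms up by hand, here is a minimal Python sketch. It assumes the plain-text CC-CEDICT line layout (traditional form, simplified form, pinyin in square brackets, then glosses separated by slashes). The function names `parse_cedict` and `find_missing` are my own invention for illustration, and the two sample entries are simplified stand-ins typed in that layout, not lines copied from the dictionary file.

```python
import re

# Assumed CC-CEDICT layout: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
ENTRY_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


def parse_cedict(lines):
    """Index entries by both traditional and simplified headword."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = ENTRY_RE.match(line)
        if match is None:
            continue  # skip anything that does not fit the assumed layout
        trad, simp, pinyin, glosses = match.groups()
        record = (pinyin, glosses.split("/"))
        entries.setdefault(trad, []).append(record)
        if simp != trad:
            entries.setdefault(simp, []).append(record)
    return entries


def find_missing(terms, entries):
    """Return the terms that have no headword in the parsed dictionary."""
    return [term for term in terms if term not in entries]


# Illustrative sample only; a real run would read the downloaded dictionary file.
SAMPLE = """\
# sample lines in the assumed layout
專利 专利 [zhuan1 li4] /patent/
發明 发明 [fa1 ming2] /to invent/invention/
"""

entries = parse_cedict(SAMPLE.splitlines())
missing = find_missing(["专利", "发明", "权利要求"], entries)
print(missing)
```

With this two-entry sample the third term comes back as missing, which says nothing about whether it is in the real dictionary. To check a real vocabulary list, pass the lines of the downloaded dictionary file to `parse_cedict` instead of `SAMPLE`.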

To the extent that these terms were not in cedict (about half were not), I am going to submit them for inclusion. However, in the somewhat unlikely event that someone else is interested in reading Chinese patents, I thought that it might be helpful to have a list of some of the basic terminology in one place, so the following list includes all the ones I pulled out, including the ones that are in cedict. If anyone has others, please email me or put them in a comment and I’ll add them to the list.