What you see here is the result of the past 20 years of my efforts
to make Chinese character etymology information available online.
Please donate so I can keep this information online and updated.
All information is free and without advertisements.
(Thank you)

When I was a young man of 22 in Taiwan in 1972 trying to become fluent
and literate in Chinese, I was faced with the prospect of learning to write
about 5000 characters and 60,000 character combinations. The characters were
complex with many strokes and almost no apparent logic. I found that on the rare
occasions when I could get a step-by-step evolution of a character from its
original form, with an explanation of its original meaning and an
interpretation of its original form, it would suddenly become apparent how all
the strokes had come to be. The problem is that there is no book in English
that adequately explains this etymology, and even if you read Chinese there is
no single book in Chinese that explains it all. In short, it is a research
project to understand each character. To have this information at my fingertips
in English would have been a great help.

The first advantage of a computerized etymology is that you can do all kinds of
analysis that would be limited by the linear nature of books. The
second advantage is that etymology is an ongoing research project. We do not
know all the answers when it comes to character etymology. If errors or discrepancies are discovered
in a computerized system, they can be easily corrected. They cannot be corrected in a
book that has already been published.

There are literally thousands of references on this subject, most of them in
Chinese. Most of them have something new, unique or
interesting to say but I only list here what I have found to be the top references.

In ancient China when characters were first invented they were formed from one
or more pictographs which either indicated meaning or pronunciation. Pictograph
means picture graphic. So we have some characters that have one or more
pictographs that alone or in combination indicate a meaning. Sometimes
characters will have one part that indicates meaning and another that indicates
pronunciation. In some cases it is hard to make up an ideograph (ideograph means idea graphic).
Because meaning is not always easy to represent in pictographs, the inventors sometimes just
borrowed another character that has the
same pronunciation.

Primitives are the original form of a graphic. They should ideally be
recognizable, although they may require explanation. Over the years the
character forms were changed so the original pictographs are no longer
recognizable. The pronunciation too gets changed over the years, and finally the
meaning also gets modified. What we have left are modern characters or parts of
characters that I call remnants. Remnants are the modern form of a graphic. All
characters and character parts are remnants. A good example is the character
quan 犬 dog. We have characters that indicate that
the modern character remnants 犬 and 犭 were originally the same
and were clearly once a primitive picture of a dog. Even Confucius, in 500 BC, is quoted as saying
"The ancients must have had very strange looking dogs". This example is worse
than most, but today most Chinese characters are just a bunch of
complex strokes with no obvious connection with the meaning. So modern Chinese
characters are neither pictographs nor ideographs.

The purpose of etymology is to trace back and find what those
remnants came from. A character has a meaning. For example, Dian
電 means electricity: its modern meaning is electricity,
its original meaning was lightning, and its interpretation
is a cloud with rain drops, with lightning coming down and hitting a field.
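
As a rough illustration of how one such entry could be held in a computerized etymology, here is a minimal Python sketch. The class name, field names and helper function are hypothetical and are not taken from the actual database behind this site; only the 電 values come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EtymologyEntry:
    """One character record (hypothetical layout, for illustration only)."""
    character: str
    romanization: str
    modern_meaning: str
    original_meaning: str
    interpretation: str


# Values for 電 as described in the text above.
dian = EtymologyEntry(
    character="電",
    romanization="Dian",
    modern_meaning="electricity",
    original_meaning="lightning",
    interpretation=(
        "a cloud with rain drops, with lightning coming down and hitting a field"
    ),
)


def meaning_has_shifted(entry: EtymologyEntry) -> bool:
    """True when the modern meaning differs from the original meaning."""
    return entry.modern_meaning != entry.original_meaning


print(dian.character, "meaning shifted:", meaning_has_shifted(dian))
```

Keeping the original meaning and the interpretation as separate fields mirrors the distinction made above between what a character first meant and how its picture is read.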

I count about 400 primitives. If these primitives offer meaning to
the character their modern remnants are usually called Significs.
It is often not clear that a character is a signific
because either the meaning has changed so much or we cannot get into the frame
of mind of the person who invented the character. This is called Abstraction of the Signific.
A simple example is the string primitive Mi 糸 "string". Sun 孫
"grandchild" would indicate a string and a Zi 子 "child", or the string of children, or by
abstraction, "grandchild". This abstraction is easy; some are not.

There are about 800 characters that are used as phonetics in
modern Chinese. About a third of them can be readily recognized. Another third
can be recognized by literate Chinese and yet another third are problematic and
can only be analyzed.
It is very productive to study the phonetic shifts since ancient times, some being natural
and some being influences from other dialects.

Book References:

Analytic Dictionary of Chinese and Sino-Japanese by Bernard Karlgren
The classic English analysis of Chinese phonetics.

Script refers to the symbols
in which a language is written. The Chinese
writing system has been borrowed by or has influenced many languages and
Chinese dialects other than the current standard which is Mandarin. For Chinese
and all other characters derived from or influenced by Chinese characters I use
the term Chinese Derived Characters.
These languages include Cantonese, Taiwanese, Shanghainese, Japanese, Korean,
Vietnamese, Jurchen, and others. This web site is dedicated
to the etymology of Modern Chinese
characters, which will include information from the Chinese dialects of Mandarin,
Cantonese, Taiwanese and Shanghainese.

This refers to the script used to write modern Mandarin.
In English we have an alphabet and
we spell things with a fixed number
of 62 letters and numbers from which we make about 60,000 modern
English words known by the average native speaker. In modern Chinese the literate
adult uses a fuzzy number of about 5000 characters that correspond to single
syllable Mandarin words. These characters can be used to form about
60,000 multi-syllable Mandarin words
used by modern native speakers. The problem is that the number of
Chinese characters is fuzzy.

On an English typewriter or computer we can use or make up any word we want with
little trouble using exactly 62 letter-number symbols. In Chinese we can hand
write or sometimes make up any character we want. The problem with a Chinese
typewriter or computer is that we have to limit the characters that can be used
ahead of time. It is like making an English typewriter that can only print a fixed number
of words, with no allowance for new or special words. The old
manual Chinese typewriters had 7000
characters. The GB2312-80 computer
standard for Simplified Chinese has
6763 characters. The Big5 computer
standard for traditional Chinese has
13051 characters, more than twice as many as most people use. The Unicode "basic multilingual plane" tries
to combine all Han characters from Simplified and Traditional Chinese, Japanese,
Korean and Cantonese and comes up with a total of 27,484. The question of what is a
simplified or traditional character is very complex and will be discussed separately.
Book References:

Chinese, Japanese, Korean and Vietnamese Computing - CJKV Information Processing by Ken Lunde
This is the best book on the computerization of CJKV languages.

The Unicode Standard Version 4.0
The Unicode standard.

常用國字標凖字體表
Published by the Ministry of Education of Taiwan listing the 4808 characters necessary for adult literacy.

Modern characters are written as a composition of simple strokes as if they were
written by a brush which has been the main writing instrument for the past 1800
years. Before this people used a totally different style of characters that
were written with reed pens on bamboo slats. There was a transition around 1 AD
to a simplified stroke based character rendition using reeds to write with.
This style was called LiZi 隷字 or LiShu
隷書. The word Li means “crude” because at the time this simplified form was considered to be
non-standard. I use the word LiZi to indicate historically accurate renditions of
characters that actually existed in the period 1 AD to 200 AD as opposed to the
word LiShu which is a modern calligraphic style. As far as the current analysis
system is concerned, LiZi is considered to be an intermediate step in the
evolution between seal characters and modern characters. After the invention of
the brush for writing in about 200 AD, the style came to be called
KaiZi 楷字 or KaiShu 楷書. The brush brought
some more rather minor changes in form and these characters were taken as
standard. The word Kai means “standard”. By 200 AD, they had become the
standard characters. Many common characters used in 200 AD have died out, and new ones
have been invented. There have been some, mostly minor, changes in how some
characters are written and some changes in meaning. The HanYuDaZiDian
漢語大字典 is the largest dictionary of Kai type characters. It includes over 56,000 modern
printed Chinese characters, both simplified and traditional used over the past 2000
years. I call them modern because they are in the modern style. Most of them
are rare characters or rare alternates and not part of useful modern Chinese.
About 25% of modern characters did not exist in 200 AD. Most of the characters
in use then would be recognized today, although the meanings may have changed.

Book References:

HanYuDaZiDian 漢語大字典 8 volumes
The largest Chinese-Chinese dictionary of single characters

When Chinese write characters, they may write quickly so that the
strokes run together. This is called cursive Chinese, XingShu 行書 "running script".
Chinese over the years have devised a number of very cursive forms called "super
cursive", called CaoShu 草書 "grass script". The word grass refers to the
fact that it resembles flowing grass. The earliest form dates back to 200 BC
and is called ZhangCao 章草,
documentary grass script.
This is a modification of LiShu. The most prevalent form of super cursive is JinCao
今草, modern grass script. It was pioneered by
WangXiZhi 王羲之 321-379 AD. It is still used today.
The third style was used in the Tang dynasty, 618-907 AD; it is called KuangCao
狂草, erratic grass script. There are rules for
super cursive and if you do not understand them you cannot understand the
writing. Most modern Chinese are limited in the amount of super cursive Chinese
they can read. Still a fair percentage can read it. Super cursive is used to
allow for fast writing and it is also simplified. Super cursive does not fit the
simple stroke concept of printed Chinese. At some times in the past people have
taken the super cursive forms of characters and re-strokified them, resulting in
simplified printed forms. This process is called CaoShuKaiHua
草書楷化, super cursive print formation.
This is where many of the modern simplified
characters come from. So to understand the etymology of Simplified Chinese it
is necessary to understand something about CaoShu.

Book References:

草字基本符號硏究 (上,中,下) by 趙緟華 and 任漢平
One of the best Chinese discussions of super cursive Chinese

行草讀本 Chinese Cursive Script An introduction to Handwriting in Chinese by FangYuWang
One of the best English discussions of super cursive Chinese

No one can control the set of characters that people actually write
with. So when the Communist Chinese in 1956 decided to tell people how to
simplify their language, at first they could only offer some general rules. By
the 1980s we had the advent of two computerized character sets that by default
are supposed to represent Simplified and Traditional characters.

Reduction in character number

Part of the attempt to make a Simplified
Chinese is to reduce the
number of characters in common use.

The GB2312-80
character set adopted on December 23, 1980 has 6766 characters. GB means GuoJiaBiaoJun
国家标准 "National Standard". It is quite
adequate for most people. One problem is that Chinese people like to use
rare characters in their names, and those people usually have to find another
character that has the same pronunciation or the same meaning. Some place names
used old characters and had to be changed. If you wanted to use old
or rare characters from ancient literature, you just had to figure out some way
around the issue: rewrite the poem, spell it out, use modified characters,
or something. In any case 6,766 characters is quite enough for most people to
function with. Newspapers, from time to time, have been strongly encouraged to
limit the number of characters even more to 3500, since even 3500 is adequate for good literacy
if you make a few adaptations.

The Big5
standard for traditional Chinese, put together by the then top 5 computer companies
in Taiwan, has 13053 characters. Of them, 5401 common Chinese characters are
arranged in hexadecimal pages A4-C6 and 7652 less common Chinese characters are
arranged in hexadecimal pages C9-F9. If you are a literature major even this number is
inadequate. We really need the 56,000 characters from the HanYuDaZiDian. If you
are an ordinary literate adult, this is more than you are likely to
ever use. This means that more than half the traditional characters have no standard simplified form.
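
The split into common and less common pages can be checked from the first (lead) byte of a character's Big5 code. The following Python sketch assumes Python's built-in `big5` codec and uses the page ranges quoted above. The function names are hypothetical.

```python
def classify_lead_byte(lead):
    """Map a Big5 lead byte to the block it falls in (page ranges from the text)."""
    if 0xA4 <= lead <= 0xC6:
        return "common"
    if 0xC9 <= lead <= 0xF9:
        return "less common"
    return "other"  # symbols, punctuation or vendor extensions


def big5_block(ch):
    """Classify a single character by the lead byte of its Big5 encoding."""
    try:
        encoded = ch.encode("big5")
    except UnicodeEncodeError:
        return "not in Big5"
    if len(encoded) != 2:
        return "not a two-byte Big5 character"
    return classify_lead_byte(encoded[0])


for ch in ("一", "電"):
    print(ch, ch.encode("big5").hex(), big5_block(ch))
```

This only looks at the page a character sits on. It says nothing about how often the character is really used today.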

Are all simplified
characters actually rare traditional characters?

By a very long stretch of the imagination, this is true.
Some simplifications are actually reverting back to older forms.
Some simplifications are rare and very non-standard monstrosities
that have been seen somewhere in history.
Some are actually re-strokifications of known super cursive forms
to make new Kai type characters.
It is true that all have some kind of historical justification.

350 Unique Simplifications

There is a set of 350 stand-alone unique simplifications. That is,
the character is simplified, but independently of that character appearing as
part of another character. In a few cases more than one character
gets simplified to the same character: 366 characters get simplified to 350 new
characters.

132 Radical and Stand-alone Simplifications

There are 132 simplifications in which the stand-alone character and any
contextual occurrences of the character are simplified.

Simplest form of common alternates.

144 simplified characters are different from traditional in that they are
the simplest of several common forms. Most Chinese are unaware of which are
simplified and
which are traditional, and these are not specifically defined by the Chinese government; they
just happen to be different in the Big5 vs. GB character sets.

Un-simplified Characters

Many of the characters have no different simplified form. They were considered
simplified enough already. So 6,766 simplified GB characters correspond to 6,883
traditional Big5 characters. 4,411 of the traditional Chinese characters have
the same 1-1 simplified equivalent excluding trivial style differences. That leaves
2,355 simplified characters corresponding to 2,522 traditional characters which are different.
The rest are unsimplified.

1 to N simplification

Sometimes multiple traditional characters were simplified to one character.
This accounts for the disappearance of 188 characters which are in the Big5
classical set and which have converged simplified forms in GB.
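
The convergence can be seen by inverting a traditional to simplified table. The Python sketch below uses a tiny hand-picked sample table, not the real Big5 to GB correspondence, and the names are hypothetical. 發 and 髮 both become 发, and 後 merges into the already existing 后.

```python
from collections import defaultdict

# Hypothetical sample table for illustration only, not the full mapping.
TRAD_TO_SIMP = {
    "發": "发",
    "髮": "发",
    "後": "后",
    "后": "后",
    "電": "电",
    "孫": "孙",
}


def converged_groups(trad_to_simp):
    """Return simplified characters that more than one traditional form maps to."""
    groups = defaultdict(list)
    for trad, simp in trad_to_simp.items():
        groups[simp].append(trad)
    return {simp: trads for simp, trads in groups.items() if len(trads) > 1}


merged = converged_groups(TRAD_TO_SIMP)
# Characters that "disappear": traditional forms minus distinct simplified forms.
lost = len(TRAD_TO_SIMP) - len(set(TRAD_TO_SIMP.values()))

print(merged)
print("characters lost to convergence:", lost)
```

Because the mapping is many to one, converting simplified text back to traditional needs context to pick the right character.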

In the large character sets you are talking about characters which most people do
not know. The Ministry of Education defines 4808 traditional characters which a
student should know to get out of high school. If you know all of these
characters you can look a Chinese in the eye and say "I am adult literate".
You will still occasionally run into characters outside of this set.

To completely understand these characters you must realize that many characters
have multiple pronunciations called PoYinZi 破音字 "multiple pronunciation characters".
Most of the time these differences in pronunciation are trivial differences
based on where the character is used. Sometimes the differences are not so
trivial. Sometimes the differences in pronunciation are an indication that the
modern character may have been derived from two different ancient characters.
This results in the list of 4808 basic characters becoming 5300
character-pronunciation combinations.
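
Counting character-pronunciation combinations rather than characters can be sketched as follows in Python. The sample readings are a small hand-picked set written in numbered pinyin, not the Ministry of Education list, and the names are hypothetical.

```python
# Hypothetical sample: character -> list of Mandarin readings (numbered pinyin).
READINGS = {
    "行": ["xing2", "hang2"],
    "長": ["chang2", "zhang3"],
    "樂": ["le4", "yue4"],
    "犬": ["quan3"],
    "電": ["dian4"],
}


def count_combinations(readings):
    """Number of character-pronunciation pairs in the table."""
    return sum(len(prons) for prons in readings.values())


def multiple_pronunciation_characters(readings):
    """Characters with more than one reading (PoYinZi)."""
    return [ch for ch, prons in readings.items() if len(prons) > 1]


print(len(READINGS), "characters,", count_combinations(READINGS), "combinations")
print(multiple_pronunciation_characters(READINGS))
```

In this sample 5 characters give 8 combinations, in the same way that the 4808 basic characters grow to about 5300 combinations.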

Book References:

Modern Chinese Characters 现代汉字 by Yin Binyong and John S Rohsenow
A good English discussion of Chinese characters and simplification.

简化字源 by LiYaoYi 李乐毅 The Origins of Simplified Chinese Characters
A good Chinese discussion of the simplification story.

In 221 BC Chin Shi Huang 秦始皇 came to power and declared that the
proliferation of Chinese characters had become too complicated. He assigned his
Prime Minister LiSi 李斯 to make a standard set of official characters. He also
declared that all the old documents should be destroyed. This unification and
2200 years of history mean that very few written artifacts survive from before
221 BC. The characters of this time are well known and understood thanks to the
dictionary by XuShen 許慎 called the ShuoWenJieZi 說文解字 written in about 147 AD.
Our earliest copy dates to the
Song dynasty but we think the existing copies are fairly accurate accounts of
the original and of the time. This style of characters lasted until about 200
AD, but has been used continuously for some official documents and for
official seals, thus the name seal characters. The proper name should be Chin-Han characters.

In my research I use several sources for Chin-Han characters. The ShuoWen is
like the Rosetta Stone of Chinese. Without it, it would have been almost
impossible to decipher the texts of the Zhou and Shang Dynasties. It is
also apparent that XuShen had little or no access to texts before 221 BC. When
we compare XuShen's descriptions to earlier archaeological artifacts we find
that many, perhaps 30%, of the descriptions have some degree of error ranging from
minor to just wrong. XuShen is still a great man, the Galileo of Chinese
etymology.

Book References:

ShuoWenJieZi 說文解字 The earliest complete 987 copy by XuXuan 徐鉉
My main seal character
database comes from the 11109 clearly printed characters found in this version
of the ShuoWen

ShuoWenJieZi 說文解字 The standard 1815 copy by 段玉裁
This version discusses slightly fewer characters but is probably the standard version of the ShuoWen

Chinese Characters Their Origin, Etymology, History, Classification and Signification
by Dr. L. Wieger, S.J
The most comprehensive English discussion of seal characters, mainly from the
ShuoWen point of view.

Actually the Zhou Dynasty ended in 255 BC but the seal characters were not
standardized until about 221 BC. From the
beginning of the Zhou Dynasty 周朝 to the ChinShiHuang 秦始皇
unification people would
have written on bamboo strips, but because of the ChinShiHuang destruction of
books and 3000 years of time we have few samples from bamboo strips. What have
survived are several thousand cast bronze articles with inscriptions of major
events. We have excavated many of these objects and this is what we know about
Zhou Chinese. We call these bronze characters, but we could just as well
call them Zhou
characters because they cover most of the Zhou Dynasty.

The peculiarities of bronze characters are:

One, the comparatively primitive bronze casting technology of that time means that we
cannot depend on the characters to be as accurate as they would be if they
were written on bamboo. They have casting flaws.

Two, they have undergone 2000 to
3000 years of corrosion which further deteriorates their condition.

Three, some of these objects were excavated recently, and thus we can depend on their
authenticity. Others have been around for hundreds of years and may be forgeries.
The making of forgeries was particularly prominent during the Tang Dynasty 600
A.D. to 900 A.D.

Four, the inscriptions range from single characters on coins
to several hundred characters on some large bronze objects. One of the main
references, the JinWenBian, covers about 4000 objects and 24,223 sample characters
in all, representing about 4000 different characters.

Five, since these inscriptions
mainly commemorate important events, we may not find some of the every day
characters that were in use.

Six, these few artifacts range over the
entirety of China and over a thousand year period. This is good in that it
gives us a large range of samples, but not good in that we cannot get an
extensive sample of any one place or time.

XuShen describes a type of
characters called greater seal characters. These were the type of characters
that were supposed to be used during the Zhou Dynasty. They are often quite
different from the real samples we find in the bronze characters.

Book References:

JinWenBian 金文编 by RungGeng 容庚
Used for my database of 24,223 bronze characters.
This is the most accurate book of
character samples from the bronze artifacts.

Oracle bones were only discovered
in 1895. When we say oracle bones, we mean either the front plates (plastrons)
of turtle shells, or the shoulder bones (scapula) of oxen. The people of the Shang Dynasty would cut inscriptions in the bone or shell with a sharp object,
and then see how the bone broke when exposed to fire. In this way, they would
attempt to cast fortunes. The uneducated Chinese of the 19th century who first
found these bones thought they were dragon bones and ground them up for
traditional medicine. The writing was obviously not readable to them. We have
been studying them and digging them up and trying to put them together now for
a hundred years. We can understand somewhat over half of the character samples,
which means we can understand around 95 percent of the text.

Peculiarities of oracle characters are:

One, the
oracle bones and turtle plastrons all come from one excavation site. If it were
not for this one site, we would have no direct proof that the Shang Chinese
were really literate. The shells cover a period of about 200 years from about
1300 B.C. to about 1100 B.C. The advantage of this is that we have a small number of writers,
all from one place, and extending over a relatively short period of time. This gives us a
kind of average and we can at least talk about how the people of that time and
place wrote.

Two, the
pieces are a real mess. By some estimates, a total of 400,000 pieces were found. Several thousand
plastrons and bones have been reconstructed, and several tens of thousands of
sentences have been studied. I have compiled a database of 31,876 sample characters
that represent about 4000 different characters of which
we think we understand between 1500 and 2000.

Three, from
the analysis of characters like dian 典, we believe that the usual writing medium
for the time was bamboo strips. The first actual examples of bamboo strips
we have date back to about 400 BC. So by that time we already have almost a
thousand years of Chinese for which we have proof that writing existed, but for
which there is not one single bamboo strip.

Four, the characters of 1300 BC have already undergone a high degree of abstraction. When
we are told what they represent and how they are supposed to be interpreted it
seems in most cases fairly obvious. Unlike Egyptian hieroglyphs, however, it is not
obvious to a casual observer what most of the characters represent. This is an
indication that the writing system had already been around for a long time.

It is believed that spoken language
developed a little at a time. A language with 10 words is more useful than a
language with no words. 100 words are better, and so on. With written language
on the other hand, a written system that cannot represent at least the
majority of the spoken language is virtually useless. Imagine a written
language that can only represent half of the concepts that you can talk about.
Why bother to learn it?

Five, the
purpose of the oracle bones was to cast fortunes. There was a lot of writing
done here, but it is like the vocabulary you might find in a horoscope.
We can assume that they probably had many characters for more everyday, common
things that never appeared on the oracle bones. We might be able to extract
5000 characters from the oracle bones, but there were probably twice that many
in use at the time.

The traditional story says that a
man named Chang Jie 倉頡 invented the writing system around 3000 BC. You can only say so much
with paintings and tokens. I think that when an innovative artist found that
he might represent words with basic symbols and phonetic parts, he and probably
a group of people were commissioned to invent and learn a writing system for
practical purposes.

Book References:

We need to be careful about copying
these characters so that we do not influence the form by our own interpretation
of the character which may be wrong. The following two are the most accurate books
of character samples from the oracle artifacts.

JaGuWenBian 甲骨文编 by ShunHaiBuo 孙海波

XuJaGuWenBian 續甲骨文编 by JinXiangHeng 金祥恒
My database of 31,876 oracle characters is taken from this reference.

YinXuJaGuWenHeJi 殷墟甲骨文合集 13 volumes
There may still be questions or discrepancies since this is still an area of research. One will
want to see the original objects and sentences. This is the largest resource for
the original pictures.