The first character frequency count based on a corpus of texts written in the modern Chinese vernacular was conducted by the educator Chén Hèqín 陳鶴琴 (1892–1982) in Nánjīng 南京. Chén and his collaborators were inspired by Thorndike (1921). They counted the character frequency in a corpus of 554,478 tokens, finding 4,261 character types altogether (Chén 1928). In the 1950s, several character frequency counts were carried out in the People’s Republic of China in order to determine which traditional full-form characters most urgently needed to be simplifi…

Although Chinese is conventionally written in Chinese characters, the graphic representation of the language should not be confused, let alone identified, with its lexical units. This also applies when looking at the frequency of occurrence of linguistic entities such as words (see Word and Wordhood, Premodern; Word and Wordhood, Modern). While quantitative linguistics in China began by exploring character frequencies, the ever faster development of digital computers and related software from the late 1970s onward opened n…

Menzerath’s Law is named after the German phonetician Paul Menzerath (1883–1954), who observed that in German, longer words tend to contain shorter syllables, with syllable length measured in number of phonemes. He hypothesized that analogous regularities might hold in other languages as well as in non-linguistic areas. Gabriel Altmann assumed that Menzerath had found a general law of language, which he expected to hold on all levels of linguistic analysis: “The longer a language construct the shorter its components (constituents)” (Altmann 1980:1), or, in mathematical formulation:
y…

Zipf’s Law, in its linguistic application, states that in a text written in a natural language, the frequency of any word is inversely proportional to its rank in the frequency table of all the words found in this text, arranged in decreasing order of frequency. It is named after the American linguist and philologist George Kingsley Zipf (1902–1950), who had examined various …
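The inverse proportionality between frequency and rank stated above can be illustrated with a minimal sketch in Python. It assumes the simplest reading of the law, namely that the frequency at rank r is roughly the frequency at rank 1 divided by r. The function names `rank_frequency` and `zipf_prediction` are illustrative and not taken from the source, and the token list is a made-up toy sample rather than corpus data.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, token, frequency) triples, most frequent first."""
    counts = Counter(tokens)
    # Sort by decreasing frequency; ties are broken alphabetically so that
    # the resulting ranking is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, token, freq)
            for rank, (token, freq) in enumerate(ordered, start=1)]


def zipf_prediction(top_frequency, rank):
    """Frequency expected at `rank` if f(r) = f(1) / r held exactly.

    This idealized form is an assumption made for illustration only.
    """
    return top_frequency / rank


if __name__ == "__main__":
    # Hypothetical, hand-made token list (romanized placeholders).
    sample = ["de"] * 6 + ["shi"] * 3 + ["wo"] * 2 + ["shu"]
    table = rank_frequency(sample)
    top_frequency = table[0][2]
    for rank, token, freq in table:
        # Print observed frequency next to the idealized prediction.
        print(rank, token, freq, zipf_prediction(top_frequency, rank))
```

For real texts, the observed frequencies would only approximate the predicted ones, and for Chinese the token list would first have to come from a word segmentation step, which this sketch does not attempt.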

Although the graphic representation of Modern Chinese in the form of Chinese characters, which does not feature word separators such as blank spaces, seems to suggest otherwise, it is well known that its lexicon – like those of most other languages – contains words of different lengths. However, when studying word length, one must first consider how it should be measured. A prerequisite to this endeavor is a satisfactory answer to the problem of wordhood. A rather obvi…