Late-Imperial Bibliographic Studies and Digital Quantitative Analysis

Modern scholars of late-Imperial Chinese literature benefit from collected texts printed during the Ming and Qing dynasties that are supplemented with bibliographic information on both extant and non-extant books. Cataloging old texts was traditionally an important part of late-Imperial Chinese scholarship. Scholars closely researched important works by exploring their textual histories, identifying forgeries, and tracing their provenance. Some of this information was eventually preserved in large annotated indexes.

Publishing houses also printed compilations of popular texts and sometimes reprinted entire libraries. Though some were commercial products, other endeavors (particularly those sponsored by the government) aimed at preservation. Many examples exist, the most famous being the 18th-century The Complete Library of the Four Treasuries (Si ku quan shu 四庫全書), compiled under the Qianlong (乾隆) emperor. This was accompanied by An Index of Summaries of the Complete Library of the Four Treasuries (Si ku quan shu zong mu ti yao 四庫全書總目提要), a bibliographic index that provided short descriptions of the titles within, as well as many that were not included in the Si ku quan shu. This tendency to publish collections of older books was common and likely led to the preservation of many texts that might not otherwise have survived. Other important collections, such as the later The Collection of the Four Branches (Si bu cong kan 四部叢刊), first published from the 1910s to the 1930s, emerged from this tradition. In all, modern scholars have access to thousands of full-length texts and valuable information on many more texts that are no longer extant.

The intensely bibliographic mode of scholarship popular during the Qing dynasty has fallen somewhat out of vogue, particularly among researchers in the United States. Perhaps this is owing to an accumulation of information that is expansive but difficult to generalize. Fortunately, the digital humanities’ increasing popularity has led to sophisticated tools, including computer scripting languages that are increasingly easy to learn, that enable scholars to incorporate this early textual scholarship into quantitative analyses of late-Imperial Chinese literature.

My dissertation topic initially drew quite narrowly on traditional textual studies, and did not immediately evoke the digital humanities. I am currently researching a class of popular late-Ming and early-Qing texts, which I call "quasi-history", at the Academia Sinica's Institute of Chinese Literature and Philosophy with the support of a Fulbright Student research grant. Individual people (rather than the government) produced these works, which "were ostensibly about historical events, but people processed and edited the stories the works told until their relationship with actual historical events became unclear." These works fall mostly into three genres: novels on current events, drama on current events, and unofficial histories (野史 yeshi).

In the early stages of my research, I focused on traditional modes of analysis. I closely read late-Ming quasi-historical texts that examined the infamous eunuch Wei Zhongxian's (魏忠賢) death, cataloged the various extant titles, and explored their textual history. Thanks to his close relationship with the Tianqi (天啓) Emperor's wet-nurse, Wei was able to exert significant influence on the emperor. A few months after the Tianqi Emperor died in late 1627, the Chongzhen (崇禎) Emperor forced Wei to commit suicide. Within several months, a number of quasi-historical texts were published that purported to explain Wei's origins (with a strong subtext that aimed to explain how he obtained so much power). While I personally find this story very interesting and consider it representative of how quasi-historical texts manipulate historical events, extrapolating quasi-history as a late-Imperial phenomenon from this small group of texts is difficult. For that, I need to adopt a different approach.

Last December, my advisor, Tina Lu, and I discussed ways to overcome this problem. We hoped to find a way that played to both the topic's strengths and my own interests. I had a large, but often difficult to access, source base and an interest in technological innovation. Building on work by previous scholars, such as Robert Hegel, we decided on a quantitative approach. I set about aggregating an extensive digital dataset on late-Imperial works.

Though other areas of Chinese literary studies are already taking advantage of increasingly powerful and inexpensive computers to perform "algorithmic reading" (to borrow a phrase from Stephen Ramsay), late-Imperial Chinese texts are generally opaque to certain advanced types of digital analysis. In this case, my topic's strength is also its largest weakness: the large number of texts available means only a small percentage has been digitized. Further, the accuracy of those already digitized can be spotty. This lack of quality digitized texts precludes me from using some very interesting computer algorithms, such as topic modeling, in my research. Fortunately, current initiatives will soon allow Chinese literary studies to adopt digital humanities methods, from projects that aim to increase access to rare and important works to others already analyzing full-text corpora of early Chinese works.

Despite these difficulties, so many works were published in the late-Ming and early-Qing dynasties that even a modicum of metadata on these texts reveals trends less evident in traditional modes of analysis. The solution to my problem eventually emerged in the form of online library catalog records. Scholars and libraries throughout the world have produced numerous digital bibliographic records of rare texts. The International Union Catalog of Chinese Rare Books, a project led by Soren Edgren at Princeton University, deserves much credit for this. This project created a standardized format for cataloging rare works and incorporates information important to late-Imperial bibliographers. In doing so, it created a mass of digitized catalog records that include traditional “four-treasuries” classifications, physical descriptions, and so forth. In 2007 these records were integrated into WorldCat's online catalog, so they are now publicly accessible.

Knowing that this information existed on thousands of late-Imperial texts in digital format, and excited about its potential, I contacted WorldCat to ask permission to use their data for research. With the generous support of Yale University and my department (East Asian Languages and Literatures), the Online Computer Library Center granted me permission to use their bibliographic data.

This data has been invaluable in compensating for a lack of adequately digitized texts by making it easy to identify changes in textual parameters across the late-Imperial period. Exploratory statistical analysis using large quantitative data sets opens many future research avenues. This optimism should be somewhat tempered, however: while the online library catalog records are extensive, they are not enough to produce an exhaustive depiction of late-Imperial literature. They lack information on non-extant texts, and some records are not very detailed.

This is where the Fulbright program and the Academia Sinica have been indispensable to my research. With their support, I have been able to greatly expand the scope of my dissertation. Since arriving in Taiwan, I have bolstered the quantitative analysis portion of my dissertation using extensive modern and pre-modern bibliographic works and databases. I have also physically analyzed copies of late-Imperial texts. This eased the transition into narrower analysis. An exclusively digital approach does not provide enough resolution, so it was difficult to focus closely on the specific genres that make up quasi-history. Fortunately, scholars have invested significant energy in providing resources that describe the contents of Ming and Qing fiction, drama, and unofficial histories. Texts such as the 500 Ming and Qing Novels (Wu bai zhong ming qing xiao shuo 五百種明清小說), A Catalog of Summaries of Vernacular Chinese Novels (Zhong guo tong su xiao shuo zong mu ti yao 中國通俗小說總目提要), A Comprehensive Record of Ming and Qing Chuanqi Dramas (Ming qing chuan qi zong lu 明清傳奇綜錄), and the Dictionary of Chinese Yeshi (Zhong hua ye shi ci dian 中華野史辭典) have been invaluable supplements to my dissertation research (despite their sometimes contradictory or incomplete information). These types of works break texts down in a way that allows for quick absorption of information (date of publication, place of publication, place and time of setting, main characters, authors, etc.).

Traditional scholarship left a significant repository of information that is fertile ground for quantitative analysis. By leveraging the interest late-Imperial scholars had in bibliographic studies, with their focus on cataloging the physical characteristics of rare Chinese texts, I generated a database of information. Combined with several scripts I have written, this database lets me perform a new type of digital aggregated textual study. My reliance on extensive bibliographic records of late-Imperial Chinese texts, often the fruits of late-Imperial scholars' labor, leads me to view this research as a natural progression from old-school Sinological research, differing only in its analytical scope.

This type of data is useful for learning many things, from tracking changes in textual size to identifying which publishing houses are most likely to produce which genre of text. I hope this kind of macro-investigation of quasi-histories sheds light on historical imagination and collective identity. Among the various questions I tackle is an attempt to track changes in the “historical focal length” of quasi-history, by looking at the distance in time between the events a book describes and the date the book itself was published.
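To make the idea of “historical focal length” concrete, here is a minimal sketch in Python. It is not the set of scripts used in my research: the record fields, the function names, and the grouping by decade of publication are assumptions made for illustration, and the sample records are invented placeholders rather than real catalog data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Record:
    """A simplified bibliographic record (hypothetical fields)."""
    title: str
    published: int   # year the book was published
    event_year: int  # year of the main event the book describes


def focal_length(rec: Record) -> int:
    """Years between the events described and the book's publication."""
    return rec.published - rec.event_year


def mean_focal_length_by_decade(records: list[Record]) -> dict[int, float]:
    """Average focal length, grouped by decade of publication."""
    buckets: dict[int, list[int]] = defaultdict(list)
    for rec in records:
        decade = rec.published // 10 * 10
        buckets[decade].append(focal_length(rec))
    return {decade: mean(vals) for decade, vals in sorted(buckets.items())}


# Invented placeholder records, for illustration only.
records = [
    Record("Hypothetical text A", published=1628, event_year=1627),
    Record("Hypothetical text B", published=1629, event_year=1620),
    Record("Hypothetical text C", published=1645, event_year=1644),
]

for decade, avg in mean_focal_length_by_decade(records).items():
    print(f"{decade}s: mean focal length {avg} years")
```

A real analysis would also have to handle records with missing or approximate dates, which this sketch ignores.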

Though this type of research is somewhat unconventional in my field, and more quantitative than I initially imagined, the Fulbright program has provided me with the opportunities and connections to conduct my research properly, as well as the freedom to explore new and interesting directions. Less formally, the Fulbright community has become a support group and conversational sounding board whose members are quite willing to share their thoughts and advice on research issues I have encountered. My time in Taiwan thus far has been remarkably productive.

The author is a doctoral student in East Asian studies at Yale. His focus is the historical contexts of Ming and Qing fiction.