Do you know how to look up and visualize information in Chinese historical sources? Nearly every day, news articles describe how big data and computational methods such as mapping and network analysis are changing our world. These methods are transforming the study of Chinese history as well; scholars can no longer ignore the potential of digital tools.

What sorts of questions about Chinese history can be asked and answered using computational methods? What are the main tools that scholars can use? This one-day workshop featuring experts from Harvard and beyond will provide an overview and practical training.

We will first introduce two main tools, CBDB and MARKUS. The China Biographical Database (CBDB) is a relational database with biographical information about more than 360,000 individuals, primarily from the 7th through 19th centuries. The data can be used for statistical, social network, and spatial analysis, and also serves as a kind of biographical reference. The standalone version of CBDB in Microsoft Access format enables many functions that are not available in the online version. The MARKUS text analysis and reading platform is a multi-faceted tool that allows users to access a range of online reference tools while reading texts in literary Chinese, and to tag and extract information of interest to them. In addition to tagging names already present in the China Biographical Database and the China Historical GIS, users can tag words or expressions by uploading their own lists or by using the keyword help tools.

We will then demonstrate the uses of spatial analysis for historical GIS data from China. We will also cover social network analysis (SNA) as a methodological approach, its basic concepts, and the use of software for simple visualization and analysis of network data on Chinese history. The day will conclude with presentations of case studies that emerged from digital projects.

This workshop is part of the Automating Data Extraction from Chinese Texts (DID-ACTE) Project, which aims to provide humanists and social scientists with the means to transform historical Chinese sources into structured data. The project is funded by the Digging into Data Challenge, an international research initiative to develop big data analysis methods for the humanities and social sciences. MARKUS is developed by Brent Hou Ieong Ho as part of the European Research Council-funded project "Communication and Empire" at Leiden University, which is led by Hilde De Weerdt.

Workshop report:

The “Computational Methods for Chinese History: A ‘Digging into Data Challenge’ Training Workshop”, organized by the China Biographical Database (CBDB) project, was held at Harvard University on October 17, 2015. The workshop was designed to show researchers of Chinese history how to use and manipulate data of interest, and to showcase projects that make use of computational methods. To provide hands-on training and demonstration, the event was held in a computer lab in Harvard’s Science Center. Over 50 participants from Harvard and beyond, ranging from graduate students to senior scholars, took part in this one-day workshop.

The first presentation was given by Michael A. Fuller (UC Irvine), the designer of the structure of CBDB, which is a relational database with biographical information about more than 360,000 individuals, primarily from the 7th through 19th centuries. This data is open to all researchers for statistical, social network, and spatial analysis, and can also serve as a kind of biographical reference. Fuller introduced some concepts for modelling historical data, then explained the advantages of a relational database for storing information about historical figures. He also guided the workshop participants through the installation procedure and basic operations, such as running queries and exporting data, in the standalone version of CBDB, which is in MS Access format and downloadable from the project’s website.
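To illustrate why a relational structure suits biographical data, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is not the CBDB schema: the table names (`person`, `association`), the column names, and the sample rows are all hypothetical, chosen only to show how keeping people and their relationships in separate tables lets one query join them on demand.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per individual (hypothetical columns, not the CBDB schema).
        CREATE TABLE person (
            person_id  INTEGER PRIMARY KEY,
            name       TEXT,
            birth_year INTEGER
        );
        -- One row per relationship, pointing at two person rows.
        CREATE TABLE association (
            person_id INTEGER,
            assoc_id  INTEGER,
            kind      TEXT
        );
        INSERT INTO person VALUES
            (1, 'Person A', 1010),
            (2, 'Person B', 1020),
            (3, 'Person C', 1035);
        INSERT INTO association VALUES
            (1, 2, 'teacher of'),
            (1, 3, 'friend of'),
            (2, 3, 'friend of');
        """
    )
    return conn


def associates_of(conn, name):
    """Return (associate name, relationship kind) pairs for one person."""
    return conn.execute(
        """
        SELECT p2.name, a.kind
        FROM person AS p1
        JOIN association AS a ON a.person_id = p1.person_id
        JOIN person AS p2 ON p2.person_id = a.assoc_id
        WHERE p1.name = ?
        ORDER BY p2.name
        """,
        (name,),
    ).fetchall()


conn = build_demo_db()
print(associates_of(conn, "Person A"))
```

Because each person is stored once and relationships only reference identifiers, correcting a name or adding a new kind of relationship does not require touching any other record.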

In the next session, Lik Hang Tsui (Harvard University) introduced the open-source platform MARKUS, which was developed by the European Research Council-funded project “Communication and Empire: Chinese Empires in Comparative Perspective”. He showed how one could use the platform’s different techniques and reference tools for reading a wide variety of Chinese historical and literary texts, including the tagging of personal names, dates, place names, official titles, etc. Hongsu Wang (Harvard University) further demonstrated methods of extracting and converting such textual information for analysis. These methods allow users to use the tagged data for visualization, which was the theme of the next two presentations.

In his presentation, Peter K. Bol (Harvard University) demonstrated the uses of spatial analysis for historical GIS data from China. He outlined the kinds of research questions that could be asked or even answered by applying GIS techniques to data about China, such as data from the open-access China Historical GIS project and ChinaMap. Mapping the results of queries on such data enables researchers to identify further points of interest that are related to locational factors. Song Chen’s (Bucknell University) presentation concerned historical social networks. He gave a concise introduction to concepts in social network analysis, then provided a step-by-step tutorial on how biographical data from CBDB could be visualized as network graphs in the application Gephi.
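As a rough illustration of the data preparation such a network tutorial involves, the Python sketch below turns a list of pairwise associations into an edge-list CSV with `Source`, `Target`, and `Weight` columns, the headers Gephi's spreadsheet import looks for. This is not the workflow shown at the workshop: the sample records and the function names `build_edge_rows` and `write_edge_csv` are hypothetical, and the sketch assumes undirected ties, with repeated pairs counted as edge weight.

```python
import csv
import io
from collections import Counter


def build_edge_rows(associations):
    """Collapse repeated person pairs into weighted, undirected edges."""
    counts = Counter()
    for a, b in associations:
        if a == b:
            continue  # skip self-loops, which add nothing to this sketch
        # Sort each pair so (A, B) and (B, A) count as the same edge.
        counts[tuple(sorted((a, b)))] += 1
    return [(s, t, w) for (s, t), w in sorted(counts.items())]


def write_edge_csv(rows, handle):
    """Write edges with the column headers used for a Gephi edge table."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(rows)


# Hypothetical association records, e.g. exported from a database query.
records = [
    ("Person A", "Person B"),
    ("Person B", "Person A"),
    ("Person A", "Person C"),
    ("Person C", "Person C"),
]

rows = build_edge_rows(records)
buffer = io.StringIO()
write_edge_csv(rows, buffer)
print(buffer.getvalue())
```

To produce a file for import, pass a file opened with `open("edges.csv", "w", newline="", encoding="utf-8")` instead of the in-memory buffer.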

The workshop concluded with four presentations of case studies that evolved from digital projects. Hang Yin (Peking University), the former project manager of the CBDB editorial team in Beijing, reflected on the workflows by which the team inputs, processes, and cleans up data for CBDB in both manual and semi-automated ways. He reminded researchers to be aware of the possible pitfalls of manual data processing when the goals are not adequately defined. Donald Sturgeon (Harvard University) introduced his study of text reuse based on the Pre-Qin and Han data generated from his Chinese Text Project. By analyzing and visualizing these textual relationships, he identified the clustering of texts according to schools of thought of the time. Xin Wen’s (Harvard University) study was about the military garrison (fubing) system in the early stages of the Tang dynasty. By mapping the locations of those garrisons, which were of crucial importance to the empire’s military strength, he observed that they did not correspond to the population density of the time. Instead, the elites were clustered along the capital corridor, indicating the political significance of that region. The final presentation, by Weichu Wang (Harvard University), took a comprehensive look at families that produced multiple jinshi degree holders in Ming China. Using newly available data from name lists of degree holders in CBDB, she was able to show the geographical distribution and other characteristics of such families as part of her effort to quantify and analyze social mobility in China during that period.

The workshop attracted historians from a wide variety of fields in East Asian studies. Their interest is testimony to how digital methods and datasets are transforming the study of Chinese history. Scholars can no longer afford to ignore the potential of these new research approaches.