The first batch of Seshat data

It’s been a long haul. Six years ago we launched the project that we eventually named Seshat: Global History Databank. Three years ago, following a series of workshops that designed the overall structure of the databank, we started collecting data. By January 2015 we had 30,000 records (a “Seshat record” says what the value of a particular variable is for a particular polity at a particular time; indicates the degree of uncertainty and any expert disagreements associated with this value; explains in a narrative paragraph how this code was arrived at; and provides the references). In January 2016 we exceeded the symbolic level of 100,000 records. The database currently holds 170,000 records, and the number is still growing.
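For readers who think in code, the parts of a “Seshat record” listed above can be pictured as a small data structure. This is only an illustrative sketch, not the databank's actual schema: the class name, the field names, and the rule that more than one candidate value signals uncertainty are all assumptions made for this example, and the sample values are placeholders rather than real Seshat data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SeshatRecord:
    """Hypothetical sketch of one record; all field names are assumptions."""
    variable: str            # which variable is being coded
    polity: str              # which polity the value applies to
    start_year: int          # start of the period (negative = BCE, by assumption)
    end_year: int            # end of the period
    values: List[str]        # candidate values; more than one means uncertainty
    disputed: bool = False   # True when experts disagree about the value
    narrative: str = ""      # paragraph explaining how the code was arrived at
    references: List[str] = field(default_factory=list)  # supporting sources

    def is_uncertain(self) -> bool:
        # Assumed convention: several candidate values signal uncertainty.
        return len(self.values) > 1


# Placeholder record; none of these values come from the databank.
example = SeshatRecord(
    variable="example_variable",
    polity="ExamplePolity",
    start_year=-100,
    end_year=100,
    values=["3", "4"],
    narrative="Placeholder explanation of the coding decision.",
    references=["Placeholder reference"],
)
```

Keeping the narrative and references on the record itself mirrors the description above: every coded value carries its own justification, so a reader can dispute it or propose an alternative.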

Last year we started the analysis of these data by first looking at the various dimensions of social complexity. These include aspects of social scale—population sizes and areas controlled by territorial states; measures of “vertical complexity” (length of chains of command in administrative, military, and religious hierarchies); specialized government agents; civic infrastructure; and economic and information systems.

The question we asked was: how many independent dimensions are needed to describe variation in social complexity in our global sample? Can a single measure capture the bulk of variation? Or are there perhaps two, one reflecting social scale and the other non-scale aspects of complexity? Or more? Or, perhaps, different societies have unique histories and cannot be meaningfully compared in this way, as is argued by many historians.

An article reporting this analysis has been submitted for publication to an academic journal. Once it is accepted, we will make the data on which the analysis is based available to all comers.

But we also didn’t want to wait too long to show the power of the Seshat approach, and so at the last project meeting in Oxford in January we decided to publish a chunk of our data. There are multiple purposes behind this development. First, we wanted to show our experts—historians and archaeologists with deep knowledge of the societies we code in the databank—what the “product” looks like. Second, no database is perfect, especially a large one like Seshat, and there are undoubtedly many mistakes still in the data. We wanted to open our data to public scrutiny. There is a button next to all Seshat records that can be clicked to make a suggestion on how to improve the code, or to dispute it and offer an alternative value.

The Seshat Databank is an evolutionary resource—it will evolve to become better and more accurate as a result of feedback from various types of users. The release of this first batch of data moves the project into a new public phase.

Note that the data released this week covers less than 5% of what has already been coded in the Seshat Databank. In addition to polities occupying the eight NGAs (Natural Geographic Areas) in this release, we have coded 22 more NGAs (see map here). Furthermore, in addition to the social complexity variables, we are also gathering data on religion and ritual, warfare, agriculture, norms and institutions, and well-being variables (see the Codebook here).

Finally, a word for those of you who make a living analyzing other people’s data (a perfectly honorable occupation). Seshat data, in machine-readable formats, will be made progressively available on this page for you to download and analyze. However, we ask that those who are interested in analyzing the data wait until we release them for analysis. As I mentioned earlier, we have submitted an article that analyzes social complexity data for all 30 NGAs, and when the article is published we will make the data on which the analysis is based available to all.

But if you don’t want to wait until that time and have ideas for statistical analysis, contact us and we will consider collaboration!


Data from the Seshat Databank (data.seshat.info) is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).