Designing Databases for Historical Research

E. Entity relationship modelling

E1. Introduction

Throughout this Handbook so far, reference has been made to
the translation and conversion processes involved in taking
information from sources and turning it into data within the
database. This section describes precisely the tasks involved in
performing these processes, which are collectively known as
Entity Relationship Modelling (ERM). The mechanics of ERM are in
fact a lot less intimidating than the name implies, but it is
nevertheless a complex activity, and one that is likely to prove
challenging at the first few attempts. Luckily, however, the
various stages of ERM draw very heavily upon the skills and
experience that historians use as a matter of course in their
research anyway, so that here, unlike in most aspects of database
use, the historical researcher is at something of an advantage.
The difficulty of the ERM process rises with the complexity of
the source(s) being used in the research, with some types of
source being (relatively) simpler to model than others. Highly
structured sources like census returns, lists of inhabitants,
poll books and so on will be easier to model than
‘semi-structured’ sources such as probate inventories, which in
turn will present fewer problems than completely unstructured
material such as narrative texts and interviews. However, all will
have their own particular features and problems to complicate
the modelling.

The process of ERM serves a number of purposes. Firstly, it makes
the historian decide what the database is meant to achieve in
terms of its functions. Secondly, it identifies the types of
information that can be obtained from the sources and, in
conjunction with the database’s chosen aims, helps the historian
decide which information from the sources should be entered
into the database and which can be excluded.
Thirdly, ERM makes the historian think in detail about the
components of the database (its tables, fields, relationships,
datatypes and so on), decisions on all of which are crucial to a
successful database design. Finally, it encourages consideration
of the layers of the database: what information needs to be
entered into both the Source layer and the Standardisation layer,
what can be entered only into the latter, and how extensive the
latter needs to be. Once these tasks have
been conducted, the historian is left with a very precise idea of
what the database will look like, and, on a more practical note,
will be left with the design of their database on paper (an
Entity Relationship Diagram [ERD]).
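To make the end product of this process a little more concrete, the following is a minimal sketch, assuming a hypothetical census-style source modelled as two entities, Household and Person, joined by a one-to-many relationship. It shows how the tables, fields, relationship and datatypes recorded on an ERD might be implemented in SQLite using Python's standard `sqlite3` module. All table and field names are invented for illustration, as is the convention of pairing each Source-layer field with a `_std` Standardisation-layer field; this is not a design prescribed by the Handbook.

```python
import sqlite3

# Hypothetical ERD for a census-style source, invented for illustration:
# one Household contains many Persons (a one-to-many relationship).
# Fields ending in "_std" are an assumed convention for Standardisation-layer
# values; the unsuffixed fields hold the Source-layer text as recorded.
SCHEMA = """
CREATE TABLE household (
    household_id INTEGER PRIMARY KEY,
    parish       TEXT,
    schedule_no  INTEGER
);
CREATE TABLE person (
    person_id      INTEGER PRIMARY KEY,
    household_id   INTEGER NOT NULL REFERENCES household(household_id),
    surname        TEXT,
    surname_std    TEXT,
    occupation     TEXT,
    occupation_std TEXT,
    age            INTEGER
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database whose tables mirror the hypothetical ERD."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def persons_per_household(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Follow the one-to-many relationship: count the persons in each household."""
    return conn.execute(
        """
        SELECT h.household_id, COUNT(p.person_id)
        FROM household AS h
        LEFT JOIN person AS p ON p.household_id = h.household_id
        GROUP BY h.household_id
        ORDER BY h.household_id
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_database()
    conn.execute("INSERT INTO household VALUES (1, 'St Giles', 12)")
    # Invented sample rows: a variant spelling kept in the Source layer
    # alongside its standardised form.
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 1, "Smyth", "Smith", "Ag Lab", "agricultural labourer", 42),
            (2, 1, "Smith", "Smith", "Scholar", "scholar", 9),
        ],
    )
    print(persons_per_household(conn))  # [(1, 2)]
```

Switching on `PRAGMA foreign_keys` makes SQLite reject a person whose `household_id` does not match an existing household, so the relationship drawn on the diagram is enforced by the database itself rather than left to careful data entry.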