Designing Databases for Historical Research

C. Fundamentals of database design

C3. Conceptual models of database design

Whilst it is true that every database ever built has been
designed specifically for a particular conjunction of purpose and
data, and is therefore to a greater or lesser extent distinctive,
it is also true that there are two principal overarching
approaches to designing databases. The two conceptual models are
known as:

The Source-oriented approach (sometimes called the Object-oriented approach)

and

The Method-oriented approach (also known as the Model-oriented approach)

These two models should be viewed as polar opposites at the ends
of a sliding scale, where the design of a database is based on an
approach somewhere between the two extremes. Every database
design will be something of a compromise, and no database will
ever constitute the ‘perfect source-oriented database’, nor will
there ever be the ‘perfect method-oriented database’.

C3i – The two conceptual approaches to database design

The Source-oriented model of database design dictates that
everything about the design of the historical database is geared
towards recording every last piece of information from the
sources, omitting nothing, and in effect becoming a digital
surrogate for the original. The information contained within the
sources, and the shape of that information, completely ordains
how the database must be built.

The lifecycle of an ideal source-oriented database can be
represented thus:

C3ii – Lifecycle of the Source-oriented database

This approach to database design is very attractive to the
historian as it places the sources at the centre of the database
project. Entering data into a database is a very time-consuming
activity, however, and becomes much more so if you are
taking pains to record all of the information that exists in your
sources. Ultimately you will need to make choices about which
information you will exclude from the database, contrary to the
principles of the Source-oriented model, which will undermine the
database’s role as a digital surrogate for your sources but which
will at least allow you to perform your research within a
reasonable period.

The Source-oriented approach, if rigidly applied, can lead to a
design that quickly becomes unwieldy as you try to accommodate
every last piece of information from your source, some of which
may only occur once. It does, however, allow for wider analytical
approaches to be taken later, so that potential queries are not
reliant on the initial research agenda, meaning that the database
does not restrict the directions your research might take. It
also gives you the reassurance of not having to anticipate all
of your research questions in advance, as the Method-oriented
model requires. The Source-oriented model transfers the source (with
all its peculiarities and irregularities) in a reasonably
reliable way into the database with little loss of information:
‘everything’ is recorded (or at least, anything excluded is left out
by your conscious choice), and if something later becomes
interesting, you will not have to go back to the source to enter
information that you did not deem interesting enough to begin
with. The Source-oriented model also enables you to record
information from the source ‘as is’, and lets you take decisions
about meaning later – so ‘merc.’ can be recorded as
‘merc.’, and not expanded to ‘merchant’ or ‘mercer’ at the point
of entry into the database. [1]
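One way to picture this ‘as is’ recording is a table that keeps the verbatim transcription in one field and leaves a second field empty until a decision about meaning has been made. The following is only a minimal sketch using Python's built-in sqlite3 module; the table name, the column names and the sample rows are all invented for illustration and are not taken from any real source.

```python
import sqlite3

# Hypothetical table and column names; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        occupation_as_written TEXT,    -- transcribed exactly as in the source
        occupation_standardised TEXT   -- left NULL until a decision is made
    )
    """
)
conn.executemany(
    "INSERT INTO person (name, occupation_as_written) VALUES (?, ?)",
    [("Person A", "merc."), ("Person B", "mercer"), ("Person C", "merc.")],
)


def standardise(connection, person_id, value):
    """Record an interpretation without touching the verbatim transcription."""
    connection.execute(
        "UPDATE person SET occupation_standardised = ? WHERE id = ?",
        (value, person_id),
    )


# Person B's entry is unambiguous, so it can be standardised straight away.
standardise(conn, 2, "mercer")

# The ambiguous 'merc.' entries remain undecided and easy to find later.
undecided = conn.execute(
    "SELECT name, occupation_as_written FROM person "
    "WHERE occupation_standardised IS NULL ORDER BY id"
).fetchall()
print(undecided)
```

Because the interpretation lives in its own field, the original wording is never overwritten, and the decision between ‘merchant’ and ‘mercer’ can wait until the rest of the data provides context.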

At the other end of the scale, the lifecycle of the
Method-oriented model database could be represented in a
different way:

C3iii – Lifecycle of the Method-oriented database

This approach to database design is based on what the database is
intended to do, rather than the nature of the information it is
intended to contain. Consequently, if adopting this model for
designing your database, it is absolutely vital that you know
before you begin precisely what you will want to be able to do
with the database – including what queries you will want to run.
The level of precision needed here should not be underestimated
either, given that the database requires a high degree of
granularity to perform analysis – the database will not be able to
‘analyse the demographic characteristics of the population’, for
example, whereas it will be able to ‘aggregate, count and link
the variables of age, gender, marital status, occupation,
taxation assessment, place of residence’ and so on. When
designing any database it will be necessary to think at this
latter level of detail, but if you are designing a
Method-oriented database then it becomes much more important.
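To illustrate the level of detail involved, a request to ‘aggregate and count’ has to be phrased in terms of named variables. The sketch below assumes a hypothetical person table with age, gender, marital status and occupation fields, filled with invented rows, and uses Python's built-in sqlite3 module; it is an illustration of the kind of query meant here, not a recommended design.

```python
import sqlite3

# Hypothetical schema and invented sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        age INTEGER,
        gender TEXT,
        marital_status TEXT,
        occupation TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO person (age, gender, marital_status, occupation) "
    "VALUES (?, ?, ?, ?)",
    [
        (34, "male", "married", "weaver"),
        (29, "female", "married", None),
        (61, "female", "widowed", "brewer"),
        (19, "male", "single", "servant"),
        (45, "male", "married", "weaver"),
    ],
)

# Count people and average their ages for each gender / marital status pair.
summary = conn.execute(
    """
    SELECT gender, marital_status, COUNT(*) AS people, AVG(age) AS mean_age
    FROM person
    GROUP BY gender, marital_status
    ORDER BY gender, marital_status
    """
).fetchall()

for row in summary:
    print(row)
```

The query can only group by fields that exist in the table, which is why the variables of interest have to be identified before the database is designed.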

Method-oriented databases are quicker to design, build and enter
data into, but it is very hard to deviate from the designed
function of the database in order to (for example) pursue newly
discovered lines of enquiry.

Ultimately, historians will need to steer a middle course between
the two extreme models, perhaps with a slight tendency to lean
towards the Source-oriented approach. When making decisions about
what information you need from your sources to go into the
database, it is important to take into account that your needs
may change over the course of a project that might take a number
of years. If you want to be able to maintain the maximum
flexibility in your research agenda, then you will need to
accommodate more information in the database design than if you
are very clear on what you need to do (and certain that it
will never change). If you do not know whether your research
needs will change, err on the side of accommodating more
information – do not exclude information about servants unless
you are absolutely sure that you will never want to treat
‘households with servants’ as a unit of analysis, because if you
have not entered that information, then it will not be there to
query later on.
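The servants example can be sketched in the same way. Assuming a hypothetical pair of tables, one for households and one for the people within them, with each person's role recorded, ‘households with servants’ becomes a simple query. All names and rows below are invented for illustration, and the code uses Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical tables and invented sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE household (
        id INTEGER PRIMARY KEY,
        label TEXT
    );
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        household_id INTEGER REFERENCES household(id),
        name TEXT,
        role TEXT   -- e.g. 'head', 'child', 'servant'
    );
    """
)
conn.executemany(
    "INSERT INTO household (id, label) VALUES (?, ?)",
    [(1, "Household A"), (2, "Household B")],
)
conn.executemany(
    "INSERT INTO person (household_id, name, role) VALUES (?, ?, ?)",
    [
        (1, "Person 1", "head"),
        (1, "Person 2", "servant"),
        (2, "Person 3", "head"),
        (2, "Person 4", "child"),
    ],
)

# Households containing at least one servant, with the number of servants.
with_servants = conn.execute(
    """
    SELECT h.label, COUNT(*) AS servants
    FROM household AS h
    JOIN person AS p ON p.household_id = h.id
    WHERE p.role = 'servant'
    GROUP BY h.id
    ORDER BY h.id
    """
).fetchall()
print(with_servants)
```

Had the servant rows been left out at the data entry stage, the same query would return nothing at all, and there would be no way to tell an absence of servants from an absence of data.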

However, you should not dismiss the Method-oriented model out of
hand when considering the approach to your database design. If
you know your source(s) very well in advance, and you have
definite pre-determined research needs, and you know you will not
be attempting to recover all the information from the source, and
you know in advance exactly how you will treat your data and what
questions you will ask of it – if all this is true, you can use
the Method-oriented approach. Alternatively, if you are creating
a database which is not actually for historical research,
but is designed to be a resource with pre-defined functionality
and a limited set of tools that a user can use,[2] then a Method-oriented design is also appropriate.

[1] Leaving this kind of ‘normalisation’
until later in the project is beneficial as it allows you to
defer decisions about the meaning of data until you have the
full body of data to act as context.

[2] Such as an online database with fixed
search and retrieval functionality, for example Old Bailey
Online (http://www.oldbaileyonline.org/,
accessed 23/30/2011).