Designing Databases for Historical Research

E. Entity relationship modelling

E3. Conclusion

The process of Entity Relationship Modelling (ERM) is
difficult, and rapidly becomes more difficult if you are
blessed with a number of different kinds of sources, each of
which contains rich information about a variety of subjects. If
you are using multiple sources, it is a good idea to avoid
creating entities that are source-specific: for example, if you
are using census returns and taxation lists, both of which
contain information about people, do not create two tables for
people (one containing the information from one source, the other
from the second). Stick to the abstract logic of the information
– what is important to your research is people, so
accommodate all of the information about people in the same
place. Not only does this make sense from the point of view of
logic, but it will also make it much easier to find data about
specific individuals later on (either manually or via queries):
looking for a person is easier to do if everyone is located in
one table rather than several.[1]
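
The single-table approach can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows below are invented purely for illustration and are not taken from any real source. Each mention of a person is stored as one row in a single 'person' table, with a reference to the source it came from, so that one query finds an individual regardless of where the information originated.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (
    source_id   INTEGER PRIMARY KEY,
    source_type TEXT NOT NULL      -- e.g. 'census return' or 'taxation list'
);
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    forename  TEXT,
    surname   TEXT,
    source_id INTEGER NOT NULL REFERENCES source(source_id)
);
""")

# Invented sample data: two sources, three mentions of people.
conn.executemany(
    "INSERT INTO source (source_id, source_type) VALUES (?, ?)",
    [(1, "census return"), (2, "taxation list")],
)
conn.executemany(
    "INSERT INTO person (forename, surname, source_id) VALUES (?, ?, ?)",
    [("John", "Smith", 1), ("Mary", "Jones", 1), ("John", "Smith", 2)],
)

def find_person(conn, surname):
    """Return every mention of a surname, whatever source it came from."""
    return conn.execute(
        """
        SELECT p.forename, p.surname, s.source_type
        FROM person AS p
        JOIN source AS s ON s.source_id = p.source_id
        WHERE p.surname = ?
        ORDER BY p.person_id
        """,
        (surname,),
    ).fetchall()

matches = find_person(conn, "Smith")
for row in matches:
    print(row)
```

Because the source is recorded as an attribute of each row rather than as a separate table of people, the census and taxation mentions of 'Smith' are returned by the same query. Whether those two rows refer to the same individual remains a research judgement that this sketch does not attempt to make.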

No Entity Relationship Diagram (ERD) will ever be perfect: as with
so much else involved in database design, it will be a matter of
compromise. The success of an ERD can only be determined in one
way: by whether the database performs the tasks it was designed
to do. This will not become evident until after you have begun
entering data and using the database for analysis. This is why
the creation of the
ERD is (or should be) swiftly followed by a period of intense
testing of the database ‘in action’, in order to quickly identify
where the design is impeding the database’s purpose (see
Section G).

[1] Ultimately, of course, this is a matter
of personal judgement: you may decide that your entity is not
‘people’, but is in fact two separate entities comprising
‘census return’ and ‘tax payer’, in which case you would be
able to argue for two separate tables. You would still face
the problem of having to look for individuals in more than
one table, however, should the need ever arise.