Thursday 13 August 2015

Biography of the future? Digital Humanities and a hypothetical biography of John de Witt (1625-1672)

Introduction[1]

Like any humanities scholar of the twenty-first century, the modern biographer has to decide what to do, or not to do, with the advances of the so-called digital humanities. A wide variety of biographical sources, both primary sources like correspondence and secondary sources like biographical dictionaries, have been brought online over the past decades and will increasingly be consulted by future biographers. The digital turn does more, however, than let biographers consult sources from behind their own computer rather than in a library or archive. Digitized sources can be analyzed in new, more advanced ways, visualizations help to reveal patterns, and biographies can be presented in different forms. The question remains whether this digital turn, with its increasing availability of ‘biographical data’ and new ways to consult them, really changes the biographies of the future.

This blog post provides a critical reflection on the possibilities of digital humanities technologies for biographical research. I will take the life of John de Witt, who as grand pensionary was the highest official of the Dutch Republic from 1653 to 1672, as an example of how a biographer could use digital humanities technology, and to illustrate its potential and shortcomings. The choice of John de Witt stems from personal interest, the availability of plenty of primary sources, and the fact that he has already been the subject of several larger biographies, both in and outside the Netherlands.

I John de Witt: a life of diplomacy and writing

John de Witt is one of the most prominent figures in Dutch history. As grand pensionary of Holland, the most powerful province of the Dutch Republic, he was regarded as the leader of the Republic and was treated with full honors by foreign rulers. His untimely death in 1672, when he was murdered and torn apart by an angry mob, has contributed to his lasting fame. To date there is no conclusive evidence that allows the murder to be blamed on any particular person, though his political enemy William III, prince of Orange (later king of England), is often mentioned as being at least partly responsible. Several biographies of John de Witt have appeared, in English and in Dutch.[2]

Statue of John de Witt in the Hague

De Witt's archival legacy is immense. From his role as a state official alone, eight meters of letters are preserved in the Dutch National Archive, bundled in twenty volumes and written in circuitous official jargon.[3] He was not only a politician but also a mathematician, who is still considered a founding father of modern life insurance mathematics.

To write a biography of De Witt is no small feat. The nineteenth-century politician J. R. Thorbecke noted that ‘To give us a life of De Witt worthy of the man is to assure oneself a place among historians of all time.’[4] Rowen needed almost 1000 pages to capture De Witt’s life in 1978. Panhuysen used slightly more than 500 pages in 2005 to describe the life of De Witt in relation to that of his brother Cornelis. More recently, Prud’Homme van Reine needed more than 200 pages to describe the murder of De Witt and his brother alone. Rowen in particular consulted a tremendous variety of sources to compose his work. Necessarily, however, the biographer of De Witt, like the biographer of many other noteworthy historical figures, has to be selective in the topics to address and the sources to consult. One person can only ‘close-read’ a limited number of sources, especially in the twenty-first century, when there is little patience for academic output that takes longer than a project of four or five years. Digital humanities technology allows biographers to combine close reading with ‘distant reading’, in which larger bodies of text are analyzed by computer to help find patterns, test hypotheses and identify leads for further research.

II Analyzing the texts

Computer software is very good at processing text, as long as the text is modern and available in a machine-readable (digitized) format. The first problem we encounter when we want to use digital humanities technology for a biography of De Witt is that his correspondence has not been digitized yet. Even if it were, the OCR (Optical Character Recognition) that would have to convert the handwriting into machine-readable text might not deliver great results. Computers can be trained, with the help of the crowd, to recognize handwriting better, but manual transcriptions are definitely to be preferred.[5]

Let us assume, however, that we have all eight meters of De Witt’s correspondence in a machine-readable format of decent quality. The first thing we would want to examine is the texts themselves, to learn a bit more about the man we are writing about. Questions that we can ask with the help of digital tools, but could previously answer only with great difficulty, are: How long were De Witt’s letters? Did he use many words per sentence and many sentences per letter compared to his contemporaries (which could tell us something about his work ethic and personality)? How does this change over time, and why?

Since the correspondence of De Witt is not actually available in digitized form, we used a transcription of a famous political text by De Witt from 1654, his Deduction, as an illustration. In it he defends himself against serious allegations after conceding to the English lord protector Oliver Cromwell that no member of the House of Orange would be appointed to the highest state offices.[6] Using simple and free online tools like Voyant Tools and WordCounter, we find that De Witt used 34,456 words in this political text, 5185 of them unique. He uses 749 sentences with an average of 46 words per sentence. The most frequently used noun is ‘provincien’ (provinces). We can also see that he uses that word throughout the entire text and not just in one particular section, which shows the importance of the relation between the different provinces of the Republic in De Witt’s argument. A word like ‘Brabant’, the province, is used only in a very restricted part of the text. The name Orange (Oraingne), that of De Witt’s political adversaries, is used often at the beginning of the text, slightly after the middle, and at the very end.
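The counts reported above can be approximated with a few lines of code. What follows is a minimal sketch in Python, not a description of how Voyant Tools or WordCounter work internally. The function names, the naive sentence splitter and the ten-segment dispersion are assumptions made for illustration, and different tokenization rules will give somewhat different totals.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (runs of letters)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Abbreviations such as 'Chr.' will be split wrongly; this is a sketch.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def text_statistics(text):
    """Return total words, unique words, sentences and words per sentence."""
    words = tokenize(text)
    sentences = split_sentences(text)
    return {
        "words": len(words),
        "unique_words": len(set(words)),
        "sentences": len(sentences),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "most_common": Counter(words).most_common(5),
    }


def dispersion(text, term, segments=10):
    """Count how often `term` occurs in each of `segments` equal slices."""
    words = tokenize(text)
    counts = [0] * segments
    for position, word in enumerate(words):
        if word == term.lower():
            counts[position * segments // len(words)] += 1
    return counts


if __name__ == "__main__":
    # Invented sample sentence, not a quotation from De Witt's text.
    sample = "De provincien zijn vrij. De provincien besluiten zelf!"
    print(text_statistics(sample))
    print(dispersion(sample, "provincien", segments=2))
```

The dispersion list is what shows whether a word like ‘provincien’ is spread over the whole text or, like ‘Brabant’, concentrated in one part of it. In practice one would also filter out function words before looking for the most frequent noun.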

Voyant Tools as a means to analyze John de Witt's texts

Another word De Witt uses often is ‘God’ (in several spelling variations), which we find 37 times. If we wanted to write a paragraph in our biography about how religious De Witt was in his thinking, it would be a valuable exercise to compare the relative frequency of ‘God’ in this political text with mentions of God in texts by other politicians of his time. This is necessary to contextualize De Witt and see how he compares to similar individuals. Once again, this would require access to full-text, machine-readable versions of as many political texts as possible.

Finally, when we are dealing with a wide variety of texts, we should definitely consider topic modelling, a popular exercise in digital humanities. With topic modelling the computer extracts topics from texts by looking at words that occur together in statistically meaningful ways. In this way we could broadly assess what is discussed in which documents without having to read them in full. If, for example, we wanted to know in which letters De Witt mentions the strength of the Dutch fleet, topic modelling could point the way.
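The comparison of the relative occurrence of ‘God’ described above could be set up along the following lines. This is a hedged sketch rather than an existing tool: the function names are hypothetical, the list of spelling variants has to be supplied by the researcher, and the rate per 10,000 words is simply one convenient way to make texts of different lengths comparable.

```python
import re


def relative_frequency(text, variants, per=10_000):
    """Occurrences of any spelling variant per `per` words of running text."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    wanted = {variant.lower() for variant in variants}
    hits = sum(1 for word in words if word in wanted)
    return hits * per / len(words)


def compare_authors(texts_by_author, variants):
    """Rank authors by how often they use the term, highest rate first."""
    scores = {
        author: relative_frequency(text, variants)
        for author, text in texts_by_author.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With the figures given above, 37 occurrences in 34,456 words correspond to roughly 10.7 per 10,000 words. That number only becomes meaningful once the same rate has been computed for texts by De Witt’s contemporaries.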

III Patterns in Networks

One of the main techniques a biographer can use to contextualize his or her subject is to analyze the networks that subject was part of. The analysis of correspondence with digital methods is a key component in finding out who had contact with whom and for what reasons. John de Witt was a statesman who led the anti-Orange party, who corresponded with foreign colleagues, ambassadors, scholars and international friends, and who had an extensive patronage network. It is therefore of great interest to map his correspondence. His biographers have also used his letters extensively to define his relationships with other people.
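Mapping a correspondence can start from something as simple as a list of (sender, recipient) pairs. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the input would have to come from letter metadata that, for De Witt, largely still has to be created. It tallies who wrote to a person, whom that person wrote to, and how many letters each pair exchanged, which is the raw material a graph or map visualization needs.

```python
from collections import Counter


def correspondent_counts(letters, person):
    """Tally whom `person` wrote to and who wrote to `person`.

    `letters` is an iterable of (sender, recipient) pairs.
    Returns two Counters: (sent_to, received_from).
    """
    letters = list(letters)  # allow generators to be read twice
    sent_to = Counter(r for s, r in letters if s == person)
    received_from = Counter(s for s, r in letters if r == person)
    return sent_to, received_from


def edge_weights(letters):
    """Number of letters exchanged between each pair, ignoring direction."""
    return Counter(frozenset((s, r)) for s, r in letters if s != r)


if __name__ == "__main__":
    # Placeholder records for illustration only, not real letter metadata.
    letters = [
        ("Correspondent A", "John de Witt"),
        ("John de Witt", "Correspondent A"),
        ("Correspondent B", "John de Witt"),
    ]
    sent_to, received_from = correspondent_counts(letters, "John de Witt")
    print(sent_to.most_common())
    print(received_from.most_common())
```

The edge weights can be loaded into an off-the-shelf network tool, while the two tallies already show whether a relationship was reciprocal or one-sided.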

A relatively simple exercise would be to analyze all the recipients of letters from John de Witt and all the people who sent him letters. Visualizing this with maps, graphs and figures (several off-the-shelf tools exist to do this) would give a picture that reveals patterns we could not see before.[7]

Stanford University’s Mapping the Republic of Letters provides good examples of such visualizations. If we take the case study of the French philosopher Voltaire, we can see graphs of the nationality and social background of his correspondents, all visualized on a map with modern techniques. For one thing, we may deduce that Voltaire’s correspondence was not as cosmopolitan as he might have wanted it to appear.

An initiative directly relevant to John de Witt is the Circulation of Knowledge and Learned Practices in the 17th-century Dutch Republic. Even though De Witt was primarily a statesman, he is likely to have been in contact with the most prominent intellectuals of his time.[8] Searching the database, we find five letters from the scientist (theoretical physicist) and inventor Christiaan Huygens to De Witt, dating from 1658 to 1670. These letters seem to reflect quite a formal relationship between the two, in which Huygens calls himself De Witt’s ‘humble servant’, as was the custom at the time. In one of these letters (1 February 1664) we also find mention of ‘lord Brus, brother-in-law of the lord of Somerdijck’. Ideally, we want to know who this lord Brus is and what his relation to De Witt might have been, and the same goes for any other people mentioned in the letters to and from De Witt. We also want to be able to match this lord Brus to other mentions of him in the correspondence. A computer is able to search for other instances of lord Brus, but it cannot determine (without great difficulty) whether two mentions refer to the same person. At best it can return a negative match when two men named Brus are mentioned in letters that are chronologically too far apart.
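The negative match described above, ruling out that two mentions refer to the same person because the letters lie too far apart in time, is easy to express in code. The following is a sketch under explicit assumptions: the `Mention` record, the list of titles to strip and the sixty-year window are all hypothetical choices made for illustration. A result of `True` means only that the match cannot be excluded and still needs a human judgment.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical list of titles to ignore when comparing names.
TITLES = {"lord", "master", "sir"}


@dataclass(frozen=True)
class Mention:
    name: str          # the name as written in the letter
    letter_date: date  # the date of the letter containing the mention


def normalize(name):
    """Lowercase the name and drop titles, so 'lord Brus' equals 'Brus'."""
    return " ".join(w for w in name.lower().split() if w not in TITLES)


def could_be_same_person(a, b, max_years_apart=60):
    """Return False when a match can be ruled out, True when it cannot."""
    if normalize(a.name) != normalize(b.name):
        return False
    gap_in_days = abs((a.letter_date - b.letter_date).days)
    return gap_in_days <= max_years_apart * 365.25


if __name__ == "__main__":
    first = Mention("lord Brus", date(1664, 2, 1))   # date taken from the text
    second = Mention("Brus", date(1666, 6, 1))       # invented for illustration
    print(could_be_same_person(first, second))
```

Exact comparison after normalization is deliberately strict. Spelling variation, the next problem discussed here, would require a looser comparison and therefore even more manual checking.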

John de Witt, by Adriaen Hanneman

Another problem is variation in the spelling of names and in the way people are addressed or refer to themselves. Christiaan Huygens signed the same letter as ‘Chr. Huygens van Zuylichem’, after the castle and estate his father had acquired in 1630. Similarly, a computer would have great difficulty matching a mention of a ‘master Vincent’ to the right person without knowing the context. The problem of recognizing and matching individuals automatically (NERD: named entity recognition and disambiguation) is common in projects that deal with biographical data. Statistics combined with domain expert knowledge are, however, applied with increasing success to match names in separate documents.[9]

IV Comparisons

When using computers you want to make use of the strength of their calculating power. For biographers it is particularly interesting to compare their subject to similar people, in order to frame that subject in the context of the time.[10] To this end we need structured biographical data on as many people as possible.

In the case of John de Witt there are several groups of people with whom we want to compare him. De Witt started to study law at the University of Leiden at the age of sixteen. If we were able to draw up schematic biographies of all students in the years he studied there, we would know how he compared to them with regard to age, social and geographical background, and later career. De Witt was pensionary of Dordrecht shortly before becoming grand pensionary of Holland. By analyzing the previous office holders, and holders of the same office in other towns, we would learn how unusual his appointment was at the time. The same goes for the office of grand pensionary and practically any other network or group De Witt belonged to. Such structured, prosopographical analyses would allow us to make far better substantiated claims about the person of De Witt than is usual in biographies.
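Once structured records for such a group exist, the comparison itself is a small calculation. The sketch below assumes a list of enrolment ages for a student cohort. The function names are hypothetical and the ages in the example are invented placeholders, not historical data; only De Witt’s own age of sixteen comes from the text above.

```python
from statistics import median


def share_at_or_below(values, target):
    """Fraction of `values` that are less than or equal to `target`."""
    values = list(values)
    if not values:
        raise ValueError("cohort is empty")
    return sum(1 for value in values if value <= target) / len(values)


def compare_to_cohort(cohort_ages, subject_age):
    """Summarize where the subject's age falls within the cohort."""
    ages = list(cohort_ages)
    return {
        "cohort_size": len(ages),
        "median_age": median(ages),
        "share_at_or_below_subject": share_at_or_below(ages, subject_age),
    }


if __name__ == "__main__":
    # Invented placeholder ages, for illustration only.
    cohort = [15, 16, 17, 18, 18, 19, 20, 22]
    print(compare_to_cohort(cohort, subject_age=16))
```

The same pattern applies to any other attribute that can be expressed as a number, such as the age at which holders of the office of pensionary were appointed.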

The reason that such extensive comparisons often do not find their way into biographies is that they are very time-consuming. With the help of digital methods, however, such comparisons can be made more easily. To do so we would need a great deal of biographical data online in a structured format.

Digital biographical data can either be ‘digitized’, for example from a scanned book, or ‘born digital’.[11] We can distinguish between resources with primarily ‘generic’ biographical data, like dates and places of birth and death and dates of marriage, and resources with more narrative data describing a person’s life. If, for example, we take the Wikipedia entry on John de Witt, we have a quite extensive biography (narrative data), accompanied by an info box with structured data on his life. For a computer it is relatively easy to read and analyze the info boxes, but difficult to interpret the main text.

Wikipedia is the biggest player in the field of online biographical data. Several studies have shown that for factual knowledge Wikipedia can compete with authoritative sources written by professionals, especially in the field of data on people.[12] DBpedia publishes the data from Wikipedia’s info boxes in linked data format. This allows analyses across different datasets and therefore increases the potential to compare John de Witt to different groups of people. Structured data on no fewer than three quarters of a million individuals worldwide are available through DBpedia.

The advantages of having biographical data online are recognized by the editors of biographical dictionaries as well.[13] These dictionaries contain relatively short biographies of people who were considered worth describing at the time of publication. Since the nineteenth century especially, such dictionaries have been published in multiple volumes all over Europe, containing descriptions of thousands of people.[14] They form a rich source of information that no human could ever fully analyze with traditional methods.

V Publishing the Findings

Biographers, like historians in general, have a tendency to amass a vast amount of information on the person who is the subject of their research. As early as 1946 the Dutch biographer Jan Romein noted that biographies were often too long. The ideal length of a biography would be 200 pages, for which the biographer made a conscious selection from the source material.[15] Looking at the most prominent biographies of John de Witt, and at the length of modern biographies in general, we must conclude that Romein’s 200-page limit is rarely adhered to. Even if this is not necessarily a problem, it does reflect how much biographers struggle to select the right material and to keep a book within a certain page limit. Digital publishing might be the answer to both producing a manuscript of manageable size and being as ‘complete’ as possible. The monograph is no longer the only way, or perhaps even the most obvious way, to publish your work.[16] With more authoritative sources online, attitudes towards digital sources have changed as well. Recently, for example, academics have started to prefer the online version of the Oxford Dictionary of National Biography to the printed version.[17]

There are many advantages to digital publishing. First of all, if we published our new biography of John de Witt online, we could easily rectify mistakes. If, for example, we identified the previously mentioned lord Brus incorrectly, we could simply correct that and account for the change.

Secondly, digital biographies make it easier to let go of the traditional narrative of an individual’s life from birth to death. We could divide our biography into ‘themes’ (e.g. the murder of John de Witt), provide hyperlinks to more detailed information (e.g. the Dutch Republic and the navy) and even publish the source material we used, or left out, for a particular topic (e.g. the letter from Christiaan Huygens to John de Witt mentioned earlier). The Ludwig Boltzmann Institut für Geschichte und Theorie der Biographie is a forerunner in publishing biographies this way. In particular, they work on alternative modes of presentation for the lives of the Austrian writers Ernst Jandl and Karl Kraus. They developed a content management system called Biographeme, “which breaks down the closed linear mode of life narratives in favour of a modular form of biography, the individual components of which can be combined and recombined according to interest or the question asked.”[18]

Thirdly,
the digital era offers unprecedented possibilities for researchers to put their raw material online as well. We could put transcriptions of letters of John de Witt, systematically gathered information in a database and, when copyright permits, original material online as part of an online publication. This would be a highly efficient way to facilitate further research and to allow others to check our findings. It is also a good way to show that the tax money invested in the research was well spent. Facilitating the storage of these data and making them available for further use could, or should, be part of any data management policy at academic institutions. This would of course also mean that a biographer should not ‘claim’ his or her subject, as is sometimes the case, but let go of the subject once a biography is published (or maybe even before that).[19]

Finally, there is the possibility of working together on a project if you put it online before a printed publication. By bringing your material online you open up a dialogue that sits midway between writing and speaking.[20] In our case, for example, we could simply ask online whether anyone knows who lord Brus was. We could also publish our NERD results online and ask visitors to the website to spot incorrect matches. This way we avoid too many computer-generated mistakes and get more input to refine our algorithms.

VI Problems

In comparison to
previous biographies of John de Witt, this hypothetical biography would be based on more sources, would quantify the topics John de Witt addressed most frequently, would say more about John de Witt as an individual compared to his contemporaries, would include detailed and richly visualized network analyses, and would be presented in a more dynamic way, with room for all the material we have gathered. Unfortunately, at the moment this remains largely hypothetical.

Perhaps the most fundamental problem that needs to be discussed is that of representativeness. It is self-evident, but cannot be stressed often enough, that the scope of digital research is limited by the availability of machine-readable data. The results of any exercise with computational methods should therefore be accompanied by a critical account of the completeness and biases of the sources. It would be very nice if we had all of John de Witt’s correspondence in a machine-readable format, but that is not going to happen any time soon. Even digitizing his correspondence alone would cost a lot of time and money.[21] In general, there is a huge amount of material that is not digitized (yet) and remains out of the scope of digital humanities research.[22]

Another problem is the relatively high rate of mistakes and inconsistencies that one has to deal with when using digital tools on biographical data. The OCR quality of digitized texts alone can lead to problems in the analysis, especially when looking for names.[23] In the case of De Witt, his seventeenth-century handwriting poses another (high) hurdle to clear.

Finally, computers may be great at making calculations, but they are bad at interpreting text. They can only work with the algorithms we feed them. If we ask them to match names from different documents, they are bound to make more mistakes than a human would. It is, for example, difficult for a computer to distinguish the lord Wassenaar from the location Wassenaar. They would, however, do the work much faster. Detailed documentation of how computer tools were used for our research on John de Witt should be provided so that other researchers can check the analyses. Unfortunately, complex tools are more likely to provide precise results but are also more difficult for the digitally lay user to comprehend beyond a basic level.[24]

Conclusions

I discussed the potential of digital humanities tools for a hypothetical biography of John de Witt. This hypothetical biography would be based on more documents and would provide detailed network analyses, comparisons to contemporaries and visualizations. Some questions that could not easily be answered before, such as how strongly De Witt’s writing is influenced by religion compared to that of his contemporaries, could even be answered. The hypothetical biography would be presented in a dynamic and interactive manner, providing possibilities for additions, dialogue and links to more information.

Despite the apparent opportunities for biographical research, there is still a long way to go before this hypothetical biography of John de Witt could be written. First of all, many more archival sources should be put online in a format that makes it possible to analyze them with digital methods. Right now, we are still at the beginning of digitizing our cultural heritage. Techniques to convert handwriting into machine-readable text are likewise still in the early stages of development.

Some things, however, can already be done for biographical research thanks to digital humanities technologies. New forms of presenting biographies already exist. In this post we have also shown some very basic examples of what can be done with digitized texts relating to the life of John de Witt. Even though the analyses do not provide any conclusive evidence, they do inspire new research questions and provide leads into what may be worth investigating further.