Metadata as an Analytical and Writing Tool

*If you would prefer to listen and look at slide decks, please click here. A roundtable I co-organized on digital tools for historians for the American Historical Association 2015 meeting is up on H-Digital-History.

DEVONthink

Most of us who work on the 20th century come back from the archives with more sources than we know what to do with. If we’re lucky, at least. If you’re like me, you have an external hard drive full of digital photos you took. What do we do with this abundance of sources? My premise for going into this is that in the struggle to transform sources into a written argument, knowing what you have is half the battle. Not only that, but I’d like to propose that the act and process of cataloguing your sources is an act of analysis and interpretation crucial to formulating an argument.

I went into cataloguing my sources as a complete novice. So if in a few minutes you’re reading and thinking that what I’m describing is too complex or too difficult, just remember that I also started from zero. I found these three posts (1, 2, 3) by Rachel Leow especially useful for starting off.

A few years ago I learned about metadata and I was transfixed. Metadata is data about data. At its simplest, what I did was “tag” my sources with key terms, which I could then call up later. This allowed me to group my sources in a flexible and dynamic way. In other words, instead of having to put a source in either a folder about teachers’ salaries (salaires in French) or a folder about Parakou (a town in Benin), I tagged it with both terms, and when I searched for either, the file came up. You can also see here that I’ve put a box around text that I think is particularly important in this document.
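For readers who think in code, the difference between folders and tags can be sketched in a few lines of Python. This is only an illustration of the idea, not how DEVONthink works internally; the file names and the helper `build_tag_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical file names, for illustration only.
# A folder can hold a file once; a file can carry any number of tags.
sources = {
    "IMG_0412.jpg": {"salaires", "Parakou"},
    "IMG_0413.jpg": {"salaires"},
    "IMG_0502.jpg": {"Parakou", "Réforme de l'enseignement"},
}


def build_tag_index(sources):
    """Map each tag to the set of files that carry it."""
    index = defaultdict(set)
    for filename, tags in sources.items():
        for tag in tags:
            index[tag].add(filename)
    return index


tag_index = build_tag_index(sources)

# The same photo comes up under either term.
print(sorted(tag_index["salaires"]))
print(sorted(tag_index["Parakou"]))
```

The point of the sketch is that one source can sit under several terms at once, which a single folder hierarchy cannot do without duplicating files.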

The second layer of processing that I did was to OCR my sources. OCR stands for optical character recognition, and it rendered all of my sources text-searchable. I will return to this later.

The program I used to do all of this was DEVONthink Pro Office. The pro version has the OCR software, ABBYY FineReader, bundled with it. I’m not trying to advertise for this particular product, though I myself did find it very effective. There are other products out there that help you create a database, and if you are a Macintosh user you can actually tag your photos with metadata without any program; it’s built into the Mac architecture.

With DEVONthink, you first need to import your documents and OCR them as they are imported. I started with the ones I knew I would need to write my first chapter. This takes quite a while, so I had it run at night while I was asleep. Then I started to tag them in batches. I used the Pomodoro technique of 25-minute work segments with 5-minute breaks so I could remain focused. At first I found myself tagging based on key terms in the documents, or even on how the documents were filed in the archives themselves. This is a good start. However, the second level of tagging was crucial: this time I went through and tagged them with concepts, not necessarily words that were in the documents. For example, as you can see here, I had 572 pages of documents that I tagged as “Réforme de l’enseignement,” or educational reforms. I soon realized that this term was too broad, so I had to scrutinize these documents further and add other tags to make them more useful for me.

Here you can see that the first tag I applied was “Réforme de l’enseignement,” but I further refined it by adding tags for the revolution (a Marxist revolution in Dahomey/Benin), takeover by the state, and two more French tags, which translate to the financial situation of Catholic schools and funding subsidies. This enables me to distinguish the financial aspect of reform from calls for reform that were more about curriculum or cultural orientation.

The third level of tagging I did was as I was starting to write. I sometimes found that the terms I used were too broad. For example, I had a tag that was “educational reforms.” This phrase, in French, was used in so many of my documents. So I dutifully tagged them as such. But when I looked closer at the documents I realized that the term was too broad to be of use, because some documents with this in the subject line talked about the budget, others talked about teacher recruitment, and still others about the curriculum. I had to subdivide this category in order to make it useful for my writing process.

In other words, sometimes once I was writing I had to tweak my metadata and further refine it. However, in general my tags helped me figure out how to structure my chapters and my argument. They helped me to see trends and themes in my sources, and, since I used over 20 archives, to connect sources that were on the same issue or meeting but were in different archives. The OCR function also allowed me to corroborate information from one source with information that may have been from a completely different archive. On several occasions, someone signed a letter without any information about what their position was. A quick search often yielded the information I needed, whether the person was a minister or other official. This saved me so much time and it was really a result of my computer working to OCR documents while I was asleep, which seems like a pretty good deal. OCR also allowed me to search for things that I may have overlooked while tagging, whether it was a reference to a particular person or place, or a school.

Here you can see that I searched for “Senghor” (the first president of Senegal) in the search field, and you can see the beginning of the list of documents and tags that contain Senghor in them. One sample document is below the list, so you can see that DEVONthink highlights the word you searched in the document itself.
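The kind of search I am describing, including the highlighting of the term in context, can be approximated with a short Python sketch. It assumes the OCR text has been exported as plain text; the file names, the sample sentences, and the official “M. Dupont” are all invented for illustration, and this is not DEVONthink’s own search engine.

```python
import re

# Hypothetical OCR output keyed by file name; the text is invented.
ocr_text = {
    "archive_a/letter_01.txt": "Lettre signée par M. Dupont, sans indication de fonction.",
    "archive_b/minutes_07.txt": "Présents: M. Dupont, ministre de l'Éducation nationale.",
    "archive_b/report_02.txt": "Rapport sur les écoles de Parakou.",
}


def search(texts, term, context=30):
    """Return (file name, snippet) pairs for every case-insensitive match."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for name, text in texts.items():
        for match in pattern.finditer(text):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            hits.append((name, text[start:end]))
    return hits


hits = search(ocr_text, "dupont")
for name, snippet in hits:
    print(name, "->", snippet)
```

In this toy case, the letter from one archive gives no position for the signatory, while the minutes from another archive do, which is the sort of cross-archive corroboration that full-text search makes quick.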

As a historian of empire and decolonization, I have read countless books and had countless conversations about trying to “read against the archival grain,” “read between the lines,” or somehow infer the experience of, in my case, Africans, from reading documents written by French officials. There is of course no way to entirely decolonize the archive. However, I’d like to suggest that using metadata and creating your own archival database through a program like DEVONthink can be an epistemic move in this direction. We may not be fully decolonizing it, but we are at least transgressing it. In creating a database, you are in essence creating your own archive, one where you are not bound by the categories, organizational system, terminology, or taxonomy of the archives of provenance. This is incredibly liberating. Especially for those of us who work in languages that are not our native ones, it can be hard enough to grasp the terminologies used to describe certain things we’re looking for in the archives, and at times it can be counterintuitive what something is filed under. I think that searching for the “right” terms, archivally, can then later obscure our own interpretive act of deciding what the “right” term is for us and our work, the different or new way of thinking about our evidence. In the example I gave earlier, sometimes “reform” was a stand-in for budgetary reforms, which in turn was a euphemism for budgetary cuts. My chapter outline and argument start to look very different if I think about the retrenchment of the state from public services like schools, versus thinking about some hazy concept of “reforms.”

People dread writing for a number of reasons, but I think one of them is that we aren’t ready for it. Our sources are scattered and not organized. Creating the DEVONthink database was a sort of pre-writing for me. Some sources I read very carefully and annotated beforehand. Others I merely skimmed. It depended upon how much time I had and how important I thought the source was. But either way, once I started to write, I didn’t have to interrupt the flow of writing by searching for sources; they were already all there and easily searchable.

Above is an example of a source that I annotated with a comment and a box around certain text.

DEVONthink also structured my writing in a very important way. I actually looked through my tags to see what terms my sources naturally clustered into. These clusters of terms, and the relationships between them, helped me determine what the subsections of my chapters would be. There were some topics that I thought I had more on, but when I looked at my tags, I realized I only had, for example, 13 tags for that term, which meant that it could not be a standalone section but maybe would get folded into something bigger. And other terms I had tagged crept up to higher numbers than I realized, which meant that I could make those a more substantial pillar of my structure than I might have otherwise. So, once I had clusters of terms, I set out to make sections in Scrivener, a writing program. Each section in Scrivener, in other words, was based on a term or set of terms that I had tagged in my documents. Then, to write, I started by systematically going through every document I had tagged with the corresponding term.

Here in DEVONthink you can see that I have 77 tags for “Arabic as language of instruction.” I knew then that I could have a subsection of my chapter about this.

Here is the subsection of my chapter in Scrivener with the same title. (See the next section on Scrivener for more).
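The logic of letting tag counts suggest a chapter structure can also be written out as a rough Python sketch. The counts of 77 and 13 are the ones mentioned above; the document identifiers, the name of the smaller topic, and the cutoff `MIN_FOR_SECTION` are assumptions made up for the example. In practice I judged this by eye rather than by a fixed number.

```python
from collections import Counter

# Hypothetical (document, tag) pairs standing in for a tagged database.
tagged = (
    [(f"doc_{i:03d}", "Arabic as language of instruction") for i in range(77)]
    + [(f"doc_{i:03d}", "minor topic (hypothetical)") for i in range(77, 90)]
)

# Count how many documents carry each tag.
counts = Counter(tag for _, tag in tagged)

MIN_FOR_SECTION = 30  # assumed cutoff, chosen only for this illustration

# Well-represented tags become candidate subsections;
# thin ones get folded into something bigger.
standalone = [tag for tag, n in counts.items() if n >= MIN_FOR_SECTION]
folded = [tag for tag, n in counts.items() if n < MIN_FOR_SECTION]

print("Candidate subsections:", standalone)
print("Fold into a larger section:", folded)
```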

The nice thing about this, in addition to having all of my sources organized, was that I could feel a sense of progress. In this section for instance, I knew I had 77 tags. I knew then that when I got to tag 35 or so, I was almost halfway done. It is this sense of progress, and breaking things down to doable sizes, that made my dissertation seem so much less daunting. Of course, creating the database can feel daunting, but once you front-load your work, it is a much more efficient process. Also, you can think of building this database as a long-term investment in your academic future. You could perhaps get by improvising and going on memory of your time in the archives for writing the dissertation, but what about revisions into a book manuscript years from now? It seems to me that having my DEVONthink database is a much more sustainable and accessible way of organizing my sources in the long term.