January 21, 2017

The Five Lessons No One's Yet Written (but need writing)

Adam Crymble

We’ve published 57 peer-reviewed tutorials since we launched in 2012, and we’re proud of our pioneering role within digital humanities. These lessons are on topics as wide-ranging as introductory Python programming, Web Scraping, and Regular Expressions. Many of our lessons involve learning how to extract and structure information from other people’s websites and databases. This has been an unfortunate need because of decisions in digitisation made in the first decade of the twenty-first century by organisations that assumed we’d want to read digital texts like we read them in the library: one word at a time.

But gathering data isn’t research in its own right. We need analysis. And that’s where we believe The Programming Historian needs to go next. We’re looking to move beyond the gathering stage, because you know how to get the data (thanks to our authors), and you’ve cleaned it to a brilliant shine (thanks again to our authors!). But what do you do with it next? How do you perform the types of analyses that lead to publishable historical research articles and monographs? How do you do digital research?

So here is our call to action. We’re looking for analysis-focused tutorials that teach tangible discipline-specific data analysis skills: from linguistics to geography to network science and beyond. We’re happy to hear from any prospective authors, but to get the ball rolling, here are five lessons we really think need to be written. Let us know if you’re the person for the job:

1) What can you conclude from topic models?

We’ve got a great lesson on how to conduct a topic model using MALLET. It’s been extraordinarily popular over the years. But we’re still not seeing enough historians (and humanists) actually publishing topic-model-based research results. If you’ve done so, please write us a tutorial on how others can do so too. This is a great opportunity to share the HOW of your article (all the bits the peer reviewers told you to take out so you could focus on the WHAT).

2) How do you conduct a stylometric analysis (well)?

Stylometry, the process of computationally attributing (probable) authorship to an anonymous text, has grown in popularity in recent years, even outing J.K. Rowling as ‘Robert Galbraith’ in 2013. But how do you DO it? And what are the pitfalls you need to beware of? Given the vast amount of machine-readable text out there, we think it’s time stylometry came into the mainstream of historical research.

3) How do you conduct spatial clustering of geographic data?

We’ve got a great set of introductory mapping lessons, and while they are great for teaching how to make nice visualisations, we’ve not yet branched into more advanced analysis skills. One of the most useful is the application of clustering algorithms, which identify logical groups of individual points in geographic space. These are useful for forming conclusions on anything from trade to migration. But as with all analyses, it’s a space (no pun intended) fraught with pitfalls for the uninitiated. We’re looking for a great introduction that highlights both the strengths and the challenges of this form of analysis.
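
To give a flavour of what such a lesson might cover, here is a minimal sketch in Python of one very simple way to group points: any two places closer than a chosen distance end up in the same cluster. This is a bare-bones distance-threshold (single-linkage) grouping, not a recommendation of any particular algorithm, and the function names, coordinates and the 60 km threshold are all assumptions made up for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def cluster_points(points, max_km):
    """Return one cluster label per point.

    Two points share a label if a chain of hops, each no longer than
    max_km, connects them (single-linkage grouping).
    """
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # already assigned to an earlier cluster
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and haversine_km(points[i], points[j]) <= max_km:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Approximate coordinates, chosen for illustration only.
places = [
    (51.51, -0.13),  # London
    (51.75, -0.34),  # St Albans
    (53.48, -2.24),  # Manchester
    (53.41, -2.98),  # Liverpool
]
labels = cluster_points(places, max_km=60)
print(labels)
```

Even this toy version hints at the pitfalls: the result depends entirely on the threshold you pick, and a chain of closely spaced points can link places that are very far apart into one cluster.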

4) When do you know your network analysis is meaningful?

Ok, so we’ve built a great network diagram. How do we move to the next step and form meaningful conclusions? This is about starting with a graph and shifting into analysis mode. If you can help our readers take that step, we want to hear from you.

5) From TF-IDF to Historical Research

Let’s talk about meaningful words. Term Frequency - Inverse Document Frequency (TF-IDF) is a well-known means of identifying words that appear more often in a given document than we might expect, given how often they appear across the rest of a collection. It’s one of the ways we know what a document is about. Let’s take this to the next step and teach readers how this fairly simple statistic about meaningful words can turn into meaningful research outputs. If you’ve published on historical linguistics (as in #1 above), we’d love to hear from you on the HOW TO of your wonderful paper.
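
For readers who have not met the statistic before, here is a minimal sketch in Python. It assumes the simplest textbook weighting (the raw count of a word in a document, multiplied by the logarithm of the number of documents divided by the number of documents containing that word); other weightings exist and a lesson would need to discuss them. The function name and the three-sentence corpus are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document.

    Assumes the simplest weighting: raw term count times
    log(number of documents / number of documents containing the word).
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        scores.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical corpus, invented for illustration.
corpus = [
    "the ship sailed to the harbour",
    "the merchant sold the cloth",
    "the ship carried the cloth",
]
scores = tf_idf(corpus)

# Print the words of the first document, highest score first.
for word, score in sorted(scores[0].items(), key=lambda item: -item[1]):
    print(f"{word}: {score:.3f}")
```

In this toy corpus "the" occurs in every document and so scores zero, while words found in only one document score highest. The open question for a lesson is what a historian can responsibly conclude from a ranking like this.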

If you are interested in taking up our challenge, please get in touch with one of our editors. We’d be happy to talk it through with you.

About the author

Adam Crymble is a senior lecturer of digital history at the University of Hertfordshire.

The Programming Historian (ISSN: 2397-2068) is released under a CC-BY license.