Methodology – Skulking in Holes and Corners
https://jostwald.wordpress.com
Genteelly Observing the Enemy since 2011
From historical source to historical data
https://jostwald.wordpress.com/2018/12/03/from-historical-source-to-historical-data/
(3 December 2018)

Where I offer a taste of just one of the low-hanging fruits acquired over my past five months of Python: The Sabbatical.

Digital history is slowly catching on, but, thus far, my impression is that it’s still limited to those with deep pockets – big, multi-year research projects with a web gateway and lots of institutional support, including access to computer scientist collaborators. Since I’m not in that kind of position, I’ve set my sights a bit lower, focusing on the low-hanging fruit that’s available to historians just starting out with Python.

Yet much of this sweet, juicy, low-hanging fruit is, tantalizingly, still just out of reach. Undoubtedly you already know that one of the big impediments to digital history generally, and to historians playing with the Python programming language specifically, is the lack of historical sources in a structured digital format. We’ve got thousands of image PDFs, even OCRed ones, but it’s hard to extract meaningful information from them in any structured way. And if you want to clean that dirty OCR, or analyze the text in any kind of systematic way, you need it digitized, but in a structured format.

My most recent Python project has been to create some Python code that automates a task I’m sure many historians could use: parsing a big long document of textual notes/documents into a bunch of small ones. It took one work day to create it, without the assistance of my programming wife, so I know I’m making progress! Eventually I’ll clean the code up and put it on my GitHub account for all to use. But for now I’ll just explain the process and show the preliminary results. (For examples of how others have done this with Python, check out The Programming Historian, particularly this one.)

Parsing the Unparseable: Converting a semi-structured document into files

If you’re like me, you have lots of historical documents – most numerous are the thousands of letters, diary and journal entries from dozens of different authors. Each collection of documents is likely drawn from a specific publication or archival collection, which means they all begin isolated in their little silos. If you’re lucky, they’re already in some type of text format – MS Word or Excel, a text file, what have you. And that’s great if you want just to search for text strings, or maybe even use regular expressions. But if you want more – if, say, you want to compare person A’s letters with person B’s letters over the same timespan, or compare what they said about topic X, or what they said on date Z – then you need to figure out a way to make them more easily compared, to quickly and easily find those few needles in the haystack.

The time-tested strategy for historians has been to physically split up all your documents into discrete components and keyword and organize those individual letters (or diary entries, or…). In the old days – which are still quite new for some historians – you’d use notecards. I’ve already documented my own research journey away from Word documents to digital tools (see Devonthink tag). I even created/modified a few Applescripts to automate this very problem in Devonthink in a rudimentary way: one, for example, can ‘explode’ (i.e. parse) a document by creating a new document for every paragraph in the starting document. Nice, but it can be better. Python to the rescue.

The problem: lots of text files of notes and transcriptions of letters, but not very granular, and therefore not easily compared, requiring lots of wading through dross, with the likelihood of getting distracted. This is particularly a problem if you’re searching for common terms or phrases that appear in lots of different letters. Wouldn’t it be nice if you could filter your search by date, or some other piece of metadata?

The solution: use Python code to parse the documents (say, individual letters, or entries for a specific day) into separate files, making it easy to home in on the precise subject or period you’re searching for, as well as allowing precise tagging and keywording.

Step 1:

For proof of concept, I started with a transcription of a campaign journal kindly provided me by Lawrence Smith, in a Word document. I’m sure you have dozens of similar files. He was faithful in his transcription, even to the extent of mimicking the layout of the information on the page with the use of tabs, spaces and returns. Great for format fidelity, but not great for easily extracting important information, particularly if you want, for example, June to be right next to 20th, instead of on the line below, separated by a bunch of officers’ names. (‘Maastricht’ and ‘London’ are actually a bit confusing, because I’m pretty sure the place names after the dates are that day’s passwords, at least that’s what I’ve seen in other campaign journals. That some of the entries explicitly list a camp location reinforces my speculation.) Of course people can argue about which information is ‘important,’ which is yet another reason why it’s best if you can do this yourself.

Aside: As you are examining the layout of the document to be parsed, you should also have one eye towards the future. In this case, that means swearing to yourself that: “I will never again take unstructured notes that will require lots of regex for parsing.” In other words, if you want to make your own notes usable by the computer and don’t already have a sophisticated database set up for data entry, use a consistent format scheme (across sources) that is easy to parse automatically. For example, judicious use of tabs and unique formatting:

Step 2:

Clean up the text, specifically: make the structure more standardized so different bits of info can be easily identified and extracted. For this document, that means making sure each first line only consists of the date and camp location (when available), that each entry is separated by two carriage returns, and adding a distinctive delimiter (in this case, two colons, ‘::’) between each folio – because you’ll ultimately have the top level of your structured data organized by folio, with multiple entries per folio (a one-to-many relationship, for those of you familiar with relational databases like Access). Cleaning the text can be easily done with regex, allowing you to cycle through and make the appropriate changes in minutes. Assuming you know your regular expressions, that is.

The result looks like this:

Note that this stage is not changing the content, i.e. it’s not ‘preprocessing’ the text, doing things like standardizing spelling, or expanding contractions, or what have you. Nor did I bother getting rid of extra spaces, etc. Those can be stripped with python as needed.

For this specific document, note as well that some of the formatting for the officers of the day is muddled (the use of curly brackets seems odd), which might mean a loss of information. But if that info’s important, you should take care to figure out how to robustly record it at the transcription stage. If you’re relying on the kindness of others, ‘beggars can’t be choosers.’ But, if you’re lucky, you happen to have a scanned reproduction of a partial copy of this journal from another source, which tells you what information might be missing from the transcription:

You probably could do this standardizing within your Python code in Jupyter Notebook, but I find it easier to interact with regex in my text editor (BBEdit). Your mileage may vary.

Step 3:

Once you get the text in a standard format like the above, you read it into Python and convert it into a structured data set. If you don’t know Python at all, the following details won’t make sense. So go read up on some Python! One of the big hurdles for the neophyte programmer, as I’ve discovered over and over, is to see how the different pieces fit together into a whole, so that’s what I’ll focus on here. In a nutshell, the code does the following, after you’ve cleaned up the structure of the original document in your text editor:

Read the file into memory as one big, long string.

Perform any other cleaning of the content you want.

Then you perform several passes to massage the string into a dictionary with a nested list for the values. There may be a better, more efficient way to do this in fewer lines, but my beginner code does it in three main steps:

Convert the document to a list, splitting each item at the ‘f. ‘ delimiter. Now you have a list with each folio as a separate item.

Always look at your results. The first item of the resulting list is empty – not an encoding error: split() returns an empty string for whatever precedes a delimiter that begins the document – so just delete that item from the list before moving on.

Now, read the resulting list items into a Python dictionary, with the folio number as the dictionary key and all of the entries on that folio as the value. Use the ‘::’ as the delimiter here, with the following line of code, a ‘comprehension’, as they call it. Notice how the strip and split methods are chained together, performing multiple changes on the item object in that single bit of code:
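The screenshot of that comprehension didn’t survive this version, but a minimal sketch of the sequence so far – read the file, split on ‘f. ’, drop the empty first item, then the comprehension – might look like this (the file and variable names are my own, and it assumes exactly one ‘::’ per folio):

with open('newhailes_journal.txt', encoding='utf-8') as f:
    text = f.read()                     # step 1: one big, long string

folios = text.split('f. ')              # one list item per folio
del folios[0]                           # the first item is empty, so drop it

# The comprehension: folio number as key, that folio's entries as value,
# with strip() and split() chained on each item in a single expression.
journal = {item.split('::')[0].strip(): item.split('::')[1].strip()
           for item in folios}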

Now you use a for loop to parse each value into separate list items, using the other delimiter of ‘\n\n’ (two returns) between entries, operating on the string of the value (since otherwise it’s a list item, and the strip and split methods only work on strings). This gives you a dictionary with the folio as the dict key, and the value is now a nested list in which each entry associated with that folio is a separate item – folio 40, for example, ends up with four entries.
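Continuing the sketch above, that loop might look like (again, the names are mine):

for folio in journal:
    # str() guards against a value that isn't already a string
    journal[folio] = [entry.strip() for entry in str(journal[folio]).split('\n\n')]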

That’s pretty much it. Now you have a structure for your text. Congratulations, your text has become data, or data-ish at least. The resulting Python dictionary allows you to search any folio, and it will return a list of all the letters/entries on that folio. You can loop through all those entries and perform some function on/with them. So that’s a good thing to “pickle”, i.e. write it to a binary file, so that it can be easily read back as a Python dictionary later on.
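Pickling, and later unpickling, the dictionary takes a couple of lines each – a sketch, with an arbitrary file name:

import pickle

with open('newhailes.pkl', 'wb') as f:
    pickle.dump(journal, f)             # write the dictionary to a binary file

with open('newhailes.pkl', 'rb') as f:
    journal = pickle.load(f)            # read it back later, structure intact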

Once you have your data structured, and maybe add some more metadata to it, you can do all sorts of analysis with all of Python’s statistical, NLP, and visualization modules.

But if you are still straddling the Devonthink-Python divide, like I am, then you’ll also want to make these parsed bits available in Devonthink. Add a bit of code to write out each entry in the dictionary to a separate file, and you end up with several hundred files:
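A sketch of that write-out loop, assuming the nested dictionary built above; the filename scheme here is just one possibility:

import os

os.makedirs('Newhailes_entries', exist_ok=True)

for folio, entries in journal.items():
    for i, entry in enumerate(entries, start=1):
        filename = f'Newhailes f{folio} entry {i}.txt'
        with open(os.path.join('Newhailes_entries', filename), 'w', encoding='utf-8') as out:
            out.write(entry)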

Each file will have only the content for that specific entry, making it easy to precisely target your search and keywording. The last thing you want to do is cycle through several dozen hits in a long document for that one hit you’re actually looking for.

That’s it. Entry of May 8th, 1705 in its own file.

The beauty is that you can add more to the code – try extracting the dates and camps, change what information you want to include in the filename, etc. Depending on the structure of the data you’re using, you might need to nest dictionaries or lists several layers deep, as discussed in my AHA example. But that’s the basics. Pretty easy, once you figure it out, that is.

Even better: now you can run the same code, with a few minor tweaks, on all of those other collections of letters and campaign journals that you have, allowing you to combine Newhailes’ entries with Deane’s and Millner’s and Marlborough’s letters and… The world’s your oyster. But, like any oyster, it takes a little work opening that sucker. Not that I like oysters.

Where the historians are, 2017
https://jostwald.wordpress.com/2018/09/18/where-the-historians-are-2017/
(18 September 2018)

“Shaving the yak” is a phrase used to describe the process of programming. It alludes to the fact that you often have to take two, or more, steps backward in order to eventually move one step forward. You want a sweater, so first you need to get some yarn, but to do that you have to… and eventually you find yourself shaving a yak. The reason why you even consider shaving a yak is that, once you’ve shaved said yak, you now have lots of yarn, which allows you to make many sweaters. This colorful analogy has a surprising number of online images, and even an O’Reilly book. It’s a thing.

I have been doing a lot of digital yak-shaving over the past four months. Come to think of it, most of my blog posts consist of yak shaving.

So if you’re interested in learning to code with Python but not sure whether it’s worth it, or if you just want to read an overview of how I used Python and QGIS to create a map like this from a big Word document, then continue reading.

Taking Advantage of Sabbatical

On a meta level, I knew that if I were ever to make any sweaters with computer code, I would have to shave that particular yak this sabbatical. Multiple factors converged:

First, this year ‘off’ would be my one opportunity in the next seven years to delve into Python and to learn whatever else would set up my research and (digital history) teaching. Several years ago, I remember reading a digital historian’s blog post on the cool stuff he was doing with some advanced digital tool, and I thought, “Yeah, but who has time to do all that?”
[Thumbs pointing at self]: “This guy.”
Admittedly, I have Marlborough’s Big Book of Battles (working title) to finish, but some of the coding I learn can help with that. Ultimately, it’s about priorities, and, honestly, the world will not end if it’s denied one more book on Marlborough within the next year. And the book will be a lot better with the Python tools I’m learning.

Second, and fortuitously, beginner-friendly Python has arrived, literally within the past few years. Thanks to Anaconda, Jupyter notebooks, oodles of websites (including the programminghistorian.org), dozens of books, and dozens of YouTube tutorials from recent PyCon, PyData, PyLondon, PyBerlin… conferences, there is a critical mass, and you can learn much of it on your own, even if you don’t take any of the available online courses.
Don’t get me wrong: learning Python has still been challenging – the most frustrating part is getting everything set up, whether it’s installing the right Python version in the right directory (tip: start with a clean install of Python 3 using Anaconda), installing third-party Python libraries visible to your Anaconda installation (tip: do it from the command line and activate the conda environment first), or getting your data into a usable format for analysis (see below). It also requires a learning process to move from the basic tasks you can perform with a Jupyter tutorial downloaded from GitHub (or from a website or book), to more realistic, and therefore more complicated, customized tasks that you really want to perform with your data, right now. I wouldn’t have been able to do much of what I wanted in Python, certainly not within a few months of beginning to learn it, without the help of my programming wife and a Python-literate colleague in Eastern’s English Department (Ben Pauley). So there’s definitely a learning curve.

Python has become the go-to language for text cleaning, natural language processing, visualizations (along with R), and, increasingly, basic machine learning. And did I mention it also has mapping libraries like geopandas? Python will do practically any academic gruntwork a humanist can imagine computers would do, and then some. And I say that as a humanist with a bit of an imagination.

Having taught my Intro to Digital History course once already, I learned that online tools are fleeting and fragile, and will only do a third of what you want them to do. You can usually find small, niche programs that give you the ability to do another quarter of what you want: things like Vard2 and GATE (cleaning OCRed text) and GRAMPS (genealogy) and OutWit Hub (web scraping) and Stanford NER (named entity recognition of text) and Edinburgh Geoparser (NER/mapping). They can be very useful, but they can also become outdated (especially the free online ones), and you may well have as much trouble installing them on your local machine as something like Python. So given the fact that Python will do almost everything just about every other dedicated software package will do (again, I’m talking data science and academic tasks here, and it will require programming on your part), and since everything in Python is free, and since there are so many Python libraries that will perform most of these tasks, why struggle installing a dozen different programs and learn each of their quirks, just to do one specific thing in each? One program to scrape data from a website. Another program for cleaning data. One for doing quantitative analysis. Another for qualitative analysis or natural language processing. Another for visualizing your results in a fancy chart (Excel does not count). Another for creating a network graph of your data. Yet another program to map your data. Still another to create an interactive visualization that you can explore… Python can do them all. Don’t get me wrong: sometimes it will actually be easier to just install a specialized program. But you’ll only know after you’ve tried to recreate part of it in Python.
So, after playing around with most of the other programs, I decided to focus my struggle on installing and learning one tool (Python), and then use its hundreds of libraries to help me do any number of analyses. Python’s a pretty big yak, but there’s a lot of multicolored yarn on that beast. And, you can always rely on your text editors, Excel, and other niche programs to fill in any gaps, until you learn more about Python and its libraries.

Thus: means + opportunity + motive = learn Python. I already have plans for a few dozen projects for Python to automate, everything from simple time-savers like calendar look-up (“Last Tuesday we marched…” – what date was last Tuesday?) to analyzing my prose to analyzing primary (or secondary) sources to semi-automating the creation of a book index, to the topic of this post: a map of US History departments. And that list doesn’t even include various service tasks I’ll need to perform once I become department chair next year.

But back to maps. Last month I read a recent article by John D. Hosler, “Pre-Modern Military History in American Doctoral Programs: Figures and Implications” in the April 2018 issue of Journal of Military History. In it, he argues that there is a dearth of US doctoral programs that teach medieval military history. I was curious about this, and I was looking for another dataset to play around with in Python. Since I’m still in the advanced beginner stage of Python, I used this cartographical sweater project to force myself to learn some basics of Python text parsing. Sabbatical allows you to give that yak a bit closer of a shave than you could during a regular school year. (Ok, that metaphor is beginning to sound a little weird now…)

The steps I took illustrate the frequent walking-backwards-in-order-to-move-forward process that is shaving the figurative yak:

1. After reading Hosler’s piece, I thought to myself, “Hmm, that sounds like early modern European military history, but EMEMH is probably worse off.”

2. Then thought to myself, “But I don’t want to replicate his work for EMEMH. But maybe I’ll just map his schools! That should be straightforward enough!” Famous last words.

3. Asked John Hosler for his data. He kindly obliged.

4. Realized his data isn’t in a very computer-friendly format. The most complete dataset was published in the original article:
It looks fine on the printed page, but it’s problematic for reuse. It’s a textual table, so it’s not easy to convert to csv; multiple schools are in each cell rather than a separate row for each school; the important information – which school has a medievalist – is indicated by formatting (bold), rather than as a separate column with a yes/no value, and Excel has a tough time sorting/filtering on bold formatting; finally, there are abbreviations which are quite understandable to humans, such as “IU-Bloomington”, but these are not their standard names, which means it wouldn’t be easy to match them to another list of schools algorithmically.
Lesson learned? Historians (practically all of us) are horrible at preserving our data, and few of us have been trained in how to present data that can be easily digested by computers. Nowadays, people refer to that as “tidy data” (pdf link here).

5. So I decided that, rather than enter it all by hand, the best way to find a list of all the US history programs would be to check the AHA website, the AHA being the flagship organization for American historians. I discovered that the data isn’t, in fact, available online (being migrated). I emailed the person in charge and asked for a download of the dataset. Was told that the data isn’t in a very usable state now and can’t really be extracted from the database (but they’d think about maybe making it downloadable in the future). I was, however, generously given the next best thing – a Word document of the AHA’s 2017 department directory, which includes all sorts of self-reported info on each department that pays to have its info included, about 600 departments in total.
Lesson relearned for the umpteenth time? Historians don’t think in ‘dataset’ terms, and we’re not very good at constructing and managing them. But we do like to share, which is a start.

6. Looking over the Word file, I realized that the 361,000-word text document wasn’t particularly usable in its current form, at least for much more than looking up a person or school. But it does have lots of data, and most of that data is semi-structured:
Enter Python!

7. So I spent some time learning the Python that would let me import that structured data into a Python data object. I figured out that I needed a Python dictionary (school as dictionary key and info on school as dictionary value), and that I could use regex to parse the data, though, again, bold fonts don’t help much.

8. The first step was to convert it from Word to plain text, which could be easily imported into Python. But before that, I cheated and used my knowledge of Word’s advanced find-replace based off formatting (school names were in a larger font size) to add a delimiter character at the beginning of each school name – that would make it easier to separate out the schools once in Python.

9. Once I started trying to organize the data in Python, I realized that I actually needed a nested dictionary in Python. So then I spent time (ok, my wife’s time) figuring out how to import items at different layers into nested dictionaries, with nested lists within the nested dictionary values. This is where it got complicated, but we figured it out after several hours over a couple of days. Then I could expand from the simplest test case to the several variables I was interested in, using regex as needed. Part of the code looks like this:
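(The original code screenshot is gone from this version, so here is a stripped-down sketch of that kind of parse instead – ‘@’ stands in for the delimiter character added in Word above, and the ‘Degrees offered’ label and field handling are illustrative guesses, not the AHA directory’s actual labels or my actual code.)

import re

with open('aha_directory_2017.txt', encoding='utf-8') as f:
    text = f.read()

schools = {}
for block in text.split('@')[1:]:       # '@' = the delimiter added before each school name
    lines = block.strip().splitlines()
    name = lines[0].strip()             # first line of each block is the school name
    info = {'raw': lines[1:]}           # keep the rest; parse out fields as needed
    match = re.search(r'Degrees offered:\s*(.+)', block)   # hypothetical field label
    if match:
        info['degrees'] = [d.strip() for d in match.group(1).split(',')]
    schools[name] = info                # nested dictionary: school -> fields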

10. I spent time learning how to clean the data further – schools with multiple values in a field, converting lists to numbers, etc. Real data is real messy.

11. I then read the nested dictionary into Python’s pandas library. Then I did more cleaning.
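Reading a nested dictionary into pandas is mercifully short – a sketch, assuming the schools dictionary from the previous step:

import pandas as pd

# Keys become the index (one row per school); inner dictionary fields become columns.
df = pd.DataFrame.from_dict(schools, orient='index')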

12. But if I want to map these, I’ll need their coordinates. So I geocoded the list of schools to get their respective latitude and longitude coordinates. This can be done in Python, but I used Google Sheets’ ezGeocoder because I was more familiar with it.

13. Then I combined those coordinates with the other pandas data. I haven’t perfected this concatenation yet in Python, but even with some extraneous rows to clean, it was still faster than doing it by hand for 600 schools.
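Continuing the pandas sketch above, one way to do that combination – assuming the geocoding results were saved to a csv with a ‘school’ column matching the index (a sketch, not my exact code):

coords = pd.read_csv('schools_geocoded.csv')   # hypothetical: school, latitude, longitude
df = df.merge(coords, left_index=True, right_on='school', how='left')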

14. After it was pretty clean, I exported the resulting pandas table to Excel, to finish off the cleaning (haven’t yet figured out how to convert a list in a pandas cell into a numeric).

15. Saving it to csv, I then imported the data into QGIS – haven’t had time yet to explore Python’s geopandas library. So I mapped some of the data in QGIS.

16. In the process, I realized that I needed to add another field to the dataset, which took me back to Python, to add another field to parse, and then clean, and then export it all again to Excel, and then to QGIS. Note that this process is easy with Jupyter notebook’s pipeline – all you have to do is add the extra bit of code and then run the tweaked code again on the original dataset. It will redo all the importing and cleaning and exporting automatically – just make sure you’re not overwriting any cleaning you did in Excel after pandas! The revision process would be even easier if you eliminated steps 12-15 above, by automating the geocoding and final cleaning procedure in Python and mapping it in geopandas. (Though QGIS will give you more customizability.)

So that’s how I got to the map shown above. In reality, it took more than just a few weeks to make the map (I was doing other things, you know). More to the point, I’d already spent a few decades looking at maps and being a “power user” of computers, spent a few semesters taking cartography and statistics courses in grad school, and spent weeks learning QGIS as well.

Is learning Python a lot of work? Yes, depending on what you want it to do.

Is it worth it? Well, I guess that depends on how bad you want to map data. And analyze data. And chart data. And get new data to map and analyze and chart… In the past, historians could get away with prose alone, and maybe the occasional hand-drawn map. But, if I put on my prediction hat, I think more and more historians will not only see how powerful these tools can be, but will also realize that their arguments will increasingly be tested by historians who bring more data to the party, and who use digital tools that allow them to be more consistent, and look at more data than is possible with eye and hand alone. And, as more primary sources become available in digital form, and as more unstructured text becomes readable by machines, there will be less of an excuse not to use digital tools. I know, I know, all this has been predicted before, back in the 60s. But this time it may really be different. Natural language processing, the ability to extract information from masses of digitized text, might well be the difference.

So if you are still code-curious after all the above, I present you with general thoughts I’ve learned over the past four months, many drawn from other guides on Python. And then, more maps.

The easy Python code is readily available in books, online, and in Jupyter notebooks you can freely download. If you want to do basic stuff, it’s not that hard. Unfortunately, you’ll probably not be particularly interested in the basic stuff. But you should be patient and start with the baby steps. I wasn’t patient at first, but I ended up having to take those baby steps all the same. Baby steps include understanding the different object types (like strings, integers, lists, dictionaries), understanding the basic Python syntax (such as common abbreviations and what they stand for), and some computer concepts like methods and arguments.

There’s a huge difference between thinking you understand what somebody’s finished code is doing, and creating comparable code yourself from scratch.

Code will, almost always, tell you if it doesn’t work. (Except for regex, which can “fail silently.”) And when it fails, it’s probably doing exactly what you told it to do, not necessarily what you wanted it to do. As a result, programmers talk about the process of “failing to success” – i.e., each error message brings you closer to code that works. It’s humbling and sometimes frustrating, but at least the computer tells you if you did something wrong. Doing things by hand, even with Excel, rarely gives us that safety net.

Consult different resources. There are many books, websites, blogs and videos that teach specific Python features and libraries. But some are better than others, and some will discuss techniques you are more likely to use. So poke around. Once you start to feel a little more comfortable with the basics, then look at the online documentation for Python and its various libraries. Those pages will let you know what exactly you can do with each method, what ‘arguments’ and parameters are available.

If you really want to understand code and modify it to your own purposes, you need to do it the hard way. Which means learn by typing the code out yourself. That’s the only way it will stick.

Like everything else, it gets easier the more you do it. They call it a learning curve for a reason, because the slope gets less steep at a certain point. I think I’m starting to see that flattening curve ahead of me. But that required me going back and rereading chunks of some of the introductory chapters more than once, when I’d get stuck on an intermediate-level task.

You will undoubtedly get frustrated when you find some sample code that probably does what you want, but it starts with a different type of object. This is why it’s important that you understand the basics of the language, e.g. the different types of objects, so that you can modify the sample code to fit your specs. So spend some time early on learning how to read different types of files (a text file, a Word doc, a PDF, a csv file…) into Python as a string or list or dictionary or what-have-you. Even better, realize that you’ll ideally read most of your data in as either a csv or txt file, excluding images and sound, of course. So figure out an easy way to convert all your other files to one of those two formats. Tragically, the two most commonly-used file formats in the humanities, Word documents and PDF files, are the worst when it comes to readability by other software. So do what I did with the AHA Word doc – convert it to txt. Ditto for Excel files – convert to csv.

If you deal with text files, keep an eye out for encoding issues. If you see weird gobbledygook in your text, strange characters with slashes, upside-down question marks and what-not, that probably means you have an encoding issue. Encoding is its own universe, but the best advice is to always save your files (csv, txt) in your text editor (NOT Word) as plain UTF-8 encoding. Do not use BOM, do not use UTF-16 or Western, do not use anything else. And don’t assume that just because the file was at one point in UTF-8, it’s still in UTF-8, particularly if you’re switching between Excel and a text file. Saving the original file in UTF-8 is the easiest way to get usable data into Python, including text in non-Latin alphabets.

When you get stuck trying to do something in Python, you can rotate your Python projects: get stuck on one project, move to another until you get stuck on that, and then on to another. You’ll probably finish at least a couple of those, and you’ll likely learn things that can then be applied back to a project you were previously stuck on. And there’s always a Python community (including on Stack Overflow) for the harder stuff. Hopefully I’ll post on my GitHub site the Jupyter notebooks as I complete them.

Python allows you to rerun your code on the original data every time you make a change to either the code or the data. You can ‘show your work’ – versus the response that I’ve received from two EMEM historians over the past couple decades when requesting the data their summary tables were based on: “Don’t have it anymore.” Certainly not at the level of a certain book on American gun culture, but still, we should do better.
That’s the gold standard of replicability and transparency: a) you, or others, can rerun your old code on your old data and get the same results you did previously; b) you, or others, can run your old code on new data and get results compatible with your old methodology, or maybe discover that you need to redo a) above; c) you, or others, can run new code on your old data and update your results; and d) you, or others, can run new code on new data, for new results. Consistently. Easily.

You can reuse chunks of your code in other projects, without having to reinvent the wheel. That includes sharing it with other people. At first, the hardest part of repurposing someone else’s code will be figuring out how to use your data with their code. Once you figure that out, things get easier.

Some of the Python libraries are built off each other, using similar syntax. This is particularly true for pandas, a Python equivalent to Excel, or maybe even Access. So, at some point, be sure to learn the foundational packages like matplotlib and NLTK and pandas.

Newer Python libraries in the same domain tend to make it easier to do the same things, and add additional features. For example, matplotlib (graphing) and NLTK (text processing) are old libraries that are ultra-customizable, and therefore can have some complicated syntax. But there are newer libraries, like seaborn or textblob/spaCy, that make it easier for you to do the same thing with simpler syntax.

But if you find yourself with a deadline, or feel like you’re spending more time on a Python project than is necessary, don’t be afraid to revert to a better-known method or tool. But be sure to do a quick online search to make sure someone hasn’t already invented the wheel for you. In the case of Hosler’s data in Table 1, it was easiest to just manually enter those 51 schools with a faculty member in medieval history. Cleaning data can be time-consuming.

The more you explore Python’s features, the more you’ll start seeing connections to other things, to other potential projects. And the more you’ll understand how packaged software does the things it does, because now you can do it too, but in Python.

So How About Some More Maps?

We’ll start with a boring map with one dot per History department. Using QGIS 2.18’s Print Composer, you can use the same layout and just change the details, giving each of the 602 schools an equally-sized point symbol.

Since the AHA directory had a ‘Degrees offered’ category that lists the type of degrees offered (BA, MA, PhD…), I used Python to extract each of those degrees into its own field and mapped them as well, using a rule-based style to only display History departments that offer a PhD in History, about 150 programs.
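That extraction can be sketched in pandas, assuming the degrees ended up in a comma-separated ‘degrees’ field (my actual field names may differ):

import pandas as pd

df = pd.DataFrame({'degrees': ['BA, MA, PhD', 'BA', 'BA, MA']},
                  index=['School A', 'School B', 'School C'])   # stand-in data

# One yes/no column per degree type, easy to filter or symbolize in a map.
for degree in ['BA', 'MA', 'PhD']:
    df[degree] = df['degrees'].str.contains(degree, na=False)

phd_programs = df[df['PhD']]            # the PhD-granting departments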

Or, we could go back to the initial starting question – where are all of those medieval (military) historians teaching? Given my description of Hosler’s formatted data above, I had to add his information (school and degree offered in medieval) to the AHA dataset by hand. Fortunately, it wasn’t that big of a chore – only 52 schools, 39 of which offer Ph.D.s. Mapping that data results in the following:

Interesting. Initially, I had guessed they’d be grouped more in the southern US, but that doesn’t seem to be the case. One could explore further, looking at percentages by region and the like, but it does appear that the ‘interest’ (if that is even relevant to faculty lines) in pre-modern is more pronounced in the ‘older’ part of the US.

Even More Possibilities

Overall, the Python data extraction was a bit messier than I’d like, e.g. a few schools were missed in my initial passes, and some schools didn’t include all their data. One or two major programs didn’t even have their info in the AHA directory. You can probably see one or two other anomalies in the maps above. All of which should serve as a reminder: it’s always better to start with a data dump from a database rather than extract from text, if you can. But sometimes you can’t, so you make do.

Given all the structured information that’s in the AHA directory, you could look at all sorts of things if you were so inclined. And if you took the time to double-check your extracted data. A few examples of questions you might ask of this expanded dataset:

Map by number of faculty, i.e. size of department.

The distribution of the full-time faculty by rank in 2017, i.e. the percent at each rank of assistant, associate, full…

Faculty areas of specialization. Are there generally-accepted names for these categories? Is there a pattern according to geographical location, or to the school where each faculty member got their PhD, or the year their PhDs were granted? Do departments have more than one specialist in a particular area? Which? Where? What period?

Schools where the full-time faculty in each department got their PhDs from, as well as when they got their PhDs. Patterns? Maybe even make a network graph that would draw lines for a given school – where did all the faculty at school X get their PhDs from? Which schools are the main feeder schools (not that hard to guess), and have these changed over time? Do we see an overall change in schools based by year of PhD?

If you had some way of ranking each school (e.g. by Carnegie classification or US News & World Report ranking), you could combine your dataset with that information for further analysis.

Given Hosler’s question of where one would go for medieval (military) history, you could ask the same question of any area of specialization. It might even be useful to know whether medieval (military) history has more or fewer schools than other equivalent subspecialties. Maybe look at the listed specializations of specific faculty and see if their alma mater currently has a specialist in that area?

All sorts of possibilities in the data. And we haven’t even mentioned combining this AHA data with other data – political proclivities of a state relative to its History faculty, and so on…

These would be interesting not so much for specific individuals, but to look at broader trends in the profession. The kinds of analysis the AHA occasionally does, but on a more micro scale.

Wars of Italy, pt. 2
https://jostwald.wordpress.com/2018/03/16/wars-of-italy-pt-2/
(16 March 2018)

First off, the locations of various combats (battles and sieges mostly) from 1494-1559, color-coded by war, with the Natural Earth topo layer as base map. It might be more useful to group the wars together into a smaller number of categories (make a calculated field). Or maybe make them small multiples by war. But it’s a start.

Then, using the Data defined override and Size Assistant style in QGIS 2.18, you can add army sizes to the symbols (sizeA+sizeB), to create a multivariate map. Note, however, that I don’t have very many army size statistics (the no-data events are all those tiny dots), but you get the idea – add a continuous variable to a categorical variable, and you’ve got two dimensions.

Remember, with GIS and a good data set, the world’s your oyster.

Next up – getting that good data set. In other words, setting up the Early Modern Wars database in MS Access. What? You want to see my entity-relationship diagram so far? Sure, why not:

And, once sabbatical hits this summer, I’ll be appealing to y’all (just got back from Texas) to help me fill in the details, to share our knowledge of early modern European warfare with the world.

Historical Research in the 21st Century
https://jostwald.wordpress.com/2018/03/01/historical-research-in-the-21st-century/
(1 March 2018)

So let’s say you’ve become obsessed with GIS (geographical information systems). And let’s also posit that you’re at a teaching institution, where you rotate teaching your twelve different courses plus senior seminars (three to four sections per semester) over multiple years, which makes it difficult to remember the ins and outs of all those historical narratives of European history from the 14th century (the Crusades, actually) up through Napoleon – let’s ignore the Western Civ since 1500 courses for now. And let’s further grant that you are particularly interested in early modern European military history, yet can only teach it every other year or so.

So what’s our hypothetical professor at a regional, undergraduate, public university to do? How can this professor possibly try to keep these various periods, places and topics straight, without burdening his (errr, I mean “one’s”) students with one damned fact after another? How to keep the view of the forest in mind, without getting lost among the tree trunks? More selfishly, how can one avoid spending way too much prep time rereading the same narrative accounts every few years?

Why, visualize, of course! I’ve posted various examples before (check out the graphics tag), but now that GIS makes large-scale mapping feasible (trust me, you don’t want to manually place every feature on a map in Adobe Illustrator), things are starting to fall in place. And, in the process, I – oops, I mean our hypothetical professor – ends up wondering what historical research should look like going forward, and what we should be teaching our students.

I’ll break my thoughts into two posts: first, the gritty details of mapping the Italian Wars in GIS (QGIS, to be precise); and then a second post on collecting the data for all this.

So let’s start with the eye-candy first – and focus our attention on a subject just covered in my European Warfare class: the Italian Wars of the early 16th century (aka Wars of Italy). I’ve already posted my souped-up timechart of the Italian Wars, but just to be redundant:

Italian Wars timechart

That’s great and all, but it really requires you to already have the geography in your head. And, I suppose, even to know what all those little icons mean.

Maps, though, actually show the space, and by extension the spatial relationships. If you use PowerPoint or other slides in your classes, hopefully you’re not reduced to re-using a map you’d digitized in AutoCAD twenty years earlier, covering a few centuries in the future:

Instead, you’ve undoubtedly found pre-made maps of the period/place online – either from textbooks, or from other historian’s works – Google Images is your friend. You could incorporate raster maps that you happen across:

Maybe you found some decent maps with more political detail:

Maybe you are lucky enough that part of your subject matter has been deemed important enough to merit its own custom map, like this digitized version of that old West Point historical atlas:

If you’re a bit more digitally-focused, you probably noticed a while back that Wikipedia editors have started posting vector-based maps, allowing you to open them in a program like Adobe Illustrator and then modify them yourself, choosing different fills and line styles, maybe even adding a few new features:

Now we’re getting somewhere!

But, ultimately, you realize that you really want to be your own boss. And you have far more questions than what your bare-bones map(s) can answer. Don’t get me wrong – you certainly appreciate those historical atlases that illustrate Renaissance Italy in its myriad economic, cultural and political aspects. And you also appreciate the potential of the vector-based (Adobe Illustrator) approach, which allows you to add symbols and styling of your own. You can even search for text labels. Yet they’re just not enough. Because you’re stuck with that map’s projection. Maybe you’re stuck with a map in a foreign language – ok for you, but maybe a bit confusing for your students. And what if you want to remove distracting features from a pre-existing map? What if you care about what happened after Charles VIII occupied Naples in early 1495? What if you want to significantly alter the drawn borders, or add new features? What if you want to add a LOT of new features? There are no geospatial coordinates in the vector maps that would allow you to accurately draw Charles VIII’s 1494-95 march down to Naples, except by scanning in another map with the route, twisting the image to match the vector map’s boundaries, and then eye-balling it. Or what if you want to locate where all of the sieges occurred, the dozens of sieges? You could, as some have done, add some basic features to Google Maps or Google Earth Pro, but you’re still stuck with the basemap provided, and, importantly, Google’s (or Microsoft’s, or whoever’s) willingness to continue their service in its current, open, form. The Graveyard of Digital History, so very young!, is already littered with great online tools that were born and then either died within a few short years, or slowly became obsolete and unusable as internet technology passed them by. Those online tools that survive for more than five years often do so by transforming into a proprietary, fee-based service, or getting swallowed up by one of the big boys. And what if you want to conduct actual spatial analysis, looking for geospatial patterns among your data? Enter GIS.

So here’s my first draft of a map visualizing the major military operations in the Italian peninsula during the Italian Wars. Or, more accurately, locating and classifying (some of) the major combat operations from 1494 to 1530:

Pretty cool, if you ask me. And it’s just the beginning.

How did I do it? Well, the sausage-making process is a lot uglier than the final product. But we must have sausage. Henry V made the connection between war and sausage quite clear: “War without fire is like sausages without mustard.”

So to the technical details, for those who already understand the basics of GIS (QGIS in this case). If you don’t know anything about GIS, there are one or two websites on the subject.

I’m using Euratlas’ 1500 boundaries shapefile, but I had to modify some of the owner attributes and alter the boundaries back to 1494, since things can change quickly, even in History. In 1500, the year Euratlas chose to trace the historical boundaries, France was technically ruling Milan and Naples. But, if you know your History, you know that this was a very recent change, and you also know that it didn’t last long, as Spain would come to dominate the peninsula sooner rather than later. So that requires some work fixing the boundaries to start at the beginning of the war in 1494. I should probably have shifted the borders from 1500 back to 1494 using a different technique (ideally in a SpatiaLite database where you could relate the sovereign_state table to the 2nd_level_divisions table), but I ended up doing it manually: merging some polygons, splitting other multi-polygons into single polygons, modifying existing polygons, and clipping yet other polygons. Unfortunately, these boundaries changed often enough that I foresee a lot of polygon modifications in my future…

Notice my rotation of the Italian boot to a reclining angle – gotta mess with people’s conventional expectations. (Still haven’t played around with Print Composer yet, which would allow me to add a compass rose.) More important than being a cool rebel who blows people’s cartographic preconceptions, I think this non-standard orientation offers a couple of advantages. First, it allows you to zoom in a bit more, to fit the length of the boot along the width rather than height of the page. More subtly, it also reminds the reader that the Po river drains ‘down’ through Venice into the Adriatic. I’m sure I’m not the only one who has to explicitly remind myself that all those northern European rivers aren’t really flowing uphill into the Baltic. (You’re on your own to remember that the Tiber flows down into the Tyrrhenian Sea.) George “Mr. Metaphor” Lakoff would be proud.

I converted all the layers to the Albers equal-area conic projection centered on Europe, for valid area calculations. In case you don’t know what I’m talking about, I’ll zoom out, and add graticules and Tissot’s indicatrices, which illustrate the nature of the projection’s distortions of shape, area and distance as you move away from the European center (i.e. the main focus of the projection):
And in case you wanted my opinion, projections are really annoying to work with. But there’s still room for improvement here: if I could get SpatiaLite to work in QGIS (damn shapefiles saved as SpatiaLite layers won’t retain the geometry), I would be able to re-project layers on the fly with a SQL statement, rather than saving them as separate shapefiles.

I’m still playing around with symbology, so I went with basic shape+color symbols to distinguish battles from sieges (rule-based styling). I did a little bit of customization with the labels – offsetting the labels and adding a shadow for greater contrast. Still plenty of room for improvement here, including figuring out how to make my timechart symbols (created in Illustrator) look good in QGIS.
After discovering the battle site symbol in the tourist folder of custom markers, it could look like this, if you have it randomly-color the major states, and include the 100 French battles that David Potter mentions in his Renaissance France at War, Appendix 1, plus the major combats of the Italian Wars and Valois-Habsburg Wars listed in Wikipedia:
Boy, there were a lot of battles in Milan and Venice, though I’d guess Potter’s appendix probably includes smaller combats involving hundreds of men. Haven’t had time to check.

I used Euratlas’ topography layers, 200m, 500m, 1000m, 2000m, and 3500m of elevation, rather than use Natural Earth’s 1:10m raster geotiff (an image file with georeferenced coordinates). I wasn’t able to properly merge them onto a single layer (so I could do a proper categorical color ramp), so I grouped the separate layers together. For the mountain elevations I used the colors in a five-step yellow-to-red color ramp suggested by ColorBrewer 2.0.

I saved the styles of some of the layers, e.g. the topo layer colors and combat symbols, as qml files, so I can easily apply them elsewhere if I have to make changes or start over.

You can also illustrate the alliances for each year, or when they change, whichever happens more frequently – assuming you have the time to plot all those crazy Italian machinations. If you make them semi-transparent and turn several years’ alliances on at the same time, their overlap will allow you to see which countries switched sides (I’m looking at you, Florence and Rome), vs. which were consistent:

Plotting the march routes is also a work in progress, starting by importing the camps as geocoded points, and then using the Points2One plugin to connect them up. With this version of Charles’ march down to Naples (did you catch that south-as-down metaphor?), I only had a few camps to mark, so the routes are direct lines, which means they might display as crossing water. More waypoints will fix that, though it’d be better if you could make the march routes follow roads, assuming they did. Which, needless to say, would require a road layer.

Not to mention applying spatial analysis to the results. And animation. And…

More to come, including the exciting, wild world of data collection.

Voyant-to-web also a success
https://jostwald.wordpress.com/2017/06/25/voyant-to-web-also-a-success/
(25 June 2017)

In case you need proof, here’s a link (collocate) graph from Voyant tools, based off the text from the second volume of the English translation of the “French” Duke of Berwick’s memoirs published in 1779: http://jostwald.com/Voyant/VoyantLinks-Berwick1.html. Curious which words Berwick used most frequently, and which other words they tended to be used with/near? (Or his translator, in any case.) Click the link above and hopefully you’ll see something like this, but interactive:

After you upload your text corpus in the web version of Voyant, you can then export any of the tools and embed it in your own website using an iframe (inline frame). Note that you can also click on any of the terms in the embedded web version and it will open up the full web version of Voyant, with the corpus pre-loaded. Something like this, but oh-so-much-more-interactive:

Apparently the Voyant server keeps a copy of the text you upload – no idea how long the Voyant servers keep the text, but I guess we’ll find out. There’s also a VoyantServer option, which you install on your own computer, for faster processing and greater privacy.

Never heard of Voyant? Then you’d best get yourself some early modern sources in full text format and head on over to http://voyant-tools.org.

Automating Newspaper Dates, Old Style (to New Style)
https://jostwald.wordpress.com/2017/05/27/automating-newspaper-dates-old-style-to-new-style/
(27 May 2017)

If you’ve been skulking over the years, you know I have a soft spot for Devonthink, a receptacle into which I throw all my files (text, image, PDF…) related to research and teaching. I’ve been modifying my DTPO workflow a bit over the past week, which I’ll discuss in the future.

But right now, I’ll provide a little glimpse into my workflow for processing the metadata of the 20,000 newspaper issues (yes, literally 20,000 files) that I’ve downloaded from various online collections over the years: Google Books, but especially Gale’s 17C-18C Burney and Nicholls newspaper collections. I downloaded all those files the old-fashioned way (rather than scraping them), but just because you have all those PDFs in your DTPO database doesn’t mean they’re necessarily in the easiest format to use. And maybe you made a minor error, but one that is multiplied by the 20,000 times you made that one little error. So buckle up as I describe the process of converting text strings into dates and then back, with AppleScript. Consider it a case study of problem-solving through algorithms.

The Problem(s)

I have several problems I need to fix at this point, generally falling under the category of “cleaning” (as they say in the biz) the date metadata. Going forward, most of the following modifications won’t be necessary.

First, going back several years I stupidly saved each newspaper issue by recording the first date for each issue. No idea why I didn’t realize that the paper came out on the last of those dates, but it is what it is.

London Gazette: published on Dec. 13 or Dec. 17?

Secondly, those English newspapers are in the Old Style calendar, which the English stubbornly clung to till mid-century. But since most of those newspapers were reporting on events that occurred on the Continent, where they used New Style dates, some dates need manipulating.

Automation to the Rescue!

To automate this process (because I’m not going to re-date 20,000 newspaper issues manually), I’ve enlisted my programmer-wife (TM) to help me automate the process. She doesn’t know the syntax of AppleScript very well, but since she programs in several other languages, and because most programming languages use the same basic principles, and because there’s this Internet thing, she was able to make some scripts that automate most of what I need. So what do I need?

First, for most of the newspapers I need to add several days to the listed date, to reflect the actual date of publication – in other words, to convert the first date listed in the London Gazette example above (Dec. 13) into the second date (Dec. 17). So I need to take the existing date, listed as text in the format 1702.01.02, convert it from a text string into an actual date, and then add several days to it, in order to convert it to the actual date of publication. How many days exactly?

Well, that’s the thing about History – it’s messy. Most of these newspapers tended to be published on a regular schedule, but not too regular. So you often had triweekly publications (published three times per week) that might appear in Tuesday-Thursday, Thursday-Saturday, and Saturday-Tuesday editions. But if you do the math, that means the Saturday-Tuesday issue covers a four-day range, whereas the other two issues per week only cover a three-day range. Since this is all about approximation and first-pass cleaning, I’ll just assume all the issues are three-day ranges, since those should be two-thirds of the total number of issues. For the rest, I have derivative code that will tweak those dates as needed, e.g. add one more day to the resulting date if it’s a Saturday-Tuesday issue instead of a T-R or R-S issue. If I were really fancy, I’d figure out how to convert the date to a weekday and tell the code to treat any Tuesday publication date as a four-day range (assuming it knows dates before 1900, which has been an issue with computers in the past – Y2K, anyone?).
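(In fact, here’s a minimal Python sketch of that “really fancy” weekday idea – a proof of concept, not part of my actual workflow. Python’s datetime module handles pre-1900 dates just fine; the one wrinkle is that weekday() assumes the Gregorian calendar, so you have to shift the Julian (OS) date to New Style first to get the true day of the week.)

from datetime import date, timedelta

def publication_offset(os_start: date) -> int:
    """Days from an issue's first listed (OS) date to its publication date."""
    ns_equivalent = os_start + timedelta(days=11)  # OS -> NS, for 1700 and later
    # weekday(): Monday=0 ... Saturday=5, Sunday=6
    if ns_equivalent.weekday() == 5:   # a Saturday start means a Sat-Tue issue
        return 3                       # four-day range
    return 2                           # Tue-Thu or Thu-Sat issue: three-day range

print(publication_offset(date(1702, 1, 2)))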

So the basic task is to take a filename like ‘1702.01.02 Flying Post.pdf’, convert the first part of the string (the ‘1702.01.02’) into a date by defining the first four characters as a year, the 6th & 7th characters as a month…, then add 2 days to the resulting date, and then rename the file with this new date, converted back from a date into a string in the format YYYY.MM.DD. Because I was consistent in that part of my naming convention, the first ten characters will always be the date, and the periods can be used as delimiters if needed. Easy-peasy!
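(For comparison, here’s a rough Python sketch of that basic task – just to show the logic, not my actual script, and assuming, as with my files, that the first ten characters are always a YYYY.MM.DD date:)

from datetime import datetime, timedelta

def redate_filename(filename: str, offset_days: int = 2) -> str:
    """Shift the leading date to the publication date and rebuild the name."""
    os_start = datetime.strptime(filename[:10], "%Y.%m.%d")  # text -> date
    pub_date = os_start + timedelta(days=offset_days)        # first listed date -> publication date
    return pub_date.strftime("%Y.%m.%d") + filename[10:]     # date -> text

print(redate_filename("1702.01.02 Flying Post.pdf"))
# '1702.01.04 Flying Post.pdf'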

But that’s not all. I also need to then convert that date of publication to New Style by adding 11 days to it (assuming the dates are 1700 or later – before 1700 the OS calendar was 10 days behind the NS calendar). But I want to keep the original OS publication date as well, for citation purposes. So I replace the old OS date on the front of the filename with the new NS date, and append the original date to the end of the filename with an ‘OS’ after it for good measure (and delete the .pdf), and Bob’s your uncle. In testing, it works when you shift from one month to another (e.g. January 27 converts to February 7), and even from year to year. I won’t worry about the occasional leap year (1704, 1708, 1712). Nor will I worry about how some newspapers used Lady Day (March 25) as their year-end, meaning that they went from December 30, 1708 to January 2, 1708, and only caught up to 1709 in late March. Nor does it help that their issue numbers are often wrong.
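(Before we get to the actual AppleScript, the whole renaming step – publication offset, OS-to-NS conversion with the 1700 cutoff, and the appended OS date – can be sketched in Python like so. Again, this is illustrative, not my production script:)

from datetime import datetime, timedelta

def os_to_ns_name(filename: str) -> str:
    os_start = datetime.strptime(filename[:10], "%Y.%m.%d")
    pub_os = os_start + timedelta(days=2)        # actual OS publication date
    gap = 11 if pub_os.year >= 1700 else 10      # OS lagged NS by only 10 days before 1700
    pub_ns = pub_os + timedelta(days=gap)        # OS -> NS
    title = filename[10:-4]                      # strip the leading date and the '.pdf'
    return (pub_ns.strftime("%Y.%m.%d") + title + " "
            + pub_os.strftime("%Y.%m.%d") + " OS")

print(os_to_ns_name("1702.01.27 Flying Post.pdf"))
# '1702.02.09 Flying Post 1702.01.29 OS'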

I’m too lazy to figure out how to make the following AppleScript code format like code in WordPress, but the basics look like this:
-- Convert English newspaper Title from OSStartDate to NSEndDate & StartDate OS, +2 for weekday
-- Based very loosely off Add Prefix To Names, created by Christian Grunenberg Sat May 15 2004.
-- Modified by Liz and Jamel Ostwald May 26 2017.
-- Copyright (c) 2004-2014. All rights reserved.
-- Based on (c) 2001 Apple, Inc.

tell application id "DNtp"
	try
		set this_selection to the selection
		if this_selection is {} then error "Please select some contents."

		repeat with this_item in this_selection

			set current_name to the name of this_item
			-- e.g. "1702.01.02 Flying Post.pdf": the first ten characters are the OS start date
			set mydate to text 1 thru ((offset of " " in current_name) - 1) of current_name
			-- everything after the date, minus the ".pdf" extension
			set myname to text 11 thru -5 of current_name

			set newdate to the current date
			set the day of newdate to 1 -- avoid month rollover before the real day is set
			set the year of newdate to ((text 1 thru 4 of mydate) as integer)
			set the month of newdate to ((text 6 thru 7 of mydate) as integer)
			set the day of newdate to ((text 9 thru 10 of mydate) as integer)

			-- OS publication date: listed start date + 2 days (three-day issue range)
			set enddate to newdate + (2 * days)
			-- NS publication date: 2 days to publication, plus 11 days OS-to-NS
			set newdate to newdate + (13 * days)

			tell newdate
				set daystamp to its day
				set monthstamp to (its month as integer)
				set yearstamp to its year
			end tell
			set daystamp to text -2 thru -1 of (("0" & daystamp) as text)
			set monthstamp to text -2 thru -1 of (("0" & monthstamp) as text)
			set formatdate to (yearstamp as text) & "." & monthstamp & "." & daystamp

			tell enddate
				set enddaystamp to its day
				set endmonthstamp to (its month as integer)
				set endyearstamp to its year
			end tell
			set enddaystamp to text -2 thru -1 of (("0" & enddaystamp) as text)
			set endmonthstamp to text -2 thru -1 of (("0" & endmonthstamp) as text)
			set formatenddate to (endyearstamp as text) & "." & endmonthstamp & "." & enddaystamp

			-- e.g. "1702.01.15 Flying Post 1702.01.04 OS"
			set new_item_name to formatdate & myname & " " & formatenddate & " OS"
			set the name of this_item to new_item_name

		end repeat
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell

So once I do all those things, I can use a smart group and sort by the Spotlight Comment column to get an accurate sense of the chronological order in which publications discussed events.

This screenshot shows the difference – some of the English newspapers haven’t been converted yet (I’m doing it paper by paper because the papers were often published on different schedules), but here you can see how OS and NS dates were mixed in willy-nilly – compare, say, the fixed Flying Post and Evening Post with the yet-to-be-fixed London Gazette and Daily Courant issues.

Of course the reality has to be even more complicated (Because It’s History!), since an English newspaper published on January 1, 1702 OS will publish items from continental newspapers, dating those articles in NS – e.g., a 1702.01.01 OS English newspaper will have an article dated 1702.01.05 NS from a Dutch paper. So when I take notes on a newspaper issue, I’ll have to change the leading NS date of the new note to the date on the article byline, so it will sort chronologically where it belongs. But still.

There’s gotta be a better way
https://jostwald.wordpress.com/2017/05/10/theres-gotta-be-a-better-way/
Thu, 11 May 2017 00:27:29 +0000

In preparation for a new introductory digital history course that I’ll be teaching in the fall, I’ve been trying to think about how to share my decades of accumulated computer wisdom with my students (says the wise sage, stroking his long white beard). Since my personal experience with computers goes back to the 80s – actually, the late 70s with Oregon Trail on dial-up in the school library – I’m more of a Web 1.0 guy. Other than blogs, I pretty much ignore social media like Facebook and Twitter (not to mention Snapchat, Instagram, Pinterest…), and I try to do most of my computer work on a screen larger than 4″. So I guess that makes me a kind of cyber-troglodyte in 2017. But I think that distance allows me a much broader perspective on what computers can and can’t do. One thing I have learned to appreciate, for example, is how many incremental workflow improvements are readily available – shortcuts that don’t require writing Python from the command line.

As a result, I’ll probably start the course with an overview of the variety of ways computers can help us complete our tasks more quickly and easily, which requires understanding the variety of ways in which we can achieve these efficiencies. After a few minutes of thought (and approval from my “full-stack” computer-programming wife), I came up with this spectrum that suggests the ways in which we can make computers do more of our work for us. Toil, silicon slave, toil!

Automation Spectrum: It’s Only a Model

Undoubtedly others have already expressed this basic idea, but most of the digital humanities/digital history I’ve seen online is much more focused on the extreme right of this spectrum (e.g. the quite useful but slightly intimidating Programming Historian) – this makes sense if you’re trying to distantly read big data across thousands of documents. But I’m not interested in the debate over whether ‘real’ digital humanists need to program or not, and in any case I’m focused on undergraduate History majors who often have limited computer skills (mobile apps are just too easy). Therefore I’m happy if I can remind students that there is a large variety of powerful automation features available to people with just a little bit of computer smarts and an Internet connection, things that don’t require learning to speak Javascript or Python fluently. Call it kaizen if you want. The middle of the automation spectrum, in other words.

So I’ll want my students, for example, to think about low-hanging fruit (efficiency fruit?) that they can spend five minutes googling and save themselves hours of mindless labor. As an example, I’m embarrassed to admit that it was only when sketching this spectrum that I realized I should try to automate one of the most annoying features of my current note-taking system: the need to clean up hundreds of PDFs downloaded from various databases (Google Books, Gale’s newspaper and book databases, etc.). If you spend any time downloading early modern primary sources (or scan secondary sources), you know that the standard file format continues to be Adobe Acrobat PDFs. And if you’ve seen the quality of early modern OCR’d text, you know why having the original page images is a good idea.

But you may want, for example, to delete pages from PDFs that include various copyright text – that text will confuse DTPO’s AI and your searches. I’m sure there are more sophisticated ways of doing that, but the spectrum above should prompt you to wonder whether Adobe Acrobat has some kind of script or macro feature that might speed up deleting such pages from 1,000s (literally) of PDF documents that you’ve downloaded over the years. And, lo and behold, Adobe Acrobat does indeed have an automation feature that allows you to carry out the same PDF manipulation again and again. Once you realize “there’s gotta be a better way!”, you only need to figure out what that feature is called in the application in question. For Adobe Acrobat it used to be called batch processing, but in Adobe Acrobat Pro DC such mass manipulations now fall under the Actions moniker. So google ‘Adobe Acrobat Actions’ and you’ll quickly find websites that allow you to download various actions people have created, which lets you quickly learn how the feature works and modify existing actions. For example, I made this Acrobat Action to add “ps” (primary source) to the Keywords metadata field of every PDF file in the designated folder:

I’ve already copied and tweaked macros and AppleScripts that add Keywords to rich text files in my Devonthink database, but this Adobe solution is ideal after I’ve downloaded hundreds of PDFs from, say, a newspaper database.

Similarly, this next action will delete the last page of every PDF in the designated folder. (I just hardcoded it to delete page 4, because I know newspaper X always has 4 pages – I can sort by file size to locate any outliers – and the last page is always the copyright page with the nasty text I want to delete. I can change the exact page number for each newspaper series, though there’s probably a way to make the page a variable that the user specifies with each use.)
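(If Acrobat isn’t your thing, the same chore can be scripted in Python with the pypdf library – a hypothetical sketch, not my Acrobat Action, and the folder name here is made up:)

from pathlib import Path
from pypdf import PdfReader, PdfWriter

folder = Path("newspaper_pdfs")               # hypothetical folder of downloaded issues
for pdf_path in folder.glob("*.pdf"):
    reader = PdfReader(pdf_path)
    writer = PdfWriter()
    for page in reader.pages[:-1]:            # every page except the last (copyright) page
        writer.add_page(page)
    writer.add_metadata({"/Keywords": "ps"})  # 'ps' = primary source
    out_path = pdf_path.with_suffix(".trimmed.pdf")  # write a copy rather than overwrite
    with open(out_path, "wb") as f:
        writer.write(f)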

Computers usually have multiple ways to do any specific task. For us non-programmers, the internet is full of communities of nerds who explain how to automate all sorts of software tasks – forums (fora?) are truly a god-send. But it first requires us to expect more from our computers and our software. For any given software, RTFM (as they say), and then check out the software’s website forum – you’ll be amazed at the stuff you find. Hopefully all that time you save from automation won’t be spent obsessively reading the forum!

No wonder we historians are bad at math – they keep changing the answers
https://jostwald.wordpress.com/2017/01/13/no-wonder-we-historians-are-bad-at-math-they-keep-changing-the-answers/
Sat, 14 Jan 2017 03:11:46 +0000

Apropos an old thread on naming wars based on their duration (and how complicated that really is), this story appeared recently in my History News Network feed. It’s neither early modern nor European, but it’s been a busy six months.

Professor: “How long was the Eight-Year War of Resistance against Japanese Aggression?”
Student: “Eight years.”
Professor: “Wrong. Fourteen.”

My main thought: while it’s nice that there’s an official name for wars, just imagine the need to change all those references and Library of Congress subject headings. Ugh.

I sure do love Lincoln and Washington
https://jostwald.wordpress.com/2016/02/20/i-sure-do-love-lincoln-and-washington/
Sat, 20 Feb 2016 18:43:17 +0000

Because they give us U.S. faculty on an MWF teaching schedule a full week off in the Spring, and that’s before Spring Break. That, combined with the two consecutive snow days last Friday and this past Monday, means I’ve had the time to finish up my siege capitulation chapter (okay, 99% done) that I’ve been working on forever. Literally. I wrote a graduate seminar paper on the subject circa 1994.

Why has it taken so long to finish this chapter with a target length of only 12,000 words? Let me count the ways, leaving aside non-project issues:

I’m generally an empirical, detail-oriented kind of guy, which means I’m not convinced by historical arguments that rely primarily upon logic or theory (without copious empirical evidence from the period/place/subject in question), nor by arguments that mention a single example or two without illustrating the depth of the research behind the selection of those cases – and bibliographical entries don’t tell us much about which sources were actually read through, which case studies were actually studied. My three arch-enemies are unfortunately quite popular among too many of the historians I read: Argument by Anecdote, Argument by Appeal to Theory, and Argument by Appeal to Zeitgeist. (Don’t know if these are in David Hackett Fischer’s Historians’ Fallacies.) Unfortunately this describes the historiography on siege capitulations to a tee. Which means I need to break the cycle of violence – violence to contemporaries (whom we caricature and pigeon-hole), and violence to historical research method (which we turn into a game of “seek-a-pithy-quote-in-an-easy-to-find-source” and “let’s-just-assume-that-theoretical-treatises-accurately-describe-reality-because-they’re-easier-to-process”). Breaking a cycle of violence takes time.

Early modern European military history has a ton of sources (relative to the period and place), and thanks to the digital age and some small research grants, I now have a ton of those on my hard drives (and the cloud) that I can call up effortlessly. That allows me to analyze the subject in a lot more detail: to learn a lot more about what dozens of contemporaries actually said and thought and did, and to analyze the subject with much finer granularity, to see all the exceptions that get buried as history races on at its dizzying pace. (Seriously, every historian should read Butterfield’s Whig Interpretation of History at least once every few years.) But that also takes a lot more time (not the reading Butterfield part).

“My” war, the War of the Spanish Succession, had about 125 sieges (I’ve lost count, to be honest) – I’m only now starting to realize what a massive number of sieges that is, especially for a war of thirteen years. Because I’ve focused on the Low Countries theater my whole career, I’m concentrating on those cases, though I draw from other theaters as well. But even examining only one of the four main theaters of war, we’re still talking about 36 sieges or so. And in those 36 sieges there are actually 49 different fortifications (towns plus associated citadels and forts), each of which might receive a distinct capitulation. And, if you’ve read my Vauban book, you know I want to see the patterns, rather than just choose a random (or worse, non-random) example and say “This is how they all were!” To repeat a previous graphic:

So when you combine the 36/49 Flanders siege surrenders with the yuuge number of sources available, you get a lot of sources to look through. And I really do need to look through a lot of them. My ideas keep getting richer and more nuanced as I compare one siege capitulation with another, one interpretation of the capitulation of X with a divergent view from another source. It’s not just about piling on 12 different sources all supporting the claim that X happened at siege A, though corroboration is important in and of itself. It’s also about finding out that source 1 disagrees with source 2 about claim X, which means we need to be more nuanced when describing contemporary conceptions of X. It’s also about finding 5 examples of the same thing at sieges A, B, and C, which suggests that it’s a more robust phenomenon than just a one-off, that there’s some kind of trend or pattern among multiple sieges, over time. And it’s also about finding 5 examples of several reasons why X occurred at sieges A, B, and C, which is consilience, or different types of reasons all supporting the same broad claim. If I can put it in argument mapping terms, the last map is structurally much stronger than the others (ceteris paribus), and we should be requiring more evidence for our claims rather than less, since we historians overgeneralize all, the, time:

One Reason, supported by 3 sources

3 Reasons, 1 source each

2 Reasons, 3 Sources each

3 Reasons, 3 Sources each

But finding these details requires time:

In theory you can make educated guesses as to which events will strike gold: ‘this was a really big siege, so it should have a really big discussion about topic Y.’ But I’ve found it isn’t easy to guess which sieges will reveal the best case studies. On several occasions I have serendipitously come upon a really revealing discussion of capitulations for a siege that I never would’ve predicted would yield such insight. And then I am disappointed that a better-known siege has nothing of use. Considering that I know these sieges pretty well (I think I wrote something about the subject once), the Sisyphean prospect is disheartening, but it also feeds into the addiction to keep wading through the sources. I still come across examples where I find myself saying: “Holy sh*t, that’s exactly what I wondered, and I would truly hate myself if I’d published my chapter without that example.” Further, most contemporaries will only make a big point with one siege (which one, you have to guess), not repeat it when they discuss siege after siege. Normally scholars talk about diminishing returns with the increasing number of sources you consult, but, scarily, I haven’t even come close to reaching that inflection point yet. And the richer the interpretation, the more hypotheses to test and the wider to cast your net. Oh wait, now you mean I have to go back to all those other capitulations and see if they mention trumpets as well? Damn.

One of the most interesting aspects of siege capitulations is how they were interpreted by observers (the participants themselves spent practically no time whatsoever describing these evacuations and marks of honor outside of the capitulation – hint, hint). Among the best sources for an observer’s perspective are the several dozen contemporary newspapers – for this project I’m largely limiting myself to English- and French-language papers. They come in a variety of flavors (I’ve already expressed my affection for the salty Observator), but they do have one big problem familiar to anyone who’s used the genre: their asynchronous reporting means that you never really know which issue to look in. For example, I know the siege of X surrendered on June 6 N.S. But I can’t just focus on the papers of June 6-8. News travels at different speeds, different accounts are sent at different times, breaking news might delay publishing some news stories, etc. Even worse, the English newspapers were, the bastards, still using the Old Style calendar, which means that, on top of all the other issues, you have to convert between calendars – subtract 11 days from a New Style event date to find the corresponding Old Style issue date, since the fighting was occurring on the Continent (my old Access database did that automatically with some code, but not DTPO). But then news across the Channel might be further delayed – they often complained about packet boats being delayed by weather, or being stuck on the wrong side of the Channel – so you can really get quite the asynchronicity headache reading any given issue of an English newspaper (see my series on “I Read the News Today”). And most of the papers were only published every few days, which means you need to spread your net even further, not to mention all the monthlies. The result is that for each siege, you need to read the closest issue after the capitulation date, plus or minus a month or more for each paper, so you can see the expectations for the surrender, the initial reports of its capitulation, and then how the news of the event kept trickling in, as well as reactions to it. You need to check the year-end roundup as well, which, wouldn’t you figure, isn’t always published in the last issue of the year. And you have to skim through the entirety of each issue, because A) you don’t know where in the newspaper your stories will be (fortunately each issue is only two two-column pages, but very small print with dense detail and topics poorly demarcated within each article), and B) they might have two different accounts (on two different dates) on different pages in the same issue, the first speculating on a coming surrender and the next with news of the garrison having already evacuated. All of which takes time.
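(For what it’s worth, the window-setting arithmetic is trivial to sketch in Python – a hypothetical helper, not part of my workflow, and assuming post-1700 dates:)

from datetime import date, timedelta

def os_issue_window(ns_event: date, slack_days: int = 30):
    """Range of OS issue dates worth skimming for news of an NS event."""
    os_event = ns_event - timedelta(days=11)         # NS event date -> OS equivalent
    return (os_event - timedelta(days=slack_days),   # early expectations and speculation
            os_event + timedelta(days=slack_days))   # late-arriving reports and reactions

start, end = os_issue_window(date(1708, 12, 10))
print(start, "to", end)  # skim every surviving issue dated within this OS range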

Fortunately it doesn’t take much time to create notes, since my DTPO scripts are working beautifully. The problem I do find, however, is that I end up transcribing a lot (the originals are usually in image PDFs) because I’ll find four other interesting quotes about topics A, B, and C in addition to topic X that I’m currently working on. And given my experience with the importance of serendipitous discovery (see #4 above), it’s probably better in the long run to take that extra minute or two to keyword and summarize what you find when you find it. DTPO makes it easy to find it later on, and it helps with DTPO’s AI.

Finally, it’s taken so long because I was writing a work that should really be published as two separate pieces – or really as one, much longer, piece. I decided that the honor-and-siege-capitulations piece that I really want to write will probably take twice the space, or more, of the 12,000 words that I’m allowed in this book chapter. So after writing big chunks on a dozen different points, I decided last month to just chop the thing in half, and only focus on the garrison’s “marks of honor” for this chapter (kind of). Over the summer I’ll write the bigger piece, incorporate some of the recent research by new French Ph.D.s (without stealing their thunder, I hope), and maybe just post it up online: discuss the Flanders theater more comprehensively (deal with all the sieges, deal more with the disagreements between sources), expand my discussion beyond the Flanders theater, and expand my discussion beyond the garrison’s honor to the honor of the other participants and observers as well. And I have no interest in submitting this to another journal, or turning it into a book. I’m getting tired of the multi-year delays, of following publishers’ limits and cutting out all sorts of interesting things to water down a work, chasing after a mythical broad(er) audience that I doubt even exists for my topic – and I’m a co-editor! If nobody else wants to read it, I’ll manage. I’ll read it again and again! And its full text will be discoverable in Google (or your search engine of choice). But then I can do that because I’ve already got tenure, and I’ll have plenty of other future publications. Maybe I’ll even revise the work as I find more information and refine my conceptualization of the subject. It’s a whole new world…

The Digital Advantage

So speaking of things that can’t be put in a printed work, I include here an image of my content analysis of the Flanders capitulations’ terms, which I initially started in grad school. As usual, there are some errors, but it gives one example of how scholars should study topics that are said to be structured and repetitive: test it! Doing so, I was surprised at how unstructured they actually were, considering how structured they’re said to be, and how structured they could have been. (Obviously this depends on how standardized you think rituals are supposed to be, which is apparently controversial among specialists in ritual studies.)

So, if anyone out there is still reading: I’m soliciting a few volunteers to read through my chapter draft for general comments about the argument. If you’re tempted but uncertain, I can summarize what it’s about, to help you decide if it’s worth your time.

My general thesis is that historians have given way too much attention to these little marks of honor. First off, these ritual evacuation terms (evacuating with drums beating, flags flying, musket balls in the mouth…) weren’t very precisely calibrated, contra the common belief. This is significant in part because it’s a common component of that whole ‘rhetoric of siege history’ (artificial, scientific, ritualized, like a chess game or ballet, blah blah blah), but also because (some) cultural historians talk about rituals constructing meaning. Most contemporaries, including the garrisons themselves, didn’t care about these evacuation marks in any specific sense. They wanted an “honorable surrender,” sure, and preferred to get “all the marks of honor,” but nobody was clear on what “all” the marks of honor even were. On the other hand, they were very clear about practical things like whether the garrison was free or imprisoned (i.e. its fate), and who had to pay for all this broken stuff (“you broke it” did not usually mean “you bought it”). Both participants and observers definitely cared about honor, but in a much broader sense: the evacuation marks were only one minor way to measure a garrison’s honor. And from the outside, these marks of honor weren’t particularly diagnostic because there were all sorts of reasons why the marks (and even the garrison’s fate) didn’t always correspond to a garrison’s honorable conduct.

But only after reading through a few summaries of ritual theory did I realize that while these evacuations were performative rituals (i.e. the honor theoretically came from the performance of the evacuation that was allowed by the besiegers), there were other “markers” of the garrison’s honor that were much more important, other indications of honor that observers and participants cited again and again, and which had nothing to do with those symbolic evacuation marks that nobody mentioned. That is to say, I knew about all those contemporary arguments before and had argued that these evacuation marks weren’t very important, but I hadn’t realized that these other reasons why the garrison defended itself well or poorly were also markers of the garrison’s honor, just like the evacuation terms except more useful. The result is, I think, a much more interesting case where symbolic capitulation terms, the garrison’s fate, and its honor were manipulated and debated with a variety of contested markers, rather than a simplistic view of a garrison’s evacuation ritual defining its honor. I think I’m also setting up a framework to talk about how exactly early moderns argued that military commanders gained honor or shame; I won’t say I’m deconstructing the discursive field of early modern martial honor, but it might be something like that. Various scholars have talked about what early modern (martial) honor was, and talk in general terms about who could have it, how one gained it from brave acts (duh), and the physical markers that displayed it (in addition to evacuation marks of honor, this includes medals and insignia, batons and Napoleon’s “baubles”…). What I’m doing is getting really specific, spelling out the specific rhetorical strategies that contemporaries (garrisons, besiegers, relief commanders, outside observers in the army and at Court, newspapers and the reading publics most broadly…) used to argue that a garrison earned either martial honor or shame, and looking at how these debates evolved over time. What markers were used to claim that a garrison had gained honor, and which were most popular? (Honor being all about getting your peers to acknowledge your claim to honor.) How did the different sides advance and contest honor claims? That’s the big question that I’m interested in here. And, needless to say, I’ll apply it to my battle book as well.

So hopefully this all makes sense – I’m still formulating the broader framework. Thoughts appreciated.

If you’re interested, you can email me at the address listed in the fourth comment of the About Me page. I’ll need feedback by April.

But now I have to switch gears and write my chapter on Louis XIV’s pursuit of relief battles, and what that says about battle-avoidance.

The utility of list-making depends on your ability to write ideas down (externalize them) wherever you happen to be when you think of them, and on your system’s ability to easily create a variety of task lists that you can consult in different circumstances. When you learn (or remember) that you have to do something and it will take more than a few minutes to complete, write it down in your system. It’s OK if you don’t want to spend the time figuring out which Project/Context/Action/Tag it belongs to right now; just leave it in the Inbox and add the rest of the metadata later, when you have more time.

So assuming you organize your projects and tasks in a logical manner, and assuming that you assign the appropriate metadata, you have an incredibly wide range of lists at your command. How do you use them? It’s really not hard. Ask yourself “Where am I?” and consult that @Context list to see what you can do there. Or ask yourself “What do I want to do today?” and take a peek at the Projects list to see what you can do. Then decide which tasks on those lists you want to do, and do them.

Here’s how those lists get used throughout the day.

Scenario 1: School Day

You’ve got school in the morning, so by the time you go to bed, you’ve already emptied your head of any tasks that need to be done tomorrow (i.e. you trust that they will be there in the morning). And you’ve glanced at your Tomorrow Focus view, which alerts you to the deadlines ahead.

In the morning, after you’ve performed your ablutions and gustations, you check your Today Focus view to remind yourself of the deadlines o’ the day, as well as any tasks that have hard due dates. If there’s time, skim over any overdue tasks, and either reassign their due dates, or change them to None and track them through the other PI lists. Otherwise be sure to change their dates later, before they start to pile up.

If you want, you can continue to scroll down the Today view and look at the tasks in any of the filters or views you’ve assigned to the Focus view. In my case, that means the Do Next 7 Days filter (tasks to complete over the next week), the Tasks Started list (in progress and repeating tasks), and the Starred Tasks list, which are all the major projects I have on the burner.

Before you head off to work, you should check the @Home-Office To School context to see if there are any items you need to take with you. Alternatively, you could skip checking this Context list by assigning such tasks a due date and time (approximating a time before you leave home); those tasks would then appear in the appropriate Focus view on the appropriate day. But only do that if you know you’ll be going in on that day – avoid rescheduling tasks if at all possible. To form the habit, it’s probably easier just to keep them all in the @To School context list.

Once you get to the office at school, and find you have some time to kill, or slow office hours, check the @ECSU Office (probably start with sort by Action). For example, I often set reminders to enter student homework scores in the gradebook when I get to school in the morning (our state is anal about keeping student grades outside the university ecosystem) – this prevents me from having nightmares where I realize at the end of the semester that I never recorded their grades, and of course half of the students threw their assignments away.

Do whichever tasks can be done given the time you have and your energy level, and their priority (including the sort order of sequential projects) – it’s your call, which is why some people like to use contexts like @LowEnergy or @15minutes for short tasks.

If there aren’t any (important) tasks in the @ECSU Office list, you can always jump to one of the other ‘generic’ lists, such as @Computer-Any (assuming you brought your laptop), or make a few phone calls from your @Computer-iPhone list. Maybe you need to talk to So-and-So about such-and-such, so check your @ECSU-Person So-and-So list and walk down the hall. Maybe you want to focus on that book chapter draft coming due in a few weeks – go to its Project list and sort by Context or Next Action. Don’t forget to reward yourself by checking off the tasks once you’ve finished them.

Before class starts, check @ECSU 231 To Bring to see if there are any items you need to take along with you: maybe a deck of Marlborough Victory Cards to show the class, or graded papers if you’re the forgetful type.

Once you arrive in class, pull up @ECSU 231 Class and make the announcements on that list. If a student needs to take a makeup exam, pull up the calendar and schedule it. If a student asks a question that requires further work, make a task.

Need to go to the library? Check the @ECSU Library list, or the @ECSU Office To Library list if you have one.

Have a meeting? Check your @Meeting context (or the meeting Project) to see what you need to do and bring and discuss. Run into somebody? Check their @Person context to see what you need to tell them, and what you need from them. All requests for appointments and commitments are easily checked and recorded on the calendar.

If you’re gunning for greater efficiency, get in the habit of checking your Today Focus view at the end of every meeting or class. Maybe you have time to take a quick glance at the contexts between your current position and your next destination? On occasion I’ve envisioned myself swinging by the library to pick up a book on the way back from class, but then forgotten about my detour by the time class was over.

Before you turn off the lights and head out the door, give one last look at your Today Focus view (maybe you have to stop and pick something or somebody up?), and check the @ECSU Office Bring Home context to make sure you haven’t forgotten to put anything in your bookbag. You’ll head home confident that you haven’t forgotten anything. (If you have forgotten something, it’s because either you didn’t externalize it into your system, or your system has holes.)

At the end of the day, you can look at your Today Focus view one more time if you need to reassure yourself that you didn’t miss anything important. Glance ahead at the Tomorrow Focus view to make sure there aren’t any surprises waiting, knowing that you can refresh your memory in the morning. If you do see something of concern, check out the Project task list, particularly if it would disturb your slumber if not dealt with. If you need positive affirmations, you can look back over the tasks you’ve Completed Today, and satisfy yourself that you did indeed accomplish something after all.

If you’re feeling particularly energetic, you could even check to make sure there’s a Next Action for each Project that moved forward that day. But most important is to clear your head before bed. Now go to sleep – you’ve earned it.

Scenario 2: “Day Off”

It’s morning and you’re now ready to see what’s ahead of you. Hopefully you checked the Tomorrow Focus view the night before, so you aren’t surprised by the events staring at you on your calendar, nor by the tasks due today. But let’s say you don’t have much specific planned, which means you could get some real work done. But what to do? You could go the usual Context route – what can I do in @Home-Office? If I go out to the @Garage? But if you’re at home you probably have a wider range of contexts available to you, and therefore many more possible tasks you could perform. So you just need to decide where you want to start. Decide on a Context, Project, or domain (area of focus) and consult the corresponding list of tasks to see what you could do. Then choose what you will do. And revel in the satisfaction of checking those puppies off when you’re done.

Do you want to keep Inbox Zero humming along? Then pull up your @Computer-Email list and crank out those emails. A bunch of phone calls to make? @Computer-iPhone’s the ticket.

Did you dedicate this morning to getting some research done? Then pull up your Research smart filter (maybe sorted by Project) and either choose one of the Projects to work on, or knock off a couple of the Next Actions to move several Projects forward a small increment.

Or maybe you’ve got a specific research project to get going on. Pull up that Project’s list (sorted by Action probably) and get to work.

Sequential project with Next Action assigned automatically

Or, just maybe, you’re the kind of person who surrenders himself/herself to the universe? In which case you could just look through your Next Actions list (probably sorted by Context or Project) and ask yourself: “What do I feel like doing right now?”

Any time you find yourself thinking about some project, externalize your thoughts and plans in your system – don’t forget to use the Someday Action for plans that are still in the pie-in-the-sky stage. If you find yourself worrying about how you’ll do some project, whip out PI and create the Next Action required to move the project forward. You’ll feel better, even if you don’t plan out the rest of the project.

Just Do It

Ultimately, PI and GTD won’t do the tasks for you. They won’t even tell you which task to do when (unless you set due dates), and they won’t pester you (unless you set due dates and repeating alarms). But they will provide the structure for you to make those decisions yourself. If you choose to use them.