Flips and Clicks: More Musings on a Multimodal Analysis of Scissors-and-Paste Journalism (Part 3)

This is part 3 of a 4-part series. Parts 1 and 2 of this essay are available here.

Having identified the many types of information that can help us understand these historical social networks, which is the best method for bringing them together? As with textual and contextual clues, the best approach for understanding wider dissemination networks is multi-layered, or, as the title of this series suggests, multimodal.

The first and most obvious modal divide is between the printed and the digital. This distinction is, however, slightly misleading, as a great deal of newspaper content is currently only available in microfilm format or through digitized images that, containing no machine-readable metadata, must be treated in much the same way as microfilm (see, for example, the 18th century holdings of Google News Archive). As both these formats must be examined manually, that is, without the aid of keyword or full-text search functionality, we shall treat them alongside manual examination of loose or bound originals.

Manual examination, wherein a researcher flips, winds, or clicks through a chronological series of issues within a single newspaper title, has several advantages and limitations when mapping dissemination pathways. First, it allows, even encourages, the mapping of an entire title, rather than a few serendipitously chosen articles within it. Moving methodically from page to page, and issue to issue, the researcher has two, perhaps overlapping, options: manual cataloguing and manual transcription.

The first, in essence, is the creation of a highly specialized index for the title. Using a spreadsheet or database (or indeed pen and paper) the researcher can catalogue each individual article in a run, detailing its title, author, dateline, topic, and any references to its source or origin. The level of detail available from purely textual material is, of course, limited, and may result in wild fluctuations in the accuracy of any given network cluster. It does, however, have the advantage of giving a broad overview of the title, and of how its different content types (shipping news, parliamentary news, foreign news, colonial news, miscellany) relate to each other in form and origin. Are different topics sourced from different, interlocking networks, or does the paper as a whole follow a consistent pattern of news-gathering?
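For those who prefer a script to a spreadsheet, such a catalogue might be sketched as follows. This is only a minimal illustration: the field names, the helper sources_by_topic, and the sample rows ("Paper A", "Paper B") are all invented for the purpose, and a real project would choose fields to suit its own material.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogueEntry:
    """One article in a run of a single title (field names are illustrative)."""
    issue_date: str                      # ISO date of the issue
    page: int
    title: str
    topic: str                           # e.g. "shipping", "foreign", "miscellany"
    author: Optional[str] = None
    dateline: Optional[str] = None
    stated_source: Optional[str] = None  # attribution printed with the article


def sources_by_topic(entries):
    """Count how often each stated source appears under each topic."""
    counts = {}
    for entry in entries:
        source = entry.stated_source or "(unattributed)"
        counts.setdefault(entry.topic, Counter())[source] += 1
    return counts


# Invented sample rows, for illustration only.
catalogue = [
    CatalogueEntry("1830-01-04", 2, "Arrivals", "shipping", stated_source="Paper A"),
    CatalogueEntry("1830-01-04", 3, "Late from Paris", "foreign", stated_source="Paper B"),
    CatalogueEntry("1830-01-11", 2, "Departures", "shipping", stated_source="Paper A"),
    CatalogueEntry("1830-01-11", 4, "Anecdote", "miscellany"),
]

for topic, counter in sources_by_topic(catalogue).items():
    print(topic, dict(counter))
```

Even a tally this crude begins to answer the question above: if shipping news always comes from one source and foreign news from another, the title is drawing on several networks at once.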

Moreover, by limiting the scope of the project to cataloguing textual clues, a researcher can theoretically move quickly from page to page, issue to issue, year to year, creating a tremendously useful resource in a relatively limited period of time—time being a particularly precious commodity when using material housed within a library at a distance from the researcher’s base. Even if the catalogue is limited to a particular type of content, or topic, it can provide a detailed network map against which the researcher, or a successor, can contextualize other material.

The second option is to catalogue metadata, such as the section or page number, alongside complete, manual transcriptions of articles, creating a corpus of machine-readable texts. Although these are seemingly less complete than fully digitized versions of the material, markup languages such as XML can provide typesetting and other spatial information. These transcriptions can then be used alongside those obtained via optical character recognition to map changes and continuity in the dissemination of individual articles across many different titles. The costs and technical eccentricities associated with digitization and OCR projects currently limit our ability to undertake large-scale analyses of periodical networks. However, the creation and dissemination of machine-readable transcriptions for even a sub-section of this un-digitized material, collected to inform cognate projects, could revolutionise computer-aided analyses of newspaper networks.
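To give a sense of what such a marked-up transcription might look like, here is a small sketch using Python's standard xml.etree.ElementTree module. The element and attribute names (article, headline, body, p, newspaper, date, page, column) are my own invention rather than any established schema, and the sample text is a placeholder; a project aiming at interoperability would more likely adopt an existing standard.

```python
import xml.etree.ElementTree as ET


def build_article(newspaper, issue_date, page, column, headline, paragraphs):
    """Wrap a manual transcription in a simple, hypothetical XML structure."""
    article = ET.Element("article", {
        "newspaper": newspaper,
        "date": issue_date,
        "page": str(page),      # spatial information: where the item was printed
        "column": str(column),
    })
    ET.SubElement(article, "headline").text = headline
    body = ET.SubElement(article, "body")
    for text in paragraphs:
        ET.SubElement(body, "p").text = text
    return article


# Placeholder transcription, for illustration only.
article = build_article(
    newspaper="Paper A",
    issue_date="1830-01-04",
    page=3,
    column=2,
    headline="Late from Paris",
    paragraphs=["First transcribed paragraph.", "Second transcribed paragraph."],
)

xml_text = ET.tostring(article, encoding="unicode")
print(xml_text)
```

Because the page and column travel with the text, a later reader can ask not only what was reprinted but where on the page it was placed.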

Machine-readable digital content, of course, can and must be treated differently from printed or microform versions. Providers, such as the Library of Congress and Trove, that offer API access to digitized newspaper text give researchers a tantalizing opportunity; they can bypass the messy, time-consuming process of manual transcription and delve into computer-aided analysis of the text itself – a process that I will address at greater length in my final post. However, as has been noted by David Smith, Ryan Cordell, Elizabeth Dillon and Charles Upchurch, this material, however kindly provided, is not always suitable for immediate analysis, owing to errors in the transliteration of images into machine-readable code. Nonetheless, reasonable corrections can be made through the use of dictionaries and replacement protocols – checking transcriptions against an appropriate dictionary, selecting unrecognized words, replacing commonly mis-transcribed characters and then checking the new word against the dictionary once more. Once sanitized to a reasonable degree, these texts can be stacked, or grouped, by textual similarity – that is, by the percentage of common nGrams – and then checked for consistency and change to determine likely pathways of dissemination.
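The two steps just described, dictionary-based correction and grouping by shared nGrams, can be roughed out in a few lines of Python. What follows is a sketch under several assumptions of my own, not a description of any particular project's method: the substitution table is a tiny hypothetical sample, the dictionary is a toy set of words, nGrams are taken as three-word sequences, and similarity is measured as the share of nGrams the two texts hold in common out of all nGrams found in either (the Jaccard index), which is only one of several reasonable measures.

```python
import re

# Hypothetical substitution table: OCR misreading -> likely intended characters.
SUBSTITUTIONS = [("ſ", "s"), ("0", "o"), ("1", "l"), ("rn", "m"), ("vv", "w")]

WORD = re.compile(r"[^\W_]+")  # runs of letters and digits


def correct_word(word, dictionary):
    """Return the word unchanged if recognized; otherwise try each substitution."""
    if word.lower() in dictionary:
        return word
    for wrong, right in SUBSTITUTIONS:
        if wrong in word:
            candidate = word.replace(wrong, right)
            # Accept the replacement only if the dictionary now recognizes it.
            if candidate.lower() in dictionary:
                return candidate
    return word  # leave unresolved words alone rather than guess


def correct_text(text, dictionary):
    """Apply correct_word to every word in the text, keeping punctuation."""
    return WORD.sub(lambda m: correct_word(m.group(0), dictionary), text)


def word_ngrams(text, n=3):
    """Return the set of n-word sequences in the text, ignoring case."""
    words = WORD.findall(text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_similarity(text_a, text_b, n=3):
    """Shared nGrams as a fraction of all nGrams found in either text."""
    a, b = word_ngrams(text_a, n), word_ngrams(text_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy dictionary and invented sample sentences, for illustration only.
dictionary = {"the", "ship", "sailed", "from", "london", "on", "monday", "tuesday"}
cleaned = correct_text("The ſhip ſailed frorn L0ndon on Monday", dictionary)
print(cleaned)
print(ngram_similarity(cleaned, "The ship sailed from London on Tuesday"))
```

Pairs of texts scoring above some chosen threshold can then be grouped together as probable reprints and read side by side for the small changes that betray the order of copying.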

Where machine-readable content exists, but is not readily available to researchers (as is the case with most commercial newspaper archives), alternative tactics must be used. One of the most straightforward, and immediately profitable, is that used by media historian Bob Nicholson in his delightful (honestly, go and read it) article on the dissemination of American jokes in British periodicals. Using full-text searching, a researcher can hunt for, rather than gather, relevant reprints by crafting a series of relevant search phrases – and crossing their fingers. Again, those working on very particular topics, a key event or literary text, can develop case-study networks that, when combined with title-wide maps or a plethora of other small-scale clusters, can yield impressive results. Likewise, commonly used textual indicators, such as ‘from the London Examiner’, can themselves be searched for and catalogued.
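Where the text of search results can be copied out, the hunt for such indicators can be partly automated. The sketch below assumes the researcher already holds a handful of article texts as plain strings; the pattern, which looks for "from the" followed by one or more capitalized words, and the sample snippets are my own illustrations. It will certainly throw up false positives (any capitalized phrase after "from the" is caught), so the results are candidates for checking rather than findings.

```python
import re
from collections import Counter

# "from the" followed by one or more capitalized words, taken as a candidate title.
ATTRIBUTION = re.compile(r"\b[Ff]rom the ([A-Z][A-Za-z'&-]*(?: [A-Z][A-Za-z'&-]*)*)")


def count_attributions(texts):
    """Tally every candidate source title named in a 'from the ...' phrase."""
    counts = Counter()
    for text in texts:
        counts.update(ATTRIBUTION.findall(text))
    return counts


# Invented snippets, for illustration only.
snippets = [
    "From the London Examiner. A curious incident occurred last week.",
    "We learn from the Sydney Gazette that the harvest has been good.",
    "He returned from the market yesterday.",
]

for source, n in count_attributions(snippets).most_common():
    print(source, n)
```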

Thus, by combining the results of meticulous, chronological examinations, surface-level catalogues, and digitised nGram hunts, a community of researchers could, in theory, develop a multi-layer, multi-modal network diagram.

But how do we even begin to gather these far-flung resources, and how do we knit them together once they appear? Twenty years ago, such a task would have appeared too daunting to even ponder, but, next week, all shall be revealed.