Posts Tagged ‘keyword searching’

In an earlier EMOB post, “Digital Humanities and the Archives I: Economics and Sustainability”, we discussed the varied connotations that the term “sustainability” evokes. Yet the concept of “archives” also engenders a multiplicity of meanings, as does the word “database.” In some circles “archive” and “database” are used interchangeably, while for others the terms signal distinctions between the past and the present. As Marlene Manoff has observed,

When scholars outside library and archival science use the word “archive” or when those outside information technology fields use the word “database,” they almost always mean something broader and more ambiguous than experts in these fields using those same words. The disciplinary boundaries within which these terms have been contained are eroding. Scholars use the terms metaphorically, appropriating them from the professional experts. (Manoff, “Archive and Database as Metaphor: Theorizing the Historical Record.” portal: Libraries and the Academy, 10.4 [2010], 385)

The submissions for the “Digital Humanities and the Archives” roundtable at ASECS 2012 attest to the varied meanings scholars ascribe to “archive” as a digital entity. While some proposals viewed commercial textbases such as ECCO or EEBO as archives, others considered non-commercial digital projects (some of which were designed to perform additional roles beyond being a repository) as falling under the “archival” designation. Still others proposed topics that were not tied to specific digital collections or projects. Reflecting this diversity, the selected presentations featured two papers on the nature of searching within digital environments (Randall Cream, West Chester Univ., and Bill Blake, New York Univ.), another on the coding issues encountered in building a performance history database (Mike Gavin, Rice University; University of South Carolina, Fall 2012), a fourth on the potential evidence that can be derived from negative results (Sayre Greenfield, Univ. of Pittsburgh, Greensburg), and the last on a digital archive aimed at facilitating exchange between scholars and those outside the academy (Jessica Richard, Wake Forest Univ.). In his post on the many Digital Humanities sessions at ASECS, Stephen Gregg offers a fine overview of this roundtable, so the following comments supplement his summary. In addition, they serve as a springboard for discussing digitization’s broader “archival effects,” a term coined by Marlene Manoff to “suggest the ways in which digital media bring the past into the present” (386).

Contrasting the old and the new, Randall Cream noted that unlike traditional archives, whose contents are not always fully known, digital archives and databases afford more certainty because their creation involves detailing and defining: an encyclopedic naming of their various parts. For Cream, this difference has also meant that searching digital archives lacks the serendipitous discovery that scholars often experience when working in brick-and-mortar archives. He suggested concept-linked searching as a possible means of fostering chance discoveries within digital environments, a suggestion that provided a fitting segue to Bill Blake’s talk on crafting more effective digital searches. Blake argued for thinking beyond topical keyword searches aimed solely at retrieval. Instead, he called for adopting higher-quality, conceptually based searches that will yield better results; such searches will counter the drift and spread that occur when the aim of retrieval replaces the goal of discovery. (Given earlier EMOB discussions of semantic- or meaning-based searches, it should be noted that Blake was referring to the ways users select and fashion search terms and not to the new search platforms that enable semantic or meaning-based searching, such as Mimas, used in JISC’s Historic Books collection.)

Cream’s and Blake’s remarks point to what could be termed a remediation of research practices as print and digital interact, and both their talks highlighted searching as perhaps one of the most significant reconfigured practices. And indeed the concept of searching has undergone major reformulations in the digital environment. While accessibility and quickness of obtaining results are often seen as digital archives’ main advantage over print, a key benefit of digital collections resides in their enabling users to traverse immense areas of texts multi-directionally. Put another way, what seems radically different about searching in the digital world is not merely unprecedented access and speed, but rather the ways one can alter search strategies instantaneously, shifting not only the search terms employed at a moment’s notice but also the temporal and spatial coordinates in which those terms are placed. This capability expands the ways we are approaching the search as a strategy, opening up new conceptualizations even as we retain the habits and training we acquired working with print. As Wired magazine’s Kevin Kelly has observed: “What search uncovers is not just keywords but also the inherent value of connection…Search opens up creations. …As a song, movie, novel or poem is searched, the potential connections it radiates seep into society in a much deeper way than the simple publication of a duplicated copy ever could” (Kevin Kelly, “Scan this Book!” New York Times, 14 May 2006).

The searching enabled within digital archives reorients our thinking about what constitutes relevant information and exposes the kinds of connectivity that we would likely miss or overlook working with print and manuscript in traditional environments. This reorientation, moreover, possesses its own opportunities for serendipity. While serendipitous discoveries made when working in a traditional archive or even browsing in the stacks typically occur within a bounded space and a pre-selected range of call numbers, digital archives and databases enable virtual movement throughout their holdings to uncover relevant but unforeseen connections not bounded by categories of expectations. In short, capable of serving as far more than text delivery systems and repositories, these digital archives and databases function as “discovery aids.” Fostering a culture of connectivity, these intellectual laboratories of sorts can provide access not only to individual titles but also to a larger, dynamic field of textual and sociocultural activity.

Sayre Greenfield’s paper demonstrated the kind of discoveries that this rethinking of relevant information can yield. Noting that assessing negative findings requires caution, Greenfield explored the ways in which a lack of search results—negative evidence—can translate into meaningful information and concluded that “absences are most useful when measured against positive results found elsewhere, in different genres or different periods.” In offering examples of the different hits obtained from performing the same search in ECCO and Burney, he drew attention to the importance of knowing the scope of a given database and the value of working across databases.
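Greenfield’s point about weighing absences against positive results elsewhere can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corpus labels and counts are invented placeholders, not figures from ECCO or Burney, and the `SearchResult`, `hit_rate`, and `notable_absences` names are hypothetical rather than part of any database’s interface.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    corpus: str      # label for a database, genre, or period slice (hypothetical)
    hits: int        # documents matching the search term
    documents: int   # total documents searched in that corpus


def hit_rate(result):
    """Share of documents in the corpus that matched the search."""
    return result.hits / result.documents if result.documents else 0.0


def notable_absences(results):
    """List corpora with zero hits, but only when another corpus has hits.

    An absence is treated as potentially meaningful only if the same
    search found something elsewhere; if every corpus is empty, the
    search itself may simply be at fault.
    """
    if not any(r.hits > 0 for r in results):
        return []
    return [r.corpus for r in results if r.hits == 0 and r.documents > 0]


# Invented numbers, for illustration only.
results = [
    SearchResult("corpus_a", hits=0, documents=1000),
    SearchResult("corpus_b", hits=25, documents=500),
]

for r in results:
    print(r.corpus, hit_rate(r))
print("absences worth a closer look:", notable_absences(results))
```

The sketch deliberately records the size of each corpus alongside its hits, since knowing the scope of a database is what makes a zero result interpretable at all.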

Mike Gavin’s paper also underscored the importance of understanding the operation of digital archives and the rethinking that such understanding can prompt. As Gavin recounted, creating a digital archive of dramatic works that incorporates their performance history has necessitated adapting TEI coding to facilitate searching. While his comments reflect the perspective of those constructing the archive, they also hold significance for users of digital archives. The tagging examples he provided illustrate the significant intellectual labor that goes into the creation of digital databases and archives; encoding a document, after all, is an interpretive practice requiring careful thought and subject expertise. His illustrations are a cogent reminder that archives, whether traditional or digital, are never neutral but are always rooted in the views and principles of their creators. In the case of digital archives or databases, users benefit from being cognizant of their “constructedness.” Having an awareness of a digital archive’s creators, the circumstances surrounding its creation, the quality of its metadata, and the idiosyncrasies of its search engine will almost certainly enhance a user’s search process and, in some cases, even his or her analysis of results. Unfortunately, it is not always possible to uncover such details about digital archives and databases. Moreover, even when there is transparency and one can familiarize oneself with a digital archive’s encoding principles and information architecture, the tagging can still limit what results searches return. On a different note, it seems worth mentioning that the tasks of coding and organizing the contents of a traditional archive will, in turn, often enrich knowledge of its physical material. And this physical material remains important, for the digital and the material are not one and the same.

Unlike the first four papers, which focused on either existing archives or ones nearing completion, Jessica Richard’s paper dealt with the early planning stages of a digital project. The inspiration for the project was a desire to foster exchange between eighteenth-century science studies scholars and a non-academic readership; a website seems an ideal medium for the public-humanities thrust of this project. Notwithstanding its differences from the other talks, Richard’s topic very much reflects how the digital is transforming our traditional conceptions of archives. The project’s rethinking of audience, attention to wide access, and desire to translate scholarship for an interested general public all exemplify aspects of this transformation.

As these five talks illustrated, digital media are transforming our theoretical conceptions of “archives”; creating new paradigms and inspiring shifts in existing models as the digital and traditional archival cultures interact; and shaping the kinds of archival projects being undertaken, the methodologies used, and the types of research questions posed. Early in her essay Manoff suggests that “our current moment reflects the convergence of two phenomena–new technical capacities and an age-old impulse to gather and preserve. The ease of capturing digital data is an incitement to archive” (386). In light of the linguistic history of “archive,” connections between new technical capacities and the desire to collect and preserve have perhaps an even longer history. The word “archive” does not appear until after the invention of hand-press printing. While its use as a noun to denote either a historical document that is preserved or the place in which such documents are kept dates from the late 1630s/early 1640s, its verbal form–to archive–does not enter the lexicon until the twentieth century. Whether coincidence or not, this verb does not gain wide currency until the 1980s, a timing that corresponds with the growth in the use of computers and related technologies. In the past two decades the extensive adoption of digital technologies has dramatically spurred efforts to assemble large-scale collections of visual, verbal, and even oral materials and make them virtually available, either freely or commercially.

For Manoff, metaphorical appropriations of “archive” are useful not only for theorizing the ever-increasing growth of these collections but also for theorizing the digital in terms of its archival effects on our conceptions of history and the cultural record (385-6). As Manoff observes at the close of her essay, “archive” especially lends itself to such theorizing because the concept “carries within it both the ideal of preserving collective memory and the reality of its impossibility” (396). The musings about traditional and digital archives presented here touch upon only a few of the archival effects that digital transformations are exercising on our research practices and broader relationships with history and knowledge. I hope others will add their thoughts about these changes and the explanatory power of “archive” to address our cultural moment.

This past fall JISC announced a new venture, the JISC eCollections, “a new community-owned content service for UK HE and FE institutions.” What might interest EMOB readers most is its Historic Books. This digital collection contains over 300,000 books from before 1800 and also makes over 65,000 19th-century first editions from the British Library available for the first time online. The entire corpus is accessible through institutional subscription and, most welcome, searchable over a single platform.

The pre-1800 material in the JISC Historic Books eCollection consists solely of ProQuest’s Early English Books Online (EEBO) and Gale’s Eighteenth Century Collections Online (ECCO) textbases, so some might wonder what this collection offers that is new for those working in the early modern period. One does not need to be in eCollections, for instance, to conduct searches simultaneously across both databases. Yet the Help page for the eCollections indicates that more than just the convenience of a single interface and platform is being offered:

JISC Historic Books uses meaning-based searching rather than traditional keyword searching, which is why you will notice you get different results to searching EEBO and ECCO on the publishers sites. Meaning-based searching enables you to find conceptual and contexual [sic] links betweeen [sic] related documents which aren’t possible using traditional keyword searching.

Besides returning traditional results, JISC Historic Books also delivers “meaning-based” concepts deemed relevant to the search in the form of a Concept Cloud:

The more prominent the word, the more relevant it is deemed to the search, and as the screenshot indicates, items in the cloud can be manipulated to narrow one’s search further.
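To make the idea of a weighted cloud that can be used to narrow a search more concrete, here is a toy sketch in Python. It is not a description of how JISC Historic Books or its platform actually computes relevance, which the Help page does not detail; the documents, concept labels, and the `search` and `concept_cloud` functions are all hypothetical, and prominence is simply assumed to scale with how often a concept co-occurs in the result set.

```python
from collections import Counter

# Hypothetical records: each document carries a set of concept labels.
DOCUMENTS = {
    "doc1": {"theatre", "london", "actors"},
    "doc2": {"theatre", "london"},
    "doc3": {"theatre", "sermons"},
    "doc4": {"sermons", "london"},
}


def search(concept, within=None):
    """Return the documents tagged with a concept, optionally within earlier results."""
    pool = DOCUMENTS if within is None else within
    return {doc_id: tags for doc_id, tags in pool.items() if concept in tags}


def concept_cloud(results, exclude=()):
    """Weight each co-occurring concept from 0 to 1 by its frequency in the results.

    The most frequent concept gets weight 1.0; a display layer could map
    these weights to font sizes (an assumption made for this sketch).
    """
    counts = Counter(
        tag for tags in results.values() for tag in tags if tag not in exclude
    )
    if not counts:
        return {}
    top = max(counts.values())
    return {tag: n / top for tag, n in counts.items()}


results = search("theatre")
cloud = concept_cloud(results, exclude={"theatre"})
narrowed = search("london", within=results)  # "clicking" a cloud term

print(cloud)
print(sorted(narrowed))
```

Selecting a term from the cloud here just filters the existing results, which is one simple way to model narrowing; a real meaning-based engine would presumably do something considerably more sophisticated.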

Over the past three or four years (and maybe longer) I have been consistently struck by the transformations that traditional searches of ECCO, Burney, EEBO, and Google Books have wrought in the ways I think about searching, construct searches, and view my results. More specifically, these keyword searches, described here as traditional, were already encouraging me to view results in a more networked, contextual way and, as a consequence, to devise additional searches aimed at teasing out new potential relationships. The meaning-based search enabled by JISC’s Mimas platform, of course, offers something quite different, but I wonder how its use might prompt a rethinking of what it means to search and research.

It would be interesting to hear from EEBO and ECCO users in the UK who have used JISC Historic Books, especially about the differences between the results obtained by searching on the JISC platform and those obtained by searching on the original publishers’ platforms.