JISC’s Historic Books: Searching EEBO, ECCO for meaning

This past fall JISC announced a new venture, the JISC eCollections, “a new community-owned content service for UK HE and FE institutions.” What might interest EMOB readers most is Historic Books, a digital collection that contains over 300,000 books from before 1800 and also makes over 65,000 19th-century first editions from the British Library available online for the first time. The entire corpus is accessible through institutional subscription and, most welcome, searchable on a single platform.

The pre-1800 material in the JISC Historic Books eCollection consists solely of ProQuest’s Early English Books Online (EEBO) and Gale’s Eighteenth Century Collections Online (ECCO) textbases, so some might wonder what this collection offers that is new for those working in the early modern period. One does not need to be in eCollections, for instance, to conduct searches simultaneously across both databases. Yet the Help page for the eCollections indicates that more than just the convenience of a single interface and platform is being offered:

JISC Historic Books uses meaning-based searching rather than traditional keyword searching, which is why you will notice you get different results to searching EEBO and ECCO on the publishers sites. Meaning-based searching enables you to find conceptual and contexual [sic] links betweeen [sic] related documents which aren’t possible using traditional keyword searching.

Besides returning traditional results, JISC Historic Books also delivers “meaning-based” concepts deemed relevant to the search in the form of a Concept Cloud:

The more prominent the word, the more relevant it is deemed to the search, and as the screenshot indicates, items in the cloud can be manipulated to narrow one’s search further.

Over the past three or four years (and maybe longer) I have been consistently struck by the transformations that traditional searches of ECCO, Burney, EEBO, as well as Google Books have had on the ways I think about searching, construct searches, and view my results. More specifically, these keyword searches, described here as traditional, were already encouraging me to view results in a more networked, contextual way and, as a consequence, to devise additional searches aimed at teasing out new potential relationships. The meaning-based search enabled by JISC’s Mimas platform, of course, is offering something quite different, but I wonder how its use might prompt a rethinking of what it means to search and research.

It would be interesting to hear from EEBO and ECCO users in the UK who have used JISC Historic Books, especially about the differences between results obtained by searching on the JISC platform and those obtained by searching on the original publishers’ platforms.

Thanks for this interesting update on JISC, Eleanor. In a recent talk about the Digital Public Library of America, Robert Darnton and John Palfrey talked about the need to be able to find items in a given database, and this suggests some creative approaches to searching for items in EEBO and ECCO.

It would be great, as you say, to hear from EEBO and ECCO users in the UK about how well this search function works.

Thanks, Dave. I did just see Kelly’s extremely useful illustration and was planning on cross-posting here, as you have just done, and also responding to her post on the Long 18th Century.

While in many ways the JISC search platform merits Kelly’s designation as an “upgrade,” it also could be deemed a different way of searching. For as she also notes, the traditional publishers’ keyword search can deliver results not obtained using the JISC search platform: “I have found things on the Cengage ECCO that I could not find on JISC” (though she does qualify that these finds do not happen very often and typically result from using precise search terms).

While I am tempted to agree with Kelly’s claim that the JISC platform enables discovery and to add that the Cengage platform operates primarily for retrieval of known terms, experience leads me to resist this characterization as perhaps too simple. I have felt for a very long time that Cengage searches also promote discovery, though the process is governed by manual manipulation of search terms rather than being automated by the database’s ability to search contextually. That said, more than speed separates the kind of discovery I have experienced using the traditional Cengage platform and the discovery enabled by JISC’s use of the Autonomy IDOL (Intelligent Data Operating Layer). As an aside, it is telling that Autonomy incorporates “discovery” with “search” to create a compound phrase to describe its operation: “search and discovery.” One also needs to consider the perspectives and influences affecting the construction of an ontology-based platform, just as one should consider how tagging and structured metadata affect the results returned from databases built to perform traditional keyword searches.

Yesterday the founder and CEO of Autonomy Corporation, Dr. Mike Lynch, wrote about his company’s new means of contextually searching JISC Historic Books in a post titled “Uncovering History’s Hidden Secrets” for the Huffington Post UK. In this piece he remarks,

The technology is intelligent; it automatically understands concepts within data, so if you’re looking for something specific, it will automatically draw your attention to other appropriate and relevant content that you might otherwise have missed. This highlights how meaning-based technology can be used to unlock some of our country’s most valuable assets — its literature and the “unknown unknown” which are yet to be discovered

On one level the notion of being “automatically draw[n]…to other appropriate and relevant content that you might have missed” extends extremely exciting possibilities, as indeed Kelly’s example indicates. Yet on another level the phrase “automatically drawn” also raises questions about what is determining and directing these otherwise missed connections. Semantic-based searching was first adopted by the commercial sector, especially to obtain data that would assist in marketing, and its potential for scholarship has only recently been tapped. It seems helpful to consider how this semantic-based searching operates: in other words, what is the governing basis for the conceptualization?

As for why this might matter, consider the extended example that George Lakoff and Mark Johnson offer in Metaphors We Live By of cultural differences in conceptualizing:

An Iranian student, shortly after his arrival at Berkeley, took a seminar on metaphor from one of us. Among the wondrous things that he found in Berkeley was an expression that he heard over and over and understood as a beautifully sane metaphor. The expression was “the solution of my problems”—which he took to be a large volume of liquid, bubbling and smoking, containing all of your problems, either dissolved or in the form of precipitates, with catalysts constantly dissolving some problems (for the time being) and precipitating out others. He was terribly disillusioned to find that the residents of Berkeley had no such chemical metaphor in mind. And well he might be, for the chemical metaphor is both beautiful and insightful. It gives us a view of problems as things that never disappear utterly and that cannot be solved once and for all. All of your problems are always present, only they may be dissolved and in solution, or they may be in solid form. The best you can hope for is to find a catalyst that will make one problem precipitate out. And since you do not have complete control over what goes into the solution, you are constantly finding old and new problems precipitating out and present problems dissolving, partly because of your efforts and partly despite anything you do. (143-4)

Semantic searching is certainly affording the means to discover “the unknown unknown”, but it also behooves us to know more about its operation. Although he is referring to other applications of meaning-based searching, Dr. Lynch asserts in his penultimate paragraph: “By understanding meaning in information, the possibility of having a more predictable world seems within reach.” This turn to predictability alongside the lack of clarity over who or what is understanding “meaning in information” should give pause.

Please don’t feel compelled to take my language too literally; it was written rather quickly with the goal of providing an overview, not a theoretical analysis, and thus my word choice may not be as precise as it could be!

That aside, let us consider my use of ‘upgrade’, for perhaps subconsciously I did choose that word with precision. In its simplest form, I suppose it is, by definition, an upgrade. However, I had my (unbiased, non-academic) spouse tinker with both and he found Cengage’s ECCO easier to use, particularly because it was quicker. In some regards, Cengage provides an outlet in which we can determine the validity of our searches (through their success or failure), and therefore the validity of our way of thinking or perhaps the research itself. By constantly providing suggestions, JISC furthers the notion that all research is good research, and all trains of thought should be pursued. That may not always be the case (I feel a dissertation on Jonathan Swift and kittens may fall along this line!). My husband was irritated at having to specify his search more and more to get what he wanted from JISC, whereas with Cengage he knew immediately if it worked or not.

Your critique of the phrase ‘automatically drawn’ illustrates this dilemma: to what point do we allow ourselves to sit on ‘autopilot’, not questioning and deconstructing our thought processes? Searches like these, while beneficial for those moments where I’m trying to find all the ephemeral pieces surrounding satiric discourse (!), are also problematic in that they reaffirm our increasing tendency to allow technology to do the work for us; we demand, now more than ever, instant gratification. If I were to compare the two ECCOs to things an advisor may say, I imagine they would sound like this: A JISC search tells me, ‘Good job! But did you think of this, and this, and this?’ A Cengage search (at least in my experience) tells me, ‘Nope. Think harder, and try again.’ A bit anecdotal, perhaps, but it depicts not only how the different approaches work but also how they may resonate with a user. Moreover, it alerts us to a key problem: how much do we really know, think, and process when we’re always clued in to the answers? And, if we think we are clued in to these answers, will we ever try to think beyond them? Beyond the suggestions given?

On another note, many thanks for pointing out Cengage’s in-text search feature. To be honest, I rarely use Cengage anymore (which has more to do with it being easier to VPN to my campus databases than to log in to Uni. of London Library’s databases!), and so had completely missed it.

Many thanks for this additional material—especially your spouse’s comparison of his experience using the Cengage and the JISC search platforms. As your comments also suggest, “autopilot” does seem to be the mode that extended use of these databases can foster at times. I have become increasingly interested in meta-issues surrounding databases and their constructions as well as the possibility that we have begun to naturalize the search process. These interests shaped my response (and in fact inspired the original post).

Also, I didn’t mean to pounce on “upgrade”—and if it seemed that way, I also will invoke haste in replying! Besides intending to suggest difference as opposed to improvement (though I can easily see why “improvement” could be an accurate description despite not having been able to try the JISC search), I also was writing in response to jlmg’s comment about having fought so hard to get “basic ECCO.” I can understand her dismay, but I also wanted her to know that the ECCO her institution has represents the only form of ECCO currently supplied by Cengage. The version your institution has, JISC Historic Books, is due to the work of JISC and its partnerships and only open to institutions in the UK.

Let me thank you again for the wonderful, detailed explanation you provided of JISC searching and the many screenshots that helped clarify. JISC had plans to do a video, but you seem to have beaten them to it!

I work in the JISC Historic Books support team at Mimas and we’re really pleased to see your posts and comments about our new platform.

It only launched in August 2011 and we’re keen to gather feedback from the academic community on it. We’re especially interested in your thoughts on the relevance of the meaning-based results and whether you’ve been able to find related materials with it that you may not have known about otherwise.

Your search examples, comparisons with the Cengage platform and the questions you’ve raised about searching in general are extremely useful to help us understand how the resource will be used by the community and to ensure its development meets your needs.

Kelly, we wondered if you would be willing to complete a case study for us? We have a template on our website: http://ecollections.mimas.ac.uk/books/impact.html. All your comments here, and your post with search examples, would be perfect. We’re more than happy to receive both good and bad feedback as we’re interested in your honest opinion and experience. We simply want to understand what you need from the resource and what your experience of using it is.

Just to let you know, Eleanor, we’re currently editing the short how-to video so it will be available soon!

Many thanks, Susan, for your response. Since its inception, EMOB has advocated for better communication among all of those involved in digital resources. So we are pleased to have you add your voice to the conversation. Better exchange between scholars using these tools and those who create, design, and construct them can only lead to improvements.

We are looking forward to the video; when it is ready, we will post a link. Also, has JISC offered any of its products outside the UK? In other words, are there any plans to offer JISC Historic Books to universities in the US (or Europe, Asia, etc.)? I do know that JISC has partnered at times with US entities on certain projects, but JISC seems to operate primarily for UK institutions of higher education.

Hi Eleanor, the discussion that you, Kelly and colleagues have been having is absolutely fascinating, especially around the ‘auto-pilot’ and how this impacts research / search behaviours. There really has been a mixed reception to JISC Historic Books, with some preferring the Cengage and ProQuest platforms and their search methodology and some preferring the ‘discovery’ search of JISC Historic Books. We also wonder whether there will be a difference between undergrads and researchers / post grads in how they make use of it. As you say though, it is important that users understand the technology underneath JISC Historic Books and we will be working to embed this knowledge.

You ask if JISC Historic Books (part of JISC eCollections) will be made available to institutions outside the UK HE and FE market. Currently this is not allowed under the licensing arrangements that we have with ProQuest, Cengage and the British Library. In addition, we purchased the content in perpetuity using JISC funding, which comes from the UK higher and further education funding councils. However, we are always open to exploring new licensing arrangements and can raise it with the publishers. You may also be interested to know that we are commencing discussions with ProQuest and Cengage about how we might collaborate on an international project to share crowd-sourced corrections of OCR across all the platforms that have ECCO and EEBO content, an initiative led by Laura Mandell as part of 18th Century Connect and her TypeWright tool: http://www.18thconnect.org/typewright/documents

I hope that it is ok with you all if I mention your discussions in a news item for the JISC eCollections website and share it with the JISC Historic Books Advisory Board. I’m sure they would find it of interest too.

Kelly, we look forward to your case study and perhaps you would be interested in joining the JISC Historic Books Advisory Board? We would love to have you 🙂

You may certainly mention our discussion in news items for JISC eCollections.

Your response to my query about the potential availability of JISC Historic Books outside the UK markets is in accord with my understanding of JISC’s funding and mission. It is interesting to know about the potential international project tied to Laura’s 18thConnect. It’s collaborations like these that made me wonder about the possible availability of other JISC efforts.

I’d be interested in learning more about JISC’s work with 18thConnect, especially in light of Kelly’s discussion of two different search outcomes for ECCO. Will there be similar differences in those using 18thConnect through JISC and those outside of it?

Caren or Laura could tell us more, but I believe that what is being partnered with 18thConnect is the manual correction of the OCR. While that will help return more reliable results, it will not affect the search operations. I may well be misunderstanding, though.

[…] the last couple of weeks a really interesting discussion has been taking place on the Early Modern Online Bibliography blog run by Eleanor Shevlin (West Chester University of Pennsylvania) and Anna Battigelli (SUNY […]

Cross-posted from long 18th:
Ever since Eleanor’s post on JISC a while back, I have been wanting to know more about how JISC’s word cloud gets assembled. Knowing that will help us understand more fully just how “relevant” JISC’s “relevant” terms are. It would be great if someone from JISC could explain this more fully. Thanks Kelly and Eleanor for very interesting comments!

Knowing more about the word cloud’s ontology would be extremely useful as would an understanding of the overall operation of search returns. In some ways these questions may seem to be asking for proprietary information about Mimas, and that’s not the case at all. Rather, we are asking about the intellectual basis for the coding that produces the contextual results.

It should also be noted that the semantic-based platform returns contextual results generally–and the cloud is just one, highly visual way of seeing results.

Finally, although I have not used JISC Historic Books yet, I have often characterized my experiences using the traditional platforms for EEBO and ECCO as discovery searches. As I mentioned either here or on the Long Eighteenth Century blog, I do recognize that the JISC Historic Books platform no doubt provides its own form of discovery search results, ones in fact that may be richer than those obtained through one’s individually forged connections and re-envisioning of search terms based on results. It might be worth mentioning once again the comment by Kevin Kelly of Wired magazine, “What search uncovers is not just keywords but also the inherent value of connection…Search opens up creations. …As a song, movie, novel or poem is searched, the potential connections it radiates seep into society in a much deeper way than the simple publication of a duplicated copy ever could” (Kevin Kelly, “Scan this Book!” New York Times, 14 May 2006).

I originally had thought ‘relevance’ had more to do with ‘prevalence’, but as I went back and searched ‘Drapier’ again, I realised that size and prevalence are not correlated. Some of the smaller phrases on the cloud (the French ones, for example) yield a greater number of results when clicked than the larger-sized, Swiftian examples.

JISC’s FAQ lists this:
“The concept cloud is a conceptual breakdown of the results, providing a list of conceptual suggestions to refine your query by. The concepts are extracted on-the-fly each time a search is run, typically by sampling the first 30-100 pages of the results set. It’s not a ‘tag’ cloud.

These concepts are names and phrases which are considered to be particularly relevant to the documents in which they reside, with the more prominent concepts appearing in a larger, bolder type.”

I tried doing a really broad search and tinkering with search-result order (e.g. date, relevancy) to see if it would change the cloud results, but it did not.

My only thought, then, is that they scan the prevalence of the phrases INSIDE the text as well. But of course, this is pure speculation. 😀
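
The FAQ leaves the weighting unstated, but a gap between cloud size and hit count is what one would expect from any score that rewards phrases common in the sampled results yet rare in the corpus as a whole. Below is a minimal sketch in Python, assuming a TF-IDF-style weighting. It is an illustration only and not Autonomy IDOL's actual method; the function name `concept_cloud`, the terms, and every count are hypothetical.

```python
import math
from collections import Counter


def concept_cloud(sampled_docs, corpus_doc_freq, corpus_size, top_k=5):
    """Rank terms by how distinctive they are within a sample of results.

    Assumption (not taken from JISC or Autonomy): a term scores highly when
    it appears in many of the sampled documents but in few documents of the
    corpus overall.
    """
    # Count each term once per sampled document.
    sample_df = Counter()
    for doc in sampled_docs:
        sample_df.update(set(doc))

    scores = {}
    for term, df in sample_df.items():
        # Share of the sampled documents that contain the term.
        sample_share = df / len(sampled_docs)
        # Rarity across the corpus; the +1 avoids division by zero.
        rarity = math.log(corpus_size / (1 + corpus_doc_freq.get(term, 0)))
        scores[term] = sample_share * rarity

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Hypothetical sample: the first few results of a search for "drapier".
sampled_docs = [
    ["drapier", "swift", "letters"],
    ["drapier", "wood", "halfpence"],
    ["drapier", "swift", "lettres"],
]

# Hypothetical corpus-wide hit counts, i.e. what clicking a term would return.
corpus_doc_freq = {
    "drapier": 40,
    "swift": 900,
    "letters": 20000,
    "wood": 3000,
    "halfpence": 150,
    "lettres": 5000,
}
corpus_size = 100000

for term, score in concept_cloud(sampled_docs, corpus_doc_freq, corpus_size, top_k=6):
    print(f"{term:10s} weight={score:.2f} corpus_hits={corpus_doc_freq[term]}")
```

Under these made-up numbers, “lettres” has far more corpus-wide hits than “swift” yet receives a lower weight, so it would be drawn smaller in the cloud. Whether JISC's platform behaves this way for the same reason remains, as noted, speculation.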
