Monthly Archives: September 2011

Discuss either a) which module you decided to try from assignment 2 and how it enhances your collection; include if you like any problems or tips related to installation; or
b) now that you have some experience, how you feel overall about the suitability of Drupal for your collection.

It is clear that Drupal, in the hands of a trained Drupal programmer, would be a powerful and customizable tool for managing my digital collection. It does not seem to be designed, however, for the type of content I would like to include: many large, searchable text files (in PDF or other formats, especially files with specialized markup). By that I mean that the native content types don’t lend themselves to it (although I have not experimented with the “book” type). Of course, there are many modules that add that type of functionality, and I saw several that seemed designed to make RDF-type relations between nodes. But I was too intimidated by all the dependencies to try installing such modules, and the help material was too technical for a casual Drupal user to understand.

I did find an apparently simple module that added some necessary functionality to my site, i.e., the ability to search attached text files. The module is called, appropriately, search_files. Here is a screenshot of the kind of output the module produces:

Because this is a crucial function for my collection, I decided to install it, even though it requires several “helper applications” in Linux.

Helper Applications

In order to extract text, this module calls ‘helper apps’ such as cat and pdftotext. Drupal administrators can configure any helpers they like. Helper apps need to be installed on the server and need to be set up to print to stdout.
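To make the helper-app idea concrete, here is a minimal sketch in Python of how a program might hand a file to an external helper and read the extracted text back from stdout. It is not the module's actual code: the HELPERS table, the "%file%" placeholder, and the function names are assumptions made for this illustration. The only outside facts it relies on are that cat prints a file to stdout and that pdftotext writes to stdout when its output argument is "-".

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical helper table: file extension -> command that prints text to stdout.
# "%file%" is a placeholder token assumed for this sketch.
HELPERS = {
    ".txt": ["cat", "%file%"],
    ".pdf": ["pdftotext", "%file%", "-"],  # "-" sends pdftotext output to stdout
}


def missing_helpers(helpers=HELPERS):
    """Return the helper programs that are not installed on this machine."""
    return sorted({cmd[0] for cmd in helpers.values() if shutil.which(cmd[0]) is None})


def extract_text(path, helpers=HELPERS):
    """Run the helper registered for the file's extension and return its stdout."""
    command = helpers.get(Path(path).suffix.lower())
    if command is None:
        raise ValueError(f"no helper configured for {path}")
    # Substitute the real file name for the placeholder token.
    argv = [str(path) if part == "%file%" else part for part in command]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling missing_helpers() first is a quick way to find out whether the server already has the helpers installed before blaming the search itself.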

I assumed that my Linux installation might already have these applications available, although I could enable them separately if need be. So I downloaded and installed search_files-6.x-1.6.

I had no difficulty installing it or configuring it in Drupal. But it can’t search the PDF files I have attached, so I’m assuming I also need to install the helper applications in Linux.

UPDATE: As it turns out, this module worked in Drupal 5 but is broken in Drupal 6. Evidently it works in Drupal 7, so hopefully when I update my system I can get this working. Otherwise I will need to find a different CMS, because this search functionality is crucial.

This week, you might choose to comment on how suitable Drupal might be for your collection. Begin to develop some criteria you would use to judge how well an application such as Drupal meets the needs of your collection and its users. We will expand on this problem over the semester.

We have been reading about the need for humanities scholars to be able to use a digital collection with a degree of confidence about the nature and authority of the relations between objects, while the structure of those relations stays clear enough that the information added is objective rather than subjective. What I would really like to make is a database or collection or semantic web of all the texts (with attached full text) that George Eliot read or interacted with, with some measure of confidence about how influential those texts were. One could argue that there is a sort of taxonomy to how much she interacted with a text, in ascending order from hearing it read aloud, to reading it in translation, to reading it herself in the original language, to reviewing it, to editing it, to translating it from another language into English. These are all types of relations with a text. One can also argue that reading it more than once, or attesting to its influence in letters or in research notebooks, is also a measure of influence. I was reading about RDF, and that seems exactly the sort of inferential structure I want to be able to capture, starting with the simplest: what she read, with some sort of statement about her relation to the text, and a documentary page showing the authority for that relation. One can infer the direction of influence between texts according to who read what and when.
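The taxonomy and the RDF-style statements described above might be sketched as follows. This is only an illustration in plain Python, not real RDF: the class names, the field names, and the numeric ranks are my own assumptions, and the sample authorities are placeholders rather than real citations.

```python
from dataclasses import dataclass
from enum import IntEnum


class Interaction(IntEnum):
    """Types of relation to a text, in the ascending order described above."""
    HEARD_READ_ALOUD = 1
    READ_IN_TRANSLATION = 2
    READ_IN_ORIGINAL = 3
    REVIEWED = 4
    EDITED = 5
    TRANSLATED = 6


@dataclass(frozen=True)
class Statement:
    """One subject-predicate-object statement plus its documentary authority."""
    subject: str            # the person
    predicate: Interaction  # the type of relation
    obj: str                # the text
    authority: str          # where the evidence for the relation is documented


def strongest_interaction(statements, person, text):
    """Return the highest-ranked relation recorded between a person and a text."""
    levels = [s.predicate for s in statements
              if s.subject == person and s.obj == text]
    return max(levels) if levels else None


# Illustrative statements only; the authorities are placeholders.
EXAMPLE = [
    Statement("George Eliot", Interaction.READ_IN_TRANSLATION,
              "Les Confessions", "placeholder authority A"),
    Statement("George Eliot", Interaction.READ_IN_ORIGINAL,
              "Les Confessions", "placeholder authority B"),
]
```

Because the relation types are ordered, a question like "what is the deepest engagement on record with this text?" becomes a simple maximum over the statements.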

Because eventually I would want this to be part of a larger database of “Literary interlocutors,” I’m having trouble figuring out whether the key entity in this collection is texts or a person. I envision the normalized tables in a database as a table of persons, a table of texts, and a table of links between the two, in the form of “GE read Rousseau’s Les Confessions, in French, in 1834, according to these authorities, and here is a link to that edition of Les Confessions in French (or perhaps a digital image), plus a searchable English translation.” I have been thinking that I needed to include all the standard metadata for each text in each entry, but that seems a waste of space. The new and useful information to be collected is the table of links, so all I really need to capture is what I have underlined. Each underlined phrase is a field in my collection.
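The three normalized tables could be sketched like this, using Python's built-in sqlite3 module. The table and column names are assumptions chosen for the illustration, and the single sample row simply restates the example sentence above, with a placeholder where the authority would go.

```python
import sqlite3

# Hypothetical schema: persons, texts, and the table of links between the two.
SCHEMA = """
CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE texts (id INTEGER PRIMARY KEY, author TEXT, title TEXT NOT NULL);
CREATE TABLE links (
    person_id   INTEGER NOT NULL REFERENCES persons(id),
    text_id     INTEGER NOT NULL REFERENCES texts(id),
    relation    TEXT NOT NULL,   -- e.g. 'read', 'reviewed', 'translated'
    language    TEXT,
    year        INTEGER,
    authority   TEXT,            -- citation documenting the relation
    edition_url TEXT             -- link to the edition or a digital image
);
"""


def build_demo_db():
    """Create an in-memory database holding the one example from the text."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO persons (id, name) VALUES (1, 'George Eliot')")
    db.execute("INSERT INTO texts (id, author, title) "
               "VALUES (1, 'Rousseau', 'Les Confessions')")
    db.execute("INSERT INTO links VALUES "
               "(1, 1, 'read', 'French', 1834, 'placeholder authority', NULL)")
    db.commit()
    return db


def readings_for(db, person_name):
    """List every recorded relation between one person and any text."""
    rows = db.execute(
        """SELECT t.author, t.title, l.relation, l.language, l.year
           FROM links l
           JOIN persons p ON p.id = l.person_id
           JOIN texts t ON t.id = l.text_id
           WHERE p.name = ?
           ORDER BY l.year""",
        (person_name,),
    )
    return rows.fetchall()
```

With this layout, the standard metadata for a text is stored once in the texts table, and each new piece of evidence adds only a row to links.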

Any content management system I use for my collection will need to be able to search and manage large attached text files in a variety of formats, to query the collection of these files with a full-text search, and to offer a faceted search that narrows the query results by type of relation, by subject, by language, by year, or by type of text file. I also want to be able to widen the search if necessary, though, across subjects, dates, etc. The idea is to be able to use this collection to specify a group of texts to search, and to be able to document the relationships and direction of influence between them. I would love to be able to actually graph the connected nodes in some sort of network display and to assess the degree of influence.
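As a rough sketch of what faceted narrowing and widening mean in practice, the following Python treats each record as a dictionary of facet values. The record fields and the sample data are invented for the illustration, and a real system would combine this with a full-text index, which is not shown here.

```python
from collections import Counter

# Invented sample records; each key is a facet a user could filter on.
SAMPLE = [
    {"title": "Text A", "relation": "read", "language": "French", "year": 1850},
    {"title": "Text B", "relation": "reviewed", "language": "English", "year": 1850},
    {"title": "Text C", "relation": "read", "language": "German", "year": 1851},
]


def narrow(records, **facets):
    """Keep only the records that match every requested facet value."""
    return [r for r in records
            if all(r.get(name) == value for name, value in facets.items())]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(r[facet] for r in records if facet in r)
```

Narrowing is adding a facet, as in narrow(SAMPLE, relation="read", year=1850), and widening is simply dropping one again, as in narrow(SAMPLE, relation="read"). The counts from facet_counts are what a faceted interface would show next to each choice.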

These studies about the online searching behavior of humanities scholars versus science scholars are important information that I need to take into account when designing my database. The notes and bibliographic entries to this article are fabulous signposts to other work about the digital behavior of humanities scholars.

As one who has taught science and engineering literature and has some sense of the culture and values of the sciences, I was struck by the distinctive characteristics of the humanities’ culture, values, and expectations, as against those of the sciences, as they appeared in our study.

I comment on these differences in Reports #1, 4, and 5, in particular. Much of the database world has been developed on the science model. In Report #4, I drew implications from the study for the design of databases and other information resources for humanities scholars. Librarians may find the points in Report #4 of value for their future collection development in reference departments or main collections.

In general, these data show:

The logical, engineering-oriented design of online systems is generally not well matched with the talents of the humanities scholar.

The character of humanities search terms varies considerably from that of the sciences. Humanities thesauri should probably be designed on different principles from conventional thesauri, and humanities search interfaces should be designed differently as well.

Here is the link to the proposed American Studies Information Community at the University of Virginia, which sounds very much like the kind of portal I would like to develop for George Eliot studies.

An Information Community is a group of scholars, students, researchers, librarians, information specialists and citizens from similar or dissimilar fields, whose common link is a shared information need. This information need can be oriented around a subject, a field, a methodology, or a data type. The information can include text, data, digitized media, images, and formal and informal scholarly exchanges of ideas. Information Communities exist as a medium for bringing people together and making them aware of opportunities and resources. Community is fostered by personal communication, shared interests, shared research materials, shared tools, and shared standards. Information Communities add value to information, and offer opportunities for using information in new and different ways. Activities of the community can include creation of web-based materials, development of portable tools for enhancing access to the materials, and managing of conferences and publications. Information Communities foster innovation and spark new areas of research, and usually result in a tangible body of knowledge for consumers.

I was excited to read this article for class, because it describes the kind of collaborative digital environment I want to create for George Eliot studies. Now I have some models to examine, and some protocols to follow!

As an example of a specialized service, the University of Virginia’s proposed American Studies Information Community will draw on harvesting protocols to bring together disparate types of information (text, data, media, images) for a community, defined as a group of scholars, students, researchers, librarians, information specialists, and citizens with a common interest in a particular thematic area. The project is being undertaken collaboratively with other institutions and content providers (e.g., Thomas Jefferson Foundation, Virginia Tech University, and the Smithsonian National Museum of American Art). The University of Virginia describes these information communities as “learning and teaching environments in which subject-driven websites are developed around print and digital versions of our collections and the teaching interests of our faculty members . . . Information communities will foster interdisciplinary and collaborative research and publication amongst scholars with common interests.”2

This access model is interesting because it reflects several trends that are also evident in the broader landscape. The new service will take advantage of a distributed collection model and a range of partners. The descriptive techniques will reflect enhanced attributes appropriate to the subject area and the diverse formats in the distributed collections. Analytic tools will be incorporated to add value to the content and to stimulate collaboration. Perhaps most significant, the access system is explicitly designed to serve a social role as a catalyst for an interdisciplinary community, a far more intrusive role than is provision of access alone.