Category Archives: ePrints

I have learned so much this semester that it’s hard to know where to begin. But I guess I need to begin with metadata and taxonomy. Early in the semester I posted that “I read some articles this week that made me realize that much of what added value librarians provide to collections is in the form of metadata. I guess I always thought of librarians mainly as reference librarians or subject specialists, not as experts in classifying and indexing information.” But now I know (a little) about the value of good metadata and taxonomies, and I’ve learned about metadata standards such as Dublin Core, which help to standardize metadata usage across the web. I’ve learned about the semantic web, and the idea of linking data and building ontologies that describe the relations between concepts. I’ve learned about the contrasting benefits of controlled vocabularies versus “folksonomy” (i.e., tagging). And I’ve learned a little about harvesting metadata, using PKPHarvester to harvest metadata from several databases and data providers.
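To make the Dublin Core and harvesting ideas a little more concrete, here is a minimal Python sketch. It assumes a record is just a set of simple Dublin Core elements, and that a data provider exposes its metadata through OAI-PMH (the protocol harvesters such as PKPHarvester rely on). The function names, the sample values, and the example.org address are all made up for illustration; this is not how any of the systems in the course implement it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of the simple Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a small XML record from a dict of Dublin Core element -> value(s).

    Hypothetical helper: the <record> wrapper element is an assumption made
    for this sketch, not part of the Dublin Core standard itself.
    """
    root = ET.Element("record")
    for name, values in fields.items():
        # Dublin Core elements are repeatable, so accept one value or a list.
        if isinstance(values, str):
            values = [values]
        for value in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = value
    return ET.tostring(root, encoding="unicode")


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a data provider.

    base_url is whatever address the provider publishes; the one used
    below is a placeholder.
    """
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Made-up sample record with a repeated subject element.
sample = dublin_core_record({
    "title": "A sample item",
    "creator": "Example, Author",
    "subject": ["Reading", "Diaries"],
})
print(sample)
print(list_records_url("http://example.org/oai"))
```

The point of the sketch is that a harvester only needs two things from a data provider: an agreed set of element names, and a predictable address to request them from.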

I’ve also learned more about the Open Access movement, and open access initiatives; and about the issues of “freeing” information from behind paywalls. The main obstacle to this is that knowledge (and its associated data) is a currency that has value, and that making it freely available will necessitate basic structural changes in academia and in academic publishing.

Those structural changes include major changes in the role of the library and librarians in the production and preservation of knowledge. These changes present significant challenges to libraries in managing, curating and preserving digital materials and data. Librarians are increasingly expected to have the technical skills to design and select Content Management Systems for their libraries, to design, create, and maintain digital collections and digital repositories, and to train other librarians to do the same, often with limited technical staff and limited budgets. Open Source software is a boon to the small library or non-profit or museum that needs these types of functionality; but again that requires technical knowledge and skill on the part of the librarian to install, configure, and maintain operating systems and small in-house servers.

In order to gain those technical skills, I learned how to create several virtual machines with Linux stacks of various sorts, and to install and configure four different content management/digital repository software systems (Drupal, DSpace, Eprints, and Omeka). I created a sample digital collection in each one, and used the experience to compare each system’s strengths and weaknesses, and then to decide which sorts of digital collections (and environments) each system is best suited for. I then chose which system to use to host my digital collection, set up the system, entered the records and created the metadata, and then wrote a paper on the process, which will contribute toward my digital portfolio. In my case, I decided upon Drupal as the system I want to use for my digital collection. Drupal has a steep learning curve, and I learned a lot about it in a short time through the process of designing my collection, downloading and installing extra modules to provide the functions I needed, and troubleshooting the installation. I’m proud of how well my prototype digital collection works, but I already have plans to keep working on the prototype to get it working even better, to extend its functions, and to redesign certain features. I’m turning into a Drupal geek already.

Librarians are also expected to conduct outreach to their various communities, in order to make the services of the library more accessible and useful. This seems to be especially necessary for the humanities scholar community. We read many articles about the obstacles that keep humanities scholars from embracing digital initiatives, and from using digital resources (and those articles confirmed my own observations). We learned about how humanities scholarship, data, and workflows are vastly different from those of the scientific community; identified some of the obstacles that prevent humanities scholars from using (or producing) digital resources, including digital repositories; and read about several digital humanities initiatives, both in the U.S. and in Europe.

I think one of the most enjoyable aspects of the course for me was just the chance to see so many different examples of digital collections; to interact with my fellow students over their collections and interests; and to explore what is already being done, what is possible and useful. A find that was very helpful to me was the UK Reading Experience Database (RED). This is a database that contains much of the same sorts of data that I wanted to collect in my own digital collection, so it gave me some assurance that I was on the right track with my ideas.

Finally, I have included a slideshow of some screen shots from my project.

I think the question here is really: what skills does a librarian working in digital collections actually need? That is not easy to answer, because libraries vary so much in terms of staff size, budget, and training. I certainly feel much more confident about installing and configuring virtual machines (VMs) as a result of this course; but I wonder if it was the best use of class assignment time, especially since it could be so time-consuming. But if I were the sole librarian in a small non-profit or museum, with no technical staff, and I wanted to create and host a digital collection, the ability to create a VM from scratch would be important. I guess the question is, how common is this scenario, and how common will it be in the future? And what sorts of librarianship does the DigIN program want to support?

I don’t know if a middle ground might have worked better; maybe install only three VMs and spend more time on metadata and actually working with collections. I think DSpace and Eprints were sort of repetitive; maybe we could have been given a choice to install one or the other as our example of digital repository software. I think it was good to see Drupal because it is so ubiquitous, and to get an idea of the more technical end of the spectrum in digital collections management. And I think Omeka seems to represent the other end, the simple end of digital repository management.

A repository software package’s homepage ought to be attractive and clear, and it ought to inspire the users’ confidence. Some of these homepages do that better than others; but they are also geared toward very different audiences. In general I think each site is geared toward the user that could best benefit from it.

Eprints (http://www.eprints.org/): The homepage clearly states what the software is, the interface is clean, and the site provides a live demo as well as links to documentation, downloads, and a description of the principles of open access.

Omeka (http://omeka.org/): again, the homepage is well-designed, attractive, and provides clear links to all the information a user could want. It seems geared especially to draw in the new or uninitiated user (i.e. me). The user could be an individual rather than an institution.

DSpace (http://www.dspace.org/): There is a lot of white space on this page. For some reason I find that intimidating. There is a very clear statement about what it is, if you know what an “institutional repository application” is. The logo at the top identifies it as a “scholar space”. This is definitely geared toward an institutional user/IT professional who already knows what an institutional repository is. This might be more confidence-building if you are an institutional administrator looking to find a turnkey application.

Drupal (http://drupal.org/): This is a very busy page. But the tag line: “Come for the Software, Stay for the Community” is catchy. The page goes out of its way to show you how world-wide its scope is; you can tell that the software is geared for IT professionals and developers; they even have announcements about DrupalCons (a very geeky term for conventions). This is definitely a geek community and that means that I am not the sort of user they are targeting.

PKP (Public Knowledge Project http://pkp.sfu.ca/): This site also has a lot of projects besides the harvester software. It takes a while exploring and reading to figure out what this site is and what it contains. It’s not for the casual user, and it seems to already assume that the user is committed to open source and open access projects.

JHove (http://hul.harvard.edu/jhove/): This is a site full of technical jargon, definitely targeting the technical user, not the casual user or the repository administrator.

I guess each site has its advantages for the type of user it is seeking. As a librarian or a non-profit or museum curator, I find the first three more attractive and accessible.

I found Eprints to be more difficult to install than either DSpace or Drupal. I had problems creating repositories, and had to try four times before I got my repository irls675 installed and configured correctly. I still don’t know what went wrong. I eventually fixed the problems I was having by restoring a snapshot of my VM taken right after I installed Eprints but before I configured any repositories, and then going from that point. Once I got the repository irls675 configured and once I could access it via Firefox, things went more smoothly.

Not perfectly smoothly, however. Changing the description of the repository went as described. When I tried changing the logo, I had problems. I tried both methods mentioned in the instructions Bruce gave us. The first method did not work, but the second one did; it involved editing the file to reference the new file name for the logo, and then restarting the Apache server and rebuilding the repository. Then we were offered the choice whether to change the theme. I elected not to install the “glass” theme, since others indicated that it did not look very different. I also chose to go with the LOC subject classifications rather than build my own taxonomy.

I found adding records to be easy, although I would like the ability to customize the fields. I had to put some information in the abstract field that I would have liked to create some special fields for. I found the LOC classifications to be limiting.

At first, searches would not return any of my records, even though I could browse and see them. But after I re-indexed the repository, I could search all the fields, including the full text of the attached files. Success!!
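I don’t know how Eprints actually builds its search index, but the general idea behind “I can browse it but not search it” can be sketched with a toy example. The Python below is only an illustration under that assumption: browsing reads the stored records directly, while searching reads a separate word index that has to be rebuilt after records are added. All names and the sample record text are made up.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase word to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word].add(record_id)
    return index


def search(index, term):
    """Return the sorted ids of records whose text contains the term."""
    return sorted(index.get(term.lower(), set()))


# Made-up sample records; the index is built while only record 1 exists.
records = {1: "Reading diary of a governess"}
index = build_index(records)

# A new record is stored, so browsing (looking at `records`) shows it...
records[2] = "Letters describing a reading circle"

# ...but the old index has never seen it, so searching misses it.
stale = search(index, "reading")

# Re-indexing rebuilds the word index from everything now stored.
index = build_index(records)
fresh = search(index, "reading")

print(stale, fresh)
```

In this toy version the stale search only finds the first record, and the search after rebuilding finds both, which matches what I saw before and after re-indexing.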

I installed two programs from the ePrints Bazaar: the “Batch-edit eprints via Excel Export/Import – version 1.0.0” package, and the comments and notes package. To my delight, I can now add comments and notes to my records. I can’t figure out how to use the other package (the batch edit).