
Thursday, January 15, 2009

Article download now available!

Since the public launch of BHL in February 2008, the BHL technical development team has received repeated requests for an interface that would allow users to download a PDF of an individual article within one of the digitized books in BHL. This is actually a fairly challenging task, as previously reported, but with the right technology and a little bit of luck we've devised a solution that is working very well in production and is receiving positive feedback. Here's how it works.

To download a PDF of this article, hold your mouse over the "Download/About this book" link and click "Select pages to download". From the resulting page, check the boxes next to pages 175 through 197, then click the "Next" button.

From here we ask you to do a little optional data entry by adding the article's title and author(s). We don't require this, but if you take the time to fill out this information we'll hold onto it and index it so that other users will be able to find the article in future searches. In this way your work will benefit the wider community of BHL users.
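As a rough illustration of how user-supplied titles and authors could be made searchable, here is a minimal sketch of a toy inverted index. The names `ArticleRecord` and `ArticleIndex` are hypothetical and are not taken from the BHL codebase; the production index is certainly more elaborate.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ArticleRecord:
    """User-contributed description of an article (all fields optional)."""
    title: str = ""
    authors: list[str] = field(default_factory=list)
    pages: tuple[int, int] = (0, 0)  # first and last selected page


class ArticleIndex:
    """Toy inverted index: lowercase word -> ids of matching records."""

    def __init__(self) -> None:
        self.records: list[ArticleRecord] = []
        self.words: dict[str, set[int]] = defaultdict(set)

    def add(self, record: ArticleRecord) -> int:
        """Store a record and index every word of its title and authors."""
        record_id = len(self.records)
        self.records.append(record)
        text = " ".join([record.title, *record.authors])
        for word in text.lower().split():
            self.words[word.strip(".,;:")].add(record_id)
        return record_id

    def search(self, query: str) -> list[ArticleRecord]:
        """Return the records that contain every word of the query."""
        ids = None
        for word in query.lower().split():
            matches = self.words.get(word, set())
            ids = matches if ids is None else ids & matches
        return [self.records[i] for i in sorted(ids or [])]
```

A record added by one user with `index.add(...)` can then be found by another user with `index.search("some title word")`.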

After clicking "Submit", your job is added to our queue and you'll get an e-mail notification that we've received your request. And then the tech-fun begins! We use an open source application called iText to generate the PDF, passing it the URL of the JPEG2000 image for each page, which is stored on Internet Archive's servers. iText converts that series of page images into a single PDF and writes the file out to BHL servers. Depending on server and network load, and the size of each article, this process can take anywhere from a few seconds to several minutes. Once it is complete, you'll receive another e-mail notifying you that your PDF is available for download. For the request above, you would receive the following:

We include a cover sheet in the PDF that lists value-added information like bibliographic metadata about the title and volume, as well as attribution for the library that contributed the volume and the organization that sponsored its digitization. We also include the OCR text for the article (this is an option you can select when choosing pages to include; the PDF above has the text included).
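The workflow described above (queue the request, confirm by e-mail, gather page image URLs, build the PDF with a cover sheet, notify again) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and function names, the URL pattern, and the e-mail subjects are all hypothetical, and `build_pdf` is a stand-in that returns a description of the document instead of calling iText.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DownloadRequest:
    email: str
    item_id: str               # hypothetical identifier for the scanned volume
    pages: list[int]           # page numbers the user ticked
    title: str = ""            # optional, user-supplied
    authors: str = ""          # optional, user-supplied
    include_ocr: bool = False  # the OCR-text option


@dataclass
class Outbox:
    """Stands in for an e-mail sender; it only records messages."""
    sent: list[tuple[str, str]] = field(default_factory=list)

    def send(self, to: str, subject: str) -> None:
        self.sent.append((to, subject))


def page_image_url(item_id: str, page: int) -> str:
    # Hypothetical URL pattern; the post does not give the real image locations.
    return f"https://images.example.org/{item_id}/page-{page:04d}.jp2"


def cover_sheet(req: DownloadRequest, volume_meta: dict[str, str]) -> list[str]:
    """Assemble the cover-sheet lines: volume metadata plus attribution."""
    lines = [
        f"Title: {volume_meta['title']}",
        f"Volume: {volume_meta['volume']}",
        f"Contributed by: {volume_meta['contributor']}",
        f"Digitization sponsored by: {volume_meta['sponsor']}",
    ]
    if req.title:
        lines.append(f"Article: {req.title}")
    if req.authors:
        lines.append(f"Author(s): {req.authors}")
    return lines


def build_pdf(req: DownloadRequest, volume_meta: dict[str, str]) -> dict:
    """Placeholder for the PDF-library step; returns a description, not PDF bytes."""
    return {
        "cover": cover_sheet(req, volume_meta),
        "images": [page_image_url(req.item_id, p) for p in req.pages],
        "ocr_included": req.include_ocr,
    }


class DownloadQueue:
    def __init__(self, outbox: Outbox) -> None:
        self.jobs: deque[DownloadRequest] = deque()
        self.outbox = outbox

    def submit(self, req: DownloadRequest) -> None:
        """Queue the job and confirm receipt by e-mail."""
        self.jobs.append(req)
        self.outbox.send(req.email, "Request received")

    def process_next(self, volume_meta: dict[str, str]) -> dict:
        """Build the oldest queued PDF, then notify the requester."""
        req = self.jobs.popleft()
        pdf = build_pdf(req, volume_meta)
        self.outbox.send(req.email, "PDF ready for download")
        return pdf
```

For the example in this post, `pages=list(range(175, 198))` selects pages 175 through 197, which is 23 page images. Processing jobs from a queue rather than inside the web request is what lets generation take anywhere from seconds to minutes without keeping the user waiting on a page.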

This feature is in production now, but it's still new and needs refinement, so we encourage users to try it out and provide feedback. We'll continue to improve the functionality based on requests and suggestions from users over time. Please either leave your comments below or submit them to our Feedback form.

6 comments:

Seeing as you're sticking a cover page on the article and it's being generated anyway, would it be too much bother to embed XMP metadata in the article so that reference management software like JabRef and Papers can parse this information automatically from the article? I know it's a tall order, but just to stick it on the radar somewhere.

I see that you mention on your site that you'll support OpenURL linking [http://www.biodiversitylibrary.org/Tools.aspx]. Any idea of a timeline, and how can we add your items to our knowledgebase so we can implement the linking?

Any further information or an API would be appreciated.

On a related note, I've tried finding information about how to access the information (using a machine API) for Open Content Alliance material, and have found the OAI-PMH interface for the Internet Archive, but can you give any further info on this too?


About BHL

The Biodiversity Heritage Library is a consortium of major natural history, botanical, and research libraries that cooperate to digitize and make accessible the literature of biodiversity held in their collections as a part of a global "biodiversity commons."