As reported by the BBC. This time it’s the oldest known manuscript of the Bible, the Codex Sinaiticus, which has been divided among several countries since its 19th-century discovery. The digital version is expected to be online in 2009 and will let not just scholars but anyone view the entire text together. The article indicates that a translation of the original Greek will also be available.

This brings up several issues. First, when historical texts like this are digitized, they are often in languages unfamiliar to many potential users. Even someone who speaks modern Greek is likely to have difficulty with a text in fourth-century Greek, which, moreover, does not follow the modern convention of leaving spaces between words. Thus, although digitization does improve access, it is not sufficient on its own for many users.

Translations are therefore necessary to make these texts truly available for wide use. This raises further questions: Who will do the translation? Into what language(s)? Will the translations be copyrighted, and who will own the copyright? For a text like the Bible, where the wording of a translation can significantly affect interpretation and thereby religious understanding, the question of who will translate and how is especially critical. In the case of the Codex Sinaiticus, which includes two additional books in the New Testament and has other important textual differences from other manuscripts, this may be a major consideration.

The article doesn’t say how the digital version will be made available (through a website, on a CD-ROM, or by some other means), nor does it indicate the image format, the methods of searching and viewing, and so on; these details may still be undecided. It will be interesting to see how the digitization of the Codex is carried out and presented, and what effect the availability of this text has on Biblical studies and indeed on religion generally.

The Organization for Transformative Works (http://transformativeworks.org/) is a nonprofit organization created by fans to support the creation and distribution of fanfiction, fanart, and similar transformative works. OTW’s founders believe that current law permits these sorts of works to be created and freely distributed. They have already established an online, international, peer-reviewed scholarly journal, Transformative Works and Cultures, whose first issue appeared in September 2008.

Of interest with regard to collections is that OTW is also creating open-source archive software for hosting such works. OTW itself will run a multifandom archive, and the software will be available for others to use as well. The project is slightly behind its projected timeline (they had hoped for a public launch of the archive in August 2008, and it hasn’t happened yet), probably because this is original software developed by and for OTW rather than an out-of-the-box package that might not suit the specific needs of fan creators and their works.

Copyright is always an issue to be considered in creating digital archives, and the OTW holds the position that fanworks fall into the category of “fair use.” It will be interesting to see what happens once the archive is in place and fanfiction (and eventually fanart, fanvids, etc.) is made available through it. A reconsideration of what exactly copyright protects may be in order.

The John Rylands University Library at the University of Manchester has more than 4,600 images in its digital library, including several hundred images of papyri (http://rylibweb.man.ac.uk/insight/papyrus.htm). The largest groups are in Coptic and Greek, but there are also some Demotic and hieroglyphic texts. The library plans to add more digitized materials over time, since this is only a fraction of its holdings (none of its Arabic-language papyri have yet been digitized, for instance).

Of interest is the fact that there are two methods for accessing the materials: a browser, and a downloadable client which provides greater functionality in searching and viewing. The client requires a username and password, but the page that explains the two options also provides a public username and password, so access is not restricted to University of Manchester users. This option strikes me as an excellent one, allowing individual users to choose what will be most effective for them.

The same page also gives overall copyright information on the collection (with a note that individual images may have different copyright restrictions). In general, private study and educational uses are permitted, although the latter must acknowledge the university, and boilerplate acknowledgment language is provided. Other uses require written permission and usually fees, and links to the request forms appear on the page.

I’m finding the differences in how scholarly digital image collections are organized very interesting. The best of them have good searchable metadata, easy-to-use interfaces, and images that can be viewed at different resolutions. These are all obviously things to think about when creating or revamping such a collection.

The John Rylands Library is continuing to put additional rare and fragile manuscripts online (not just papyri). There’s a recent article from the Telegraph that indicates that a 14th-century recipe book is among the items to be digitally photographed and added to the collection in the next year. It’s really quite astonishing (and wonderful!) to see all of this work being done.

Oxford University has digital facsimiles of more than 80 medieval manuscripts scanned and online here: http://image.ox.ac.uk/. The images are copyrighted but personal research use is permitted.

The digitization project proceeded in several phases. First a number of Celtic MSS were digitized, then additional medieval manuscripts deemed particularly valuable, useful, and/or fragile. These two phases were carried out with government funding. After a server failure, the project was taken over by the Oxford University Library’s automation department, which also redesigned the website; it is now managed by the Oxford Digital Library.

During the site redesign, it was discovered that some of the MSS are missing images. A statement on the site indicates that the library is working to determine what is missing and what resources are needed to correct the problem.

Several potential issues with creating digital collections are thus highlighted. Funding may be temporary, and insufficient to digitize as much material as might be desired (by no means all of the medieval MSS held by Oxford colleges are included). If problems are discovered later, the funding may no longer be available to correct them. The technology may also fail, as happened with the server that originally housed this collection. As a result, the material was moved and now falls under the auspices of a different body.

The copyright restrictions mean that an individual may download a single copy of each image for private personal use (images may also be displayed in an academic lecture), but another website may only link to an image’s URL, not reproduce the image itself. This is a reasonable restriction under copyright law, but if the images were later moved again, those links would break and access would become difficult. That is simply something to keep in mind.

The descriptions of the MSS (i.e., the metadata) are quite limited and not really searchable; the MSS are listed by college and shelfmark, with brief descriptions in the browsing area and longer ones when you click through to a specific MS. Medievalists are used to such things, though, so it’s less of a limitation than would be the case for born-digital items.

This post is simply some musings about what collections are, and what’s necessary to make them valuable.

One fascinating thing about digital/online collections is how incredibly varied they can be. Text-based, still images, images of texts, sounds, videos… no one’s managed to capture and transmit touch or taste or smell, so far as I know, but sight has long been the primary sense upon which we rely for information transmission, and sound the secondary one. (I’m thinking long-term and long-distance here, as opposed to in-person communication.)

So the medium isn’t key to defining a collection, though a digital collection is by definition digitized in some manner.

To call something a collection does imply that a number of different items are included. How many? That can vary tremendously. Dozens? Hundreds? Thousands? Millions? Perhaps that doesn’t matter as much as the fact that once a collection has over perhaps fifty items, what becomes important is how to find a given desired item. Searching is key, especially when the searcher is not already familiar with the collection’s contents.

Any search relies on metadata of some sort. The conclusion I am reaching is that coming up with metadata categories, and then terms, is absolutely key to making collections of any significant size actually usable and useful.

All the information in the world is useless if it’s piled in a random heap.
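To make the point about metadata a little more concrete, here is a minimal Python sketch of my own (not how any collection mentioned here actually works). It assumes a handful of hypothetical records tagged with terms from a few made-up metadata fields (`type`, `war`, `language`), builds an index from each (field, term) pair to the items carrying it, and answers a query by keeping only the items that match every term requested.

```python
from collections import defaultdict

# Hypothetical records: titles, fields, and terms are invented for illustration.
ITEMS = [
    {"id": 1, "title": "Recruiting poster", "type": "poster", "war": "WWI", "language": "English"},
    {"id": 2, "title": "Field postcard", "type": "postcard", "war": "WWI", "language": "French"},
    {"id": 3, "title": "Bond drive poster", "type": "poster", "war": "WWII", "language": "English"},
]

def build_index(items, fields):
    """Map each (field, lowercased term) pair to the set of item ids tagged with it."""
    index = defaultdict(set)
    for item in items:
        for field in fields:
            term = item.get(field)
            if term is not None:
                index[(field, term.lower())].add(item["id"])
    return index

def search(index, **criteria):
    """Return sorted ids of items matching every field=term pair given."""
    result = None
    for field, term in criteria.items():
        ids = index.get((field, term.lower()), set())
        # Intersect: an item must match all criteria to stay in the result.
        result = ids if result is None else result & ids
    return sorted(result) if result is not None else []

index = build_index(ITEMS, ["type", "war", "language"])
print(search(index, type="poster", war="WWI"))  # [1]
print(search(index, language="english"))        # [1, 3]
```

The sketch lowercases terms, so "Poster" and "poster" match, but it has no way of knowing that, say, "placard" means the same thing; an item tagged with a different term simply won’t turn up. That is why settling on the categories and terms matters so much.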

The Disruptive Library Technology Jester has a post (http://dltj.org/article/jpeg2000-survey/) linking to a JPEG2000 survey being carried out by David Lowe, Preservation Librarian at the University of Connecticut. Chapter 3 in the Lesk textbook is on “Images of Pages” and chapter 4 on “Multimedia Storage and Retrieval,” so anyone interested in collections of digital images may want to look into this. The DLTJ post also links to some related posts and discussions of JPEG2000.

I’ve never been much of an image person myself (despite having taken art classes in high school and as an undergrad), so my understanding of the different image file types is pretty minimal. This is probably an area I need to read up on, at least a little. Ideally, digitized images would be faithful to the original, have a small file size, and carry metadata within the file itself; I suspect, though, that it’s a case of “choose any two of the three.”

The Dead Sea Scrolls are going to be digitized! (Article in the Guardian.) This will be a great boon to ancient historians and historians of religion. There has been a tremendous amount of scholarly controversy (at times even with political implications) since their discovery, and because the fragments are extremely fragile, very few people have been allowed to examine the originals. The digitization will involve multiple very high-resolution photographs taken with regular light, infrared, and multispectral cameras. This will actually be better than examining them in person, since it will let researchers see ink that has faded too badly to be seen by the naked eye. Wow.

It will be about five years before the project is completed, but that’s a blink of the eye, historically. I’ll be interested to see how they structure the finding aids, and how readily available these images will be (will there be a fee, for instance?).

According to the first reading (Lee 2000), one of my favorite sites to use with students writing research papers on medieval history does indeed qualify as a collection: the Internet Medieval Sourcebook. This site collects many medieval texts, mostly short selections in translation, mostly from public-domain works and occasionally donated by contributors. The texts are organized thematically (with sidebar links to each of these general topics), and then topically, chronologically, or both within each theme. The collection is aimed at students studying this general topic, and indirectly at the instructors teaching them. Although the website resides on Fordham University’s servers, it is freely accessible to outsiders and also incorporates links to additional online sites and resources. One neat thing about this site is that it has been around since 1996 (so it’s positively ancient in web terms) and was created by someone who was at that time a graduate student in history, and who was to some extent making it up as he went along. So in itself it’s something of a historical artifact as a collection.

I don’t have a lot of experience finding online digital collections, so I decided to look at the library website of the University of Minnesota, where I did my PhD in history. I knew they had a number of special collections and wondered whether any had been digitized. It turns out there’s a really nifty online collection of posters and postcards from WWI and WWII, a cooperative project between the university and the Minneapolis Public Library, which also owns many of these items: “A Summons to Comradeship” – World War I and II Posters and Postcards. Each image is described by a set of metadata, which can be used to search the whole collection. One drawback is that there appears to be no way to simply browse except by running a search that returns all the items. Search results appear as a page of thumbnails, each captioned with a truncated title, so a user can’t just flip through readable-sized images to see what might be interesting; you have to click on the thumbnails one at a time to get a larger image. Nevertheless, this is a fabulous and well-organized resource.

Clearly, historical collections are the ones I’m most directly interested in. A friend pointed me to the Bethlehem Digital History Project, which offers digitized images of primary source materials (texts, art, etc.) from Bethlehem, Pennsylvania, between 1741 and 1844, along with transcriptions, translations, and some contextual information. There are also some more modern items. From an educational perspective, the transcriptions and translations of the older documents are very helpful, since students might find the original paleography and orthography difficult; at the same time, the photographs of the originals make the collection useful for more professional-level researchers as well. The items are reasonably well identified (even including the locations of the original documents), but the collection is not searchable in the way the WWI and WWII poster collection is; one essentially has to browse, with the documents divided by general content and then subdivided by type.

Lee, Hur-Li. 2000. What is a collection? Journal of the American Society for Information Science 51 (12): 1106-13.