Monday, October 31, 2011

David Crotty of Oxford University Press made the headline-grabbing charge that PLoS will this year be more profitable than Elsevier. I responded skeptically in comments, and Kent Anderson, a society publisher, joined in to support David. Comments appear to have closed on this post, but I have more to say. Below the fold I present a more complete version of my analysis and respond to David's objections.

Saturday, October 29, 2011

The library at the University of British Columbia invited me to speak during their Open Access Week event. Thinking about the discussions at the recent Dagstuhl workshop, I thought it would be appropriate to review the problems with research communication and ask to what extent open access can help solve them. Below the fold is an edited text of the talk with links to the sources.

Wednesday, October 19, 2011

Duane Dunston has posted a long description of the use of digital signatures to assure the integrity of preserved digital documents. I agree that maintaining the integrity of preserved documents is important. I agree that digital signatures are very useful. For example, the fact that the GPO is signing government documents is important and valuable. It provides evidence that the document contains information the federal government currently wants you to believe. Similarly, Eric Hellman's suggestion to use signatures to verify that Creative Commons licenses have been properly applied is a valuable one.

However, caution is needed when applying digital signatures to the problem of maintaining the integrity of digital documents in the long term. Details below the fold.
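
For readers unfamiliar with the mechanics, the sign/verify round trip can be sketched with textbook RSA over toy parameters. This is purely illustrative: the tiny key, the reduction of the digest mod n, and the absence of padding make it useless for real security, where keys of 2048 bits or more and a padding scheme are required.

```python
import hashlib

# Toy RSA parameters (textbook example; real keys are >= 2048 bits).
p, q = 61, 53
n = p * q            # modulus, 3233
e = 17               # public exponent
d = 2753             # private exponent: (e * d) % ((p-1)*(q-1)) == 1

def digest(doc: bytes) -> int:
    # Reduce the SHA-256 digest mod n so it fits the toy modulus.
    # (A real scheme never does this; it uses a padded full-size digest.)
    return int.from_bytes(hashlib.sha256(doc).digest(), "big") % n

def sign(doc: bytes) -> int:
    # The signer uses the private exponent d.
    return pow(digest(doc), d, n)

def verify(doc: bytes, sig: int) -> bool:
    # Anyone holding the public key (n, e) can check the signature.
    return pow(sig, e, n) == digest(doc)

doc = b"preserved document"
sig = sign(doc)
print(verify(doc, sig))                 # a valid signature checks out
print(verify(doc, (sig + 1) % n))       # a corrupted signature does not
```

The long-term difficulty, of course, is not the arithmetic but everything around it: key compromise, algorithm obsolescence, and the need to keep the verification keys themselves available and trustworthy over decades.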

Friday, October 14, 2011

Two postings by Ann Okerson to the "liblicense" mail alias about Wiley's latest financial report reveal a detail about the way Wiley reports its financial data that I had missed, which means I may have somewhat over-estimated its profitability in my post on What's Wrong With Research Communication?. Follow me below the fold for the details.

One new feature that ACM will roll out in the fall will enable authors to obtain a special link for any of their ACM articles that they may post on their personal page. Anyone who clicks on this link can freely download the definitive version of the paper from the DL. In addition, authors will receive a code snippet they can put on their Web page that will display up-to-date citation counts and download statistics for their article from the DL.

Kun Qian of the University of Magdeburg addressed the fact that the OAIS standard does not deal with security issues, proposing an interesting framework for doing so.

Manfred Thaller described work in the state of North Rhine-Westphalia to use open source software such as iRODS to implement a somewhat LOCKSS-like distributed preservation network for cultural heritage institutions using their existing storage infrastructure. Information in the network will be aggregated by a single distribution portal implemented with Fedora that will feed content to sites such as Europeana.

Felix Ostrowski of Humboldt University, who works on the LuKII project, discussed an innovative approach to handling metadata in the LOCKSS system using RDFa to include the metadata in the HTML files that LOCKSS boxes preserve. Unlike the normal environment in which LOCKSS boxes operate, where they simply have to put up with whatever the e-journal publisher decides to publish, LuKII has control over both the publisher and the LOCKSS boxes. They can therefore use RDFa to tightly bind metadata to the content it describes.
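
To illustrate the idea, here is a sketch of how metadata embedded as RDFa attributes travels inside the HTML file itself, and so survives whatever happens to external metadata records. The page, the vocabulary choice, and the minimal extractor below are my own illustration, not LuKII's actual implementation.

```python
from html.parser import HTMLParser

# Hypothetical article page with Dublin Core terms embedded as RDFa,
# so the metadata is bound into the HTML a LOCKSS box preserves.
PAGE = """
<html>
  <body vocab="http://purl.org/dc/terms/">
    <h1 property="title">An Example Article</h1>
    <span property="creator">F. Ostrowski</span>
    <span property="issued" content="2011-10-19">October 2011</span>
  </body>
</html>
"""

class RDFaScraper(HTMLParser):
    """Minimal extractor: collects property/value pairs from RDFa markup."""
    def __init__(self):
        super().__init__()
        self.triples = {}
        self._prop = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "property" in a:
            if "content" in a:            # value given explicitly
                self.triples[a["property"]] = a["content"]
            else:                          # value is the element's text
                self._prop = a["property"]

    def handle_data(self, data):
        if self._prop and data.strip():
            self.triples[self._prop] = data.strip()
            self._prop = None

s = RDFaScraper()
s.feed(PAGE)
print(s.triples)
```

A real consumer would use a full RDFa processor rather than this toy parser, but the point stands either way: as long as the HTML file survives, so does its metadata.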

My take on the preservation issues of linked data is as follows.

Linked data uses URIs. Linked data can thus be collected for preservation by archives other than the original publisher using existing web crawling techniques, such as the Internet Archive's Heritrix. Enabling multiple archives to collect and preserve linked data will be essential; some of the publishers will inevitably fail for a variety of reasons. Motivating web archives to do this will be important, as will tools to measure the extent to which they succeed.

The various archives preserving linked data items can republish them, but only at URIs different from the originals, since they do not control the original publisher's DNS entry. Links to the originals will not resolve to the archive copies, removing them from the world of linked data. This problem is generic to web archiving. The Memento technology, which is on track to become an IETF/W3C standard, enables a solution. It will be essential that both the archives preserving linked data and the tools accessing it implement Memento. There are some higher-level issues in the use of Memento, but as it gets wider use they are likely to be resolved before they become critical for linked data.

Collection using web crawlers and re-publishing using Memento provide archives with a technical basis for preserving linked open data, but they also need a legal basis. Over 80% of current data sources do not provide any license information; these sources will be problematic to archive. Even data sources that do provide license information may be problematic: their license may not allow the operations required for preservation. Open data licenses do not merely permit and encourage re-use of data; they permit and encourage its preservation.
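
The Memento access pattern above amounts to datetime negotiation over HTTP: a client asks a TimeGate for the archived copy of an original URI closest to a given instant. A sketch of constructing such a request follows; the TimeGate URL and the linked-data URI are made up for illustration, and a conforming TimeGate would answer with a redirect to the memento, which carries a Memento-Datetime header.

```python
from email.utils import formatdate
from urllib.request import Request

# Hypothetical TimeGate in front of an archive's linked-data holdings.
TIMEGATE = "https://timegate.example.org/"
original = "http://dbpedia.org/resource/Berlin"

# Memento-style datetime negotiation: the Accept-Datetime header asks
# for the memento closest to the given instant (an RFC 1123 date, GMT).
when = formatdate(timeval=1320000000, usegmt=True)
req = Request(TIMEGATE + original,
              headers={"Accept-Datetime": when})

# The request is built but not sent here; sending it to a real TimeGate
# would yield a 302 whose Location points at the archived copy.
print(req.get_header("Accept-datetime"))
```

The point for linked data is that this machinery lets a link to the original URI be resolved against an archive's copy when the original publisher has failed, which is exactly the gap identified above.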