Islandora CLAW

Islandora CLAW (formerly known as Islandora 7.x-2.x) is the next generation of Islandora. Still in development, this major upgrade will be compatible with Drupal 8 and Fedora 4. For more details, please check out the following resources:

Latest CLAW News

Create a roadmap for the future of the Islandora platform, including tools and strategies for migration

The Technical Advisory Group (TAG for short) has been working on a prioritized list of upcoming features and improvements for Islandora CLAW to help guide its development. We're aiming to release Islandora CLAW (and drop the CLAW codename) after 7.x-1.12 is released. Once that happens, this roadmap will be used to set sprint goals and other development priorities. We're opening up the roadmap to review by the entire community, and are asking for your feedback. You can leave comments either in this Google Doc or in the individual GitHub issues. We've also ranked these issues using a GitHub project.

Priority was agreed upon following the general rule of providing "must-have" features before migration. In other words, features which, if missing, would prevent someone from adopting the software should receive higher priority. Documentation and examples involving migrations ranked first, with multi-site support following up second. Also on the list are features built around the Fedora API specification, UI/UX improvements, new derivatives, and a lot more.

If there's anything you think is missing that's high priority, feel free to leave a suggestion in the Google Doc, or create an issue on GitHub and give it the Roadmap label. This is your chance to help shape the development of the project, so if you really need something before migrating, this is a good opportunity to have your voice heard. After this review, the finalized list of features will be presented to the Board of Directors for approval. Once approved, the roadmap will be prominently displayed on our website to give people a sense of the direction of the software.

The Islandora community has just wrapped up a very successful sprint dedicated to migrating from 7.x to Islandora CLAW. We at the Islandora Foundation want to give a big thanks to everyone who put in time during this sprint, as well as the organizations who lent us their talent on the company dime. We also want to give a special shout-out to the Metadata Interest Group, who collectively put in a ton of time and tackled some intense questions for those who want to use a migration to Islandora CLAW as a chance to do metadata cleanup. Over the course of two weeks, we managed to accomplish a lot. As of right now you can:

Migrate over objects based on content type

Migrate ALL the datastreams (except AUDIT, which is a special case)

Extract metadata from any XML datastream and make it a Drupal field

Model authorities such as people, organizations, and subjects

Convert MODS to CSV using Cara Key's (LSU) XML2CSV tool
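
To give a feel for what "extract metadata from an XML datastream" and "convert MODS to CSV" involve, here is a minimal Python sketch. It is not the migration tool or the XML2CSV tool mentioned above; the sample record, the field names, and the XPath mapping are all assumptions made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical sample record; real MODS datastreams are usually much richer.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Harbour at dusk</title></titleInfo>
  <name><namePart>Doe, Jane</namePart></name>
  <originInfo><dateIssued>1923</dateIssued></originInfo>
</mods>"""

# Hypothetical mapping from a target field name to an XPath in the record.
FIELD_XPATHS = {
    "title": "mods:titleInfo/mods:title",
    "creator": "mods:name/mods:namePart",
    "date_issued": "mods:originInfo/mods:dateIssued",
}


def extract_fields(mods_xml):
    """Return a dict of field name -> text for one MODS record."""
    root = ET.fromstring(mods_xml)
    row = {}
    for field, xpath in FIELD_XPATHS.items():
        # Join repeated elements with a pipe so one cell can hold many values.
        values = [el.text.strip() for el in root.findall(xpath, MODS_NS) if el.text]
        row[field] = "|".join(values)
    return row


def records_to_csv(records):
    """Serialize a list of MODS XML strings as CSV text, one row per record."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(FIELD_XPATHS), lineterminator="\n")
    writer.writeheader()
    for xml_text in records:
        writer.writerow(extract_fields(xml_text))
    return out.getvalue()


print(records_to_csv([SAMPLE_MODS]))
```

In a real migration each extracted value would be mapped onto a Drupal field rather than a CSV column, but the XPath-per-field idea is the same.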

There's still some work left to do, though. On the horizon for the near term, be on the lookout for:

Migrating the AUDIT datastream

Modeling more/different types of authorities

Examples of extracting authorities from FOXML

A workflow for those who want to use OpenRefine to reconcile linked data authorities during the migration process

Moving forward, this is an excellent chance for people to try out the tools we're developing and point them at their existing repositories. Our migration tool, originally developed by Jared Whiklo (University of Manitoba), is available on GitHub. And if you want to give modeling authorities a go, check out our new controlled_access_terms module, which was made by Seth Shaw (University of Nevada Las Vegas). If anyone has feedback, issues, or questions, please feel free to create an issue or post a message on the mailing list.

Here's a full list of all the people and organizations who helped make this once-considered-impossible feat a reality:

Benjamin Rosner - Barnard College, CU

Pat Dunlavey - Born-Digital

Andrija Sagic - Library "Milutin Bojic"

Ann McShane - Library Company of Philadelphia

Cara Key - Louisiana State University

Jason Peak - Louisiana State University

Jonathan Green - LYRASIS

Rachel Leach - Mount Holyoke College

Mark Jordan - Simon Fraser University

Adam Soroka - Smithsonian Institution

Rachel Tillay - Tulane University

Pete Clarke - University College Dublin

Jared Whiklo - University of Manitoba

Mike Bolam - University of Pittsburgh

Seth Shaw - University of Nevada Las Vegas

Paul Pound - University of Prince Edward Island

Rosie Le Faive - University of Prince Edward Island

Nat Kanthan - University of Toronto Scarborough

Marcus Barnes - University of Toronto Scarborough

Carolyn Moritz - Vassar College

Thanks to everyone involved! And if you missed out on this sprint, don't fret. We'll be holding another Islandora CLAW community sprint later this year after Islandora 7.x-1.12 is released.

Late last year, a working group of the Confederation of Open Access Repositories (COAR) released a report with recommendations to adopt "new technologies, standards, and protocols that will help repositories become more integrated into the web environment and enable them to play a larger role in the scholarly communication ecosystem." Islandora's own Institutional Repository Interest Group took up the report and measured Islandora against it, looking at both the current functionality available in Islandora 7.x, and how we can best shape Islandora CLAW to meet these recommendations for the future (complete with issues in the CLAW GitHub so we can track our progress). They have shared their own results, written up by convenor Bryan Brown:

#1: Exposing Identifiers

The thrust of the recommendation here seems to be implementing the best practices listed at http://signposting.org/ regarding typed HTTP links. I’m not sure what Islandora 7.x is doing in terms of typed HTTP links, but I’m assuming nothing beyond whatever Drupal 7 does by default. It could certainly be doing more, but there’s a lot to chew on in the best practices in terms of deciding what actually needs to be done, and how this should be done for different types of objects. CLAW, being a linked data application that operates primarily via HTTP, should definitely be doing these things. I’ve made a use case for this at https://github.com/Islandora-CLAW/CLAW/issues/860.
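
For readers who haven't met typed HTTP links before, here is a rough Python sketch of what one looks like on the wire: a Link response header whose rel value names the relationship. The cite-as and describedby relation types come from the Signposting patterns; the DOI, the repository URLs, and the link_header helper are hypothetical and are not anything CLAW or Drupal provides.

```python
def link_header(links):
    """Format (target URI, relation type) pairs as an HTTP Link header value."""
    return ", ".join(f'<{uri}>; rel="{rel}"' for uri, rel in links)


# Hypothetical landing page for one repository object (all URIs are made up).
links = [
    ("https://doi.org/10.9999/example.123", "cite-as"),
    ("https://repo.example.org/node/1?_format=jsonld", "describedby"),
]

print("Link: " + link_header(links))
```

A machine client can then find the persistent identifier and the metadata record from the response headers alone, without scraping the HTML page.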

#2: Declaring Licenses at a Resource Level

Very similar to Behavior #1 (Exposing Identifiers), this recommends using best practices from http://signposting.org/ to use typed HTTP links to expose the URI for the license that best describes a resource. Good in theory, but not all licenses have machine-readable URIs, so this would require either migrating existing free-text licenses to ones that have a URI or, in the case of special one-off licenses, creating URIs for local licenses (which wouldn’t be very interoperable). COAR recommends using Creative Commons licenses since they have readily available URIs, but CC licenses aren’t really a good fit for scholarly works since publishing introduces a lot of issues that CC licenses don’t cover. As for the human-readable part, that’s just a matter of your metadata and your theming. 7.x and CLAW both should be able to display human-readable rights statements, but neither can do the HTTP link part currently. CLAW use case at https://github.com/Islandora-CLAW/CLAW/issues/860.
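
The free-text problem described above can be sketched in a few lines of Python. The lookup table, the normalisation rule, and the license_link helper are assumptions for illustration only; a real repository would need a much longer, locally curated table, and anything not in it simply gets no license link.

```python
# Hypothetical lookup from normalised free-text rights statements to URIs.
KNOWN_LICENSE_URIS = {
    "cc by 4.0": "https://creativecommons.org/licenses/by/4.0/",
    "cc by-nc 4.0": "https://creativecommons.org/licenses/by-nc/4.0/",
    "in copyright": "http://rightsstatements.org/vocab/InC/1.0/",
}


def license_link(free_text):
    """Return a Link header fragment for a rights statement, or None.

    None means the statement has no known machine-readable URI, which is
    exactly the case that would need migration or a locally minted URI.
    """
    key = " ".join(free_text.lower().split())  # collapse case and whitespace
    uri = KNOWN_LICENSE_URIS.get(key)
    if uri is None:
        return None
    return f'<{uri}>; rel="license"'


print(license_link("CC BY 4.0"))
print(license_link("Contact the archives for permission"))
```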

#3: Discovery through Navigation

Even more emphasis on using the best practices at http://signposting.org/. 7.x’s Islandora Google Scholar module adds a link to the PDF for citation/thesis objects as an HTML meta tag, but that’s it. It’s easy to see how adding this as a typed HTTP link, especially for compound objects, would help a machine learn about the different parts of a larger meta-object. This feature would be nice for 7.x, but as a Linked Data Application CLAW should definitely have it. Covered again by https://github.com/Islandora-CLAW/CLAW/issues/860.
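
From the consuming side, this is roughly how a harvester could use such links to find the parts of a compound object. The sketch below is an assumption-laden illustration: the header value and URLs are made up, and the regular expression only handles the simple form where rel directly follows the target URI, not the full Link header grammar.

```python
import re
from collections import defaultdict

# Matches only the simple form <uri>; rel="type". Links that carry other
# parameters before rel are ignored by this sketch.
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def links_by_relation(header_value):
    """Group the target URIs in a Link header value by relation type."""
    grouped = defaultdict(list)
    for uri, rel in LINK_PATTERN.findall(header_value):
        grouped[rel].append(uri)
    return dict(grouped)


# Hypothetical header from the landing page of a two-part compound object.
header = (
    '<https://repo.example.org/node/7/part1.pdf>; rel="item", '
    '<https://repo.example.org/node/7/part2.pdf>; rel="item", '
    '<https://repo.example.org/collection/2>; rel="collection"'
)

for rel, uris in links_by_relation(header).items():
    print(rel, uris)
```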

#4: Interacting with Resources (Annotation, Commentary, and Review)

Members of the IR IG are not sold on this one for use in university IRs. Perhaps there are very specific types of repo systems where peer review, comments and annotations are useful, such as aggregators or publishing platforms. In a university IR, it seems like it could actually hinder adoption because faculty might not want folks interacting with their scholarship, and would request mediation for such things, which would slow down already overworked IR staff. Drupal already has tons of modules for things like this, so you could probably modify one to work with Islandora objects in 7.x, and in CLAW you wouldn’t even have to write any code, just turn the module on and configure it. Turning those annotations into linked data on the object would be a bit more difficult, but that difficulty would be more in deciding how the metadata should look than how to implement it.

#5: Resource Transfer

This seems to be suggesting a modern form of OAI-PMH, but in a way that includes assets in the transfer. Strong recommendation for ResourceSync, which we have no experience with, but looks like it would do the job. 7.x will probably never have this, but CLAW should focus on it. Use case at https://github.com/Islandora-CLAW/CLAW/issues/857.
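
For anyone else who has not worked with ResourceSync: its basic building block is a Resource List, which is a Sitemap document with a few extra rs: elements that describe each resource. Here is a minimal Python sketch of generating one. The build_resource_list helper, the input dictionary keys, and the URLs are hypothetical, and this covers only a small part of the specification.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"


def build_resource_list(resources, generated_at):
    """Build a bare-bones ResourceSync Resource List as an XML string."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("rs", RS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    # Document-level metadata: what this list is and when it was generated.
    ET.SubElement(urlset, f"{{{RS_NS}}}md",
                  {"capability": "resourcelist", "at": generated_at})
    for res in resources:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = res["uri"]
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = res["modified"]
        # Per-resource metadata lets a harvester check the asset it fetched.
        ET.SubElement(url, f"{{{RS_NS}}}md",
                      {"type": res["mime"], "length": str(res["size"])})
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical repository asset.
resources = [{
    "uri": "https://repo.example.org/node/7/thesis.pdf",
    "modified": "2018-05-30T14:00:00Z",
    "mime": "application/pdf",
    "size": 482113,
}]

print(build_resource_list(resources, "2018-06-01T00:00:00Z"))
```

Unlike an OAI-PMH response, each entry points at the asset itself, which is what makes transferring the files (and not just the metadata) possible.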

#6: Batch Discovery

We aren’t really sure how this differs from Behavior #5 (Resource Transfer), since this seems to be a use case where someone used “Resource Transfer” technology to put all of your repo’s stuff in an aggregator so that it could be found in multiple places. If you take care of #5, you have already taken care of #6. Covered by use case https://github.com/Islandora-CLAW/CLAW/issues/857.

#7: Collecting and Exposing Activities

This seems to be a mash-up of #4 and #5: capture interactions, turn them into metadata that you expose, and then push that metadata along with the rest of your data with ResourceSync. There are a LOT of recommendations for possible ways to do this, which underscores the fact that there’s not a clear standard for this and probably not a lot of consumers for this kind of data either. This seems like a “nice to have”, not a “have to have”.

#8: Identification of Users

This seems like a good idea, and ORCID seems like the obvious best choice in a scholarly context. We don’t know much about the other two ID systems involved (Social Network Identities and WebID), perhaps they would be good for folks who don’t have an ORCID, but then again perhaps this could be a good way to get people to use/understand ORCID. Use of ORCID could potentially lock out non-academic users, which may be a bug or a feature depending on your goals. Whichever you pick, the problem is going to be getting something that people use across the web in order to deliver on the promises outlined in this section. In an age where people are wary about privacy and the web knowing too much about you, we don’t think this one would get as much broad adoption as COAR thinks.

#9: Authentication of Users

We don’t understand how this is different from #8; the two go together to such a degree that separating them is only confusing.

#10: Exposing Standardized Usage Metrics

This is a nice dream, but much harder than it sounds. Current generation repositories are pretty close to doing all they can in terms of capturing views/downloads on objects, although client-side triggers are better than server-side ones in order to avoid problems with caching, and Piwik seems to be a winner in the international community due to its focus on privacy and flexibility (although it does require setting up your own Piwik server). Standardizing the way usage stats are exposed from the same repo is a good idea as well, but none of us have experience with SUSHI or COUNTER.

All this can be done to perfect aggregation of usage stats on the same repo, but aggregating/summing stats from external sources is not going to be a practical option until there is a centralized source that does this with a solid API.

#11: Preserving Resources

While we agree with the sentiment here, we’re not sure they are saying anything new. Fedora should take care of the actual preservation bits, and Islandora has always requested least-common-denominator open format file types for archival master datastreams and used derivative processes to spin into other formats.

I've taken the liberty of putting CLAW's Drupal modules on drupal.org as sandbox projects. It is my intention to promote these to full projects once CLAW is released so that our modules can be distributed through drupal.org and made available under the 'drupal' namespace on Packagist. We've always been on the sidelines of the Drupal community, and this feels like a step in the right direction. Not only will our modules be available somewhere other than just GitHub, but Islandora will also get exposure to the wider Drupal community.

This does not mean that we're adopting Drupal's workflow, as CLAW encompasses more than just Drupal modules. As of now, there will be no impact on day-to-day development, which will continue as-is on GitHub. However, the subtleties of including drupal.org in the release process will need to be discussed and ironed out as we work through our initial release.

They're not much to look at, but here are the links if you're interested:

The Islandora community wrapped up our first Nova Scotian iCamp last week, having spent three days learning and sharing with this amazing Mount Saint Vincent University view as a backdrop:

We opened with an update about the state of Islandora CLAW and all of the features it already has in place as it races towards a first release:

Followed by a day of hands-on workshop training. The Admin Track spent most of their day building an Islandora 7.x site from scratch, and we ended the track by debuting the first Islandora CLAW workshop and switching to build mode in the latest software, making objects and collections with CLAW's more Drupal-y interface. The Developer Track did their workshop Choose-Your-Own-Adventure style, building the lesson through questions and requests from the group.

The last day of camp was turned over to sessions from attendees, including gems like: