Getting to CrossMark

This week, we launched our participation in CrossRef’s CrossMark program. It’s an exciting step for PLOS, and getting there was a learning experience we hope you’ll find interesting.

The Program

CrossMark is a service of CrossRef that is gaining traction among scholarly publishers, with more than 30 publishers participating to date and nearly half a million scholarly documents covered. The purpose of the CrossMark logo appearing on article pages is to give researchers a consistent way to know the status of any article, from any participating publisher. When someone clicks the CrossMark logo, from either the online version of the article or the PDF, they see a popup indicating either that the article is up to date or that updates are available.

It’s clear that the CrossMark service is valuable for keeping content current, which assists the integrity and completeness of the scholarly record. It’s also worth highlighting that we’d like our initial CrossMark participation to be the first step toward additional exciting uses in the future. We could extend our CrossMark usage to…

The Journey

Getting from “we want to participate in CrossMark” to “the CrossMark logo is live” was a process that took time. Seven months, if you want to know the truth! Don’t let that scare you if you’re a publisher interested in kicking off your own CrossMark participation. The main reason it took us seven months is that we bundled the CrossMark initiative into a larger corrections handling overhaul, which included a massive data migration effort. Anyone who has been through one of these will tell you the same thing: data migrations are not for the faint of heart. And in retrospect, this bundling of initiatives was a decidedly un-Agile way to go.

So the overall initiative included overhauling our corrections handling process, which meant switching systems for inputting and publishing correction notices. This new process required system development, which in turn required documentation, training, and hands-on practice for a pretty big chunk of our staff. And then there was the data migration effort, which took a long time on its own. (None of that work was part of the CrossMark implementation itself.)

Then we tackled the CrossMark piece, which was fairly straightforward in the scheme of the overall project. We added the CrossMark logo to articles: it now appears on every PLOS article page on our journal sites, and on the downloadable PDFs for all newly published articles going forward. And we updated our deposit toolchain to include the CrossMark metadata. But there were a few complications, because of the aforementioned data migration.

First, we chose to create a back-deposit of CrossMark data for our entire corpus. Over ten years of publishing equals somewhere around 110,000 articles, as well as over 3,000 migrated corrections. Naturally, things change over time. How does a person get a grasp of the minor differences between article XML generated over ten years? You can look at a few files from various periods in each year, but that’s just barely scratching the surface. You still have no clear idea of what might actually be different. A metaphorical needle in a gigantic digital haystack. So we wrote some XSL transforms, threw the whole lot at ’em, and temporarily kicked some cans down the road. We figured we’d let CrossRef’s submission results tell us if something was wrong. After sending off 110,000+ XML files (with a slight chuckle) and letting the script run for about twelve hours, we had a pretty decent success rate. After some slight tweaking, the rest were good to go as well.
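If you’d rather get a rough map of the haystack before sending anything off, one option is to tally which element paths appear across the corpus and flag the ones that show up in only some of the files. What follows is a minimal Python sketch of that idea, not the approach described above (we relied on XSL transforms and CrossRef’s submission results). The function names and the two toy documents are made up for illustration, and real article XML will be far larger and may use namespaces.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text):
    """Return the set of distinct element paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def survey(documents):
    """Count how many documents contain each element path."""
    counts = Counter()
    for xml_text in documents:
        counts.update(element_paths(xml_text))
    return counts


def rare_paths(counts, total):
    """Paths present in some documents but not all of them."""
    return sorted(path for path, n in counts.items() if n < total)


# Toy stand-ins for article XML from two different periods (illustrative only).
docs = [
    "<article><front><article-id>a1</article-id></front></article>",
    "<article><front><article-id>a2</article-id><pub-date/></front></article>",
]
print(rare_paths(survey(docs), len(docs)))  # ['article/front/pub-date']
```

Paths that appear in only part of the corpus are the places where a transform written against a handful of sample files is most likely to miss something.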

Dealing with back-deposits for our migrated corrections was a bit dirtier, and required a little more clean-up. First, they had to be re-formatted simply for display on our website in their new form, and then mined for the needed CrossMark deposit information before sending the XML off for deposit (thanks for that .jar file, CrossRef!). The vast majority of the work was accomplished with a small toolset, really: some .jar files (provided by CrossRef) and some XSLT files did most of the heavy lifting, though how you compile and prepare your corpus could vary from ours.
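To give a feel for what “mining” a correction notice involves, here is a small Python sketch under some loud assumptions. The input format (`correction`, `doi`, `corrects`, `published`) is hypothetical and much simpler than a real notice, the DOIs are placeholders, and the `updates`/`update` output follows the CrossRef deposit schema as we understand it, so check it against the current schema before relying on it. Our own tooling used XSLT rather than Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified correction notice; real notices will look different.
SAMPLE_NOTICE = """
<correction>
  <doi>10.5555/notice.0001</doi>
  <corrects>10.5555/article.0001</corrects>
  <published>2014-05-06</published>
</correction>
"""


def mine_notice(notice_xml):
    """Pull out the fields a CrossMark update needs from one notice."""
    root = ET.fromstring(notice_xml)
    fields = {
        "notice_doi": root.findtext("doi"),
        "corrected_doi": root.findtext("corrects"),
        "date": root.findtext("published"),
    }
    # Fail loudly instead of depositing an incomplete record.
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError(f"notice is missing: {', '.join(missing)}")
    return {name: value.strip() for name, value in fields.items()}


def build_updates(fields):
    """Build an <updates> fragment pointing from the notice to the corrected article."""
    updates = ET.Element("updates")
    update = ET.SubElement(
        updates, "update", {"type": "correction", "date": fields["date"]}
    )
    update.text = fields["corrected_doi"]
    return ET.tostring(updates, encoding="unicode")


print(build_updates(mine_notice(SAMPLE_NOTICE)))
```

Raising an error on a missing field is a deliberate choice here: with thousands of migrated notices, it is easier to fix a short list of rejects up front than to untangle incomplete deposits afterwards.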

And now a few words about article PDFs for our CrossMark program. As we mentioned, the CrossMark logo appears on PDFs for articles we publish going forward. We chose to back-update the online versions of our articles to include full CrossMark functionality, but we decided not to update the 110,000+ downloadable PDFs for previously published articles. It was a decision based more on our unique volume situation, and less on the process of updating the PDFs. The marking and stamping process is simple, once you have it set up. But we decided that the testing and remediation challenges associated with replacing 110,000+ active PDFs were too much to take on at this time. CrossRef leaves it up to the publisher whether to fully update the corpus or to start participating in CrossMark from a given date onward. We took a bit of a hybrid approach: we added CrossMark functionality to all HTML articles, but only to PDFs for newly published articles.

So there you have it! Overall, getting to CrossMark turned out to be a bit more of a journey than we anticipated, but we have arrived, and we’re glad we took the trip. We hope this post is useful to any of you who may be considering kicking off a CrossMark participation program of your own.

Molly Sharp is a Senior Product Manager at PLOS, heading up PLOS's content management system (called Lemur), which aims to help provide context, relevance, and discoverability for the content of the world's largest journal.