Mendeley scrobbles your papers

Mendeley is a social web application for academic authors that has been receiving quite a lot of attention recently. Victor Keegan wrote about it in The Guardian last week, likening it to the streaming music service Last.fm:

How does it work? At the basic level, students can “drag and drop” research papers into the site at mendeley.com which automatically extracts data, keywords, cited references, etc, thereby creating a searchable database and saving countless hours of work. That in itself is great, but now the Last.fm bit kicks in, enabling users to collaborate with researchers around the world, whose existence they might not know about until Mendeley’s algorithms find, say, that they are the most-read person in Japan in their niche specialism. You can recommend other people’s papers and see how many people are reading yours, which you can’t do in Nature and Science. Mendeley says that instead of waiting for papers to be published after a lengthy procedure of acquiring citations, they could move to a regime of real-time citations, thereby greatly reducing the time taken for research to be applied in the real world and actually boost economic growth. There are lots of research archives. For the physical (but not biological) sciences there is ArXiv, with more than half a million e-papers free online – but nothing on the potential scale of Mendeley. Around 60,000 people have already signed up and a staggering 4m scientific papers have been uploaded, doubling every 10 weeks. At this rate it will soon overtake the biggest academic databases, which have around 20m papers.
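The growth claim in the quotation is easy to check with a little arithmetic. The sketch below assumes, as the article does, that the collection keeps doubling every 10 weeks, which is a strong assumption for any real service. The function name `weeks_to_reach` is made up for this illustration.

```python
import math


def weeks_to_reach(current, target, doubling_weeks):
    """Weeks needed to grow from `current` to `target` items,
    assuming steady exponential growth with the given doubling period."""
    return doubling_weeks * math.log2(target / current)


# Figures taken from the quotation: 4m papers now, 20m in the biggest databases.
weeks = weeks_to_reach(4_000_000, 20_000_000, 10)
print(f"{weeks:.1f} weeks")  # about 23.2 weeks
```

On those figures, "soon" would mean a little under six months, but only if the doubling rate holds.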

The site has grown fast, aided by significant investment capital from investors associated with Last.fm, Skype and Warner Music Group. The company itself is formed of a team of researchers, graduates and software engineers from a number of prestigious UK, German and US institutions (including several of our Partners – Stanford, Imperial College, Warwick and Cambridge). It currently has over 4.7m downloadable items and is adding tens of thousands every day. I checked over the last two days, and it added just over 62,000 on Tuesday, and nearly 64,000 on Wednesday. Statistics for users reveal that bioscientists (19.4%) and computer and information scientists (19.1%) are the largest groups, with medics (7%) trailing way behind in third place. I contacted the company and asked for data on the breakdown of papers. Not unexpectedly, these three disciplines dominate – though the percentages reveal that bioscientists are the most prolific users: biosciences (28%), computer and information science (13%), and medicine (8%). That may be fitting, given the basis of the name of the service:

The chemists and biologists among you may have already deduced from whom we derive our name: Dmitri Ivanovich Mendeleyev (alternatively spelled Mendeleev), who developed the periodic table of elements, and Gregor Mendel, who is often called the “father of modern genetics”.

If it realises the potential many people are now predicting, the library community is bound to ask why a web application based on an entertainment model has proved so much more attractive than the painstakingly built repositories we have been holding under the noses of our academic authors for the last several years.

I think there may be a few reasons for this. First, its appeal is intuitive. Put your papers in our service and we will give you lots of webscale data back on how popular they are. The system can show you instantly how your research profile compares with that of the average researcher in your field. Second, it is instant. The map of research adjusts daily as new papers are added. Want to find out who is the most popular author in your field today? Mendeley can tell you. The Netherlands had its Cream of Science service (now absorbed into the NARCIS site), of course, but it did not have dynamism like this. And third, the demands it makes are low compared to the benefits it provides. A range of simple tools allows you to ship your papers into it. You can copy across your reference software databases with a simple export operation. You can use a bookmarklet to flick a reference encountered in a range of bibliographic sources (including WorldCat) straight in. And – you can scrobble. Scrobbling is the word Last.fm uses to describe the use of a tool that works invisibly in the background to add your music choices to your Last.fm account. The scrobbler notices every track you play on iTunes, every MP3 file you add to your device, and uses that information to feed its own profile of you and – at the same time – to feed its statistical record on artists and tracks, in order to churn out data on popularity. In Mendeley, the same notion is applied via the Watched Folder facility. With it, you can designate folders on your hard disk that Mendeley will monitor, and from which it will pull in new papers as they appear.
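To make the watched-folder idea concrete, here is a minimal sketch of how such a monitor might work. It is not Mendeley's implementation, which is not described here. The function name `find_new_papers` and the restriction to PDF files are assumptions made for the example.

```python
from pathlib import Path


def find_new_papers(folder, seen):
    """Return the names of PDF files in `folder` that are not yet in `seen`,
    and record them in `seen` so they are reported only once.

    Hypothetical helper: a real client would go on to extract metadata
    from each new file and add it to the user's library.
    """
    new_files = []
    for path in sorted(Path(folder).glob("*.pdf")):
        if path.name not in seen:
            seen.add(path.name)
            new_files.append(path.name)
    return new_files
```

Calling this function every few seconds with the same `seen` set gives the invisible, background behaviour described above: the user simply saves a paper into the folder, and it turns up in the library on the next pass.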

By adopting these approaches, Mendeley has grabbed the attention of users because it understands what they like. They like simplicity. They like instantaneous results. Mendeley also sells itself on the basis of speed: no more waiting for your papers to be published and then to be cited – find out where your status as a researcher stands right now. What do they not like? Tedious rules about copyright (the Mendeley FAQ, perhaps ironically, quotes the E-prints Self-Archiving FAQ to reassure authors about the extent of Open Access tolerance among publishers). They don’t like rigorous requirements for metadata (Mendeley automatically extracts metadata, and asks users to help it make corrections where it gets things wrong). In other words, the requirements libraries often put up front are almost dismissed as non-issues. But the instant gratification of statistics providing reputation metrics is apparent to authors almost as soon as they begin to use the system. And if those statistics are not totally accurate, that’s not too surprising. This is a webscale service after all; we expect to see roughly accurate patterns and trends.

There is much to be improved in Mendeley. Its people data is very poor at present, for one thing. And of course it does not do the job of preservation that institutionally-held repositories do. Text-mining is unsupported, as is linkage to datasets. So far, these desiderata have not been the point. But once a service like Mendeley is truly webscale, the history of successful social web applications suggests that additional functionality can be added in.

Mendeley is an interesting site, but I disagree with your rosy predictions. They’ve punted a big pile of difficult things which librarians are pretty sure are essential, and I don’t see how that gets easier with scale, either in number of documents or number of users.

From your article, Mendeley is not doing: authorities, metadata review, classification, preservation, linking, or rights management. They are crowd-sourcing the metadata, which has the Zipf problem, that is, it works great for popular stuff and is noisy and spotty for everything else.

Yes, you get a few extra programmers with an open, plugin-based system, but volunteer catalogers are a lot harder to find.

It looks to me like they did a good job on all the easy parts. I’m sure it is neat for the most popular stuff, but it probably serves best for writing undergrad papers rather than detailed scholarship.

grumble grumble…I’d sure like to know who’s taking on the task of building and maintaining the taxonomy of disciplines for services like Mendeley. I’ve yet to stumble on where Library Science falls in their catalog. My work would be “neatly and intuitively organized” under Computer and Information Science I guess. But I think there are some good reasons why I’m in a Graduate School of Library and Information Science and not in Computer Science. I’m sure lots of other disciplines will have the same kind of nit picky feelings about which branch of the tree they, and their fellow flock, sit on. Academia.edu lets users edit, but that just meant regular editing wars between different camps.

Seems like their experience with similar genre debates among passionate fans in the Last.fm community could suggest some workable solutions.

(this is just based on looking at the website, guess I’ll have to give it a whirl to see what the desktop client is like)

Very interesting. Being an archivist, I wonder if Mr. Gunn would comment on whether he is referring to the OAI-PMH metadata harvesting protocol or the OAIS conceptual model for archiving? Are we talking about long-term or permanent retention, or merely storage and exposure of descriptive metadata? If the former, *everyone* in my neighborhood wants to know where the sustainable business model is!!!

Very interesting and stimulating post on Mendeley. I particularly liked your point that the killer differences between library IRs and Mendeley are simplicity of use, easily extracted metadata that is good enough, and web-scale data crunching, combined with a bullish approach to copyright and other issues we tend to vex over endlessly.

Thanks for the mention! I serve as the academic community liaison for Mendeley and I think you’ve really put your finger on the issue with repositories as they currently exist. In fact, a general issue with most public-facing projects, software or otherwise, is how to hide the important technology behind a friendly and inviting face. That’s one thing Mendeley is trying very hard to get right, but they want to get the back-end right too. OAI-compliant archiving, proper metadata, and linkage to datasets are all important initiatives. They’re aware that their citation and article use data are themselves potentially an important part of the web of linked data. Mendeley depends on the relationships it’s building with librarians and archivists such as hang around here to help it get things right, so please don’t hesitate to get in touch by email or @mrgunn on Twitter.

Hanging Together is the blog of OCLC Research. Learn more about OCLC Research on our website.