What can machines discover from scholarly content?

Just when you thought that everything was known about the academic user journey, along comes a workshop (the WSDM Workshop on Scholarly Web Mining, SWM 2017, held in Cambridge on February 10, 2017) that presents a whole new set of tools and investigations to consider.

It was a rather frantic event, squeezing no fewer than 11 presentations into a half-day session, even though it took place in the sumptuous surroundings of the Council Chamber in the Cambridge Guildhall. Trying to summarise all 11 presentations would be a challenge; were there any common areas of inquiry?

Around four of the presentations were about recommender systems. One was about a system for annotations; some were about tagging, whether identifying mathematical expressions in Wikipedia or adding MeSH terms to scholarly content. Then there was one trying to make some sense of the information about researchers in Google Scholar, one outline of the European Union Future TDM project, and finally one ambitious presentation attempting to map research excellence to citations and other indicators.

One theme raised in more than one presentation was the contrast between collaborative filtering and content filtering: do we recommend papers because other people found them relevant (collaborative filtering), or do we recommend articles because of an overlap of content, even if nobody else has ever noticed a link or gone from one article to the other?
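The contrast between collaborative filtering and content filtering can be made concrete with a small sketch. Everything in it is an assumption for illustration: the readers, the papers, the word sets and the function names are invented, and none of it is taken from the systems presented at the workshop. It scores candidate papers against a seed paper in two ways, once by overlap of readers and once by overlap of words.

```python
from math import sqrt

# Hypothetical usage data: which readers saved which papers.
reads = {
    "alice": {"p1", "p2"},
    "bob": {"p1", "p2", "p3"},
    "carol": {"p3"},
}

# Hypothetical content data: keywords taken from each paper's abstract.
words = {
    "p1": {"neural", "ranking", "search"},
    "p2": {"citation", "analysis"},
    "p3": {"citation", "network"},
    "p4": {"neural", "ranking", "model"},  # a new paper that nobody has read yet
}


def cosine(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def readers_of(paper):
    """Return the set of readers who saved the given paper."""
    return {user for user, papers in reads.items() if paper in papers}


def collaborative_score(seed, candidate):
    """Similarity based only on who read the two papers."""
    return cosine(readers_of(seed), readers_of(candidate))


def content_score(seed, candidate):
    """Similarity based only on the words the two papers share."""
    return cosine(words[seed], words[candidate])


def recommend(seed, score):
    """Return the candidate paper with the highest score against the seed."""
    scored = [(score(seed, paper), paper) for paper in words if paper != seed]
    return max(scored)[1]


by_readers = recommend("p1", collaborative_score)
by_content = recommend("p1", content_score)
```

With this toy data the two approaches disagree: the reader-based score picks p2, which shares both of p1's readers but none of its words, while the content-based score picks p4, which nobody has read but which shares two of p1's three keywords. A paper with no readers always scores zero under the collaborative approach, which is the situation described above where nobody has yet gone from one article to the other.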

If there is a common conclusion to be drawn from all this, it is that AI and machine learning tools are increasingly being used to facilitate the academic user journey. That much is obvious. More interesting is that more than one of the presentations grappled with the challenge of using AI to make sense of whatever human user data is available, whether logs of search and discovery, researchers’ self-declared interests, or the author profiles on Google Scholar where academics can state their research interests.

There is a similar challenge in the attempt to use machine-learning tools to improve what is currently a human process: adding MeSH (or any other taxonomy) terms to content. The difficulty, perhaps, is that trying to harmonise a machine-based system with a human-created universe is doomed to imperfection. Perhaps we should go one step back and address the problem from scratch, or use the insights of machine learning to start without a taxonomy; in other words, a bottom-up approach rather than top-down.

Adam Kehoe, of the University of Illinois, found big discrepancies between the automatic classification of computer science papers and the MeSH categories assigned to them. Perhaps this is because the humans who added MeSH terms to these articles didn’t understand them too well, computer science being outside the core competence of MeSH, which is medical topics. But the F1 scores comparing the MEDLINE records with their machine-learning equivalents were very poor (less than 0.5). Perhaps machines and humans just think differently. Perhaps we are asking the wrong questions.
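For readers unfamiliar with the F1 score, the sketch below shows one common way to compute it for a single record, comparing the set of terms a human assigned with the set a classifier assigned. It is an illustration only: the talk's exact evaluation method is not described here, and the function name and the example terms are assumptions made up for the purpose.

```python
def f1_score(human_terms, machine_terms):
    """F1 between two sets of terms, treating the human set as the reference."""
    true_positives = len(human_terms & machine_terms)
    if true_positives == 0:
        return 0.0
    # Precision: share of machine-assigned terms that the human also assigned.
    precision = true_positives / len(machine_terms)
    # Recall: share of human-assigned terms that the machine also found.
    recall = true_positives / len(human_terms)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical terms for one computer science paper.
human = {"Algorithms", "Information Storage and Retrieval", "Software"}
machine = {"Algorithms", "Machine Learning", "Neural Networks, Computer"}

score = f1_score(human, machine)
```

Here only one of three terms matches on each side, so precision and recall are both one third and the F1 score is also one third. A score below 0.5 therefore means that the human and machine term sets disagree on a large share of the terms.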

Another talk, looking at citations and readership, found that the number of citations an article receives correlates poorly with its research excellence, a finding that, if correct, blows a hole in the entire citation system used for evaluating scholarly research. However, the criterion the paper used for “research excellence”, which was to ask humans to identify one “seminal” paper in their subject area, might not scale as a measure of all scholarly research ever written. Again, the problem seems to be how to map the vagaries of human thinking (our intuitive criteria of what makes an article “seminal”) onto the tools emerging from machine learning. Automatic tools can count citations with amazing accuracy, but citations, it turns out, don’t measure the article’s excellence.

In any case, if you like a challenge, the team from Mendeley looking at recommender systems pointed out that there are many types of recommendations, and users want different kinds of recommendations at different times. They might want novel concepts; they might want familiar concepts; they might even want serendipity. Getting a machine to determine which kind of recommendation you want, given that your preferences may change from minute to minute, will be challenging.