SEO Analysis of WRAP, the Warwick University Repository

SEO Analysis of a Selection of Russell Group University Repositories

A post published in August 2012 on a MajesticSEO Analysis of Russell Group University Repositories highlighted the importance of search engine optimisation (SEO) for enhancing access to research papers and provided summary statistics of the SEO rankings for 24 Russell Group University repositories. That post is part of a series of articles on different repositories.

This work adopted an open practice approach in which the initial findings were published at an early stage in order to solicit feedback on the value of such work and the methodology used. There was much interest in this initial work, especially on Twitter. Subsequent email discussions led to a number of repository managers at Russell Group universities agreeing to publish more detailed findings for their repository, together with contextual information about the institution and the repository to which I, as a remote observer, would not be privy.

We agreed to publish these findings on this blog during Open Access Week. I am very grateful to the contributors for finding time to carry out the analysis and publish the findings at the start of the academic year, a very busy period for those working in higher education.

The initial post was written by Yvonne Budden, the repository manager for WRAP, the Warwick Research Archives Project. It is appropriate that this selection of guest blog posts begins with a contribution about the Warwick repository, as Jenny Delasalle, a colleague of Yvonne’s at the University of Warwick, and I will be giving a talk on “What Does The Evidence Tell Us About Institutional Repositories?” at the ILI 2012 conference to be held in London next week.

SEO Analysis of the University of Warwick’s Research Repositories

The following summary of a MajesticSEO survey of the University of Warwick’s research repositories, together with background information about the university and the repository environment, has been provided by Yvonne Budden.

A Little Background on Warwick

The University of Warwick is one of the UK’s leading universities with an acknowledged reputation for excellence in research and teaching, for innovation and for links with business and industry. Founded in 1965 with an initial intake of 450 undergraduates, Warwick now has in excess of 22,000 students and employs close to 5,000 staff. Of those staff, just under 1,400 are academic or research staff. Warwick is a research-intensive institution and our departments cover a wide range of disciplines, including medicine and WMG, a specialist centre dedicated to innovation and business engagement. In the 2008 RAE, nineteen of our departments were ranked in the top ten for their unit of assessment and 65% of the submitted research outputs were ranked 3* or 4*.

University of Warwick’s Research Repositories

Warwick’s research repositories began in the summer of 2008 with the Warwick Research Archives Project (WRAP), a JISC-funded project that created a full text, open access archive for the University. Funding for WRAP was then taken over by the Library, and in April 2011 we launched the University of Warwick Publications service, which was designed to ‘fill the gaps’ around the WRAP content with a comprehensive collection of work produced by Warwick researchers. The services work on the same technical infrastructure, but WRAP remains distinct and exposes only the full text open access material held. The system runs on the most recent version of the EPrints repository software, using a number of plugins for export, statistics monitoring and, most recently, to assist in the management of the REF2014 submission. To date we do not have a full text mandate for WRAP, and engagement with both WRAP and the Publications service varies across the departments. Deposit to the services is highly mediated through the repository team, so engagement is not necessarily reflected in the number of papers available per department, especially as some departments benefit more from the service’s policy of pro-active acquisition of new material where licences allow. I would judge that our best engagement in terms of full text deposit comes from Social Science researchers, but we also have some strong champions in the Medical School, History, Life Sciences and Psychology.

Size and Usage Statistics

At the end of August 2012 WRAP contained 6,554 full text items covering a range of item types: journal articles, theses, conference papers, working papers and more. The Publications service contained a further 40,753 records. In terms of usage, since its launch the system has seen 900,997 visits according to Google Analytics, an average of just over 18,000 a month in the 50 months it has been active. To track downloads we use the EPrints plugin IR Stats, which counts file downloads made either directly or through the repository interface. IR Stats will only count one download per twenty-four hours from each source, but will count multiple downloads if an item has multiple files attached. Over the life of WRAP the files held have been downloaded a grand total of 730,304 times, with 49.08% of downloads coming from Google or Google Scholar.
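As a rough illustration of the download counting rule described above, the following Python sketch counts at most one download per source per file in any twenty-four hour period. It is not IR Stats code. The function name, the use of an IP address as the 'source', and the choice to measure the twenty-four hours from the last counted download are all assumptions made for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # assumed de-duplication window


def count_downloads(events):
    """Count downloads from (timestamp, source, file_id) tuples.

    Assumption: a repeat request for the same file from the same source is
    ignored if it arrives less than 24 hours after the last counted one.
    Different files attached to one item are counted separately.
    """
    last_counted = {}
    total = 0
    for timestamp, source, file_id in sorted(events):
        key = (source, file_id)
        previous = last_counted.get(key)
        if previous is None or timestamp - previous >= WINDOW:
            total += 1
            last_counted[key] = timestamp
    return total


# Hypothetical request log for one item with two attached files.
events = [
    (datetime(2012, 8, 1, 9, 0), "10.0.0.1", "paper.pdf"),
    (datetime(2012, 8, 1, 9, 5), "10.0.0.1", "appendix.pdf"),  # second file
    (datetime(2012, 8, 1, 10, 0), "10.0.0.2", "paper.pdf"),    # other source
    (datetime(2012, 8, 1, 15, 0), "10.0.0.1", "paper.pdf"),    # repeat, ignored
    (datetime(2012, 8, 2, 9, 30), "10.0.0.1", "paper.pdf"),    # over 24h later
]
print(count_downloads(events))
```

Under these assumptions the repeat request six hours after the first is ignored, while the second attached file and the request made more than a day later are both counted.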

Expectations of the Survey

Going into the survey using the MajesticSEO system, I wasn’t sure what to expect from the results, as the majority of the work we’ve done so far with statistics has been with Google Analytics and the IR Stats package. Looking at the referral sources in our Google Analytics output, I can identify a number of sources from which I might expect to see backlinks into the system, including our Business School (wbs.ac.uk) and the Bielefeld Academic Search Engine (BASE), as well as a number of smaller sources. The Warwick Blogs service seems to have fallen out of favour over the past few years, with the number of hits from there dropping as people move to other platforms. Above all, I’m most curious to see if the SEO analysis can help with the work I am doing in promoting the use of WRAP and the material within it. If this work can assist me in creating the kinds of ‘interest stories’ that help to persuade researchers to deposit, it could become another valuable source of information. We are also looking at expanding the range of metrics we have access to, looking at the IRUS project as well as the forthcoming updated version of IR Stats, recently demonstrated at Open Repositories 2012.

Our Survey Results

The data for this survey was generated on the 10th September 2012 using the ‘fresh index’ option, although the images were captured on 19 October. The current results can be found if you have a MajesticSEO account (which is free to obtain). The summary for the site is given below showing 413 referring domains and 2,523 backlinks.

Figure 1: MajesticSEO analysis summary for wrap.warwick.ac.uk

At first glance this seems to be rather low in terms of backlinks; it also shows a fairly low number of educational domains linking to us. The top five backlinks into the system can be seen below, ranked as standard by the system by a combination of citation and trust flow:

Figure 2: Top 5 Backlinks

Interestingly this lists some of the popular referrers we see in Google Analytics driving traffic to us, but not some others I might have expected to see. The top referring domains are shown below:

Figure 3: Top Referring Domains

This is the only place in the results where Google features at all. The top five pages, as ranked by the flow metrics, show a fairly distinct anomaly, as two of the pages do not list any flow metric information despite this supposedly being the method by which they are ranked:

Figure 4: Findings Ranked by Flow Metrics

The top five pages as sorted by number of backlinks can be seen in the table below:

Table 1: Top Five Pages by Number of Backlinks

1. A research paper on the impact of cotton in poor rural households in India.
2. The WRAP homepage.
3. A PDF of an economics working paper on currency area theory.
4. A PDF of an economics working paper on happiness and productivity.
5. The record for a PhD thesis on Women poets.

Summary

The top ten backlinks into the WRAP system come from a range of sources, including this blog, two Wikipedia pages and two referrals from the PhilPapers repository, which monitors journals, personal pages and repositories for Philosophy content. We also see two pages that collect literature on health topics linking back to us, as well as a Maths blog and the newsletter of the British Centre for Science Education.

Interestingly, in Figure 3 there is no mention of the University of Warwick or any of its related domains (wbs.ac.uk for the Business School, for instance). I assume this is because MajesticSEO are excluding ‘self’ links, so as WRAP is a Warwick subdomain they are excluding a lot of the links I am aware of. This may also account for the lack of any backlinks from the Warwick Blogs service. Many of the domains listed here are blog platforms of one form or another, which may be because of the database-driven architecture of these platforms and the way the MajesticSEO system is reading those links. For example, if a researcher puts a link to his most recent paper in WRAP on the frame of the blog and this propagates onto every post in the blog, does this count as a single link or as many? We are also seeing links from sources such as the BBC and Microsoft, where, again, it would be nice to be able to see who was linking to what and from where in these domains.
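To make the two questions above concrete, here is a minimal Python sketch of how self links might be filtered out and how a link repeated on every post of a blog inflates the backlink count without adding a referring host. This is only an illustration of the assumption being discussed and is not a description of how MajesticSEO actually processes links. The function names and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def is_self_link(source_url, own_domain="warwick.ac.uk"):
    """Assumed rule: a link is a 'self' link if the linking page's host is
    the institution's own domain or any subdomain of it."""
    host = (urlparse(source_url).hostname or "").lower()
    return host == own_domain or host.endswith("." + own_domain)


def summarise_backlinks(source_urls, own_domain="warwick.ac.uk"):
    """Summarise a list of linking-page URLs after dropping self links."""
    external = [u for u in source_urls if not is_self_link(u, own_domain)]
    hosts = {urlparse(u).hostname for u in external}
    return {
        "backlinks": len(external),        # every linking page counts
        "referring_hosts": len(hosts),     # each host counts once
        "self_links_excluded": len(source_urls) - len(external),
    }


# Hypothetical linking pages: one blog with a sidebar link on three posts,
# one page on a Warwick subdomain, and one page on the Business School site.
sources = [
    "http://blog.example.org/post-1",
    "http://blog.example.org/post-2",
    "http://blog.example.org/post-3",
    "http://www2.warwick.ac.uk/some/department/page",
    "http://www.wbs.ac.uk/research/",
]
print(summarise_backlinks(sources))
```

In this sketch the sidebar link contributes three backlinks but only one referring host. Note also that a simple suffix rule like this one would exclude pages on warwick.ac.uk subdomains but would not treat wbs.ac.uk as a 'self' domain, so it cannot by itself explain the absence of the Business School from Figure 3.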

The top pages, as listed by number of backlinks in Table 1, show a trend for linking directly to the file of the full text material we hold in WRAP. This information would tie in nicely with the fact that item three is the most downloaded paper in WRAP over the lifetime of the repository, with 9,162 downloads to the end of August 2012. So in this case we can draw a tentative line between the number of downloads and the number of backlinks. However, we can’t follow this theory through, especially as the top paper linked to externally, Paper 1 as listed in Table 1, has been downloaded only a fraction as many times as the currency working paper. When listed by the flow metrics, as in Figure 4, the pages largely follow the results seen for the Opus repository at Bath, in that the links are to pages about the repository. The exceptions are the two anomalous results which, despite having no citation or trust flow scores, are ranked second and third on flow metrics.

Discussion

I think when looking at metrics the most important thing for a repository manager is to be able to build stories around the metrics, as these help the researchers to engage with the figures. Was this spike in downloads because of featuring in a conference, or an author moving to a new institution, or for some other reason? What can I show my users that is going to help them to make the decision to use us over other options and to expend scarce time and resources maintaining a blog or Twitter account? Here the issue I have with the data we have discovered is that the number of backlinks into a repository will never conclusively prove that a paper will get more downloads, as ably illustrated by the example above. Many researchers are not interested in the fuzzy conclusions we can draw at this point; they want to see clear, conclusive proof that links = downloads = citations.

I also think that search engine performance is an increasingly difficult area to be really conclusive about, especially now users can ‘train’ their Google results to prefer the links they click on most often. This was recently a cause of concern for us, as it was reported that our Department of Computer Science (DCS)’s EPrints repository was overtaking our Google ranking and that WRAP no longer featured until page two of the results. This wasn’t the case, but because the user reporting this to us was heavily involved in the area of computer science, his Google results had come to prefer the DCS repository to the WRAP one, as those results were more relevant to his interests. In the same way, when I search for ‘RSP’ my top result is now the Repositories Support Project and not RSP the Engineering Company or the Peterborough Health and Safety firm, as it was initially.

We need to always be conscious of what researchers want from metrics and whether it is possible for us to give it to them. As with any metrics, we need to be explicit about what it is that we are saying and what can be inferred from it. If we, as users of metrics, don’t understand how the metrics are being developed or how the search engines’ ranking algorithms work, we won’t be able to confidently predict what we can do to improve them. It may also come down to the way researchers are using these services and for what purpose, which may be why we are not seeing any evidence of the use of services like Academia.edu and LinkedIn. I would imagine that if researchers are using services to showcase their work to prospective employers and other researchers, they may prefer to link to the publisher’s version of their work rather than the repository version. I suspect the interest story from the SEO data may be more about ‘who’ is linking to their work than where they are linking from, which is detail we cannot and possibly should not be able to provide.