Repository reports and more on SEO

I’ve been trying to get to grips with what usage data I can generate from our repository, both for research and, particularly, for OER, for a small JISC-funded follow-up to Unicycle. I don’t really have anything equivalent to IRStats for EPrints (see this report from USIR for the type of data that can be generated from Salford’s EPrints repository) but I do have Google Analytics running on http://repository.leedsmet.ac.uk/ as well as intraLibrary’s own reporting tool.

The issue is slightly complicated for us in that we effectively have two repository sites running on two different servers! There is intraLibrary itself, hosted for us by Intrallect, and there is the Open Search SRU interface on a Leeds Met server. From Analytics I can get data on traffic to Open Search, including hits on the metadata page for individual records, but I cannot identify whether the full text/resource was actually downloaded. However, I CAN get this info from intraLibrary itself.

The dual server set-up also creates issues for SEO and I’ve been trying to ensure that full text, where available, is indexed by Google. Though we have made some progress, I’m still not sure the issue has been fully resolved… intraLibrary generates a Public URL for each record. If this is not stored in the metadata (as was the case for us) then it is re-generated each time the record is accessed, so Googlebot interprets it as a dynamic URL and does not index it. I was able to work with Intrallect to ensure that a Public URL is generated when a record is created and stored in the metadata; Mike now embeds this consistent URL in the results from Open Search, which (hopefully) will now be indexed by Google.

There are currently a total of 250 PDFs in intraLibrary (188 research and 62 OER) and certainly *some* of these are being indexed: searching Google for filetype:pdf site:http://repository-intralibrary.leedsmet.ac.uk/ returns 53 records (up from 52 earlier in the week, so I will keep an eye on this), whereas filetype:pdf site:http://repository.leedsmet.ac.uk/ does not return any PDFs because they are not at that address. So I don’t think we’ll be able to generate the nice nested (landing page/full text) search results that you see from EPrints repositories, at least while intraLibrary and Open Search are on separate servers.

It is interesting to consider the implications of some of this for usage reporting, especially in the context of OER, which are disseminated more widely than research (via Jorum, Xpert and potentially also the institutional VLE).

According to Google Analytics, the most viewed OER on Open Search in September was Employability & Career Development: Assessing your Skills, Talents and Attributes, which was viewed a total of 26 times (13 absolute unique visitors). It does not feature in the report from intraLibrary, however, as it’s an external URL and does not utilise the intraLibrary Public URL (need to rectify this – there is a Public URL available that would redirect, enabling us to record follow through).

It gets really interesting when you look at the most accessed item according to the intraLibrary report: Numeracy Basics – interactive quiz came in third from GA with the not terribly impressive stat of 19 hits (6 absolute unique) but the Public URL was apparently accessed a whopping 588 times! I’m not sure yet where all these hits have come from (I think I may be able to get IP info from intraLibrary) but maybe someone has linked to it from the VLE. It is also in Xpert and I posted a link to it at https://repositorynews.wordpress.com/2010/10/01/xpert-vs-jorum/ but that was 1st October. This particular resource isn’t yet in Jorum (http://open.jorum.ac.uk/xmlui/handle/123456789/5817 – viewed 290 times on JO – is a hosted version so definitely not linking to the intraLibrary Public URL).
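To see the mismatch between the two reports side by side, something along these lines might help. It is only a rough Python sketch: the dictionary layout and the function name `compare_reports` are my own assumptions (neither Google Analytics nor intraLibrary exports data in this shape, as far as I know), and the figures are simply the September ones quoted above, typed in by hand.

```python
# Hypothetical layout: title -> hit count, copied by hand from each report.
ga_views = {
    "Employability & Career Development: Assessing your Skills, Talents and Attributes": 26,
    "Numeracy Basics – interactive quiz": 19,
}
intralibrary_hits = {
    "Numeracy Basics – interactive quiz": 588,
}


def compare_reports(ga, il):
    """Return (title, ga_views, public_url_hits) rows.

    None marks a title that is missing from one of the two reports.
    """
    rows = []
    for title in sorted(set(ga) | set(il)):
        rows.append((title, ga.get(title), il.get(title)))
    return rows


for title, ga_count, il_count in compare_reports(ga_views, intralibrary_hits):
    print(f"{title}: GA={ga_count}, intraLibrary={il_count}")
```

A `None` in either column is a quick flag for records (like the Employability one) that one of the two tools is not seeing at all.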

Also pertinent here, I think, is a Twitter discussion I had recently with @glittrgirl (Suzanne Hardy of PORSCHE) and others about managing duplicate OER records. It occurs to me that we are not, in fact, duplicating records at all: Jorum harvests full IMSCP so the record will point at our intraLibrary install (the example above notwithstanding, which *is* actually duplicated in JO!) and Xpert harvests our OAI-PMH which, again, will point to the same link… (might be more of a duplication issue with ACErep though… need to think that through).
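One way to check where harvested records actually point would be to pull the dc:identifier values out of an OAI-PMH response and look at the host names. The Python below is a minimal sketch under stated assumptions: the sample XML is made up for illustration (example.org, not our real feed), the function name `identifier_hosts` is hypothetical, and I am assuming the standard oai_dc metadata format. Only the namespace URIs are the real OAI-PMH and Dublin Core ones.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Standard OAI-PMH and Dublin Core namespace URIs.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up ListRecords response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example resource</dc:title>
          <dc:identifier>http://repository.example.org/public/abc123</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def identifier_hosts(xml_text):
    """Return the host name of every dc:identifier found in the response."""
    root = ET.fromstring(xml_text)
    hosts = []
    for record in root.iterfind(".//oai:record", NS):
        for ident in record.iterfind(".//dc:identifier", NS):
            hosts.append(urlparse((ident.text or "").strip()).netloc)
    return hosts


print(identifier_hosts(SAMPLE))
```

If every host that comes back is the intraLibrary server, the harvester is linking to our copy rather than hosting its own.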

Thanks Dave – it’s very much a work in progress of course. I was actually in error about stuff not being duplicated in Jorum: the fact that we are transferring IMSCP means that records are very much duplicated (it’s Friday afternoon!)

Something funny going on with “Numeracy Basics – interactive quiz” though – according to a report I’ve just run for October it was hit 32729 times that month! Nothing else is close – next was 218 – wondering if it’s Google spiders or something…I linked to it in Xpert from here on 1st Oct…
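If I can get IP (and ideally user-agent) info out of intraLibrary, a quick tally like the one below might show whether the spider theory holds. This is a sketch only: I don’t yet know what the intraLibrary report actually exposes, so the `(ip_address, user_agent)` layout, the sample rows and the function name `summarise_hits` are all assumptions. The one thing I am relying on is that Googlebot identifies itself with "Googlebot" in its user-agent string, which can of course be spoofed.

```python
from collections import Counter

# Hypothetical access records as (ip_address, user_agent) pairs.
# The addresses come from ranges reserved for documentation.
sample_hits = [
    ("192.0.2.10", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("192.0.2.10", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("198.51.100.7", "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"),
]


def summarise_hits(hits):
    """Count hits per IP and count those whose user agent mentions Googlebot."""
    per_ip = Counter(ip for ip, _ in hits)
    crawler_hits = sum(1 for _, agent in hits if "googlebot" in agent.lower())
    return per_ip, crawler_hits


per_ip, crawler_hits = summarise_hits(sample_hits)
print(per_ip.most_common(5))
print(f"{crawler_hits} of {len(sample_hits)} hits look like Googlebot")
```

A handful of addresses accounting for most of the hits would point to a crawler or a single embedded link rather than lots of individual users.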