Google Scholar and DSpace

Nearly all repository managers aim to offer a discovery and access interface for scientific research. In a recent study, @tmire examined the indexing of repositories by Google Scholar. Google's Inclusion Guidelines for Webmasters recommend the use of EPrints, Digital Commons and DSpace. Configuration choices, such as which metadata fields are used and how the repository exposes them, strongly affect compliance with these guidelines.
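To make the point about exposed metadata concrete, the sketch below checks whether an item page carries the bibliographic meta tags that Google Scholar's inclusion guidelines describe. It is a minimal illustration, not part of the study: the tag names follow the Highwire Press style convention named in the guidelines, while the list of required tags, the function name and the sample page are assumptions chosen for this example.

```python
from html.parser import HTMLParser

# Assumption: a minimal set of Highwire Press style tags that an item page
# should expose. Adjust this list to match your reading of the guidelines.
REQUIRED_TAGS = ("citation_title", "citation_author", "citation_publication_date")


class MetaTagCollector(HTMLParser):
    """Collects the names of all <meta> tags that have non-empty content."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name and attr_map.get("content"):
            self.names.add(name.lower())


def missing_scholar_tags(html):
    """Returns the required tags that are absent from the given HTML."""
    collector = MetaTagCollector()
    collector.feed(html)
    return [tag for tag in REQUIRED_TAGS if tag not in collector.names]


# Hypothetical item page, invented for illustration only.
sample_page = """
<html><head>
<meta name="citation_title" content="An Example Thesis">
<meta name="citation_author" content="Doe, Jane">
</head><body></body></html>
"""

print(missing_scholar_tags(sample_page))
```

Running a check like this against a handful of item pages is one way to spot configuration gaps, although a page that passes is not guaranteed to be indexed.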

Methodology and results

Google Scholar automatically crawls repositories, without requiring repository staff to act as intermediaries. The downside is that repository managers are left in the dark, not knowing why their repository or specific items are not being crawled and indexed. The study used O'Brien and Arlitsch's approach and found that the average indexing ratio for 10 recent DSpace repositories was 64.8%. A second approach was used to get a sense of which items are included and which are missing: items were selected from five repositories, and their titles were explicitly searched for in Scholar to see whether the search returned a hit from the repository. Older items were more likely to be found than newer ones.
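The arithmetic behind an average indexing ratio can be sketched as follows. This assumes the ratio is the number of a repository's items found in Google Scholar divided by the total number of items in that repository, and that the average is the unweighted mean of the per-repository ratios. The text above does not spell out either definition, so both are assumptions. The repository names and counts are hypothetical and are not the study's data.

```python
def indexing_ratio(indexed_items, total_items):
    """Share of a repository's items that were found in Google Scholar."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return indexed_items / total_items


def average_ratio(counts):
    """Unweighted mean of per-repository ratios (an assumed definition)."""
    ratios = [indexing_ratio(indexed, total) for indexed, total in counts.values()]
    return sum(ratios) / len(ratios)


# Hypothetical (indexed, total) counts, invented for illustration only.
hypothetical_counts = {
    "repo_a": (600, 1000),
    "repo_b": (450, 500),
    "repo_c": (150, 500),
}

for name, (indexed, total) in hypothetical_counts.items():
    print(f"{name}: {indexing_ratio(indexed, total):.1%}")
print(f"average: {average_ratio(hypothetical_counts):.1%}")
```

An unweighted mean treats small and large repositories alike. Pooling all items before dividing would weight large repositories more heavily and can give a noticeably different figure.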

Conclusion

The study concluded that Google Scholar indexing is still largely a black box, so improving repository coverage can be particularly challenging for repository managers. Google Scholar indexing and the associated ratios are likely to improve further for DSpace 4 repositories. This recent release of DSpace included several enhancements explicitly requested by the Google Scholar team. The study postulated that the Google Scholar crawler 'should' find it easier to retrieve recent submissions in DSpace 4 repositories.