Google feature changes; open access discoverability

So, I’ve found out about a couple new things from Google I hadn’t known about. (Google is such a prominent player in our space, we need to keep up with what’s going on there so we know how to exploit it to maximum effect. I need to remember to go explore google’s interfaces and documentation more regularly to see changes). 1. Google search API now allows server-side access. 2. Google search allows limit on usage license. And both these things got me started about open access discoverability again.

1. Google API allows server-side access!

“For Flash developers, and those developers that have a need to access the AJAX Search API from other Non-Javascript environments, the API exposes a simple RESTful interface….

“An area to pay special attention to relates to correctly identifying yourself in your requests. Applications MUST always include a valid and accurate http referer header in their requests. In addition, we ask, but do not require, that each request contains a valid API Key.”

This is huge. I’ve complained before about how it was difficult to incorporate Google features into my own service-oriented software in a maintainable way when only javascript AJAX functions were allowed.

Now if only they’d do the same thing for the Google Books Discoverability api. That’s where I really need it; it’s still not clear to me how I might usefully incorporate automated general google search (including google scholar) into my library applications dealing with scholarly materials, because of the high chance that what Google returns will be for-pay and not available to my users: I don’t want to show them that.

So it was with interest I noticed a new feature:

2. Google search supports usage rights limit

Take a look at the Google advanced search page. Click on “Date, usage rights, numeric range, and more”. Look, there’s a “usage rights” limit which filters by CC licenses. When did that show up? Of course, it can only include things in the filter that advertise a CC license in a way that Google’s bots can recognize. (Not sure how this is done, Google doesn’t say; I think I recall there’s a standard CC-endorsed way to do this?).

Unfortunately, some initial test searches revealed that this is a tiny piece of the actual open access pie. Many scholarly materials that ARE available online open access are not in fact in Google’s indexes. Probably because they don’t advertise it properly in a machine-readable way? Still, this is a great step by Google, and indicates that Google recognizes users are increasingly having trouble with getting too much restricted content in their google search results.

But my frustration remains with the scholarly open access community. If the problem is that open access repositories aren’t advertising CC licenses properly–why aren’t these software packages (many of them open source) being fixed? Why isn’t there general concerted funded effort from the open access repository community to solve this general problem: And the general problem is there’s no good place to search aggregated open access content and ONLY open access content. To use in software that wants to answer the question “Is there an open access version of the article with this title and author available?” No good way to do it. And this lack of discoverability is a huge problem with the utility of the existing open access repository domain. I don’t understand why there isn’t more concerted effort to solve it.

Although, in fairness, I did recently become aware of a European initiative, that’s apparently actually funded, to address at least part of this issue. Registering in machine readable format whether content is open access is the first step to building aggregated indexes. (It’s a dirty secret of the ‘open access repository’ domain that much of the content in so-called “open access repositories” is not in fact open access at all, it’s behind IP and password based restrictions. A cursory sample of items in repositories listed in the OpenDOAR–whose collection policies say that a reason for EXCLUSION from OpenDOAR is “Site requires login to access any material (gated access) – even if freely offered”–will reveal that that collection policy is quite often honored in the breach. Although I guess DOAJ has less of a problem with that, and that SPARC/DOAJ initiative is just about DOAJ, so it’s not clear to me that the SPARC project will really address my problem. I guess the SPARC project is about people not being sure if they can re-use material in DOAJ journals—my problem is being able to do a meta-search limited to publically available open access content in the first place, and I don’t care if it’s licensed for re-use, I just want to find only stuff that is actually viewable online for free!

Hmph. What can we do to get the open repositories communities to take note of this problem and address resources toward it?

5 thoughts on “Google feature changes; open access discoverability”

Hit the developers and the OA ideologues. Repository managers have known this is a problem forever. OA ideologues have denied it is a problem, because if it’s going into an IR, it must be OA, right? Developers have gone along with that because it lessens the dev burden.

The other problem is widespread cluelessness among faculty about copyright in general and reuse rights in particular. Go over to Peter Murray-Rust’s place if you want to see the debate in living color — and these are the relatively clued sorts! This one won’t be fixed short of a revolution in faculty thinking, I’m afraid. 90%+ of faculty I meet day-to-day are very much stuck on “but what if somebody uses it in a way I don’t like, or without crediting me?”

Simple example from DSpaceland: Tell the DSpace devs to fix the OAI engine so that it doesn’t expose material in access-restricted collections. We repository managers have only been asking for that, oh, FOREVER.

>”I just want to find only stuff that is actually viewable online for free”

One hand-built solution is Jurn, indexing over 3,000 open and free ejournals in the arts and humanities, at the (free) article level wherever possible. Just type Jurn into Google – it’s the No.1 result.