Please Please Understand My Search Query

With Google announcing that social network information will now be included in search, there’s been a lot of talk among my friends in the industry and on the academic side. I heard some skepticism last night from the Off The Hook crowd about whether it’s really going to accomplish much for Google, given that Google+ isn’t as popular as expected.

James Grimmelmann has been voicing his disagreement with the move over the past few days on Twitter. The way this might strengthen antitrust lawsuits against Google is one of his concerns. Another is that most people won’t understand the inclusion of social results in their search findings, and more importantly will be confused about the boundaries of public and private, what is within their own Google files and what is elsewhere in the cloud. (Doubtless, just like universal login.) And, like so many bad things dished up by social networking sites, it’s opt-out.

Ultimately James says that it’s this implementation of using social data that he objects to, not the concept in general. And that’s basically what I’m thinking, too. Because in the ideal, keying search results to people’s own posts and friends could actually do a lot to serve up more relevant results, and help people understand the things they read online better. It could have prevented the hapless people I studied in my dissertation from exposing themselves to public ridicule and security problems.

The best example I have to demonstrate this is the number of people who commented on Jonathan Coulton’s blog when he wrote a post titled “Please Please Cancel My Account.” The post read, in full:

“Here’s a recording (if that link’s swamped, here’s a mirror) of a guy trying to cancel his AOL account. Now THAT is funny. Thanks Dr. Smith…”

A number of people arrived on the thread, apparently from search engine results, and asked to have various kinds of accounts cancelled. Hi5. Facebook. Playboy. AOL, of course.

How’d it happen? Coulton and his readers figured out that at the time the first comments appeared, Coulton’s post was the #1 Google hit for the search “cancel my account.” That was in 2007. By the time I began my analysis in 2008, it still had that ranking; at some point before I finished it was down to #3, and by now it’s somewhere on the second page of results. The obvious explanation for this is that Jonathan Coulton, popular with link-writing, Internet-savvy geeks everywhere, has an incredibly high PageRank which skews the page’s calculated relevance.

Consider what the average person, speaking to someone else, would mean by the phrase “please cancel my account.” They’d likely be asking for that action to be taken. They’d assume that they were talking to a person who could actually do this for them. They’d assume that person would know which account was meant by “my account.” If those assumptions were wrong, there would be an opportunity for the person they were speaking to to reply with something like “which account?” or “I can’t do that for you, you need to ask someone else.” It is exceedingly unlikely that anyone saying “please cancel my account” in a human conversation would expect the reply “Here is a funny recording of a guy trying to cancel his account,” and it is likely they’d treat that as an error that needed to be fixed through further conversation.

Search engines have traditionally ignored words like “my,” not being able to do much with them. In general, understanding context the way humans do presents a thorny problem for computers (and I love Lucy Suchman’s analysis of how it presents a problem for “intelligent” photocopiers as well). This is why semantic search is a holy grail. If computers could interpret words which require an understanding of possession; perspective and the immediate environment (“here,” “over there”); human relationships (“my mom,” “his daughter”); and person (“you,” “me”), communicating with a search engine would be a lot more like asking a question of another person who’s in the room with you.
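To make the “my” problem concrete, here’s a minimal sketch of the kind of stop-word filtering I’m describing. The word list and the `index_terms` function are my own inventions for illustration, not any real search engine’s code, but the effect is the point: once possessives are thrown away, “my account” and “his account” are the same query.

```python
# Hypothetical stop-word list, made up for this example.
# Real search engines maintain their own (and handle them in subtler ways).
STOP_WORDS = {"my", "his", "her", "your", "the", "a", "an"}


def index_terms(query):
    """Lowercase the query, split on whitespace, and drop stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


mine = index_terms("cancel my account")
his = index_terms("cancel his account")

print(mine)         # ['cancel', 'account']
print(mine == his)  # True: the possessive, and with it the context, is gone
```

Everything a human listener would use to figure out whose account is meant has been discarded before the matching even starts.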

This makes for a tremendous counterbalance to Google’s original PageRank algorithm, which instead of caring about what *you* care about tends to amount to a mass averaging of what a majority of people in the world care about. (Scratch that — I mean what a majority of the more influential writers and link-makers on the Internet care about. See for example what Google thinks “western art” means, as opposed to what a teacher might want her students to study.)
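For the curious, here’s a rough sketch of the textbook version of PageRank, run on a tiny invented web. The page names, the link graph, and the damping value of 0.85 are all assumptions for the example, and Google’s real ranking involves far more than this. It does show the averaging effect, though: the page that more people link to wins, whether or not it answers the question.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to.
    Every linked-to page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Invented toy web: three fans link to a popular blog post,
# while only one site links to the page that could actually help.
toy_web = {
    "fan_a": ["popular_post"],
    "fan_b": ["popular_post"],
    "fan_c": ["popular_post"],
    "lone_site": ["help_page"],
    "popular_post": [],
    "help_page": [],
}

ranks = pagerank(toy_web)
print(ranks["popular_post"] > ranks["help_page"])  # True
```

Nothing in that calculation knows or cares what the person typing the query actually wanted.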

Of course, all of this is in the abstract. I ultimately don’t have faith that semantically smarter results will be the main consequence of looping Google+ into Google’s search results. James’s points are all well-taken. This also feels sort of like too little, too late; Google’s changes to their ranking systems in the past few years (valuing newer pages more highly and paying attention to things like Twitter) and changes in the overall Internet ecosystem (the rise of content farms) have made Google’s results feel increasingly poor in quality to me. And I’m trying to sort out in my head how advertising money might play into this.

But it’s undeniable that both Google and Facebook will be investing time and resources in developing the cleverness that social network data can bring to search engines. This early stage of the inclusion of Google+ in results may be crap, and we all wish they’d done a better job and not gone opt-out on it, but it will evolve. It’s not likely to go away.
