Should museums just give up now and let Google take responsibility for knowledge?

Wow – so the introduction of Google Knowledge Graph today has some fascinating implications for museums, knowledge, and everything else. As the Mashable account of the new move by the company explains:

Starting today, a vast portion of Google Search results will work with you to intuit what you really meant by that search entry. Type in an ambiguous query like “Kings” (which could mean royalty, a sports team or a now-cancelled TV show), and a new window will appear on the right side of your result literally asking you which entity you meant. Click on one of those options and your results will be filtered for that search entity.

…In addition to the window which will help users find the right “thing,” Google will also surface summaries for things, which, again, will try to be somewhat comprehensive by tapping into the various databases of knowledge. A search for Frank Lloyd Wright, for instance, will return a brief summary, photos of Wright, images of his famous projects and perhaps, most interestingly, related “things.” People who search for Wright are also looking for other notable architects. It’s a feature that may remind users of Amazon’s penchant for delivering “people who liked this book also bought or searched for this one” results.

And from the Google blog post on the Knowledge Graph, comes this little nugget:

We’ve always believed that the perfect search engine should understand exactly what you mean and give you back exactly what you want. And we can now sometimes help answer your next question before you’ve asked it, because the facts we show are informed by what other people have searched for. For example, the information we show for Tom Cruise answers 37 percent of next queries that people ask about him.

This is fascinating. And in some ways quite monumental for museums. How on earth can museums compete in such an environment? Why would anyone come to a museum (or at least, an online collection) for information when they can go to Google and get information that is likely to be tailored to their needs? And at the same time, how can we find information that runs against this “official” line? Is this simply grand narratives on a much grander scale, only controlled by a commercial entity? Google argues that this will lead to serendipitous discovery – but surely the potential for truly serendipitous discovery is actually reduced, not improved?

however you may feel about content farms like eHow, wikiHow, whatever, what they do do very well is they find questions that people are actually looking for, and answer them directly and completely. And so what we need to do is, by combining content from lots of sources, we can actually really focus on what people want, and worry just exclusively about making that content that’s unique to us.

But what on earth is unique to us? We aren’t the only institutions for whom history is our domain. Nor are we the only ones that tell stories. We have objects, yes, but objects maybe don’t mean all that much online, when all that they can present is a simulacrum of the physical. So what can we offer that Google cannot? Authority? I think that most people would think that Google was fairly authoritative for the majority of information that they are looking for (particularly if it does point to sites like museums and libraries).

So should museums simply give up trying to find better models for presenting their own collections, and work with Google? Or can we instead prove to be an effective counter-point to the global meta-narrative that Google is writing for us using algorithms? Does this move by Google have the potential to essentially change what a museum is or does online in the Internet age?

This is obviously a very quickly bashed out post, filled with first reactions rather than deep contemplation. But I would love to hear what you think about this more too. Lots to discuss on this issue.

What do you think? What could the implications of Knowledge Graph be for museums?

I don’t think this affects the position of museums as knowledge creators. Museums have a lot of unique, interesting content – historic photographs of a local street, mass-produced toys with additional insightful notes about their general significance, or early works from a now-popular artist.
Search engines will always be working on ways to better surface good content. It does put more pressure on museums to expose the data in a structured form, from basic metadata embedded in a collection item webpage to more complex linked data structures.
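To make "basic metadata embedded in a collection item webpage" a bit more concrete, here is a minimal sketch of one way it might look. It assumes the schema.org vocabulary and JSON-LD syntax, which are my choices for illustration and not something named above. The record fields, the example item and the URL are all hypothetical.

```python
import json


def item_to_jsonld(item):
    """Map a (hypothetical) collection record to a schema.org description."""
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": item["title"],
        "creator": {"@type": "Person", "name": item["maker"]},
        "dateCreated": item["date"],
        "url": item["url"],
    }


def jsonld_script_tag(item):
    """Render the description as a script tag for the item's web page."""
    payload = json.dumps(item_to_jsonld(item), indent=2, ensure_ascii=False)
    # Escape "</" so a stray "</script>" in the data cannot close the tag early.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical record; a real one would come from the collection database.
record = {
    "title": "Main Street, looking north",
    "maker": "Unknown photographer",
    "date": "1905",
    "url": "https://example.org/collection/item/123",
}

print(jsonld_script_tag(record))
```

The same dictionary could also be served from an API or exported in bulk, so the page markup and any richer linked data can share one mapping.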

I think Matt Polackio’s response to my last post could be useful here. He wrote:

the interpretation that is most available will always be privileged over the one that is less so. It’s the same reason political candidates spend billions of dollars on advertising. It will become a real concern for institutions to make their interpretations available online or else they may well be ignored no matter how much merit they may have otherwise. I think this ultimately may be the big driver for online collections databases (at least for museums internally). Curators will receive more recognition for their efforts if those efforts are more accessible. Having access to an infrastructure that provides opportunities for professional recognition will become a line-item on a museum’s list of “reasons to work for us”. That might not happen right away because so many curators don’t seem to care about the online space very much yet, but over time it will, and hopefully it will eventually become just another expectation that everyone has of a museum.

Will Google’s interpretation of objects be the one that is most available? Because through this process, they are starting to interpret information and link that to objects, aren’t they? And isn’t that essentially what museums do? Is Google acting as the ultimate curator, then, but one that responds to the ‘wisdom of the crowds’?

Google doesn’t really have an interpretation of anything. Google just collects, filters and aggregates their information, and this doesn’t really change that. In fact, this doesn’t really change anything. Let’s say, for the moment, that we did all create online accessible collections and we all did agree on a public API so that we could all access and use each other’s data for whatever we wanted to do (putting aside the monumental development effort required and the rights issues). Google would be able to access that data just as easily as any museum, and I’m sure they would, just as they jumped at the chance to parse and interpret micro-formatted data before anybody else did.

What’s happening in the Knowledge Graph is essentially a massive metadata assignment project. It’s not interpretive so much as it is associative. If they find a ton of web pages that reference both Ben Franklin and Philadelphia, they’ll create a metadata link between the two terms. If they don’t find a lot of pages or searches with those shared references, they won’t create that link. This is basically an AI-driven tagging initiative, only instead of searching for keywords in a purely lexical fashion, the algorithm is doing some kind of natural language processing on the source pages to try and interpret the meanings of the relationships between terms. This is going to work really well for the examples they’re showing (“Ben Franklin was from Philadelphia,” is pretty easy to parse out of any text about either of those terms) and will be a lot less effective for other terms (“art” for instance. Have you seen how broadly this term is applied to everything from painting to poetry to traditional crafts to music to military strategy and motorcycle maintenance? People can’t agree on what the word ‘art’ means most times and we all probably use at least a dozen different meanings of the term beyond the controversial ones. What chance does an algorithm have of satisfactorily taxonomizing that term? What about culturally sensitive topics like certain ethnic histories and traditions? How is the algorithm going to handle that?).
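As a rough illustration of that associative idea, here is a toy sketch that links two terms only when enough pages mention both. It is deliberately the purely lexical version (plain substring matching), not the natural language processing described above, and it is certainly not Google’s actual method. The function name, the threshold and the sample pages are all made up for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(pages, terms, min_pages=2):
    """Link two terms when at least `min_pages` pages mention both."""
    counts = Counter()
    for text in pages:
        lowered = text.lower()
        # Terms found on this page, sorted so each pair has one stable order.
        present = sorted(t for t in terms if t.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_pages}


# Made-up sample "pages" standing in for crawled web text.
pages = [
    "Ben Franklin was from Philadelphia.",
    "Ben Franklin founded a library in Philadelphia.",
    "Marie Curie was born in Warsaw.",
]
terms = ["Ben Franklin", "Philadelphia", "Marie Curie", "Warsaw"]

links = cooccurrence_links(pages, terms)
print(links)
```

With these sample pages, only the Franklin and Philadelphia pair clears the threshold, because Curie and Warsaw appear together on just one page. A term like “art” would match almost everything, which is exactly where a counting approach stops being useful.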

Google will be drawing their data from somewhere. They might as well be drawing it from us. The real question we have to ask is how much of our data will end up being served through google without attribution? I don’t care if they serve it up. I don’t care if they put it in their little panel on the right side of the screen. There’d better be attribution for where it came from. This is where we really need to put pressure on google.

Also, what they’re showing here is just a very shallow view of information. The right panel shows a few facts about Marie Curie with links to more information. Those few facts aren’t enough to really challenge a museum’s place in the world. The links presumably either go to sources for that information or they point to another Knowledge Graph entry that’s just as minimal as the one you came from (for instance, the Warsaw link is just another list of five or six facts about Warsaw, probably unrelated for the most part to Marie Curie). What we’re looking at in this demo video is something that is much less informative and useful than Wikipedia, and who knows how accurate it is (since it’s probably being created almost entirely by algorithms with little to no human proofreading).

Now we could argue (and I would) that association is a form of interpretation and that google gets to decide who is and isn’t visible online. Google’s information will be based on whoever puts the “best” (by google’s algorithm’s standards) data online. This basically introduces a new angle to the whole SEO problem. But that’s really not very different than the situation we’re already in.

Matt, would this change the type or format of information you’d put online? We’ve been talking about museums pulling information from elsewhere for a little while, but would this be something you’d advocate now that Google is doing this?

Thanks so much for this post, Suse. Right this minute I am writing an invited talk about “Libraries in the Age of Google” … And pondering many of the questions you have raised. I am not getting closer to answers though – or rather I feel a bit like I am getting closer, but I do not like what those answers are likely to be.

Unfortunately I believe that the ethics and basic “goodness” as a societal force of libraries and museums make us far more useful and significant than Google, while our methods, techniques and technologies are so far behind Google that we have – through our own inability to grapple with the tasks ahead and join together to create tools and solutions in the last 20 years – given Google the space that we could/should have occupied.

Right now, I am searching around for a solution or silver lining (which I suspect will lie in physical venue, local content and serving a geographically-based local clientele).

Ok, there are some interesting responses/questions coming up in regard to these issues. I had a bit of an exchange with Mia Ridge about this question on Twitter this afternoon, about Linked Open Data and the Semantic Web. She pointed out that these issues are not new ones, and on that topic, Leigh Dodds from Talis Consulting has written this:

While Google can clearly still operate at a scale that most organisations can only dream of, the Knowledge Graph is something that is actually within the reach of most organisations already. The approach is a new and exciting addition to their search engine, but the technology and capability isn’t a radical leap forward. You could build a Knowledge Graph of your own.

Google haven’t created a whole new dataset, they’ve collated existing sources. The data already available in the Linked Data cloud — which includes Freebase — is there for anyone to reuse. It is perfectly feasible for an organisation to create its own “knowledge graph” to serve a particular product or domain by selecting from the available sources.

Indeed this is what the BBC has been doing for some time now. As they’ve described in numerous talks and interviews: by drawing on data from the web, and using it as their content management system, they’ve been able to create graphs of data to power their own innovative applications.

These product graphs weave together open data sources with the BBC’s own unique content. The fundamental technologies and the scale of operations may differ, but both Google and the BBC are deriving real value from focusing on “things, not strings”.

This is a useful perspective, and I look forward to seeing what other discussion and analysis emerges around the web through the coming few days. I think what I’ve been struggling with is trying to understand where “museum knowledge” fits into knowledge more generally, and what is the content that is unique to museums. It’s maybe not a question that is new or unique to Knowledge Graph, even if it was prompted by it.

Finally, to continue Matt’s discussion of association as interpretation: In Museums and the Interpretation of Visual Culture Eilean Hooper-Greenhill writes that through display “museums can make new meanings which are produced through new equivalences. Museums thus have the power to remap cultural territories, and to reshape the geographies of knowledge.” (p21) So maybe a question worth asking is “Can museums online reshape geographies of knowledge in the age of Google?”

Yes, we absolutely can. There are two ways to do that. The first is to tackle the SEO problem. Again, google isn’t creating anything. They’re an aggregator. If they’re drawing their information from sources produced by us we’ll be shaping that information. We just have to figure out how to get them to draw that info from us (SEO). We’ll never have total control over how this information is interpreted and used, but that’s already true and it always will be. Tackling the SEO problem is easier said than done. Google keeps moving the goalposts on their search engine ranking, and I’m sure they’ll constantly be tweaking the algorithms behind the knowledge graph too.

The second way is to compete with them in our specific domains. This is where google ends up being very similar to what wikipedia already is. Wikipedia is often a good place to start looking into something but it’s shallow, incomplete and fraught with inaccuracies. You start at wikipedia, then go to the sources at the bottom of the wikipedia article and then go wherever those sources take you. The knowledge graph is going to be very similar. It will provide lots of links between subjects but for real depth of knowledge you still have to go to the source (which google will conveniently link you to because they’re a search engine). If anything, this could increase the visibility of our online presence rather than threaten it.

The key is to create something online that’s worth linking to. We should put more, not less, content up where everyone can see it. We should really move forward building our own infrastructure because now it’s even more important than it was before. Yes, google is completely overshadowing most museums with the knowledge graph right now, but it’s not because the knowledge graph is super awesome. It’s because most of our websites aren’t offering the full depth of information we have access to. That’s what we need to worry about.

Google doesn’t threaten us. If anything they can help us. Google doesn’t want to be in charge of content production, editing and curating. They just want to link everything to everything else so they can capitalize on serving those links (with ads). The biggest risk we have to face in this equation is google’s ability to sell adwords to sponsors who want to hijack terms to promote themselves. How is google going to expand that concept in relation to the knowledge graph? That’s the only possible pollutant in this whole matter that worries me.

Hi Suse, further to your above comment and discussion with Mia, the conclusion of Tim Sherratt’s Australia and New Zealand Society of Indexers keynote ‘Every story has a beginning’, which focuses on the Semantic Web, makes some relevant points. The fifth para is particularly eloquent:

Tim Hitchcock, another member of the ‘With Criminal Intent’ team, has described how online technologies can change the way we access archives. Instead of being forced to navigate the hierarchical structures that archives impose on records, which in turn tend to reflect the workings of the institutions that created the records, we can directly find the people whose lives were regulated, influenced, shaped or controlled by the policies of those institutions.

Instead of merely hearing ‘the institutional voice… in all its stentorian splendour’, he says, we can listen in to ‘the quieter tones uttered by the individual’.[8]

This reminds us that search boxes, along with other digital tools, themselves embody arguments. There are assumptions built into their code about what is relevant, what is significant, what is necessary.

We can build our own tools of course, and we can critique other people’s algorithms. But what if we just want to collect and share stories?

Linked Data gives us a way to present an alternative to Google’s version of the world. We can argue back against the search engines, defining our own criteria for relevance, and building our own discovery networks.

Changing the way we access resources changes the sorts of stories we can tell. Tim Hitchcock asks:

What happens when institutions and archives are ‘decentred’ in favour of the individual? What changes when we examine the world through the collected fragments of knowledge that we can recover about a single person, reorganised as a biographical narrative, rather than as part of an archival system?[9]

Do you think that we, as editors and curators, have a certain civic responsibility to expose audiences to viewpoints and information outside their comfort zones in an effort to counteract this algorithmically-driven confirmation bias, or are people better left unburdened by conflicting data points?

Pariser responds:

In some ways, I think that’s the primary purpose of an editor – to extend the horizon of what people are interested in and what people know. Giving people what they think they want is easy, but it’s also not very satisfying: the same stuff, over and over again. Great editors are like great matchmakers: they introduce people to whole new ways of thinking, and they fall in love.

Now, re-read Pariser’s answer, but substitute the word “museum” in every time he says “editor”. Great museums are like matchmakers: they introduce people to whole new ways of thinking, and they fall in love. Maybe this is what we are getting to here, with your comments Matt and Eleanor. Maybe when Google tries to anticipate every search need based on what others have searched for previously, following desire lines, it becomes the role of museums to provide those alternative routes to information and new discovery.

Would such a viewpoint then see museums as something other than gatekeepers of centralised knowledge? What would that mean?

Well, I don’t really think the “gatekeepers of centralized knowledge” role is being hired for anymore in any organization. Public engagement will have an impact on traditional scholarly roles no matter who’s in charge of the database (for successful ventures at least). The shape of that engagement will largely be determined by the tools that are made available to the general public. Google (all of google, not just this one part) is one of those tools. It’s not a very good one. Wikipedia is another of those tools. It’s better, but it suffers primarily from the fact that scholars are the ones most reluctant to engage through it (if you want wikipedia to be more accurate, start participating in wikipedia).

There aren’t really gatekeepers anymore so much as there are stores of knowledge (both human and non-human) and tools to access those stores. A museum visit is one way to access the stores of knowledge kept in a museum. A book from the shop is another tool. A tour is another. Our online presence (including but not limited to our web sites) is another tool.

People’s quality of engagement with us will be impacted by what tools we make available to them and what existing tools we use to engage with them. We should be laying the foundations for the information services we’re going to be expected to provide in the future. If we don’t, our audiences will use the tools provided by other organizations. They’ll be engaged, but not by us. We have to engage with people both with the tools that are currently available to us (google, wikipedia, social networking) and with new tools that serve the needs and goals of our organizations, tools that we may have to start making ourselves (because who else will?).

Ok – new question: do you think that Google are trying to co-opt the language of museums to buy authority? In describing their new mode of connecting information as being a “Knowledge Graph” filled with discrete “objects,” it feels like there is a real semantic association between this and the language of museums. Particularly if we consider the growing bastardisation of the use of the word curator more generally… What do you think? Is it coincidence, or is Google trying to gain trust by semantically associating themselves with historical institutions that were authoritative?

I don’t think they’re trying to co-opt the language of museums. There are only so many words that can be used for ‘things’, ‘information about things’ and ‘selecting a subset of things for a particular purpose’. Objects, knowledge graph and curators are reasonable choices many products might use to sum up similar concepts.