Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188

Tuesday, December 05, 2017

Blue Planet II, the BBC, and the Semantic Web: a tale of lessons forgotten and opportunities lost

David Attenborough’s latest homage to biodiversity, Blue Planet II, is, as always, visually magnificent. Much of its impact derives from the new views of life afforded by technological advances in cameras, drones, diving gear, and submersibles. One might hope that the supporting information online reflected the equivalent technological advances made in describing and sharing information. Sadly, this is not the case. Instead the BBC offers a web site with video clips and a poster... a $%@£ poster.

This is a huge missed opportunity. Where do people go to learn more about the organisms featured in an episode? How do we discover related content on the BBC and elsewhere? How do we discover the science underpinning each episode that has been so exquisitely filmed and edited?

Perhaps the lack of an online resource reflects a lack of resources, or expertise? Yet one look at the series (and the "Into the blue" epilogues) tells us that resources are hardly limiting. Furthermore, the BBC has previously constructed rich, informative web sites to support natural history programming. The now deprecated BBC Nature Wildlife site had an extensive series of web pages for the organisms featured in BBC programmes, with links to individual clips. For each organism the corresponding web page listed key traits such as behaviours, habitats, and geographic distribution, and each of these traits had its own web page listing all organisms with that trait (see, for example, the page for Steller's Sea Eagle).

Underlying all this information was a simple vocabulary (the Wildlife Ontology), and the entire corpus is also available in RDF: in other words, the BBC used Semantic Web technologies to structure this information. To get this data you simply append ".rdf" to the URL for a web page. For example, below is the RDF for Steller's Sea Eagle. It is not pretty, but it is a great example of machine-readable data which enables all sorts of interesting things to be built.

For some reason, this web site is now deprecated. As an exercise I grabbed the RDF from the web site, did a little cleaning, and merged it together, resulting in a set of around 94,500 triples (statements of the form “subject”, “predicate”, “object”). For example, this triple says that Steller's Sea Eagle is monogamous.

One reason the Semantic Web has struggled to gain widespread adoption is the long list of things you need to get to the point where it is usable. You need data consistently structured using the same vocabulary. You need identifiers that everyone agrees on (or at least can map their own identifiers to). And you need a triple store, which is essentially a graph database, a technology that is still unfamiliar to many. But in this case the BBC has done a lot of the hard work by cleverly minting identifiers based on Wikipedia URLs (“slugs”), and developing a vocabulary to express relationships between organisms, traits, and habitats. All that’s needed is a way to query this data. Rather than use a triple store (most of which are not much fun to install or maintain) I’ve used the delightfully simple approach of employing a Hexastore. Hexastores provide fast querying of graphs by indexing all six permutations of the subject, predicate, object triple (hence “hexa”). The approach is sufficiently simple that for moderately sized databases we can implement it in Javascript and run it in a web browser.
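To make the idea concrete, here is a minimal sketch of a hexastore. It is written in Python rather than Javascript, and it is not the code behind the demo, just an illustration of the indexing trick under some stated assumptions. The class name `Hexastore`, the predicate names `adaptation` and `habitat`, and the placeholders `Species_B`, `Habitat_X`, and `Habitat_Y` are all made up for the example; only the statement that Steller's Sea Eagle is monogamous comes from the data discussed above.

```python
from collections import defaultdict
from itertools import permutations


class Hexastore:
    """Toy in-memory triple index that stores every triple under all six
    orderings of (subject, predicate, object)."""

    # The six orderings: spo, sop, pso, pos, osp, ops
    ORDERS = ["".join(p) for p in permutations("spo")]

    def __init__(self):
        # index[order][first][second] -> set of third terms
        self.index = {
            order: defaultdict(lambda: defaultdict(set)) for order in self.ORDERS
        }

    def add(self, s, p, o):
        """Insert one triple into each of the six indexes."""
        term = {"s": s, "p": p, "o": o}
        for order in self.ORDERS:
            a, b, c = (term[k] for k in order)
            self.index[order][a][b].add(c)

    def query(self, s=None, p=None, o=None):
        """Return a sorted list of matching triples; None acts as a wildcard."""
        pattern = {"s": s, "p": p, "o": o}
        bound = [k for k in "spo" if pattern[k] is not None]
        free = [k for k in "spo" if pattern[k] is None]
        # Pick the index whose leading terms are the ones we already know,
        # so the lookup is a few dictionary hops rather than a full scan.
        first, second, third = bound + free
        level1 = self.index[first + second + third]
        results = []
        firsts = [pattern[first]] if first in bound else list(level1)
        for a in firsts:
            level2 = level1.get(a, {})
            seconds = [pattern[second]] if second in bound else list(level2)
            for b in seconds:
                level3 = level2.get(b, set())
                thirds = [pattern[third]] if third in bound else level3
                for c in thirds:
                    if c in level3:
                        found = {first: a, second: b, third: c}
                        results.append((found["s"], found["p"], found["o"]))
        return sorted(results)


# Illustrative data only: identifiers and predicates are placeholders.
store = Hexastore()
store.add("Steller's_sea_eagle", "adaptation", "Monogamous")
store.add("Steller's_sea_eagle", "habitat", "Habitat_X")
store.add("Species_B", "adaptation", "Monogamous")
store.add("Species_B", "habitat", "Habitat_Y")

# Everything we know about one organism.
about_eagle = store.query(s="Steller's_sea_eagle")

# Every organism that shares a trait.
monogamous = [s for s, _, _ in store.query(p="adaptation", o="Monogamous")]
```

The price of this design is memory, since every triple is stored six times, which is why it suits a moderately sized data set like this one better than a very large graph.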

Once you load the page there are no further server requests, other than fetching images. Every query is “live” but takes place in the browser. You can click on the image for a species and get some textual information, as well as images representing traits of that organism. Click on a trait and you discover what organisms share that trait. This example is trivial, but surprisingly rich. I’ve found it fascinating to simply bounce around the images discovering unexpected facts about different species. There’s lots of potential for serendipitous discovery, as well as an enhanced appreciation for just how rich the BBC’s content is. If the Encyclopedia of Life were this engaging I’d be its biggest fan.

The question, then, is why a similar approach was not taken for Blue Planet II? It can’t be a lack of resources: this series has amazing production values. And yet a wonderful opportunity has been missed. Why not build on the existing work and create an interactive resource that encourages people to explore more deeply and learn more? Much of the existing data could be used, as well as adding all the new species and behaviours we see on our TV screens. Blue Planet also highlights the impacts humans are having on the marine environment; these could be added as categories as well, to show what organisms are susceptible to different impacts (e.g., plastics).

That the BBC thinks a poster is an adequate form of engagement in the digital age speaks of a corporation that, in spite of many triumphs in the digital sphere (e.g., iPlayer), has not fully grasped the role the web can play in making its content more widely useful and relevant, beyond enthralling viewers on a Sunday evening. It also seems oblivious to the fact that it already knows how to deliver rich, informative online content (as evidenced by the now deprecated Wildlife application). So please, BBC, can we have a resource that enables us to learn more about the organisms and habitats that are the subjects of the grandeur and beauty we see on our TV screens?

Follow up

Below is some of the discussion this post generated on Twitter.

Very cool, your hexastore page has now become a great tool, using it for my revision now

Also going to contradict this view slightly (since the /programmes proposition sits in my portfolio). There's a lot more than a poster, for a start: https://t.co/xbHwsaCUHO and there are some assumptions about resource and available content that may not be as simple as suggested

“For some reason, this web site is now deprecated.” “That the BBC thinks a poster is an adequate for (sic) of engagement in the digital age” Have a look at this item, especially the comment by George Osborne at the bottom. https://t.co/OmHiYienzB

I took part in some SW pilots several years ago. The problem is the need to reach critical mass, particularly at a time when budgets are dropping. Public-facing providers like BBC and OU will always prioritise reach: anything additional needs to be off-the shelf to get adoption.