Online journalism news

Tag Archives: John O’Donovan

If you are more interested in the cogs and wheels behind the BBC News site’s redesign than the end product, a post by their chief technical architect John O’Donovan this week should be of interest.

The BBC has one of the oldest and largest websites on the internet and one of the goals of the update to the News site was to also update some of the core systems that manage content for all our interactive services.

O’Donovan first outlines the reasoning behind sticking with a Content Production System (CPS), rather than moving over to a Content Management System (CMS), before giving a detailed look at the latest version – version 6 – that they have opted for.

The CPS has been constantly evolving and we should say that, when looking at the requirements for the new news site and other services, we did consider whether we should take a trip to the Content Management System (CMS) Showroom and see what shiny new wheels we could get.

However there is an interesting thing about the CPS – most of our users (of which there are over 1,200) think it does a pretty good job [checks inbox for complaints]. Now I’m not saying they have a picture of it next to their kids on the mantelpiece at home, but compared to my experience with many organisations and their CMS, that is something to value highly.

The main improvements afforded by the new version, according to O’Donovan, include a more structured approach, improved technical quality of the content produced, and the ability to use semantic data to define content and improve layouts.

Using their World Cup site as an example, the BBC have posted an article explaining how they used, and intend to develop, linked data and semantic technologies to better present and share data.

John O’Donovan, chief technical architect at BBC Future, explained how such technologies more effectively aggregate data on information-rich topics such as the World Cup, which he says has more index pages than the entire rest of the BBC Sport site.

Another way to think about all this is that we are not publishing pages, but publishing content as assets which are then organised by the metadata dynamically into pages, but could be reorganised into any format we want much more easily than it could before.

The principles behind this are the ones at the foundation of the next phase of the internet, sometimes called the Semantic Web, sometimes called Web 3.0. The goal is to be able to more easily and accurately aggregate content, find it and share it across many sources. From these simple relationships and building blocks you can dynamically build up incredibly rich sites and navigation on any platform.
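The idea of publishing assets rather than pages can be sketched in a few lines. This is purely illustrative – the asset fields, tags and titles below are invented, not anything from the BBC's actual system:

```python
# Hypothetical sketch: content stored as tagged assets, with "pages"
# assembled dynamically from metadata rather than authored by hand.

assets = [
    {"id": 1, "type": "story", "title": "Spain win World Cup", "tags": {"world-cup", "spain"}},
    {"id": 2, "type": "video", "title": "Final highlights", "tags": {"world-cup", "video"}},
    {"id": 3, "type": "story", "title": "Transfer rumours", "tags": {"premier-league"}},
]

def build_page(tag):
    """Assemble an index 'page' for a topic by querying asset metadata."""
    return [a["title"] for a in assets if tag in a["tags"]]

print(build_page("world-cup"))  # ['Spain win World Cup', 'Final highlights']
```

Because the page is just a metadata query, the same assets could be reorganised into a different format – a mobile feed, say – by swapping the query, which is the flexibility O'Donovan describes.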

He identified the process as a shift from simply publishing stories and index pages, to publishing the content with intelligent tagging, saving time and improving accuracy.

He adds that it also enables the site to “accurately share this content and link out to other sites”, which was illustrated by a recent paidContent report where the BBC was rated number five for directing traffic to UK newspaper websites.

“Linked Data is about using the web to connect related data that wasn’t previously linked, or using the web to lower the barriers to linking data currently linked using other methods.” (http://linkeddata.org)
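The definition above becomes concrete if you think of Linked Data as subject–predicate–object triples that can be followed like links. A minimal sketch, with invented identifiers loosely styled on RDF prefixes (not any organisation's real data):

```python
# Illustrative RDF-style triples: (subject, predicate, object).
triples = [
    ("bbc:world-cup-2010", "rdf:type", "sport:Tournament"),
    ("bbc:world-cup-2010", "sport:winner", "dbpedia:Spain"),
    ("dbpedia:Spain", "rdfs:label", "Spain"),
]

def objects_of(subject, predicate):
    """Follow a link: every object related to a subject by a predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("bbc:world-cup-2010", "sport:winner"))  # ['dbpedia:Spain']
```

Because the identifiers are shared across sources (here, a hypothetical DBpedia reference), data published by one site can be joined to data published by another – the "connecting related data" the quote describes.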

I talked about how 2009 was, for me, a key year in data and journalism – largely because it has been a year of crisis in both publishing and government. The seminal point in all of this has been the MPs’ expenses story, which both demonstrated the power of data in journalism, and the need for transparency from government. For example: the government appointment of Sir Tim Berners-Lee, the search for developers to suggest things to do with public data, and the imminent launch of Data.gov.uk around the same issue.

Even before then, the New York Times and the Guardian both launched APIs at the beginning of the year, MSN Local and the BBC have both been working with Wikipedia, and we’ve seen the launch of a number of startups and mashups around data, including Timetric, Verifiable, BeVocal, OpenlyLocal, MashTheState, the open source release of Everyblock, and Mapumental.

Q: What are the implications of paywalls for Linked Data?
The general view was that Linked Data – specifically standards like RDF [Resource Description Framework] – would allow users and organisations to access information about content even if they couldn’t access the content itself. To give a concrete example, rather than linking to a ‘wall’ that simply demands payment, it would be clearer what the content beyond that wall related to (e.g. key people, organisations, author, etc.).
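The panel's point can be pictured with a toy example – the article fields below are invented, and a real site would publish the open part as RDF rather than a Python dict:

```python
# Hypothetical article record: descriptive metadata plus a paid-only body.
article = {
    "headline": "Exclusive interview",
    "author": "A. Reporter",
    "subjects": ["politics", "economy"],
    "body": "Full text, paid access only.",
}

def public_metadata(a):
    """Serve facts *about* the content while the content itself stays
    behind the paywall."""
    return {k: v for k, v in a.items() if k != "body"}

print(public_metadata(article)["subjects"])  # ['politics', 'economy']
```

A crawler or aggregator hitting the wall would still learn who wrote the piece and what it concerns, which is exactly the information the panel said should survive the paywall.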

Leigh Dodds felt that using standards like RDF would allow organisations to more effectively package content in commercially attractive ways, e.g. ‘everything about this organisation’.

Q: What can bloggers do to tap into the potential of Linked Data?
This drew some blank responses, but Leigh Dodds was most forthright, arguing that the onus lay with developers to do things that would make it easier for bloggers to, for example, visualise data. He also pointed out that if someone currently does something with data it is not possible to trace that back to the source, and that better tools would effectively allow an equivalent of pingback for data included in charts (e.g. the person who created the data would know that it had been used, as could others).

Q: Given that the problem for publishing lies in advertising rather than content, how can Linked Data help solve that?
Dan Brickley suggested that OAuth technologies (where a single login identity carrying information about your social connections is used across multiple sites, rather than creating a new ‘identity’ for each) would allow users to specify more precisely how they experience content, for instance: ‘I only want to see article comments by users who are also my Facebook and Twitter friends.’
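In outline, Brickley's example might look like this – the friend lists and comment store are invented stand-ins for what a real implementation would fetch from the social sites' APIs via OAuth:

```python
# Invented data: friend lists a site could obtain once the user has
# authorised it against each social network.
facebook_friends = {"alice", "bob"}
twitter_friends = {"bob", "carol"}

comments = [
    {"user": "bob", "text": "Great piece"},
    {"user": "dave", "text": "First!"},
    {"user": "alice", "text": "Interesting"},
]

def comments_from_mutual_friends(comments):
    """Keep only comments by users who are friends on BOTH networks,
    per the 'Facebook AND Twitter friends' preference."""
    mutual = facebook_friends & twitter_friends
    return [c for c in comments if c["user"] in mutual]

print(comments_from_mutual_friends(comments))
```

The filtering itself is trivial; the substance of the suggestion is that a shared login identity makes the friend lists available to the news site in the first place.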

The same technology would allow for more personalised, and therefore more lucrative, advertising. John O’Donovan felt the same could be said about content itself – more accurate data about content would allow for more specific selling of advertising.

Martin Belam quoted James Cridland on radio: ‘[The different operators] agree on technology but compete on content’. The same was true of advertising but the advertising and news industries needed to be more active in defining common standards.

Leigh Dodds pointed out that semantic data was already being used by companies serving advertising.

Other notes
I asked members of the audience who they felt were the heroes and villains of Linked Data in the news industry. The Guardian and BBC came out well – The Daily Mail were named as repeat offenders who would simply refer to ‘a study’ and not say which, nor link to it.

Martin Belam pointed out that the Guardian is increasingly asking itself ‘how will that look through an API?’ when producing content, representing a key shift in editorial thinking. If users of the platform are swallowing up significant bandwidth or driving significant traffic then that would probably warrant talking to them about more formal relationships (either customer-provider or partners).

A number of references were made to the problem of provenance – being able to identify where a statement came from. Dan Brickley specifically spoke of the problem with identifying the source of Twitter retweets.

Dan also felt that the problem of journalists not linking would be solved by technology. In conversation previously, he also talked of ‘subject-based linking’ and the impact of SKOS [Simple Knowledge Organisation System] and linked data style identifiers. He saw a problem in that, while new articles might link to older reports on the same issue, older reports were not updated with links to the new updates. Tagging individual articles was problematic in that you then had the equivalent of an overflowing inbox.
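Subject-based linking as Brickley describes it can be sketched as follows – the concept scheme and articles are invented, modelled loosely on SKOS broader/narrower relations. The point is that old and new articles both point at a shared concept, and the concept page does the aggregation, so older stories never need to be edited to link forward:

```python
# Invented SKOS-style concept scheme: narrower concept -> broader concept.
broader = {"mps-expenses": "uk-politics"}

articles = [
    {"title": "Expenses scandal breaks", "concept": "mps-expenses", "date": "2009-05"},
    {"title": "New expenses rules", "concept": "mps-expenses", "date": "2009-11"},
    {"title": "Election called", "concept": "uk-politics", "date": "2010-04"},
]

def narrower_of(concept):
    """Concepts that declare the given concept as their broader term."""
    return [n for n, b in broader.items() if b == concept]

def articles_about(concept):
    """A concept page gathers articles tagged with the concept or any
    narrower concept, oldest first, with no per-article cross-linking."""
    concepts = {concept, *narrower_of(concept)}
    return sorted((a for a in articles if a["concept"] in concepts),
                  key=lambda a: a["date"])

print([a["title"] for a in articles_about("uk-politics")])
```

Navigating via the concept also sidesteps the "overflowing inbox" problem: the identifier stays stable while the set of articles behind it grows.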

Finally, here’s a bit of video from the very last question addressed in the discussion (filmed with thanks by @countculture):