Global news agency Reuters is one of those venerable organisations that it's hard not to respect. Superficially, at least, it would be easy to assume that their days must surely be numbered in a world where the data they used to monetise is increasingly available on the open Web. They have, of course, recognised this and are working to diversify the means by which they generate revenue.

Two events in quick succession caused me to take a closer look at what Reuters were doing, a look that resulted in an interesting podcast exploration of the issues with Barak Pridor, CEO of Reuters' ClearForest subsidiary.

Wenig's attitude toward the shifting value proposition around data, at least as reported by O'Reilly, is refreshing, especially coming from someone so embedded in traditional models with their emphasis upon scarcity and control. Simply thinking the right thoughts isn't enough, of course, even if you're the CEO. It was therefore encouraging to see management rhetoric apparently aligned with practical implementation, both inside Reuters (where ClearForest 'magic' is spreading through the organisation) and, more remarkably, on the open Web via the Calais API. As Barak and I discussed in the podcast, Reuters is making (some of) their ClearForest magic available to all comers via Calais, and that availability even extends to their competitors. It's surprising enough to hear the Web 2.0 cognoscenti talk of such sharing, but from a stalwart of traditional media it seemed truly remarkable, and we can only assume that many well-thumbed copies of The Cluetrain Manifesto and its ilk must adorn the shelves of Reuters executives.

As the Calais press release states:

"The Calais Web service enables publishers, bloggers and sites of all kinds to automatically metatag the people, places, facts and events in their content to increase its search relevance and accessibility on the Web. It also lets content consumers, such as search engines, news portals, bookmarking services and RSS readers, submit content for automatic semantic metatagging that is performed in well under a second.

The Calais Web service returns content in an open, interoperable and entirely portable format, with a unique identifier that can be easily integrated into social networks, widgets and semantic applications like Powerset, Freebase, Twine, Hakia, Wikia, Blue Organizer and more.

Calais is a new Reuters initiative that supports the interoperability of content and the development of the Semantic Web – a layer of contextual intelligence that enables computers to ‘read’ content, detect connections and even make new ones. Calais leverages Reuters’ substantial investment in semantic technologies and Natural Language Processing (NLP) to offer free metadata generation services, developer tools and an open standard for the generation of semantic content."

During our conversation, Barak describes the way in which ClearForest's products analyse free text submitted to them and enrich the submitted text by inferring and applying structure that can then be used to improve downstream applications. The Calais API exposes much of this capability to external developers, who are now able to submit their own text via the API and have it passed back to them in enriched form. At a simple level, this enrichment may include such things as the addition of links to stock information whenever the name of a listed company is detected in the text. He also explains why the name Calais was chosen!
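To make the simple stock-link case concrete, here is a minimal local sketch in Python. It does not call Calais and makes no claim about how ClearForest detects entities; it just swaps in a plain dictionary lookup to show the shape of the enrichment. The company names, ticker symbols, `QUOTE_URL` and the `enrich` function are all hypothetical, invented purely for illustration.

```python
import re

# Hypothetical lookup table of listed companies (illustrative names only).
TICKERS = {"Acme Widgets": "ACME", "Globex": "GBX"}

# Hypothetical quote page; example.com is a reserved placeholder domain.
QUOTE_URL = "https://quotes.example.com/{ticker}"


def enrich(text, tickers=TICKERS):
    """Wrap each known company name in a link to its (hypothetical) quote page."""
    if not tickers:
        return text
    # Try longer names first so a short name never pre-empts a longer one.
    names = sorted(tickers, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    def link(match):
        name = match.group(1)
        url = QUOTE_URL.format(ticker=tickers[name])
        return f'<a href="{url}">{name}</a>'

    return pattern.sub(link, text)


if __name__ == "__main__":
    print(enrich("Shares in Globex rose after the Acme Widgets deal."))
```

A real service would presumably have to cope with ambiguity, variant spellings and context, which is exactly the work a dictionary lookup like this one skips.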

It will be interesting to see the extent to which third-party application developers begin to draw upon network-based services such as Calais in order to enhance their own products. A WordPress module that automatically submits draft blog posts to Calais for enrichment, perhaps?

It does seem likely that this reasonably straightforward enrichment of existing processes will be one early success for the Semantic Web, ahead of widespread adoption of more disruptive improvements.