Dow Jones' DNA platform produces big data for machine learning

Reinsurance, public health and capital markets are some of the industries using the platform

Laurie is a staff writer for IDG titles including Techworld, Computerworld UK and CIO UK. She studied psychology followed by a Masters in Journalism, and has since worked in marketing and as a freelance consumer insight writer. Her particular interests include consumer tech, startup tech culture and how technology is shaping society.

Dow Jones, the longstanding financial news behemoth founded more than 130 years ago, has developed its own mass data service, called DNA, which allows clients to extract historical and real-time data for their operations. Its biggest clients include organisations in biological surveillance, reinsurance and capital markets.

The Data, News and Analytics platform, or DNA for short, was launched in March 2017 but is still in beta.

Niranjan Thomas, general manager of the DNA Platform and Technology Partnerships at Dow Jones, explains the business case for making its data sets more readily available: "A couple of years back, what became really quite clear to us, both internally and from the market, was that we really needed to be able to unlock all of the great data assets that we have and make it easier for one of our enterprise customers to be able to get at that data programmatically."

This was achieved primarily through APIs and by presenting information as data feeds.

Dow Jones owns a number of well-established data products, including its Newswire service, the aggregated news product Factiva.com, and more specialised products such as the private equity and venture capital research tool VentureSource.

Thomas adds that the creation of DNA was precipitated by a shift in how the firm's customers were using data.

"Cloud computing is by far the biggest driver of that shift in terms of the appetite and ability of a particular large enterprise customer to be able to consume high volumes of data and really extract for themselves a level of insight that they haven't been able to do before," says Thomas.

Because the platform is hosted in the cloud, third parties can download millions or even tens of millions of documents at once, which makes it well suited to machine learning projects.

The platform offers both 'streaming' and 'snapshot' data APIs. While snapshots allow third parties to extract data at extremely high volumes very quickly, the streaming capability allows organisations to feed real-time data into their environments.
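To make the difference between the two access patterns concrete, here is a minimal, purely conceptual Python sketch. It does not use or describe the actual DNA API: the `snapshot` and `stream` functions, the record fields (`id`, `text`) and the keyword filter are all hypothetical, chosen only to illustrate bulk extraction versus a real-time feed.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical record shape: field names are illustrative, not the DNA schema.
Document = Dict[str, str]


def snapshot(archive: Iterable[Document], keyword: str) -> List[Document]:
    """Snapshot-style access: one bulk pull of every matching stored document."""
    return [doc for doc in archive if keyword in doc["text"]]


def stream(feed: Iterable[Document], keyword: str) -> Iterator[Document]:
    """Streaming-style access: hand over matching documents one by one as they arrive."""
    for doc in feed:
        if keyword in doc["text"]:
            yield doc


if __name__ == "__main__":
    archive = [
        {"id": "a1", "text": "Zika outbreak reported"},
        {"id": "a2", "text": "Quarterly earnings beat forecasts"},
        {"id": "a3", "text": "Health agencies track Zika spread"},
    ]
    # Snapshot: the whole matching history arrives at once, e.g. as training data.
    batch = snapshot(archive, "Zika")
    print([doc["id"] for doc in batch])  # ['a1', 'a3']

    # Streaming: each document is handled as soon as it appears in the feed.
    for doc in stream(iter(archive), "earnings"):
        print("new:", doc["id"])  # new: a2
```

The design difference is that the snapshot returns a finished list, suited to bulk jobs such as training a model, while the streaming generator yields records lazily, so a consumer can act on each one as it arrives.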

Alongside these are APIs that allow organisations or individuals to search for and retrieve specific articles and data records. While the streaming and snapshot services are designed for data scientists, the search and retrieval APIs are more likely to be used by software engineers and application developers, and integrated into content such as a mobile article.

Along with data directly from Dow Jones, the paid subscription service also delivers information from its publication, the Wall Street Journal, and licensed third-party data from the likes of Thomson Reuters and the New York Times.

But who is using it and how? Thomas says there has been a large uptake in the reinsurance industry, where clients are eager to invest heavily in data and analytics. He says this is due to a boom in the volume of property and casualty insurance claims paid out in recent years.

"That's where they're leveraging the DNA dataset, to be able to find deeper insights," he says. "It helps them better understand the risk profile. It helps them better manage risk going forward."

Another major use for the platform is in biological surveillance. Thomas explains how one key client in North America has taken advantage of the tool: "Really what they're looking for is an early warning around the spread of disease, so particularly the outbreak of specific medical conditions.

"The particular organisation has a very broad remit, not just within North America but also with global health agencies monitoring certain outbreaks." These include outbreaks of viruses such as Zika.

To do this, Thomas says, the organisation draws on a wide range of data sources, including high-quality news publications.

A more obvious use for DNA is in capital markets. "We're really seeing [asset managers] make a real shift to a more quantitative approach, or what we call the quantamental, which is kind of between fundamental and quantitative," he says.

"They're looking for kind of the non-quantitative factors that may influence investment strategy."