DataFreeze - scripted static data exports

25 February 2013

Every hour, Spiegel Online serves more than half a million visitors. To
make that work, all content has to be served via a CDN. For data-driven
applications, that means dynamic queries cannot easily be served: the data
needs to be static. This doesn't need to be a showstopper for great
content. Sites like the UNDP data explorer
demonstrate that a set of JSON files is often enough to power a great
project.

DataFreeze (now dataset, ed.) facilitates
the creation of such applications by freezing relational data from a
SQL database into a set of easy-to-use JSON and CSV files. Which data is
included is controlled by a Freezefile, a simple YAML or JSON file
that specifies queries, output file names and formats. A sample
Freezefile would look like this:
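
The following is a minimal sketch, not a file taken from a real project. The key names (common, exports, query, filename, format) and the {{column}} placeholder in file names follow the Freezefile layout described in the dataset documentation and are worth checking against the current docs. The database URL, the events table and its columns are hypothetical.

```yaml
# Settings shared by all exports (values here are illustrative assumptions).
common:
  database: "sqlite:///example.db"   # hypothetical database URL
  prefix: dumps/                     # output directory for the frozen files
  format: json                       # default output format

# Each entry freezes the result of one query into one or more static files.
exports:
  # All rows into a single JSON file.
  - query: "SELECT id, title, date FROM events"
    filename: "index.json"

  # One CSV file per distinct value of the country column.
  - query: "SELECT id, title, date, country FROM events"
    filename: "countries/{{country}}.csv"
    format: csv
```

Each query is run once at export time, so the resulting files can be uploaded to a CDN as-is and regenerated by a script whenever the underlying data changes.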

Hi! I'm a software developer and data wrangler working on
methods to support effective journalism, activism and civic participation.