The event served to introduce the motives behind Databox and the structure of the project, and to gauge use cases within the community and among potential application developers. The team presented the initial release of a working open-source Databox platform, which includes basic data collection support through mobile sensing libraries and selected APIs, provides basic data flow policing and privacy policy enforcement, and supports installation and operation of simple personal data processing apps.


The morning session began with a formal introduction to the research project by Hamed Haddadi, who explained its high-level goal: “Can we do detailed, user-centric, contextual analytics at a scalable rate without privacy disasters and legal challenges?” Richard Mortier followed with a summary of the technical architecture of the Databox and described the driving motive as an open-source, personal networked system, NOT another data silo that acts as a honey pot. The focus is to move computation to where the data is, thus reducing the movement of the data itself. Tosh Brown and Yousef Amar then followed with (working!) demonstrations of the Databox SDK and UI, and of the development of drivers and applications at the container level.

The afternoon session was driven by the attendees, who were all asked to propose applications for and uses of the Databox, with small focus groups facilitating this development.

See my raw notes from the event below.

Thank you to all those who attended, the Databox Project team, and to the staff at Darwin College.

Contribute to the open-source software Databox project

You can contribute to the open-source Databox prototype by visiting the repository and checking out the:

Motivations

The Databox seeks to collate, curate and mediate third-party access to your personal data, whilst creating a user-friendly environment in which to manage that data effectively. We are generating more data than ever through wearables, social media, etc., and our digital footprint can be used by third parties to infer a wealth of information about us. Currently the user has little choice about which data is shared and with whom; we need a privacy-aware data analytics platform.

Technical Architecture and Design Principles

Performing local data processing and moving data as little as possible has benefits including:

Apps process the data where it is held, so the computation moves to the data. Apps are installed as containers, with explicit permissions requested upon installation and provided by the arbiter to allow them to access specific data.
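To make the permission model in these notes concrete, here is a minimal sketch of an arbiter that records which data sources an app was granted at installation and denies everything else. It is a toy model, not the Databox arbiter itself: the class, the method names and the app and data source names are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Arbiter:
    """Toy arbiter (hypothetical): remembers the data sources granted to each app."""

    grants: dict[str, frozenset[str]] = field(default_factory=dict)

    def install(self, app: str, requested: set[str], approved_by_user: set[str]) -> frozenset[str]:
        # Only sources that were both requested by the app and explicitly
        # approved by the user at install time are granted.
        granted = frozenset(requested & approved_by_user)
        self.grants[app] = granted
        return granted

    def may_read(self, app: str, source: str) -> bool:
        # Deny by default: unknown apps and ungranted sources get no access.
        return source in self.grants.get(app, frozenset())


arbiter = Arbiter()
# The app asks for two sources; the user approves only one of them.
arbiter.install("energy-report", {"hue-lights", "twitter"}, approved_by_user={"hue-lights"})

print(arbiter.may_read("energy-report", "hue-lights"))  # True
print(arbiter.may_read("energy-report", "twitter"))     # False
```

The deny-by-default check is the point of the sketch: an app can only reach data it was explicitly granted when it was installed.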

UI and SDK

The SDK provides a user-friendly cloud environment for building Databox applications quickly and for finding approved applications to use on your own Databox; you only need a GitHub login to access it. The graphical programming environment allows you to drag in and connect nodes, view the function output, and debug if needed. There are other useful details, such as built-in visualisations that allow you to view your data as graphs, lists, etc., and application manifests, which list any resources your app needs and the different levels of functionality it offers to correspond with existing devices. Current applications include Hue lights, a mobile sensing driver and Twitter.

Environment variables passed to a container: URLs for containers to connect to, data source metadata in Hypercat format, the URL of the data source store, and a CA root certificate for the container to use over HTTPS (plus a private key if you want to host an HTTPS server).
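As a rough illustration of how a containerised app might pick up this configuration, the sketch below reads the values from an environment mapping. The variable names (`EXAMPLE_STORE_URL` and so on), the `ContainerConfig` type and the sample values are assumptions made up for this example; the notes do not record the names Databox actually uses. The metadata is assumed to arrive as a JSON string, since Hypercat is a JSON-based catalogue format.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ContainerConfig:
    store_url: str                    # URL of the data source store
    datasource_metadata: dict         # parsed Hypercat-style JSON metadata
    ca_cert_path: str                 # CA root certificate for HTTPS requests
    private_key_path: Optional[str]   # only needed to host an HTTPS server


def load_config(env: Mapping[str, str]) -> ContainerConfig:
    # All variable names here are hypothetical placeholders.
    return ContainerConfig(
        store_url=env["EXAMPLE_STORE_URL"],
        datasource_metadata=json.loads(env["EXAMPLE_DATASOURCE_METADATA"]),
        ca_cert_path=env["EXAMPLE_CA_CERT_PATH"],
        # The private key is optional, so a missing variable yields None.
        private_key_path=env.get("EXAMPLE_PRIVATE_KEY_PATH"),
    )


# Illustrative values only; a real container would pass os.environ instead.
example_env = {
    "EXAMPLE_STORE_URL": "https://store.example:8080",
    "EXAMPLE_DATASOURCE_METADATA": json.dumps(
        {"href": "https://store.example:8080/light", "item-metadata": []}
    ),
    "EXAMPLE_CA_CERT_PATH": "/run/secrets/ca.pem",
}

config = load_config(example_env)
print(config.store_url)         # https://store.example:8080
print(config.private_key_path)  # None
```

Taking the environment as a parameter rather than reading `os.environ` directly keeps the function easy to test outside a container.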