Value Added Data Systems – Principles and Architecture

Month: March 2016

Data wrangling is the process by which data is identified, extracted, integrated and cleaned for analysis. The New York Times reports that “Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data”.

The VADA project exists to put data wrangling on a firmer footing, in which automation, more systematic use of the available evidence, and carefully targeted user input make data wrangling more efficient. One goal of the project is to encourage a larger community of researchers and developers to work on techniques and tools for data wrangling. With this in mind, members of the VADA team have written a paper, “Data Wrangling for Big Data: Challenges and Opportunities”, published in the Vision Track of the 19th International Conference on Extending Database Technology (EDBT), held March 15-18, 2016, in Bordeaux, France. The paper makes the case that a concerted effort to address specific challenges in data wrangling can be expected to yield substantial rewards.