Tracking Data Changes

Matt Macduff (left) and Sherman Beus (right) will present their work at BigData 2014 in Anchorage, Alaska.

As part of the upcoming Third International Congress on Big Data, or BigData 2014, computer scientists and engineers from the Atmospheric Radiation Measurement (ARM) Data Integration team at Pacific Northwest National Laboratory (PNNL), together with a colleague from Rensselaer Polytechnic Institute (RPI), will showcase their model for versioning complex data sets. One example is the data generated and stored daily at the ARM Climate Research Facility’s fixed and mobile user facilities, which are currently collected and stored at a rate of more than 10 terabytes per month. The authors, Matt Macduff and Sherman Beus (both of PNNL) and co-author Benno Lee (RPI), will present their paper, “Versioning Complex Data,” during the BigData session on Friday, June 27, 2014.

The team’s work addresses two related problems: how to track changes to large, complex data sets that serve as references for analysis, and how to communicate those changes to users. Their data versioning model presumes continuity in data, meaning that adjacent data generally belong to the same version. Under this assumption, the model can accommodate routine new data, such as the data collected daily by ARM’s widespread climate science instrumentation, without changing the data’s version number; only the size of the set grows. A change to existing data, by contrast, requires a new version of the file. Manual methods are still required to distinguish major changes from minor ones, and an analysis of historical ARM data changes shows that this manual effort is manageable.
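The versioning rule described above can be illustrated with a short Python sketch. It is a hypothetical example, not the authors’ implementation: the `VersionedDataset` class, its `append` and `revise` methods, integer timestamps, and the "major.minor" numbering scheme are all assumptions made for illustration. It also simplifies the model by giving the whole data set a single version rather than versioning contiguous ranges of data separately.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedDataset:
    """Hypothetical time-indexed data set carrying a major.minor version."""

    major: int = 1
    minor: int = 0
    # Maps an (assumed) integer timestamp to a measured value.
    records: dict = field(default_factory=dict)

    @property
    def version(self) -> str:
        return f"{self.major}.{self.minor}"

    def append(self, timestamp: int, value: float) -> None:
        """Routine new data only grows the set; the version is unchanged."""
        if self.records and timestamp <= max(self.records):
            raise ValueError("append expects data newer than the end of the set")
        self.records[timestamp] = value

    def revise(self, timestamp: int, value: float, major_change: bool) -> None:
        """Changing existing data always produces a new version.

        Whether a change is major or minor is a manual judgment, so the
        caller supplies it through `major_change` (an assumed interface).
        """
        if timestamp not in self.records:
            raise KeyError(timestamp)
        self.records[timestamp] = value
        if major_change:
            self.major += 1
            self.minor = 0  # assumed convention: a major bump resets minor
        else:
            self.minor += 1


if __name__ == "__main__":
    ds = VersionedDataset()
    ds.append(1, 280.1)
    ds.append(2, 280.4)
    print(ds.version)  # 1.0: appending new data leaves the version alone
    ds.revise(1, 280.2, major_change=False)
    print(ds.version)  # 1.1: a correction to existing data bumps the version
```

The design point the sketch tries to capture is the asymmetry in the model: appends are cheap and version-neutral, while any edit to data already published forces a new version, with the major-or-minor decision left to a person.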

“While we tested the model against the time-focused domain of ARM, it would also work for other domains, such as geospatial,” Macduff, an engineer and team lead with ARM’s Data Integration team, explained. “With some simple accommodations for the flow of data changes in a domain, it creates valuable information for potential users. I look forward to sharing our findings at BigData 2014.”

BigData 2014 kicks off on June 27, 2014, in Anchorage, Alaska. This year, the event focuses on the business and economic insights that value-added services can provide.