IIT Database Group

iBench

The study of data integration is as old as the field of data management. However, given the maturity of this area, it is surprising that rigorous empirical evaluations of research ideas are so scarce. We argue that a stronger focus on empirical work would benefit the integration community as a whole, and we identify one major roadblock for this work: the lack of comprehensive benchmarks, scenario generators, and publicly available implementations of quality measures. This lack makes it difficult to compare integration solutions, understand their generality, and understand their performance in different application scenarios. Based on this observation, we discuss the requirements for such benchmarks. We argue that the major abstractions used in reasoning about integration problems have converged substantially over the last decade and that this convergence enables the application of robust empirical methods to integration problems.

In the iBench project, we develop an open-source metadata generator (available on GitHub) for creating arbitrarily large and complex mappings, schemas, and schema constraints. Combined with a data generator, iBench can efficiently generate realistic data integration scenarios of varying size and complexity. It can be used to create benchmarks for different integration tasks, including (virtual) data integration, data exchange, schema evolution, mapping operators such as composition and inversion, and schema matching.

Our first prototype implementation is based on STBenchmark, the first benchmark for schema mapping systems. Given a configuration file, the benchmark generates a complete mapping scenario (schemas, data, and mappings) by combining randomized instances of mapping primitives (e.g., vertical partitioning or de-normalization) into a complex scenario. As a first step, we have addressed several shortcomings of this benchmark (e.g., no sharing of schema elements between mapping primitives and no support for logical mapping languages such as st-tgds or SO tgds).
Noteworthy new features are:

Support for generating st-tgds and SO tgds.

Support for arbitrary Skolem functions (SO tgds) and various Skolemization modes (ways to generate Skolem arguments).
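
To make the idea of Skolemization modes concrete, the following is a minimal Python sketch and not iBench's actual code. It builds the quantifier-free body of an SO tgd-style mapping for a vertical partitioning primitive and chooses the arguments of the Skolem term according to a mode. The mode names ("all", "key", "random"), the function names, and the textual output format are assumptions made for illustration only.

```python
import random


def skolem_args(attrs, key, mode, rng):
    """Choose the arguments of a Skolem term (mode names are hypothetical)."""
    if mode == "all":
        # Use every source attribute as an argument.
        return list(attrs)
    if mode == "key":
        # Use only the key attributes of the source relation.
        return list(key)
    if mode == "random":
        # Use a random non-empty subset, kept in schema order.
        k = rng.randint(1, len(attrs))
        chosen = set(rng.sample(attrs, k))
        return [a for a in attrs if a in chosen]
    raise ValueError(f"unknown Skolemization mode: {mode}")


def vertical_partition(rel, attrs, key, split, mode, seed=0):
    """Render a vertical partitioning mapping as text.

    The source relation is split into two target relations that are
    joined through a fresh value produced by the Skolem function f.
    """
    rng = random.Random(seed)
    term = f"f({', '.join(skolem_args(attrs, key, mode, rng))})"
    left = attrs[:split] + [term]
    right = [term] + attrs[split:]
    body = f"{rel}({', '.join(attrs)})"
    head = f"{rel}_1({', '.join(left)}) AND {rel}_2({', '.join(right)})"
    return f"{body} -> {head}"


print(vertical_partition("R", ["a", "b", "c", "d"], ["a"], 2, "key"))
# prints: R(a, b, c, d) -> R_1(a, b, f(a)) AND R_2(f(a), c, d)
```

Under the "all" mode the same call would produce the term f(a, b, c, d) instead of f(a), so the choice of mode controls how many distinct invented values the mapping generates for a given source instance.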