In this work we propose Anonym Database Sampler (ADS), a flexible and modular system capable of extracting an anonymised, consistent and representative sample
from a relational database. ADS was envisioned for use in testing and development
environments. To this end, the user provides a sample specification, which ADS's
sampling engine uses to draw a stratified random sample. Afterwards, a first-choice
hill-climbing algorithm is applied to the sample, optimising the selected data towards
the specified requirements.
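The two sampling stages can be sketched as follows. This is a minimal Python illustration, not ADS's implementation: it assumes rows are plain dicts, strata are defined by a single attribute, and the sample specification reduces to a sampling fraction plus one hypothetical requirement (the sample mean of an `amount` column should approach a target). The names `stratified_sample`, `first_choice_hill_climb`, `mean_gap` and `by_region` are illustrative only. The neighbour move swaps one sampled row for an unsampled row of the same stratum, so the stratum sizes fixed by the first stage are preserved; as in first-choice hill climbing, the first improving neighbour is accepted rather than the best one.

```python
import random
from collections import defaultdict


def stratified_sample(rows, stratum_of, fraction, rng):
    """Return indices of a stratified random sample: a simple random sample
    of round(fraction * n_h) rows (at least 1) from every stratum h."""
    strata = defaultdict(list)
    for i, row in enumerate(rows):
        strata[stratum_of(row)].append(i)
    chosen = set()
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        chosen.update(rng.sample(members, k))
    return chosen


def first_choice_hill_climb(rows, chosen, stratum_of, cost, rng, max_failures=200):
    """First-choice hill climbing: draw random neighbours and move to the first
    one that lowers the cost; stop after max_failures consecutive non-improving
    draws (an assumed stopping rule) or when the cost reaches zero."""
    chosen = set(chosen)
    current = cost(rows, chosen)
    failures = 0
    while failures < max_failures and current > 0:
        # Neighbour move: swap a sampled row for an unsampled row of the same
        # stratum, so the per-stratum sizes of the initial sample are kept.
        out_idx = rng.choice(sorted(chosen))
        stratum = stratum_of(rows[out_idx])
        candidates = [i for i, row in enumerate(rows)
                      if i not in chosen and stratum_of(row) == stratum]
        if not candidates:
            failures += 1
            continue
        neighbour = (chosen - {out_idx}) | {rng.choice(candidates)}
        new_cost = cost(rows, neighbour)
        if new_cost < current:
            chosen, current, failures = neighbour, new_cost, 0
        else:
            failures += 1
    return chosen, current


def mean_gap(target_mean):
    """Hypothetical requirement: the sampled 'amount' mean should equal target_mean."""
    def cost(rows, chosen):
        return abs(sum(rows[i]["amount"] for i in chosen) / len(chosen) - target_mean)
    return cost


def by_region(row):
    return row["region"]


# Usage on a toy table with two strata of 50 rows each.
rng = random.Random(42)
rows = [{"region": "north" if i % 2 == 0 else "south", "amount": i} for i in range(100)]
cost = mean_gap(target_mean=30.0)
initial = stratified_sample(rows, by_region, fraction=0.1, rng=rng)
final, final_cost = first_choice_hill_climb(rows, initial, by_region, cost, rng)
print(len(final), cost(rows, initial), final_cost)
```

Accepting the first improving swap avoids evaluating the cost of every possible neighbour at each step, which matters when the number of candidate swaps grows with the table size.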
Finally, if some restrictions are still unmet, tuple and/or key modifications are
performed, ensuring that the final sample fully complies with the initial sample
specification. While a representative and sound database can be a great advantage for
developers in these environments, we assume that this representativeness need not
amount to statistical representativeness in the strict sense, which would be much
harder to obtain. Consequently, ADS samples are not appropriate for any kind of
statistical data analysis. Because most organisational databases contain sensitive
data, a data anonymisation step is performed once the sample has been successfully
extracted. The sampled data is consistently enciphered and masked, preventing the
privacy breaches that could occur if developers were given a database containing
real operational data.
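Consistent enciphering matters because the same value must map to the same output wherever it appears; otherwise foreign keys across tables would no longer join. The sketch below illustrates this with keyed HMAC-SHA256 pseudonymisation from Python's standard library and a simple suffix-preserving mask. These are swapped-in techniques for illustration, not necessarily the cipher or masking functions ADS uses, and the `customers`/`orders` tables and the key are hypothetical. HMAC is one-way and turns integer keys into strings; a reversible, format-preserving scheme would avoid the type change.

```python
import hashlib
import hmac


def pseudonymise(value, key):
    """Deterministic keyed pseudonym (HMAC-SHA256, truncated to 16 hex chars):
    equal inputs always give equal outputs under the same key."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask(value, keep_last=4, fill="*"):
    """Hide all but the last keep_last characters of a value."""
    s = str(value)
    if len(s) <= keep_last:
        return fill * len(s)
    return fill * (len(s) - keep_last) + s[-keep_last:]


def anonymise(customers, orders, key):
    """Anonymise two hypothetical tables, applying the same pseudonym function
    to the primary key and to the foreign key that references it."""
    anon_customers = [
        {"id": pseudonymise(c["id"], key),
         "name": pseudonymise(c["name"], key),
         "phone": mask(c["phone"])}
        for c in customers
    ]
    anon_orders = [
        {"id": o["id"],
         "customer_id": pseudonymise(o["customer_id"], key),
         "total": o["total"]}
        for o in orders
    ]
    return anon_customers, anon_orders


# Usage with toy data and a placeholder key (a real key must be kept secret).
key = b"placeholder-secret-key"
customers = [
    {"id": 1, "name": "Alice Silva", "phone": "912345678"},
    {"id": 2, "name": "Bruno Costa", "phone": "934567890"},
]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 2, "total": 40.0},
    {"id": 12, "customer_id": 1, "total": 12.5},
]
anon_customers, anon_orders = anonymise(customers, orders, key)
print(anon_customers[0]["phone"])  # prints *****5678
```

Because both orders placed by customer 1 receive the same pseudonymised `customer_id` as that customer's pseudonymised `id`, joins on the anonymised sample still return the same pairs as on the original data.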