German Data Service Center for Business and Organizational Data (DSZ-BO)

When data access for researchers is provided via remote execution or on-site use, it can be beneficial for data users if test datasets that mimic the structure of the original data are disseminated in advance. With these test data, researchers can develop their analysis code and avoid the delays that syntax errors would otherwise be likely to cause. Test data are not meant to provide meaningful results or to preserve statistical inferences. Instead, the structure of the data must be maintained so that any code developed with the test data will also run on the original data without further modification. Achieving this goal can be challenging and costly for complex datasets such as linked employer-employee datasets (LEED), because the links between establishments and employees also need to be maintained. We illustrate how useful test data can be developed for complex datasets in a straightforward manner at limited cost. Our approach relies mainly on traditional statistical disclosure control (SDC) techniques such as data swapping and noise addition. The structure of the data is maintained by adding constraints to the swapping procedure.
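The general idea of constrained swapping combined with noise addition can be sketched as follows. This is a minimal illustration and not the procedure used by the authors: the function `make_test_data`, the column names (`estab_id`, `occupation`, `wage`), the choice to swap only within an establishment, and the additive Gaussian noise are all assumptions made for the example.

```python
import random


def make_test_data(employees, swap_key, swap_var, noise_var, noise_sd, seed=0):
    """Return a perturbed copy of `employees` (a list of dicts).

    Hypothetical helper: values of `swap_var` are shuffled only among
    records that share the same `swap_key` (the swapping constraint),
    and Gaussian noise is added to `noise_var`.
    """
    rng = random.Random(seed)
    out = [dict(row) for row in employees]  # copy, so the input stays untouched

    # Group record positions by the constraint variable.
    groups = {}
    for i, row in enumerate(out):
        groups.setdefault(row[swap_key], []).append(i)

    # Constrained swap: permute the values within each group only.
    for positions in groups.values():
        values = [out[i][swap_var] for i in positions]
        rng.shuffle(values)
        for i, value in zip(positions, values):
            out[i][swap_var] = value

    # Noise addition, clipped at zero so wages keep a valid range and type.
    for row in out:
        noisy = row[noise_var] + rng.gauss(0, noise_sd)
        row[noise_var] = round(max(0.0, noisy), 2)
    return out


# Toy employee file; `estab_id` is the link to an establishment file.
employees = [
    {"person_id": 1, "estab_id": "A", "occupation": "clerk", "wage": 3100.0},
    {"person_id": 2, "estab_id": "A", "occupation": "engineer", "wage": 5200.0},
    {"person_id": 3, "estab_id": "A", "occupation": "driver", "wage": 2800.0},
    {"person_id": 4, "estab_id": "B", "occupation": "nurse", "wage": 3400.0},
    {"person_id": 5, "estab_id": "B", "occupation": "clerk", "wage": 3000.0},
]

test_data = make_test_data(
    employees, swap_key="estab_id", swap_var="occupation",
    noise_var="wage", noise_sd=250.0,
)
```

Because swapping happens only within an establishment and the `estab_id` column is never altered, the employer-employee links, the column names, and the value types are the same in the test data as in the input, so code written against one should also run against the other.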

IASSIST Quarterly

Special issue: A pioneer data librarian

Welcome to the special volume of the IASSIST Quarterly (IQ (37):1-4, 2013). This special issue started as an exchange of ideas between Libbie Stephenson and Margaret Adams to collect