Abstract
In the search for easy data access for researchers not employed at a statistical agency, remote data access seems an attractive alternative to the current standard of either altering the data substantially before release or allowing access only at designated data archives or research data centers. Data perturbation is often not accepted by researchers, since they do not trust results obtained from the altered data sets. On-site access, however, places a heavy burden on both the researcher and the data-providing agency in terms of time and money. Remote data access, or remote analysis servers that allow users to submit queries without actually seeing the microdata, has the potential to overcome both disadvantages. However, even if the microdata are not directly available to the researcher, disclosure of sensitive information about individual survey respondents is still possible.
In this paper we illustrate how an intruder could use commonly available background information to reveal sensitive information using simple linear regression. We demonstrate the real risks of this approach with an empirical evaluation based on a German establishment survey, the IAB Establishment Panel. Although this kind of attack can easily be prevented once the agency is aware of the problem, this small simulation aims to emphasize that there may be many ways to obtain sensitive information using multivariate analysis, and not all of them are obvious. Thus, agencies considering some form of remote data access should think carefully about which queries the system should allow.
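To make the risk concrete, the following is a minimal sketch of one attack of this general kind, not necessarily the exact procedure evaluated in the paper. It assumes a hypothetical server function `remote_ols` that returns only regression coefficients, synthetic data, and an intruder who knows one establishment's exact number of employees (the variables `employees` and `turnover` are illustrative choices). Regressing the sensitive variable on an intercept and a dummy that singles out that one record forces the fitted value for the record to equal its true value, because the record has leverage 1.

```python
import numpy as np

def remote_ols(X, y):
    """Hypothetical remote server: fits OLS and returns coefficients only.

    The microdata (X, y) never leave this function.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic "confidential" microdata (illustrative assumption, not real data).
rng = np.random.default_rng(0)
n = 200
employees = rng.permutation(np.arange(10, 10 + n))     # unique per establishment
turnover = rng.lognormal(mean=10.0, sigma=1.0, size=n)  # sensitive variable

# Background knowledge: the intruder knows the target's employee count.
target = 17
known_employees = employees[target]

# The intruder asks the server to build a dummy that is 1 only for the target
# and to regress the sensitive variable on an intercept plus that dummy.
dummy = (employees == known_employees).astype(float)
X = np.column_stack([np.ones(n), dummy])
beta = remote_ols(X, turnover)

# The intercept is the mean of all other records; intercept + dummy
# coefficient is the fitted value for the target, which equals its true value.
recovered = beta[0] + beta[1]
```

In this sketch the server never reveals a single row, yet `recovered` matches the target's confidential turnover up to floating-point error. A server could block this particular query by refusing regressors that isolate very few records, which is in line with the point that such attacks are preventable once the agency is aware of them.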
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bleninger, P., Drechsler, J., Ronning, G. (2010). Remote Data Access and the Risk of Disclosure from Linear Regression: An Empirical Study. In: Domingo-Ferrer, J., Magkos, E. (eds) Privacy in Statistical Databases. PSD 2010. Lecture Notes in Computer Science, vol 6344. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15838-4_20
DOI: https://doi.org/10.1007/978-3-642-15838-4_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15837-7
Online ISBN: 978-3-642-15838-4
eBook Packages: Computer Science, Computer Science (R0)