Abstract
The aim of this study was to compare different microdata protection methods for numerical variables under various conditions. Most of the methods used in this paper are implemented in the R package sdcMicro, which is freely available on the Comprehensive R Archive Network (http://cran.r-project.org). The other methods can easily be applied using other R packages. While most methods work well for homogeneous data sets, some fail completely when confidential variables contain outliers, which is almost always the case with data from official statistics. To overcome these problems, we have robustified popular methods such as microaggregation and shuffling, the latter of which is based on a regression model. All methods have been tested on bivariate data sets featuring different outlier scenarios. Additionally, a simulation study was performed.
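To illustrate why outliers matter for a method such as microaggregation, the following is a minimal sketch of univariate fixed-size microaggregation in Python, using only the standard library. It is not the sdcMicro implementation and not the robustification proposed in the paper. The function name `microaggregate`, the parameter `k`, the sample values, and the use of the median as a robust stand-in for the group mean are all assumptions made for illustration.

```python
from statistics import mean, median


def microaggregate(values, k=3, aggregate=mean):
    """Illustrative univariate microaggregation (hypothetical helper).

    Sorts the values, splits them into consecutive groups of size k
    (the last group absorbs any remainder, so every group has at least
    k members when len(values) >= k), and replaces each value by the
    aggregate of its group.
    """
    n = len(values)
    if n == 0:
        return []
    # Indices of the records ordered by value.
    order = sorted(range(n), key=lambda i: values[i])
    # Start positions of the groups; a single group if n < k.
    starts = list(range(0, n - n % k, k)) or [0]
    masked = [0.0] * n
    for j, start in enumerate(starts):
        end = starts[j + 1] if j + 1 < len(starts) else n
        group = order[start:end]
        centre = aggregate([values[i] for i in group])
        for i in group:
            masked[i] = centre
    return masked


# Hypothetical data: seven ordinary values and one extreme outlier.
incomes = [21, 23, 22, 40, 41, 39, 42, 900]

classic = microaggregate(incomes, k=3, aggregate=mean)
robust = microaggregate(incomes, k=3, aggregate=median)

print(classic)  # the outlier's group mean is pulled far above its other members
print(robust)   # the group median stays close to the bulk of the group
```

In this sketch the single extreme value shifts the mean of its whole group, so the ordinary records that share that group are distorted as well. Replacing the mean with the median limits that effect, at the cost of no longer preserving the group total.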
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Templ, M., Meindl, B. (2008). Robustification of Microdata Masking Methods and the Comparison with Existing Methods. In: Domingo-Ferrer, J., Saygın, Y. (eds) Privacy in Statistical Databases. PSD 2008. Lecture Notes in Computer Science, vol 5262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87471-3_10
DOI: https://doi.org/10.1007/978-3-540-87471-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87470-6
Online ISBN: 978-3-540-87471-3
eBook Packages: Computer Science (R0)