Abstract
Disassociation is a bucketization-based anonymization technique that divides a set-valued dataset into several clusters to hide the link between individuals and their complete set of items. It increases the utility of the anonymized dataset but also raises privacy concerns; in particular, tightly coupled items give rise to what is called the cover problem. In this paper, we present safe disassociation, a technique that relies on partial suppression to overcome this privacy breach when disassociating set-valued datasets. Safe disassociation allows the k^m-anonymity privacy constraint to be extended to a bucketized dataset and copes with the cover problem. We describe an algorithm that achieves safe disassociation and report a set of experiments that demonstrate its efficiency.
Notes
In what follows, we use k^m-disassociation to denote a dataset that is disassociated and satisfies k^m-anonymity.
Vertical partitioning creates k^m-anonymous record chunks.
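As an illustration of the constraint these notes refer to, the following minimal sketch checks k^m-anonymity (k with superscript m) on a small set-valued dataset, assuming the usual definition: every combination of at most m items that occurs in some record must occur in at least k records. The function name `is_km_anonymous` and the toy records are hypothetical and are not taken from the paper; this is a brute-force check, not the disassociation algorithm itself.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(records, k, m):
    """Check k^m-anonymity by brute force (illustrative helper, not the paper's code).

    Returns True if every itemset of size 1..m that appears in at least
    one record appears in at least k records.
    """
    support = Counter()
    for record in records:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(record))
        for size in range(1, m + 1):
            # combinations() yields nothing when size exceeds len(items).
            support.update(combinations(items, size))
    return all(count >= k for count in support.values())


# Toy record chunk: each record is the set of items of one individual.
chunk = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a", "c"}]

print(is_km_anonymous(chunk, k=2, m=2))  # every itemset of size <= 2 occurs twice or more
print(is_km_anonymous(chunk, k=3, m=2))  # item "b" occurs in only two records
```

The cost grows quickly with m because all item combinations up to size m are enumerated per record, so this form is only suitable for small chunks and small m.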
Acknowledgments
This work is funded by the InMobiles company and the Labex ACTION program (contract ANR-11-LABX-01-01). Computations have been performed on the supercomputer facilities of the Mésocentre de calcul de Franche-Comté. Special thanks to Ms. Sara Barakat for her contribution to identifying the cover problem.
Cite this article
Awad, N., Al Bouna, B., Couchot, JF. et al. Safe disassociation of set-valued datasets. J Intell Inf Syst 53, 547–562 (2019). https://doi.org/10.1007/s10844-019-00568-7
DOI: https://doi.org/10.1007/s10844-019-00568-7