Abstract
In today’s increasingly digital world, data privacy has become ever more important. Researchers have developed many privacy-preserving technologies, particularly for data mining and data sharing. These technologies can compute exact data mining models from private data without revealing that data, but they are generally slow. We therefore present a framework for implementing efficient, secure, privacy-preserving approximations of data mining tasks. In particular, we implement two sketching protocols for the scalar (dot) product of two vectors, which can be used as sub-protocols in larger data mining tasks. These protocols yield approximations with high accuracy, low data leakage, and an efficiency improvement of one to two orders of magnitude. We demonstrate these accuracy and efficiency results through extensive experimentation. We also analyze the security properties of these approximations under a security definition that, in contrast to previous definitions, allows for very efficient approximation protocols.
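The idea of approximating a scalar product from short sketches can be illustrated with a minimal sketch in Python. This is not the protocol from the paper: it assumes a generic random ±1 projection shared through a common seed, and it omits all cryptographic steps. The names `sign_matrix`, `sketch`, `approx_dot`, and `exact_dot` are hypothetical and chosen only for this illustration.

```python
import random


def sign_matrix(k, n, seed):
    """Build a k x n matrix of independent +1/-1 entries from a shared seed.

    Assumption: both parties can derive the same matrix from the same seed.
    """
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]


def sketch(vector, matrix):
    """Project an n-dimensional vector down to k dimensions (one value per row)."""
    return [sum(r * v for r, v in zip(row, vector)) for row in matrix]


def approx_dot(sketch_x, sketch_y):
    """Estimate <x, y> as the average of the coordinate-wise sketch products.

    Each product has expectation <x, y> because distinct +1/-1 entries are
    independent with mean zero, so the average is an unbiased estimate.
    """
    k = len(sketch_x)
    return sum(a * b for a, b in zip(sketch_x, sketch_y)) / k


def exact_dot(x, y):
    """Exact scalar product, used here only to compare against the estimate."""
    return sum(a * b for a, b in zip(x, y))


if __name__ == "__main__":
    n, k = 1000, 200  # original dimension and sketch length (illustrative values)
    data_rng = random.Random(7)
    x = [data_rng.randint(0, 1) for _ in range(n)]
    y = [data_rng.randint(0, 1) for _ in range(n)]

    R = sign_matrix(k, n, seed=42)
    estimate = approx_dot(sketch(x, R), sketch(y, R))
    print("exact:", exact_dot(x, y), "estimate:", estimate)
```

The sketch length `k` trades accuracy for cost: a larger `k` lowers the variance of the estimate but leaves more values to process. In a privacy-preserving setting, the final product over the two sketches would presumably be computed with a secure scalar product protocol rather than in the clear; that step is left out here.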
Copyright information
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Nix, R., Kantarcioglu, M., Han, K.J. (2012). Approximate Privacy-Preserving Data Mining on Vertically Partitioned Data. In: Cuppens-Boulahia, N., Cuppens, F., Garcia-Alfaro, J. (eds) Data and Applications Security and Privacy XXVI. DBSec 2012. Lecture Notes in Computer Science, vol 7371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31540-4_11
DOI: https://doi.org/10.1007/978-3-642-31540-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31539-8
Online ISBN: 978-3-642-31540-4
eBook Packages: Computer Science, Computer Science (R0)