The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion

Lin, Chun-Wei; Hong, Tzung-Pei; Yang, Kuo-Tung; Wang, Shyue-Liang

doi:10.1007/s10489-014-0590-5

The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion

Published: 20 September 2014

Volume 42, pages 210–230, (2015)
Cite this article

Applied Intelligence Aims and scope Submit manuscript

Chun-Wei Lin^1,2,
Tzung-Pei Hong^3,4,
Kuo-Tung Yang³ &
…
Shyue-Liang Wang⁵

382 Accesses
52 Citations
Explore all metrics

Abstract

Data mining technology is used to extract useful knowledge from very large datasets, but the process of data collection and data dissemination may result in an inherent threat to privacy. Some sensitive or private information concerning individuals, businesses and organizations has to be suppressed before it is shared or published. Privacy-preserving data mining (PPDM) has become an important issue in recent years. In the past, many heuristic approaches were developed to sanitize databases for the purpose of hiding sensitive information in PPDM, but data sanitization of PPDM is considered to be an NP-hard problem. It is critical to find the balance between privacy protection for hiding sensitive information and maintaining the discovery of knowledge, or even reducing artificial knowledge in the sanitization process. In this paper, a GA-based framework with two optimization algorithms is proposed for data sanitization. A novel evaluation function with three concerned factors is designed to find the appropriate transactions to be deleted in order to hide sensitive itemsets. Experiments are then conducted to evaluate the performance of the proposed GA-based algorithms with regard to different factors such as the execution time, the number of hiding failures, the number of missing itemsets, the number of artificial itemsets, and database dissimilarity.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Big data privacy: a technological perspective and review

Article Open access 26 November 2016

Priyank Jain, Manasi Gyanchandani & Nilay Khare

Privacy-preserving data (stream) mining techniques and their impact on data mining accuracy: a systematic literature review

Article Open access 22 February 2023

U. H. W. A. Hewage, R. Sinha & M. Asif Naeem

Algorithms for frequent itemset mining: a literature review

Article Open access 24 March 2018

Chin-Hoong Chee, Jafreezal Jaafar, … William Yeoh

References

Amiri A (2007) Dare to share: protecting sensitive knowledge with data sanitization. Decis Support Syst 43(1):181–191
Article Google Scholar
Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 217–228
Bache K, Lichman M (2012) Uci machine learning repository. https://archive.ics.uci.edu/ml/datasets.html
Aggarwal CC, Yu PS (2009) A survey of uncertain data algorithms and applications. IEEE Trans Data Knowl Eng 21(5):609–623
Article Google Scholar
Aggarwal CC, Pei J, Zhang B (2006) On privacy preservation against adversarial data mining. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 510–516
Chen CM, Lin YH, Sun HM (2013) SASHIMI: secure aggregation via successively hierarchical inspecting of message integrity on WSN. Journal of Information Hiding and Multimedia Signal Processing
Lin CW, Hong TP (2013) A survey of fuzzy web mining. Wiley Interdiscip Rev: Data Min Knowl Disc 3(3):190–199
MathSciNet Google Scholar
Lin CW, Hong TP, Chang CC, Wang SL (2013) A greedy-based approach for hiding sensitive itemsets by transaction insertion. J Info Hiding Multimedia Signal Proc 4(4):201–227
Google Scholar
Cheung DW, Ng VT, Tam BW (1997) Incremetal updates of discovered multi-level association rules. Int J Art Intell Tools 6(2):273–290
Article Google Scholar
Cheung DWL, Han J, Ng V , Wong CY (1996) Maintenance of discovered association rules in large databases: an incremental updating technique. Int Conf Data Eng:106–114
Dasseni E, Verykios VS, Elmagarmid AK, Bertino E (2001) Hiding association rules by using confidence and support. Int Workshop Info Hiding:369–383
Giannotti F, Lakshmanan LVS, Monreale A, Pedreschi D, Wang HW (2012) Privacy-preserving mining of association rules from outsourced transaction databases. IEEE Syst J 7(3):385– 395
Article Google Scholar
Lan GC, Hong TP, Tseng VS (2011) Discovery of high utility itemsets from on-shelf time periods of products. Expert Syst Appl 38(5):5851–5857
Article Google Scholar
Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases data and knowledge engineering
Holland JH (1992) Adaptation in natural and artificial systems. MIT Press
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining Knowl Dis 8(1):53–87
Article MathSciNet Google Scholar
Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu MC (2004) Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440
Article Google Scholar
Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106
Google Scholar
Dunning LA, Kresman R (2013) Privacy preserving data sharing with anonymous id assignment. IEEE Trans Info Forensics Security 8(2):402–413
Article Google Scholar
Dehkordi MN, Badie K, Zadeh AK (2009) A novel method for privacy preserving in association rule mining based on genetic algorithms. J Software 4(6):555–562
Article Google Scholar
Chen MS, Han J, Yu PS (1996) Data mining: an overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866–883
Article Google Scholar
Berkhin P (2006) A survey of clustering data mining techniques. In: Grouping multidimensional data, pp 25–71
Quinlan JR (1983) C4.5: Programs for machine learning. Morgan Kaufmann Publishers Inc.
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: The international conference on very large data bases, pp 487–499
Agrawal R, Srikant R (1995) Mining sequential patterns. In: The international conference on data engineering, pp 3–14
Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: ACM SIGMOD international conference on management of data, pp 439–450
Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925
Article Google Scholar
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: The international conference on extending database technology: advances in database technology, pp 3–17
Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. In: The conference on emerging artificial intelligence applications in computer engineering: real word AI systems with applications in eHealth, HCI, information retrieval and pervasive technologies, pp 3–24
Han S, Ng WK (2007) Privacy-preserving genetic algorithms for rule discovery. In: International conference on data warehousing and knowledge discovery, pp 407–417
Oliveira SM, Zaane O, Saygin Y (2004) Secure association rule sharing. Lect Notes Comput Sci 3056:74–85
Article Google Scholar
Oliveira SRM, Zaane OR (2002) Privacy preserving frequent itemset mining. In: IEEE international conference on privacy, security and data mining, pp 43–54
Liu TH, Wang Q, Zhu HF (2014) A multi-function password mutual authentication key agreement scheme with privacy preserving. Journal of Information Hiding and Multimedia Signal Processing
Hong TP, Wang CY (2007) Maintenance of association rules using pre-large itemsets. In: Intelligent databases: technologies and applications, pp 44–60
Hong TP, Lin CW, Wu YL (2008) Incrementally fast updated frequent pattern trees. Expert Syst Appl 34(4):2424–2435
Article Google Scholar
Hong TP, Wang CY, Tao YH (2001) A new incremental data mining algorithm using pre-large itemsets. Intell Data Anal 5:111–129
MATH Google Scholar
Wu TY, Tseng YM (2013) Further analysis of pairing based traitor tracing schemes for broadcast encryption. Security and Communication Networks
Verykios VS, Bertino E, Fovino IN, Provenza LP, Saygin Y, Theodoridis Y (2004) State-of-the-art in privacy preserving data mining. ACM SIGMOD Record
Wu YH, Chiang CM, Che ALP (2007) Hiding sensitive association rules with limited side effects. IEEE Trans Knowl Data Eng 19(1):29–42
Article Google Scholar
Lindell Y, Pinkas B (2000) Privacy preserving data mining. In: The annual international cryptology conference on advances in cryptology, pp 36–54
Zheng Z, Kohavi R, Mason L (2001) Real world performance of association rule algorithms. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 401– 406

Download references

Acknowledgment

This research was partially supported by: the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under the grant HIT.NSRIF.2014100; the Shenzhen Strategic Emerging Industries Program under the grant ZDSY20120613125016389; and the Shenzhen Peacock Project, China, under the grant KQC201109020055A.

Author information

Authors and Affiliations

Innovative Information Industry Research Center (IIIRC), Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
Chun-Wei Lin
Shenzhen Key Laboratory of Internet Information Collaboration, School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
Chun-Wei Lin
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan, Republic of China
Tzung-Pei Hong & Kuo-Tung Yang
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan, Republic of China
Tzung-Pei Hong
Department of Information Management, National University of Kaohsiung, Kaohsiung, Taiwan, Republic of China
Shyue-Liang Wang

Authors

Chun-Wei Lin
View author publications
You can also search for this author in PubMed Google Scholar
Tzung-Pei Hong
View author publications
You can also search for this author in PubMed Google Scholar
Kuo-Tung Yang
View author publications
You can also search for this author in PubMed Google Scholar
Shyue-Liang Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Chun-Wei Lin.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lin, CW., Hong, TP., Yang, KT. et al. The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion. Appl Intell 42, 210–230 (2015). https://doi.org/10.1007/s10489-014-0590-5

Download citation

Published: 20 September 2014
Issue Date: March 2015
DOI: https://doi.org/10.1007/s10489-014-0590-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion

Abstract

Access this article

Similar content being viewed by others

Big data privacy: a technological perspective and review

Privacy-preserving data (stream) mining techniques and their impact on data mining accuracy: a systematic literature review

Algorithms for frequent itemset mining: a literature review

References

Acknowledgment

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Abstract

Access this article

Similar content being viewed by others

Big data privacy: a technological perspective and review

Privacy-preserving data (stream) mining techniques and their impact on data mining accuracy: a systematic literature review

Algorithms for frequent itemset mining: a literature review

References

Acknowledgment

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation