Skip to main content

Advertisement

Log in

The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Data mining technology is used to extract useful knowledge from very large datasets, but the process of data collection and data dissemination may result in an inherent threat to privacy. Some sensitive or private information concerning individuals, businesses and organizations has to be suppressed before it is shared or published. Privacy-preserving data mining (PPDM) has become an important issue in recent years. In the past, many heuristic approaches were developed to sanitize databases for the purpose of hiding sensitive information in PPDM, but data sanitization of PPDM is considered to be an NP-hard problem. It is critical to find the balance between privacy protection for hiding sensitive information and maintaining the discovery of knowledge, or even reducing artificial knowledge in the sanitization process. In this paper, a GA-based framework with two optimization algorithms is proposed for data sanitization. A novel evaluation function with three concerned factors is designed to find the appropriate transactions to be deleted in order to hide sensitive itemsets. Experiments are then conducted to evaluate the performance of the proposed GA-based algorithms with regard to different factors such as the execution time, the number of hiding failures, the number of missing itemsets, the number of artificial itemsets, and database dissimilarity.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21
Fig. 22
Fig. 23
Fig. 24
Fig. 25
Fig. 26
Fig. 27
Fig. 28

Similar content being viewed by others

References

  1. Amiri A (2007) Dare to share: protecting sensitive knowledge with data sanitization. Decis Support Syst 43(1):181–191

    Article  Google Scholar 

  2. Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 217–228

  3. Bache K, Lichman M (2012) Uci machine learning repository. https://archive.ics.uci.edu/ml/datasets.html

  4. Aggarwal CC, Yu PS (2009) A survey of uncertain data algorithms and applications. IEEE Trans Data Knowl Eng 21(5):609–623

    Article  Google Scholar 

  5. Aggarwal CC, Pei J, Zhang B (2006) On privacy preservation against adversarial data mining. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 510–516

  6. Chen CM, Lin YH, Sun HM (2013) SASHIMI: secure aggregation via successively hierarchical inspecting of message integrity on WSN. Journal of Information Hiding and Multimedia Signal Processing

  7. Lin CW, Hong TP (2013) A survey of fuzzy web mining. Wiley Interdiscip Rev: Data Min Knowl Disc 3(3):190–199

    MathSciNet  Google Scholar 

  8. Lin CW, Hong TP, Chang CC, Wang SL (2013) A greedy-based approach for hiding sensitive itemsets by transaction insertion. J Info Hiding Multimedia Signal Proc 4(4):201–227

    Google Scholar 

  9. Cheung DW, Ng VT, Tam BW (1997) Incremetal updates of discovered multi-level association rules. Int J Art Intell Tools 6(2):273–290

    Article  Google Scholar 

  10. Cheung DWL, Han J, Ng V , Wong CY (1996) Maintenance of discovered association rules in large databases: an incremental updating technique. Int Conf Data Eng:106–114

  11. Dasseni E, Verykios VS, Elmagarmid AK, Bertino E (2001) Hiding association rules by using confidence and support. Int Workshop Info Hiding:369–383

  12. Giannotti F, Lakshmanan LVS, Monreale A, Pedreschi D, Wang HW (2012) Privacy-preserving mining of association rules from outsourced transaction databases. IEEE Syst J 7(3):385– 395

    Article  Google Scholar 

  13. Lan GC, Hong TP, Tseng VS (2011) Discovery of high utility itemsets from on-shelf time periods of products. Expert Syst Appl 38(5):5851–5857

    Article  Google Scholar 

  14. Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases data and knowledge engineering

  15. Holland JH (1992) Adaptation in natural and artificial systems. MIT Press

  16. Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining Knowl Dis 8(1):53–87

    Article  MathSciNet  Google Scholar 

  17. Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu MC (2004) Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440

    Article  Google Scholar 

  18. Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106

    Google Scholar 

  19. Dunning LA, Kresman R (2013) Privacy preserving data sharing with anonymous id assignment. IEEE Trans Info Forensics Security 8(2):402–413

    Article  Google Scholar 

  20. Dehkordi MN, Badie K, Zadeh AK (2009) A novel method for privacy preserving in association rule mining based on genetic algorithms. J Software 4(6):555–562

    Article  Google Scholar 

  21. Chen MS, Han J, Yu PS (1996) Data mining: an overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866–883

    Article  Google Scholar 

  22. Berkhin P (2006) A survey of clustering data mining techniques. In: Grouping multidimensional data, pp 25–71

  23. Quinlan JR (1983) C4.5: Programs for machine learning. Morgan Kaufmann Publishers Inc.

  24. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: The international conference on very large data bases, pp 487–499

  25. Agrawal R, Srikant R (1995) Mining sequential patterns. In: The international conference on data engineering, pp 3–14

  26. Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: ACM SIGMOD international conference on management of data, pp 439–450

  27. Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925

    Article  Google Scholar 

  28. Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: The international conference on extending database technology: advances in database technology, pp 3–17

  29. Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. In: The conference on emerging artificial intelligence applications in computer engineering: real word AI systems with applications in eHealth, HCI, information retrieval and pervasive technologies, pp 3–24

  30. Han S, Ng WK (2007) Privacy-preserving genetic algorithms for rule discovery. In: International conference on data warehousing and knowledge discovery, pp 407–417

  31. Oliveira SM, Zaane O, Saygin Y (2004) Secure association rule sharing. Lect Notes Comput Sci 3056:74–85

    Article  Google Scholar 

  32. Oliveira SRM, Zaane OR (2002) Privacy preserving frequent itemset mining. In: IEEE international conference on privacy, security and data mining, pp 43–54

  33. Liu TH, Wang Q, Zhu HF (2014) A multi-function password mutual authentication key agreement scheme with privacy preserving. Journal of Information Hiding and Multimedia Signal Processing

  34. Hong TP, Wang CY (2007) Maintenance of association rules using pre-large itemsets. In: Intelligent databases: technologies and applications, pp 44–60

  35. Hong TP, Lin CW, Wu YL (2008) Incrementally fast updated frequent pattern trees. Expert Syst Appl 34(4):2424–2435

    Article  Google Scholar 

  36. Hong TP, Wang CY, Tao YH (2001) A new incremental data mining algorithm using pre-large itemsets. Intell Data Anal 5:111–129

    MATH  Google Scholar 

  37. Wu TY, Tseng YM (2013) Further analysis of pairing based traitor tracing schemes for broadcast encryption. Security and Communication Networks

  38. Verykios VS, Bertino E, Fovino IN, Provenza LP, Saygin Y, Theodoridis Y (2004) State-of-the-art in privacy preserving data mining. ACM SIGMOD Record

  39. Wu YH, Chiang CM, Che ALP (2007) Hiding sensitive association rules with limited side effects. IEEE Trans Knowl Data Eng 19(1):29–42

    Article  Google Scholar 

  40. Lindell Y, Pinkas B (2000) Privacy preserving data mining. In: The annual international cryptology conference on advances in cryptology, pp 36–54

  41. Zheng Z, Kohavi R, Mason L (2001) Real world performance of association rule algorithms. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 401– 406

Download references

Acknowledgment

This research was partially supported by: the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under the grant HIT.NSRIF.2014100; the Shenzhen Strategic Emerging Industries Program under the grant ZDSY20120613125016389; and the Shenzhen Peacock Project, China, under the grant KQC201109020055A.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Chun-Wei Lin.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Lin, CW., Hong, TP., Yang, KT. et al. The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion. Appl Intell 42, 210–230 (2015). https://doi.org/10.1007/s10489-014-0590-5

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-014-0590-5

Keywords

Navigation