Abstract
Data mining technology is used to extract useful knowledge from very large datasets, but the process of data collection and data dissemination may result in an inherent threat to privacy. Some sensitive or private information concerning individuals, businesses and organizations has to be suppressed before it is shared or published. Privacy-preserving data mining (PPDM) has become an important issue in recent years. In the past, many heuristic approaches were developed to sanitize databases for the purpose of hiding sensitive information in PPDM, but data sanitization of PPDM is considered to be an NP-hard problem. It is critical to find the balance between privacy protection for hiding sensitive information and maintaining the discovery of knowledge, or even reducing artificial knowledge in the sanitization process. In this paper, a GA-based framework with two optimization algorithms is proposed for data sanitization. A novel evaluation function with three concerned factors is designed to find the appropriate transactions to be deleted in order to hide sensitive itemsets. Experiments are then conducted to evaluate the performance of the proposed GA-based algorithms with regard to different factors such as the execution time, the number of hiding failures, the number of missing itemsets, the number of artificial itemsets, and database dissimilarity.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Amiri A (2007) Dare to share: protecting sensitive knowledge with data sanitization. Decis Support Syst 43(1):181–191
Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 217–228
Bache K, Lichman M (2012) Uci machine learning repository. https://archive.ics.uci.edu/ml/datasets.html
Aggarwal CC, Yu PS (2009) A survey of uncertain data algorithms and applications. IEEE Trans Data Knowl Eng 21(5):609–623
Aggarwal CC, Pei J, Zhang B (2006) On privacy preservation against adversarial data mining. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 510–516
Chen CM, Lin YH, Sun HM (2013) SASHIMI: secure aggregation via successively hierarchical inspecting of message integrity on WSN. Journal of Information Hiding and Multimedia Signal Processing
Lin CW, Hong TP (2013) A survey of fuzzy web mining. Wiley Interdiscip Rev: Data Min Knowl Disc 3(3):190–199
Lin CW, Hong TP, Chang CC, Wang SL (2013) A greedy-based approach for hiding sensitive itemsets by transaction insertion. J Info Hiding Multimedia Signal Proc 4(4):201–227
Cheung DW, Ng VT, Tam BW (1997) Incremetal updates of discovered multi-level association rules. Int J Art Intell Tools 6(2):273–290
Cheung DWL, Han J, Ng V , Wong CY (1996) Maintenance of discovered association rules in large databases: an incremental updating technique. Int Conf Data Eng:106–114
Dasseni E, Verykios VS, Elmagarmid AK, Bertino E (2001) Hiding association rules by using confidence and support. Int Workshop Info Hiding:369–383
Giannotti F, Lakshmanan LVS, Monreale A, Pedreschi D, Wang HW (2012) Privacy-preserving mining of association rules from outsourced transaction databases. IEEE Syst J 7(3):385– 395
Lan GC, Hong TP, Tseng VS (2011) Discovery of high utility itemsets from on-shelf time periods of products. Expert Syst Appl 38(5):5851–5857
Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases data and knowledge engineering
Holland JH (1992) Adaptation in natural and artificial systems. MIT Press
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining Knowl Dis 8(1):53–87
Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu MC (2004) Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440
Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106
Dunning LA, Kresman R (2013) Privacy preserving data sharing with anonymous id assignment. IEEE Trans Info Forensics Security 8(2):402–413
Dehkordi MN, Badie K, Zadeh AK (2009) A novel method for privacy preserving in association rule mining based on genetic algorithms. J Software 4(6):555–562
Chen MS, Han J, Yu PS (1996) Data mining: an overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866–883
Berkhin P (2006) A survey of clustering data mining techniques. In: Grouping multidimensional data, pp 25–71
Quinlan JR (1983) C4.5: Programs for machine learning. Morgan Kaufmann Publishers Inc.
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: The international conference on very large data bases, pp 487–499
Agrawal R, Srikant R (1995) Mining sequential patterns. In: The international conference on data engineering, pp 3–14
Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: ACM SIGMOD international conference on management of data, pp 439–450
Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: The international conference on extending database technology: advances in database technology, pp 3–17
Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. In: The conference on emerging artificial intelligence applications in computer engineering: real word AI systems with applications in eHealth, HCI, information retrieval and pervasive technologies, pp 3–24
Han S, Ng WK (2007) Privacy-preserving genetic algorithms for rule discovery. In: International conference on data warehousing and knowledge discovery, pp 407–417
Oliveira SM, Zaane O, Saygin Y (2004) Secure association rule sharing. Lect Notes Comput Sci 3056:74–85
Oliveira SRM, Zaane OR (2002) Privacy preserving frequent itemset mining. In: IEEE international conference on privacy, security and data mining, pp 43–54
Liu TH, Wang Q, Zhu HF (2014) A multi-function password mutual authentication key agreement scheme with privacy preserving. Journal of Information Hiding and Multimedia Signal Processing
Hong TP, Wang CY (2007) Maintenance of association rules using pre-large itemsets. In: Intelligent databases: technologies and applications, pp 44–60
Hong TP, Lin CW, Wu YL (2008) Incrementally fast updated frequent pattern trees. Expert Syst Appl 34(4):2424–2435
Hong TP, Wang CY, Tao YH (2001) A new incremental data mining algorithm using pre-large itemsets. Intell Data Anal 5:111–129
Wu TY, Tseng YM (2013) Further analysis of pairing based traitor tracing schemes for broadcast encryption. Security and Communication Networks
Verykios VS, Bertino E, Fovino IN, Provenza LP, Saygin Y, Theodoridis Y (2004) State-of-the-art in privacy preserving data mining. ACM SIGMOD Record
Wu YH, Chiang CM, Che ALP (2007) Hiding sensitive association rules with limited side effects. IEEE Trans Knowl Data Eng 19(1):29–42
Lindell Y, Pinkas B (2000) Privacy preserving data mining. In: The annual international cryptology conference on advances in cryptology, pp 36–54
Zheng Z, Kohavi R, Mason L (2001) Real world performance of association rule algorithms. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 401– 406
Acknowledgment
This research was partially supported by: the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under the grant HIT.NSRIF.2014100; the Shenzhen Strategic Emerging Industries Program under the grant ZDSY20120613125016389; and the Shenzhen Peacock Project, China, under the grant KQC201109020055A.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Lin, CW., Hong, TP., Yang, KT. et al. The GA-based algorithms for optimizing hiding sensitive itemsets through transaction deletion. Appl Intell 42, 210–230 (2015). https://doi.org/10.1007/s10489-014-0590-5
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-014-0590-5