Abstract
Privacy Preserving Data Mining (PPDM) is an important research area in data mining, which aims at protecting the privacy during the data mining process so that personal data and sensitive information is not revealed to unauthorized persons. PPDM is a critical task as data often contain sensitive information about individuals that can easily compromise their privacy such as their financial status, political beliefs or medical history. In particular, many algorithms were proposed to hide sensitive frequent itemsets (values that frequently co-occur) in a transaction database. However, a major problem is that those algorithms are based on removing entire transactions (records) instead of single items (values). Hence, many changes are made to databases, which lead to losing useful information. To avoid making too many changes to a database while still preserving privacy, this paper proposes a novel PPDM algorithm called NSGAII4ID for hiding sensitive frequent itemsets by only removing some items from transactions rather than removing whole transactions. To our best knowledge, this is the first paper exploring this approach. Sanitization is treated as a multi-objective optimization problem where the goal is to reduce four side effects, namely hiding failure, missing cost, artificial cost, and database dissimilarity. However, we observe that only three side effects matter in our case (the artificial cost is always zero). Besides, another key difference is that NSGAII4ID sanitizes a database by performing optimization at two levels. At the transaction level, a multi-objective optimization algorithm is applied to find a subset of candidate transactions to be modified. Then, at the item level, NSGAII4ID searches for an optimal subset of items to be removed from each candidate transaction. We show that this problem is exactly the Set cover Problem (SCP) which we solve by using a fast greedy polynomial algorithm. We have conducted extensive experiments on four datasets to compare NSGAII4DT with state-of-the-art PPDM algorithms in terms of runtime, memory cost and total number of removed items during sanitization. The obtained results show that NSGAII4DT achieves a very good balance in minimizing side effects compared with the state-of-the art algorithms.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
For transactions having TID = 2 and TID = 5, the GSCA could have returned another minimal set of victim items which is {i3}.
the source code of our algorithm is available on the Github at the following address: https://github.com/mira34/NSGAII4ID-algorithm
The generator may be donwloaded at : https://sourceforge.net/projects/ibmquestdatagen/
References
AbdelAziz A, Alarabi L, Basalamah S, Hendawi A (2021) A multi-objective optimization method for hospital admission problem—a case study on covid-19 patients. Algorithms 14(2):38
Abdollahzadeh B, Gharehchopogh F (2021) A multi-objective optimization algorithm for feature selection problems. Engineering with Computers
Aggarwal C, Pei J, Zhang B (2006) On privacy preservation against adversarial data mining. In: Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 510–516
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc. 20th int. conf. very large data bases (VLDB), pp 487–499
Ahmed U, Lin J, Srivastava G, Djenouri Y (2021a) A deep q-learning sanitization approach for privacy preserving data mining. In: International Conference on Distributed Computing and Networking, pp 43–48
Ahmed U, Srivastava G, Lin J (2021b) A machine learning model for data sanitization. Comput Netw 189:107914
Aminifar A, Rabbi F, Lamo KPY (2021) Privacy preserving distributed extremely randomized trees. In: Proc. 36th Annual ACM Symposium on Applied Computing, pp 1102–1105
Bilal N, Galinier p, Guibault F (2013) A new formulation of the set covering problem for metaheuristic approaches. International Scholarly Research Notices
Coleto-Alcudia V, Vega-Rodríguez M (2022) A multi-objective optimization approach for the identification of cancer biomarkers from rna-seq data. Expert Syst Appl 193:116480
Cormen T, Leinserson C, Rivest R, Stein C (2009) Introduction to Algorithms, third edition. The MIT Press
Deb K, Pratap A, Agrawal S, Meyarivan T (2002) A fast and elitist multi-objective genetic algorithm: Nsga-ii. IEEE Trans Evol Comput 6(2):182–197
Ewees A, Elaziz M A, Oliva D (2021) A new multi-objective optimization algorithm combined with opposition-based learning. Expert Syst Appl 165:113844
Fournier-Viger P, Lin J, Gomariz A, Gueniche T, Soltani A, Deng Z (2016) The spmf open-source data mining library version 2. In: Proc. Joint European conference on machine learning and knowledge discovery in databases, pp 36–40
Fournier-Viger P, Lin J, Vo B, Chi T, Zhang J, Le H (2017) A survey of itemset mining. Wiley Interdiscip Rev: Data Min Knowl Discov 7(4):e1207
Goyal S, Bedi P, Rajawat A, Shaw RN, Ghosh A (2022) Multi-objective fuzzy-swarm optimizer for data partitioning. In: Proceedings of Advanced Computing and Intelligent Technologies, pp 307–318
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87
Hong T, Lin C, Yang K, Wang S (2013) Using tf-idf to hide sensitive itemsets. Appl Intell 38(4):502–510
Kalia P, Bansal D, Sofat S (2021) A hybrid approach for preserving privacy for real estate data. Int J Inf Comput Secur 15(4):400–410
Kousika N, Premalatha K (2021) An improved privacy-preserving data mining technique using singular value decomposition with three-dimensional rotation data perturbation. The Journal of Supercomputing
Lee J, Jun S (2021) Privacy-preserving data mining for open government data from heterogeneous sources. Gov Inf Q 38(1):101544
Lin C, Hong T, Chang C, Wang S (2013) A greedy-based approach for hiding sensitive itemsets by transaction insertion. J Inf Hiding Multim Signal Process 4(4):201–214
Lin C, Zhang B, Yang K, TP TH (2014) Efficiently hiding sensitive itemsets with transaction deletion based on genetic algorithms. The Scientific World Journal, Hindawi
Lin C, Hong T, Yang K, Wang S (2015) The ga-based algorithms for optimizing hiding sensitive itemsets through transaction deletion. Appl Intell 42(2):210–230
Lin C, Liu Q, Fournier-Viger P (2016) A sanitization approach for hiding sensitive itemsets based on particle swarm optimization. Eng Appl Artif Intell 53:1–18
Lin J, Wu J, Fournier-Viger P, Djenouri Y, Chen C, Zhang Y (2019a) A sanitization approach to secure shared data in an iot environment. IEEE Access 7:25359–25368
Lin J, Zhang B, Fournier-Viger P, Djenouri Y (2019b) Hiding sensitive itemsets with multiple objective optimization. Soft Computing 23(23):12779–12797
Lin J, Srivastava G, Zhang Y, Djenouri Y, Aloqaily M (2020) Privacy-preserving multiobjective sanitization model in 6g iot environments. IEEE Internet Things J 8(7):5340–5349
Merani M, Croce D, Tinnirello I (2021) Rings for privacy: an architecture for large scale privacy-preserving data mining. IEEE Trans Parallel Distrib Syst 32(6):1340–1352
Nasiri N, Keyvanpour M (2020) Classification and evaluation of privacy preserving data mining methods. In: Proc. 11th IEEE International Conference on Information and Knowledge Technology (IKT), pp 17–22
Oliveira S, Zaiane O (2003) Protecting sensitive knowledge by data sanitization. In: Proc. Third IEEE International conference on data mining, pp 613–616
Pika A, Wynn M, Budiono S, Hofstede AT, der Aalst WV, Reijers H (2020) Privacy-preserving process mining in healthcare. Int J Environ Res 17(5):1612
Shastri M, Pandit A (2021) Remodeling: improved privacy preserving data mining (ppdm). Int J Inf Technol 13(1):131–137
Virupaksha S, Dondeti V (2021) Anonymized noise addition in subspaces for privacy preserved data mining in high dimensional continuous data. Peer Peer Netw Appl 14(3):1608–1628
Wang J, Zhang J, Bao W, Zhu X, Cao B, Yu P (2018a) Not just privacy: Improving performance of private deep learning in mobile cloud. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pp 2407–2416
Wang J, Wu L, Zeadally S, Khan M, He D (2021) Privacy-preserving data aggregation against malicious data mining attack for iot-enabled smart grid. ACM Transactions on Sensor Networks (TOSN) 17 (3):1–25
Wang R, Lai S, Wu G, Xing L, Wang L, Ishibuchi H (2018b) Multi-clustering via evolutionary multi-objective optimization. Inf Sci 450:128–140
Wu J, Zhan J, Lin J (2017) Ant colony system sanitization approach to hiding sensitive itemsets. IEEE Access 5:10024–10039
Wu J, Lin J, FV P, Djenouri Y, Chen C, Li Z (2019a) The density-based clustering method for privacy-preserving data mining. Math Biosci Eng 16(3):1718–1728
Wu J, Srivastava G, Jolfaei A, Fournier-Viger P, JCW JL (2021a) Hiding sensitive information in ehealth datasets. Futur Gener Comput Syst 117:169–180
Wu J, Srivastava G, Tayeb S, Lin J (2021b) A pso-based sanitization process with multi-thresholds model. In: Proceedings of international conference on pattern recognition, pp 439–446
Wu J, Srivastava G, Yun U, Tayeb S, Lin J (2021c) An evolutionary computation-based privacy-preserving data mining model under a multi threshold constraint. Trans Trans Emerg Telecommun Technol 32 (3):1–17
Wu T, Lin J, Zhang Y, Chen C (2019b) A grid-based swarm intelligence algorithm for privacy-preserving data mining. Appl Sci 9(4):774
Zhang X, Jiang S, Liu Y, Jiang T, Zhou Y (2021) Privacy-preserving scheme with account-mapping and noise-adding for energy trading based on consortium blockchain. IEEE Transactions on Network and Service Management. https://doi.org/10.1109/TNSM.2021.3110980
Zorarpacıimath E, Özel S (2021a) Privacy preserving classification over differentially private data. Wiley Interdiscip Rev: Data Min Knowl Discov 11(3):e1399
Zorarpacıimath E, Özel S (2021b) Privacy preserving rule-based classifier using modified artificial bee colony algorithm. Expert Syst Appl 183:115437
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Lefkir, M., Nouioua, F. & Fournier-Viger, P. Hiding sensitive frequent itemsets by item removal via two-level multi-objective optimization. Appl Intell 53, 10027–10052 (2023). https://doi.org/10.1007/s10489-022-03808-6
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-022-03808-6