Abstract
With the development of IoT applications, machine learning dramatically improves the utility of various IoT systems such as autonomous driving. Although the pretrain-finetune framework copes well with data heterogeneity in complex IoT scenarios, the data collected by sensors often contain unexpected noisy samples, e.g., out-of-distribution (OOD) data, which degrade the performance of fine-tuned models. To resolve this problem, this paper proposes MuGAN, a method that mitigates the side effects of OOD data via generative adversarial network (GAN)-based machine unlearning. MuGAN follows a straightforward but effective idea to mitigate the performance loss caused by OOD data, i.e., “flashbacking” the model to the state it would have reached had the OOD data been excluded from training. To achieve this goal, we design an adversarial game in which a discriminator is trained to identify whether a sample belongs to the training set by observing the model's confidence scores, while a generator (i.e., the target model) is updated to fool the discriminator into believing that the OOD data are not included in the training set whereas the remaining data are. The experimental results show that, benefiting from a high unlearning rate (more than 90%) and retention rate (99%), MuGAN reduces the model performance degradation caused by OOD data from 5.88% to 0.8%.
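To make the adversarial game concrete, below is a minimal sketch (PyTorch assumed) of one round of such a game as described in the abstract. The module names, discriminator architecture, loss weighting, and batch sources (`ood_batch`, `retain_batch`, `nonmember_batch`) are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (PyTorch assumed) of one round of the adversarial unlearning
# game described in the abstract. Names, the discriminator architecture, and
# the batch sources are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Discriminator(nn.Module):
    """Guesses, from a softmax confidence vector, whether the sample was
    part of the target model's training set (outputs a 'member' logit)."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, confidences: torch.Tensor) -> torch.Tensor:
        return self.net(confidences)


def adversarial_round(target_model, disc, ood_batch, retain_batch,
                      nonmember_batch, opt_target, opt_disc):
    """One discriminator update followed by one target-model update."""
    bce = nn.BCEWithLogitsLoss()

    # Discriminator step: training-set members (OOD + retained data)
    # should score 1, held-out non-members should score 0.
    with torch.no_grad():
        conf_member = F.softmax(
            target_model(torch.cat([ood_batch, retain_batch])), dim=1)
        conf_non = F.softmax(target_model(nonmember_batch), dim=1)
    d_member, d_non = disc(conf_member), disc(conf_non)
    d_loss = bce(d_member, torch.ones_like(d_member)) + \
             bce(d_non, torch.zeros_like(d_non))
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # Target-model ("generator") step: fool the discriminator so that OOD
    # data look like non-members while the retained data still look like
    # members, which is what preserves utility on the remaining data.
    g_ood = disc(F.softmax(target_model(ood_batch), dim=1))
    g_ret = disc(F.softmax(target_model(retain_batch), dim=1))
    g_loss = bce(g_ood, torch.zeros_like(g_ood)) + \
             bce(g_ret, torch.ones_like(g_ret))
    opt_target.zero_grad()
    g_loss.backward()
    opt_target.step()
    return d_loss.item(), g_loss.item()
```

Treating the confidence score as the discriminator's observation mirrors the membership-inference setting; the retained-data term in the target-model loss is one plausible way to account for the high retention rate reported above.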
Acknowledgements
This work was supported by the National Key Research and Development Program of China (Grant No. 2022YFB3103500), the National Natural Science Foundation of China (Grant Nos. U21A20464, 61872283), the Natural Science Basic Research Program of Shaanxi (Grant No. 2021JC-22), the Key Research and Development Program of Shaanxi (Grant No. 2022GY029), and the China 111 Project (Grant No. B16037).
Cite this article
Ma, Z., Yang, Y., Liu, Y. et al. Mitigate noisy data for smart IoT via GAN based machine unlearning. Sci. China Inf. Sci. 67, 132104 (2024). https://doi.org/10.1007/s11432-022-3671-9