[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3583780.3615071acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Tackling Diverse Minorities in Imbalanced Classification

Published: 21 October 2023 Publication History

Abstract

Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers. When working with large datasets, the imbalanced issue can be further exacerbated, making it exceptionally difficult to train classifiers effectively. To address the problem, over-sampling techniques have been developed to linearly interpolating data instances between minorities and their neighbors. However, in many real-world scenarios such as anomaly detection, minority instances are often dispersed diversely in the feature space rather than clustered together. Inspired by domain-agnostic data mix-up, we propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes. It is non-trivial to develop such a framework, the challenges include source sample selection, mix-up strategy selection, and the coordination between the underlying model and mix-up strategies. To tackle these challenges, we formulate the problem of iterative data mix-up as a Markov decision process (MDP) that maps data attributes onto an augmentation strategy. To solve the MDP, we employ an actor-critic framework to adapt the discrete-continuous decision space. This framework is utilized to train a data augmentation policy and design a reward signal that explores classifier uncertainty and encourages performance improvement, irrespective of the classifier's convergence. We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets using three different types of classifiers. The results of these experiments showcase the potential and promise of our framework in addressing imbalanced datasets with diverse minorities.

References

[1]
Khaled Gubran Al-Hashedi and Pritheega Magalingam. 2021. Financial fraud detection applying data mining techniques: a comprehensive review from 2009 to 2019. Computer Science Review, Vol. 40 (2021), 100402.
[2]
Chumphol Bunkhumpornpat, Krung Sinapiromsaran, and Chidchanok Lursinsap. 2012. DBSMOTE: density-based synthetic minority over-sampling technique. Applied Intelligence, Vol. 36, 3 (2012), 664--684.
[3]
Zhangjie Cao, Kaichao You, Mingsheng Long, Jianmin Wang, and Qiang Yang. 2019. Learning to transfer examples for partial domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[4]
Olivier Chapelle, Bernhard Scholkopf, and Alexander Zien. 2009. Semi-supervised learning (chapelle, o. et al., eds.; 2006)[book reviews]. IEEE Transactions on Neural Networks, Vol. 20, 3 (2009), 542--542.
[5]
Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, Vol. 16 (2002), 321--357.
[6]
Deli Chen, Yankai Lin, Wei Li, Peng Li, Jie Zhou, and Xu Sun. 2020. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proceedings of the AAAI conference on artificial intelligence.
[7]
Huiyuan Chen, Chin-Chia Michael Yeh, Fei Wang, and Hao Yang. 2022a. Graph neural transport networks with non-local attentions for recommender systems. In Proceedings of the ACM Web Conference 2022. 1955--1964.
[8]
Huiyuan Chen, Kaixiong Zhou, Kwei-Herng Lai, Xia Hu, Fei Wang, and Hao Yang. 2022b. Adversarial graph perturbations for recommendations at scale. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1854--1858.
[9]
Ali Dabouei, Sobhan Soleymani, Fariborz Taherkhani, and Nasser M Nasrabadi. 2021. Supermix: Supervising the mixing data augmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 13794--13803.
[10]
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning. 1861--1870.
[11]
Hui Han, Wen-Yuan Wang, and Bing-Huan Mao. 2005. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In International conference on intelligent computing. 878--887.
[12]
Songqiao Han, Xiyang Hu, Hailiang Huang, Mingqi Jiang, and Yue Zhao. 2022a. ADBench: Anomaly Detection Benchmark. In Neural Information Processing Systems.
[13]
Xiaotian Han, Zhimeng Jiang, Ninghao Liu, and Xia Hu. 2022b. G-Mixup: Graph Data Augmentation for Graph Classification. In International Conference on Machine Learning.
[14]
Haibo He, Yang Bai, Edwardo A Garcia, and Shutao Li. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE international joint conference on neural networks. 1322--1328.
[15]
Chia-Yu Hsu and Wei-Chen Liu. 2021. Multiple time-series convolutional neural network for fault detection and diagnosis and empirical study in semiconductor manufacturing. Journal of Intelligent Manufacturing, Vol. 32 (2021), 823--836.
[16]
Dino Ienco, Ruggero G Pensa, and Rosa Meo. 2016. A semisupervised approach to the detection and characterization of outliers in categorical data. IEEE transactions on neural networks and learning systems, Vol. 28, 5 (2016), 1017--1029.
[17]
Anubha Kabra, Ayush Chopra, Nikaash Puri, Pinkesh Badjatiya, Sukriti Verma, Piyush Gupta, et al. 2020. MixBoost: Synthetic Oversampling with Boosted Mixup for Handling Extreme Imbalance. arXiv preprint arXiv:2009.01571 (2020).
[18]
Aechan Kim, Mohyun Park, and Dong Hoon Lee. 2020. AI-IDS: Application of deep learning to real-time Web intrusion detection. IEEE Access (2020).
[19]
György Kovács. 2019. Smote-variants: A python implementation of 85 minority oversampling techniques. Neurocomputing, Vol. 366 (2019), 352--354.
[20]
Kwei-Herng Lai, Lan Wang, Huiyuan Chen, Kaixiong Zhou, Fei Wang, Hao Yang, and Xia Hu. 2023. Context-aware Domain Adaptation for Time Series Anomaly Detection. In Proceedings of the 2023 SIAM International Conference on Data Mining. 676--684.
[21]
Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, and Samet Oymak. 2021. AutoBalance: Optimized Loss Functions for Imbalanced Data. In Advances in Neural Information Processing Systems.
[22]
Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015).
[23]
Hongyi Ling, Zhimeng Jiang, Meng Liu, Shuiwang Ji, and Na Zou. 2023 a. Graph Mixup with Soft Alignments. In International Conference on Machine Learning.
[24]
Hongyi Ling, Zhimeng Jiang, Youzhi Luo, Shuiwang Ji, and Na Zou. 2023 b. Learning Fair Graph Representations via Automated Data Augmentations. In International Conference on Learning Representations.
[25]
Xu-Ying Liu and Zhi-Hua Zhou. 2006. The influence of class imbalance on cost-sensitive learning: An empirical study. In International Conference on Data Mining.
[26]
Zhining Liu, Pengfei Wei, Jing Jiang, Wei Cao, Jiang Bian, and Yi Chang. 2020. MESA: Boost Ensemble Imbalanced Learning with MEta-SAmpler. In Conference on Neural Information Processing Systems.
[27]
Anh Nguyen, Jason Yosinski, and Jeff Clune. 2015. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition. 427--436.
[28]
Hien M Nguyen, Eric W Cooper, and Katsuari Kamei. 2011. Borderline over-sampling for imbalanced data classification. International Journal of Knowledge Engineering and Soft Data Paradigms, Vol. 3, 1 (2011), 4--21.
[29]
Guansong Pang, Chunhua Shen, Huidong Jin, and Anton van den Hengel. 2019b. Deep weakly-supervised anomaly detection. arXiv preprint arXiv:1910.13601 (2019).
[30]
Guansong Pang, Chunhua Shen, and Anton van den Hengel. 2019a. Deep anomaly detection with deviation networks. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 353--362.
[31]
Guansong Pang, Anton van den Hengel, Chunhua Shen, and Longbing Cao. 2021. Toward deep supervised anomaly detection: Reinforcement learning from partially labeled anomaly data. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining. 1298--1308.
[32]
Lorenzo Perini, Vincent Vercruyssen, and Jesse Davis. 2020. Quantifying the confidence of anomaly detectors in their example-wise predictions. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 227--243.
[33]
Lukas Ruff, Robert A. Vandermeulen, Nico Görnitz, Alexander Binder, Emmanuel Müller, Klaus-Robert Müller, and Marius Kloft. 2020. Deep semi-supervised anomaly detection. In International Conference on Learning Representations.
[34]
Connor Shorten and Taghi M Khoshgoftaar. 2019. A survey on image data augmentation for deep learning. Journal of big data, Vol. 6, 1 (2019), 1--48.
[35]
Wacharasak Siriseriwan and Krung Sinapiromsaran. 2017. Adaptive neighbor synthetic minority oversampling technique under 1NN outcast handling. Songklanakarin J. Sci. Technol, Vol. 39, 5 (2017), 565--576.
[36]
Richard S Sutton and Andrew G Barto. 2018. Reinforcement learning: An introduction. MIT press.
[37]
Song Wang, Xingbo Fu, Kaize Ding, Chen Chen, Huiyuan Chen, and Jundong Li. 2023. Federated Few-Shot Learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
[38]
Qingsong Wen, Liang Sun, Fan Yang, Xiaomin Song, Jingkun Gao, Xue Wang, and Huan Xu. 2020. Time series data augmentation for deep learning: A survey. arXiv preprint arXiv:2002.12478 (2020).
[39]
Tong Wu and Jorge Ortiz. 2021. Rlad: Time series anomaly detection through reinforcement learning and active learning. arXiv preprint arXiv:2104.00543 (2021).
[40]
Show-Jane Yen and Yue-Shi Lee. 2006. Under-sampling approaches for improving prediction of the minority class in an imbalanced dataset. In Intelligent Control and Automation. Springer, 731--740.
[41]
Daochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, Zhimeng Jiang, Shaochen Zhong, and Xia Hu. 2023. Data-centric artificial intelligence: A survey. arXiv preprint arXiv:2303.10158 (2023).
[42]
Daochen Zha, Kwei-Herng Lai, Qiaoyu Tan, Sirui Ding, Na Zou, and Xia Ben Hu. 2022. Towards automated imbalanced learning with deep hierarchical reinforcement learning. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management.
[43]
Daochen Zha, Kwei-Herng Lai, Mingyang Wan, and Xia Hu. 2020. Meta-AAD: Active anomaly detection with deep reinforcement learning. In 2020 IEEE International Conference on Data Mining.
[44]
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. 2018. mixup: Beyond Empirical Risk Minimization. In International Conference on Learning Representations.
[45]
Yue Zhao and Maciej K Hryniewicki. 2018. XGBOD: improving supervised outlier detection with unsupervised representation learning. In 2018 International Joint Conference on Neural Networks.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023
5508 pages
ISBN:9798400701245
DOI:10.1145/3583780
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 October 2023

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. anomaly detection
  2. data augmentation
  3. imbalanced classification

Qualifiers

  • Research-article

Conference

CIKM '23
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

CIKM '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 119
    Total Downloads
  • Downloads (Last 12 months)75
  • Downloads (Last 6 weeks)7
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media