research-article

Tackling Diverse Minorities in Imbalanced Classification

Authors:

Kwei-Herng Lai,

Mangesh Bendre,

Xia Hu

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

Pages 1178 - 1187

https://doi.org/10.1145/3583780.3615071

Published: 21 October 2023

Abstract

Imbalanced datasets are common in real-world applications and pose significant challenges for training classifiers. With large datasets, the imbalance can be further exacerbated, making it exceptionally difficult to train classifiers effectively. To address the problem, over-sampling techniques have been developed that linearly interpolate data instances between minority instances and their neighbors. However, in many real-world scenarios such as anomaly detection, minority instances are often dispersed diversely in the feature space rather than clustered together. Inspired by domain-agnostic data mix-up, we propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes. Developing such a framework is non-trivial: the challenges include source sample selection, mix-up strategy selection, and the coordination between the underlying model and mix-up strategies. To tackle these challenges, we formulate the problem of iterative data mix-up as a Markov decision process (MDP) that maps data attributes onto an augmentation strategy. To solve the MDP, we employ an actor-critic framework to adapt to the discrete-continuous decision space. We use this framework to train a data augmentation policy and design a reward signal that explores classifier uncertainty and encourages performance improvement, irrespective of the classifier's convergence. We demonstrate the effectiveness of the proposed framework through extensive experiments on seven publicly available benchmark datasets using three different types of classifiers. The results show the potential of our framework in addressing imbalanced datasets with diverse minorities.
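To make the idea of mixing minority and majority samples concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain convex combination as the mix-up operation, and it replaces the learned actor-critic policy with a uniformly random placeholder that picks the source sample, the partner sample, and the mixing coefficient. The names `mix`, `iterative_mixup`, and `lam` are hypothetical and do not come from the paper.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mix(source: Vector, partner: Vector, lam: float) -> List[float]:
    """Return the convex combination lam * source + (1 - lam) * partner."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(source) != len(partner):
        raise ValueError("vectors must have the same length")
    return [lam * s + (1.0 - lam) * p for s, p in zip(source, partner)]


def iterative_mixup(
    minority: Sequence[Vector],
    majority: Sequence[Vector],
    steps: int,
    rng: random.Random,
) -> List[Tuple[List[float], float]]:
    """Generate `steps` synthetic samples, one per iteration.

    Placeholder policy (an assumption for illustration): the source, the
    partner, and lam are drawn uniformly at random. In the paper these
    choices are made by a learned policy instead.
    """
    synthetic = []
    for _ in range(steps):
        source = rng.choice(minority)   # minority sample to start from
        partner = rng.choice(majority)  # majority sample to mix with
        lam = rng.uniform(0.5, 1.0)     # stay on the minority side
        # lam is reused as a soft minority label for the mixed sample.
        synthetic.append((mix(source, partner, lam), lam))
    return synthetic


if __name__ == "__main__":
    rng = random.Random(0)
    minority = [[0.0, 0.0], [5.0, 5.0]]  # two dispersed minority points
    majority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0]]
    for sample, label in iterative_mixup(minority, majority, 3, rng):
        print(sample, round(label, 3))
```

Unlike neighbor-based interpolation between minority instances, this sketch never needs two minority points to be close to each other, which is the situation the abstract describes for dispersed minorities.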

Index Terms

Computing methodologies → Machine learning → Learning paradigms

Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

October 2023

5508 pages

ISBN:9798400701245

DOI:10.1145/3583780

General Chairs: Ingo Frommholz (University of Wolverhampton, UK), Frank Hopfgartner (University of Koblenz, Germany), Mark Lee (University of Birmingham, UK), Michael Oakes (University of Birmingham, UK)

Program Chairs: Mounia Lalmas (Spotify, UK), Min Zhang (Tsinghua University, China), Rodrygo Santos (Federal University of Minas Gerais, Brazil)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '23

CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2023

Birmingham, United Kingdom

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
