A MeanShift-guided oversampling with self-adaptive sizes for imbalanced data classification

Published: 18 July 2024

Highlights

The method can address within-class imbalance, small sample sizes, and small disjuncts.
The oversampling simultaneously considers the majority and minority class distributions.
The introduced randomness and cut-off threshold can avoid overlapping and overfitting.
The assigned oversampling sizes are inversely proportional to density and distance, as sketched below.
The oversampling size assignment strategy can enhance minority boundary information.
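The last two highlights state the size-assignment rule only in words. As a rough illustration (a minimal sketch, assuming the local density and the distance to the majority class are already available as per-instance arrays; this is not the authors' exact formula), the inverse-proportional weighting could be written as:

```python
import numpy as np

def assign_sizes(density, dist_to_majority, n_new):
    """Illustrative sketch: per-instance oversampling counts whose weights are
    inversely proportional to local density and to the distance from the
    majority class, so sparse and borderline minority instances receive more
    synthetic samples. The paper's exact weighting may differ."""
    w = 1.0 / (np.asarray(density) * np.asarray(dist_to_majority) + 1e-12)
    w /= w.sum()                            # normalize weights to a distribution
    return np.floor(n_new * w).astype(int)  # flooring may drop a few samples
```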

Abstract

Imbalanced data classification has gained popularity in the machine learning research community due to its prevalence in numerous applications and its inherent difficulty. However, most contemporary work focuses primarily on addressing between-class imbalance. Previous research has shown that, when combined with other factors such as within-class imbalance, small sample size, and the presence of small disjuncts, imbalanced data significantly increases the difficulty for traditional classifiers to learn. We therefore propose a novel MeanShift-guided oversampling method with self-adaptive sizes for imbalanced data classification. The proposed MeanShift-guided oversampling technique simultaneously considers the distributions of the minority and majority classes within a sphere centered on the current minority instance, which helps address the small-sample-size problem and avoid the overlapping issues often caused by nearest neighbor (NN)-based oversampling techniques. The incorporation of a random vector and a flexible cut-off mechanism for the vector length enhances the diversity among the generated synthetic minority instances and avoids overlapping, making the method suitable for small-sample-size and small-disjunct problems. To address both between-class and within-class imbalance, we also introduce a self-adaptive size assignment strategy for each minority instance to be oversampled, where the assigned size is inversely proportional to its density and its distance from the majority class. In addition to eliminating within-class imbalance, this strategy ensures that informative borderline minority instances have more opportunities to be oversampled, thus improving classification performance. Extensive experimental results on datasets with different distributions and imbalance ratios show that the proposed algorithm significantly outperforms the compared methods.
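Beyond naming its ingredients (a sphere centered on each minority instance, the mean-shift direction within that sphere, a random vector, and a cut-off on the vector length), the abstract does not spell out the generation step. The following Python sketch is therefore only one plausible reading rather than the authors' implementation; the bandwidth, the noise scale, and the use of the nearest-majority distance as the cut-off are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def meanshift_guided_generation(X_min, X_maj, sizes, bandwidth, noise=0.1,
                                random_state=0):
    """Hedged sketch of MeanShift-guided synthetic generation.

    For each minority instance x, the classic mean-shift vector
    m(x) = mean(minority neighbors within `bandwidth`) - x gives the step
    direction; a random perturbation adds diversity, and the step length is
    drawn at random but cut off below the distance to x's nearest majority
    instance so synthetics do not cross into the majority region.
    `sizes[i]` is the number of synthetics assigned to X_min[i], e.g. from an
    inverse density/distance weighting as sketched above."""
    rng = np.random.default_rng(random_state)
    d_maj, _ = NearestNeighbors(n_neighbors=1).fit(X_maj).kneighbors(X_min)
    d_maj = d_maj[:, 0]  # distance from each minority instance to the majority class

    synthetics = []
    for i, n_i in enumerate(sizes):
        x = X_min[i]
        # Minority neighbors inside the sphere centered on x (x itself included).
        inside = X_min[np.linalg.norm(X_min - x, axis=1) <= bandwidth]
        shift = inside.mean(axis=0) - x                  # mean-shift direction
        for _ in range(int(n_i)):
            direction = shift + rng.normal(scale=noise * bandwidth, size=x.shape)
            direction /= np.linalg.norm(direction) + 1e-12
            length = rng.uniform(0.0, 1.0) * min(bandwidth, d_maj[i])  # cut-off
            synthetics.append(x + length * direction)
    return np.vstack([X_min] + synthetics) if synthetics else X_min
```

As a design note, drawing the step length uniformly below both the bandwidth and the nearest-majority distance is what keeps the synthetics diverse while preventing them from overlapping the majority region, which is the role the abstract assigns to the random vector and the flexible cut-off.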


Published In

Information Sciences: an International Journal  Volume 672, Issue C
Jun 2024
319 pages

Publisher

Elsevier Science Inc.

United States

Publication History

Published: 18 July 2024

Author Tags

  1. Imbalanced datasets
  2. Classification
  3. Over-sampling
  4. Overlapping
  5. Within-class imbalance

Qualifiers

  • Research-article
