Research article

An imbalanced contrastive classification method via similarity comparison within sample-neighbors with adaptive generation coefficient

Published: 25 June 2024

Abstract

Correct discrimination of samples in overlapping regions is crucial in imbalanced classification problems. Data-level methods generate new samples in overlapping areas to obtain a clearer classification boundary; however, the reliability of the generated samples cannot be guaranteed, and additional noise may be introduced. Although a few researchers have recently introduced contrastive learning to address these problems, they have neither explored the differences in the information content of samples within the contrastive task nor considered the complex samples in overlapping areas. This paper proposes a contrastive classification method based on similarity comparison within sample-neighbors, which transforms the traditional label-prediction task into a similarity-analysis task. Considering the category distribution of each sample's neighbors and its information content in the comparison task, a unique generation coefficient is calculated for every sample. On this basis, a similarity loss over the target-neighbor sample group is designed so that the model can compute the similarity between different samples. Meanwhile, an additional discriminator supervises the samples generated by the variational autoencoder (VAE), which prompts the model to focus on the characteristics of individual samples. Experimental results on 39 public datasets show that the proposed method outperforms typical imbalanced classification methods.
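The idea of a per-sample adaptive generation coefficient driven by the neighbor category distribution can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's exact formulation: the neighborhood size `k`, the function `adaptive_generation_coefficients`, and the choice to use the opposite-class ratio directly as the coefficient are all illustrative. Samples whose neighborhoods contain more opposite-class points lie in overlapping regions and receive a larger coefficient.

```python
# Hypothetical sketch: derive an adaptive generation coefficient for each
# minority sample from the category distribution of its k nearest neighbors.
# The coefficient formula (opposite-class ratio) and k are assumptions made
# for illustration, not the paper's definitions.
import numpy as np

def adaptive_generation_coefficients(X, y, minority_label=1, k=3):
    """For each minority sample, count opposite-class points among its
    k nearest neighbors and use that ratio (in [0, 1]) as its coefficient."""
    minority_idx = np.where(y == minority_label)[0]
    coeffs = np.zeros(len(minority_idx))
    for i, idx in enumerate(minority_idx):
        d = np.linalg.norm(X - X[idx], axis=1)
        d[idx] = np.inf                       # exclude the sample itself
        neighbors = np.argsort(d)[:k]
        # Fraction of neighbors from the opposite class = degree of overlap.
        coeffs[i] = np.mean(y[neighbors] != minority_label)
    return minority_idx, coeffs

# Toy data: a minority cluster far from the majority class, plus one
# minority sample sitting inside the majority region (the overlap case).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [0.1, 0.1],   # majority
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],                # minority core
              [0.3, 0.2]])                                       # minority, overlap
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
idx, c = adaptive_generation_coefficients(X, y, k=3)
# The core minority samples get coefficient 0.0; the overlapping one gets 1.0.
```

A real implementation would then scale how many (or how cautiously) synthetic samples are generated around each minority point by its coefficient, which is the intuition the abstract describes.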



Published In

Information Sciences: an International Journal, Volume 662, Issue C, March 2024, 1436 pages

Publisher

Elsevier Science Inc.

United States


Author Tags

  1. Imbalanced classification
  2. Contrastive learning
  3. Similarity loss
  4. Adaptive data augmentation
  5. Sample authenticity constraints
