Research article

An imbalanced contrastive classification method via similarity comparison within sample-neighbors with adaptive generation coefficient

Published: 25 June 2024

Abstract

Correct discrimination of samples in overlapping regions is crucial in imbalanced classification problems. Data-level methods generate new samples in overlapping areas to obtain a clearer classification boundary; however, the reliability of the generated samples cannot be guaranteed, and additional noise may be introduced. Although a few researchers have recently introduced contrastive learning to address these problems, they have neither explored the differences in the information content of samples within the contrastive task nor considered the complex samples in overlapping areas. This paper proposes a contrastive classification method based on similarity comparison within sample-neighbors, which transforms the traditional label-prediction task into a similarity-analysis task. Considering the category distribution of each sample's neighbors and its information content in the comparison task, a unique generation coefficient is calculated for every sample. On this basis, a similarity loss over the target-neighbor sample group is designed so that the model can compute the similarity between different samples. Meanwhile, an additional discriminator supervises the samples generated by the variational autoencoder (VAE), which prompts the model to focus on the characteristics of individual samples. Experimental results on 39 public datasets show that the proposed method outperforms typical imbalanced classification methods.
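The idea of a per-sample adaptive generation coefficient driven by the neighbor category distribution can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's exact formulation: the neighborhood size `k`, the function `adaptive_generation_coefficients`, and the choice to use the opposite-class ratio directly as the coefficient are all illustrative. Samples whose neighborhoods contain more opposite-class points lie in overlapping regions and receive a larger coefficient.

```python
# Hypothetical sketch: derive an adaptive generation coefficient for each
# minority sample from the category distribution of its k nearest neighbors.
# The coefficient formula (opposite-class ratio) and k are assumptions made
# for illustration, not the paper's definitions.
import numpy as np

def adaptive_generation_coefficients(X, y, minority_label=1, k=3):
    """For each minority sample, count opposite-class points among its
    k nearest neighbors and use that ratio (in [0, 1]) as its coefficient."""
    minority_idx = np.where(y == minority_label)[0]
    coeffs = np.zeros(len(minority_idx))
    for i, idx in enumerate(minority_idx):
        d = np.linalg.norm(X - X[idx], axis=1)
        d[idx] = np.inf                       # exclude the sample itself
        neighbors = np.argsort(d)[:k]
        # Fraction of neighbors from the opposite class = degree of overlap.
        coeffs[i] = np.mean(y[neighbors] != minority_label)
    return minority_idx, coeffs

# Toy data: a minority cluster far from the majority class, plus one
# minority sample sitting inside the majority region (the overlap case).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [0.1, 0.1],   # majority
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],                # minority core
              [0.3, 0.2]])                                       # minority, overlap
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
idx, c = adaptive_generation_coefficients(X, y, k=3)
# The core minority samples get coefficient 0.0; the overlapping one gets 1.0.
```

A real implementation would then scale how many (or how cautiously) synthetic samples are generated around each minority point by its coefficient, which is the intuition the abstract describes.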



Published In

Information Sciences: an International Journal, Volume 662, Issue C, March 2024, 1436 pages

Publisher

Elsevier Science Inc.

United States


Author Tags

  1. Imbalanced classification
  2. Contrastive learning
  3. Similarity loss
  4. Adaptive data augmentation
  5. Sample authenticity constraints
