Abstract
Semi-supervised learning (SSL) exploits unlabeled data alongside a small set of labeled samples to improve deep networks, but class imbalance degrades performance because biased pseudo-labels skew the decision boundary. To address this challenge, we propose two optimization conditions, motivated by our theoretical analysis, that focus on aligning class distributions and representations. We further introduce a plug-and-play method, Basis Transformation based Distribution Alignment (BTDA), which efficiently aligns class distributions while accounting for inter-class relationships. BTDA mitigates the negative impact of biased pseudo-labels through a basis transformation implemented as a learnable transition matrix. Extensive experiments on class-imbalanced image classification demonstrate the effectiveness of integrating existing SSL methods with BTDA. For example, BTDA improves accuracy by 2.47% to 6.66% on the CIFAR10-LT and SVHN-LT datasets, and by a notable 10.95% on the tail class, even under high imbalance ratios. Despite its simplicity, BTDA achieves state-of-the-art performance in class-imbalanced SSL on representative datasets.
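The abstract only sketches the mechanism, so the following is a minimal, hypothetical PyTorch sketch of the core idea as we read it: predicted class distributions on unlabeled data are re-expressed through a learnable row-stochastic transition matrix before pseudo-labeling. The class name `BasisTransformAlignment`, the identity-biased initialization, and all other details are our assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasisTransformAlignment(nn.Module):
    """Hypothetical sketch: align biased pseudo-label distributions by
    multiplying them with a learnable row-stochastic transition matrix."""

    def __init__(self, num_classes: int):
        super().__init__()
        # Unconstrained logits; a row-wise softmax keeps the matrix stochastic.
        # Identity-biased init so alignment starts close to a no-op (assumption).
        self.transition_logits = nn.Parameter(torch.eye(num_classes) * 5.0)

    def forward(self, probs: torch.Tensor) -> torch.Tensor:
        # probs: (batch, num_classes) predicted class distribution on unlabeled data.
        T = F.softmax(self.transition_logits, dim=1)  # rows sum to 1
        # Since probs and T are both row-stochastic, the output rows also sum to 1.
        return probs @ T

# Usage sketch: align predictions before pseudo-label thresholding.
align = BasisTransformAlignment(num_classes=10)
logits = torch.randn(4, 10)
aligned = align(F.softmax(logits, dim=1))
pseudo_labels = aligned.argmax(dim=1)
```

In training, such a matrix would presumably be learned jointly with the backbone so that the aligned distribution matches the desired class distribution; the sketch omits that alignment loss.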
Funding
This research was supported by the National Natural Science Foundation of China (61972187); the Open Project of the Key Laboratory of Medical Big Data Engineering in Fujian Province (KLKF202301); the Provincial Natural Science Foundation of Anhui (No. 2108085QF268); the R&D Plan of Guangdong Province in Key Areas (2020B0101090005); the Specific Research Fund of the Innovation Platform for Academicians of Hainan Province (YSPTZX202145); and the Fujian Provincial Science and Technology Department Guided Project (2022H0012).
Author information
Contributions
JY: investigation, formal analysis, writing—original draft; XG: project administration, supervision; ZL: writing—review and editing; JW: conceptualization, methodology, writing—original draft; XX: writing—review and editing; XZ: funding acquisition.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Main notations
This appendix provides a condensed overview of the main symbols used in this study, collected in Table 4, to aid readers' comprehension and improve the clarity of the presentation.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ye, J., Gao, X., Li, Z. et al. Btda: basis transformation based distribution alignment for imbalanced semi-supervised learning. Int. J. Mach. Learn. & Cyber. 15, 3829–3845 (2024). https://doi.org/10.1007/s13042-024-02122-6