Abstract
For the semi-supervised regression task, both the similarity between paired samples and the limited label information are core sources of supervision. Nevertheless, most traditional semi-supervised regression methods cannot make full use of both simultaneously. To alleviate this deficiency, this paper proposes a novel semi-supervised regression method with label-guided adaptive graph optimization (LGAGO-SSR). Basically, LGAGO-SSR involves two phases: graph representation and label-guided adaptive graph construction. The first phase seeks two low-dimensional manifold spaces based on two similarity matrices. The second phase adaptively learns these similarity matrices by integrating the data structure information in both the low-dimensional manifold spaces and the label space. Each phase has its own optimization problem, and the final solution is obtained by iteratively solving the problems of the two phases in alternation. Additionally, the decomposition-optimization idea of twin support vector regression (TSVR) is used to accelerate the training of LGAGO-SSR. Regression results on 12 benchmark datasets with different unlabeled rates demonstrate the effectiveness of LGAGO-SSR in semi-supervised regression tasks.
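The alternating two-phase scheme described above can be sketched as follows. This is a toy illustration only: ridge-regularized least squares stands in for the paper's TSVR-based representation subproblem, and a Gaussian similarity on the learned projection stands in for the label-guided adaptive graph update; all function names and parameters here are illustrative assumptions, not the paper's actual subroutines.

```python
import numpy as np

def solve_representation(X, y_lab, labeled_idx, S, reg=1e-2):
    # Phase 1 (graph representation): fix the graph S and fit a linear map w
    # by least squares on the labeled samples plus a graph-smoothness penalty
    # w^T X^T L X w, where L is the Laplacian of S. (Stand-in for the paper's
    # TSVR-style subproblem.)
    L = np.diag(S.sum(axis=1)) - S
    Xl = X[labeled_idx]
    A = Xl.T @ Xl + X.T @ L @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(A, Xl.T @ y_lab)

def solve_graph(X, w):
    # Phase 2 (adaptive graph construction): fix the representation and
    # re-estimate similarities from pairwise distances in the projected
    # space. (Stand-in for the paper's label-guided graph update.)
    z = X @ w
    d2 = (z[:, None] - z[None, :]) ** 2
    S = np.exp(-d2)
    S /= S.sum(axis=1, keepdims=True)
    return (S + S.T) / 2.0  # symmetrize

def fit(X, y_lab, labeled_idx, n_iters=5):
    n = X.shape[0]
    S = np.full((n, n), 1.0 / n)  # uninformative initial graph
    for _ in range(n_iters):
        w = solve_representation(X, y_lab, labeled_idx, S)
        S = solve_graph(X, w)
    return w, S

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0])
labeled_idx = np.arange(10)  # only the first 10 samples carry labels
w, S = fit(X, y[labeled_idx], labeled_idx)
```

The point of the sketch is the control flow: each phase is solved with the other phase's variables held fixed, and the two solutions are alternated until convergence.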
Data availability and access
After obtaining a license to use the data, the data can be accessed by visiting the following websites: https://archive.ics.uci.edu/ml/index.php, https://hastie.su.domains/ElemStatLearn/data.html and https://tianchi.aliyun.com/dataset/159885. Users can use the data for study and research purposes, but not for commercial purposes.
References
Smola AJ, Schölkopf B (2004) A tutorial on support vector regression. Stat Comput 14:199–222
Myles AJ, Feudale RN, Liu Y, Woody NA, Brown SD (2004) An introduction to decision tree modeling. J Chemom 18(6):275–285
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Stat Methodol 58(1):267–288
Czajkowski M, Jurczuk K, Kretowski M (2023) Steering the interpretability of decision trees using lasso regression - an evolutionary perspective. Inf Sci 638:118944
Jain N, Jana PK (2023) LRF: A logically randomized forest algorithm for classification and regression problems. Expert Syst Appl 213(Part C):119225
Chen H, Wu L, Chen J, Lu W, Ding J (2022) A comparative study of automated legal text classification using random forests and deep learning. Inf Process Manag 59(2):102798
Zhou Z, Li M (2005) Semi-supervised regression with co-training. In: Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, pp 908–913. Morgan Kaufmann, San Francisco, USA
Zhou Z, Li M (2007) Semisupervised regression with cotraining-style algorithms. IEEE Trans Knowl Data Eng 19(11):1479–1493
Wang X, Ma L, Wang X (2010) Apply semi-supervised support vector regression for remote sensing water quality retrieving. IEEE Int Geosci Remote Sens Symp. IEEE, Piscataway, USA, pp 2757–2760
Emadi M, Tanha J, Shiri ME, Aghdam MH (2021) A selection metric for semi-supervised learning based on neighborhood construction. Inf Process Manag 58(2):102444
Lin K, Pai P, Lu Y, Chang P (2013) Revenue forecasting using a least-squares support vector regression model in a fuzzy environment. Inf Sci 220:196–209
Yue Y, Wang G, Hu J, Li Y (2023) An improved label propagation algorithm based on community core node and label importance for community detection in sparse network. Appl Intell 53:17935–17951
Hua Z, Yang Y (2022) Robust and sparse label propagation for graph-based semi-supervised classification. Appl Intell 52:3337–3351
Hua Z, Yang Y, Qiu H (2021) Node influence-based label propagation algorithm for semi-supervised learning. Neural Comput & Applic 33:2753–2768
Wang B, Tsotsos J (2016) Dynamic label propagation for semi-supervised multi-class multi-label classification. Pattern Recogn 52:75–84
Yoo J, Kim HJ (2014) Semisupervised location awareness in wireless sensor networks using Laplacian support vector regression. Int J Distrib Sens Netw 10:265801
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434
Yu J, Son Y (2021) Weighted co-association rate-based Laplacian regularized label description for semi-supervised regression. Inf Sci 545:688–712
Yu Z, Ye F, Yang K, Cao W, Chen CLP, Cheng L, You J, Wong H (2022) Semisupervised classification with novel graph construction for high-dimensional data. IEEE Trans Neural Netw Learn Syst 33(1):75–88
Zhou B, Liu W, Zhang W, Lu Z, Tan Q (2022) Multi-kernel graph fusion for spectral clustering. Inf Process Manag 59(5):103003
Nie F, Dong X, Li X (2021) Unsupervised and semisupervised projection with graph optimization. IEEE Trans Neural Netw Learn Syst 32(4):1547–1559
Wang S, Chen Y, Yi S, Chao G (2022) Frobenius norm-regularized robust graph learning for multi-view subspace clustering. Appl Intell 52(13):14935–14948
Zhang R, Nie F, Li X (2019) Semisupervised learning with parameter-free similarity of label and side information. IEEE Trans Neural Netw Learn Syst 30(2):405–414
Zhang L, Liu Z, Pu J, Song B (2020) Adaptive graph regularized nonnegative matrix factorization for data representation. Appl Intell 50:438–447
Li D, Madden AD (2019) Cascade embedding model for knowledge graph inference and retrieval. Inf Process Manag 56(6):102093
Chen L, Zhong Z (2022) Adaptive and structured graph learning for semi-supervised clustering. Inf Process Manag 59(4):102949
Liu J, Lin M, Zhao M, Zhan C, Li B, Chui JKT (2023) Person re-identification via semi-supervised adaptive graph embedding. Appl Intell 53(3):2656–2672
Zhang L, Zhou W, Chang P, Liu J, Yan Z, Wang T, Li FZ (2012) Kernel sparse representation-based classifier. IEEE Trans Signal Process 60(4):1684–1695
Nie F, Wang X, Huang H (2014) Clustering and projected clustering with adaptive neighbors. International Conference on Knowledge Discovery and Data Mining. ACM, New York, USA, pp 977–986
Peng X (2010) TSVR: An efficient twin support vector machine for regression. Neural Netw 23(3):365–372
Zhuang L, Zhou Z, Gao S, Yin J, Lin Z, Ma Y (2017) Label information guided graph construction for semi-supervised learning. IEEE Trans Image Process 26(9):4182–4192
Peng X, Chen D, Xu D (2019) Hyperplane-based nonnegative matrix factorization with label information. Inf Sci 493:1–19
Liu Z, Wang T, Zhu F, Chen X, Pelusi D, Vasilakos AV (2024) Domain adaptive learning based on equilibrium distribution and dynamic subspace approximation. Expert Syst Appl 249:123673
Kiyadeh APH, Zamiri A, Yazdi HS, Ghaemi H (2015) Discernible visualization of high dimensional data using label information. Appl Soft Comput 27:474–486
Zhu X, Ghahramani Z (2002) Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University
Quinlan JR (1993) Combining instance-based and model-based learning. In: Machine Learning, Proceedings of the Tenth International Conference, University of Massachusetts, pp 236–243. Morgan Kaufmann, San Francisco, USA
Breiman L, Friedman JH (1985) Estimating optimal transformations for multiple regression and correlation. J Am Stat Assoc 80(391):580–598
Zhou F, Q C, King RD (2014) Predicting the geographical origin of music. In: 2014 IEEE International Conference on Data Mining, IEEE Computer Society, Los Alamitos, USA, pp 1115–1120
Yeh IC, Hsu TK (2018) Building real estate valuation models with comparative approach through case-based reasoning. Appl Soft Comput 65:260–271
Yeh IC (1998) Modeling of strength of high-performance concrete using artificial neural networks. Cem Concr Res 28(12):1797–1808
Yeh IC (2007) Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cem Concr Res 29(6):474–480
Grisoni F, Consonni V, Vighi M, Villa S, Todeschini R (2016) Investigating the mechanisms of bioconcentration through QSAR classification trees. Environ Int 88:198–205
Akbilgic O, Bozdogan H, Balaban ME (2014) A novel hybrid RBF neural networks model as a forecaster. Stat Comput 24(3):365–375
Owen AB (1999) Tubular neighbors for regression and classification. Citeseer
Nash W, Sellers T, Talbot S, Cawthorn A, Ford W (1995) Abalone. UCI Machine Learning Repository. https://doi.org/10.24432/C55C7W
Cortez P, Cerdeira A, Almeida F, Matos T, Reis J (2009) Modeling wine preferences by data mining from physicochemical properties. Decis Support Syst 47(4):547–553
Timilsina M, Figueroa A, d'Aquin M, Yang H (2021) Semi-supervised regression using diffusion on graphs. Appl Soft Comput 104:107188
Liu L, Huang P, Yu H, Min F (2023) Safe co-training for semi-supervised regression. Intell Data Anal 27:959–975
Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Dunn OJ (1961) Multiple comparisons among means. J Am Stat Assoc 56(293):52–64
Acknowledgements
We would like to thank the five anonymous reviewers and the Editor for their valuable comments and suggestions, which have significantly improved this paper. This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Author information
Authors and Affiliations
Contributions
Xiaohan Zheng: Methodology, Software, Validation, Writing - original draft, Writing - review & editing. Li Zhang: Conceptualization, Methodology, Software, Writing - review & editing, Validation, Project administration, Funding acquisition. Leilei Yan: Investigation, Software, Visualization. Lei Zhao: Investigation, Software, Visualization.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical and informed consent for data used
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
1.1 Appendix A: Derivation from (13) to (17)
Let \(v^{low}_{ij}={\Vert \varvec{w}_1^T \varvec{x}_i-\varvec{w}_1^T \varvec{x}_j\Vert }_2^2+\Vert y^{low}_i-y^{low}_j\Vert _2^2\), then the objective function in (13) can be rewritten as
where \(\varvec{v}^{low}_i=[v^{low}_{i1},v^{low}_{i2},\cdots ,v^{low}_{in}]^T\).
Because \(\lambda ^{low}_i\) and \(\varvec{v}^{low}_{i}\) are unrelated to \(\varvec{s}_i^{low}\), the last term in (57) is a constant. Therefore, the objective function in (13) is equivalent to
In summary, this completes the derivation from (13) to (17).
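For readers without the full equations at hand, the step used here is the standard completion of squares from adaptive-neighbors graph learning. Assuming the per-row subproblem of (13) has the usual form \({\varvec{v}^{low}_i}^T \varvec{s}^{low}_i + \lambda ^{low}_i \Vert \varvec{s}^{low}_i\Vert _2^2\) (a generic reconstruction, with \(\varvec{v}^{low}_i\) and \(\lambda ^{low}_i\) as defined above), the identity is:

```latex
% Completion of squares: the last term does not depend on s_i^{low},
% which is exactly why it can be dropped as a constant.
{\varvec{v}^{low}_i}^T \varvec{s}^{low}_i
  + \lambda^{low}_i \left\Vert \varvec{s}^{low}_i \right\Vert_2^2
= \lambda^{low}_i \left\Vert \varvec{s}^{low}_i
    + \frac{\varvec{v}^{low}_i}{2\lambda^{low}_i} \right\Vert_2^2
  - \frac{\left\Vert \varvec{v}^{low}_i \right\Vert_2^2}{4\lambda^{low}_i}
```

Expanding the squared norm on the right reproduces the two terms on the left plus the constant, so minimizing either side over \(\varvec{s}^{low}_i\) (subject to the simplex constraints) gives the same solution.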
1.2 Appendix B: Proof of Theorem 1
The sequences \(\left\{ \left( \varvec{w}_1^{(p)},b_1^{(p)},\varvec{\xi }_1^{(p)},{\varvec{S}^{low}}^{(p)},{\varvec{y}^{low}}^{(p)}\right) \right\} \) and \(\left\{ \left( \varvec{w}_2^{(p)},b_2^{(p)},{\varvec{\xi }_2}^{(p)},{\varvec{S}^{up}}^{(p)},{\varvec{y}^{up}}^{(p)}\right) \right\} \) generated by Algorithm 1 guarantee that \(\left\{ G_1(\varvec{w}_1^{(p)},b_1^{(p)},\varvec{\xi }_1^{(p)},{\varvec{S}^{low}}^{(p)},{\varvec{y}^{low}}^{(p)})\right\} \) and \(\left\{ G_2(\varvec{w}_2^{(p)},b_2^{(p)},{\varvec{\xi }_2}^{(p)},{\varvec{S}^{up}}^{(p)},{\varvec{y}^{up}}^{(p)})\right\} \) are monotonically decreasing and bounded, respectively, where p denotes the iteration index.
Assume that p is the current iteration. First, we prove that the sequence \(\left\{ G_1\left( \varvec{w}_1^{(p)},b_1^{(p)},\varvec{\xi }_1^{(p)},{\varvec{S}^{low}}^{(p)},{\varvec{y}^{low}}^{(p)}\right) \right\} \) decreases monotonically. Given \(\left( \varvec{w}_1^{(p)},b_1^{(p)},\varvec{\xi }_1^{(p)}\right) \), we optimize (13) to obtain \(\left( {\varvec{S}^{low}}^{(p+1)},{\varvec{y}^{low}}^{(p+1)}\right) \), thus we have
Given \(\left( {\varvec{S}^{low}}^{(p+1)},{\varvec{y}^{low}}^{(p+1)}\right) \), we optimize (9) to obtain \(\left( \varvec{w}_1^{(p+1)},b_1^{(p+1)},\varvec{\xi }_1^{(p+1)}\right) \), thus we have
By combining (59) and (60), we have
which indicates that the sequence \(\left\{ G_1\left( \varvec{w}_1^{(p)},b_1^{(p)},\varvec{\xi }_1^{(p)},{\varvec{S}^{low}}^{(p)},{\varvec{y}^{low}}^{(p)}\right) \right\} \) is monotonically decreasing for \(p=1,2,\cdots \). In the same way, we can prove that the sequence \(\left\{ G_2\left( \varvec{w}_2^{(p)},b_2^{(p)},{\varvec{\xi }_2}^{(p)},{\varvec{S}^{up}}^{(p)},{\varvec{y}^{up}}^{(p)}\right) \right\} \) is monotonically decreasing for \(p=1,2,\cdots \).
Next, we prove that both the sequences \(\left\{ G_1\left( \varvec{w}_1^{(p)},b_1^{(p)},\varvec{\xi }_1^{(p)},{\varvec{S}^{low}}^{(p)},{\varvec{y}^{low}}^{(p)}\right) \right\} \) and \(\left\{ G_2\left( \varvec{w}_2^{(p)},b_2^{(p)},{\varvec{\xi }_2}^{(p)},{\varvec{S}^{up}}^{(p)},{\varvec{y}^{up}}^{(p)}\right) \right\} \) have an infimum. From (15) and (16), we know that \(s^{low}_{ij}\), \(s^{up}_{ij}\), \(\lambda ^{low}_i\), \(\lambda ^{up}_i\), and \(C_i~(i=1,2,3,4)\) are all greater than zero. Thus, it is easy to infer that
and
In other words, \(\left\{ G_1\left( \varvec{w}_1^{(p)},b_1^{(p)},\varvec{\xi }_1^{(p)},{\varvec{S}^{low}}^{(p)},{\varvec{y}^{low}}^{(p)}\right) \right\} \) and \(\left\{ G_2\left( \varvec{w}_2^{(p)},b_2^{(p)},{\varvec{\xi }_2}^{(p)},{\varvec{S}^{up}}^{(p)},{\varvec{y}^{up}}^{(p)}\right) \right\} \) are bounded below by 0 and thus have an infimum.
This completes the proof of Theorem 1.
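The argument in Theorem 1 is the generic convergence argument for alternating exact minimization: if each phase exactly minimizes the objective over its block of variables with the other block fixed, the objective sequence is non-increasing, and boundedness below then gives convergence of the objective values. The following minimal numerical sketch demonstrates this on a simple biconvex surrogate (an illustration of the proof pattern, not of the LGAGO-SSR objective itself):

```python
# Alternating exact minimization of f(x, y) = (x - y)^2 + x^2 + y^2.
# Each update solves one block exactly, mirroring the two-phase updates
# in the proof of Theorem 1, so the objective sequence is non-increasing
# and bounded below by 0.

def objective(x, y):
    return (x - y) ** 2 + x ** 2 + y ** 2

def update_x(y):
    # argmin_x f(x, y): df/dx = 2(x - y) + 2x = 0  =>  x = y / 2
    return y / 2.0

def update_y(x):
    # argmin_y f(x, y): by symmetry, y = x / 2
    return x / 2.0

def alternate(x0, y0, iters=20):
    x, y = x0, y0
    values = [objective(x, y)]
    for _ in range(iters):
        x = update_x(y)   # phase 1: fix y, minimize over x
        y = update_y(x)   # phase 2: fix x, minimize over y
        values.append(objective(x, y))
    return values

vals = alternate(3.0, -1.0)
assert all(a >= b for a, b in zip(vals, vals[1:]))  # monotonically non-increasing
assert all(v >= 0 for v in vals)                    # bounded below by 0
```

Since the sequence of objective values is non-increasing and bounded below, it converges, which is exactly the conclusion the theorem draws for \(G_1\) and \(G_2\).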
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zheng, X., Zhang, L., Yan, L. et al. Semi-supervised regression with label-guided adaptive graph optimization. Appl Intell 54, 10671–10694 (2024). https://doi.org/10.1007/s10489-024-05766-7