Abstract
Active learning iteratively constructs a refined training set so that an effective classifier can be trained with as few labeled instances as possible; in domains where labeling is expensive, it plays an important and irreplaceable role. The main challenge of active learning is to correctly identify critical samples. One mainstream approach mines the latent structure of the data by clustering and then identifies key instances. However, existing methods all adopt deterministic selection strategies: the number of key samples depends only on the number of samples to be classified, and the internal structure of the clusters themselves is ignored. Our analysis and experiments show that such deterministic selection wastes labels severely, a problem that urgently needs to be solved in active learning. To this end, we propose an adaptive active learning algorithm based on density clustering (AAKC). First, we introduce k-nearest-neighbor information to redefine the local density of an instance, so that the new density clearly expresses the local structure around each sample. Second, we develop an adaptive key-instance selection strategy based on this k-nearest-neighbor density, which adaptively selects the necessary number of query instances according to the structural information of the clusters to be classified, avoiding label waste. Comparative experiments against other algorithms show that our method achieves better classification accuracy with fewer labels and exhibits excellent stability.
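The kNN-based local density mentioned in the abstract can be sketched as follows. The exponential-kernel formula rho_i = Σ_j exp(-d_ij²) over the k nearest neighbors of sample i is a common kNN variant of the density-peaks local density from the literature the paper builds on; it is used here as an illustrative assumption, not the paper's exact definition.

```python
import numpy as np

def knn_local_density(X, k=5):
    """Local density of each row of X from its k nearest neighbors.

    Uses rho_i = sum over the k nearest neighbors j of exp(-d_ij^2),
    a common kNN refinement of the density-peaks density (illustrative
    assumption; the paper's exact formula may differ).
    """
    # Pairwise Euclidean distance matrix via broadcasting
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    n = X.shape[0]
    rho = np.empty(n)
    for i in range(n):
        # Sort row i's distances, skip index 0 (the point itself),
        # and keep the k smallest remaining distances
        nn = np.sort(d[i])[1:k + 1]
        rho[i] = np.exp(-(nn ** 2)).sum()
    return rho
```

Under this definition, a point inside a tight cluster gets a much higher density than an isolated outlier, which is the structural signal the adaptive selection strategy exploits.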
Acknowledgements
This work was supported in part by the Natural Science Foundation of China under Grant 61972001, in part by the General Project of the Anhui Natural Science Foundation under Grants 1908085MF188 and 2108085MF212, and in part by the Key Projects of the Natural Science Foundation of Anhui Province Colleges and Universities under Grant KJ2020A0041.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ji, X., Ye, W., Li, X. et al. Adaptive active learning through k-nearest neighbor optimized local density clustering. Appl Intell 53, 14892–14902 (2023). https://doi.org/10.1007/s10489-022-04169-w