Abstract
This study explores Data Reduction Techniques (DRTs) in the realm of lazy classification algorithms like k-NN, focusing on Prototype Selection (PS) and Prototype Generation (PG) methods. The research provides an in-depth examination of these methodologies, categorizing DRTs into two primary categories: PS and PG, and further dividing them into three sub-categories: condensation methods, edition methods, and hybrid methods. An experimental study compares a total of 20 new and state-of-the-art DRTs across 20 datasets. The objective is to draw performance conclusions within both the primary and sub-categories, offering valuable insights into how these techniques enhance the effectiveness and robustness of the k-NN classifier. The paper provides a comprehensive overview of DRTs, clarifying their strategies and relative performances.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Alcalá-Fdez, J., et al.: KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput. 13(3), 307–318 (2009). https://doi.org/10.1007/S00500-008-0323-Y
Angiulli, F.: Fast nearest neighbor condensation for large data sets classification. IEEE Trans. Knowl. Data Eng. 19(11), 1450–1464 (2007). https://doi.org/10.1109/TKDE.2007.190645
Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967). https://doi.org/10.1109/TIT.1967.1053964
Decaestecker, C.: Finding prototypes for nearest neighbour classification by means of gradient descent and deterministic annealing. Pattern Recognit. 30(2), 281–288 (1997). https://doi.org/10.1016/S0031-3203(96)00072-6
Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Fernández, F., Isasi, P.: Evolutionary design of nearest prototype classifiers. J. Heuristics 10(4), 431–454 (2004). https://doi.org/10.1023/B:HEUR.0000034715.70386.5B
García, S., Cano, J.R., Herrera, F.: A memetic algorithm for evolutionary prototype selection: a scaling up approach. Pattern Recognit. 41(8), 2693–2709 (2008). https://doi.org/10.1016/J.PATCOG.2008.02.006
García, S., Derrac, J., Cano, J.R., Herrera, F.: Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 417–435 (2012). https://doi.org/10.1109/TPAMI.2011.142
Gates, G.W.: The reduced nearest neighbor rule (corresp.). IEEE Trans. Inf. Theory 18(3), 431–433 (1972). https://doi.org/10.1109/TIT.1972.1054809
Giorginis, T., Ougiaroglou, S., Evangelidis, G., Dervos, D.A.: Fast data reduction by space partitioning via convex hull and MBR computation. Pattern Recognit. 126, 108553 (2022). https://doi.org/10.1016/J.PATCOG.2022.108553
Hart, P.E.: The condensed nearest neighbor rule (corresp.). IEEE Trans. Inf. Theory 14(3), 515–516 (1968). https://doi.org/10.1109/TIT.1968.1054155
Kasemtaweechok, C., Suwannik, W.: Prototype selection for k-nearest neighbors classification using geometric median. In: Proceedings of the Fifth International Conference on Network, Communication and Computing, ICNCC 2016, Kyoto, Japan, 17–21 December 2016, pp. 140–144. ACM (2016). https://doi.org/10.1145/3033288.3033301
Kim, S., Oommen, B.J.: A brief taxonomy and ranking of creative prototype reduction schemes. Pattern Anal. Appl. 6(3), 232–244 (2003). https://doi.org/10.1007/S10044-003-0191-0
Li, J., Dai, C.: Fast prototype selection algorithm based on adjacent neighbourhood and boundary approximation. Sci. Rep. 12(1), 20108 (2022). https://doi.org/10.1038/s41598-022-23036-9
Mukahar, N., Rosdi, B.A.: Performance comparison of prototype selection based on edition search for nearest neighbor classification. In: Zamli, K.Z., Mezhuyev, V., Benedicenti, L. (eds.) Proceedings of the 7th International Conference on Software and Computer Applications, ICSCA 2018, Kuantan, Malaysia, 08–10 February 2018, pp. 143–146. ACM (2018). https://doi.org/10.1145/3185089.3185145
Nanni, L., Lumini, A.: Particle swarm optimization for prototype reduction. Neurocomputing 72(4–6), 1092–1097 (2009). https://doi.org/10.1016/J.NEUCOM.2008.03.008
Olvera-López, J.A., Carrasco-Ochoa, J.A., Trinidad, J.F.M.: A new fast prototype selection method based on clustering. Pattern Anal. Appl. 13(2), 131–141 (2010). https://doi.org/10.1007/S10044-008-0142-X
Ougiaroglou, S., Evangelidis, G.: Efficient editing and data abstraction by finding homogeneous clusters. Ann. Math. Artif. Intell. 76(3–4), 327–349 (2016). https://doi.org/10.1007/S10472-015-9472-8
Ougiaroglou, S., Evangelidis, G.: RHC: a non-parametric cluster-based data reduction for efficient k-NN classification. Pattern Anal. Appl. 19(1), 93–109 (2016). https://doi.org/10.1007/S10044-014-0393-7
Rosero-Montalvo, P., et al.: Prototype reduction algorithms comparison in nearest neighbor classification for sensor data: Empirical study. In: 2017 IEEE Second Ecuador Technical Chapters Meeting (ETCM), pp. 1–5 (2017). https://doi.org/10.1109/ETCM.2017.8247530
Sánchez, J.S.: High training set size reduction by space partitioning and prototype abstraction. Pattern Recognit. 37(7), 1561–1564 (2004). https://doi.org/10.1016/J.PATCOG.2003.12.012
Sánchez, J.S., Pla, F., Ferri, F.J.: Prototype selection for the nearest neighbour rule through proximity graphs. Pattern Recognit. Lett. 18(6), 507–513 (1997). https://doi.org/10.1016/S0167-8655(97)00035-4
Skalak, D.B.: Prototype and feature selection by sampling and random mutation hill climbing algorithms. In: Cohen, W.W., Hirsh, H. (eds.) Machine Learning, Proceedings of the Eleventh International Conference, Rutgers University, New Brunswick, NJ, USA, 10–13 July 1994, pp. 293–301. Morgan Kaufmann (1994). https://doi.org/10.1016/B978-1-55860-335-6.50043-X
Tomek, I.: An experiment with the edited nearest-neighbor rule. IEEE Trans. Syst. Man Cybern. SMC-6(6), 448–452 (1976). https://doi.org/10.1109/TSMC.1976.4309523
Triguero, I., Derrac, J., García, S., Herrera, F.: A taxonomy and experimental study on prototype generation for nearest neighbor classification. IEEE Trans. Syst. Man Cybern. Part C 42(1), 86–100 (2012). https://doi.org/10.1109/TSMCC.2010.2103939
Valero-Mas, J.J., Calvo-Zaragoza, J., Rico-Juan, J.R., Iñesta, J.M.: A study of prototype selection algorithms for nearest neighbour in class-imbalanced problems. In: Alexandre, L.A., Salvador Sánchez, J., Rodrigues, J.M.F. (eds.) IbPRIA 2017. LNCS, vol. 10255, pp. 335–343. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58838-4_37
Verbiest, N., Cornelis, C., Herrera, F.: FRPS: a fuzzy rough prototype selection method. Pattern Recognit. 46(10), 2770–2782 (2013). https://doi.org/10.1016/J.PATCOG.2013.03.004
Wilson, D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. SMC-2(3), 408–421 (1972). https://doi.org/10.1109/TSMC.1972.4309137
Acknowledgements
This paper is a result of research conducted within the “MSc in Artificial Intelligence and Data Analytics” of the Department of Applied Informatics of the University of Macedonia. The publication/presentation of the paper is funded by the University of Macedonia Research Committee.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 IFIP International Federation for Information Processing
About this paper
Cite this paper
Eleftheriadis, S., Evangelidis, G., Ougiaroglou, S. (2024). An Empirical Analysis of Data Reduction Techniques for k-NN Classification. In: Maglogiannis, I., Iliadis, L., Macintyre, J., Avlonitis, M., Papaleonidas, A. (eds) Artificial Intelligence Applications and Innovations. AIAI 2024. IFIP Advances in Information and Communication Technology, vol 714. Springer, Cham. https://doi.org/10.1007/978-3-031-63223-5_7
Download citation
DOI: https://doi.org/10.1007/978-3-031-63223-5_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-63222-8
Online ISBN: 978-3-031-63223-5
eBook Packages: Computer ScienceComputer Science (R0)