Prototype Selection for k-Nearest Neighbors Classification Using Geometric Median

Published: 17 December 2016

Abstract

The k-Nearest Neighbors (kNN) classifier is a well-known classifier used extensively in data mining research. It suffers from several drawbacks, such as high storage requirements, high computational cost, and high sensitivity to noise. Prototype selection is a promising remedy, since it reduces the number of stored data instances. This study proposes the Geometric Median Prototype Selection (GMPS) algorithm, a new and efficient prototype selection method based on the geometric median (GM). A set of GMs is selected as the relevant prototypes of the dataset, and these prototypes form the training set for a kNN classifier, which is then evaluated on a separate test set. Performance is measured in terms of accuracy, Cohen's kappa, and processing time, and compared with seven state-of-the-art methods on nine standard datasets. The results show that GMPS achieves better accuracy and kappa than all of the considered prototype selection (PS) methods, while running at least 3.5 times faster than the other PS methods and 5.5 times faster than the 1NN baseline model. However, the proposed classifiers trail the baseline classifier by about 2 percentage points in accuracy and 0.05 in Cohen's kappa.
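
The paper's exact GMPS procedure is not reproduced on this page, so the following is a minimal, hypothetical Python sketch of the general idea described in the abstract: compute geometric medians (here via Weiszfeld's algorithm) and use them as the prototype training set for a plain kNN classifier. All function names, and the simplifying choice of a single median per class, are illustrative assumptions rather than the authors' algorithm.

    import numpy as np

    def geometric_median(X, max_iter=100, tol=1e-6):
        """Weiszfeld's algorithm: the point minimizing the sum of
        Euclidean distances to the rows of X."""
        y = X.mean(axis=0)                      # start from the centroid
        for _ in range(max_iter):
            d = np.linalg.norm(X - y, axis=1)
            d = np.where(d < tol, tol, d)       # avoid division by zero
            w = 1.0 / d
            y_new = (w[:, None] * X).sum(axis=0) / w.sum()
            if np.linalg.norm(y_new - y) < tol:
                break
            y = y_new
        return y

    def select_prototypes(X, labels):
        """One geometric-median prototype per class (a simplification;
        GMPS as described selects a set of GMs)."""
        protos, proto_labels = [], []
        for c in np.unique(labels):
            protos.append(geometric_median(X[labels == c]))
            proto_labels.append(c)
        return np.array(protos), np.array(proto_labels)

    def knn_predict(protos, proto_labels, X_test, k=1):
        """Plain k-NN majority vote over the reduced prototype set."""
        preds = []
        for x in X_test:
            d = np.linalg.norm(protos - x, axis=1)
            nearest = proto_labels[np.argsort(d)[:k]]
            vals, counts = np.unique(nearest, return_counts=True)
            preds.append(vals[np.argmax(counts)])
        return np.array(preds)

    # Toy usage on two Gaussian blobs.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    P, Pl = select_prototypes(X, y)
    print(knn_predict(P, Pl, np.array([[0.2, -0.1], [3.8, 4.2]])))

The sketch only shows how geometric-median prototypes slot into kNN; the speed advantage reported in the abstract comes from classifying against the small prototype set instead of the full training data.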


Cited By

  • (2024) An Empirical Analysis of Data Reduction Techniques for k-NN Classification. Artificial Intelligence Applications and Innovations, 83-97. DOI: 10.1007/978-3-031-63223-5_7. Online publication date: 21-Jun-2024.
  • (2022) Quad division prototype selection-based k-nearest neighbor classifier for click fraud detection from highly skewed user click dataset. Engineering Science and Technology, an International Journal, 28, 101011. DOI: 10.1016/j.jestch.2021.05.015. Online publication date: Apr-2022.
  • (2019) Adaptive geometric median prototype selection method for k-nearest neighbors classification. Intelligent Data Analysis, 23(4), 855-876. DOI: 10.3233/IDA-184190. Online publication date: 26-Sep-2019.



Published In

ICNCC '16: Proceedings of the Fifth International Conference on Network, Communication and Computing
December 2016
343 pages
ISBN:9781450347938
DOI:10.1145/3033288
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Data preprocessing
  2. Nearest neighbors classification
  3. Prototype selection

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICNCC '16

