A new fast prototype selection method based on clustering

J. Arturo Olvera-López¹,
J. Ariel Carrasco-Ochoa¹ &
J. Francisco Martínez-Trinidad¹

Abstract

In supervised classification, a training set T is given to a classifier for classifying new prototypes. In practice, not all information in T is useful for classifiers; therefore, it is convenient to discard irrelevant prototypes from T. This process is known as prototype selection, an important task for classifiers because it can reduce the time required for classification or training. In this work, we propose a new fast prototype selection method for large datasets, based on clustering, which selects border prototypes and some interior prototypes. We report experimental results showing the performance of our method and comparing its accuracy and runtimes against other prototype selection methods.
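The abstract does not spell out the selection rule, so the following is only a minimal sketch of how a clustering-based selector of this kind might work, not the authors' algorithm. It assumes k-means as the clustering step, keeps the prototype closest to the center of each single-class cluster (an interior prototype), and, in each mixed-class cluster, keeps every minority-class prototype together with its nearest majority-class neighbor (border prototypes). The function names `dist`, `kmeans`, and `select_prototypes`, the seeding of centers with the first k points, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter


def dist(p, q):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; seeding with the first k points is an assumption."""
    centers = [tuple(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its closest center.
        assign = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign, centers


def select_prototypes(points, labels, k):
    """Return the sorted indices of the prototypes kept from the training set."""
    assign, centers = kmeans(points, k)
    kept = set()
    for j in range(k):
        idx = [i for i, a in enumerate(assign) if a == j]
        if not idx:
            continue
        counts = Counter(labels[i] for i in idx)
        if len(counts) == 1:
            # Single-class cluster: keep one interior prototype, nearest the center.
            kept.add(min(idx, key=lambda i: dist(points[i], centers[j])))
            continue
        # Mixed cluster: it straddles a class boundary, so keep border prototypes.
        majority = counts.most_common(1)[0][0]
        major = [i for i in idx if labels[i] == majority]
        for i in idx:
            if labels[i] != majority:
                kept.add(i)  # minority-class prototype near the boundary
                kept.add(min(major, key=lambda m: dist(points[m], points[i])))
    return sorted(kept)


# Toy data (hypothetical): a pure "a" cluster and a mostly "b" cluster with one "a".
points = [(0, 0), (10, 0), (2, 0), (1, 1), (0, 2), (2, 2), (10, 1), (11, 0), (9, 0)]
labels = ["a", "b", "a", "a", "a", "a", "b", "b", "a"]
print(select_prototypes(points, labels, k=2))  # [1, 3, 8]
```

On this toy set the sketch keeps three of nine prototypes: index 3 as the interior representative of the pure cluster, and indices 8 and 1 as the opposing border pair in the mixed cluster. The number of clusters `k` is a free parameter here; how it should be chosen is not stated in the abstract.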

Notes

These runtimes were obtained on a 2.4 GHz Intel Celeron CPU with 512 MB of RAM.
For SVM, we used the software from [28]; for C4.5 and Naive Bayes, we used WEKA [29]; LWLR and k-NN were implemented in MATLAB [30].

References

1. Kuncheva LI, Bezdek JC (1998) Nearest prototype classification, clustering, genetic algorithms, or random search? IEEE Trans Syst Man Cybern C 28(1):160–164
2. Bezdek JC, Kuncheva LI (2001) Nearest prototype classifier designs: an experimental study. Int J Intell Syst 16(12):1445–1473
3. Wilson DR, Martínez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38:257–286
4. Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Disc 6(2):153–172
5. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27
6. Atkeson CG, Moore AW, Schaal S (1997) Locally weighted learning. Artif Intell Rev 11(1–5):11–73
7. Vapnik V (1995) The nature of statistical learning theory. Springer, New York
8. Vapnik VN (1998) Statistical learning theory. Wiley, New York
9. Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge
10. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo
11. Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley, New York
12. Hart PE (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14:515–516
13. Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2:408–421
14. Chidananda GK, Krishna G (1979) The condensed nearest neighbor rule using the concept of mutual nearest neighborhood. IEEE Trans Inf Theory 25:488–490
15. Chien-Hsing C, Bo-Han K, Fu C (2006) The generalized condensed nearest neighbor rule as a data reduction method. In: Proceedings of the 18th international conference on pattern recognition. IEEE Computer Society, Hong-Kong, pp 556–559
16. Tomek I (1976) An experiment with the edited nearest-neighbor rule. IEEE Trans Syst Man Cybern 6(6):448–452
17. Devijver PA, Kittler J (1980) On the edited nearest neighbor rule. In: Proceedings of the fifth international conference on pattern recognition, Los Alamitos, CA, pp 72–80
18. Liu H, Motoda H (2002) On issues of instance selection. Data Min Knowl Disc 6:115–130
19. Spillmann B, Neuhaus M, Bunke H, Pękalska E, Duin RPW (2006) Transforming strings to vector spaces using prototype selection. In: Yeung D-Y et al (eds) SSPR & SPR 2006, Lecture Notes in Computer Science, vol 4109, Hong-Kong, pp 287–296
20. Lumini A, Nanni L (2006) A clustering method for automatic biometric template selection. Pattern Recogn 39:495–497
21. Veenman CJ, Reinders MJT (2005) The nearest sub-class classifier: a compromise between the nearest mean and nearest neighbor classifier. IEEE Trans Pattern Anal Mach Intell 27(9):1417–1429
22. Veenman CJ, Reinders MJT, Backer E (2002) A maximum variance clustering algorithm. IEEE Trans Pattern Anal Mach Intell 24(9):1273–1280
23. Mollineda RA, Ferri FJ, Vidal E (2002) An efficient prototype merging strategy for the condensed 1-NN rule through class-conditional hierarchical clustering. Pattern Recogn 35:2771–2782
24. Raicharoen T, Lursinsap C (2005) A divide-and-conquer approach to the pairwise opposite class-nearest neighbor (POC-NN) algorithm. Pattern Recognit Lett 26(10):1554–1567
25. Karaçali B, Krim H (2002) Fast minimization of structural risk by nearest neighbor rule. IEEE Trans Neural Netw 14:127–137
26. Asuncion A, Newman DJ (2007) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/~mlearn/MLRepository.html
27. Dietterich TG (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895–1924
28. Vojtech F, Václav H (2004) Statistical pattern recognition toolbox for Matlab. Research report, Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University
29. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
30. The MathWorks Inc. (1994–2008) Natick. http://www.mathworks.com

Author information

Authors and Affiliations

Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro No. 1, Sta. María Tonantzintla, CP 72000, Puebla, Mexico
J. Arturo Olvera-López, J. Ariel Carrasco-Ochoa & J. Francisco Martínez-Trinidad

Corresponding author

Correspondence to J. Arturo Olvera-López.

About this article

Cite this article

Olvera-López, J.A., Carrasco-Ochoa, J.A. & Martínez-Trinidad, J.F. A new fast prototype selection method based on clustering. Pattern Anal Applic 13, 131–141 (2010). https://doi.org/10.1007/s10044-008-0142-x

Received: 15 February 2008
Accepted: 11 September 2008
Published: 13 January 2009
Issue Date: May 2010
DOI: https://doi.org/10.1007/s10044-008-0142-x
