[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Three new instance selection methods based on local sets

Published: 01 April 2015 Publication History

Abstract

The local set is the largest hypersphere centered on an instance such that it does not contain instances from any other class. Due to its geometrical nature, this structure can be very helpful for distance-based classification, such as classification based on the nearest neighbor rule. This paper is focused on instance selection for nearest neighbor classification which, in short, aims to reduce the number of instances in the training set without affecting the classification accuracy. Three instance selection methods based on local sets, which follow different and complementary strategies, are proposed. In an experimental study involving 26 known databases, they are compared with 11 of the most successful state-of-the-art methods in standard and noisy environments. To evaluate their performances, two complementary approaches are applied, the Pareto dominance relation and the Technique for Order Preference by Similarity to Ideal Solution. The results achieved by the proposals reveal that they are among the most effective methods in this field. HighlightsWe propose three selection strategies with different accuracy-reduction tradeoff.We assess them on 26 known databases with more than 1000 instances each one.The results are compared with those of 11 successful state-of-the-art methods.According to different criteria, the new methods are always among the top performers.

References

[1]
T. Cover, P. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theory, 13 (1967) 21-27.
[2]
H. Yan, X. Yuan, S. Yan, J. Yang, Correntropy based feature selection using binary projection, Pattern Recognit., 44 (2011) 2834-2842.
[3]
X. Sun, Y. Liu, J. Li, J. Zhu, H. Chen, X. Liu, Feature evaluation and selection with cooperative game theory, Pattern Recognit., 45 (2012) 2992-3002.
[4]
J.A. Olvera-López, J.A. Carrasco-Ochoa, J.F. Martínez-Trinidad, J. Kittler, A review of instance selection methods, Artif. Intell. Rev., 34 (2010) 133-143.
[5]
S. García, J. Derrac, J.R. Cano, F. Herrera, Prototype selection for nearest neighbor classification, IEEE Trans. Pattern Anal. Mach. Intell., 34 (2012) 417-435.
[6]
I. Triguero, J. Derrac, S. García, F. Herrera, A taxonomy and experimental study on prototype generation for nearest neighbor classification, IEEE Trans. Syst., Man, Cybern., Part C, 42 (2012) 86-100.
[7]
J. Derrac, S. García, F. Herrera, Ifs-coco, Pattern Recognit., 43 (2010) 2082-2105.
[8]
N. García-Pedrajas, A. de Haro-García, J. P¿erez-Rodríguez, A scalable approach to simultaneous evolutionary instance and feature selection, Inf. Sci., 228 (2013) 150-174.
[9]
H. Brighton, C. Mellish, Advances in instance selection for instance-based learning algorithms, Data Min. Knowl. Discov., 6 (2002) 153-172.
[10]
Y. Caises, A. González, E. Leyva, R. Pérez, Combining instance selection methods based on data characterization, Inf. Sci., 181 (2011) 4780-4798.
[11]
E. Leyva, A. González, R. Pérez, Knowledge-based instance selection, Knowl.-Based Syst., 47 (2013) 65-76.
[12]
E. Leyva, Y. Caises, A. González, R. Pérez, On the use of meta-learning for instance selection, Inf. Sci., 266 (2014) 16-30.
[13]
M. Antonelli, P. Ducange, F. Marcelloni, Genetic training instance selection in multiobjective evolutionary fuzzy systems, IEEE Trans. Fuzzy Syst., 20 (2012) 276-290.
[14]
J. Sun, G. Hong, Y. Wong, M. Rahman, Z. Wang, Effective training data selection in tool condition monitoring system, Int. J. Mach. Tools Manuf., 46 (2006) 218-224.
[15]
S. García, J. Cano, E. Bernardó-Mansilla, F. Herrera, Diagnose effective evolutionary prototype selection using an overlapping measure, Int. J. Pattern Recognit. Artif. Intell., 23 (2009) 1527-1548.
[16]
J.R. Cano, F. Herrera, M. Lozano, Evolutionary stratified training set selection for extracting classification rules with trade off precision-interpretability, Data Knowl. Eng., 60 (2007) 90-108.
[17]
Y. Ou, G. Chen, Y. Oyang, Expediting model selection for support vector machines based on an advanced data reduction algorithm, in: Q. Yang, G. Webb (Eds.), PRICAI 2006: Trends in Artificial Intelligence, Lecture Notes in Computer Science, vol. 4099, Springer, Berlin, Heidelberg, 2006, pp. 1017-1021.
[18]
J.R. Cano, F. Herrera, M. Lozano, Using evolutionary algorithms as instance selection for data reduction in kdd, IEEE Trans. Evol. Comput., 7 (2003) 561-575.
[19]
M. Fazzolari, B. Giglio, R. Alcalá, F. Marcelloni, F. Herrera, A study on the application of instance selection techniques in genetic fuzzy rule-based classification systems: accuracy-complexity trade-off, in: Knowledge-Based Systems.
[20]
M. Grochowski, N. Jankowski, Springer, Heidelberg, 2004.
[21]
K. Grudzinski, M. Grochowski, W. Duch, Pruning classification rules with reference vector selection methods, in: Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, vol. 6113, Springer, Berlin, Heidelberg, 2010, pp. 347-354.
[22]
P.E. Hart, The condensed nearest neighbor rule, IEEE Trans. Inf. Theory, 14 (1968) 515-516.
[23]
G.W. Gates, The reduced nearest neighbor rule, IEEE Trans. Inf. Theory, 18 (1972) 431-433.
[24]
G.L. Ritter, H.B. Woodruff, S.R. Lowry, T.L. Isenhour, An algorithm for a selective nearest neighbor decision rule, IEEE Trans. Inf. Theory, 21 (1975) 665-669.
[25]
B.V. Dasarathy, Minimal consistent set (mcs) identification for optimal nearest neighbor decision systems design, IEEE Trans. Syst., Man Cybern., 24 (1994) 511-517.
[26]
F. Angiulli, Fast nearest neighbor condensation for large data sets classification, IEEE Trans. Knowl. Data Eng., 19 (2007) 1450-1464.
[27]
J.A. Olvera-López, J.A. Carrasco-Ochoa, J.F. Martínez-Trinidad, A new fast prototype selection method based on clustering, Pattern Anal. Appl., 13 (2010) 131-141.
[28]
D.L. Wilson, Asymptotic properties of nearest neighbor rules using edited data, IEEE Trans. Syst., Man, Cybern., 2 (1972) 408-421.
[29]
I. Tomek, An experiment with the edited nearest-neighbor rule, IEEE Trans. Syst., Man Cybern., 6 (1976) 448-452.
[30]
D. Guan, W. Yuan, Y.K. Lee, S. Lee, Nearest neighbor editing aided by unlabeled data, Inf. Sci., 179 (2009) 2273-2282.
[31]
D.R. Wilson, T.R. Martinez, Reduction techniques for instance-based learning algorithms, Mach. Learn., 38 (2000) 257-286.
[32]
J. Sánchez, R. Barandela, A. Marqués, R. Alejo, J. Badenas, Analysis of new techniques to obtain quality training sets, Pattern Recognit. Lett., 24 (2003) 1015-1022.
[33]
N. Jankowski, M. Grochowski, Springer, Heidelberg, 2004.
[34]
F. Vázquez, J.S. Sánchez, F. Pla, Springer-Verlag, Berlin, Heidelberg, 2005.
[35]
D.W. Aha, D. Kibler, M.K. Albert, Instance-based learning algorithms, Mach. Learn., 6 (1991) 37-66.
[36]
K. Zhao, S. Zhou, J. Guan, A. Zhou, C-pruner: an improved instance pruning algorithm, in: 2003 International Conference on Machine Learning and Cybernetics, vol. 1, 2003, pp. 94-99.
[37]
E. Marchiori, Class conditional nearest neighbor for large margin instance selection, IEEE Trans. Pattern Anal. Mach. Intell., 32 (2010) 364-370.
[38]
J. Derrac, S. García, F. Herrera, A survey on evolutionary instance selection and generation, Int. J. Appl. Metaheurist. Comput., 1 (2010) 60-92.
[39]
S. Kim, B. Oommen, Enhancing prototype reduction schemes with recursion, IEEE Trans. Syst., Man Cybern.-Part B, 34 (2004) 1384-1397.
[40]
J.R. Cano, F. Herrera, M. Lozano, Stratification for scaling up evolutionary prototype selection, Pattern Recognit. Lett., 26 (2005) 953-963.
[41]
C. García-Osorio, A. Haro-García, N. García-Pedrajas, Democratic instance selection, Artif. Intell., 174 (2010) 410-441.
[42]
E. Marchiori, Hit miss networks with applications to instance selection, J. Mach. Learn. Res., 9 (2008) 997-1017.
[43]
S. Kim, B. Oommen, Enhancing prototype reduction schemes with lvq3-type algorithms, Pattern Recognit., 36 (2003) 1083-1093.
[44]
J. Alcalá-Fdez, A. Fernández, J. Luengo, J. Derrac, S. García, Keel data-mining software tool, Multiple-Valued Log. Soft Comput., 17 (2011) 255-287.
[45]
X.Z. Wang, B. Wu, Y.L. He, X.H. Pei, Nrmcs : noise removing based on the mcs, in: Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, vol. 1, Kunming, China, 2008, pp. 89-93.
[46]
J. Demsar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res., 7 (2006) 1-30.
[47]
C.L. Hwang, K. Yoon, Multiple Attribute Decision Making: Methods and Applications: A State-of-the-Art Survey, Springer-Verlag, Berlin, New York, 1981.
[48]
L. Nanni, A. Lumini, Prototype reduction techniques, Expert Syst. Appl., 38 (2011) 11820-11828.
[49]
N. Verbiest, C. Cornelis, F. Herrera, Frps, Pattern Recognit., 46 (2013) 2770-2782.

Cited By

View all
  • (2024)A selective LVQ algorithm for improving instance reduction techniques and its application for text classificationJournal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology10.3233/JIFS-23529046:5-6(11353-11366)Online publication date: 24-Oct-2024
  • (2024)A Quantum Annealing Instance Selection Approach for Efficient and Effective Transformer Fine-TuningProceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval10.1145/3664190.3672515(205-214)Online publication date: 2-Aug-2024
  • (2024)Trusting My Predictions: On the Value of Instance-Level AnalysisACM Computing Surveys10.1145/361535456:7(1-28)Online publication date: 9-Apr-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Pattern Recognition
Pattern Recognition  Volume 48, Issue 4
April 2015
550 pages

Publisher

Elsevier Science Inc.

United States

Publication History

Published: 01 April 2015

Author Tags

  1. Data reduction
  2. Instance selection
  3. Instance-based learning
  4. Local sets
  5. Prototype-based classifiers

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 21 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A selective LVQ algorithm for improving instance reduction techniques and its application for text classificationJournal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology10.3233/JIFS-23529046:5-6(11353-11366)Online publication date: 24-Oct-2024
  • (2024)A Quantum Annealing Instance Selection Approach for Efficient and Effective Transformer Fine-TuningProceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval10.1145/3664190.3672515(205-214)Online publication date: 2-Aug-2024
  • (2024)Trusting My Predictions: On the Value of Instance-Level AnalysisACM Computing Surveys10.1145/361535456:7(1-28)Online publication date: 9-Apr-2024
  • (2024)A heuristic hybrid instance reduction approach based on adaptive relative distance and k-means clusteringThe Journal of Supercomputing10.1007/s11227-023-05885-x80:9(13096-13123)Online publication date: 1-Jun-2024
  • (2024)Entropy based streaming big-data reduction with adjustable compression ratioMultimedia Tools and Applications10.1007/s11042-023-15897-783:1(2647-2681)Online publication date: 1-Jan-2024
  • (2023)A prototype selection technique based on relative density and density peaks clustering for k nearest neighbor classificationIntelligent Data Analysis10.3233/IDA-22673027:3(675-690)Online publication date: 1-Jan-2023
  • (2023)Investigating the Performance of Data Complexity & Instance Hardness Measures as A Meta-Feature in Overlapping Classes ProblemProceedings of the 2023 7th International Conference on Cloud and Big Data Computing10.1145/3616131.3616132(1-9)Online publication date: 17-Aug-2023
  • (2023)Condensed Nearest Neighbour Rules for Multi-Label DatasetsProceedings of the 27th International Database Engineered Applications Symposium10.1145/3589462.3589492(43-50)Online publication date: 5-May-2023
  • (2023)A Comparative Survey of Instance Selection Methods applied to Non-Neural and Transformer-Based Text ClassificationACM Computing Surveys10.1145/358200055:13s(1-52)Online publication date: 13-Jul-2023
  • (2023)AutoDenoise: Automatic Data Instance Denoising for RecommendationsProceedings of the ACM Web Conference 202310.1145/3543507.3583339(1003-1011)Online publication date: 30-Apr-2023
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media