DOI: 10.1007/978-3-031-30105-6_42
Article

A Fast and Efficient Algorithm for Filtering the Training Dataset

Published: 13 April 2023 Publication History

Abstract

This paper presents a new algorithm that filters inconsistent instances out of a training dataset before the data is used to train machine learning models or neural networks. The algorithm builds on the previous state-of-the-art approach, which relies on the concept of local sets. A modified definition of local sets changes the properties of the algorithm, and locality-sensitive hashing is used to search for nearest neighbors. Together these yield an algorithm that is both efficient, with O(n log n) time complexity, and accurate.
Results on many benchmarks show that the algorithm is as accurate as its predecessor while substantially reducing the time complexity.
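
To make the idea of local sets concrete, the following is a minimal brute-force sketch of the classic local-set filter in the style of Leyva et al. (2015), not the modified definition this paper proposes, which the abstract does not spell out. It assumes Euclidean distance, excludes an instance from its own local set, and uses an exact O(n^2) neighbor search where the paper uses locality-sensitive hashing. The function name `local_set_filter` and the toy data are illustrative only.

```python
from math import dist

def local_set_filter(X, y):
    """Return indices of instances kept by a classic local-set filter.

    Assumed rule (illustrative): keep instance i when its usefulness
    (number of local sets containing it) is at least its harmfulness
    (number of instances whose nearest enemy it is).
    """
    n = len(X)
    usefulness = [0] * n
    harmfulness = [0] * n
    for i in range(n):
        # Enemies are instances carrying a different class label.
        enemies = [j for j in range(n) if y[j] != y[i]]
        if not enemies:
            continue  # single-class data: nothing to filter against
        nearest_enemy = min(enemies, key=lambda j: dist(X[i], X[j]))
        harmfulness[nearest_enemy] += 1
        radius = dist(X[i], X[nearest_enemy])
        # Local set of i: same-class instances strictly closer than the nearest enemy.
        for j in range(n):
            if j != i and y[j] == y[i] and dist(X[i], X[j]) < radius:
                usefulness[j] += 1
    return [i for i in range(n) if usefulness[i] >= harmfulness[i]]

# Toy 1-D data: nine class-0 points, two class-1 points, and one
# class-1 point (index 11) placed inside the class-0 cluster.
X = [(float(v),) for v in range(9)] + [(20.0,), (21.0,), (3.25,)]
y = [0] * 9 + [1, 1, 1]

kept = local_set_filter(X, y)
print(kept)  # index 11, the inconsistent instance, is dropped
```

The double loop is the quadratic cost that an approximate nearest-neighbor index is meant to avoid; replacing the exact search with such an index changes the running time but not the keep-or-drop rule sketched here.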

Published In

Neural Information Processing: 29th International Conference, ICONIP 2022, Virtual Event, November 22–26, 2022, Proceedings, Part I
Nov 2022
659 pages
ISBN:978-3-031-30104-9
DOI:10.1007/978-3-031-30105-6
Editors: Mohammad Tanveer, Sonali Agarwal, Seiichi Ozawa, Asif Ekbal, Adam Jatowt

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. instance selection
  2. prototype selection
  3. classification
  4. machine learning

Qualifiers

  • Article
