DOI: 10.1145/3589462.3589492
research-article

Condensed Nearest Neighbour Rules for Multi-Label Datasets

Published: 26 May 2023

Abstract

Reducing the size of the training set by replacing it with a condensing set, while preserving classification accuracy as far as possible, is a common way to speed up instance-based classifiers. Data reduction techniques, also known as prototype selection or prototype generation algorithms, serve this purpose. The literature offers many such algorithms that are effective for single-label classification problems, but most of them cannot be applied to multi-label data, where an instance may belong to several classes at once. The well-known Binary Relevance transformation method cannot be combined with a data reduction algorithm either, because it creates numerous binary condensing sets (one per label). Condensed Nearest Neighbor (CNN) is a well-known, parameter-free prototype selection algorithm for single-label data. This study proposes three variations of that algorithm for training datasets with multiple labels. An experimental study over nine distinct datasets shows that the three proposed approaches achieve good reduction rates without harming classification performance.
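For readers unfamiliar with the base algorithm, the following is a minimal sketch of the classic single-label Condensed Nearest Neighbor rule as it is commonly described, plus a naive per-label application in the spirit of Binary Relevance. It is not the three variations proposed in the paper, which the abstract does not detail. The function names `cnn_condense` and `binary_relevance_condense`, the Euclidean distance, the choice of the first instance as seed, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def cnn_condense(X, y):
    """Single-label CNN sketch using a 1-NN rule and Euclidean distance.

    Returns the sorted indices of the instances kept in the condensing set.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = [0]  # assumption: seed the condensing set with the first instance
    changed = True
    while changed:  # repeat passes until a full pass adds nothing
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            # 1-NN prediction using only the current condensing set
            dists = np.linalg.norm(X[keep] - X[i], axis=1)
            nearest = keep[int(np.argmin(dists))]
            if y[nearest] != y[i]:
                keep.append(i)  # misclassified, so it is stored
                changed = True
    return sorted(keep)


def binary_relevance_condense(X, Y):
    """Run CNN once per label column, giving one condensing set per label."""
    Y = np.asarray(Y)
    return [cnn_condense(X, Y[:, j]) for j in range(Y.shape[1])]


# Toy data (made up for illustration): two well-separated clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])
condensing_set = cnn_condense(X, y)

# Two labels per instance: label 0 follows the clusters, label 1 does not.
Y = np.array([[0, 1], [0, 0], [0, 1], [1, 1], [1, 0], [1, 1]])
per_label_sets = binary_relevance_condense(X, Y)
```

On this toy data the per-label runs keep different instances for each label, which illustrates why a Binary Relevance decomposition yields several separate binary condensing sets rather than one reduced multi-label training set.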

Published In

IDEAS '23: Proceedings of the 27th International Database Engineered Applications Symposium
May 2023
222 pages
ISBN:9798400707445
DOI:10.1145/3589462

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. BRkNN
  2. CNN
  3. binary relevance
  4. data reduction techniques
  5. instance reduction
  6. instance-based classification
  7. multi-label classification
  8. prototype selection

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IDEAS '23

Acceptance Rates

Overall Acceptance Rate 74 of 210 submissions, 35%
