
research-article

Condensed Nearest Neighbour Rules for Multi-Label Datasets

Authors:

Panagiotis Filippakis,

Stefanos Ougiaroglou,

Georgios Evangelidis

IDEAS '23: Proceedings of the 27th International Database Engineered Applications Symposium

Pages 43 - 50

https://doi.org/10.1145/3589462.3589492

Published: 26 May 2023

Abstract

A very common way to speed up instance-based classifiers is to reduce the size of the training set, that is, to replace it with a condensing set, while preserving classification accuracy as far as possible. Data reduction techniques, also known as prototype selection or prototype generation algorithms, serve this purpose. The literature offers many such algorithms that are effective for single-label classification problems, but most of them cannot be used for multi-label data, where an instance may belong to multiple classes. The well-known Binary Relevance transformation method cannot be combined with a data reduction algorithm, because it would produce numerous binary condensing sets. Condensed Nearest Neighbor is a well-known parameter-free prototype selection algorithm for single-label data. This study proposes three variations of that algorithm for multi-label training datasets. An experimental study over nine distinct datasets shows that the three proposed approaches achieve good reduction rates without compromising classification performance.
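To make the abstract's starting point concrete, the following is a minimal Python sketch of the classic single-label Condensed Nearest Neighbor rule, together with a naive per-label application in the style of Binary Relevance. It is an illustration only and does not reproduce the three multi-label variations proposed in the paper. The function names, the use of Euclidean distance, and the choice of the first instance as the seed are assumptions made for this example.

```python
import numpy as np


def condensed_nearest_neighbor(X, y):
    """Single-label CNN sketch: return sorted indices of a condensing set.

    Assumptions (not from the paper): Euclidean distance, and the first
    instance seeds the condensing set.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = [0]  # seed the condensing set with the first instance
    changed = True
    while changed:  # repeat full passes until no instance is added
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            # Classify instance i with 1-NN over the current condensing set.
            dists = np.linalg.norm(X[keep] - X[i], axis=1)
            nearest = keep[int(np.argmin(dists))]
            if y[nearest] != y[i]:
                keep.append(i)  # misclassified, so it joins the condensing set
                changed = True
    return sorted(keep)


def binary_relevance_condensing(X, Y):
    """Hypothetical helper: run CNN once per label column of the 0/1 matrix Y.

    Each label yields its own binary condensing set, which illustrates why
    pairing Binary Relevance with data reduction is awkward.
    """
    Y = np.asarray(Y)
    return [condensed_nearest_neighbor(X, Y[:, j]) for j in range(Y.shape[1])]


if __name__ == "__main__":
    # Toy data: two well-separated groups of points, two labels.
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    Y = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]]
    print(binary_relevance_condensing(X, Y))
```

On a multi-label dataset with many labels, this per-label approach produces one condensing set per label, and the sets generally differ from each other, so there is no single reduced training set to hand to an instance-based classifier.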



Index Terms

Condensed Nearest Neighbour Rules for Multi-Label Datasets
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Instance-based learning
2. Information systems
  1. Information systems applications
    1. Data mining
      1. Nearest-neighbor search



Information & Contributors

Information

Published In


IDEAS '23: Proceedings of the 27th International Database Engineered Applications Symposium

May 2023

222 pages

ISBN: 9798400707445

DOI: 10.1145/3589462

Editors:
Richard Chbeir (University Pau & Pays Adour, France),
Mirjana Ivanovic (University of Novi Sad, Serbia),
Yannis Manolopoulos (Open University of Cyprus, Cyprus),
Peter Z. Revesz (University of Nebraska Lincoln, USA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article
Research
Refereed limited

Conference

IDEAS '23

IDEAS '23: International Database Engineered Applications Symposium Conference

May 5 - 7, 2023

Heraklion, Crete, Greece

Acceptance Rates

Overall Acceptance Rate 74 of 210 submissions, 35%

Bibliometrics

Total Citations: 4
Total Downloads: 81
Downloads (Last 12 months): 23
Downloads (Last 6 weeks): 0

Reflects downloads up to 08 Mar 2025

Cited By

Seixas F, Seixas E, Freitas A. (2025). Enhancing dementia prediction models: Leveraging temporal patterns and class-balancing methods. Applied Soft Computing 171 (112754). Online publication date: Mar-2025. https://doi.org/10.1016/j.asoc.2025.112754

Yang Y, Lyu Y, Li Y, Fang L, Luo Y, Liu W. (2024). A Novel Methodology to Warn Pre-icing Events for Wind Turbines. 2024 IEEE 2nd International Conference on Power Science and Technology (ICPST) (77-82). Online publication date: 9-May-2024. https://doi.org/10.1109/ICPST61417.2024.10601749

Filippakis P, Ougiaroglou S, Evangelidis G. (2023). Prototype Selection for Multilabel Instance-Based Learning. Information 14:10 (572). Online publication date: 19-Oct-2023. https://doi.org/10.3390/info14100572

Bader-El-Den M, Perry T. (2023). Self-optimised cost-sensitive classifiers for early field failure prediction in storage systems. Swarm and Evolutionary Computation 83 (101388). Online publication date: Dec-2023. https://doi.org/10.1016/j.swevo.2023.101388
