research-article

Multi-label classification via closed frequent labelsets and label taxonomies

Published: 14 April 2023

Abstract

Multi-label classification (MLC) has been widely explored in recent years. The most common approaches to MLC problems fall into two groups: (i) problem transformation, which adapts the multi-label data so that traditional binary or multiclass classification algorithms can be applied, and (ii) algorithm adaptation, which modifies algorithms designed for binary or multiclass classification so that they can make multi-label predictions. Several approaches have been proposed to exploit the relationships among labels. Some of them transform a flat multi-label label space into a hierarchical one, creating a tree-structured label taxonomy and inducing a hierarchical multi-label classifier to solve the classification problem. This paper presents a novel method in which a label hierarchy structured as a directed acyclic graph (DAG) is created from the multi-label label space, taking label co-occurrences into account through the notion of closed frequent labelsets. This makes it possible to solve an MLC task as if it were a hierarchical multi-label classification (HMC) task. Global and local HMC approaches were tested with the resulting label hierarchies and compared with approaches that use tree-structured label hierarchies, showing very competitive results. The main advantage of the proposed approach is that DAG-structured taxonomies explore and represent the relationships between labels better, which improves the results. Experimental results over 32 multi-label datasets from different domains showed that the proposed approach is better than related approaches on most multi-label evaluation measures and very competitive with state-of-the-art approaches. Moreover, we found that both tree-structured and, especially, DAG-structured label hierarchies combined with a local hierarchical classifier are more suitable for imbalanced multi-label datasets.
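
To make the notion of a closed frequent labelset concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a labelset is frequent when at least `min_support` instances contain it, and closed when it equals the set of labels shared by all those instances. It also assumes that hierarchy edges run from each closed labelset to its smallest closed supersets. The function names, the brute-force enumeration, and the toy data are all illustrative assumptions.

```python
from itertools import combinations


def closed_frequent_labelsets(rows, min_support):
    """Map each closed frequent labelset to its support.

    rows: list of sets, the labels of each instance.
    min_support: minimum number of instances (assumed >= 1).
    """
    labels = sorted(set().union(*rows))
    closed = {}
    # Brute-force enumeration of candidates; only suitable for toy data.
    for size in range(1, len(labels) + 1):
        for cand in combinations(labels, size):
            cand = frozenset(cand)
            covering = [r for r in rows if cand <= r]
            if len(covering) < min_support:
                continue
            # Closure: the labels shared by every instance containing cand.
            closure = frozenset(set.intersection(*map(set, covering)))
            closed[closure] = len(covering)
    return closed


def hasse_edges(closed):
    """Edges (parent, child) between closed labelsets, where parent is a
    proper subset of child and no other closed labelset lies between them."""
    sets = list(closed)
    return [
        (a, b)
        for a in sets
        for b in sets
        if a < b and not any(a < c < b for c in sets)
    ]


# Hypothetical toy dataset: five instances over the labels a, b, c.
rows = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a"}, {"b"}]
closed = closed_frequent_labelsets(rows, min_support=2)
edges = hasse_edges(closed)

for labelset, support in closed.items():
    print(sorted(labelset), support)
for parent, child in edges:
    print(sorted(parent), "->", sorted(child))
```

In this toy data, `{c}` is frequent but not closed, because every instance that contains `c` also contains `a`, so its closure is `{a, c}`. The labelset `{a, b}` ends up with two parents, `{a}` and `{b}`, which a tree-structured taxonomy cannot express but a DAG can.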


Published In

Soft Computing - A Fusion of Foundations, Methodologies and Applications, Volume 27, Issue 13
Jul 2023
695 pages
ISSN:1432-7643
EISSN:1433-7479

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 14 April 2023
Accepted: 14 March 2023

Author Tags

  1. Multi-label learning
  2. Multi-label classification
  3. Problem transformation methods
  4. Hierarchical multi-label classification

Qualifiers

  • Research-article
