Correlation Networks for Extreme Multi-label Text Classification

Authors:

Aidong Zhang

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 1074 - 1082

https://doi.org/10.1145/3394486.3403151

Published: 20 August 2020

Abstract

This paper develops the Correlation Networks (CorNet) architecture for the extreme multi-label text classification (XMTC) task, where the objective is to tag an input text sequence with the most relevant subset of labels from an extremely large label set. XMTC arises in many real-world applications, such as document tagging and product annotation. Recently, deep learning models have achieved strong performance on XMTC tasks. However, these deep XMTC models ignore the useful correlation information among different labels. CorNet addresses this limitation by adding an extra CorNet module at the prediction layer of a deep model, which is able to learn label correlations, enhance raw label predictions with correlation knowledge, and output augmented label predictions. We show that CorNet can be easily integrated with deep XMTC models and generalizes effectively across different datasets. We further demonstrate that CorNet brings significant improvements over existing deep XMTC models in terms of both performance and convergence rate. The models and datasets are available at: https://github.com/XunGuangxu/CorNet.
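
The abstract describes CorNet as a module placed at a deep model's prediction layer that turns raw label predictions into correlation-enhanced ones. The following is a minimal sketch of one such block, assuming a residual bottleneck design: a sigmoid over the raw logits, a projection to a small context vector, an ELU, a projection back to label space, and a skip connection. The function names, dimensions, and random weights are illustrative assumptions and are not taken from the paper or its repository.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def linear(weights, bias, vec):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def init_linear(n_out, n_in, rng, scale=0.1):
    """Small random weights and zero bias (illustrative initialization)."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def cornet_block(raw_logits, params):
    """Hypothetical correlation block: raw label logits -> enhanced logits."""
    (w_down, b_down), (w_up, b_up) = params
    # Squash raw logits into (0, 1) so every label is on the same scale.
    probs = [sigmoid(z) for z in raw_logits]
    # Compress all label scores into a low-dimensional context vector;
    # this is where co-occurrence patterns between labels can be captured.
    context = [elu(c) for c in linear(w_down, b_down, probs)]
    # Map the context back to one correction term per label.
    correction = linear(w_up, b_up, context)
    # Skip connection: enhanced prediction = raw prediction + correction.
    return [z + c for z, c in zip(raw_logits, correction)]


rng = random.Random(0)
num_labels, context_dim = 6, 3  # real XMTC label sets are far larger
params = (init_linear(context_dim, num_labels, rng),
          init_linear(num_labels, context_dim, rng))

raw_logits = [2.0, -1.0, 0.5, -3.0, 1.5, 0.0]
enhanced = cornet_block(raw_logits, params)

# Rank labels by enhanced score and keep the three most relevant.
top3 = sorted(range(num_labels), key=lambda i: enhanced[i], reverse=True)[:3]
```

In this sketch the skip connection means the block reduces to the identity when both projections are zero, which is one plausible reason such a module can be appended to an existing deep XMTC model without disrupting its raw predictions.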

Index Terms

Correlation Networks for Extreme Multi-label Text Classification
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Information & Contributors

Information

Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

August 2020

3664 pages

ISBN: 9781450379984

DOI: 10.1145/3394486

General Chairs: Rajesh Gupta (UC San Diego, USA), Yan Liu (USC, USA)

Program Chairs: Mohak Shah (LG Electronics, USA), Suju Rajan (LinkedIn, USA)

Publications Chairs: Jiliang Tang (Michigan State, USA), B. Aditya Prakash (Georgia Tech, USA)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

KDD '20

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

July 6 - 10, 2020

CA, Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 31
Total Downloads: 4,332
Downloads (Last 12 months): 948
Downloads (Last 6 weeks): 75

Reflects downloads up to 03 Mar 2025

Citations

Cited By

Kumar S, Chohan J, Kalita K. (2025). WhaleOptNB: A Method for Automated Biomedical Text Document Classification. Intelligent Computing and Optimization, 174-184. Online publication date: 10-Jan-2025.
https://doi.org/10.1007/978-3-031-73324-6_18
Zangari A, Marcuzzo M, Rizzo M, Giudice L, Albarelli A, Gasparetto A. (2024). Hierarchical Text Classification and Its Foundations: A Review of Current Research. Electronics, 13:7 (1199). Online publication date: 25-Mar-2024.
https://doi.org/10.3390/electronics13071199
Wu T, Yang S. (2024). Contrastive Enhanced Learning for Multi-Label Text Classification. Applied Sciences, 14:19 (8650). Online publication date: 25-Sep-2024.
https://doi.org/10.3390/app14198650
Li X, Wang N, Zeng J, Zhong Y, Shen Z, Serra E, Spezzano F. (2024). Knowledge-enhanced Dynamic Modeling framework for Multi-Behavior Recommendation. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 3882-3886. Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679949
Gao Y, Xu M, Zhang M. (2024). Complementary to Multiple Labels: A Correlation-Aware Correction Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46:12 (9179-9191). Online publication date: Dec-2024.
https://doi.org/10.1109/TPAMI.2024.3416384
Zhao X, An Y, Xu N, Geng X. (2024). Variational Continuous Label Distribution Learning for Multi-Label Text Classification. IEEE Transactions on Knowledge and Data Engineering, 1-15. Online publication date: 2024.
https://doi.org/10.1109/TKDE.2023.3323401
Gong L, Shor A, Zhang A, Jha K. (2024). Context-Specific Feature Augmentation for Improving Social Determinants of Health Extraction. 2024 IEEE International Conference on Big Data (BigData), 1736-1745. Online publication date: 15-Dec-2024.
https://doi.org/10.1109/BigData62323.2024.10825225
Lenc L, Martínek J, Baloun J, Přibáň P, Prantl M, Taylor S, Král P, Kyliš J. (2024). Czech medical coding assistant based on transformer networks. Computers in Biology and Medicine, 178:C. Online publication date: 19-Sep-2024.
https://dl.acm.org/doi/10.1016/j.compbiomed.2024.108672
Shi J, Wei T, Li Y. (2024). Residual diverse ensemble for long-tailed multi-label text classification. Science China Information Sciences, 67:11. Online publication date: 23-Oct-2024.
https://doi.org/10.1007/s11432-022-3915-6
Zhao F, Ai Q, Li X, Wang W, Gao Q, Liu Y. (2024). TLC-XML: Transformer with Label Correlation for Extreme Multi-label Text Classification. Neural Processing Letters, 56:1. Online publication date: 10-Feb-2024.
https://doi.org/10.1007/s11063-024-11460-z
