[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3394486.3403151acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article
Public Access

Correlation Networks for Extreme Multi-label Text Classification

Published: 20 August 2020 Publication History

Abstract

This paper develops the Correlation Networks (CorNet) architecture for the extreme multi-label text classification (XMTC) task, where the objective is to tag an input text sequence with the most relevant subset of labels from an extremely large label set. XMTC can be found in many real-world applications, such as document tagging and product annotation. Recently, deep learning models have achieved outstanding performances in XMTC tasks. However, these deep XMTC models ignore the useful correlation information among different labels. CorNet addresses this limitation by adding an extra CorNet module at the prediction layer of a deep model, which is able to learn label correlations, enhance raw label predictions with correlation knowledge and output augmented label predictions. We show that CorNet can be easily integrated with deep XMTC models and generalize effectively across different datasets. We further demonstrate that CorNet can bring significant improvements over the existing deep XMTC models in terms of both performance and convergence rate. The models and datasets are available at: https://github.com/XunGuangxu/CorNet.

References

[1]
Rahul Agrawal, Archit Gupta, Yashoteja Prabhu, and Manik Varma. 2013. Multi-label learning with millions of labels: Recommending advertiser bid phrases for web pages. In Proceedings of the 22nd international conference on World Wide Web. 13--24.
[2]
Rohit Babbar and Bernhard Schölkopf. 2017. Dismec: Distributed sparse machines for extreme multi-label classification. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. 721--729.
[3]
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
[4]
Shaojie Bai, J Zico Kolter, and Vladlen Koltun. 2018. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271 (2018).
[5]
Kush Bhatia, Himanshu Jain, Purushottam Kar, Manik Varma, and Prateek Jain. 2015. Sparse local embeddings for extreme multi-label classification. In Advances in neural information processing systems. 730--738.
[6]
Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, and Inderjit Dhillon. 2019. X-BERT: eXtreme Multi-label Text Classification with BERT. arXiv preprint arXiv:1905.02331 (2019).
[7]
Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014).
[8]
Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. 2015. Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 (2015).
[9]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[10]
Amit Garg, Jonathan Noyola, Romil Verma, Ashutosh Saxena, and Aditya Jami. 2015. Exploring correlation between labels to improve multi-label classification. arXiv preprint arXiv:1511.07953 (2015).
[11]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in neural information processing systems. 2672--2680.
[12]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770--778.
[13]
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, Vol. 9, 8 (1997), 1735--1780.
[14]
Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. 2018. Averaging weights leads to wider optima and better generalization. arXiv preprint arXiv:1803.05407 (2018).
[15]
Himanshu Jain, Yashoteja Prabhu, and Manik Varma. 2016. Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 935--944.
[16]
Kalina Jasinska, Krzysztof Dembczynski, Róbert Busa-Fekete, Karlson Pfannschmidt, Timo Klerx, and Eyke Hullermeier. 2016. Extreme F-measure maximization using sparse probability estimates. In International Conference on Machine Learning. 1435--1444.
[17]
Qiao Jin, Bhuwan Dhingra, William Cohen, and Xinghua Lu. 2018. AttentionMeSH: Simple, Effective and Interpretable Automatic MeSH Indexer. In Proceedings of the 6th BioASQ Workshop A challenge on large-scale biomedical semantic indexing and question answering. 47--56.
[18]
Sujay Khandagale, Han Xiao, and Rohit Babbar. 2019. Bonsai-diverse and shallow trees for extreme multi-label classification. arXiv preprint arXiv:1904.08249 (2019).
[19]
Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[20]
Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130 (2017).
[21]
Jingzhou Liu, Wei-Cheng Chang, Yuexin Wu, and Yiming Yang. 2017. Deep learning for extreme multi-label text classification. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. 115--124.
[22]
Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM conference on Recommender systems. 165--172.
[23]
Eneldo Loza Mencia and Johannes Fürnkranz. 2008. Efficient pairwise multilabel classification for large-scale problems in the legal domain. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 50--65.
[24]
Tien Thanh Nguyen, Thi Thu Thuy Nguyen, Anh Vu Luong, Quoc Viet Hung Nguyen, Alan Wee-Chung Liew, and Bela Stantic. 2019. Multi-label classification via label correlation and first order feature dependance in a data stream. Pattern recognition, Vol. 90 (2019), 35--51.
[25]
Ioannis Partalas, Aris Kosmopoulos, Nicolas Baskiotis, Thierry Artieres, George Paliouras, Eric Gaussier, Ion Androutsopoulos, Massih-Reza Amini, and Patrick Galinari. 2015. Lshtc: A benchmark for large-scale text classification. arXiv preprint arXiv:1503.08581 (2015).
[26]
Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global Vectors for Word Representation. In EMNLP, Vol. 14. 1532--1543.
[27]
Yashoteja Prabhu, Anil Kag, Shrutendra Harsola, Rahul Agrawal, and Manik Varma. 2018. Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising. In Proceedings of the 2018 World Wide Web Conference. 993--1002.
[28]
Yashoteja Prabhu and Manik Varma. 2014. Fastxml: A fast, accurate and stable tree-classifier for extreme multi-label learning. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 263--272.
[29]
Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems. 3104--3112.
[30]
Yukihiro Tagami. 2017. Annexml: Approximate nearest neighbor search for extreme multi-label classification. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 455--464.
[31]
George Tsatsaronis, Georgios Balikas, Prodromos Malakasiotis, Ioannis Partalas, Matthias Zschunke, Michael R Alvers, Dirk Weissenborn, Anastasia Krithara, Sergios Petridis, Dimitris Polychronopoulos, Yannis Almirantis, John Pavlopoulos, Nicolas Baskiotis, Patrick Gallinari, Thierry Artieres, Axel Ngonga, Norman Heino, Eric Gaussier, Liliana Barrio-Alvers, Michael Schroeder, Ion Androutsopoulos, and Georgios Paliouras. 2015. An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics, Vol. 16 (2015), 138. https://doi.org/10.1186/s12859-015-0564--6
[32]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998--6008.
[33]
Guangxu Xun, Kishlay Jha, Ye Yuan, Yaqing Wang, and Aidong Zhang. 2016. MeSHProbeNet: A Self-attentive Probe Net for MeSH Indexing. Bioinformatics, Vol. 32, 12 (2016), 70--79. https://doi.org/10.1093/bioinformatics/btw294
[34]
Guangxu Xun, Yaliang Li, Jing Gao, and Aidong Zhang. 2017a. Collaboratively improving topic discovery and word embeddings by coordinating global and local contexts. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 535--543.
[35]
Guangxu Xun, Yaliang Li, Wayne Xin Zhao, Jing Gao, and Aidong Zhang. 2017b. A Correlated Topic Model Using Word Embeddings. In Proceedings of the 26th International Joint Conference on Artificial Intelligence .
[36]
Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu, and Houfeng Wang. 2018. SGM: sequence generation model for multi-label classification. arXiv preprint arXiv:1806.04822 (2018).
[37]
Ian EH Yen, Xiangru Huang, Wei Dai, Pradeep Ravikumar, Inderjit Dhillon, and Eric Xing. 2017. Ppdsparse: A parallel primal-dual sparse method for extreme classification. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 545--553.
[38]
Ronghui You, Suyang Dai, Zihan Zhang, Hiroshi Mamitsuka, and Shanfeng Zhu. 2018. Attentionxml: Extreme multi-label text classification with multi-label attention based recurrent neural networks. arXiv preprint arXiv:1811.01727 (2018).

Cited By

View all
  • (2024)Hierarchical Text Classification and Its Foundations: A Review of Current ResearchElectronics10.3390/electronics1307119913:7(1199)Online publication date: 25-Mar-2024
  • (2024)Contrastive Enhanced Learning for Multi-Label Text ClassificationApplied Sciences10.3390/app1419865014:19(8650)Online publication date: 25-Sep-2024
  • (2024)Knowledge-enhanced Dynamic Modeling framework for Multi-Behavior RecommendationProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679949(3882-3886)Online publication date: 21-Oct-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN:9781450379984
DOI:10.1145/3394486
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 August 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. deep learning
  2. label correlation
  3. multi-label text classification

Qualifiers

  • Research-article

Funding Sources

Conference

KDD '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)937
  • Downloads (Last 6 weeks)132
Reflects downloads up to 01 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Hierarchical Text Classification and Its Foundations: A Review of Current ResearchElectronics10.3390/electronics1307119913:7(1199)Online publication date: 25-Mar-2024
  • (2024)Contrastive Enhanced Learning for Multi-Label Text ClassificationApplied Sciences10.3390/app1419865014:19(8650)Online publication date: 25-Sep-2024
  • (2024)Knowledge-enhanced Dynamic Modeling framework for Multi-Behavior RecommendationProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679949(3882-3886)Online publication date: 21-Oct-2024
  • (2024)Complementary to Multiple Labels: A Correlation-Aware Correction ApproachIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2024.341638446:12(9179-9191)Online publication date: Dec-2024
  • (2024)Variational Continuous Label Distribution Learning for Multi-Label Text ClassificationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.3323401(1-15)Online publication date: 2024
  • (2024)Czech medical coding assistant based on transformer networksComputers in Biology and Medicine10.1016/j.compbiomed.2024.108672178:COnline publication date: 19-Sep-2024
  • (2024)Residual diverse ensemble for long-tailed multi-label text classificationScience China Information Sciences10.1007/s11432-022-3915-667:11Online publication date: 23-Oct-2024
  • (2024)TLC-XML: Transformer with Label Correlation for Extreme Multi-label Text ClassificationNeural Processing Letters10.1007/s11063-024-11460-z56:1Online publication date: 10-Feb-2024
  • (2024)TAE: Topic-aware encoder for large-scale multi-label text classificationApplied Intelligence10.1007/s10489-024-05485-z54:8(6269-6284)Online publication date: 9-May-2024
  • (2024)EmoBART: Multi-label Emotion Classification Method Based on Pre-trained Label Sequence Generation ModelNeural Computing for Advanced Applications10.1007/978-981-97-7007-6_8(104-115)Online publication date: 22-Sep-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media