Pretrained generalized autoregressive model with adaptive probabilistic label clusters for extreme multi-label text classification

Published: 13 July 2020

Abstract

Extreme multi-label text classification (XMTC) is the task of tagging a given text with the most relevant labels from an extremely large label set. We propose a novel deep learning method called APLC-XLNet. Our approach fine-tunes the recently released generalized autoregressive pretrained model (XLNet) to learn a dense representation of the input text. We propose Adaptive Probabilistic Label Clusters (APLC) to approximate the cross-entropy loss by exploiting the unbalanced label distribution to form clusters that explicitly reduce computation time. Our experiments, carried out on five benchmark datasets, show that our approach achieves new state-of-the-art results on four of them.
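The clustering idea described in the abstract resembles the adaptive softmax of Grave et al., adapted to the multi-label setting: frequent "head" labels get a full-capacity classifier, while infrequent "tail" labels are scored through progressively smaller projections, so most of the compute is spent where most of the label mass lives. The sketch below is a minimal, hypothetical PyTorch illustration of that low-rank clustering trick, not the authors' released implementation; it omits the cluster-probability gating of the full method, and the `cutoffs` values are invented for the example.

```python
import torch
import torch.nn as nn

class AdaptiveLabelClusters(nn.Module):
    """Minimal sketch of an adaptive cluster output layer in the spirit of
    APLC (a hypothetical re-implementation, not the authors' code). Labels
    are assumed to be sorted by frequency, so cluster 0 holds the frequent
    "head" labels and later clusters hold the long tail, scored through
    progressively smaller bottleneck dimensions to cut compute."""

    def __init__(self, hidden_dim, cutoffs, reduce_factor=2):
        # cutoffs: cumulative cluster boundaries, e.g. [2000, 10000, 100000]
        super().__init__()
        self.cutoffs = [0] + list(cutoffs)
        self.clusters = nn.ModuleList()
        dim = hidden_dim
        for i in range(len(cutoffs)):
            n_labels = self.cutoffs[i + 1] - self.cutoffs[i]
            # low-rank factorization: project down to `dim`, then score
            # this cluster's labels from the reduced representation
            self.clusters.append(nn.Sequential(
                nn.Linear(hidden_dim, dim, bias=False),
                nn.Linear(dim, n_labels),
            ))
            dim = max(dim // reduce_factor, 1)

    def forward(self, h):
        # h: (batch, hidden_dim) text representation, e.g. XLNet's pooled output
        # returns concatenated per-label logits over all clusters
        return torch.cat([cluster(h) for cluster in self.clusters], dim=-1)

# Usage with the multi-label binary cross-entropy loss:
aplc = AdaptiveLabelClusters(hidden_dim=768, cutoffs=[2000, 10000, 100000])
h = torch.randn(4, 768)                        # stand-in for XLNet encodings
logits = aplc(h)                               # shape: (4, 100000)
targets = torch.zeros(4, 100000)               # multi-hot label vectors
loss = nn.BCEWithLogitsLoss()(logits, targets)
```

Scoring an n-label tail cluster through a d-dimensional bottleneck costs O(hidden_dim·d + d·n) rather than O(hidden_dim·n), which is where the savings come from on label sets with hundreds of thousands of rarely used tail labels.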

Cited By

  • (2023) A Dual-branch Learning Model with Gradient-balanced Loss for Long-tailed Multi-label Text Classification. ACM Transactions on Information Systems, 42(2): 1-24. DOI: 10.1145/3597416. Online publication date: 27-Sep-2023.

Information

Published In

ICML'20: Proceedings of the 37th International Conference on Machine Learning
July 2020
11702 pages
DOI: 10.5555/3524938.3525940

Publisher

JMLR.org

Publication History

Published: 13 July 2020

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Article Metrics

  • Downloads (last 12 months): 37
  • Downloads (last 6 weeks): 6
Reflects downloads up to 17 Jan 2025

