DOI: 10.1145/3447548.3467426

Generalized Zero-Shot Extreme Multi-label Learning

Published: 14 August 2021 Publication History

Abstract

Extreme Multi-label Learning (XML) involves assigning the subset of most relevant labels to a data point from millions of label choices. A hitherto unaddressed challenge in XML is that of predicting unseen labels, which have no training points. These form a significant fraction of all labels and contain fresh and personalized information desired by end users. Most existing extreme classifiers are not equipped for zero-shot label prediction and hence fail to leverage unseen labels. As a remedy, this paper proposes a novel approach called ZestXML for the task of Generalized Zero-shot XML (GZXML), where relevant labels have to be chosen from all available seen and unseen labels. ZestXML learns to project a data point's features close to the features of its relevant labels through a highly sparsified linear transform. This L0-constrained linear map between the two high-dimensional feature vectors is tractably recovered through a novel optimizer based on Hard Thresholding. By effectively leveraging the sparsity in the features, the labels and the learnt model, ZestXML achieves higher accuracy and a smaller model size than existing XML approaches, while also supporting efficient training and prediction, real-time label updates and explainable predictions.
Experiments on large-scale GZXML datasets demonstrated that ZestXML can be up to 14% and 10% more accurate than state-of-the-art extreme classifiers and leading BERT-based dense retrievers respectively, while having a 10x smaller model size. ZestXML trains on the largest dataset, with 31M labels, in just 30 hours on a single core of a commodity desktop. When added to a large ensemble of existing models in Bing Sponsored Search Advertising, ZestXML significantly improved the click yield of the IR-based system by 17% and unseen query coverage by 3.4%. ZestXML's source code and benchmark datasets for GZXML will be publicly released for research purposes.
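
The abstract's central idea, a sparse linear map that scores a data point against any label that has features, can be illustrated with a small NumPy sketch. This is not ZestXML's optimizer: it is plain iterative hard thresholding on a squared loss, chosen only because it is the simplest instance of an L0-constrained fit. The function names (`hard_threshold`, `fit_sparse_map`, `score_labels`), the loss and the step size are all assumptions made for illustration.

```python
import numpy as np


def hard_threshold(W, k):
    """Keep the k largest-magnitude entries of W and zero the rest (assumes k >= 1)."""
    flat = W.ravel()
    if k >= flat.size:
        return W.copy()
    # Positions of the k largest magnitudes; their relative order is irrelevant.
    keep = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros(flat.size, dtype=W.dtype)
    out[keep] = flat[keep]
    return out.reshape(W.shape)


def fit_sparse_map(X, Z, Y, k, steps=200):
    """Illustrative fit of W minimising 0.5 * ||X W Z^T - Y||_F^2 with at most k nonzeros.

    X: (points, point features), Z: (seen labels, label features),
    Y: (points, seen labels) relevance matrix. Hypothetical setup, not the paper's.
    """
    # Step size 1/L, where L upper-bounds the Lipschitz constant of the gradient.
    lr = 1.0 / (np.linalg.norm(X, 2) ** 2 * np.linalg.norm(Z, 2) ** 2)
    W = np.zeros((X.shape[1], Z.shape[1]))
    for _ in range(steps):
        residual = X @ W @ Z.T - Y
        grad = X.T @ residual @ Z
        # Gradient step followed by projection onto the set of k-sparse matrices.
        W = hard_threshold(W - lr * grad, k)
    return W


def score_labels(x, W, Z_any):
    """Score one data point against any labels, seen or unseen, given their features."""
    return x @ W @ Z_any.T
```

Because scoring depends only on label features, a label added after training can be ranked by passing its feature row to `score_labels` without refitting `W`, which is the sense in which such a model handles unseen labels. The nonzero entries of `W` pair one point feature with one label feature, so each score decomposes into a short list of contributing feature pairs.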

Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021
4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. extreme multi-label classification
  2. label metadata
  3. sponsored search advertising
  4. zero-shot learning

Qualifiers

  • Research-article

Conference

KDD '21
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 119
  • Downloads (Last 6 weeks): 13
Reflects downloads up to 12 Dec 2024

Cited By

  • (2024) MatchXML: An Efficient Text-Label Matching Framework for Extreme Multi-Label Text Classification. IEEE Transactions on Knowledge and Data Engineering 36:9 (4781-4793). DOI: 10.1109/TKDE.2024.3374750. Online publication date: Sep-2024
  • (2024) Leveraging Pre-Trained Extreme Multi-Label Classifiers for Zero-Shot Learning. 2024 11th IEEE Swiss Conference on Data Science (SDS) (233-236). DOI: 10.1109/SDS60720.2024.00041. Online publication date: 30-May-2024
  • (2024) Collaborative learning of supervision and correlation for generalized zero-shot extreme multi-label learning. Applied Intelligence 54:8 (6285-6298). DOI: 10.1007/s10489-024-05498-8. Online publication date: 9-May-2024
  • (2023) SemSup-XC. Proceedings of the 40th International Conference on Machine Learning (228-247). DOI: 10.5555/3618408.3618419. Online publication date: 23-Jul-2023
  • (2023) Conditional Consistency Regularization for Semi-Supervised Multi-Label Image Classification. IEEE Transactions on Multimedia 26 (4206-4216). DOI: 10.1109/TMM.2023.3324132. Online publication date: 12-Oct-2023
  • (2023) Topic Recommendation for GitHub Repositories: How Far Can Extreme Multi-Label Learning Go? 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (167-178). DOI: 10.1109/SANER56733.2023.00025. Online publication date: Mar-2023
  • (2023) Interrelated feature selection from health surveys using domain knowledge graph. Health Information Science and Systems 11:1. DOI: 10.1007/s13755-023-00254-7. Online publication date: 16-Nov-2023
  • (2023) Learning metric space with distillation for large-scale multi-label text classification. Neural Computing and Applications 35:15 (11445-11458). DOI: 10.1007/s00521-023-08308-3. Online publication date: 11-Feb-2023
  • (2022) ELIAS. Proceedings of the 36th International Conference on Neural Information Processing Systems (19798-19809). DOI: 10.5555/3600270.3601709. Online publication date: 28-Nov-2022
  • (2022) CascadeXML. Proceedings of the 36th International Conference on Neural Information Processing Systems (2074-2087). DOI: 10.5555/3600270.3600421. Online publication date: 28-Nov-2022
