DOI: 10.1145/3430984.3431043 · CODS-COMAD Conference Proceedings · Short paper

Learning Multi-Sense Word Distributions using Approximate Kullback-Leibler Divergence

Published: 02 January 2021

Abstract

Learning word representations has garnered considerable attention recently owing to its diverse text applications. Word embeddings encapsulate the syntactic and semantic regularities of words in sentences. Modelling word embeddings as multi-sense Gaussian mixture distributions additionally captures the uncertainty and polysemy of words. We propose to learn Gaussian mixture representations of words using a Kullback-Leibler (KL) divergence based objective function. The KL-divergence-based energy function provides a better distance metric, one that can effectively capture entailment and distributional similarity among words. Since the KL divergence between Gaussian mixtures is intractable, we use an approximation to the KL divergence between Gaussian mixtures. We train on a Wikipedia-based dataset and perform qualitative and quantitative experiments on benchmark word-similarity and entailment datasets, which demonstrate the effectiveness of the proposed approach.
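The KL divergence between two Gaussian mixtures has no closed form; a standard workaround is the variational approximation of Hershey and Olsen (2007), which combines closed-form KL divergences between individual Gaussian components. The sketch below is illustrative rather than the paper's implementation: it assumes diagonal covariances, and the function names and toy 1-D mixtures are ours.

```python
import numpy as np

def kl_diag_gaussians(mu0, var0, mu1, var1):
    # Closed-form KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) )
    return 0.5 * np.sum(
        var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1.0
        + np.log(var1) - np.log(var0)
    )

def approx_kl_gmm(wf, muf, varf, wg, mug, varg):
    # Variational approximation of KL(f || g) for Gaussian mixtures
    # f = sum_a wf[a] N(muf[a], varf[a]), g = sum_b wg[b] N(mug[b], varg[b])
    total = 0.0
    for a in range(len(wf)):
        # similarity of component a to the components of f itself
        num = sum(wf[ap] * np.exp(-kl_diag_gaussians(muf[a], varf[a], muf[ap], varf[ap]))
                  for ap in range(len(wf)))
        # similarity of component a to the components of g
        den = sum(wg[b] * np.exp(-kl_diag_gaussians(muf[a], varf[a], mug[b], varg[b]))
                  for b in range(len(wg)))
        total += wf[a] * np.log(num / den)
    return total

# Toy 1-D two-component mixtures (illustrative values)
wf = np.array([0.5, 0.5])
muf = np.array([[0.0], [4.0]]); varf = np.array([[1.0], [1.0]])
wg = np.array([0.5, 0.5])
mug = np.array([[10.0], [14.0]]); varg = np.array([[1.0], [1.0]])

print(approx_kl_gmm(wf, muf, varf, wf, muf, varf))  # 0.0 for identical mixtures
print(approx_kl_gmm(wf, muf, varf, wg, mug, varg))  # large positive for well-separated mixtures
```

For identical mixtures the numerator and denominator coincide component-wise, so the approximation is exactly zero; for well-separated mixtures it grows with the distance between component means, which is the property a mixture-based word-embedding objective relies on.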


Cited By

  • (2023) "A method for constructing word sense embeddings based on word sense induction". Scientific Reports 13:1. DOI: 10.1038/s41598-023-40062-3. Online publication date: 9-Aug-2023.
  • (2021) "Enhancing Accuracy of Semantic Relatedness Measurement by Word Single-Meaning Embeddings". IEEE Access 9, 117424–117433. DOI: 10.1109/ACCESS.2021.3107445. Online publication date: 2021.


Published In

CODS-COMAD '21: Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD)
January 2021
453 pages

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. KL Divergence
  2. Language Modelling
  3. Mixture of Gaussians
  4. Textual Entailment
  5. Word embedding distribution

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

CODS COMAD 2021: 8th ACM IKDD CODS and 26th COMAD
January 2 - 4, 2021
Bangalore, India

Acceptance Rates

Overall Acceptance Rate 197 of 680 submissions, 29%

Article Metrics

  • Downloads (Last 12 months): 9
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 12 Dec 2024

