Abstract
Over the past few years, the use of pre-trained word embeddings has considerably improved performance on natural language processing tasks across the board. However, even though these embeddings are trained on gigantic corpora, their vocabulary is fixed, and numerous out-of-vocabulary words therefore appear in specific downstream tasks. Recent studies proposed models that generate an embedding for an out-of-vocabulary word from its morphology and its context, but these models assume that sufficient textual data is at hand to train them. In contrast, we specifically tackle the case where such data is no longer available and we rely only on pre-trained embeddings. As a solution, we introduce a model that predicts meaningful embeddings from the spelling of a word as well as from the context in which it appears in a downstream task, without the need for pre-training on a given corpus. We thoroughly evaluate our model on a joint tagging task in three different languages. Results show that our model helps consistently across all languages, outperforms other ways of handling out-of-vocabulary words, and can be integrated into any neural model to predict embeddings for out-of-vocabulary words.
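To make the idea concrete, below is a minimal sketch in PyTorch of the kind of predictor the abstract describes: an embedding for an out-of-vocabulary word is produced from (1) its spelling, via a character-level BiLSTM, and (2) the pre-trained embeddings of its surrounding words, via a second BiLSTM. The class name, layer sizes, and the specific choice of BiLSTM encoders are illustrative assumptions, not the paper's actual architecture; the module is meant to be trained jointly with the downstream task rather than pre-trained on a corpus.

# Hypothetical sketch; architecture details are assumptions, not the paper's spec.
import torch
import torch.nn as nn

class OOVEmbeddingPredictor(nn.Module):
    def __init__(self, n_chars: int, char_dim: int = 32,
                 word_dim: int = 300, hidden: int = 128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # Encodes the spelling of the unknown word.
        self.char_rnn = nn.LSTM(char_dim, hidden, bidirectional=True,
                                batch_first=True)
        # Encodes the pre-trained embeddings of the surrounding context words.
        self.ctx_rnn = nn.LSTM(word_dim, hidden, bidirectional=True,
                               batch_first=True)
        # Projects the concatenated states into the pre-trained embedding space.
        self.out = nn.Linear(4 * hidden, word_dim)

    def forward(self, char_ids, ctx_vecs):
        # char_ids: (batch, word_length) character indices of the OOV word
        # ctx_vecs: (batch, window, word_dim) pre-trained context embeddings
        _, (h_char, _) = self.char_rnn(self.char_emb(char_ids))
        _, (h_ctx, _) = self.ctx_rnn(ctx_vecs)
        # Concatenate the final forward/backward states of both encoders.
        feats = torch.cat([h_char[0], h_char[1], h_ctx[0], h_ctx[1]], dim=-1)
        return self.out(feats)  # predicted embedding for the OOV word

The predicted vector can then replace a generic UNK vector in the embedding lookup of any downstream tagger, which is what allows this kind of module to be integrated into an existing neural model.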
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Garneau, N., Leboeuf, J.S., Lamontagne, L. (2019). Contextual Generation of Word Embeddings for Out of Vocabulary Words in Downstream Tasks. In: Meurs, M.J., Rudzicz, F. (eds.) Advances in Artificial Intelligence. Canadian AI 2019. Lecture Notes in Computer Science, vol. 11489. Springer, Cham. https://doi.org/10.1007/978-3-030-18305-9_60
DOI: https://doi.org/10.1007/978-3-030-18305-9_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18304-2
Online ISBN: 978-3-030-18305-9