Abstract
In this work, we address the out-of-vocabulary (OOV) problem in sequence labeling using only the training data of the task. A typical solution is to represent an OOV word at test time by mean-pooling the representations of its surrounding words. However, this pipeline approach often suffers from error propagation, since the supervised model is trained independently of the mean-pooling operation. We propose a novel training strategy to address this problem: it mimics the OOV situation during model training and trains the supervised model to fit the OOV word representations generated by the mean-pooling operation. Extensive experiments on several sequence labeling tasks, including part-of-speech (POS) tagging, named entity recognition (NER), and chunking, verify the effectiveness of the proposed method.
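To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) mean-pooling context embeddings to represent an OOV word and (b) injecting synthetic OOV positions during training. The window size, replacement rate, and all function names are illustrative assumptions, not the authors' exact implementation.

```python
import random
import torch

def mean_pool_oov(embeddings: torch.Tensor, oov_idx: int, window: int = 2) -> torch.Tensor:
    """Represent the word at `oov_idx` as the mean of its surrounding
    context embeddings (shape: seq_len x dim), as done for unseen words
    at test time. Window size 2 is an assumed hyperparameter."""
    lo = max(0, oov_idx - window)
    hi = min(embeddings.size(0), oov_idx + window + 1)
    context = torch.cat([embeddings[lo:oov_idx], embeddings[oov_idx + 1:hi]], dim=0)
    if context.size(0) == 0:  # degenerate single-token sequence
        return embeddings[oov_idx]
    return context.mean(dim=0)

def simulate_oov(embeddings: torch.Tensor, oov_rate: float = 0.15) -> torch.Tensor:
    """Mimic the OOV situation in training: randomly pick in-vocabulary
    positions and replace their embeddings with mean-pooled context
    vectors, so the supervised model learns to fit such representations.
    The 15% rate is an assumption for illustration."""
    out = embeddings.clone()
    for i in range(embeddings.size(0)):
        if random.random() < oov_rate:
            out[i] = mean_pool_oov(embeddings, i)
    return out

# Hypothetical usage inside a training step with any sequence labeler
# (e.g. a BiLSTM-CRF tagger):
#   emb = embedding_layer(token_ids)   # (seq_len, dim)
#   emb = simulate_oov(emb)            # inject synthetic OOV representations
#   loss = tagger(emb, gold_labels)    # train the model to be robust to them
```

Because the same mean-pooling operation is applied at training and test time, the supervised model is no longer trained independently of it, which is the point of the strategy described above.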
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Xing, X., Peng, M., Zhang, Q., Liu, Q., Huang, X. (2020). Learning to Generate Representations for Novel Words: Mimic the OOV Situation in Training. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol 12430. Springer, Cham. https://doi.org/10.1007/978-3-030-60450-9_26
DOI: https://doi.org/10.1007/978-3-030-60450-9_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60449-3
Online ISBN: 978-3-030-60450-9
eBook Packages: Computer Science, Computer Science (R0)