Abstract
In this work, we address the out-of-vocabulary (OOV) problem in sequence labeling using only the training data of the task. A typical solution is to represent an OOV word at test time by mean-pooling the representations of its surrounding words. However, this pipeline approach often suffers from error propagation, since the supervised model is trained independently of the mean-pooling operation. We propose a novel training strategy to address this problem: it mimics the OOV situation during model training and trains the supervised model to fit the OOV word representations generated by the mean-pooling operation. Extensive experiments on several sequence labeling tasks, including part-of-speech (POS) tagging, named entity recognition (NER), and chunking, verify the effectiveness of the proposed method.
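To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) mean-pooling context embeddings to represent an OOV word and (b) injecting synthetic OOV positions during training. The window size, replacement rate, and all function names are illustrative assumptions, not the authors' exact implementation.

```python
import random
import torch

def mean_pool_oov(embeddings: torch.Tensor, oov_idx: int, window: int = 2) -> torch.Tensor:
    """Represent the word at `oov_idx` as the mean of its surrounding
    context embeddings (shape: seq_len x dim), as done for unseen words
    at test time. Window size 2 is an assumed hyperparameter."""
    lo = max(0, oov_idx - window)
    hi = min(embeddings.size(0), oov_idx + window + 1)
    context = torch.cat([embeddings[lo:oov_idx], embeddings[oov_idx + 1:hi]], dim=0)
    if context.size(0) == 0:  # degenerate single-token sequence
        return embeddings[oov_idx]
    return context.mean(dim=0)

def simulate_oov(embeddings: torch.Tensor, oov_rate: float = 0.15) -> torch.Tensor:
    """Mimic the OOV situation in training: randomly pick in-vocabulary
    positions and replace their embeddings with mean-pooled context
    vectors, so the supervised model learns to fit such representations.
    The 15% rate is an assumption for illustration."""
    out = embeddings.clone()
    for i in range(embeddings.size(0)):
        if random.random() < oov_rate:
            out[i] = mean_pool_oov(embeddings, i)
    return out

# Hypothetical usage inside a training step with any sequence labeler
# (e.g. a BiLSTM-CRF tagger):
#   emb = embedding_layer(token_ids)   # (seq_len, dim)
#   emb = simulate_oov(emb)            # inject synthetic OOV representations
#   loss = tagger(emb, gold_labels)    # train the model to be robust to them
```

Because the same mean-pooling operation is applied at training and test time, the supervised model is no longer trained independently of it, which is the point of the strategy described above.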
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Xing, X., Peng, M., Zhang, Q., Liu, Q., Huang, X. (2020). Learning to Generate Representations for Novel Words: Mimic the OOV Situation in Training. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol 12430. Springer, Cham. https://doi.org/10.1007/978-3-030-60450-9_26
DOI: https://doi.org/10.1007/978-3-030-60450-9_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60449-3
Online ISBN: 978-3-030-60450-9
eBook Packages: Computer Science, Computer Science (R0)