Unsupervised Active Learning of CRF Model for Cross-Lingual Named Entity Recognition

Mohamed Farouk Abdel Hady, Abubakrelsedik Karali, Eslam Kamal & Rania Ibrahim

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8774)

Included in the following conference series:

IAPR Workshop on Artificial Neural Networks in Pattern Recognition

Abstract

Manually annotating training data for information extraction models is time-consuming and expensive, yet it is necessary for building information extraction systems. Active learning has proven effective in reducing manual annotation effort for supervised learning tasks: a human judge is asked to annotate only the examples that are most informative with respect to a given model. However, in most cases reliable human judges are not available for every language. In this paper, we propose a cross-lingual unsupervised active learning paradigm (XLADA) that generates high-quality, automatically annotated training data from a word-aligned parallel corpus. To evaluate the paradigm, we applied XLADA to English-French and English-Chinese bilingual corpora and then trained French and Chinese information extraction models. The experimental results show that XLADA can produce effective models without manually annotated training data.
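
The abstract does not describe how annotations reach the target language. One common way to derive automatically annotated data from a word-aligned parallel corpus is annotation projection, and the sketch below illustrates that general idea only; it is not taken from the paper and may differ from what XLADA does. The function name `project_tags`, the BIO tag scheme, and the example sentence pair with its alignment are all assumptions made for illustration.

```python
from typing import List, Tuple


def project_tags(
    src_tags: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project BIO entity tags from source tokens onto aligned target tokens.

    Hypothetical helper: `alignment` holds (source_index, target_index) pairs
    such as a word aligner might produce.
    """
    # Step 1: copy the bare entity type (PER, LOC, ...) across each alignment link.
    tgt_types = ["O"] * tgt_len
    for src_idx, tgt_idx in alignment:
        tag = src_tags[src_idx]
        if tag != "O":
            # Drop the B-/I- prefix; boundaries are recomputed on the target side.
            tgt_types[tgt_idx] = tag.split("-", 1)[-1]

    # Step 2: re-encode the target sequence as BIO, since word order may differ.
    tgt_tags = []
    prev = "O"
    for entity_type in tgt_types:
        if entity_type == "O":
            tgt_tags.append("O")
        elif entity_type == prev:
            tgt_tags.append("I-" + entity_type)
        else:
            tgt_tags.append("B-" + entity_type)
        prev = entity_type
    return tgt_tags


# Assumed example: English "Barack Obama visited Paris"
# aligned with French "Barack Obama a visité Paris".
english_tags = ["B-PER", "I-PER", "O", "B-LOC"]
links = [(0, 0), (1, 1), (2, 3), (3, 4)]
french_tags = project_tags(english_tags, links, tgt_len=5)
print(french_tags)
```

Unaligned target tokens (here the auxiliary "a") simply stay "O". Alignment errors would propagate into the projected labels, so a real pipeline would need some way to filter or select the projected sentences before using them as training data.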

Author information

Authors and Affiliations

Microsoft Research, Cairo, Egypt
Mohamed Farouk Abdel Hady, Abubakrelsedik Karali, Eslam Kamal & Rania Ibrahim

Editor information

Editors and Affiliations

Faculty of Computers and Information, Orman, Cairo University, Giza, Egypt
Neamat El Gayar
Institute for Neural Information Processing, University of Ulm, 89069, Ulm, Germany
Friedhelm Schwenker
Department of Computer Science and Software Engineering, Concordia University, H3G 1M8, Montreal, QC, Canada
Cheng Suen

About this paper

Cite this paper

Abdel Hady, M.F., Karali, A., Kamal, E., Ibrahim, R. (2014). Unsupervised Active Learning of CRF Model for Cross-Lingual Named Entity Recognition. In: El Gayar, N., Schwenker, F., Suen, C. (eds) Artificial Neural Networks in Pattern Recognition. ANNPR 2014. Lecture Notes in Computer Science, vol 8774. Springer, Cham. https://doi.org/10.1007/978-3-319-11656-3_3

DOI: https://doi.org/10.1007/978-3-319-11656-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11655-6
Online ISBN: 978-3-319-11656-3
eBook Packages: Computer Science, Computer Science (R0)
