Abstract
The effective handling of previously unseen words is an important factor in the performance of part-of-speech (POS) taggers. Some trainable POS taggers use suffix (and sometimes prefix) strings as cues for handling unknown words, in effect treating these strings as a proxy for actual linguistic affixes. In the context of creating a tagger for the African language Igbo, we compare the performance of some existing taggers that implement this approach with a novel method for handling morphologically complex unknown words, based on morphological reconstruction (i.e. a linguistically informed segmentation into root and affixes). The novel method outperforms these other systems by several percentage points, achieving accuracies of around 92% on morphologically complex unknown words.
Notes
1. Obtained from jw.org.
2. “Mmadụ Ka A Na-Arịa”, written in 2013.
3. Here, “rV” means the letter r followed by any vowel (a, e, u, o, i, ị, ọ, ụ), attached to a word in Igbo, as in “bịara” (came), “kọrọ” (told), “riri” (ate), “nwuru” (shone), etc. It is a past tense marker when attached to an active verb, and indicates stative/passive meaning when attached to a stative verb [3]. It is therefore an important cue in predicting past tense verbs or verbs having applicative meaning “APP”.
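The surface pattern described in the rV note can be sketched in a few lines of Python. This is only an illustrative string check under stated assumptions, not the paper's morphological reconstruction method: the function name `split_rv_suffix` and the minimum-length rule are hypothetical, and the vowel set is copied from the note. A purely orthographic match will over-generate, since any word ending in r plus a vowel is flagged regardless of whether it is a verb.

```python
import unicodedata

# Vowels as listed in the note; NFC-normalised so that dotted vowels
# (ị, ọ, ụ) are single code points regardless of how they were typed.
RV_VOWELS = set(unicodedata.normalize("NFC", "aeuoiịọụ"))


def split_rv_suffix(word):
    """Return (stem, suffix) if `word` ends in r + vowel, else (word, None).

    Hypothetical helper: a surface-level cue only, with no check that
    the remaining stem is an actual Igbo verb root.
    """
    w = unicodedata.normalize("NFC", word.lower())
    # Assumption: require at least one stem character before the suffix.
    if len(w) > 2 and w[-2] == "r" and w[-1] in RV_VOWELS:
        return w[:-2], w[-2:]
    return w, None


if __name__ == "__main__":
    for example in ["bịara", "kọrọ", "riri", "nwuru"]:
        print(example, split_rv_suffix(example))
```

A tagger could use the returned suffix as one feature among others when guessing the tag of an unseen word; deciding whether the stem is a known root would need a lexicon, which this sketch does not include.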
References
Brants, T.: TnT: a statistical part-of-speech tagger. In: Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. 224–231 (2000)
Brill, E.: Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Comput. Linguist. 21, 543–565 (1995). MIT Press, Cambridge
Emenanjo, N.E.: Elements of Modern Igbo Grammar: A Descriptive Approach. Oxford University Press, Ibadan (1978)
Halácsy, P., Kornai, A., Oravecz, C.: HunPos: an open source trigram tagger. In: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pp. 209–212 (2007)
Kupiec, J.: Robust part-of-speech tagging using a hidden Markov model. J. Comput. Speech Lang. 6(3), 225–242 (1992)
Ngai, G., Florian, R.: Transformation-based learning in the fast lane. In: Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, pp. 1–8 (2001)
Onyenwe, I.E., Uchechukwu, C., Hepple, M.: Part-of-speech tagset and corpus development for Igbo, an African language. In: Proceedings of LAW VIII, the 8th Linguistic Annotation Workshop 2014, in conjunction with COLING 2014, Dublin, Ireland, 23–24 August 2014, pp. 93–98. Association for Computational Linguistics (2014)
Onyenwe, I.E., Hepple, M., Uchechukwu, C., Ezeani, I.: Use of transformation-based learning in annotation pipeline of Igbo, an African language. In: Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, p. 24 (2015)
Ratnaparkhi, A., et al.: A maximum entropy model for part-of-speech tagging. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, vol. 1, pp. 133–142 (1996)
Samuelsson, C.: Morphological tagging based entirely on Bayesian inference. In: 9th Nordic Conference on Computational Linguistics (1993)
Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol. 1, pp. 173–180 (2003)
Acknowledgments
We acknowledge the financial support of the Tertiary Education Trust Fund, Nigeria, and Nnamdi Azikiwe University (NAU), Nigeria. Many thanks to Dr. Uchechukwu Chinedu of the Linguistics Department, NAU, for his very helpful discussions.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Onyenwe, I.E., Hepple, M. (2016). Predicting Morphologically-Complex Unknown Words in Igbo. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_24
DOI: https://doi.org/10.1007/978-3-319-45510-5_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45509-9
Online ISBN: 978-3-319-45510-5
eBook Packages: Computer Science (R0)