Predicting Morphologically-Complex Unknown Words in Igbo

  • Conference paper
Text, Speech, and Dialogue (TSD 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9924)

Abstract

The effective handling of previously unseen words is an important factor in the performance of part-of-speech (POS) taggers. Some trainable POS taggers use suffix (and sometimes prefix) strings as cues for handling unknown words, in effect using them as a proxy for actual linguistic affixes. In the context of creating a tagger for the African language Igbo, we compare the performance of some existing taggers that implement this approach with a novel method for handling morphologically complex unknown words, based on morphological reconstruction (i.e. a linguistically informed segmentation into root and affixes). The novel method outperforms the other systems by several percentage points, achieving accuracies of around 92% on morphologically complex unknown words.
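
To make the suffix-cue idea concrete, the following is a minimal Python sketch of a guesser that tags an unknown word by the longest word-final string seen in training. It is an illustration only, not the implementation of any of the taggers compared in the paper, and it omits the smoothing that real taggers apply. The function names, the `max_len` cutoff, and the toy tags `V_PAST` and `NOUN` are assumptions made for the example; the example words come from the notes below.

```python
import unicodedata
from collections import Counter, defaultdict

def _nfc(word):
    # Igbo dotted vowels may be encoded as one or two code points;
    # NFC normalization makes character slicing consistent.
    return unicodedata.normalize("NFC", word)

def train_suffix_model(tagged_words, max_len=3):
    """Count how often each word-final string (1..max_len chars) occurs with each tag."""
    suffix_tags = defaultdict(Counter)
    for word, tag in tagged_words:
        word = _nfc(word)
        for n in range(1, min(max_len, len(word)) + 1):
            suffix_tags[word[-n:]][tag] += 1
    return suffix_tags

def guess_tag(word, suffix_tags, max_len=3, default="UNK"):
    """Return the most frequent tag of the longest word-final string seen in training."""
    word = _nfc(word)
    for n in range(min(max_len, len(word)), 0, -1):
        counts = suffix_tags.get(word[-n:])
        if counts:
            return counts.most_common(1)[0][0]
    return default

# Toy training data with hypothetical tags (not the paper's tagset).
toy_training = [
    ("bịara", "V_PAST"),
    ("kọrọ", "V_PAST"),
    ("riri", "V_PAST"),
    ("mmadụ", "NOUN"),
]
model = train_suffix_model(toy_training)
print(guess_tag("nwuru", model))  # "UNK": no ending of "nwuru" was seen in training
```

In this toy setting the guesser fails on "nwuru" because the literal strings "u", "ru" and "uru" never occur at the end of a training word, even though the word carries the same kind of ending as the training verbs. This is the kind of gap that a linguistically informed segmentation is meant to close.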

Notes

  1. Obtained from jw.org.

  2. “Mmadụ Ka A Na-Arịa”, written in 2013.

  3. Here, “rV” means the letter r followed by any vowel (a, e, u, o, i, ị, ọ, ụ), attached to a word in Igbo, as in “bịara” (came), “kọrọ” (told), “riri” (ate), “nwuru” (shone), etc. It is a past tense marker if attached to an active verb, or indicates a stative/passive meaning if attached to a stative verb [3]. Therefore, it is an important cue in predicting past tense verbs or verbs with the applicative meaning “APP”.
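
The surface form of the “rV” cue in note 3 can be sketched as a simple check in Python. This is only an assumed illustration of the pattern "r plus one of the listed vowels at the end of a word"; it is not the morphological reconstruction procedure of the paper, and it will also accept words that merely happen to end in r plus a vowel. The function name `has_rv_ending` and the requirement of at least one character before the ending are assumptions made for the example.

```python
import unicodedata

# The vowels listed in note 3, normalized so that dotted vowels are single code points.
IGBO_VOWELS = set(unicodedata.normalize("NFC", "aeuoiịọụ"))

def has_rv_ending(word):
    """True if the word ends in 'r' followed by one of the vowels listed in note 3."""
    w = unicodedata.normalize("NFC", word.lower())
    # Assumption: at least one character must precede the rV ending.
    return len(w) > 2 and w[-2] == "r" and w[-1] in IGBO_VOWELS

for example in ["bịara", "kọrọ", "riri", "nwuru", "mmadụ"]:
    print(example, has_rv_ending(example))
```

The four verbs from the note pass the check, while “mmadụ” does not. A practical system would still need to confirm that the remainder of the word is a plausible verb root before treating the ending as a suffix.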

References

  1. Brants, T.: TnT: a statistical part-of-speech tagger. In: Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. 224–231 (2000)

  2. Brill, E.: Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Comput. Linguist. 21, 543–565 (1995). MIT Press, Cambridge

  3. Emenanjo, N.E.: Elements of Modern Igbo Grammar: A Descriptive Approach. Oxford University Press, Ibadan (1978)

  4. Halácsy, P., Kornai, A., Oravecz, C.: HunPos: an open source trigram tagger. In: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pp. 209–212 (2007)

  5. Kupiec, J.: Robust part-of-speech tagging using a hidden Markov model. J. Comput. Speech Lang. 6(3), 225–242 (1992)

  6. Ngai, G., Florian, R.: Transformation-based learning in the fast lane. In: Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, pp. 1–8 (2001)

  7. Onyenwe, I.E., Uchechukwu, C., Hepple, M.: Part-of-speech tagset and corpus development for Igbo, an African language. In: Proceedings of LAW VIII, the 8th Linguistic Annotation Workshop 2014, in conjunction with COLING 2014, Dublin, Ireland, 23–24 August 2014, pp. 93–98. Association for Computational Linguistics (2014)

  8. Onyenwe, I.E., Hepple, M., Uchechukwu, C., Ezeani, I.: Use of transformation-based learning in annotation pipeline of Igbo, an African language. In: Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, p. 24 (2015)

  9. Ratnaparkhi, A., et al.: A maximum entropy model for part-of-speech tagging. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, vol. 1, pp. 133–142 (1996)

  10. Samuelsson, C.: Morphological tagging based entirely on Bayesian inference. In: 9th Nordic Conference on Computational Linguistics (2013)

  11. Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol. 1, pp. 173–180 (2003)

Acknowledgments

We acknowledge the financial support of the Tertiary Education Trust Fund, Nigeria, and Nnamdi Azikiwe University (NAU), Nigeria. Many thanks to Dr. Uchechukwu Chinedu of the linguistics department, NAU, for his very helpful discussion.

Author information

Corresponding author

Correspondence to Ikechukwu E. Onyenwe.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Onyenwe, I.E., Hepple, M. (2016). Predicting Morphologically-Complex Unknown Words in Igbo. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science (LNAI), vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_24

  • DOI: https://doi.org/10.1007/978-3-319-45510-5_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45509-9

  • Online ISBN: 978-3-319-45510-5

  • eBook Packages: Computer Science, Computer Science (R0)
