Abstract
This paper evaluates the available part-of-speech (POS) tagsets developed in India for tagging Sanskrit and other Indian languages. The tagsets evaluated are the JNU-Sanskrit tagset (JPOS), the Sanskrit consortium tagset (CPOS), the MSRI-Sanskrit tagset (IL-POST), the IIIT Hyderabad tagset (ILMT POS), and the CIIL Mysore tagset for the Linguistic Data Consortium for Indian Languages (LDCIL) project (LDCPOS). The main goal is to assess how suitable the existing tagsets are for Sanskrit from various Natural Language Processing (NLP) points of view.
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gopal, M., Mishra, D., Singh, D.P. (2010). Evaluating Tagsets for Sanskrit. In: Jha, G.N. (ed.) Sanskrit Computational Linguistics. ISCLS 2010. Lecture Notes in Computer Science, vol 6465. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17528-2_11
DOI: https://doi.org/10.1007/978-3-642-17528-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17527-5
Online ISBN: 978-3-642-17528-2
eBook Packages: Computer Science (R0)