DOI: 10.5555/2206329.2206341

One-step statistical parsing of hybrid dependency-constituency syntactic representations

Published: 05 October 2011

Abstract

In this paper, we describe and compare two statistical parsing approaches for the hybrid dependency-constituency syntactic representation used in the Quranic Arabic Treebank (Dukes and Buckwalter, 2010). In our first approach, we apply a multi-step process in which we use a shift-reduce algorithm trained on a pure dependency preprocessed version of the treebank. After parsing, the dependency output is converted into the hybrid representation. This is compared to a novel one-step parser that is able to learn the hybrid representation without preprocessing. We define an extended labelled attachment score (ELAS) as our performance metric for hybrid parsing, and report 87.47% (F1 score) for the multi-step approach, and 89.03% (F1 score) for the one-step integrated algorithm. We also consider the effect of using different sets of morphological features for parsing the Quran, comparing our results to recent work on Modern Standard Arabic.
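The abstract reports ELAS as an F1 score but does not give its definition. The following is a minimal sketch of how a precision/recall/F1 score over labelled edges could be computed, presumably the motivation being that a hybrid parse may contain a different number of phrase nodes (and therefore edges) than the gold tree. Everything here is an assumption made for illustration: the `(head, dependent, label)` edge encoding, the node names such as `w3-w5` for a phrase span, the labels, and the function name `labelled_f1` are hypothetical and are not taken from the paper or the treebank.

```python
from typing import Iterable, Set, Tuple

# Hypothetical edge encoding: (head, dependent, label).
# A node is either a single token ("w2") or a phrase span ("w3-w5").
Edge = Tuple[str, str, str]


def labelled_f1(gold: Iterable[Edge], predicted: Iterable[Edge]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching labelled edges."""
    gold_set: Set[Edge] = set(gold)
    pred_set: Set[Edge] = set(predicted)
    # An edge counts as correct only if head, dependent and label all match.
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only; labels are placeholders, not the treebank's tag set.
gold_edges = [
    ("w2", "w1", "subj"),
    ("w2", "w3-w5", "obj"),    # edge whose dependent is a phrase node
    ("w4", "w3", "adj"),
    ("w4", "w5", "poss"),
]
predicted_edges = [
    ("w2", "w1", "subj"),
    ("w2", "w3-w5", "obj"),
    ("w4", "w3", "adj"),
    ("w4", "w5", "adj"),       # correct attachment, wrong label
]

p, r, f = labelled_f1(gold_edges, predicted_edges)
print(f"{p:.2f} {r:.2f} {f:.2f}")  # prints "0.75 0.75 0.75"
```

In this toy case three of four edges match on both sides, so precision, recall and F1 coincide; they diverge as soon as the predicted tree has more or fewer edges than the gold tree.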

References

[1]
Kepa Bengoetxea and Koldo Gojenola. 2010. Application of Different Techniques to Dependency Parsing of Basque. In Proceedings of the NAACL/HLT Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010), Los Angeles, California.
[2]
Rajesh Bhatt, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Misra Sharma and Fei Xia. 2009. Multi-Representational and Multi-Layered Treebank for Hindi/Urdu. In Proceedings of the Third Linguistic Annotation Workshop at the conference of the Association for Computational Linguistics (ACL-IJCNLP), Suntec, Singapore.
[3]
Daniel Bikel. 2004. On the Parameter Space of Lexicalized Statistical Parsing Models. PhD thesis, Department of Computer and Information Sciences. University of Pennsylvania.
[4]
Ezra Black et al. 1991. A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars. In Proceedings of the February 1991 DARPA Speech and Natural Language Workshop.
[5]
Georges Bohas, Jean-Patrick Guillaume, and Djamel Eddin Kouloughli. 1990. The Arabic linguistic tradition. Arabic Thought and Culture. Routledge.
[6]
Chih-Chung Chang and Chih-Jen Lin. 2001. LIBSVM: A Library for Support Vector Machines. Technical Report, Department of Computer Science and Information Engineering, National Taiwan University.
[7]
Eugene Charniak. 2000. A Maximum-entropy-inspired Parser. In Proceedings of the 1st Annual Meeting of the North American Chapter of the ACL (NAACL), Seattle.
[8]
Michael Collins. 1999. Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
[9]
Kais Dukes, Eric Atwell and Nizar Habash. 2011. Supervised Collaboration for Syntactic Annotation of Quranic Arabic. To appear in Language Resources and Evaluation Journal (LREJ): Special Issue on Collaboratively Constructed Language Resources.
[10]
Kais Dukes, Eric Atwell and Abdul-Baquee Sharaf. 2010. Syntactic annotation guidelines for the Quranic Arabic Dependency Treebank. In Proceedings of the Language Resources and Evaluation Conference (LREC). Valletta, Malta.
[11]
Kais Dukes and Timothy Buckwalter. 2010. A Dependency Treebank of the Quran using Traditional Arabic Grammar. In Proceedings of the 7th International Conference on Informatics and Systems (INFOS). Cairo, Egypt.
[12]
Kais Dukes and Nizar Habash. 2010. Morphological Annotation of Quranic Arabic. In Proceedings of the Language Resources and Evaluation Conference (LREC). Valletta, Malta.
[13]
Gülşen Eryiğit, Joakim Nivre and Kemal Oflazer. 2008. Dependency Parsing of Turkish. Computational Linguistics.
[14]
George Forman and Martin Scholz. 2009. Apples-to-apples in Cross-validation Studies: Pitfalls in Classifier Performance Measurement. HP Technical Reports, HPL-2009-359.
[15]
Ryan Gabbard, Seth Kulick, and Mitchell Marcus. 2006. Fully parsing the Penn Treebank. In Proceedings of the Human Language Technology Conference of the NAACL, New York.
[16]
Yoav Goldberg and Michael Elhadad. 2010. Easy-First Dependency Parsing of Modern Hebrew. In Proceedings of the NAACL/HLT Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010), Los Angeles, California.
[17]
Yoav Goldberg and Reut Tsarfaty. 2008. A Single Generative Model for Joint Morphological Segmentation and Syntactic Parsing. In Proceedings of ACL-HLT. Columbus, Ohio.
[18]
Nizar Habash and Ryan Roth. 2009. CATiB: The Columbia Arabic Treebank. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers. Suntec, Singapore, August.
[19]
Nizar Habash. 2010. Introduction to Arabic Natural Language Processing. Morgan & Claypool Publishers.
[20]
Johan Hall, Jens Nilsson, Joakim Nivre, Gülşen Eryiğit, Beáta Megyesi, Mattias Nilsson, and Markus Saers. 2007. Single Malt or Blended? A Study in Multilingual Parser Optimization. In Proceedings of EMNLP-CoNLL.
[21]
Johan Hall and Joakim Nivre. 2008. A Dependency-driven Parser for German Dependency and Constituency Representations. In Proceedings of the ACL Workshop on Parsing German (PaGe08), Columbus, Ohio.
[22]
Johan Hall, Joakim Nivre and Jens Nilsson. 2007. A Hybrid Constituency-dependency Parser for Swedish. In Proceedings of NODALIDA, Tartu, Estonia.
[23]
Seth Kulick, Ryan Gabbard, and Mitchell Marcus. 2006. Parsing the Arabic Treebank: Analysis and Improvements. In Proceedings of Treebanks and Linguistic Theories Conference. Prague, Czech Republic.
[24]
Mohamed Maamouri, Ann Bies, Timothy Buckwalter, and Wigdan Mekki. 2004. The Penn Arabic Treebank: Building a Large-scale Annotated Arabic Corpus. In Proceedings of the NEMLAR Conference on Arabic Language Resources and Tools. Cairo, Egypt.
[25]
Mitchell Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics.
[26]
Yuval Marton, Nizar Habash, and Owen Rambow. 2010. Improving Arabic Dependency Parsing with Lexical and Inflectional Morphological Features. In Proceedings of the NAACL/HLT Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010), Los Angeles, California.
[27]
Ryan McDonald, Kevin Lerman, and Fernando Pereira. 2006. Multilingual Dependency Analysis with a Two-stage Discriminative Parser. In Proceedings of CoNLL. New York.
[28]
Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülşen Eryiğit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007a. MaltParser: A Language Independent System for Data-driven Dependency Parsing. Natural Language Engineering.
[29]
Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel and Deniz Yuret. 2007b. The CoNLL 2007 Shared Task on Dependency Parsing. In Proceedings of EMNLP-CoNLL.
[30]
Jonathan Owens. 1988. The Foundations of Grammar: An Introduction to Medieval Arabic Grammatical Theory. Amsterdam Studies in the Theory and History of Linguistic Science. John Benjamins.
[31]
Bahjat Salih. 2007. al-i'rāb al-mufassal li-kitāb allāh al-murattal ('A Detailed Grammatical Analysis of the Recited Quran using i'rāb'). Dar Al-Fikr, Beirut.
[32]
Otakar Smrž, Viktor Bielický, Iveta Kouřilová, Jakub Kráčmar, Jan Hajič, and Petr Zemánek. 2008. Prague Arabic Dependency Treebank: A Word on the Million Words. In Proceedings of the Workshop on Arabic and Local Languages (LREC 2008). Marrakech, Morocco.
[33]
Hiroyasu Yamada and Yuji Matsumoto. 2003. Statistical Dependency Analysis with Support Vector Machines. In Proceedings of International Conference on Parsing Technologies (IWPT 2003).

Cited By

  • (2022) I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical Theory. ACM Transactions on Asian and Low-Resource Language Information Processing 21:2 (1-32). DOI: 10.1145/3472295. Online publication date: 31-Mar-2022

Published In

IWPT '11: Proceedings of the 12th International Conference on Parsing Technologies
October 2011
264 pages
ISBN: 9781932432046

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article
