Abstract
This article describes a heuristic approach to zero subject detection in Polish. It focuses on the zero subject detection as a crucial step in end-to-end coreference resolution. The zero subject verbs are recognized using a set of manually created rules utilizing information from different sources, including: a dependency parser, a shallow relational parser and a valence dictionary. The rules were developed and evaluated on the Polish Coreference Corpus. The experimental results show that the presented method significantly outperforms the only machine learning-based alternative for Polish, i.e., MentionDetector. We also discuss and evaluate the importance of zero subject detection for existing coreference resolution tools for Polish.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Broda, B., Marcińczuk, M., Maziarz, M., Radziszewski, A., Wardyński, A.: KPWr: towards a free corpus of Polish. In: Calzolari, N., Choukri, K., Declerck, T., Doğan, M.U., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S. (eds.) Proceedings of LREC 2012, Istanbul, Turkey. ELRA (2012)
Chomsky, N.: Lectures on government and binding. In: The Pisa Lectures. Foris Publications, Holland (1981)
Russo, L., Loáiciga, S., Gulati, A.: Improving machine translation of null subjects in italian and spanish. In: Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, pp. 81–89. Association for Computational Linguistics, April 2012
Rello, L., Ferraro, G., Gayo, I.: A first approach to the automatic detection of zero subjects and impersonal constructions in portuguese. Procesamiento del Lenguaje Natural 49, 163–170 (2012)
Mihăilă, C., Ilisei, I., Inkpen, D.: Zero pronominal anaphora resolution for the romanian language
Kopeć, M.: Zero subject detection for Polish. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. Short Papers, Gothenburg, Sweden, vol. 2, pp. 221–225. Association for Computational Linguistics (2014)
Ogrodniczuk, M., Głowińska, K., Kopeć, M., Savary, A., Zawisławska, M.: Polish coreference corpus. In: Vetulani, Z. (ed.) Proceedings of the 6th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Poznań, Poland, Wydawnictwo Poznańskie, Fundacja Uniwersytetu im, pp. 494–498. Adama Mickiewicza (2013)
Radziszewski, A.: A tiered CRF tagger for Polish. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Kryszkiewicz, M., Niezgódka, M. (eds.) Intell. Tools for Building a Scientific Information. SCI, vol. 467, pp. 215–230. Springer, Heidelberg (2013)
Przepiórkowski, A., Hajnicz, E., Patejuk, A., Woliński, M., Skwarski, F., Świdziński, M.: Walenty: towards a comprehensive valence dictionary of polish. In: Chair, N.C.C., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland. European Language Resources Association (ELRA), May 2014
Nivre, J., Hall, J., Nilsson, J.: Maltparser: a data-driven parser-generator for dependency parsing. In: Proc. of LREC-2006, pp. 2216–2219 (2006)
Wróblewska, A.: Polish dependency bank. Linguistic Issues in Language Technology 7(1) (2012)
Radziszewski, A., Orłowicz, P., Broda, B.: Classification of predicate-argument relations in Polish data. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds.) IIS 2013. LNCS, vol. 7912, pp. 28–38. Springer, Heidelberg (2013)
Ogrodniczuk, M., Kopeć, M.: Rule-based coreference resolution module for Polish. In: Proceedings of the 8th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC 2011), Faro, Portugal, pp. 191–200 (2011)
Kopeć, M., Ogrodniczuk, M.: Creating a coreference resolution system for Polish. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, pp. 192–195. ELRA (2012)
Broda, B., Burdka, L., Maziarz, M.: IKAR: an improved kit for anaphora resolution for Polish. In: Proceedings of COLING 2012: Demonstration Papers, Mumbai, India, pp. 25–32. The COLING 2012 Organizing Committee, December 2012
Marcińczuk, M., Kocoń, J., Janicki, M.: Liner2 — a customizable framework for proper names recognition for Polish. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Kryszkiewicz, M., Niezgódka, M. (eds.) Intell. Tools for Building a Scientific Information. SCI, vol. 467, pp. 231–254. Springer, Heidelberg (2013)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kaczmarek, A., Marcińczuk, M. (2015). Heuristic Algorithm for Zero Subject Detection in Polish. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science(), vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_43
Download citation
DOI: https://doi.org/10.1007/978-3-319-24033-6_43
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer ScienceComputer Science (R0)