Abstract
Verbal Multi-Word Expressions (VMWEs) are very common in many languages. They include, among other types, Verb-Particle Constructions (VPC) (e.g., get around), Light-Verb Constructions (LVC) (e.g., make a decision), and idioms (ID) (e.g., break a leg). In this paper, we present a new dataset for supervised learning of Yiddish VMWEs. The dataset was manually collected from a web resource and annotated. It contains a set of positive examples (VMWEs) and a set of negative examples (non-VMWEs). The full dataset can be used to train supervised algorithms, and the positive examples alone can serve as seeds for unsupervised bootstrapping algorithms. We also analyze the lexical properties of Yiddish VMWEs by classifying them into six categories: VPC, LVC, ID, Inherently Pronominal Verb (IPronV), Inherently Prepositional Verb (IPrepV), and other (OTH). The analysis suggests some interesting features of VMWEs that merit further exploration. This dataset is a first step towards automatic identification of Yiddish VMWEs, which is important for natural language understanding, generation, and translation systems.
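As an illustration of how such a dataset might be organized, the following minimal Python sketch represents labeled examples with the six categories named above and extracts the positive examples as bootstrapping seeds. The names `VMWECategory`, `Example`, `positive_seeds`, and `category_counts` are hypothetical and do not come from the paper. The toy entries reuse the English examples from the abstract plus one invented negative example (`read a book`); they are placeholders, not Yiddish data from the dataset.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class VMWECategory(Enum):
    """The six categories listed in the abstract."""
    VPC = "Verb-Particle Construction"
    LVC = "Light-Verb Construction"
    ID = "Idiom"
    IPRONV = "Inherently Pronominal Verb"
    IPREPV = "Inherently Prepositional Verb"
    OTH = "Other"


@dataclass(frozen=True)
class Example:
    """One annotated candidate expression (hypothetical record layout)."""
    expression: str
    is_vmwe: bool                              # True for positive examples
    category: Optional[VMWECategory] = None    # set only for positive examples


def positive_seeds(examples: List[Example]) -> List[str]:
    """Return the positive expressions, e.g. as seeds for bootstrapping."""
    return [e.expression for e in examples if e.is_vmwe]


def category_counts(examples: List[Example]) -> Counter:
    """Count the positive examples per category."""
    return Counter(
        e.category for e in examples if e.is_vmwe and e.category is not None
    )


# Toy placeholder data: English examples from the abstract plus one
# invented non-VMWE. These are NOT entries from the Yiddish dataset.
toy = [
    Example("get around", True, VMWECategory.VPC),
    Example("make a decision", True, VMWECategory.LVC),
    Example("break a leg", True, VMWECategory.ID),
    Example("read a book", False),
]

if __name__ == "__main__":
    print(positive_seeds(toy))
    print(category_counts(toy))
```

Keeping the binary VMWE label separate from the category lets the same records serve both the supervised setting (all examples) and the seed-based setting (positives only).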
Notes
- 2. Ashkenaz is the medieval Hebrew name for northern Europe and Germany.
- 9. To facilitate readability, we use a transliteration of Hebrew using Roman characters; the letters used, in Hebrew lexicographic order, are abgdhwzxTiklmns`pcqršt.
Acknowledgments
We would like to express our deep gratitude to Gitty Eithen, Bluma Zicherman, and Hindy Golomb, our research assistants, for carrying out the annotation process.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Liebeskind, C., HaCohen-Kerner, Y. (2018). Verbal Multi-Word Expressions in Yiddish. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_20
DOI: https://doi.org/10.1007/978-3-319-91947-8_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91946-1
Online ISBN: 978-3-319-91947-8
eBook Packages: Computer Science, Computer Science (R0)