Abstract
The behavior of verbs in sublanguages is highly specific and does not follow general principles of lexical decomposition. NLP applications require domain-specific lexicons for tasks such as surface parsing and shallow semantic interpretation. The reduced set of verb senses specific to a given domain is better suited to efficient processing in real-world tasks (e.g., information extraction and retrieval). This paper proposes a method for learning verb subcategorization patterns from corpora. Conceptual clustering techniques are applied to the results of surface parsing in order to extract the relevant senses typical of the domain and to automatically build a lexicon of subcategorization frames. The aim is to learn a core of lexico-grammatical knowledge that can support more sophisticated parsing strategies in a target NLP application. Results derived from several Italian corpora are presented.
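As a rough illustration of the idea of clustering surface-parse output into candidate frames, the sketch below groups the occurrences of one verb by the sets of complements observed with it. It is not the paper's algorithm: it assumes a concept-lattice style of conceptual clustering, and the complement labels (`NP`, `PP_a`, `PP_di`), the function names, the support threshold, and the toy data are all hypothetical.

```python
def formal_concepts(observations):
    """Enumerate (extent, intent) pairs for one verb.

    observations: dict mapping an occurrence id to the set of complement
    labels a shallow parser attached to the verb in that occurrence
    (hypothetical input format).
    """
    rows = {occ: frozenset(attrs) for occ, attrs in observations.items()}

    # Close the observed complement sets under intersection: every
    # resulting set is a pattern shared by at least one occurrence.
    intents = set(rows.values())
    changed = True
    while changed:
        changed = False
        for a in list(intents):
            for b in list(intents):
                shared = a & b
                if shared not in intents:
                    intents.add(shared)
                    changed = True

    # The extent of an intent is every occurrence that contains it.
    concepts = []
    for intent in intents:
        extent = frozenset(o for o, attrs in rows.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts


def candidate_frames(observations, min_support=2):
    """Keep non-empty intents supported by at least min_support occurrences."""
    frames = {}
    for extent, intent in formal_concepts(observations):
        if intent and len(extent) >= min_support:
            frames[intent] = len(extent)
    return frames


# Toy data: four hypothetical occurrences of a single verb.
toy = {
    1: {"NP"},
    2: {"NP", "PP_a"},
    3: {"NP", "PP_a"},
    4: {"PP_di"},
}
frames = candidate_frames(toy, min_support=2)
```

On the toy data, the pattern seen only once (`PP_di`) falls below the threshold, while the bare `NP` pattern and the `NP` plus `PP_a` pattern survive as candidate frames. The intersection closure is quadratic per pass and only suitable for small examples.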
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Basili, R., Pazienza, M.T., Vindigni, M. (1997). Corpus-driven unsupervised learning of verb subcategorization frames. In: Lenzerini, M. (ed.) AI*IA 97: Advances in Artificial Intelligence. AI*IA 1997. Lecture Notes in Computer Science, vol 1321. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63576-9_105
DOI: https://doi.org/10.1007/3-540-63576-9_105
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63576-5
Online ISBN: 978-3-540-69601-8
eBook Packages: Springer Book Archive