Abstract
Question classification is one of the first tasks carried out in a Question Answering system. In this paper we present a multilingual question classification system based on machine learning techniques, using Support Vector Machines to classify the questions. All the features needed to train and test the classifier are extracted automatically and without supervision from statistical information, by comparing the Poisson distributions of single words in two plain-text corpora, one of questions and one of documents. The system therefore needs nothing but plain text for training, which makes the approach flexible and easy to adapt to new languages and domains. We have tested it on a bilingual corpus of questions in English and Spanish.
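The abstract does not say which dissimilarity measure is used to compare the two Poisson distributions, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each word's Poisson rate is estimated as a smoothed count per text, and that words are ranked by the Kullback-Leibler divergence between the two Poisson distributions. The function names, the smoothing constant, and the whitespace tokenization are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def poisson_rates(texts, vocab, smoothing=0.5):
    """Estimate a Poisson rate per word: smoothed count / number of texts.

    The additive smoothing (an assumption of this sketch) keeps every
    rate strictly positive so the divergence below is always defined.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    n_texts = len(texts)
    return {word: (counts[word] + smoothing) / n_texts for word in vocab}


def poisson_kl(lam_p, lam_q):
    """KL divergence KL(Poisson(lam_p) || Poisson(lam_q))."""
    return lam_p * math.log(lam_p / lam_q) - lam_p + lam_q


def rank_words(questions, documents, top_k=5):
    """Rank words that are more frequent in questions than in documents.

    Words are scored by the divergence between their question-corpus
    and document-corpus Poisson distributions, highest score first.
    """
    vocab = {w for text in questions + documents for w in text.lower().split()}
    q_rates = poisson_rates(questions, vocab)
    d_rates = poisson_rates(documents, vocab)
    scores = {
        w: poisson_kl(q_rates[w], d_rates[w])
        for w in vocab
        if q_rates[w] > d_rates[w]  # keep only question-specific words
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Under these assumptions, the top-ranked words could serve as features for an SVM classifier. Because only raw text from the two corpora is needed, the same procedure would apply unchanged to another language.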
This work has been developed in the framework of the project CICYT R2D2 (TIC2003-07158-C04).
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tomás, D., Vicedo, J.L., Bisbal, E., Moreno, L. (2006). Automatic Feature Extraction for Question Classification Based on Dissimilarity of Probability Distributions. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_15
DOI: https://doi.org/10.1007/11816508_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37334-6
Online ISBN: 978-3-540-37336-0
eBook Packages: Computer Science, Computer Science (R0)