Abstract
Question Answering (QA) is a field of Information Retrieval and Natural Language Processing concerned with systems that automatically answer questions posed by humans in natural language. One of the main steps in these systems is Question Classification, in which the system tries to identify the type of question (e.g., whether it asks about a person, a time, or a location) in order to facilitate the generation of a precise answer. Machine learning techniques are commonly employed in such tasks, with the text represented as a vector of features such as bag-of-words, Term Frequency-Inverse Document Frequency (TF-IDF), or word embeddings. However, the quality of the results produced by supervised algorithms depends on the existence of a large, domain-dependent training dataset, which is sometimes unavailable because manually annotating datasets is labor-intensive. Word embeddings usually perform relatively better on small training sets, while bag-of-words and TF-IDF perform better on large ones. In this work, we propose a hybrid model that combines TF-IDF and word embeddings to predict the answer type of text questions using both small and large training sets. Our experiments on Portuguese, using training sets of several different sizes, showed that the proposed hybrid model statistically outperforms the bag-of-words, TF-IDF, and word embedding approaches.
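The abstract names three text representations and a hybrid of two of them, but it does not say how the hybrid is built. The following Python sketch illustrates the general idea under stated assumptions: it computes bag-of-words counts, scales them with a smoothed IDF, averages word vectors, and concatenates the TF-IDF part and the embedding part into a single feature vector. The concatenation strategy, the toy two-dimensional vectors in `TOY_EMBEDDINGS`, the example questions and labels, and the 1-nearest-neighbour stand-in `predict_answer_type` are all hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy 2-dimensional embeddings, for illustration only;
# a real system would load pre-trained Portuguese word vectors.
TOY_EMBEDDINGS = {
    "quem": [1.0, 0.0],
    "quando": [0.0, 1.0],
    "onde": [-1.0, 0.0],
    "brasil": [0.5, 0.5],
}
DIM = 2


def tokenize(text):
    # Lowercased whitespace tokenization; a stand-in for a proper Portuguese tokenizer.
    return text.lower().split()


def idf_weights(token_sets, vocab):
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1, so a term present in every
    # training question still receives weight 1 instead of 0.
    n = len(token_sets)
    df = {w: sum(1 for s in token_sets if w in s) for w in vocab}
    return [math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab]


def tf_idf(tokens, vocab, idf):
    # Bag-of-words counts over the training vocabulary, scaled by IDF.
    counts = Counter(tokens)
    return [counts[w] * weight for w, weight in zip(vocab, idf)]


def mean_embedding(tokens, embeddings, dim):
    # Average the vectors of known tokens; out-of-vocabulary tokens are skipped.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def hybrid_features(tokens, vocab, idf, embeddings, dim):
    # Assumed combination: concatenate the sparse TF-IDF part and the dense embedding part.
    return tf_idf(tokens, vocab, idf) + mean_embedding(tokens, embeddings, dim)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_answer_type(question, train, vocab, idf, embeddings, dim):
    # Hypothetical 1-nearest-neighbour by cosine similarity, standing in for a trained classifier.
    q = hybrid_features(tokenize(question), vocab, idf, embeddings, dim)
    return max(train, key=lambda item: cosine(q, item[0]))[1]


# Hypothetical labelled training questions (answer types: person, time, location).
labelled = [
    ("quem descobriu o brasil", "PERSON"),
    ("quando o brasil foi descoberto", "TIME"),
    ("onde fica o brasil", "LOCATION"),
]
token_lists = [tokenize(q) for q, _ in labelled]
vocab = sorted({w for toks in token_lists for w in toks})
idf = idf_weights([set(toks) for toks in token_lists], vocab)
train = [
    (hybrid_features(toks, vocab, idf, TOY_EMBEDDINGS, DIM), label)
    for toks, (_, label) in zip(token_lists, labelled)
]

print(predict_answer_type("quem descobriu a america", train, vocab, idf, TOY_EMBEDDINGS, DIM))
```

Running the script prints PERSON for the unseen question "quem descobriu a america", because both its TF-IDF terms and its averaged embedding are closest to the PERSON training question. Concatenation keeps the two views separate, so exact word overlap (TF-IDF) and semantic similarity (embeddings) can each contribute; in practice the nearest-neighbour step would be replaced by a supervised classifier, and the dense part would come from pre-trained Portuguese embeddings.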
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Cortes, E.G., Woloszyn, V., Barone, D.A.C. (2018). When, Where, Who, What or Why? A Hybrid Model to Question Answering Systems. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_14
DOI: https://doi.org/10.1007/978-3-319-99722-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99721-6
Online ISBN: 978-3-319-99722-3
eBook Packages: Computer Science, Computer Science (R0)