
Soft pattern matching models for definitional question answering

Published: 01 April 2007

Abstract

We explore probabilistic lexico-syntactic pattern matching, also known as soft pattern matching, in a definitional question answering system. Most current systems use regular expression-based hard matching patterns to identify definition sentences. Such rigid surface matching often fares poorly when faced with language variations. We propose two soft matching models to address this problem: one based on bigrams and the other on the Profile Hidden Markov Model (PHMM). Both models provide a theoretically sound method to model pattern matching as a probabilistic process that generates token sequences. We demonstrate the effectiveness of the models on definition sentence retrieval for definitional question answering. We show that both models significantly outperform the state-of-the-art manually constructed hard matching patterns on recent TREC data.
A critical difference between the two models is that the PHMM has a more complex topology. We experimentally show that the PHMM can handle language variations more effectively but requires more training data to converge.
While we evaluate soft pattern models only on definitional question answering, we believe that both models are generic and can be extended to other areas where lexico-syntactic pattern matching can be applied.
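
To make the bigram idea concrete, the sketch below trains a smoothed bigram model on a few token sequences taken from the context of a definition target, then scores unseen sequences by their length-normalized log probability. It is only an illustration of pattern matching as a probabilistic process that generates token sequences. The class name `BigramSoftPattern`, the `<TARGET>` and `NN` tokens, the additive smoothing, and the toy training data are all assumptions made here, not the formulation or data used in the article.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical boundary markers


class BigramSoftPattern:
    """Toy bigram model over token sequences (illustrative, not the article's model)."""

    def __init__(self, training_sequences, alpha=1.0):
        self.alpha = alpha              # additive smoothing constant (an assumption)
        self.context_counts = Counter() # how often each token appears as a left context
        self.bigram_counts = Counter()  # how often each (previous, current) pair appears
        self.vocab = {END}
        for seq in training_sequences:
            padded = [START] + list(seq) + [END]
            self.vocab.update(seq)
            for prev, cur in zip(padded, padded[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def prob(self, prev, cur):
        # Smoothed P(cur | prev); the "+ 1" reserves one bucket for unseen tokens.
        size = len(self.vocab) + 1
        numerator = self.bigram_counts[(prev, cur)] + self.alpha
        return numerator / (self.context_counts[prev] + self.alpha * size)

    def score(self, seq):
        # Average log probability per transition, so lengths are comparable.
        padded = [START] + list(seq) + [END]
        pairs = list(zip(padded, padded[1:]))
        return sum(math.log(self.prob(p, c)) for p, c in pairs) / len(pairs)


# Made-up training contexts around a definition target.
train = [
    ["<TARGET>", ",", "a", "NN", "who"],
    ["<TARGET>", ",", "the", "NN", "of"],
    ["<TARGET>", "is", "a", "NN"],
]
model = BigramSoftPattern(train)

exact = model.score(["<TARGET>", ",", "a", "NN", "who"])   # seen in training
close = model.score(["<TARGET>", ",", "an", "NN", "who"])  # one token varies
far = model.score(["who", "NN", "<TARGET>", "an"])         # scrambled order
print(exact, close, far)
```

A regular-expression pattern built from the first training sequence would reject the `close` variant outright, whereas the smoothed model still gives it a score between the exact match and the scrambled sequence. That graded behavior is what the term soft matching refers to.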

Published In

ACM Transactions on Information Systems, Volume 25, Issue 2
April 2007
141 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/1229179

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Soft patterns
  2. definitional question answering

Cited By

  • (2024) AgriResponse: A Real-Time Agricultural Query-Response Generation System for Assisting Nationwide Farmers. IEEE Access 12 (294-311). DOI: 10.1109/ACCESS.2023.3339253. Online publication date: 2024
  • (2023) Question Answering System Approaches: A Review. International Journal of Advanced Research in Science, Communication and Technology (288-296). DOI: 10.48175/IJARSCT-8301. Online publication date: 8-Feb-2023
  • (2023) Around the GLOBE: Numerical Aggregation Question-answering on Heterogeneous Genealogical Knowledge Graphs with Deep Neural Networks. Journal on Computing and Cultural Heritage 16:3 (1-24). DOI: 10.1145/3586081. Online publication date: 9-Aug-2023
  • (2023) A deep natural language processing‐based method for ontology learning of project‐specific properties from building information models. Computer-Aided Civil and Infrastructure Engineering. DOI: 10.1111/mice.13013. Online publication date: 27-Apr-2023
  • (2022) Question Answer System: A State-of-Art Representation of Quantitative and Qualitative Analysis. Big Data and Cognitive Computing 6:4 (109). DOI: 10.3390/bdcc6040109. Online publication date: 7-Oct-2022
  • (2022) Transformer models used for text-based question answering systems. Applied Intelligence 53:9 (10602-10635). DOI: 10.1007/s10489-022-04052-8. Online publication date: 20-Aug-2022
  • (2021) A 3-phase approach based on sequential mining and dependency parsing for enhancing hypernym patterns performance. The Knowledge Engineering Review 36. DOI: 10.1017/S0269888921000126. Online publication date: 22-Sep-2021
  • (2020) Definitional Question Answering Using Text Triplets. Data Engineering and Communication Technology (119-130). DOI: 10.1007/978-981-15-1097-7_10. Online publication date: 9-Jan-2020
  • (2020) From Syntactic Structure to Semantic Relationship: Hypernym Extraction from Definitions by Recurrent Neural Networks Using the Part of Speech Information. The Semantic Web – ISWC 2020 (529-546). DOI: 10.1007/978-3-030-62419-4_30. Online publication date: 2-Nov-2020
  • (2018) An E-Commerce Customer Service Robot Based on Intention Recognition Model. Mobile Commerce (328-339). DOI: 10.4018/978-1-5225-2599-8.ch017. Online publication date: 2018
