Soft pattern matching models for definitional question answering

Authors:

Tat-Seng Chua

ACM Transactions on Information Systems (TOIS), Volume 25, Issue 2

Pages 8 - es

https://doi.org/10.1145/1229179.1229182

Published: 01 April 2007

Abstract

We explore probabilistic lexico-syntactic pattern matching, also known as soft pattern matching, in a definitional question answering system. Most current systems use regular expression-based hard matching patterns to identify definition sentences. Such rigid surface matching often fares poorly when faced with language variations. We propose two soft matching models to address this problem: one based on bigrams and the other on the Profile Hidden Markov Model (PHMM). Both models provide a theoretically sound method to model pattern matching as a probabilistic process that generates token sequences. We demonstrate the effectiveness of the models on definition sentence retrieval for definitional question answering. We show that both models significantly outperform the state-of-the-art manually constructed hard matching patterns on recent TREC data.
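To make the contrast between hard and soft matching concrete, here is a minimal sketch of a bigram-style soft matcher. It is not the authors' formulation: the token sequences, the `<TERM>` placeholder, the interpolation weight `lam`, and the add-one smoothing are all assumptions chosen for illustration. The idea it shows is that a sequence is scored by how likely the model is to generate it, so a near-miss variant still receives a usable score where a regular expression simply fails.

```python
import math
import re
from collections import Counter

START = "<s>"  # hypothetical start-of-sequence marker


def train_bigram(sequences):
    """Count unigrams and bigrams over the training token sequences."""
    unigrams, bigrams = Counter(), Counter()
    for seq in sequences:
        padded = [START] + list(seq)
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(seq, model, lam=0.7):
    """Length-normalised log-probability of seq under an interpolated bigram model."""
    unigrams, bigrams = model
    total = sum(unigrams.values())
    vocab_size = len(unigrams) + 1  # one extra slot reserved for unseen tokens
    padded = [START] + list(seq)
    score = 0.0
    for prev, tok in zip(padded, padded[1:]):
        # Add-one smoothed unigram estimate keeps every token's probability non-zero.
        p_uni = (unigrams[tok] + 1) / (total + vocab_size)
        # Maximum-likelihood bigram estimate; zero when the context was never seen.
        p_bi = bigrams[(prev, tok)] / unigrams[prev] if unigrams[prev] else 0.0
        score += math.log(lam * p_bi + (1 - lam) * p_uni)
    return score / max(len(seq), 1)


# Toy training data (assumed): token/POS sequences around a search term.
training = [
    ["<TERM>", ",", "a", "NN", "of"],
    ["<TERM>", ",", "the", "NN", "of"],
    ["<TERM>", "is", "a", "NN"],
    ["<TERM>", "is", "the", "NN", "of"],
]
model = train_bigram(training)

exact = ["<TERM>", ",", "a", "NN", "of"]
close_variant = ["<TERM>", ",", "also", "a", "NN", "of"]  # one inserted token
unrelated = ["said", "on", "Monday", "that"]

# A hand-written hard pattern covering the comma-style definitions only.
HARD_PATTERN = re.compile(r"^<TERM> , (a|the) NN of$")

for name, seq in [("exact", exact), ("variant", close_variant), ("unrelated", unrelated)]:
    hard = bool(HARD_PATTERN.match(" ".join(seq)))
    print(f"{name:10s} hard={hard!s:5s} soft={log_prob(seq, model):.2f}")
```

In this toy setting the hard pattern accepts only the exact sequence, while the soft score ranks the variant between the exact match and the unrelated sequence, which is the graded behaviour that rigid surface matching cannot provide.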

A critical difference between the two models is that the PHMM has a more complex topology. We experimentally show that the PHMM can handle language variations more effectively but requires more training data to converge.
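The extra topology can be pictured with a generic profile HMM, in which each pattern position has a match state, an insert state (for extra tokens), and a delete state (for skipped positions). The sketch below is a textbook-style forward pass, not the paper's model: the transition values in `t`, the emission tables, and the constants `p_unseen` and `p_insert_emit` are made-up assumptions, and the simplified topology omits insert-to-delete and delete-to-insert transitions. In a trained model each of these tables would have to be estimated from data, which hints at why such a model may need more training examples than a bigram model.

```python
def phmm_forward(seq, match_emit, p_unseen=0.01, p_insert_emit=0.05):
    """Forward probability of seq under a toy profile HMM.

    match_emit: one dict per match state, mapping token -> emission probability.
    p_unseen: assumed emission probability for tokens missing from a match state.
    p_insert_emit: assumed flat emission probability for any token in an insert state.
    """
    L, n = len(match_emit), len(seq)
    # Shared transition probabilities (illustrative values, not estimated from data).
    t = {"MM": 0.8, "MI": 0.1, "MD": 0.1,
         "IM": 0.6, "II": 0.4,
         "DM": 0.6, "DD": 0.4}
    # fX[j][i]: probability of having emitted seq[:i] and being in state X_j.
    fM = [[0.0] * (n + 1) for _ in range(L + 1)]
    fI = [[0.0] * (n + 1) for _ in range(L + 1)]
    fD = [[0.0] * (n + 1) for _ in range(L + 1)]
    fM[0][0] = 1.0  # the begin state is treated as M_0
    for j in range(L + 1):
        for i in range(n + 1):
            if j > 0 and i > 0:
                # Match state j consumes token i and advances the profile.
                e = match_emit[j - 1].get(seq[i - 1], p_unseen)
                fM[j][i] = e * (fM[j - 1][i - 1] * t["MM"]
                                + fI[j - 1][i - 1] * t["IM"]
                                + fD[j - 1][i - 1] * t["DM"])
            if i > 0:
                # Insert state j consumes token i without advancing the profile.
                fI[j][i] = p_insert_emit * (fM[j][i - 1] * t["MI"]
                                            + fI[j][i - 1] * t["II"])
            if j > 0:
                # Delete state j advances the profile without consuming a token.
                fD[j][i] = fM[j - 1][i] * t["MD"] + fD[j - 1][i] * t["DD"]
    # Transition into the end state, treated here like a final match transition.
    return fM[L][n] * t["MM"] + fI[L][n] * t["IM"] + fD[L][n] * t["DM"]


# Toy four-position profile (assumed values): "<TERM> , a|the NN".
profile = [
    {"<TERM>": 0.97},
    {",": 0.60, "is": 0.37},
    {"a": 0.50, "the": 0.47},
    {"NN": 0.97},
]

exact = ["<TERM>", ",", "a", "NN"]
inserted = ["<TERM>", ",", "also", "a", "NN"]  # extra token absorbed by an insert state
deleted = ["<TERM>", "a", "NN"]                # missing token absorbed by a delete state
unrelated = ["said", "on", "Monday", "that"]

for name, seq in [("exact", exact), ("inserted", inserted),
                  ("deleted", deleted), ("unrelated", unrelated)]:
    print(f"{name:10s} {phmm_forward(seq, profile):.3e}")
```

Sequences with an inserted or dropped token keep a probability well above that of unrelated text, because the insert and delete states give them an explicit path through the model.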

While we evaluate soft pattern models only on definitional question answering, we believe that both models are generic and can be extended to other areas where lexico-syntactic pattern matching can be applied.

Index Terms

Soft pattern matching models for definitional question answering
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Information & Contributors

Information

Published In

ACM Transactions on Information Systems Volume 25, Issue 2

April 2007

141 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/1229179

Copyright © 2007 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Bibliometrics

Total Citations: 49
Total Downloads: 1,172
Downloads (Last 12 months): 9
Downloads (Last 6 weeks): 2

Reflects downloads up to 13 Jan 2025

Cited By

Godara S, Bedi J, Parsad R, Singh D, Bana R, Marwaha S (2024) AgriResponse: A Real-Time Agricultural Query-Response Generation System for Assisting Nationwide Farmers. IEEE Access 12 (294-311). Online publication date: 2024
https://doi.org/10.1109/ACCESS.2023.3339253
Mandar Suryavanshi (2023) Question Answering System Approaches: A Review. International Journal of Advanced Research in Science, Communication and Technology (288-296). Online publication date: 8-Feb-2023
https://doi.org/10.48175/IJARSCT-8301
Suissa O, Zhitomirsky-geffet M, Elmalech A (2023) Around the GLOBE: Numerical Aggregation Question-answering on Heterogeneous Genealogical Knowledge Graphs with Deep Neural Networks. Journal on Computing and Cultural Heritage 16:3 (1-24). Online publication date: 9-Aug-2023
https://dl.acm.org/doi/10.1145/3586081
Yin M, Tang L, Webster C, Yi X, Ying H, Wen Y (2023) A deep natural language processing‐based method for ontology learning of project‐specific properties from building information models. Computer-Aided Civil and Infrastructure Engineering. Online publication date: 27-Apr-2023
https://doi.org/10.1111/mice.13013
Zope B, Mishra S, Shaw K, Vora D, Kotecha K, Bidwe R (2022) Question Answer System: A State-of-Art Representation of Quantitative and Qualitative Analysis. Big Data and Cognitive Computing 6:4 (109). Online publication date: 7-Oct-2022
https://doi.org/10.3390/bdcc6040109
Nassiri K, Akhloufi M (2022) Transformer models used for text-based question answering systems. Applied Intelligence 53:9 (10602-10635). Online publication date: 20-Aug-2022
https://dl.acm.org/doi/10.1007/s10489-022-04052-8
Issa Alaa Aldine A, Harzallah M, Berio G, Béchet N, Faour A (2021) A 3-phase approach based on sequential mining and dependency parsing for enhancing hypernym patterns performance. The Knowledge Engineering Review 36. Online publication date: 22-Sep-2021
https://doi.org/10.1017/S0269888921000126
Kumar C, Anirudh C, Murthy K (2020) Definitional Question Answering Using Text Triplets. Data Engineering and Communication Technology (119-130). Online publication date: 9-Jan-2020
https://doi.org/10.1007/978-981-15-1097-7_10
Tan Y, Wang X, Jia T (2020) From Syntactic Structure to Semantic Relationship: Hypernym Extraction from Definitions by Recurrent Neural Networks Using the Part of Speech Information. The Semantic Web – ISWC 2020 (529-546). Online publication date: 2-Nov-2020
https://dl.acm.org/doi/10.1007/978-3-030-62419-4_30
Peng M, Qin Y, Tang C, Deng X (2018) An E-Commerce Customer Service Robot Based on Intention Recognition Model. Mobile Commerce (328-339). Online publication date: 2018
https://doi.org/10.4018/978-1-5225-2599-8.ch017
