DOI: 10.5555/1641451.1641454

Simple is best: experiments with different document segmentation strategies for passage retrieval

Published: 24 August 2008

Abstract

Passage retrieval is used in question answering (QA) to filter large document collections for text units relevant to answering a given question. In our QA system, we apply standard information retrieval (IR) techniques and index-time passaging in the retrieval component. In this paper, we investigate several ways of dividing documents into passages. In particular, we compare semantically motivated approaches (using coreference chains and discourse clues) with simple window-based techniques. We evaluate both retrieval performance and overall QA performance to study the impact of the different segmentation approaches. Our experiments show that the simple techniques using fixed-sized windows clearly outperform the semantically motivated approaches, which indicates that uniformity in size seems to be more important than semantic coherence in our setup.
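
The window-based segmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the unit or the window sizes used in the experiments, so the sentence-level unit, the function name `window_passages`, and the `size` and `step` parameters below are assumptions made for illustration only, not the authors' implementation.

```python
from typing import List


def window_passages(sentences: List[str], size: int = 3, step: int = 3) -> List[List[str]]:
    """Split a document (a list of sentences) into fixed-size windows.

    size: number of sentences per passage (hypothetical default).
    step: how far the window advances each time; step == size gives
          disjoint passages, step < size gives overlapping (sliding) windows.
    """
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    passages = []
    for start in range(0, len(sentences), step):
        passages.append(sentences[start:start + size])
        # Stop once a window has reached the end of the document,
        # so sliding windows do not emit ever-shorter tails.
        if start + size >= len(sentences):
            break
    return passages


doc = ["s1", "s2", "s3", "s4", "s5"]
disjoint = window_passages(doc, size=2, step=2)
sliding = window_passages(doc, size=3, step=1)
```

With `step` equal to `size`, every sentence lands in exactly one passage and only the last passage can be shorter; with a smaller `step`, neighbouring passages share sentences. In an index-time passaging setup, each resulting passage would be indexed as its own retrieval unit.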

Cited By

  • (2014) Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visual Documents. Proceedings of International Conference on Multimedia Retrieval, pages 217-224. DOI: 10.1145/2578726.2578753. Online publication date: 1-Apr-2014
  • (2008) Using lexico-semantic information for query expansion in passage retrieval for question answering. Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering, pages 50-57. DOI: 10.5555/1641451.1641458. Online publication date: 24-Aug-2008

Published In

IRQA '08: Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering
August 2008
91 pages
ISBN: 9781905593552

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article
