Article

Passage retrieval based on language models

Authors:

W. Bruce Croft

CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management

Pages 375 - 382

https://doi.org/10.1145/584792.584854

Published: 04 November 2002

Abstract

Previous research has shown that passage-level evidence can benefit document retrieval when documents are long or span different subject areas. Recent developments in the language modeling approach to IR have provided a new, effective alternative to traditional retrieval models. These two streams of research motivate us to examine the use of passages in a language model framework. This paper reports on experiments using passages in a simple language model and a relevance model, and compares the results with document-based retrieval. Results from the INQUERY search engine, which is not based on a language modeling approach, are also given for comparison. The test data comprise two heterogeneous document collections and one homogeneous collection. Our experiments show that passage retrieval is feasible in the language modeling context and, more importantly, that it can provide more reliable performance than retrieval based on full documents.
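To make the idea of passage-based scoring in a language model framework concrete, here is a minimal sketch. The abstract does not say how passages are defined, which smoothing method is used, or how passage scores are turned into a document score, so everything below is an assumption for illustration: fixed-length overlapping word windows, query likelihood with Dirichlet smoothing, and a document scored by its best passage. The function names, window sizes, and the value of mu are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def passage_windows(tokens, size, step):
    """Split a token list into overlapping fixed-length windows (assumed passage type)."""
    out = []
    for start in range(0, len(tokens), step):
        out.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return out


def log_query_likelihood(query, text, collection_tf, collection_len, mu=10.0):
    """Log P(query | text) under a unigram model with Dirichlet smoothing (assumed)."""
    tf = Counter(text)
    score = 0.0
    for term in query:
        p_coll = collection_tf[term] / collection_len
        if p_coll == 0.0:
            continue  # ignore query terms that never occur in the collection
        score += math.log((tf[term] + mu * p_coll) / (len(text) + mu))
    return score


def score_by_best_passage(query, doc, collection_tf, collection_len,
                          size=4, step=2, mu=10.0):
    """Score a document by its highest-scoring passage (one possible aggregation)."""
    return max(
        (log_query_likelihood(query, p, collection_tf, collection_len, mu)
         for p in passage_windows(doc, size, step)),
        default=float("-inf"),  # an empty document has no passages
    )


if __name__ == "__main__":
    # Toy collection; the on-topic terms sit in a small part of the first document.
    d1 = "weather rain sun wind passage retrieval cloud storm".split()
    d2 = "rain sun wind cloud storm weather snow hail".split()
    query = ["passage", "retrieval"]
    ctf = Counter(d1 + d2)
    clen = len(d1) + len(d2)
    for name, doc in (("d1", d1), ("d2", d2)):
        print(name,
              "full document:", log_query_likelihood(query, doc, ctf, clen),
              "best passage:", score_by_best_passage(query, doc, ctf, clen))
```

In this toy setting the window that contains both query terms scores higher than the full document does, because the on-topic terms are not diluted by the surrounding off-topic text. That is the intuition behind using passages when documents are long or cover several subjects; it says nothing about the results reported in the paper.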

Cited By

Su Z, Dou Z, Zhu Y, Wen J (2024). Passage-aware Search Result Diversification. ACM Transactions on Information Systems, 42:5 (1-29). Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3653672
Pan M, Zhou S, Li T, Liu Y, Pei Q, Huang A, Huang J (2024). Utilizing passage‐level relevance and kernel pooling for enhancing BERT‐based document reranking. Computational Intelligence, 40:3. Online publication date: 7-Jun-2024.
https://doi.org/10.1111/coin.12656
Sato S (2024). Periphoscape: Enhance Wikipedia Browsing by Presenting Diverse Aspects of Topics. Web Information Systems Engineering – WISE 2024 (352-366). Online publication date: 29-Nov-2024.
https://doi.org/10.1007/978-981-96-0579-8_25

Index Terms

Passage retrieval based on language models
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management

November 2002

704 pages

ISBN:1581134924

DOI:10.1145/584792

General Chair:
Charles Nicholas, University of Maryland Baltimore County

Program Chairs:
David Grossman, Illinois Institute of Technology
Konstantinos Kalpakis, University of Maryland Baltimore County
Sajda Qureshi, Erasmus University, Rotterdam
Han van Dissel, Erasmus University, Rotterdam
Len Seligman, The MITRE Corporation

Copyright © 2002 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

CIKM02: Eleventh ACM International Conference on Information and Knowledge Management

November 4 - 9, 2002

McLean, Virginia, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 121
Total Downloads: 1,116
Downloads (Last 12 months): 34
Downloads (Last 6 weeks): 2

Reflects downloads up to 27 Dec 2024

Citations

Cited By

Kamalloo E, Clarke C, Rafiei D, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). Limitations of Open-Domain Question Answering Benchmarks for Document-level Reasoning. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2123-2128). Online publication date: 19-Jul-2023.
https://dl.acm.org/doi/10.1145/3539618.3592011
Koopman B, Mourad A, Li H, Vegt A, Zhuang S, Gibson S, Dang Y, Lawrence D, Zuccon G (2023). AgAsk: an agent to help answer farmer’s questions from scientific documents. International Journal on Digital Libraries, 25:4 (569-584). Online publication date: 19-Jun-2023.
https://doi.org/10.1007/s00799-023-00369-y
Sangeetha M, Keerthika P, Devendran K, Sridhar S, Raagav S, Vigneshwar T (2022). Compute Query and Document Similarity using Explicit Semantic Analysis. 2022 6th International Conference on Computing Methodologies and Communication (ICCMC) (761-766). Online publication date: 29-Mar-2022.
https://doi.org/10.1109/ICCMC53470.2022.9754087
Wu Z, Liu Y, Mao J, Zhang M, Ma S (2022). Leveraging Document-Level and Query-Level Passage Cumulative Gain for Document Ranking. Journal of Computer Science and Technology, 37:4 (814-838). Online publication date: 30-Jul-2022.
https://doi.org/10.1007/s11390-022-2031-y
Chakraborty A, Ganguly D, Caputo A, Jones G (2022). Kernel density estimation based factored relevance model for multi-contextual point-of-interest recommendation. Information Retrieval, 25:1 (44-90). Online publication date: 1-Mar-2022.
https://dl.acm.org/doi/10.1007/s10791-021-09400-9
Althammer S, Hofstätter S, Sertkan M, Verberne S, Hanbury A (2022). PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval. Advances in Information Retrieval (19-34). Online publication date: 10-Apr-2022.
https://dl.acm.org/doi/10.1007/978-3-030-99736-6_2
Choi E, Palomaki J, Lamm M, Kwiatkowski T, Das D, Collins M (2021). Decontextualization: Making Sentences Stand-Alone. Transactions of the Association for Computational Linguistics, 9 (447-461). Online publication date: 26-Apr-2021.
https://doi.org/10.1162/tacl_a_00377
