DOI: 10.1145/584792.584854
Article

Passage retrieval based on language models

Published: 04 November 2002

Abstract

Previous research has shown that passage-level evidence can bring added benefits to document retrieval when documents are long or span different subject areas. Recent developments in the language modeling approach to IR have provided a new, effective alternative to traditional retrieval models. These two streams of research motivate us to examine the use of passages in a language model framework. This paper reports on experiments using passages in a simple language model and a relevance model, and compares the results with document-based retrieval. Results from the INQUERY search engine, which is not based on a language modeling approach, are also given for comparison. The test data comprise two heterogeneous document collections and one homogeneous collection. Our experiments show that passage retrieval is feasible in the language modeling context and, more importantly, that it can provide more reliable performance than retrieval based on full documents.
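
As a rough illustration of what passage retrieval in a language modeling setting can look like, the sketch below scores each passage by query likelihood and ranks a document by its best-scoring passage. It is not the paper's implementation: the half-overlapping fixed-length windows, the Dirichlet smoothing, the value of `mu`, and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def split_passages(tokens, size=4, step=2):
    """Split a token list into overlapping fixed-length windows (assumed scheme)."""
    if len(tokens) <= size:
        return [tokens]
    last_start = len(tokens) - size
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail of the document is covered
    return [tokens[s:s + size] for s in starts]


def query_log_likelihood(query, passage, coll_counts, coll_len, mu=10.0):
    """log P(query | passage) under a Dirichlet-smoothed unigram model (assumed smoothing)."""
    counts = Counter(passage)
    score = 0.0
    for term in query:
        p_coll = coll_counts.get(term, 0) / coll_len if coll_len else 0.0
        p = (counts[term] + mu * p_coll) / (len(passage) + mu)
        if p == 0.0:
            return float("-inf")  # term never occurs in the collection
        score += math.log(p)
    return score


def rank_by_best_passage(query, docs, size=4, step=2, mu=10.0):
    """Rank document ids by the score of their highest-scoring passage."""
    coll_counts = Counter(t for tokens in docs.values() for t in tokens)
    coll_len = sum(coll_counts.values())
    scores = {
        doc_id: max(
            query_log_likelihood(query, p, coll_counts, coll_len, mu)
            for p in split_passages(tokens, size, step)
        )
        for doc_id, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Here `docs` is a hypothetical mapping from document id to a list of tokens. Taking the maximum over passages is only one way to turn passage scores into a document ranking, and the window size would need tuning on real collections.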


    Published In

    CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management
    November 2002
    704 pages
    ISBN: 1581134924
    DOI: 10.1145/584792
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. information retrieval
    2. language model
    3. passage retrieval

    Qualifiers

    • Article

    Conference

    CIKM02

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Downloads (Last 12 months): 34
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 27 Dec 2024

    Cited By

    • (2024) Passage-aware Search Result Diversification. ACM Transactions on Information Systems 42:5 (1-29). DOI: 10.1145/3653672. Online publication date: 13-May-2024
    • (2024) Utilizing passage‐level relevance and kernel pooling for enhancing BERT‐based document reranking. Computational Intelligence 40:3. DOI: 10.1111/coin.12656. Online publication date: 7-Jun-2024
    • (2024) Periphoscape: Enhance Wikipedia Browsing by Presenting Diverse Aspects of Topics. Web Information Systems Engineering – WISE 2024 (352-366). DOI: 10.1007/978-981-96-0579-8_25. Online publication date: 29-Nov-2024
    • (2023) Limitations of Open-Domain Question Answering Benchmarks for Document-level Reasoning. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2123-2128). DOI: 10.1145/3539618.3592011. Online publication date: 19-Jul-2023
    • (2023) AgAsk: an agent to help answer farmer’s questions from scientific documents. International Journal on Digital Libraries 25:4 (569-584). DOI: 10.1007/s00799-023-00369-y. Online publication date: 19-Jun-2023
    • (2022) Compute Query and Document Similarity using Explicit Semantic Analysis. 2022 6th International Conference on Computing Methodologies and Communication (ICCMC) (761-766). DOI: 10.1109/ICCMC53470.2022.9754087. Online publication date: 29-Mar-2022
    • (2022) Leveraging Document-Level and Query-Level Passage Cumulative Gain for Document Ranking. Journal of Computer Science and Technology 37:4 (814-838). DOI: 10.1007/s11390-022-2031-y. Online publication date: 30-Jul-2022
    • (2022) Kernel density estimation based factored relevance model for multi-contextual point-of-interest recommendation. Information Retrieval 25:1 (44-90). DOI: 10.1007/s10791-021-09400-9. Online publication date: 1-Mar-2022
    • (2022) PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval. Advances in Information Retrieval (19-34). DOI: 10.1007/978-3-030-99736-6_2. Online publication date: 10-Apr-2022
    • (2021) Decontextualization: Making Sentences Stand-Alone. Transactions of the Association for Computational Linguistics 9 (447-461). DOI: 10.1162/tacl_a_00377. Online publication date: 26-Apr-2021
