DOI: 10.1145/319950.320022

A general language model for information retrieval

Published: 01 November 1999

Abstract

Statistical language modeling has been successfully used for speech recognition, part-of-speech tagging, and syntactic parsing. Recently, it has also been applied to information retrieval. According to this new paradigm, each document is viewed as a language sample, and a query as a generation process. The retrieved documents are ranked based on the probabilities of producing a query from the corresponding language models of these documents. In this paper, we present a new language model for information retrieval, which is based on a range of data smoothing techniques, including the Good-Turing estimate, curve-fitting functions, and model combinations. Our model is conceptually simple and intuitive, and can be easily extended to incorporate probabilities of phrases such as word pairs and word triples. Experiments with the Wall Street Journal and TREC4 data sets showed that the performance of our model is comparable to that of INQUERY and better than that of another language model for information retrieval. In particular, word pairs are shown to be useful in improving retrieval performance.
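
The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the paper's model: the abstract gives no formulas, so the code below assumes a unigram model per document, linearly interpolated with a collection-wide model (one simple form of model combination), and shows the textbook Good-Turing adjusted count r* = (r + 1) N_{r+1} / N_r separately. The function names, the weight `lam`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def good_turing_adjusted_counts(counts):
    """Map each observed count r to r* = (r + 1) * N_{r+1} / N_r.

    N_r is the number of distinct terms seen exactly r times. When
    N_{r+1} is zero the raw count is kept; this fallback is a
    simplification chosen for the sketch, not taken from the paper.
    """
    freq_of_freq = Counter(counts.values())
    adjusted = {}
    for r, n_r in freq_of_freq.items():
        n_next = freq_of_freq.get(r + 1, 0)
        adjusted[r] = (r + 1) * n_next / n_r if n_next else float(r)
    return adjusted


def query_log_likelihood(query, doc, collection, lam=0.5):
    """Log-probability that a smoothed unigram model of `doc` generates `query`.

    Assumption: P(t | doc) is mixed with P(t | collection) using the
    hypothetical weight `lam`, so unseen query terms do not zero the score.
    """
    doc_tf, col_tf = Counter(doc), Counter(collection)
    doc_len, col_len = len(doc), len(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_col = col_tf[term] / col_len if col_len else 0.0
        p = lam * p_doc + (1 - lam) * p_col
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank_documents(query, docs, lam=0.5):
    """Return document names sorted by descending query log-likelihood."""
    collection = [term for tokens in docs.values() for term in tokens]
    scores = {
        name: query_log_likelihood(query, tokens, collection, lam)
        for name, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

With two toy documents, `{"d1": ["stock", "market", "falls"], "d2": ["weather", "report", "today"]}`, calling `rank_documents(["stock", "market"], docs)` places `d1` first, because both query terms occur in `d1` and only the collection model contributes for `d2`.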



Published In

CIKM '99: Proceedings of the eighth international conference on Information and knowledge management
November 1999
564 pages
ISBN: 1581131461
DOI: 10.1145/319950
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. curve-fitting functions
  2. Good-Turing estimate
  3. model combinations
  4. statistical language modeling

Qualifiers

  • Article

Conference

CIKM99
CIKM99: Conference on Information and Knowledge Management
November 2 - 6, 1999
Kansas City, Missouri, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2024) Language Statistics at Different Spatial, Temporal, and Grammatical Scales. Entropy 26:9 (734). DOI: 10.3390/e26090734. Online publication date: 29-Aug-2024
  • (2024) Helpful or Harmful? Exploring the Efficacy of Large Language Models for Online Grooming Prevention. Proceedings of the 2024 European Interdisciplinary Cybersecurity Conference (1-10). DOI: 10.1145/3655693.3655694. Online publication date: 5-Jun-2024
  • (2024) FIN2SUM: Advancing AI-Driven Financial Text Summarization with LLMs. 2024 International Conference on Trends in Quantum Computing and Emerging Business Technologies (1-5). DOI: 10.1109/TQCEBT59414.2024.10545078. Online publication date: 22-Mar-2024
  • (2024) Missing Mass Under Random Duplications. 2024 IEEE International Symposium on Information Theory (ISIT) (522-526). DOI: 10.1109/ISIT57864.2024.10619664. Online publication date: 7-Jul-2024
  • (2024) LEq: Large Language Models Generate Expanded Queries for Searching. 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT) (1-4). DOI: 10.1109/ICCCNT61001.2024.10725314. Online publication date: 24-Jun-2024
  • (2024) Iterative Self-Supervised Learning for Legal Similar Case Retrieval. IEEE Access 12 (17231-17241). DOI: 10.1109/ACCESS.2024.3358622. Online publication date: 2024
  • (2024) Confidence Intervals for Parameters of Unobserved Events. Journal of the American Statistical Association (1-20). DOI: 10.1080/01621459.2024.2314318. Online publication date: 7-Feb-2024
  • (2024) Building datasets to support information extraction and structure parsing from electronic theses and dissertations. International Journal on Digital Libraries 25:2 (175-196). DOI: 10.1007/s00799-024-00395-4. Online publication date: 1-Jun-2024
  • (2024) KnowFIRES: A Knowledge-Graph Framework for Interpreting Retrieved Entities from Search. Advances in Information Retrieval (182-188). DOI: 10.1007/978-3-031-56069-9_15. Online publication date: 23-Mar-2024
  • (2024) LaQuE: Enabling Entity Search at Scale. Advances in Information Retrieval (270-285). DOI: 10.1007/978-3-031-56060-6_18. Online publication date: 16-Mar-2024
