NTLM: A Time-Enhanced Language Model Based Ranking Approach for Web Search

Xiaowen Li, Peiquan Jin, Xujian Zhao, Hong Chen & Lihua Yue

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6724)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Time plays an important role in Web search, because most Web pages contain time information and many Web queries are time-related. However, traditional search engines give little consideration to the time information in Web pages; in particular, they do not take it into account when ranking search results. In this paper, we present NTLM, a new ranking algorithm for Web search based on a time-enhanced language model. First, we present an effective algorithm to extract <keyword, content time> pairs from Web pages, which associates each keyword in a Web page with an appropriate content time. Then we introduce the new concept of temporal tf, a time-constrained term frequency, for each keyword. After that, we propose a time-enhanced language model that measures the similarity between temporal-textual queries and Web pages by combining textual relevance and temporal relevance. We compare NTLM with five competitor algorithms on two datasets and different types of queries, using two metrics, MRR and NDCG, to evaluate performance. The experimental results show that NTLM reaches a precision of 93.2% when extracting <keyword, content time> pairs and, in the ranking step, performs best with respect to both MRR and NDCG.
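The ranking idea summarized above can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything here is an assumption rather than NTLM itself: a page is the hypothetical `Page` list of (keyword, content time) pairs with content times reduced to single years; temporal tf is taken to count the occurrences of a term whose content time falls inside the query's time interval; both the textual and the temporal language models use Dirichlet-prior smoothing; and the two log-likelihoods are mixed linearly with a hypothetical weight `lam`. The `mean_reciprocal_rank` helper shows the standard MRR metric mentioned in the evaluation.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical page representation: output of a pair-extraction step, i.e. a list
# of (keyword, content_time) pairs; content_time is a year, or None if unknown.
Page = List[Tuple[str, Optional[int]]]


def temporal_tf(term: str, page: Page, start: int, end: int) -> int:
    """ASSUMED definition of temporal tf: occurrences of `term` whose associated
    content time lies inside the query interval [start, end]."""
    return sum(1 for kw, t in page if kw == term and t is not None and start <= t <= end)


def dirichlet_log_prob(tf: int, doc_len: int, p_coll: float, mu: float) -> float:
    """Dirichlet-prior smoothed log P(term | page)."""
    return math.log((tf + mu * p_coll) / (doc_len + mu))


def collection_probs(pages: Sequence[Page]) -> Dict[str, float]:
    """Maximum-likelihood collection model P(term | C)."""
    counts = Counter(kw for page in pages for kw, _ in page)
    total = sum(counts.values())
    return {kw: c / total for kw, c in counts.items()}


def score(query_terms: Sequence[str], start: int, end: int, page: Page,
          coll_prob: Dict[str, float], lam: float = 0.5, mu: float = 2000.0) -> float:
    """HYPOTHETICAL time-enhanced score: a linear mix of a textual query-likelihood
    score (plain tf) and a temporal one (temporal tf); `lam` weights the textual part."""
    tf = Counter(kw for kw, _ in page)
    doc_len = len(page)
    textual = temporal = 0.0
    for q in query_terms:
        p_c = max(coll_prob.get(q, 0.0), 1e-9)  # floor avoids log(0) for unseen terms
        textual += dirichlet_log_prob(tf[q], doc_len, p_c, mu)
        temporal += dirichlet_log_prob(temporal_tf(q, page, start, end), doc_len, p_c, mu)
    return lam * textual + (1 - lam) * temporal


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         relevant: Sequence[set]) -> float:
    """MRR: mean over queries of 1/rank of the first relevant result (0 if none)."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        total += next((1.0 / i for i, doc in enumerate(ranked, 1) if doc in rel), 0.0)
    return total / len(rankings)


if __name__ == "__main__":
    # Two toy pages with the same plain tf for "olympics" but different content times.
    pages = {
        "a": [("olympics", 2008), ("beijing", 2008), ("olympics", 2008)],
        "b": [("olympics", 1996), ("atlanta", 1996), ("olympics", 1996)],
    }
    probs = collection_probs(list(pages.values()))
    ranked = sorted(pages, key=lambda d: score(["olympics"], 2008, 2008, pages[d], probs),
                    reverse=True)
    print(ranked)                                   # ['a', 'b']
    print(mean_reciprocal_rank([ranked], [{"a"}]))  # 1.0
```

In the toy run both pages tie on textual relevance, so only the temporal component separates them. A linear mix of log-likelihoods is just one plausible way to combine the two signals; the paper's own combination may differ.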

Author information

Authors and Affiliations

School of Computer Science and Technology, University of Science and Technology of China, 230027, Hefei, China
Xiaowen Li, Peiquan Jin, Xujian Zhao, Hong Chen & Lihua Yue

Editor information

Editors and Affiliations

Dickson Computer Systems, 7A Victory Avenue 4/F Homantin, Kowloon, Hong Kong, China
Dickson K. W. Chiu
École Nationale Supérieure de Mécanique et d’Aérotechnique, Laboratoire d’Informatique Scientifique et Industrielle, Téléport 2 - avenue Clément Ader, 86961, Futuroscope Chasseneuil Cedex, France
Ladjel Bellatreche
Dept. of Computer Science and Engineering, Ritsumeikan University, Wakakusa 6-4-10, 525-0045, Kusatsu, Shiga, Japan
Hideyasu Sasaki
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong, China
Ho-fung Leung
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Shing-Chi Cheung
School of Computer Science, Hangzhou Dianzi University, Xiasha Higher Education Zone, 310018, Hangzhou City, Zhejiang, China
Haiyang Hu
Department of Computer Science and Software Engineering, The University of Melbourne, 3010, Parkville, Victoria, Australia
Jie Shao

About this paper

Cite this paper

Li, X., Jin, P., Zhao, X., Chen, H., Yue, L. (2011). NTLM: A Time-Enhanced Language Model Based Ranking Approach for Web Search. In: Chiu, D.K.W., et al. Web Information Systems Engineering – WISE 2010 Workshops. WISE 2010. Lecture Notes in Computer Science, vol 6724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24396-7_13

DOI: https://doi.org/10.1007/978-3-642-24396-7_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24395-0
Online ISBN: 978-3-642-24396-7
eBook Packages: Computer Science, Computer Science (R0)