Abstract
Time plays an important role in Web search: most Web pages contain temporal information, and many Web queries are time-related. However, traditional search engines give little consideration to the temporal information in Web pages; in particular, they do not take it into account when ranking search results. In this paper, we present NTLM, a new ranking algorithm for Web search based on a time-enhanced language model. First, we present an effective algorithm that extracts <keyword, content time> pairs from Web pages, associating each keyword in a page with an appropriate content time. We then introduce the concept of temporal tf, a time-constrained term frequency, for each keyword. Next, we propose a time-enhanced language model that measures the similarity between temporal-textual queries and Web pages by combining textual relevance and temporal relevance. We compare NTLM with five competitor algorithms on two datasets, using different types of queries and two metrics, MRR and NDCG. The experimental results show that NTLM reaches a precision of 93.2% when extracting <keyword, content time> pairs and, in the ranking step, achieves the best MRR and NDCG among the compared algorithms.
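To make the idea of temporal tf concrete, the following is a minimal sketch and not the formulation used in the paper, whose exact definitions are not given in this abstract. It assumes that content times are year intervals, that a keyword occurrence counts toward temporal tf only when its content time overlaps the query time, and that the result is plugged into a query-likelihood model with Jelinek-Mercer smoothing. The names `temporal_tf`, `score`, `overlaps`, and the parameter `lam` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]      # assumed representation: (start_year, end_year)
Pair = Tuple[str, Interval]     # one <keyword, content time> pair


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two closed intervals share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


def temporal_tf(pairs: List[Pair], query_time: Interval) -> Counter:
    """Count each keyword only where its content time overlaps the query time."""
    return Counter(kw for kw, t in pairs if overlaps(t, query_time))


def score(query_terms: List[str], query_time: Interval, pairs: List[Pair],
          collection_tf: Dict[str, int], collection_len: int,
          lam: float = 0.5) -> float:
    """Log query likelihood with temporal tf in place of plain tf.

    Assumption: Jelinek-Mercer smoothing against plain collection counts.
    """
    ttf = temporal_tf(pairs, query_time)
    doc_len = len(pairs)
    log_p = 0.0
    for term in query_terms:
        p_doc = ttf[term] / doc_len if doc_len else 0.0
        p_col = collection_tf.get(term, 0) / collection_len
        p = (1 - lam) * p_doc + lam * p_col
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        log_p += math.log(p)
    return log_p


# Two toy pages that both mention "olympics", but for different years.
page_a = [("olympics", (2008, 2008)), ("beijing", (2008, 2008)),
          ("olympics", (2004, 2004))]
page_b = [("olympics", (2004, 2004)), ("athens", (2004, 2004)),
          ("olympics", (2004, 2004))]
collection_tf = Counter(kw for kw, _ in page_a + page_b)
collection_len = len(page_a) + len(page_b)

for name, page in (("page_a", page_a), ("page_b", page_b)):
    s = score(["olympics"], (2008, 2008), page, collection_tf, collection_len)
    print(name, round(s, 4))
```

In this toy setting, a plain term-frequency model would favour page_b, which mentions the keyword more often overall, whereas the time-constrained count favours page_a for a query about 2008.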
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, X., Jin, P., Zhao, X., Chen, H., Yue, L. (2011). NTLM: A Time-Enhanced Language Model Based Ranking Approach for Web Search. In: Chiu, D.K.W., et al. Web Information Systems Engineering – WISE 2010 Workshops. WISE 2010. Lecture Notes in Computer Science, vol 6724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24396-7_13
DOI: https://doi.org/10.1007/978-3-642-24396-7_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24395-0
Online ISBN: 978-3-642-24396-7
eBook Packages: Computer Science (R0)