DOI: 10.1145/2939672.2939677
research-article

Ranking Relevance in Yahoo Search

Published: 13 August 2016

Abstract

Search engines play a crucial role in our daily lives. Relevance is the core problem of a commercial search engine. It has attracted thousands of researchers from both academia and industry and has been studied for decades. Relevance in a modern search engine has gone far beyond text matching, and now involves tremendous challenges. The semantic gap between queries and URLs is the main barrier to improving base relevance. Clicks provide hints for improving relevance, but for most tail queries the click information is too sparse, too noisy, or missing entirely. For comprehensive relevance, the recency and location sensitivity of results are also critical. In this paper, we give an overview of the solutions for relevance in the Yahoo search engine. We introduce three key techniques for base relevance: ranking functions, semantic matching features, and query rewriting. We also describe solutions for recency-sensitive relevance and location-sensitive relevance. This work builds upon 20 years of existing efforts on Yahoo search, summarizes the most recent advances, and provides a series of practical relevance solutions. The performance reported is based on Yahoo's commercial search engine, where tens of billions of URLs are indexed and served by the ranking system.


Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Paper

Author Tags

  1. deep learning
  2. learning to rank
  3. query rewriting
  4. semantic matching

Qualifiers

  • Research-article

Conference

KDD '16

Acceptance Rates

KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 101
  • Downloads (Last 6 weeks): 13
Reflects downloads up to 21 Dec 2024

Cited By

  • (2024) Meta Learning to Rank for Sparsely Supervised Queries. ACM Transactions on Information Systems. DOI: 10.1145/3698876. Online publication date: 8-Oct-2024
  • (2024) Full Stage Learning to Rank: A Unified Framework for Multi-Stage Systems. Proceedings of the ACM Web Conference 2024 (3621-3631). DOI: 10.1145/3589334.3645523. Online publication date: 13-May-2024
  • (2024) Diversity-aware strategies for static index pruning. Information Processing & Management 61:5 (103795). DOI: 10.1016/j.ipm.2024.103795. Online publication date: Sep-2024
  • (2024) Users’ satisfaction based ranking for Yahoo Answers. Multimedia Tools and Applications 83:28 (71265-71284). DOI: 10.1007/s11042-024-18433-3. Online publication date: 7-Feb-2024
  • (2024) Learning bivariate scoring functions for ranking. Discover Computing 27:1. DOI: 10.1007/s10791-024-09444-7. Online publication date: 27-Sep-2024
  • (2024) Improving the Efficiency of Pattern Matching Algorithm in Image Mining. Proceedings of International Conference on Recent Trends in Computing (547-560). DOI: 10.1007/978-981-97-1724-8_47. Online publication date: 26-Jul-2024
  • (2024) Responsible Opinion Formation on Debated Topics in Web Search. Advances in Information Retrieval (437-465). DOI: 10.1007/978-3-031-56066-8_32. Online publication date: 24-Mar-2024
  • (2023) An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors. ACM Transactions on Information Systems 42:2 (1-43). DOI: 10.1145/3609797. Online publication date: 8-Nov-2023
  • (2023) An Analysis of Fusion Functions for Hybrid Retrieval. ACM Transactions on Information Systems 42:1 (1-35). DOI: 10.1145/3596512. Online publication date: 20-May-2023
  • (2023) Faster Dynamic Pruning via Reordering of Documents in Inverted Indexes. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2001-2005). DOI: 10.1145/3539618.3591987. Online publication date: 19-Jul-2023
