
research-article

Ranking Relevance in Yahoo Search

Authors: Changsung Kang, Chikashi Nobata, Jean-Marc Langlois, Yi Chang

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 323 - 332

https://doi.org/10.1145/2939672.2939677

Published: 13 August 2016

Abstract

Search engines play a crucial role in our daily lives. Relevance is the core problem of a commercial search engine. It has attracted thousands of researchers from both academia and industry and has been studied for decades. Relevance in a modern search engine has gone far beyond text matching, and now involves tremendous challenges. The semantic gap between queries and URLs is the main barrier to improving base relevance. Clicks provide hints for improving relevance, but for most tail queries the click information is too sparse, too noisy, or missing entirely. For comprehensive relevance, the recency and location sensitivity of results are also critical. In this paper, we give an overview of the solutions for relevance in the Yahoo search engine. We introduce three key techniques for base relevance: ranking functions, semantic matching features, and query rewriting. We also describe solutions for recency-sensitive relevance and location-sensitive relevance. This work builds upon 20 years of existing efforts on Yahoo search, summarizes the most recent advances, and provides a series of practical relevance solutions. The performance reported is based on Yahoo's commercial search engine, where tens of billions of URLs are indexed and served by the ranking system.

Cited By

Wu X, Puthenputhussery A, Shang H, Kang C, Fang Y (2024). Meta Learning to Rank for Sparsely Supervised Queries. ACM Transactions on Information Systems. Online publication date: 8-Oct-2024.
https://doi.org/10.1145/3698876
Zheng K, Zhao H, Huang R, Zhang B, Mou N, Niu Y, Song Y, Wang H, Gai K, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H (2024). Full Stage Learning to Rank: A Unified Framework for Multi-Stage Systems. Proceedings of the ACM Web Conference 2024, pages 3621-3631. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645523
Yigit-Sert S, Altingovde I, Ulusoy Ö (2024). Diversity-aware strategies for static index pruning. Information Processing & Management, 61:5 (103795). Online publication date: Sep-2024.
https://doi.org/10.1016/j.ipm.2024.103795

Index Terms

Ranking Relevance in Yahoo Search
1. Information systems
  1. World Wide Web
    1. Web searching and information discovery
      1. Web search engines

Information & Contributors

Information

Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 2016

2176 pages

ISBN:9781450342322

DOI:10.1145/2939672

General Chairs: Balaji Krishnapuram (IBM), Mohak Shah (Bosch)
Program Chairs: Alex Smola (Amazon), Charu Aggarwal (IBM), Dou Shen (Baidu), Rajeev Rastogi (Amazon)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Best Paper

Qualifiers

Research-article

Conference

KDD '16

KDD '16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 13 - 17, 2016

San Francisco, California, USA

Acceptance Rates

KDD '16 Paper Acceptance Rate: 66 of 1,115 submissions, 6%

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

Article Metrics

Total Citations: 74
Total Downloads: 1,246
Downloads (Last 12 months): 101
Downloads (Last 6 weeks): 13

Reflects downloads up to 21 Dec 2024

Citations

Cited By

Banjar A, Shaheen A, Amjad T, Alharbey R, Daud A (2024). Users' satisfaction based ranking for Yahoo Answers. Multimedia Tools and Applications, 83:28 (71265-71284). Online publication date: 7-Feb-2024.
https://doi.org/10.1007/s11042-024-18433-3
Nardini F, Trani R, Venturini R (2024). Learning bivariate scoring functions for ranking. Discover Computing, 27:1. Online publication date: 27-Sep-2024.
https://doi.org/10.1007/s10791-024-09444-7
Vinoth Kumar S, Siddique Ibrahim S, Shyamala Devi M, Christopher Paul A, Muralithran D (2024). Improving the Efficiency of Pattern Matching Algorithm in Image Mining. Proceedings of International Conference on Recent Trends in Computing, pages 547-560. Online publication date: 26-Jul-2024.
https://doi.org/10.1007/978-981-97-1724-8_47
Rieger A, Draws T, Mattis N, Maxwell D, Elsweiler D, Gadiraju U, McKay D, Bozzon A, Pera M (2024). Responsible Opinion Formation on Debated Topics in Web Search. Advances in Information Retrieval, pages 437-465. Online publication date: 24-Mar-2024.
https://dl.acm.org/doi/10.1007/978-3-031-56066-8_32
Bruch S, Nardini F, Ingber A, Liberty E (2023). An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors. ACM Transactions on Information Systems, 42:2 (1-43). Online publication date: 8-Nov-2023.
https://dl.acm.org/doi/10.1145/3609797
Bruch S, Gai S, Ingber A (2023). An Analysis of Fusion Functions for Hybrid Retrieval. ACM Transactions on Information Systems, 42:1 (1-35). Online publication date: 20-May-2023.
https://dl.acm.org/doi/10.1145/3596512
Yafay E, Altingovde I, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). Faster Dynamic Pruning via Reordering of Documents in Inverted Indexes. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2001-2005. Online publication date: 19-Jul-2023.
https://dl.acm.org/doi/10.1145/3539618.3591987
