research-article

Learning Query and Document Relevance from a Web-scale Click Graph

Published: 07 July 2016

Abstract

Click-through logs over query-document pairs provide rich and valuable information for multiple tasks in information retrieval. This paper proposes a vector propagation algorithm on the click graph to learn vector representations for both queries and documents in the same semantic space. The proposed approach incorporates both click and content information, and the produced vector representations can directly improve ranking performance for queries and documents that have been observed in the click log. For new queries and documents that are not in the click log, we propose a two-step framework to generate the vector representation, which significantly improves the coverage of our vectors while maintaining the high quality. Experiments on Web-scale search logs from a major commercial search engine demonstrate the effectiveness and scalability of the proposed method. Evaluation results show that NDCG scores are significantly improved against multiple baselines by using the proposed method both as a ranking model and as a feature in a learning-to-rank framework.
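
The abstract does not spell out the propagation rule, so the following is only a minimal sketch of the general idea, under stated assumptions: queries start as L2-normalized bag-of-words vectors over their own terms, and each propagation step gives a node the click-weighted sum of its neighbours' vectors, followed by L2 normalization. The function names (`l2_normalize`, `propagate`, `cosine`), the toy click log, and the update rule itself are hypothetical illustrations, not the paper's actual algorithm.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Return a copy of a sparse vector (term -> weight) scaled to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return dict(vec)
    return {term: w / norm for term, w in vec.items()}


def propagate(source_vecs, clicks, to_docs):
    """One propagation step across the bipartite click graph.

    clicks maps (query, doc) -> click count. Each target node receives the
    click-weighted sum of its neighbours' vectors, then is L2-normalized.
    (Assumed update rule; the abstract does not specify one.)
    """
    acc = defaultdict(lambda: defaultdict(float))
    for (query, doc), count in clicks.items():
        src, dst = (query, doc) if to_docs else (doc, query)
        for term, weight in source_vecs.get(src, {}).items():
            acc[dst][term] += count * weight
    return {node: l2_normalize(vec) for node, vec in acc.items()}


def cosine(u, v):
    """Dot product of two sparse vectors (equals cosine when both are unit length)."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


# Toy click log (made-up data): (query, document) -> number of clicks.
clicks = {
    ("cheap flights", "d1"): 3,
    ("flight deals", "d1"): 1,
    ("flight deals", "d2"): 2,
}

# Content information: initialize each query from its own terms.
initial_query_vecs = {
    q: l2_normalize({term: 1.0 for term in q.split()}) for q, _ in clicks
}

# Click information: propagate queries -> documents -> queries.
doc_vecs = propagate(initial_query_vecs, clicks, to_docs=True)
query_vecs = propagate(doc_vecs, clicks, to_docs=False)

before = cosine(initial_query_vecs["cheap flights"], initial_query_vecs["flight deals"])
after = cosine(query_vecs["cheap flights"], query_vecs["flight deals"])
print(before, after)
```

In this toy example the two queries share no terms initially, so their similarity starts at zero; after one round trip through the co-clicked document `d1`, both queries and both documents live in the same term space and the similarity becomes positive. A query-document relevance score could then be taken as `cosine(query_vecs[q], doc_vecs[d])`, again as an illustrative choice rather than the paper's stated scoring function.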


Information & Contributors

Information

Published In

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN:9781450340694
DOI:10.1145/2911451
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. click-through bipartite graph
  2. query-document relevance
  3. vector generation
  4. vector propagation
  5. web search

Qualifiers

  • Research-article

Conference

SIGIR '16
Acceptance Rates

SIGIR '16 Paper Acceptance Rate 62 of 341 submissions, 18%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%
