More Web Proxy on the site http://driver.im/

research-article

BBM: bayesian browsing model from petabyte-scale data

Authors:

Christos FaloutsosAuthors Info & Claims

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 537 - 546

https://doi.org/10.1145/1557019.1557081

Published: 28 June 2009 Publication History

Abstract

Given a quarter of petabyte click log data, how can we estimate the relevance of each URL for a given query? In this paper, we propose the Bayesian Browsing Model (BBM), a new modeling technique with following advantages: (a) it does exact inference; (b) it is single-pass and parallelizable; (c) it is effective.

We present two sets of experiments to test model effectiveness and efficiency. On the first set of over 50 million search instances of 1.1 million distinct queries, BBM out-performs the state-of-the-art competitor by 29.2% in log-likelihood while being 57 times faster. On the second click-log set, spanning a quarter of petabyte data, we showcase the scalability of BBM: we implemented it on a commercial MapReduce cluster, and it took only 3 hours to compute the relevance for 1.15 billion distinct query-URL pairs.

Supplementary Material

JPG File (p537-liu.jpg)

Download
10.88 KB

MP4 File (p537-liu.mp4)

Download
94.70 MB

References

[1]

E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behavior information. In SIGIR '06, pages 19--26, 2006.

Digital Library

[2]

E. Agichtein, E. Brill, S. Dumais, and R. Ragno. Learning user interaction models for predicting web search result preferences. In SIGIR '06, pages 3--10, 2006.

Digital Library

[3]

B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In PODS '02, pages 1--16, 2002.

Digital Library

[4]

R. Baeza-Yates. Applications of web query mining. Advances in Information Retrieval, pages 7--22, 2005.

Digital Library

[5]

R. Baeza-Yates, C. Hurtado, and M. Mendoza. Query recommendation using query logs in search engines. In International Workshop on Clustering Information over the Web, 2004.

Digital Library

[6]

M. Bilenko and R. W. White. Mining the search trails of surfing crowds: identifying relevant websites from user activity. In WWW '08: Proceeding of the 17th international conference on World Wide Web, pages 51--60, 2008.

Digital Library

[7]

C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In ICML '05: Proceedings of the 22nd international conference on Machine learning, pages 89--96, 2005.

Digital Library

[8]

N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. In WSDM '08: Proceedings of the first international conference on Web search and data mining, pages 87--94, 2008.

Digital Library

[9]

J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI'04: Proceedings of the 6th conference on Symposium on Opearting Systems Design&Implementation, 2004.

Digital Library

[10]

P. Domingos and G. Hulten. Mining high-speed data streams. In KDD '00: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 71--80, 2000.

Digital Library

[11]

G. E. Dupret and B. Piwowarski. A user browsing model to predict search engine click data from past observations. In SIGIR '08, pages 331--338, 2008.

Digital Library

[12]

Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal Machine Learning Research, 4:933--969, 2003.

Digital Library

[13]

M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy. Mining data streams: a review. SIGMOD Record, 34(2):18--26, 2005.

Digital Library

[14]

C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu. Mining frequent patterns in data streamsat multiple time granularities. Next Generation Data Mining, 2003.

[15]

S. Guha, A. Meyerson, N. Mishra, and R. Motwani. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15, 2003.

Digital Library

[16]

F. Guo, L. Li, and C. Faloutsos. Tailoring click models to user goals. In WSCD '09: Proceedings of the workshop on Web search click data, 2009.

Digital Library

[17]

F. Guo, C. Liu, and Y.-M. Wang. Efficient multiple-click models in web search. In WSDM '09, 2009.

Digital Library

[18]

T. Joachims. Optimizing search engines using clickthrough data. In KDD '02, pages 133--142, 2002.

Digital Library

[19]

T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay. Accurately interpreting clickthrough data as implicit feedback. In SIGIR '05, pages 154--161, 2005.

Digital Library

[20]

T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Transactions on Information Systems, 25(2):7, 2007.

Digital Library

[21]

T. Minka. Expectation propagation for approximate bayesian inference. In UAI '01: Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, pages 362--369, 2001.

Digital Library

[22]

F. Radlinski and T. Joachims. Query chains: learning to rank from implicit feedback. In KDD '05, pages 239--248, 2005.

Digital Library

[23]

M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: estimating the click-through rate for new ads. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 521--530, 2007.

Digital Library

[24]

M. Shokouhi1, F. Scholer, and A. Turpin. Investigating the effectiveness of clickthrough data for document reordering. In ECIR'08, pages 591--595, 2008.

Digital Library

[25]

C. Silverstein, H. Marais, M. Henzinger, and M. Moricz. Analysis of a very large web search engine query log. SIGIR Forum, 33(1):6--12, 1999.

Digital Library

[26]

A. Trotman. Learning to rank. Information Retrieval, 8(3):359--381, 2005.

Digital Library

[27]

M.-F. Tsai, T.-Y. Liu, T. Qin, H.-H. Chen, and W.-Y. Ma. FRank: a ranking method with fidelity loss. In SIGIR '07, pages 383--390, 2007.

Digital Library

[28]

S. Wedig and O. Madani. A large-scale analysis of query logs for assessing personalization opportunities. In KDD '06, pages 742--747, 2006.

Digital Library

[29]

G.-R. Xue, H.-J. Zeng, Z. Chen, Y. Yu, W.-Y. Ma, W. Xi, and W. Fan. Optimizing web search using web click-through data. In CIKM '04, pages 118--126, 2004.

Digital Library

[30]

J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans. Inf. Theory, 51(7):2282--2312, 2005.

Digital Library

[31]

Z. Zhang and O. Nasraoui. Mining search engine query logs for query recommendation. In WWW'06, pages 1039--1040, 2006.

Digital Library

Cited By

Liu JWang YWang JWang MChu X(2024)Probabilistic graph model and neural network perspective of click models for web searchKnowledge and Information Systems10.1007/s10115-024-02145-z66:10(5829-5873)Online publication date: 6-Jun-2024
https://doi.org/10.1007/s10115-024-02145-z
Thijssen SKhandel PYates AVarbanescu A(2024)MassiveClicks: A Massively-Parallel Framework for Efficient Click Models TrainingEuro-Par 2023: Parallel Processing Workshops10.1007/978-3-031-50684-0_18(232-245)Online publication date: 16-Apr-2024
https://doi.org/10.1007/978-3-031-50684-0_18
Khandel PMarkov IYates AVarbanescu A(2022)ParClick: A Scalable Algorithm for EM-based Click ModelsProceedings of the ACM Web Conference 202210.1145/3485447.3511967(392-400)Online publication date: 25-Apr-2022
https://dl.acm.org/doi/10.1145/3485447.3511967
Show More Cited By

Index Terms

BBM: bayesian browsing model from petabyte-scale data
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Recommendations

Efficient multiple-click models in web search
WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining

Many tasks that leverage web search users' implicit feedback rely on a proper and unbiased interpretation of user clicks. Previous eye-tracking experiments and studies on explaining position-bias of user clicks provide a spectrum of hypotheses and ...
Click chain model in web search
WWW '09: Proceedings of the 18th international conference on World wide web

Given a terabyte click log, can we build an efficient and effective click model? It is commonly believed that web search click logs are a gold mine for search business, because they reflect users' preference over web documents presented by the search ...
Bayesian Browsing Model: Exact Inference of Document Relevance from Petabyte-Scale Data

A fundamental challenge in utilizing Web search click data is to infer user-perceived relevance from the search log. Not only is the inference a difficult problem involving statistical reasonings but the bulky size, together with the ever-increasing ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

June 2009

1426 pages

ISBN:9781605584959

DOI:10.1145/1557019

General Chairs:
John Elder
Elder Research, Inc., USA
,
Françoise Soulié Fogelman
KXEN, France
,
Program Chairs:
Peter Flach
University of Bristol, UK
,
Mohammed Zaki
RPI, USA

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 June 2009

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

KDD09

Sponsor:

KDD09: The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

June 28 - July 1, 2009

Paris, France

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Sponsor:
sigkdd
sigkdd

The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 3 - 7, 2025

Toronto , ON , Canada

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

41
Total Citations
View Citations
727
Total Downloads

Downloads (Last 12 months)3
Downloads (Last 6 weeks)2

Reflects downloads up to 02 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Liu JWang YWang JWang MChu X(2024)Probabilistic graph model and neural network perspective of click models for web searchKnowledge and Information Systems10.1007/s10115-024-02145-z66:10(5829-5873)Online publication date: 6-Jun-2024
https://doi.org/10.1007/s10115-024-02145-z
Thijssen SKhandel PYates AVarbanescu A(2024)MassiveClicks: A Massively-Parallel Framework for Efficient Click Models TrainingEuro-Par 2023: Parallel Processing Workshops10.1007/978-3-031-50684-0_18(232-245)Online publication date: 16-Apr-2024
https://doi.org/10.1007/978-3-031-50684-0_18
Khandel PMarkov IYates AVarbanescu A(2022)ParClick: A Scalable Algorithm for EM-based Click ModelsProceedings of the ACM Web Conference 202210.1145/3485447.3511967(392-400)Online publication date: 25-Apr-2022
https://dl.acm.org/doi/10.1145/3485447.3511967
Li XWang EWu CZhao DHua BZhang ZXu JJin Z(2022)The redox-conditions controlled manganese carbonate mineralization in the Paleozoic Qiaerlong deep basin, Western Kunlun Mountains, ChinaOre Geology Reviews10.1016/j.oregeorev.2022.104993(104993)Online publication date: Jun-2022
https://doi.org/10.1016/j.oregeorev.2022.104993
Yang BWei LPu Z(2020)Measuring and Improving User Experience Through Artificial Intelligence-Aided DesignFrontiers in Psychology10.3389/fpsyg.2020.59537411Online publication date: 19-Nov-2020
https://doi.org/10.3389/fpsyg.2020.595374
Mao JChu ZLiu YZhang MMa SFang YZhang YAllan JBalog KCarterette BGuo J(2019)Investigating the Reliability of Click ModelsProceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval10.1145/3341981.3344242(125-128)Online publication date: 26-Sep-2019
https://dl.acm.org/doi/10.1145/3341981.3344242
Shin KGhoting AKim MRaghavan H(2019)SWeG: Lossless and Lossy Summarization of Web-Scale GraphsThe World Wide Web Conference10.1145/3308558.3313402(1679-1690)Online publication date: 13-May-2019
https://dl.acm.org/doi/10.1145/3308558.3313402
Islam MSrikant RBasu S(2019)Micro-Browsing Models for Search Snippets2019 IEEE 35th International Conference on Data Engineering (ICDE)10.1109/ICDE.2019.00206(1904-1909)Online publication date: Apr-2019
https://doi.org/10.1109/ICDE.2019.00206
Markov IBorisov Ade Rijke MLim EWinslett MSanderson MFu ASun JCulpepper SLo EHo JDonato DAgrawal RZheng YCastillo CSun ATseng VLi C(2017)Online Expectation-Maximization for Click ModelsProceedings of the 2017 ACM on Conference on Information and Knowledge Management10.1145/3132847.3133053(2195-2198)Online publication date: 6-Nov-2017
https://dl.acm.org/doi/10.1145/3132847.3133053
Khan IKumar PMiranda DGoldberg D(2017)What is skipped: Finding desirable items in e-commerce search by discovering the worst title tokens2017 IEEE International Conference on Big Data (Big Data)10.1109/BigData.2017.8258078(1447-1456)Online publication date: Dec-2017
https://doi.org/10.1109/BigData.2017.8258078
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten