[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1557019.1557081acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

BBM: bayesian browsing model from petabyte-scale data

Published: 28 June 2009 Publication History

Abstract

Given a quarter of petabyte click log data, how can we estimate the relevance of each URL for a given query? In this paper, we propose the Bayesian Browsing Model (BBM), a new modeling technique with following advantages: (a) it does exact inference; (b) it is single-pass and parallelizable; (c) it is effective.
We present two sets of experiments to test model effectiveness and efficiency. On the first set of over 50 million search instances of 1.1 million distinct queries, BBM out-performs the state-of-the-art competitor by 29.2% in log-likelihood while being 57 times faster. On the second click-log set, spanning a quarter of petabyte data, we showcase the scalability of BBM: we implemented it on a commercial MapReduce cluster, and it took only 3 hours to compute the relevance for 1.15 billion distinct query-URL pairs.

Supplementary Material

JPG File (p537-liu.jpg)
MP4 File (p537-liu.mp4)

References

[1]
E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behavior information. In SIGIR '06, pages 19--26, 2006.
[2]
E. Agichtein, E. Brill, S. Dumais, and R. Ragno. Learning user interaction models for predicting web search result preferences. In SIGIR '06, pages 3--10, 2006.
[3]
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In PODS '02, pages 1--16, 2002.
[4]
R. Baeza-Yates. Applications of web query mining. Advances in Information Retrieval, pages 7--22, 2005.
[5]
R. Baeza-Yates, C. Hurtado, and M. Mendoza. Query recommendation using query logs in search engines. In International Workshop on Clustering Information over the Web, 2004.
[6]
M. Bilenko and R. W. White. Mining the search trails of surfing crowds: identifying relevant websites from user activity. In WWW '08: Proceeding of the 17th international conference on World Wide Web, pages 51--60, 2008.
[7]
C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In ICML '05: Proceedings of the 22nd international conference on Machine learning, pages 89--96, 2005.
[8]
N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. In WSDM '08: Proceedings of the first international conference on Web search and data mining, pages 87--94, 2008.
[9]
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI'04: Proceedings of the 6th conference on Symposium on Opearting Systems Design&Implementation, 2004.
[10]
P. Domingos and G. Hulten. Mining high-speed data streams. In KDD '00: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 71--80, 2000.
[11]
G. E. Dupret and B. Piwowarski. A user browsing model to predict search engine click data from past observations. In SIGIR '08, pages 331--338, 2008.
[12]
Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal Machine Learning Research, 4:933--969, 2003.
[13]
M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy. Mining data streams: a review. SIGMOD Record, 34(2):18--26, 2005.
[14]
C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu. Mining frequent patterns in data streamsat multiple time granularities. Next Generation Data Mining, 2003.
[15]
S. Guha, A. Meyerson, N. Mishra, and R. Motwani. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15, 2003.
[16]
F. Guo, L. Li, and C. Faloutsos. Tailoring click models to user goals. In WSCD '09: Proceedings of the workshop on Web search click data, 2009.
[17]
F. Guo, C. Liu, and Y.-M. Wang. Efficient multiple-click models in web search. In WSDM '09, 2009.
[18]
T. Joachims. Optimizing search engines using clickthrough data. In KDD '02, pages 133--142, 2002.
[19]
T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay. Accurately interpreting clickthrough data as implicit feedback. In SIGIR '05, pages 154--161, 2005.
[20]
T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Transactions on Information Systems, 25(2):7, 2007.
[21]
T. Minka. Expectation propagation for approximate bayesian inference. In UAI '01: Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, pages 362--369, 2001.
[22]
F. Radlinski and T. Joachims. Query chains: learning to rank from implicit feedback. In KDD '05, pages 239--248, 2005.
[23]
M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: estimating the click-through rate for new ads. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 521--530, 2007.
[24]
M. Shokouhi1, F. Scholer, and A. Turpin. Investigating the effectiveness of clickthrough data for document reordering. In ECIR'08, pages 591--595, 2008.
[25]
C. Silverstein, H. Marais, M. Henzinger, and M. Moricz. Analysis of a very large web search engine query log. SIGIR Forum, 33(1):6--12, 1999.
[26]
A. Trotman. Learning to rank. Information Retrieval, 8(3):359--381, 2005.
[27]
M.-F. Tsai, T.-Y. Liu, T. Qin, H.-H. Chen, and W.-Y. Ma. FRank: a ranking method with fidelity loss. In SIGIR '07, pages 383--390, 2007.
[28]
S. Wedig and O. Madani. A large-scale analysis of query logs for assessing personalization opportunities. In KDD '06, pages 742--747, 2006.
[29]
G.-R. Xue, H.-J. Zeng, Z. Chen, Y. Yu, W.-Y. Ma, W. Xi, and W. Fan. Optimizing web search using web click-through data. In CIKM '04, pages 118--126, 2004.
[30]
J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans. Inf. Theory, 51(7):2282--2312, 2005.
[31]
Z. Zhang and O. Nasraoui. Mining search engine query logs for query recommendation. In WWW'06, pages 1039--1040, 2006.

Cited By

View all
  • (2024)Probabilistic graph model and neural network perspective of click models for web searchKnowledge and Information Systems10.1007/s10115-024-02145-z66:10(5829-5873)Online publication date: 6-Jun-2024
  • (2024)MassiveClicks: A Massively-Parallel Framework for Efficient Click Models TrainingEuro-Par 2023: Parallel Processing Workshops10.1007/978-3-031-50684-0_18(232-245)Online publication date: 16-Apr-2024
  • (2022)ParClick: A Scalable Algorithm for EM-based Click ModelsProceedings of the ACM Web Conference 202210.1145/3485447.3511967(392-400)Online publication date: 25-Apr-2022
  • Show More Cited By

Index Terms

  1. BBM: bayesian browsing model from petabyte-scale data

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
    June 2009
    1426 pages
    ISBN:9781605584959
    DOI:10.1145/1557019
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 June 2009

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. bayesian models
    2. click log analysis
    3. web search

    Qualifiers

    • Research-article

    Conference

    KDD09

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Upcoming Conference

    KDD '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)3
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 02 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Probabilistic graph model and neural network perspective of click models for web searchKnowledge and Information Systems10.1007/s10115-024-02145-z66:10(5829-5873)Online publication date: 6-Jun-2024
    • (2024)MassiveClicks: A Massively-Parallel Framework for Efficient Click Models TrainingEuro-Par 2023: Parallel Processing Workshops10.1007/978-3-031-50684-0_18(232-245)Online publication date: 16-Apr-2024
    • (2022)ParClick: A Scalable Algorithm for EM-based Click ModelsProceedings of the ACM Web Conference 202210.1145/3485447.3511967(392-400)Online publication date: 25-Apr-2022
    • (2022)The redox-conditions controlled manganese carbonate mineralization in the Paleozoic Qiaerlong deep basin, Western Kunlun Mountains, ChinaOre Geology Reviews10.1016/j.oregeorev.2022.104993(104993)Online publication date: Jun-2022
    • (2020)Measuring and Improving User Experience Through Artificial Intelligence-Aided DesignFrontiers in Psychology10.3389/fpsyg.2020.59537411Online publication date: 19-Nov-2020
    • (2019)Investigating the Reliability of Click ModelsProceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval10.1145/3341981.3344242(125-128)Online publication date: 26-Sep-2019
    • (2019)SWeG: Lossless and Lossy Summarization of Web-Scale GraphsThe World Wide Web Conference10.1145/3308558.3313402(1679-1690)Online publication date: 13-May-2019
    • (2019)Micro-Browsing Models for Search Snippets2019 IEEE 35th International Conference on Data Engineering (ICDE)10.1109/ICDE.2019.00206(1904-1909)Online publication date: Apr-2019
    • (2017)Online Expectation-Maximization for Click ModelsProceedings of the 2017 ACM on Conference on Information and Knowledge Management10.1145/3132847.3133053(2195-2198)Online publication date: 6-Nov-2017
    • (2017)What is skipped: Finding desirable items in e-commerce search by discovering the worst title tokens2017 IEEE International Conference on Big Data (Big Data)10.1109/BigData.2017.8258078(1447-1456)Online publication date: Dec-2017
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media