DOI: 10.1145/1557019.1557048
research-article

Large-scale behavioral targeting

Published: 28 June 2009

Abstract

Behavioral targeting (BT) leverages historical user behavior to select the most relevant ads to display to each user. The state of the art in BT derives a linear Poisson regression model from fine-grained user behavioral data and predicts click-through rate (CTR) from user history. We designed and implemented a highly scalable and efficient solution to BT using the Hadoop MapReduce framework. With our parallel algorithm and the resulting system, we can build more than 450 BT-category models from Yahoo's entire user base within one day, a scale that was out of reach for prior systems. Moreover, our approach has yielded a 20% CTR lift over the existing production system by leveraging a well-grounded probabilistic model fitted to a much larger training dataset.
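To make the modeling idea concrete, here is a minimal sketch of a linear Poisson regression, where an expected count is a nonnegative weighted sum of behavioral feature counts. The fitting rule shown is a standard multiplicative (EM-style) update for this kind of model; the abstract does not state which fitting procedure the system uses, so the update, the function names, and the toy data are all assumptions made for illustration.

```python
def predict(w, row):
    """Expected count: lambda = sum_j w[j] * x[j] for a sparse row {index: count}."""
    return sum(w[j] * v for j, v in row.items())


def fit_linear_poisson(X, y, iters=200):
    """Fit nonnegative weights w so that y[i] ~ Poisson(sum_j w[j] * X[i][j]).

    X is a list of sparse rows ({feature_index: count}); y is a list of
    observed target counts (for example clicks). Hypothetical helper, not
    the paper's implementation.
    """
    n_features = 1 + max(j for row in X for j in row)
    w = [1.0] * n_features

    # Denominator of the update: total count of each feature over all rows.
    col_sum = [0.0] * n_features
    for row in X:
        for j, v in row.items():
            col_sum[j] += v

    for _ in range(iters):
        num = [0.0] * n_features
        for row, yi in zip(X, y):
            lam = predict(w, row)
            if lam <= 0.0:
                continue  # a row with no active weight carries no signal
            ratio = yi / lam
            for j, v in row.items():
                num[j] += ratio * v
        # Multiplicative update keeps every weight nonnegative.
        w = [w[j] * num[j] / col_sum[j] if col_sum[j] > 0 else 0.0
             for j in range(n_features)]
    return w


# Toy data generated exactly from weights (2, 3).
X = [{0: 1}, {1: 1}, {0: 1, 1: 1}]
y = [2, 3, 5]
weights = fit_linear_poisson(X, y)
```

One plausible way to obtain a CTR estimate from such models is to fit one model for clicks and one for ad views and take the ratio of the two predicted counts; that construction is likewise an assumption here rather than something the abstract spells out.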
Specifically, our major contributions include: (1) a MapReduce statistical learning algorithm and implementation that achieve optimal data parallelism, task parallelism, and load balance despite the typically skewed distribution of domain data; (2) an in-place feature vector generation algorithm with linear time complexity O(n) regardless of the granularity of the sliding target window; (3) an in-memory caching scheme that significantly reduces the number of disk I/Os, making large-scale learning practical; and (4) highly efficient data structures and sparse representations of models and data that enable fast model updates. We believe that our work contributes to solving large-scale machine learning problems of industrial relevance in general. Finally, we report comprehensive experimental results obtained with a proprietary industrial codebase and datasets.
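Contribution (2) can be illustrated with a small sketch of linear-time sliding-window example generation. It assumes a single user's daily event counts, a feature window of `feature_days` followed immediately by a target window of `target_days`, and a one-day slide; these names and the window layout are hypothetical, and the paper's in-place algorithm may differ in its details. The point shown is only that running sums let every window position be produced in O(1), so the total cost is O(n) however finely the window slides.

```python
def sliding_examples(counts, feature_days, target_days):
    """Return (feature_sum, target_sum) for every window position in O(n)."""
    n = len(counts)
    if n < feature_days + target_days:
        return []

    # Initial windows: features cover days [0, F), targets cover [F, F + T).
    feat = sum(counts[:feature_days])
    targ = sum(counts[feature_days:feature_days + target_days])
    out = [(feat, targ)]

    for t in range(feature_days, n - target_days):
        # Slide by one day: day t leaves the target window and enters the
        # feature window; the oldest feature day drops out and a new day
        # enters the target window.
        feat += counts[t] - counts[t - feature_days]
        targ += counts[t + target_days] - counts[t]
        out.append((feat, targ))
    return out


def naive_examples(counts, feature_days, target_days):
    """Reference version that re-sums each window, O(n * (F + T))."""
    last_start = len(counts) - feature_days - target_days
    return [
        (sum(counts[s:s + feature_days]),
         sum(counts[s + feature_days:s + feature_days + target_days]))
        for s in range(last_start + 1)
    ]


daily_counts = [1, 2, 3, 4, 5]
examples = sliding_examples(daily_counts, feature_days=2, target_days=1)
```

The naive version is included only as a cross-check; both functions return the same pairs, but the rolling version touches each day a constant number of times.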

    Published In

    KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
    June 2009
    1426 pages
    ISBN:9781605584959
    DOI:10.1145/1557019
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. behavioral targeting
    2. grid computing
    3. large-scale

    Qualifiers

    • Research-article

    Conference

    KDD09

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2022) A Deep Learning-Based Decision Support System for Mobile Performance Marketing. International Journal of Information Technology & Decision Making, 22:02 (679-703). DOI: 10.1142/S021962202250047X. Online publication date: 31-Aug-2022
    • (2022) MBTI BERT: A Transformer-Based Machine Learning Approach Using MBTI Model For Textual Inputs. 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys) (2285-2292). DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00338. Online publication date: Dec-2022
    • (2022) Proxy-Terms Based Query Obfuscation Technique for Private Web Search. IEEE Access, 10 (17845-17863). DOI: 10.1109/ACCESS.2022.3149929. Online publication date: 2022
    • (2022) An economic analysis of maximally representative allocations. Applied Economics, 54:59 (6744-6754). DOI: 10.1080/00036846.2022.2082370. Online publication date: 1-Sep-2022
    • (2021) Temporal and cultural limits of privacy in smartphone app usage. Scientific Reports, 11:1. DOI: 10.1038/s41598-021-82294-1. Online publication date: 16-Feb-2021
    • (2020) Online Display Advertising Markets. Information Systems Research, 31:2 (556-575). DOI: 10.1287/isre.2019.0902. Online publication date: 1-Jun-2020
    • (2020) Application of social networks users digital fingerprints to predict their information image. Proceedings of the 13th International Conference on Theory and Practice of Electronic Governance (839-842). DOI: 10.1145/3428502.3428635. Online publication date: 23-Sep-2020
    • (2020) Impersonation-as-a-Service: Characterizing the Emerging Criminal Infrastructure for User Impersonation at Scale. Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (1665-1680). DOI: 10.1145/3372297.3417892. Online publication date: 30-Oct-2020
    • (2020) Designing for Trust: A Behavioral Framework for Sharing Economy Platforms. Proceedings of The Web Conference 2020 (2133-2143). DOI: 10.1145/3366423.3380279. Online publication date: 20-Apr-2020
    • (2020) How Does Personification Impact Ad Performance and Empathy? An Experiment with Online Advertising. International Journal of Human–Computer Interaction, 37:2 (141-155). DOI: 10.1080/10447318.2020.1809246. Online publication date: 26-Aug-2020
