DOI: 10.1145/3404835.3463116

GemNN: Gating-enhanced Multi-task Neural Networks with Feature Interaction Learning for CTR Prediction

Published: 11 July 2021


Deep neural network (DNN) models have been widely used for click-through rate (CTR) prediction in online advertising. The training framework typically consists of embedding layers and a multi-layer perceptron (MLP). At Baidu Search Ads (a.k.a. Phoenix Nest), the new-generation CTR training platform is PaddleBox, a GPU-based parameter server system. In this paper, we present Baidu's recently updated CTR training framework, called Gating-enhanced Multi-task Neural Networks (GemNN). In particular, we develop a neural-network-based multi-task learning model that predicts CTR in a coarse-to-fine manner, gradually reducing the ad candidates and allowing parameter sharing from upstream tasks to downstream tasks to improve training efficiency. We also introduce a gating mechanism between the embedding layers and the MLP to learn feature interactions and control the information flow fed to the MLP layers. We have launched our solution on the Baidu PaddleBox platform and observed considerable improvements in both offline and online evaluations. It is now part of the current production system.
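To make the idea of a gate placed between the embedding layers and the MLP more concrete, the following is a minimal sketch in plain Python. It is not the GemNN implementation: the abstract does not give the gate's exact form, so the sketch assumes a generic element-wise sigmoid gate. The names `gate_layer` and `mlp_ctr`, the dimensions, and the random weights are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gate_layer(e, W_g, b_g):
    """Assumed gate form: g = sigmoid(W_g e + b_g), output = e * g.

    Every gate value depends on the whole concatenated embedding, so the
    gate can react to combinations of features, and it scales each
    embedding dimension by a factor in (0, 1) before the MLP sees it.
    """
    logits = [z + b for z, b in zip(matvec(W_g, e), b_g)]
    gates = [sigmoid(z) for z in logits]
    return [x * g for x, g in zip(e, gates)], gates


def mlp_ctr(h, W1, b1, w2, b2):
    """One ReLU hidden layer followed by a sigmoid output (predicted CTR)."""
    hidden = [max(0.0, z + b) for z, b in zip(matvec(W1, h), b1)]
    return sigmoid(sum(w * x for w, x in zip(w2, hidden)) + b2)


# Hypothetical sizes and randomly initialised (untrained) weights.
rng = random.Random(0)
dim, hidden_dim = 6, 4
embedding = [rng.uniform(-1, 1) for _ in range(dim)]  # concatenated feature embeddings
W_g = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
b_g = [0.0] * dim
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden_dim)]
b1 = [0.0] * hidden_dim
w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
b2 = 0.0

gated, gates = gate_layer(embedding, W_g, b_g)
p_click = mlp_ctr(gated, W1, b1, w2, b2)
print("gates:", [round(g, 3) for g in gates])
print("predicted CTR:", round(p_click, 4))
```

In a trained model the gate weights would be learned jointly with the embeddings and the MLP; here they are random, so the printed values only show the data flow, not meaningful predictions.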






    Published In

    SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2021
    2998 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]



    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. CTR prediction
    2. feature interaction
    3. gating
    4. multi-task learning


    • Short-paper


    SIGIR '21

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%




    Article Metrics

    • Downloads (Last 12 months): 129
    • Downloads (Last 6 weeks): 14
    Reflects downloads up to 10 Dec 2024



    Cited By

    • (2024) Modeling Long- and Short-Term Service Recommendations with a Deep Multi-Interest Network for Edge Computing. Tsinghua Science and Technology 29:1, pp. 86-98. DOI: 10.26599/TST.2022.9010054. Online publication date: Feb-2024
    • (2024) AdaGIN: Adaptive Graph Interaction Network for Click-Through Rate Prediction. ACM Transactions on Information Systems 43:1, pp. 1-31. DOI: 10.1145/3681785. Online publication date: 4-Nov-2024
    • (2024) SimCEN: Simple Contrast-enhanced Network for CTR Prediction. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 2311-2320. DOI: 10.1145/3664647.3681203. Online publication date: 28-Oct-2024
    • (2024) Mitigating Sample Selection Bias with Robust Domain Adaption in Multimedia Recommendation. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 7581-7590. DOI: 10.1145/3664647.3680615. Online publication date: 28-Oct-2024
    • (2024) GUITAR: Gradient Pruning toward Fast Neural Ranking. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 163-173. DOI: 10.1145/3626772.3657728. Online publication date: 10-Jul-2024
    • (2024) Pull together: Option-weighting-enhanced mixture-of-experts knowledge tracing. Expert Systems with Applications 248, article 123419. DOI: 10.1016/j.eswa.2024.123419. Online publication date: Aug-2024
    • (2023) Task Aware Feature Extraction Framework for Sequential Dependence Multi-Task Learning. Proceedings of the 17th ACM Conference on Recommender Systems, pp. 151-160. DOI: 10.1145/3604915.3608772. Online publication date: 14-Sep-2023
    • (2023) Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 2523-2533. DOI: 10.1145/3583780.3615089. Online publication date: 21-Oct-2023
    • (2023) Asymmetric Hashing for Fast Ranking via Neural Network Measures. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 697-707. DOI: 10.1145/3539618.3591640. Online publication date: 19-Jul-2023
    • (2023) Sequence Separation-Based Modeling of Denoised Implicit Feedback Behavior. 2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI), pp. 283-288. DOI: 10.1109/IRI58017.2023.00056. Online publication date: Aug-2023
