
Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction

  • Conference paper
  • In: Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2021)

Abstract

We develop a novel framework that adds sparse group lasso regularizers to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, and AdaHessian, yielding a new class of optimizers which we name Group Momentum, Group Adagrad, Group Adam, Group AMSGrad, and Group AdaHessian, respectively. We establish convergence guarantees in the stochastic convex setting based on primal-dual methods. We evaluate the regularizing effect of the new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results show that, compared with the original optimizers followed by magnitude-pruning post-processing, our methods significantly improve model performance at the same sparsity level. Furthermore, compared with the unpruned baselines, our methods achieve extremely high sparsity with significantly better or highly competitive performance.
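The paper derives its updates with primal-dual methods rather than a plain proximal step, but the regularized effect is easiest to see through the proximal view. The sketch below is a hypothetical illustration, not the authors' algorithm or released code: it applies a standard Adam step to one embedding vector, treated as a single group, and then the sparse group lasso proximal operator, which can set the whole group exactly to zero. All hyperparameters and names are made up for illustration.

```python
# Hypothetical illustration only: sparse group lasso proximal step applied
# after an Adam update on one embedding vector (one group). This is NOT the
# paper's primal-dual algorithm; penalty weights are arbitrary.
import numpy as np

def group_soft_threshold(w, threshold):
    """Prox of the group lasso term: shrink the group's L2 norm and zero the
    whole group when its norm falls below the threshold."""
    norm = np.linalg.norm(w)
    if norm <= threshold:
        return np.zeros_like(w)
    return (1.0 - threshold / norm) * w

def sparse_group_lasso_prox(w, step, lam_l1, lam_group):
    """Prox of the sparse group lasso penalty: elementwise soft-thresholding
    (L1 part) followed by group soft-thresholding (L2,1 part)."""
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam_l1, 0.0)
    return group_soft_threshold(w, step * lam_group)

rng = np.random.default_rng(0)
w = rng.normal(size=8)                  # one embedding vector = one group
m, v = np.zeros_like(w), np.zeros_like(w)
beta1, beta2, eps, lr = 0.9, 0.999, 1e-8, 0.01

for t in range(1, 201):
    g = rng.normal(size=8)              # stand-in for a stochastic gradient
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)          # plain Adam step
    # Regularization step: with these penalty weights the noise gradients
    # cannot overcome the group penalty, so the vector is driven to zero.
    w = sparse_group_lasso_prox(w, lr, lam_l1=0.1, lam_group=5.0)

print("group L2 norm after training:", np.linalg.norm(w))
```

In a real CTR model the same mechanism prunes entire embedding vectors of uninformative features rather than scattered individual weights, which is the group-level sparsity the abstract refers to.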


Notes

  1. The code will be released if the paper is accepted.

  2. We only use the data from seasons 2 and 3 because they share the same data schema.

  3. See https://github.com/Atomu2014/Ads-RecSys-Datasets/ for details.

  4. Owing to the limited training resources available, we do not use the optimal hyperparameter settings of [23].

References

  1. Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: Keeton, K., Roscoe, T. (eds.) 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, 2–4 November 2016, pp. 265–283. USENIX Association (2016). https://www.usenix.org/conference/osdi16/technical-sessions/presentation/abadi

  2. Avazu: Avazu click-through rate prediction (2015). https://www.kaggle.com/c/avazu-ctr-prediction/data

  3. Criteo: Criteo display ad challenge (2014). http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset

  4. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011). https://doi.org/10.5555/1953048.2021068

  5. Graepel, T., Candela, J.Q., Borchert, T., Herbrich, R.: Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft’s Bing search engine. In: Fürnkranz, J., Joachims, T. (eds.) Proceedings of the 27th International Conference on Machine Learning, ICML 2010, Haifa, Israel, 21–24 June 2010, pp. 13–20. Omnipress (2010). https://icml.cc/Conferences/2010/papers/901.pdf

  6. Gupta, V., Koren, T., Singer, Y.: Shampoo: preconditioned stochastic tensor optimization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018, vol. 80, pp. 1837–1845. PMLR (2018). http://proceedings.mlr.press/v80/gupta18a.html

  7. Liao, H., Peng, L., Liu, Z., Shen, X.: IPinYou global RTB bidding algorithm competition (2013). https://www.kaggle.com/lastsummer/ipinyou

  8. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA (2015)

  9. Littlestone, N.: From on-line to batch learning. In: Rivest, R.L., Haussler, D., Warmuth, M.K. (eds.) Proceedings of the 2nd Annual Workshop on Computational Learning Theory, COLT 1989, Santa Cruz, CA, USA, 31 July–2 August 1989, pp. 269–284. Morgan Kaufmann (1989). http://dl.acm.org/citation.cfm?id=93365

  10. McMahan, H.B.: Follow-the-regularized-leader and mirror descent: equivalence theorems and L1 regularization. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, AISTATS 2011, Fort Lauderdale, FL, USA, vol. 15, pp. 525–533. PMLR (2011)

  11. McMahan, H.B., et al.: Ad click prediction: a view from the trenches. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, Illinois, USA, pp. 1222–1230. ACM (2013)

  12. McMahan, H.B., Streeter, M.J.: Adaptive bound optimization for online convex optimization. In: The 23rd Conference on Learning Theory, COLT 2010, Haifa, Israel, 27–29 June 2010, pp. 244–256. Omnipress (2010). http://colt2010.haifa.il.ibm.com/papers/COLT2010proceedings.pdf#page=252

  13. Naumov, M., et al.: Deep learning recommendation model for personalization and recommendation systems. CoRR abs/1906.00091 (2019). http://arxiv.org/abs/1906.00091

  14. Nesterov, Y.E.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)

  15. Nesterov, Y.E.: Primal-dual subgradient methods for convex problems. Math. Program. 120(1), 221–259 (2009). https://doi.org/10.1007/s10107-007-0149-x

  16. Ni, X., et al.: Feature selection for Facebook feed ranking system via a group-sparsity-regularized training algorithm. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM 2019, Beijing, China, pp. 2085–2088. ACM (2019)

  17. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964). https://doi.org/10.1016/0041-5553(64)90137-5

  18. Qu, Y., et al.: Product-based neural networks for user response prediction. In: Bonchi, F., Domingo-Ferrer, J., Baeza-Yates, R., Zhou, Z., Wu, X. (eds.) IEEE 16th International Conference on Data Mining, ICDM 2016, Barcelona, Spain, 12–15 December 2016, pp. 1149–1154. IEEE Computer Society (2016). https://doi.org/10.1109/ICDM.2016.0151

  19. Reddi, S.J., Kale, S., Kumar, S.: On the convergence of Adam and beyond. In: Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada. OpenReview.net (2018)

  20. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Statist. 22(3), 400–407 (1951)

  21. Rockafellar, R.T.: Convex Analysis (Princeton Landmarks in Mathematics and Physics). Princeton University Press (1970)

  22. Scardapane, S., Comminiello, D., Hussain, A., Uncini, A.: Group sparse regularization for deep neural networks. Neurocomputing 241, 43–52 (2017). https://doi.org/10.1016/j.neucom.2017.02.029

  23. Wang, R., Fu, B., Fu, G., Wang, M.: Deep & cross network for ad click predictions. In: Proceedings of the ADKDD 2017, Halifax, NS, Canada, 13–17 August 2017, pp. 12:1–12:7. ACM (2017). https://doi.org/10.1145/3124749.3124754

  24. Xiao, L.: Dual averaging method for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11, 2543–2596 (2010). https://doi.org/10.5555/1756006.1953017

  25. Yang, H., Xu, Z., King, I., Lyu, M.R.: Online learning for group lasso. In: Proceedings of the 27th International Conference on Machine Learning, ICML 2010, Haifa, Israel, pp. 1191–1198. Omnipress (2010)

  26. Yao, Z., Gholami, A., Shen, S., Keutzer, K., Mahoney, M.W.: ADAHESSIAN: an adaptive second order optimizer for machine learning. CoRR abs/2006.00719 (2020). https://arxiv.org/abs/2006.00719

  27. Zeiler, M.D.: ADADELTA: an adaptive learning rate method. CoRR abs/1212.5701 (2012). https://arxiv.org/abs/1212.5701

  28. Zhu, M., Gupta, S.: To prune, or not to prune: exploring the efficacy of pruning for model compression. In: 6th International Conference on Learning Representations, ICLR 2018, Workshop Track Proceedings Vancouver, BC, Canada, 30 April–3 May 2018. OpenReview.net (2018). https://openreview.net/forum?id=Sy1iIDkPM

  29. Appendix. https://github.com/yadandan/adaptive_optimizers_with_sparse_group_lasso/blob/master/appendix.pdf

Author information

Corresponding author: Yun Yue

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Yue, Y., et al. (2021). Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol. 12977. Springer, Cham. https://doi.org/10.1007/978-3-030-86523-8_19

  • DOI: https://doi.org/10.1007/978-3-030-86523-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86522-1

  • Online ISBN: 978-3-030-86523-8
