[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3534678.3539034acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article
Public Access

AutoShard: Automated Embedding Table Sharding for Recommender Systems

Published: 14 August 2022 Publication History

Abstract

Embedding learning is an important technique in deep recommendation models to map categorical features to dense vectors. However, the embedding tables often demand an extremely large number of parameters, which become the storage and efficiency bottlenecks. Distributed training solutions have been adopted to partition the embedding tables into multiple devices. However, the embedding tables can easily lead to imbalances if not carefully partitioned. This is a significant design challenge of distributed systems named embedding table sharding, i.e., how we should partition the embedding tables to balance the costs across devices, which is a non-trivial task because 1) it is hard to efficiently and precisely measure the cost, and 2) the partition problem is known to be NP-hard. In this work, we introduce our novel practice in Meta, namely AutoShard, which uses a neural cost model to directly predict the multi-table costs and leverages deep reinforcement learning to solve the partition problem. Experimental results on an open-sourced large-scale synthetic dataset and Meta's production dataset demonstrate the superiority of AutoShard over the heuristics. Moreover, the learned policy of AutoShard can transfer to sharding tasks with various numbers of tables and different ratios of the unseen tables without any fine-tuning. Furthermore, AutoShard can efficiently shard hundreds of tables in seconds. The effectiveness, transferability, and efficiency of AutoShard make it desirable for production use. Our algorithms have been deployed in Meta production environment. A prototype is available at https://github.com/daochenzha/autoshard

Supplemental Material

MP4 File
The video presents our novel practice in Meta, namely AutoShard for embedding table sharding. We introduce how we approach a load balance by using a neural cost model to predict the embedding costs and leveraging deep reinforcement learning to solve the partition problem. A prototype is open-sourced at https://github.com/daochenzha/autoshard

References

[1]
[n.d.]. Amazon DSSTNE: Deep Scalable Sparse Tensor Network Engine. https://github.com/amazon-archives/amazon-dsstne.
[2]
Bilge Acun, Matthew Murphy, Xiaodong Wang, Jade Nie, Carole-Jean Wu, and Kim Hazelwood. 2021. Understanding training efficiency of deep learning recommendation models at scale. In HPCA.
[3]
Thomas Barrett, William Clements, Jakob Foerster, and Alex Lvovsky. 2020. Exploratory combinatorial optimization with reinforcement learning. In AAAI.
[4]
Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In DLRS Workshop.
[5]
Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube recommendations. In RecSys.
[6]
Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. 2019. Autoaugment: Learning augmentation policies from data. In CVPR.
[7]
Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Vlad Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, et al . 2018. Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. In ICML.
[8]
Carlos A Gomez-Uribe and Neil Hunt. 2015. The netflix recommender system: Algorithms, business value, and innovation. ACM Transactions on Management Information Systems (TMIS) 6, 4 (2015), 1--19.
[9]
Udit Gupta, Carole-Jean Wu, Xiaodong Wang, Maxim Naumov, Brandon Reagen, David Brooks, Bradford Cottel, Kim Hazelwood, Mark Hempstead, Bill Jia, et al. 2020. The architectural implications of facebook's dnn-based personalized recommendation. In HPCA.
[10]
Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In WWW.
[11]
Doo Seok Jeong and Cheol Seong Hwang. 2018. Nonvolatile memory materials for neuromorphic intelligent machines. Advanced Materials 30, 42 (2018), 1704729.
[12]
Biye Jiang, Chao Deng, Huimin Yi, Zelin Hu, Guorui Zhou, Yang Zheng, Sui Huang, Xinyang Guo, Dongyue Wang, Yue Song, et al. 2019. XDL: an industrial deep learning framework for high-dimensional sparse data. In KDD Workshop.
[13]
Manas R Joglekar, Cong Li, Mei Chen, Taibai Xu, Xiaoming Wang, Jay K Adams, Pranav Khaitan, Jiahui Liu, and Quoc V Le. 2020. Neural input search for large scale recommendation models. In KDD.
[14]
Wang-Cheng Kang, Derek Zhiyuan Cheng, Ting Chen, Xinyang Yi, Dong Lin, Lichan Hong, and Ed H Chi. 2020. Learning multi-granular quantized embeddings for large-vocab categorical features in recommender systems. In WWW.
[15]
Wang-Cheng Kang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Ting Chen, Lichan Hong, and Ed H Chi. 2021. Learning to Embed Categorical Features without Embedding Tables for Recommendation. In KDD.
[16]
Daya Khudia, Jianyu Huang, Protonu Basu, Summer Deng, Haixin Liu, Jongsoo Park, and Mikhail Smelyanskiy. 2021. FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference. arXiv preprint arXiv:2101.05615 (2021).
[17]
Heinrich Küttler, Nantas Nardelli, Thibaut Lavril, Marco Selvatici, Viswanath Sivakumar, Tim Rocktäschel, and Edward Grefenstette. 2019. Torchbeast: A pytorch platform for distributed rl. arXiv preprint arXiv:1910.03552 (2019).
[18]
Kwei-Herng Lai, Daochen Zha, Guanchu Wang, Junjie Xu, Yue Zhao, Devesh Kumar, Yile Chen, Purav Zumkhawaka, Minyang Wan, Diego Martinez, et al. 2021. TODS: An Automated Time Series Outlier Detection System. In AAAI.
[19]
Liam Li and Ameet Talwalkar. 2020. Random search and reproducibility for neural architecture search. In UAI.
[20]
Yuening Li, Zhengzhang Chen, Daochen Zha, Kaixiong Zhou, Haifeng Jin, Haifeng Chen, and Xia Hu. 2021. Automated Anomaly Detection via Curiosity-Guided Search and Self-Imitation Learning. IEEE Transactions on Neural Networks and Learning Systems (2021).
[21]
David C Liu, Stephanie Rogers, Raymond Shiau, Dmitry Kislyuk, Kevin C Ma, Zhigang Zhong, Jenny Liu, and Yushi Jing. 2017. Related pins at pinterest: The evolution of a real-world recommender system. In WWW.
[22]
Siyi Liu, Chen Gao, Yihong Chen, Depeng Jin, and Yong Li. 2021. Learnable Embedding sizes for Recommender Systems. In ICLR.
[23]
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013).
[24]
Maxim Naumov, John Kim, Dheevatsa Mudigere, Srinivas Sridharan, Xiaodong Wang, Whitney Zhao, Serhat Yilmaz, Changkyu Kim, Hector Yuen, Mustafa Ozdal, et al . 2020. Deep learning training in facebook data centers: Design of scale-up and scale-out systems. arXiv preprint arXiv:2003.09518 (2020).
[25]
Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole- Jean Wu, Alisson G Azzolini, et al . 2019. Deep learning recommendation model for personalization and recommendation systems. arXiv preprint arXiv:1906.00091 (2019).
[26]
Geet Sethi, Bilge Acun, Niket Agarwal, Christos Kozyrakis, Caroline Trippel, and Carole-Jean Wu. 2022. RecShard: Statistical Feature-Based Memory Optimization for Industry-Scale Neural Recommendation. In ASPLOS.
[27]
Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, and Jiyan Yang. 2020. Compositional embeddings using complementary partitions for memory-efficient recommendation systems. In KDD.
[28]
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. 2017. Mastering the game of go without human knowledge. nature (2017).
[29]
Weiping Song, Chence Shi, Zhiping Xiao, Zhijian Duan, Yewen Xu, Ming Zhang, and Jian Tang. 2019. Autoint: Automatic feature interaction learning via self- attentive neural networks. In CIKM.
[30]
Daochen Zha, Kwei-Herng Lai, Songyi Huang, Yuanpu Cao, Keerthana Reddy, Juan Vargas, Alex Nguyen, Ruzhe Wei, Junyu Guo, and Xia Hu. 2021. RLCard: a platform for reinforcement learning in card games. In IJCAI.
[31]
Daochen Zha, Kwei-Herng Lai, Mingyang Wan, and Xia Hu. 2020. Meta-AAD: Active anomaly detection with deep reinforcement learning. In ICDM.
[32]
Daochen Zha, Kwei-Herng Lai, Kaixiong Zhou, and Xia Hu. 2019. Experience Replay Optimization. In IJCAI.
[33]
Daochen Zha, Wenye Ma, Lei Yuan, Xia Hu, and Ji Liu. 2021. Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments. In ICLR.
[34]
Daochen Zha, Zaid Pervaiz Bhat, Yi-Wei Chen, Yicheng Wang, Sirui Ding, Anmoll Kumar Jain, Mohammad Qazim Bhat, Kwei-Herng Lai, Jiaben Chen, et al. 2022. AutoVideo: An Automated Video Action Recognition System. In IJCAI.
[35]
Daochen Zha, Jingru Xie, Wenye Ma, Sheng Zhang, Xiangru Lian, Xia Hu, and Ji Liu. 2021. DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning. In ICML.
[36]
Caojin Zhang, Yicun Liu, Yuanpu Xie, Sofia Ira Ktena, Alykhan Tejani, Akshay Gupta, Pranay Kumar Myana, Deepak Dilipkumar, Suvadip Paul, Ikuhiro Ihara, et al . 2020. Model size reduction using frequency based double hashing for recommender systems. In RecSys.
[37]
Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. 2019. Deep learning based recommender system: A survey and new perspectives. ACM Computing Surveys (CSUR) 52, 1 (2019), 1--38.
[38]
Weijie Zhao, Deping Xie, Ronglai Jia, Yulei Qian, Ruiquan Ding, Mingming Sun, and Ping Li. 2020. Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems. In MLSys.
[39]
Xiangyu Zhao, Chong Wang, Ming Chen, Xudong Zheng, Xiaobing Liu, and Jiliang Tang. 2020. Autoemb: Automated embedding dimensionality search in streaming recommendations. In SIGIR.
[40]
Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep interest evolution network for click-through rate prediction. In AAAI.
[41]
Barret Zoph and Quoc V Le. 2017. Neural architecture search with reinforcement learning. In ICLR.

Cited By

View all
  • (2024)DLRover-RM: Resource Optimization for Deep Recommendation Models Training in the CloudProceedings of the VLDB Endowment10.14778/3685800.368583217:12(4130-4144)Online publication date: 8-Nov-2024
  • (2024)RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input PreprocessingProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640406(964-979)Online publication date: 27-Apr-2024
  • (2024)Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in MetaCompanion Proceedings of the ACM Web Conference 202410.1145/3589335.3648301(47-55)Online publication date: 13-May-2024
  • Show More Cited By

Index Terms

  1. AutoShard: Automated Embedding Table Sharding for Recommender Systems

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
      August 2022
      5033 pages
      ISBN:9781450393850
      DOI:10.1145/3534678
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 14 August 2022

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. cost modeling
      2. recommender system
      3. reinforcement learning

      Qualifiers

      • Research-article

      Funding Sources

      Conference

      KDD '22
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)300
      • Downloads (Last 6 weeks)26
      Reflects downloads up to 10 Dec 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)DLRover-RM: Resource Optimization for Deep Recommendation Models Training in the CloudProceedings of the VLDB Endowment10.14778/3685800.368583217:12(4130-4144)Online publication date: 8-Nov-2024
      • (2024)RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input PreprocessingProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640406(964-979)Online publication date: 27-Apr-2024
      • (2024)Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in MetaCompanion Proceedings of the ACM Web Conference 202410.1145/3589335.3648301(47-55)Online publication date: 13-May-2024
      • (2024)AtRec: Accelerating Recommendation Model Training on CPUsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.338118635:6(905-918)Online publication date: Jun-2024
      • (2024)A Survey on Reinforcement Learning for Recommender SystemsIEEE Transactions on Neural Networks and Learning Systems10.1109/TNNLS.2023.328016135:10(13164-13184)Online publication date: Oct-2024
      • (2024)Exploiting Structured Feature and Runtime Isolation for High-Performant Recommendation ServingIEEE Transactions on Computers10.1109/TC.2024.344974973:11(2474-2487)Online publication date: Nov-2024
      • (2024)Heterogeneous Acceleration Pipeline for Recommendation System Training2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00081(1063-1079)Online publication date: 29-Jun-2024
      • (2023)DIVISIONProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619904(36036-36057)Online publication date: 23-Jul-2023
      • (2023)SurCoProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618810(10034-10052)Online publication date: 23-Jul-2023
      • (2023)UGACHE: A Unified GPU Cache for Embedding-based Deep LearningProceedings of the 29th Symposium on Operating Systems Principles10.1145/3600006.3613169(627-641)Online publication date: 23-Oct-2023
      • Show More Cited By

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Login options

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media