POSTER: Pattern-Aware Sparse Communication for Scalable Recommendation Model Training

Published: 20 February 2024
DOI: 10.1145/3627535.3638481

Abstract

Recommendation models are an important category of deep learning models, and their size has grown enormous. They consist of a sparse part with a memory footprint of terabytes and a dense part that demands PFLOPs of computing capability to train. Unfortunately, the high cost of the sparse communication needed to re-organize data between the two parts' different parallel strategies impedes training scalability.
Based on observations of sparse access patterns, we design a two-fold fine-grained parallel strategy to accelerate sparse communication. A performance model selects an optimal set of items to replicate across all GPUs, reducing all-to-all communication volume while keeping memory consumption acceptable. The all-to-all overhead is further reduced by parallel scheduling techniques. In our evaluation on 32 GPUs over real-world datasets, a 2.16--16.8× end-to-end speedup is achieved over the baselines.
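
To make the replication idea concrete, below is a minimal sketch of performance-model-driven item selection. It is not the authors' implementation: the function name `choose_replicated_items`, the greedy frequency-ordered heuristic, and all constants are illustrative assumptions, whereas the paper builds a fuller performance model to pick the replicated set. The sketch greedily replicates the hottest embedding rows on every GPU until a per-GPU memory budget is exhausted, since each replicated row removes its share of all-to-all traffic at the cost of one copy per GPU.

```python
# Illustrative sketch (not the paper's code): pick embedding rows to
# replicate on every GPU. A replicated row is read locally, so its
# all-to-all traffic disappears, at the cost of num_gpus copies in memory.

def choose_replicated_items(access_freq, row_bytes, num_gpus, mem_budget_bytes):
    """Greedy selection under a simple cost model.

    access_freq:      item id -> profiled accesses per batch
    row_bytes:        size of one embedding row in bytes
    num_gpus:         number of GPUs that would each hold a replica
    mem_budget_bytes: extra memory each GPU may spend on replicas
    Returns the set of item ids to replicate across all GPUs.
    """
    replicated, used = set(), 0
    # Hotter rows save more communication per byte of replica memory,
    # so visit items in descending access frequency.
    for item, freq in sorted(access_freq.items(), key=lambda kv: -kv[1]):
        if used + row_bytes > mem_budget_bytes:
            break  # per-GPU replica budget exhausted
        # Expected all-to-all bytes saved per batch: an access that would
        # have landed on a remote GPU now hits the local replica.
        saved = freq * row_bytes * (num_gpus - 1) / num_gpus
        if saved <= 0:
            break  # remaining items are never accessed
        replicated.add(item)
        used += row_bytes
    return replicated


if __name__ == "__main__":
    # Toy example: a skewed, power-law-like access distribution, which is
    # the typical pattern for recommendation embedding tables.
    freqs = {i: 1000 // (i + 1) for i in range(100)}
    hot = choose_replicated_items(freqs, row_bytes=512, num_gpus=32,
                                  mem_budget_bytes=16 * 1024)
    print(f"replicating {len(hot)} hot rows on all GPUs")
```

The greedy frequency order is a reasonable stand-in because embedding accesses in recommendation workloads are heavily skewed, so a small replicated set captures most of the traffic; a production system would weigh measured all-to-all bandwidth and the dense part's memory needs rather than a fixed byte budget.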

Published In

PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
March 2024, 498 pages
ISBN: 9798400704352
DOI: 10.1145/3627535
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. distributed deep learning
  2. parallelism

Qualifiers

  • Poster

Conference

PPoPP '24

Acceptance Rates

Overall acceptance rate: 230 of 1,014 submissions (23%)
