Sparker: Efficient Reduction for More Scalable Machine Learning with Spark

Published: 05 October 2021

Abstract

Machine learning applications on Spark suffer from poor scalability. In this paper, we reveal that the key reason is non-scalable reduction, which is restricted by Spark's non-splittable object programming interface. This insight guides us to propose Sparker, Spark with Efficient Reduction. By providing a split aggregation interface, Sparker can perform split aggregation with scalable reduction while remaining backward compatible with existing applications. We implemented Sparker in 2,534 lines of code. Sparker improves aggregation performance by up to 6.47× and improves the end-to-end performance of MLlib model training by up to 3.69×, with a geometric mean of 1.81×.
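To make the idea of split aggregation concrete, the following is a minimal conceptual sketch, not Sparker's actual API or implementation. It contrasts the Spark-style pattern, in which one node merges every partition's full partial-result vector serially, with a split scheme that partitions each vector into slices so each slice can be reduced independently (by different workers, in a real system). All function and variable names here are illustrative assumptions.

```python
def flat_aggregate(partials):
    """Driver-style merge: a single node sums all full-length vectors serially."""
    result = [0.0] * len(partials[0])
    for vec in partials:                 # serial over partitions
        for i, v in enumerate(vec):
            result[i] += v
    return result

def split_aggregate(partials, num_slices):
    """Split each vector into slices; each slice is reduced independently
    (in a real system, by a different worker), then concatenated."""
    n = len(partials[0])
    bounds = [(s * n // num_slices, (s + 1) * n // num_slices)
              for s in range(num_slices)]
    slices = []
    for lo, hi in bounds:                # each slice is reducible in parallel
        acc = [0.0] * (hi - lo)
        for vec in partials:
            for i in range(lo, hi):
                acc[i - lo] += vec[i]
        slices.append(acc)
    return [v for s in slices for v in s]

# Three per-partition partial gradients of dimension 8.
partials = [[1.0] * 8, [2.0] * 8, [3.0] * 8]
assert split_aggregate(partials, 4) == flat_aggregate(partials)
```

The two paths produce identical results; the point of splitting is that the per-slice reductions have no data dependence on each other, so their cost can be spread across nodes instead of concentrating on one, which is the bottleneck the abstract attributes to Spark's non-splittable object interface.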


Cited By

  • (2024) Towards a universal and portable assembly code size reduction: a case study of RISC-V ISA. CCF Transactions on High Performance Computing 6(3), 263–273. https://doi.org/10.1007/s42514-024-00190-2. Online publication date: 17-May-2024.


Published In

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
August 2021
927 pages
ISBN:9781450390682
DOI:10.1145/3472456
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Aggregation
  2. Machine Learning
  3. Reduction
  4. Spark

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2021

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
