Sparker: Efficient Reduction for More Scalable Machine Learning with Spark

Published: 05 October 2021

Abstract

Machine learning applications on Spark suffer from poor scalability. In this paper, we reveal that the key reason is non-scalable reduction, which is restricted by Spark's non-splittable object programming interface. This insight guides us to propose Sparker, Spark with Efficient Reduction. By providing a split aggregation interface, Sparker can perform split aggregation with scalable reduction while remaining backward compatible with existing applications. We implemented Sparker in 2,534 lines of code. Sparker improves aggregation performance by up to 6.47× and improves the end-to-end performance of MLlib model training by up to 3.69×, with a geometric mean of 1.81×.
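To make the idea of split aggregation concrete, the following is a minimal conceptual sketch, not Sparker's actual API or implementation. It contrasts the Spark-style pattern, in which one node merges every partition's full partial-result vector serially, with a split scheme that partitions each vector into slices so each slice can be reduced independently (by different workers, in a real system). All function and variable names here are illustrative assumptions.

```python
def flat_aggregate(partials):
    """Driver-style merge: a single node sums all full-length vectors serially."""
    result = [0.0] * len(partials[0])
    for vec in partials:                 # serial over partitions
        for i, v in enumerate(vec):
            result[i] += v
    return result

def split_aggregate(partials, num_slices):
    """Split each vector into slices; each slice is reduced independently
    (in a real system, by a different worker), then concatenated."""
    n = len(partials[0])
    bounds = [(s * n // num_slices, (s + 1) * n // num_slices)
              for s in range(num_slices)]
    slices = []
    for lo, hi in bounds:                # each slice is reducible in parallel
        acc = [0.0] * (hi - lo)
        for vec in partials:
            for i in range(lo, hi):
                acc[i - lo] += vec[i]
        slices.append(acc)
    return [v for s in slices for v in s]

# Three per-partition partial gradients of dimension 8.
partials = [[1.0] * 8, [2.0] * 8, [3.0] * 8]
assert split_aggregate(partials, 4) == flat_aggregate(partials)
```

The two paths produce identical results; the point of splitting is that the per-slice reductions have no data dependence on each other, so their cost can be spread across nodes instead of concentrating on one, which is the bottleneck the abstract attributes to Spark's non-splittable object interface.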


Cited By

  • (2024) Towards a universal and portable assembly code size reduction: a case study of RISC-V ISA. CCF Transactions on High Performance Computing 6(3), 263–273. https://doi.org/10.1007/s42514-024-00190-2. Online publication date: 17-May-2024.


Published In

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
August 2021
927 pages
ISBN:9781450390682
DOI:10.1145/3472456
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Aggregation
  2. Machine Learning
  3. Reduction
  4. Spark

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2021

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
