DOI: 10.1145/3102980.3103004

A Case Against Tiny Tasks in Iterative Analytics

Published: 07 May 2017

Abstract

Big data systems such as Spark split an iterative parallel program into tiny tasks, and the rest of the system design follows from this basic principle. Unfortunately, in spite of immense engineering effort, tiny tasks have unavoidably large overheads. We use the example of logistic regression, a common machine learning primitive, to compare the performance of Spark to different designs that converge to a hand-coded parallel MPI-based implementation. We conclude that Spark leaves orders of magnitude of performance on the table because it insists on setting the granularity of a task to a single iteration. We counter a common argument for the tiny-task approach, namely better resilience to faults, by demonstrating that optimum job checkpoint intervals are far longer than the duration of the tiny tasks favored in Spark's design. We propose an alternative approach that relies on an auto-parallelizing compiler tightly integrated with the MPI runtime, illustrating the opposite end of the spectrum, where task granularities are as large as possible.
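
The checkpoint-interval argument can be illustrated with a rough calculation. The abstract does not say which failure model the paper uses, so the sketch below assumes Young's first-order approximation, in which the optimum interval is sqrt(2 × checkpoint cost × mean time between failures). The checkpoint cost, MTBF, and task duration are hypothetical values chosen for illustration, not measurements from the paper.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the optimum checkpoint interval.

    Assumption: this model is swapped in for illustration; the abstract
    does not name the model the authors used.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Hypothetical inputs (not taken from the paper).
checkpoint_cost_s = 10.0      # time to write one checkpoint
mtbf_s = 24 * 3600.0          # one failure per day across the whole job
task_duration_s = 0.1         # a "tiny task" lasting 100 ms

interval_s = young_interval(checkpoint_cost_s, mtbf_s)
tasks_per_interval = interval_s / task_duration_s

print(f"optimum checkpoint interval: {interval_s:.0f} s")
print(f"tiny tasks per checkpoint interval: {tasks_per_interval:.0f}")
```

Under these assumed numbers the optimum interval is on the order of tens of minutes, which is thousands of times longer than a sub-second task. That gap is the kind the abstract points to, although the actual figures depend on the system and workload.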



Published In

HotOS '17: Proceedings of the 16th Workshop on Hot Topics in Operating Systems
May 2017
185 pages
ISBN: 9781450350686
DOI: 10.1145/3102980
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HotOS '17
HotOS '17: Workshop on Hot Topics in Operating Systems
May 7 - 10, 2017
Whistler, BC, Canada

Cited By

  • (2023) The Tiny-Tasks Granularity Trade-Off: Balancing Overhead Versus Performance in Parallel Systems. IEEE Transactions on Parallel and Distributed Systems 34:4 (1128-1144). DOI: 10.1109/TPDS.2022.3233712. Online publication date: 1-Apr-2023
  • (2020) Heterogeneous MacroTasking (HeMT) for Parallel Processing in the Cloud. Proceedings of the 2020 6th International Workshop on Container Technologies and Container Clouds (7-12). DOI: 10.1145/3429885.3429962. Online publication date: 7-Dec-2020
  • (2018) Decoupling the control plane from program control flow for flexibility and performance in cloud computing. Proceedings of the Thirteenth EuroSys Conference (1-13). DOI: 10.1145/3190508.3190516. Online publication date: 23-Apr-2018
  • (2018) Exploring HPC and Big Data Convergence: A Graph Processing Study on Intel Knights Landing. 2018 IEEE International Conference on Cluster Computing (CLUSTER) (66-77). DOI: 10.1109/CLUSTER.2018.00019. Online publication date: Sep-2018
