[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2384616.2384639acmconferencesArticle/Chapter ViewAbstractPublication PagessplashConference Proceedingsconference-collections
research-article

Work-stealing without the baggage

Published: 19 October 2012 Publication History

Abstract

Work-stealing is a promising approach for effectively exploiting software parallelism on parallel hardware. A programmer who uses work-stealing explicitly identifies potential parallelism and the runtime then schedules work, keeping otherwise idle hardware busy while relieving overloaded hardware of its burden. Prior work has demonstrated that work-stealing is very effective in practice. However, work-stealing comes with a substantial overhead: as much as 2x to 12x slowdown over orthodox sequential code.
In this paper we identify the key sources of overhead in work-stealing schedulers and present two significant refinements to their implementation. We evaluate our work-stealing designs using a range of benchmarks, four different work-stealing implementations, including the popular fork-join framework, and a range of architectures. On these benchmarks, compared to orthodox sequential Java, our fastest design has an overhead of just 15%. By contrast, fork-join has a 2.3x overhead and the previous implementation of the system we use has an overhead of 4.1x. These results and our insight into the sources of overhead for work-stealing implementations give further hope to an already promising technique for exploiting increasingly available hardware parallelism.

References

[1]
Intel® Cilk#8482; Plus sdk. URL http://software.intel.com/en-us/articles/intel-cilk-plus.
[2]
V. Cavé, J. Zhao, J. Shirako, and V. Sarkar. Habanero-Java: the new adventures of old X10. In Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, PPPJ '11, pages 51--61, New York, NY, USA, 2011. ACM. ISBN 978--1--4503-0935--6. 10.1145/2093157.2093165.
[3]
P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. X10: an object-oriented approach to non-uniform cluster computing. In Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, OOPSLA '05, pages 519--538, New York, NY, USA, 2005. ACM. ISBN 1--59593-031-0. 10.1145/1094811.1094852.
[4]
G. Cong, S. Kodali, S. Krishnamoorthy, D. Lea, V. Saraswat, and T. Wen. Solving large, irregular graph problems using adaptive work-stealing. In Proceedings of the 2008 37th International Conference on Parallel Processing, ICPP '08, pages 536--545, Washington, DC, USA, 2008. IEEE Computer Society. ISBN 978-0--7695--3374--2. 10.1109/ICPP.2008.88.
[5]
A. Duran, J. Corbalán, and E. Ayguadé. Evaluation of OpenMP task scheduling strategies. In Proceedings of the 4th international conference on OpenMP in a new era of parallelism, IWOMP'08, pages 100--110, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 3--540--79560-X, 978--3--540--79560--5. URL http://dl.acm.org/citation.cfm?id=1789826.1789838.
[6]
A. Duran, X. Teruel, R. Ferrer, X. Martorell, and E. Ayguade. Barcelona OpenMP tasks suite: A set of benchmarks targeting the exploitation of task parallelism in OpenMP. In Proceedings of the 2009 International Conference on Parallel Processing, ICPP '09, pages 124--131, Washington, DC, USA, 2009. IEEE Computer Society. ISBN 978-0--7695--3802-0. 10.1109/ICPP.2009.64.
[7]
S. J. Fink and F. Qian. Design, implementation and evaluation of adaptive recompilation with on-stack replacement. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, CGO '03, pages 241--252, Washington, DC, USA, 2003. IEEE Computer Society. ISBN 0--7695--1913-X. URL http://dl.acm.org/citation.cfm?id=776261.776288.
[8]
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, PLDI '98, pages 212--223, New York, NY, USA, 1998. ACM. ISBN 0--89791--987--4. 10.1145/277650.277725.
[9]
Frigo, Prokop, Frigo, Leiserson, Prokop, Ramachandran, Dailey, Leiserson, Lyubashevskiy, Kushman, et al.}frigo1998cilkM. Frigo, H. Prokop, M. Frigo, C. Leiserson, H. Prokop, S. Ramachandran, D. Dailey, C. Leiserson, I. Lyubashevskiy, N. Kushman, et al. The Cilk project. Algorithms, 1998.
[10]
U. Hölzle, C. Chambers, and D. Ungar. Debugging optimized code with dynamic deoptimization. In Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation, PLDI '92, pages 32--43, New York, NY, USA, 1992. ACM. ISBN 0--89791--475--9. 10.1145/143103.143114.
[11]
P. Kambadur, A. Gupta, A. Ghoting, H. Avron, and A. Lumsdaine. PFunc: modern task parallelism for modern high performance computing. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pages 43:1--43:11, New York, NY, USA, 2009. ACM. ISBN 978--1--60558--744--8. 10.1145/1654059.1654103.
[12]
D. Lea. A Java fork/join framework. In Proceedings of the ACM 2000 conference on Java Grande, JAVA '00, pages 36--43, New York, NY, USA, 2000. ACM. ISBN 1--58113--288--3. 10.1145/337449.337465.
[13]
D. Leijen, W. Schulte, and S. Burckhardt. The design of a task parallel library. In Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, OOPSLA '09, pages 227--242, New York, NY, USA, 2009. ACM. ISBN 978--1--60558--766-0. 10.1145/1640089.1640106.
[14]
H.-W. Loidl and K. Hammond. On the granularity of divide-and-conquer parallelism. In Proceedings of the 1995 International Conference on Functional Programming, FP'95, pages 135--144, Swinton, UK, UK, 1995. British Computer Society. URL http://dl.acm.org/citation.cfm?id=2227330.2227343.
[15]
J. Mellor-Crummey. Cilk+, parallel performance, and the cilk runtime system. URL http://www.clear.rice.edu/comp422/lecture-notes/comp422--2012-Lecture5-Cilk.pdf.
[16]
MIT. The Cilk project. URL http://supertech.csail.mit.edu/cilk/index.html.
[17]
E. Mohr, D. Kranz, Halstead, and J. R.H. Lazy task creation: A technique for increasing the granularity of parallel programs. Technical report, Cambridge, MA, USA, 1991.
[18]
J. Reinders. phIntel threading building blocks: outfitting C+ for multi-core processor parallelism. O'Reilly Media, Inc., 2007.
[19]
O. Tardieu, H. Wang, and H. Lin. A work-stealing scheduler for X10's task parallelism with suspension. In Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 267--276, New York, NY, USA, 2012. ACM. ISBN 978--1--4503--1160--1. 10.1145/2145816.2145850.
[20]
L. Wang, H. Cui, Y. Duan, F. Lu, X. Feng, and P.-C. Yew. An adaptive task creation strategy for work-stealing scheduling. In Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, CGO '10, pages 266--277, New York, NY, USA, 2010. ACM. ISBN 978--1--60558--635--9. 10.1145/1772954.1772992.
[21]
X. Yang, S. M. Blackburn, D. Frampton, J. B. Sartor, and K. S. McKinley. Why nothing matters: the impact of zeroing. In Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, OOPSLA '11, pages 307--324, New York, NY, USA, 2011. ACM. ISBN 978--1--4503-0940-0. 10.1145/2048066.2048092.

Cited By

View all
  • (2021)Teaching High Productivity and High Performance in an Introductory Parallel Programming Course2021 IEEE 28th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW)10.1109/HiPCW54834.2021.00010(21-28)Online publication date: Dec-2021
  • (2020)PufferFish: NUMA-Aware Work-stealing Library using Elastic Tasks2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)10.1109/HiPC50609.2020.00039(251-260)Online publication date: Dec-2020
  • (2019)Analysis and Optimization of Task Granularity on the Java Virtual MachineACM Transactions on Programming Languages and Systems10.1145/333849741:3(1-47)Online publication date: 16-Jul-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications
October 2012
1052 pages
ISBN:9781450315616
DOI:10.1145/2384616
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 47, Issue 10
    OOPSLA '12
    October 2012
    1011 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2398857
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 October 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. X10
  2. managed languages
  3. scheduling
  4. task parallelism
  5. work-stealing

Qualifiers

  • Research-article

Conference

SPLASH '12
Sponsor:

Acceptance Rates

Overall Acceptance Rate 268 of 1,244 submissions, 22%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)8
  • Downloads (Last 6 weeks)1
Reflects downloads up to 10 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Teaching High Productivity and High Performance in an Introductory Parallel Programming Course2021 IEEE 28th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW)10.1109/HiPCW54834.2021.00010(21-28)Online publication date: Dec-2021
  • (2020)PufferFish: NUMA-Aware Work-stealing Library using Elastic Tasks2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)10.1109/HiPC50609.2020.00039(251-260)Online publication date: Dec-2020
  • (2019)Analysis and Optimization of Task Granularity on the Java Virtual MachineACM Transactions on Programming Languages and Systems10.1145/333849741:3(1-47)Online publication date: 16-Jul-2019
  • (2019)Efficient lock‐step synchronization in task‐parallel languagesSoftware: Practice and Experience10.1002/spe.272649:9(1379-1401)Online publication date: Jul-2019
  • (2018)Not a cape, but a life preserverCommunication Design Quarterly10.1145/3282665.32826716:2(57-69)Online publication date: 1-Oct-2018
  • (2018)FocusVRProceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies10.1145/32649522:3(1-25)Online publication date: 18-Sep-2018
  • (2018)Sensing Behavioral Change over TimeProceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies10.1145/32649512:3(1-21)Online publication date: 18-Sep-2018
  • (2018)Accelerating Data Analytics on Integrated GPU Platforms via Runtime SpecializationInternational Journal of Parallel Programming10.1007/s10766-016-0482-x46:2(336-375)Online publication date: 1-Apr-2018
  • (2017)The CUREACM Transactions on Embedded Computing Systems10.1145/312652716:5s(1-19)Online publication date: 27-Sep-2017
  • (2016)Understanding and improving JVM GC work stealing at the data center scaleACM SIGPLAN Notices10.1145/3241624.292670651:11(46-54)Online publication date: 14-Jun-2016
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media