More Web Proxy on the site http://driver.im/

research-article

Techniques for efficient placement of synchronization primitives

Authors:

Alexandru Nicolau,

Arun KejariwalAuthors Info & Claims

PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming

Pages 199 - 208

https://doi.org/10.1145/1504176.1504207

Published: 14 February 2009 Publication History

Abstract

Harnessing the hardware parallelism of the emerging multi-cores systems necessitates concurrent software. Unfortunately, most of the existing mainstream software is sequential in nature. Although one could auto-parallelize a given program, the efficacy of this is largely limited to floating-point codes. One of the ways to alleviate the above limitation is to parallelize programs, which cannot be auto-parallelized, via explicit synchronization. In this regard, efficient placement of the synchronization primitives - say, post, wait - plays a key role in achieving high degree of thread-level parallelism (TLP). In this paper, we propose novel compiler techniques for the above. Specifically, given a control flow graph (CFG), the proposed techniques place a post as early as possible and place a wait as late as possible in the CFG, subject to dependences. We demonstrate the efficacy of our techniques, on a real machine, using real codes, specifically, from the industry-standard SPEC CPU benchmarks, the Linux kernel and other widely used open source codes. Our results show that the proposed techniques yield significantly higher levels of TLP than the state-of-the-art.

References

[1]

S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Transactions on Computers, C-36(12):1485--1495, December 1987.

Digital Library

[2]

J. R. Goodman, M. K. Vernon, and P. J. Woest. Efficient synchronization primitives for large-scale cache-coherent multiprocessors. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-III), pages 64--75, Boston, MA, 1989.

Digital Library

[3]

J. Labarta and E. Ayguadé. GTS: Extracting full parallelism out of DO loops. In Proceedings of the Parallel Architectures and Languages Europe, pages 43--54, Eindhoven, The Netherlands, 1989.

Digital Library

[4]

G. Granunke and S. Thakkar. Synchronization algorithms for shared-memory multiprocessors. IEEE Computer, 23(6):60--69, 1990.

Digital Library

[5]

Z. Li. Compiler algorithms for event variable synchronization. In Proceedings of the 1991 ACM International Conference on Supercomputing, Cologne, Germany, June 1991.

Digital Library

[6]

A. Krishnamurthy and K. Yelick. Optimizing parallel programs with explicit synchronization. In Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, pages 196--204, La Jolla, CA, 1995.

Digital Library

[7]

A. Aiken and D. Gay. Barrier inference. In Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 342--354, San Diego, CA, 1998.

Digital Library

[8]

A. Kagi. Mechanism for Efficient Shared-Memory Lock-based Synchronization. PhD thesis, Department of Computer Science, University of Wisconsin-Madison, 1999.

Digital Library

[9]

D. S. Nikolopoulos and T. S. Papatheodorou. Fast synchronization on scalable cache-coherent multiprocessors using hybrid primitives. In Proceedings of the 14th International Parallel and Distributed Processing Symposium, pages 711--720, Cancun, Mexico, 2000.

Digital Library

[10]

D. F. Bacon, R. Konuru, C. Murthy, and M. J. Serrano. Thin locks: Featherweight synchronization for java. ACM SIGPLAN Notices, 39(4):583--595, 2004.

Digital Library

[11]

A. Kejariwal, X. Tian, H. Saito, W. Li, M. Girkar, U. Banerjee, A. Nicolau, and C. D. Polychronopoulos. Lightweight lock-free synchronization methods for multithreading. In Proceedings of the 20th ACM International Conference on Supercomputing, pages 361--371, Cairns, Australia, 2006.

Digital Library

[12]

The Linux Kernel Archives. http://www.kernel.org.

[13]

J. A. Fisher. Trace Scheduling: A technique for global microcode compaction. IEEE Transactions on Computers, C-30(7):478--490, July 1981.

Digital Library

[14]

A. Nicolau. Percolation scheduling: A parallel compilation technique. Technical Report TR85-678, Dept. of Computer Science, Cornell University, May 1985.

Digital Library

[15]

W. M. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. G. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G. Holm, and D. M. Lavery. The superblock: An effective technique for VLIW and super-scalar compilation. The JournaL of Supercomputing, 7(1-2):229--248, November 1993.

Digital Library

[16]

Jens Knoop, Oliver Rüthing, and Bernhard Steffen. Optimal code motion: Theory and practice. ACM Transactions on Programming Languages and Systems, 16(4):1117--1155, July 1994.

Digital Library

[17]

M. Hailperin. Cost-optimal code motion. ACM Transactions on Programming Languages and Systems, 20(6):1297--1322, 1998.

Digital Library

[18]

E. Morel and C. Renvoise. Global optimization by suppression of partial redun-dancies. Communications of the ACM, 22(2):96--103, February 1979.

Digital Library

[19]

SPEC CPU Benchmarks. http://www.spec.org/benchmarks.html.

[20]

A. Kejariwal, X. Tian, W. Li, M. Girkar, S. Kozhukhov, H. Saito, U. Banerjee, A. Nicolau, A. V. Veidenbaum, and C. D. Polychronopoulos. On the performance potential of different types of speculative thread-level parallelism. In Proceedings of the 20th ACM International Conference on Supercomputing, pages 24--35, Cairns, Australia, 2006.

Digital Library

[21]

U. Banerjee. Dependence Analysis. Kluwer Academic Publishers, Boston, MA, 1997.

Digital Library

[22]

SPEC CPU2006. http://www.spec.org/cpu2006.

[23]

T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. The MIT Press, Cambridge, MA, 1990.

Digital Library

[24]

K. Karplus and A. Nicolau. Efficient hardware for multiway jumps and prefetches. In Proceedings of the 18th annual workshop on Microprogramming, pages 11--18, 1985.

Digital Library

[25]

D. Kuck. The Structure of Computers and Computations, VOLUME 1. John Wiley and Sons, New York, NY, 1978.

Digital Library

[26]

SPEC CINT2006. http://www.spec.org/cpu2006/CINT2006.

[27]

S. Novack and A. Nicolau. Trailblazing: A hierarchical approach to percolation scheduling. International Journal of Parallel Programming, 23(1), 1995.

Digital Library

[28]

A. Nicolau. Percolation scheduling. In Proceedings of the 1985 International Conference on Parallel Processing, August 1985.

[29]

K. Ebcioglu and T. Nakatani. A new compilation technique for parallelizing loops with unpredictable branches on a VLIW architecture. In Proceedings of the Third Workshop on Languages and Compilers for Parallel Computing, Urbana, IL, May 1990.

Digital Library

[30]

S. Muchnick. Advanced Compiler Design Implementation. Second edition, 2000.

Digital Library

[31]

SPEC CPU2000. http://www.spec.org/cpu2000.

[32]

Sendmail. http://www.sendmail.org/.

[33]

Apache. http://download.nextag.com/apache.

[34]

D. A. Padua. Multiprocessors: Discussion of theoritical and practical problems. Technical Report 79-990, Department of Computer Science, University of Illinois at Urbana-Champaign, November 1979.

[35]

J. Davies. Parallel loop constructs for multiprocessors. Technical Report 81-1070, Department of Computer Science, University of Illinois at Urbana-Champaign, May 1981.

[36]

C. Zhu and P. Yew. A synchronization scheme and its applications for large scale multiprocessors. In Proceedings of the Conference on Distributed Computing Systems, pages 486--491, San Francisco, CA, May 1984.

[37]

J. Mellor-Crummey and M. L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems, 9(1):21--65, 1991.

Digital Library

[38]

D. M. Tullsen, J. L. Lo, S. J. Eggers, and H. M. Levy. Supporting fine-grained synchronization on a simultaneous multithreading processor. In Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, pages 54--58, 1999.

Digital Library

[39]

S. Sridharan, A. Rodrigues, and P. Kogge. Evaluating synchronization techniques for light-weight multithreaded/multicore architectures. In Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, pages 57--58, San Diego, CA, 2007.

Digital Library

[40]

W. Zhu, V. C Sreedhar, Z. Hu, and G. R. Gao. Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures. In Proceedings of the 34th International Symposium on Computer Architecture, pages 35--45, San Diego, CA, 2007.

Digital Library

[41]

J. Whaley and M. Rinard. Compositional pointer and escape analysis for Java programs. In Proceedings of the 14th ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications, pages 187--206, Denver, CO, 1999.

Digital Library

[42]

A. Salcianu and M. Rinard. Pointer and escape analysis for multithreaded programs. In Proceedings of the 11th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 12--23, Snowbird, UT, 2001.

Digital Library

[43]

R. Cytron. Doacross: Beyond vectorization for multiprocessors. In Proceedings of the 1986 International Conference on Parallel Processing, pages 836--844, St. Charles, IL, August 1986.

[44]

S. Midkiff and D. Padua. Compiler generated synchronization for DO loops. In Proceedings of the 1986 International Conference on Parallel Processing, pages 544--551, St. Charles, IL, August 1986.

[45]

H. Kasahara, H. Honda, M. Iwata, and M. Hirota. A compilation scheme for macro-dataow computation on hierarchical multiprocessor systems. In Proceedings of the International Conference on Parallel Processing, pages II294--II295, Urbana-Champaign, IL, August 1990.

[46]

M. B. Girkar. Functional Parallelism Theoretical Foundations and Implemen-tation. PhD thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, December 1991.

Digital Library

[47]

R. Cytron, M. Hind, and W. Hsieh. Automatic generation of DAG parallelism. In Proceedings of the SIGPLAN '89 Conference on Programming Language Design and Implementation, pages 54--68, Portland, OR, 1989.

Digital Library

[48]

V. Sarkar. Instruction reordering for fork-join parallelism. In Proceedings of the SIGPLAN '90 Conference on Programming Language Design and Implementation, pages 322--336, White Plains, NY, 1990.

Digital Library

[49]

C. Tian, V. Nagarajan, R. Gupta, and S. Tallam. Dynamic recognition of synchronization operations for improved data race detection. In Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis, pages 143--154, Seattle, WA, 2008.

Digital Library

Cited By

Dutta SSarkar DRawat A(2017)Synchronization Validation for Cross-Thread Dependences in Parallel ProgramsInternational Journal of Parallel Programming10.1007/s10766-016-0467-945:6(1326-1365)Online publication date: 1-Dec-2017
https://dl.acm.org/doi/10.1007/s10766-016-0467-9
Jääskeläinen PLama CSchnetter ERaiskila KTakala JBerg H(2015)poclInternational Journal of Parallel Programming10.1007/s10766-014-0320-y43:5(752-785)Online publication date: 1-Oct-2015
https://dl.acm.org/doi/10.1007/s10766-014-0320-y
Campanoni SBrownell KKanev SJones TWei GBrooks DYew PZhai AKeckler S(2014)HELIX-RCProceeding of the 41st annual international symposium on Computer architecuture10.5555/2665671.2665705(217-228)Online publication date: 14-Jun-2014
https://dl.acm.org/doi/10.5555/2665671.2665705
Show More Cited By

Index Terms

Techniques for efficient placement of synchronization primitives
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Recommendations

Synchronization optimizations for efficient execution on multi-cores
ICS '09: Proceedings of the 23rd international conference on Supercomputing

Multi-cores are becoming ubiquitous as exemplified by Sun's Niagra-2, Intel's Nehalem and AMD's Sau Paulo octal cores. The number of cores per chip is expected to rise in foreseeable future, as evidenced by the recently announced Intel's 80-core ...
Techniques for efficient placement of synchronization primitives
PPoPP '09

Harnessing the hardware parallelism of the emerging multi-cores systems necessitates concurrent software. Unfortunately, most of the existing mainstream software is sequential in nature. Although one could auto-parallelize a given program, the efficacy ...
Compiler Techniques for the Superthreaded Architectures

Several useful compiler and program transformation techniques for the superthreaded architectures are presented in this paper. The superthreaded architecture adopts a thread pipelining execution model to facilitate runtime data dependence checking ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming

February 2009

322 pages

ISBN:9781605583976

DOI:10.1145/1504176

General Chair:
Daniel Reed
Microsoft Research, USA
,
Program Chair:
Vivek Sarkar
Rice University, USA

ACM SIGPLAN Notices Volume 44, Issue 4
PPoPP '09
April 2009
294 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1594835
Issue’s Table of Contents

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 February 2009

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

PPoPP09

Sponsor:

PPoPP09: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 14 - 18, 2009

NC, Raleigh, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

14
Total Citations
View Citations
608
Total Downloads

Downloads (Last 12 months)8
Downloads (Last 6 weeks)2

Reflects downloads up to 01 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Dutta SSarkar DRawat A(2017)Synchronization Validation for Cross-Thread Dependences in Parallel ProgramsInternational Journal of Parallel Programming10.1007/s10766-016-0467-945:6(1326-1365)Online publication date: 1-Dec-2017
https://dl.acm.org/doi/10.1007/s10766-016-0467-9
Jääskeläinen PLama CSchnetter ERaiskila KTakala JBerg H(2015)poclInternational Journal of Parallel Programming10.1007/s10766-014-0320-y43:5(752-785)Online publication date: 1-Oct-2015
https://dl.acm.org/doi/10.1007/s10766-014-0320-y
Campanoni SBrownell KKanev SJones TWei GBrooks DYew PZhai AKeckler S(2014)HELIX-RCProceeding of the 41st annual international symposium on Computer architecuture10.5555/2665671.2665705(217-228)Online publication date: 14-Jun-2014
https://dl.acm.org/doi/10.5555/2665671.2665705
Campanoni SBrownell KKanev SJones TWei GBrooks D(2014)HELIX-RCACM SIGARCH Computer Architecture News10.1145/2678373.266570542:3(217-228)Online publication date: 14-Jun-2014
https://dl.acm.org/doi/10.1145/2678373.2665705
Campanoni SJones THolloway GReddi VWei GBrooks DEidt CHoller ASrinivasan UAmarasinghe S(2012)HELIXProceedings of the Tenth International Symposium on Code Generation and Optimization10.1145/2259016.2259028(84-93)Online publication date: 31-Mar-2012
https://dl.acm.org/doi/10.1145/2259016.2259028
Jalan RKejariwal A(2012)Trin-Trin: Who’s Calling? A Pin-Based Dynamic Call Graph Extraction FrameworkInternational Journal of Parallel Programming10.1007/s10766-012-0193-x40:4(410-442)Online publication date: 12-May-2012
https://doi.org/10.1007/s10766-012-0193-x
D'alberto PBodrato MNicolau A(2011)Exploiting parallelism in matrix-computation kernels for symmetric multiprocessor systemsACM Transactions on Mathematical Software10.1145/2049662.204966438:1(1-30)Online publication date: 7-Dec-2011
https://dl.acm.org/doi/10.1145/2049662.2049664
Nicolau AKejariwal A(2011)How Many Threads to Spawn during Program Multithreading?Languages and Compilers for Parallel Computing10.1007/978-3-642-19595-2_12(166-183)Online publication date: 2011
https://doi.org/10.1007/978-3-642-19595-2_12
Nicolau AKejariwal A(2010)How many threads to spawn during program multithreading?Proceedings of the 23rd international conference on Languages and compilers for parallel computing10.5555/1964536.1964548(166-183)Online publication date: 7-Oct-2010
https://dl.acm.org/doi/10.5555/1964536.1964548
Nicolau ALi GVeidenbaum AKejariwal AGschwind MNicolau ASalapura VMoreira J(2009)Synchronization optimizations for efficient execution on multi-coresProceedings of the 23rd international conference on Supercomputing10.1145/1542275.1542303(169-180)Online publication date: 8-Jun-2009
https://dl.acm.org/doi/10.1145/1542275.1542303
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents