[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1504176.1504207acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article

Techniques for efficient placement of synchronization primitives

Published: 14 February 2009 Publication History

Abstract

Harnessing the hardware parallelism of the emerging multi-cores systems necessitates concurrent software. Unfortunately, most of the existing mainstream software is sequential in nature. Although one could auto-parallelize a given program, the efficacy of this is largely limited to floating-point codes. One of the ways to alleviate the above limitation is to parallelize programs, which cannot be auto-parallelized, via explicit synchronization. In this regard, efficient placement of the synchronization primitives - say, post, wait - plays a key role in achieving high degree of thread-level parallelism (TLP). In this paper, we propose novel compiler techniques for the above. Specifically, given a control flow graph (CFG), the proposed techniques place a post as early as possible and place a wait as late as possible in the CFG, subject to dependences. We demonstrate the efficacy of our techniques, on a real machine, using real codes, specifically, from the industry-standard SPEC CPU benchmarks, the Linux kernel and other widely used open source codes. Our results show that the proposed techniques yield significantly higher levels of TLP than the state-of-the-art.

References

[1]
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Transactions on Computers, C-36(12):1485--1495, December 1987.
[2]
J. R. Goodman, M. K. Vernon, and P. J. Woest. Efficient synchronization primitives for large-scale cache-coherent multiprocessors. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-III), pages 64--75, Boston, MA, 1989.
[3]
J. Labarta and E. Ayguadé. GTS: Extracting full parallelism out of DO loops. In Proceedings of the Parallel Architectures and Languages Europe, pages 43--54, Eindhoven, The Netherlands, 1989.
[4]
G. Granunke and S. Thakkar. Synchronization algorithms for shared-memory multiprocessors. IEEE Computer, 23(6):60--69, 1990.
[5]
Z. Li. Compiler algorithms for event variable synchronization. In Proceedings of the 1991 ACM International Conference on Supercomputing, Cologne, Germany, June 1991.
[6]
A. Krishnamurthy and K. Yelick. Optimizing parallel programs with explicit synchronization. In Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, pages 196--204, La Jolla, CA, 1995.
[7]
A. Aiken and D. Gay. Barrier inference. In Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 342--354, San Diego, CA, 1998.
[8]
A. Kagi. Mechanism for Efficient Shared-Memory Lock-based Synchronization. PhD thesis, Department of Computer Science, University of Wisconsin-Madison, 1999.
[9]
D. S. Nikolopoulos and T. S. Papatheodorou. Fast synchronization on scalable cache-coherent multiprocessors using hybrid primitives. In Proceedings of the 14th International Parallel and Distributed Processing Symposium, pages 711--720, Cancun, Mexico, 2000.
[10]
D. F. Bacon, R. Konuru, C. Murthy, and M. J. Serrano. Thin locks: Featherweight synchronization for java. ACM SIGPLAN Notices, 39(4):583--595, 2004.
[11]
A. Kejariwal, X. Tian, H. Saito, W. Li, M. Girkar, U. Banerjee, A. Nicolau, and C. D. Polychronopoulos. Lightweight lock-free synchronization methods for multithreading. In Proceedings of the 20th ACM International Conference on Supercomputing, pages 361--371, Cairns, Australia, 2006.
[12]
The Linux Kernel Archives. http://www.kernel.org.
[13]
J. A. Fisher. Trace Scheduling: A technique for global microcode compaction. IEEE Transactions on Computers, C-30(7):478--490, July 1981.
[14]
A. Nicolau. Percolation scheduling: A parallel compilation technique. Technical Report TR85-678, Dept. of Computer Science, Cornell University, May 1985.
[15]
W. M. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. G. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G. Holm, and D. M. Lavery. The superblock: An effective technique for VLIW and super-scalar compilation. The JournaL of Supercomputing, 7(1-2):229--248, November 1993.
[16]
Jens Knoop, Oliver Rüthing, and Bernhard Steffen. Optimal code motion: Theory and practice. ACM Transactions on Programming Languages and Systems, 16(4):1117--1155, July 1994.
[17]
M. Hailperin. Cost-optimal code motion. ACM Transactions on Programming Languages and Systems, 20(6):1297--1322, 1998.
[18]
E. Morel and C. Renvoise. Global optimization by suppression of partial redun-dancies. Communications of the ACM, 22(2):96--103, February 1979.
[19]
SPEC CPU Benchmarks. http://www.spec.org/benchmarks.html.
[20]
A. Kejariwal, X. Tian, W. Li, M. Girkar, S. Kozhukhov, H. Saito, U. Banerjee, A. Nicolau, A. V. Veidenbaum, and C. D. Polychronopoulos. On the performance potential of different types of speculative thread-level parallelism. In Proceedings of the 20th ACM International Conference on Supercomputing, pages 24--35, Cairns, Australia, 2006.
[21]
U. Banerjee. Dependence Analysis. Kluwer Academic Publishers, Boston, MA, 1997.
[22]
SPEC CPU2006. http://www.spec.org/cpu2006.
[23]
T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. The MIT Press, Cambridge, MA, 1990.
[24]
K. Karplus and A. Nicolau. Efficient hardware for multiway jumps and prefetches. In Proceedings of the 18th annual workshop on Microprogramming, pages 11--18, 1985.
[25]
D. Kuck. The Structure of Computers and Computations, VOLUME 1. John Wiley and Sons, New York, NY, 1978.
[26]
SPEC CINT2006. http://www.spec.org/cpu2006/CINT2006.
[27]
S. Novack and A. Nicolau. Trailblazing: A hierarchical approach to percolation scheduling. International Journal of Parallel Programming, 23(1), 1995.
[28]
A. Nicolau. Percolation scheduling. In Proceedings of the 1985 International Conference on Parallel Processing, August 1985.
[29]
K. Ebcioglu and T. Nakatani. A new compilation technique for parallelizing loops with unpredictable branches on a VLIW architecture. In Proceedings of the Third Workshop on Languages and Compilers for Parallel Computing, Urbana, IL, May 1990.
[30]
S. Muchnick. Advanced Compiler Design Implementation. Second edition, 2000.
[31]
SPEC CPU2000. http://www.spec.org/cpu2000.
[32]
Sendmail. http://www.sendmail.org/.
[33]
Apache. http://download.nextag.com/apache.
[34]
D. A. Padua. Multiprocessors: Discussion of theoritical and practical problems. Technical Report 79-990, Department of Computer Science, University of Illinois at Urbana-Champaign, November 1979.
[35]
J. Davies. Parallel loop constructs for multiprocessors. Technical Report 81-1070, Department of Computer Science, University of Illinois at Urbana-Champaign, May 1981.
[36]
C. Zhu and P. Yew. A synchronization scheme and its applications for large scale multiprocessors. In Proceedings of the Conference on Distributed Computing Systems, pages 486--491, San Francisco, CA, May 1984.
[37]
J. Mellor-Crummey and M. L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems, 9(1):21--65, 1991.
[38]
D. M. Tullsen, J. L. Lo, S. J. Eggers, and H. M. Levy. Supporting fine-grained synchronization on a simultaneous multithreading processor. In Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, pages 54--58, 1999.
[39]
S. Sridharan, A. Rodrigues, and P. Kogge. Evaluating synchronization techniques for light-weight multithreaded/multicore architectures. In Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, pages 57--58, San Diego, CA, 2007.
[40]
W. Zhu, V. C Sreedhar, Z. Hu, and G. R. Gao. Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures. In Proceedings of the 34th International Symposium on Computer Architecture, pages 35--45, San Diego, CA, 2007.
[41]
J. Whaley and M. Rinard. Compositional pointer and escape analysis for Java programs. In Proceedings of the 14th ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications, pages 187--206, Denver, CO, 1999.
[42]
A. Salcianu and M. Rinard. Pointer and escape analysis for multithreaded programs. In Proceedings of the 11th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 12--23, Snowbird, UT, 2001.
[43]
R. Cytron. Doacross: Beyond vectorization for multiprocessors. In Proceedings of the 1986 International Conference on Parallel Processing, pages 836--844, St. Charles, IL, August 1986.
[44]
S. Midkiff and D. Padua. Compiler generated synchronization for DO loops. In Proceedings of the 1986 International Conference on Parallel Processing, pages 544--551, St. Charles, IL, August 1986.
[45]
H. Kasahara, H. Honda, M. Iwata, and M. Hirota. A compilation scheme for macro-dataow computation on hierarchical multiprocessor systems. In Proceedings of the International Conference on Parallel Processing, pages II294--II295, Urbana-Champaign, IL, August 1990.
[46]
M. B. Girkar. Functional Parallelism Theoretical Foundations and Implemen-tation. PhD thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, December 1991.
[47]
R. Cytron, M. Hind, and W. Hsieh. Automatic generation of DAG parallelism. In Proceedings of the SIGPLAN '89 Conference on Programming Language Design and Implementation, pages 54--68, Portland, OR, 1989.
[48]
V. Sarkar. Instruction reordering for fork-join parallelism. In Proceedings of the SIGPLAN '90 Conference on Programming Language Design and Implementation, pages 322--336, White Plains, NY, 1990.
[49]
C. Tian, V. Nagarajan, R. Gupta, and S. Tallam. Dynamic recognition of synchronization operations for improved data race detection. In Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis, pages 143--154, Seattle, WA, 2008.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
February 2009
322 pages
ISBN:9781605583976
DOI:10.1145/1504176
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 44, Issue 4
    PPoPP '09
    April 2009
    294 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1594835
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 February 2009

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. compilers
  2. multithreading
  3. parallelization
  4. performance

Qualifiers

  • Research-article

Conference

PPoPP09
Sponsor:

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)8
  • Downloads (Last 6 weeks)2
Reflects downloads up to 01 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2017)Synchronization Validation for Cross-Thread Dependences in Parallel ProgramsInternational Journal of Parallel Programming10.1007/s10766-016-0467-945:6(1326-1365)Online publication date: 1-Dec-2017
  • (2015)poclInternational Journal of Parallel Programming10.1007/s10766-014-0320-y43:5(752-785)Online publication date: 1-Oct-2015
  • (2014)HELIX-RCProceeding of the 41st annual international symposium on Computer architecuture10.5555/2665671.2665705(217-228)Online publication date: 14-Jun-2014
  • (2014)HELIX-RCACM SIGARCH Computer Architecture News10.1145/2678373.266570542:3(217-228)Online publication date: 14-Jun-2014
  • (2012)HELIXProceedings of the Tenth International Symposium on Code Generation and Optimization10.1145/2259016.2259028(84-93)Online publication date: 31-Mar-2012
  • (2012)Trin-Trin: Who’s Calling? A Pin-Based Dynamic Call Graph Extraction FrameworkInternational Journal of Parallel Programming10.1007/s10766-012-0193-x40:4(410-442)Online publication date: 12-May-2012
  • (2011)Exploiting parallelism in matrix-computation kernels for symmetric multiprocessor systemsACM Transactions on Mathematical Software10.1145/2049662.204966438:1(1-30)Online publication date: 7-Dec-2011
  • (2011)How Many Threads to Spawn during Program Multithreading?Languages and Compilers for Parallel Computing10.1007/978-3-642-19595-2_12(166-183)Online publication date: 2011
  • (2010)How many threads to spawn during program multithreading?Proceedings of the 23rd international conference on Languages and compilers for parallel computing10.5555/1964536.1964548(166-183)Online publication date: 7-Oct-2010
  • (2009)Synchronization optimizations for efficient execution on multi-coresProceedings of the 23rd international conference on Supercomputing10.1145/1542275.1542303(169-180)Online publication date: 8-Jun-2009
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media