[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2814270.2814283acmconferencesArticle/Chapter ViewAbstractPublication PagessplashConference Proceedingsconference-collections
research-article

Remote-scope promotion: clarified, rectified, and verified

Published: 23 October 2015 Publication History

Abstract

Modern accelerator programming frameworks, such as OpenCL, organise threads into work-groups. Remote-scope promotion (RSP) is a language extension recently proposed by AMD researchers that is designed to enable applications, for the first time, both to optimise for the common case of intra-work-group communication (using memory scopes to provide consistency only within a work-group) and to allow occasional inter-work-group communication (as required, for instance, to support the popular load-balancing idiom of work stealing). We present the first formal, axiomatic memory model of OpenCL extended with RSP. We have extended the Herd memory model simulator with support for OpenCL kernels that exploit RSP, and used it to discover bugs in several litmus tests and a work-stealing queue, that have been used previously in the study of RSP. We have also formalised the proposed GPU implementation of RSP. The formalisation process allowed us to identify bugs in the description of RSP that could result in well-synchronised programs experiencing memory inconsistencies. We present and prove sound a new implementation of RSP that incorporates bug fixes and requires less non-standard hardware than the original implementation. This work, a collaboration between academia and industry, clearly demonstrates how, when designing hardware support for a new concurrent language feature, the early application of formal tools and techniques can help to prevent errors, such as those we have found, from making it into silicon.

Supplementary Material

Auxiliary Archive (p731-wickerson-s.zip)
This archive contains (1) a virtual machine for replicating the results of simulating our litmus tests with Herd, and (2) our Isabelle formalisation of remote-scope promotion.

References

[1]
J. Alglave, M. Batty, A. F. Donaldson, G. Gopalakrishnan, J. Ketema, D. Poetzl, T. Sorensen, and J. Wickerson. GPU concurrency: weak behaviours and programming assumptions. In ASPLOS, 2015.
[2]
J. Alglave, L. Maranget, S. Sarkar, and P. Sewell. Fences in weak memory models. In CAV, 2010.
[3]
J. Alglave, L. Maranget, and M. Tautschnig. Herding cats: Modelling, simulation, testing, and data-mining for weak memory. ACM TOPLAS, 2014.
[4]
M. Batty. The C11 and C++11 Concurrency Model. PhD thesis, University of Cambridge, October 2014.
[5]
M. Batty, K. Memarian, S. Owens, S. Sarkar, and P. Sewell. Clarifying and compiling C/C++ concurrency: From C++11 to POWER. In POPL, 2012.
[6]
M. Batty, S. Owens, S. Sarkar, P. Sewell, and T. Weber. Mathematizing C++ concurrency. In POPL, 2011.
[7]
A. Betts, N. Chong, A. F. Donaldson, J. Ketema, S. Qadeer, P. Thomson, and J. Wickerson. The design and implementation of a verification technique for GPU kernels. ACM Trans. Program. Lang. Syst., 37(3):10, 2015.
[8]
D. Cederman and P. Tsigas. Dynamic load balancing using work-stealing. In GPU Computing Gems. Elsevier, 2012.
[9]
S. Che, B. M. Beckmann, S. K. Reinhardt, and K. Skadron. Pannotia: Understanding irregular GPGPU graph applications. In IISWC, 2013.
[10]
B. R. Gaster, D. R. Hower, and L. Howes. HRF-Relaxed: Adapting HRF to the complexities of industrial heterogeneous memory models. ACM TACO, 2015.
[11]
B. A. Hechtman, S. Che, D. R. Hower, Y. Tian, B. M. Beckmann, M. D. Hill, S. K. Reinhardt, and D. A. Wood. QuickRelease: A throughput-oriented approach to release consistency on GPUs. In HPCA, 2014.
[12]
D. R. Hower, B. M. Beckmann, B. R. Gaster, B. A. Hechtman, M. D. Hill, S. K. Reinhardt, and D. A. Wood. Heterogeneousrace-free memory models. In ASPLOS, 2014.
[13]
G. Kyriazis. Heterogeneous system architecture: A technical review. Technical report, AMD, 2012.
[14]
C. Lin, V. Nagarajan, and R. Gupta. Fence scoping. In SC, 2014.
[15]
D. Lustig, M. Pellauer, and M. Martonosi. PipeCheck: Specifying and verifying microarchitectural enforcement of memory consistency models. In MICRO, 2014.
[16]
S. Mador-Haim, L. Maranget, S. Sarkar, K. Memarian, J. Alglave, S. Owens, R. Alur, M. M. K. Martin, P. Sewell, and D. Williams. An axiomatic memory model for POWER multiprocessors. In CAV, 2012.
[17]
A. Munshi. The OpenCL Specification (Version 2.0). Khronos OpenCL Working Group, November 2013.
[18]
T. Nipkow, L. Paulson, and M. Wenzel. Isabelle/HOL - A Proof Assistant for Higher-Order Logic. Springer, 2002.
[19]
M. S. Orr, S. Che, A. Yilmazer, B. M. Beckmann, M. D. Hill, and D. A. Wood. Synchronization using remote-scope promotion. In ASPLOS, 2015.
[20]
B. Rajaram, V. Nagarajan, S. Sarkar, and M. Elver. Fast RMWs for TSO: Semantics and implementation. In PLDI, 2013.
[21]
S. Sarkar, K. Memarian, S. Owens, M. Batty, P. Sewell, L. Maranget, J. Alglave, and D. Williams. Synchronising C/C++ and POWER. In PLDI, 2012.
[22]
S. Sarkar, P. Sewell, J. Alglave, L. Maranget, and D. Williams. Understanding POWER multiprocessors. In PLDI, 2011.
[23]
P. Sewell, S. Sarkar, S. Owens, F. Zappa Nardelli, and M. O. Myreen. x86-TSO: A rigorous and usable programmer’s model for x86 multiprocessors. CACM, 53(7), 2010.
[24]
J. Wickerson, M. Batty, and A. F. Donaldson. Overhauling SC atomics in C11 and OpenCL. CoRR, July 2015. http://arxiv.org/abs/1503.07073.

Cited By

View all
  • (2022)Only Buffer When You Need To: Reducing On-chip GPU Traffic with Reconfigurable Local Atomic Buffers2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA53966.2022.00056(676-691)Online publication date: Apr-2022
  • (2022)High‐coverage metamorphic testing of concurrency support in C compilersSoftware Testing, Verification and Reliability10.1002/stvr.181232:4Online publication date: 22-Mar-2022
  • (2021)GPS: A Global Publish-Subscribe Model for Multi-GPU Memory ManagementMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3466752.3480088(46-58)Online publication date: 18-Oct-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications
October 2015
953 pages
ISBN:9781450336895
DOI:10.1145/2814270
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 50, Issue 10
    OOPSLA '15
    October 2015
    953 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2858965
    • Editor:
    • Andy Gill
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 October 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Formal methods
  2. Isabelle
  3. OpenCL
  4. graphics processing unit (GPU)
  5. programming language implementation
  6. weak memory models
  7. work stealing

Qualifiers

  • Research-article

Funding Sources

Conference

SPLASH '15
Sponsor:

Acceptance Rates

Overall Acceptance Rate 268 of 1,244 submissions, 22%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)12
  • Downloads (Last 6 weeks)0
Reflects downloads up to 23 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2022)Only Buffer When You Need To: Reducing On-chip GPU Traffic with Reconfigurable Local Atomic Buffers2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA53966.2022.00056(676-691)Online publication date: Apr-2022
  • (2022)High‐coverage metamorphic testing of concurrency support in C compilersSoftware Testing, Verification and Reliability10.1002/stvr.181232:4Online publication date: 22-Mar-2022
  • (2021)GPS: A Global Publish-Subscribe Model for Multi-GPU Memory ManagementMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3466752.3480088(46-58)Online publication date: 18-Oct-2021
  • (2020)Deterministic Atomic Buffering2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO50266.2020.00083(981-995)Online publication date: Oct-2020
  • (2020)ScoRDProceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture10.1109/ISCA45697.2020.00088(1036-1049)Online publication date: 30-May-2020
  • (2019)Autonomous Data-Race-Free GPU Testing2019 IEEE International Symposium on Workload Characterization (IISWC)10.1109/IISWC47752.2019.9042019(81-92)Online publication date: Nov-2019
  • (2017)BARRACUDA: binary-level analysis of runtime RAces in CUDA programsACM SIGPLAN Notices10.1145/3140587.306234252:6(126-140)Online publication date: 14-Jun-2017
  • (2017)Automatically comparing memory consistency modelsACM SIGPLAN Notices10.1145/3093333.300983852:1(190-204)Online publication date: 1-Jan-2017
  • (2017)BARRACUDA: binary-level analysis of runtime RAces in CUDA programsProceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3062341.3062342(126-140)Online publication date: 14-Jun-2017
  • (2017)Hardware Synthesis of Weakly Consistent C ConcurrencyProceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays10.1145/3020078.3021733(169-178)Online publication date: 22-Feb-2017
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media