[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Modeling and analysis of remote memory access programming

Published: 19 October 2016 Publication History

Abstract

Recent advances in networking hardware have led to a new generation of Remote Memory Access (RMA) networks in which processors from different machines can communicate directly, bypassing the operating system and allowing higher performance. Researchers and practitioners have proposed libraries and programming models for RMA to enable the development of applications running on these networks,
However, the memory models implied by these RMA libraries and languages are often loosely specified, poorly understood, and differ depending on the underlying network architecture and other factors. Hence, it is difficult to precisely reason about the semantics of RMA programs or how changes in the network architecture affect them.
We address this problem with the following contributions: (i) a coreRMA language which serves as a common foundation, formalizing the essential characteristics of RMA programming; (ii) complete axiomatic semantics for that language; (iii) integration of our semantics with an existing constraint solver, enabling us to exhaustively generate coreRMA programs (litmus tests) up to a specified bound and check whether the tests satisfy their specification; and (iv) extensive validation of our semantics on real-world RMA systems. We generated and ran 7441 litmus tests using each of the low-level RMA network APIs: DMAPP, VPI Verbs, and Portals 4. Our results confirmed that our model successfully captures behaviors exhibited by these networks. Moreover, we found RMA programs that behave inconsistently with existing documentation, confirmed by network experts.
Our work provides an important step towards understanding existing RMA networks, thus influencing the design of future RMA interfaces and hardware.

References

[1]
P. A. Abdulla, M. F. Atig, Y. Chen, C. Leonardsson, and A. Rezine. Automatic fence insertion in integer programs via predicate abstraction. In Static Analysis - 19th International Symposium, SAS 2012, 2012.
[2]
J. Alglave, D. Kroening, V. Nimal, and M. Tautschnig. Software verification for weak memory via program transformation. In Programming Languages and Systems - 22nd European Symposium on Programming, ESOP 2013, 2013.
[3]
J. Alglave, D. Kroening, V. Nimal, and D. Poetzl. Don’t sit on the fence—A static analysis approach to automatic fence insertion. In Computer Aided Verification - 26th International Conference, CAV 2014, 2014a. J. Alglave, L. Maranget, and M. Tautschnig. Herding cats: Modelling, simulation, testing, and data mining for weak memory. ACM Trans. Program. Lang. Syst., 36(2):7, 2014b.
[4]
R. Alverson, D. Roweth, and L. Kaplan. The Gemini system interconnect. In Proc. of the IEEE Symposium on High Performance Interconnects (HOTI’10), pages 83–87. IEEE Computer Society, 2010.
[5]
B. Arimilli, R. Arimilli, V. Chung, S. Clark, W. Denzel, B. Drerup, T. Hoefler, J. Joyner, J. Lewis, J. Li, N. Ni, and R. Rajamony. The PERCS high-performance interconnect. In Proc. of the IEEE Symposium on High Performance Interconnects (HOTI’10), pages 75–82. IEEE Computer Society, Aug. 2010.
[6]
B. W. Barrett, R. B. Brightwell, K. T. T. Pedretti, K. B. Wheeler, K. S. Hemmert, R. E. Riesen, K. D. Underwood, A. B. Maccabe, and T. B. Hudson. The Portals 4.0 network programming interface. Technical report, Sandia National Laboratories, 2012.
[7]
SAND2012-10087.
[8]
R. Belli and T. Hoefler. Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization. IEEE, May 2015. Accepted at IPDPS’15. J. C. Blanchette, T. Weber, M. Batty, S. Owens, and S. Sarkar. Nitpicking C++ concurrency. In Proceedings of the 13th International ACM SIGPLAN Symposium on Principles and Practices of Declarative Programming, PPDP ’11, 2011.
[9]
A. Bouajjani, E. Derevenetc, and R. Meyer. Checking and enforcing robustness against TSO. In Programming Languages and Systems - 22nd European Symposium on Programming, ESOP 2013, 2013.
[10]
S. Burckhardt and M. Musuvathi. Effective program verification for relaxed memory models. In Computer Aided Verification, 20th International Conference, CAV 2008, 2008.
[11]
S. Burckhardt, R. Alur, and M. M. K. Martin. Checkfence: checking consistency of concurrent data types on relaxed memory models. In Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007.
[12]
J. Burnim, K. Sen, and C. Stergiou. Sound and complete monitoring of sequential consistency for relaxed memory models. In Tools and Algorithms for the Construction and Analysis of Systems - 17th International Conference, TACAS 2011, 2011.
[13]
D. Chen, N. A. Eisley, P. Heidelberger, R. M. Senger, Y. Sugawara, S. Kumar, V. Salapura, D. L. Satterfield, B. Steinmacher-Burow, and J. J. Parker. The IBM Blue Gene/q Interconnection Network and Message Unit. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’11, pages 26:1–26:10, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0771-0.
[14]
G. Faanes, A. Bataineh, D. Roweth, T. Court, E. Froese, B. Alverson, T. Johnson, J. Kopnick, M. Higgins, and J. Reinhard. Cray Cascade: A scalable HPC system based on a Dragonfly network. In Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’12), pages 103:1–103:9. IEEE Computer Society, 2012. ISBN 978-1-4673- 0804-5. R. Gerstenberger, M. Besta, and T. Hoefler. Enabling Highlyscalable Remote Memory Access Programming with MPI-3 One Sided. In Proc. of the ACM/IEEE Supercomputing, SC ’13, pages 53:1–53:12, 2013.
[15]
S. Hefty. Scalable fabric interfaces, 2014. OpenFabrics International Developer Workshop 2014.
[16]
T. Hoefler, J. Dinan, R. Thakur, B. Barrett, P. Balaji, W. Gropp, and K. Underwood. Remote Memory Access Programming in MPI-3. Argonne National Laboratory, Tech. Rep, 2013.
[17]
N. S. Islam, M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy, and D. K. Panda. High performance RDMA-based design of HDFS over InfiniBand. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’12, pages 35:1– 35:35, Los Alamitos, CA, USA, 2012. IEEE Computer Society Press. ISBN 978-1-4673-0804-5. D. Jackson. Software Abstractions: Logic, Language, and Analysis. The MIT Press, 2006. ISBN 0262101149.
[18]
N. Jiang, J. Kim, and W. J. Dally. Indirect adaptive routing on large scale interconnection networks. SIGARCH Comput. Archit. News, 37(3):220–231, June 2009. ISSN 0163-5964.
[19]
S. Kumar, A. Mamidala, D. A. Faraj, B. Smith, M. Blocksome, B. Cernohous, D. Miller, J. Parker, J. Ratterman, P. Heidelberger, D. Chen, and B. D. Steinmacher-Burrow. PAMI: A parallel active message interface for the Blue Gene/Q supercomputer. In Proc. of the IEEE International Parallel and Distributed Processing Symposium (IPDPS’12), pages 763–773. IEEE Computer Society, 2012.
[20]
M. Kuperstein, M. T. Vechev, and E. Yahav. Automatic inference of memory fences. In Proceedings of 10th International Conference on Formal Methods in Computer-Aided Design, FMCAD 2010, 2010.
[21]
M. Kuperstein, M. T. Vechev, and E. Yahav. Partial-coherence abstractions for relaxed memory models. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, 2011.
[22]
G. Li, R. Palmer, M. DeLisi, G. Gopalakrishnan, and R. M. Kirby. Formal specification of MPI 2.0: Case study in specifying a practical concurrent programming API. Sci. Comput. Program., 76(2):65–81, Feb. 2011. ISSN 0167-6423.
[23]
A. Linden and P. Wolper. A verification-based approach to memory fence insertion in PSO memory systems. In Tools and Algorithms for the Construction and Analysis of Systems - 19th International Conference, TACAS 2013, 2013.
[24]
F. Liu, N. Nedev, N. Prisadnikov, M. T. Vechev, and E. Yahav. Dynamic synthesis for relaxed memory models. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’12, 2012.
[25]
Y. Meshman, A. M. Dan, M. T. Vechev, and E. Yahav. Synthesis of memory fences via refinement propagation. In Static Analysis - 21st International Symposium, SAS 2014, 2014.
[26]
B. Norris and B. Demsky. CDSchecker: checking concurrent data structures written with C/C++ atomics. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2013, 2013.
[27]
R. W. Numrich and J. Reid. Co-array Fortran for parallel programming. SIGPLAN Fortran Forum, 17(2):1–31, 1998.
[28]
OpenFabrics Alliance (OFA). OpenFabrics Enterprise Distribution (OFED) www.openfabrics.org, 2014.
[29]
S. Owens, S. Sarkar, and P. Sewell. A better x86 memory model: x86-TSO. In Theorem Proving in Higher Order Logics, 22nd International Conference, TPHOLs 2009, Munich, Germany, August 17-20, 2009. Proceedings, 2009.
[30]
C.-S. Park, K. Sen, P. Hargrove, and C. Iancu. Efficient data race detection for distributed memory parallel programs. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’11, pages 51:1–51:12, New York, NY, USA, 2011. ACM. ISBN 978-1- 4503-0771-0. C. S. Park, K. Sen, and C. Iancu. Scaling data race detection for partitioned global address space programs. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ICS ’13, pages 47–58, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2130-3.
[31]
M. Poke and T. Hoefler. Dare: High-performance state machine replication on rdma networks. In Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’15, pages 107–118, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3550-8. 2749246.2749267.
[32]
V. Saraswat, G. Almasi, G. Bikshandi, C. Cascaval, D. Cunningham, D. Grove, S. Kodali, I. Peshansky, and O. Tardieu. The asynchronous partitioned global address space model. In AMP ’10: Proceedings of The First Workshop on Advances in Message Passing, June 2010.
[33]
S. Sarkar, P. Sewell, J. Alglave, L. Maranget, and D. Williams. Understanding POWER multiprocessors. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011.
[34]
C. SPARC International, Inc. The SPARC Architecture Manual: Version 8. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1992. ISBN 0-13-825001-4. The InfiniBand Trade Association. Infiniband Architecture Spec. Vol. 1, Rel. 1.2. InfiniBand Trade Association, 2004.
[35]
E. Torlak, M. Vaziri, and J. Dolby. Memsat: checking axiomatic specifications of memory models. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI, 2010.
[36]
UPC Consortium. UPC language specifications, v1.2. Technical report, Lawrence Berkeley National Laboratory, 2005. LBNL- 59208.
[37]
M. Valiev, E. J. Bylaska, N. Govind, K. Kowalski, T. P. Straatsma, H. J. Van Dam, D. Wang, J. Nieplocha, E. Apra, T. L. Windus, et al. NWChem: a comprehensive and scalable open-source solution for large scale molecular simulations. Computer Physics Communications, 181(9):1477–1489, 2010.

Cited By

View all
  • (2024)Semantics of Remote Direct Memory Access: Operational and Declarative Models of RDMA on TSO ArchitecturesProceedings of the ACM on Programming Languages10.1145/36897818:OOPSLA2(1982-2009)Online publication date: 8-Oct-2024
  • (2024)RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access ApplicationsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673109(833-844)Online publication date: 12-Aug-2024
  • (2024)RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access ApplicationsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673109(833-844)Online publication date: 12-Aug-2024
  • Show More Cited By

Index Terms

  1. Modeling and analysis of remote memory access programming

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 51, Issue 10
    OOPSLA '16
    October 2016
    915 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/3022671
    Issue’s Table of Contents
    • cover image ACM Conferences
      OOPSLA 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications
      October 2016
      915 pages
      ISBN:9781450344449
      DOI:10.1145/2983990
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 October 2016
    Published in SIGPLAN Volume 51, Issue 10

    Check for updates

    Badges

    • Distinguished Paper

    Author Tag

    1. Memory Model

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)25
    • Downloads (Last 6 weeks)6
    Reflects downloads up to 15 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Semantics of Remote Direct Memory Access: Operational and Declarative Models of RDMA on TSO ArchitecturesProceedings of the ACM on Programming Languages10.1145/36897818:OOPSLA2(1982-2009)Online publication date: 8-Oct-2024
    • (2024)RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access ApplicationsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673109(833-844)Online publication date: 12-Aug-2024
    • (2024)RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access ApplicationsProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673109(833-844)Online publication date: 12-Aug-2024
    • (2024)ALock: Asymmetric Lock Primitive for RDMA SystemsProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659977(15-26)Online publication date: 17-Jun-2024
    • (2023)Design Guidelines for Correct, Efficient, and Scalable Synchronization using One-Sided RDMAProceedings of the ACM on Management of Data10.1145/35892761:2(1-26)Online publication date: 20-Jun-2023
    • (2023)Data Center Ethernet and Remote Direct Memory Access: Issues at HyperscaleComputer10.1109/MC.2023.326118456:7(67-77)Online publication date: 1-Jul-2023
    • (2017)Automatic Verification of RMA Programs via Abstraction ExtrapolationVerification, Model Checking, and Abstract Interpretation10.1007/978-3-319-73721-8_3(47-70)Online publication date: 29-Dec-2017
    • (2017)Debugging Latent Synchronization Errors in MPI-3 One-Sided CommunicationTools for High Performance Computing 201610.1007/978-3-319-56702-0_5(83-96)Online publication date: 9-May-2017

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media