DOI: 10.1145/3318170.3318192

Evaluating data parallelism in C++ using the Parallel Research Kernels

Published: 13 May 2019

Abstract

The Parallel Research Kernels are a set of simple algorithms that correspond to popular classes of high-performance computing applications. We report on their use to evaluate parallel programming models based upon modern C++.



Published In

IWOCL '19: Proceedings of the International Workshop on OpenCL
May 2019
102 pages
ISBN: 9781450362306
DOI: 10.1145/3318170
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

In-Cooperation

  • Khronos Group
  • Northeastern University
  • Codeplay Software Ltd.
  • Intel
  • The University of Bristol

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. C++
  2. heterogeneity
  3. parallelism

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Conference

IWOCL'19: International Workshop on OpenCL
May 13-15, 2019
Boston, MA, USA

Acceptance Rates

IWOCL '19 Paper Acceptance Rate 13 of 33 submissions, 39%;
Overall Acceptance Rate 84 of 152 submissions, 55%

Article Metrics

  • Downloads (last 12 months): 14
  • Downloads (last 6 weeks): 2

Reflects downloads up to 01 Jan 2025.

Cited By

  • (2022) Exploring the possibility of a hipSYCL-based implementation of oneAPI. Proceedings of the 10th International Workshop on OpenCL, 1-12. DOI: 10.1145/3529538.3530005. Online publication date: 10-May-2022
  • (2022) Performance analysis of matrix-free conjugate gradient kernels using SYCL. Proceedings of the 10th International Workshop on OpenCL, 1-10. DOI: 10.1145/3529538.3529993. Online publication date: 10-May-2022
  • (2022) CAMP: a Synthetic Micro-Benchmark for Assessing Deep Memory Hierarchies. 2022 IEEE/ACM International Workshop on Hierarchical Parallelism for Exascale Computing (HiPar), 28-36. DOI: 10.1109/HiPar56574.2022.00009. Online publication date: Nov-2022
  • (2021) K-Athena: A Performance Portable Structured Grid Finite Volume Magnetohydrodynamics Code. IEEE Transactions on Parallel and Distributed Systems 32:1, 85-97. DOI: 10.1109/TPDS.2020.3010016. Online publication date: 1-Jan-2021
  • (2020) Evaluating Gather and Scatter Performance on CPUs and GPUs. Proceedings of the International Symposium on Memory Systems, 209-222. DOI: 10.1145/3422575.3422794. Online publication date: 28-Sep-2020
  • (2019) An Approach for Indirectly Adopting a Performance Portability Layer in Large Legacy Codes. 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), 36-49. DOI: 10.1109/P3HPC49587.2019.00009. Online publication date: Nov-2019
