This release contains new features, bug fixes, and build improvements.
Please download the RAJAPerf-v2025.03.0.tar.gz file below. The other source archives will not work because of the way RAJAPerf uses git submodules.

New features and usage changes:
- Added option to print algorithmic complexity of each kernel (at request of benchmarking team).
- Removed older RAJA reductions from OpenMP target offload variants of kernels with reductions. The officially supported RAJA reductions for OpenMP target offload are the newer `valop` reductions.
- Added resource argument to all RAJA kernel variants for consistency and to follow RAJA usage recommendations.
- Added `EMPTY` kernel to Basic group that does nothing inside a loop body to measure the minimal cost of launching a kernel.
- Added `FEMSWEEP` kernel, which represents a FEM-based linear sweep used in deterministic transport codes.
- Added kernel launch tunings to the LTIMES and LTIMES_NOVIEW kernels that use the RAJA::launch API. These tunings are intended to help understand performance differences between the RAJA::kernel and RAJA::launch APIs.
- Added RAJA Views to base variants of LTIMES kernel.
- Added citation on GitHub project page to P3HPC paper presented at SC24 on using Caliper and Thicket in RAJA Performance Suite.
- Added a command line option to enable custom scan tunings; it defaults to on.
- For `comm` kernels, modified the MPI buffer allocation to do one large allocation and dole it out with alignment specified by the `--align` option.
- Added Caliper configuration information to some build scripts as usage examples.
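
The single-allocation change to the comm kernels noted above can be pictured with a small sketch. This is only an illustration of the general technique (allocate once, then hand out aligned pieces), not the RAJAPerf implementation: the `AlignedPool` class, the `round_up` helper, and the use of `std::malloc` are all assumptions made for the example, and the `align` parameter merely stands in for whatever value the alignment option supplies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: round n up to the next multiple of align.
// align must be a power of two.
inline std::size_t round_up(std::size_t n, std::size_t align)
{
  return (n + align - 1) & ~(align - 1);
}

// Hypothetical pool (not RAJAPerf code): makes one large allocation and
// carves it into pieces whose start addresses are multiples of `align`.
class AlignedPool
{
public:
  AlignedPool(std::size_t capacity, std::size_t align)
    : m_align(align),
      m_capacity(round_up(capacity, align)),
      m_offset(0),
      m_raw(nullptr),
      m_base(nullptr)
  {
    assert(align != 0 && (align & (align - 1)) == 0);
    // Over-allocate by `align` bytes so the first piece can be aligned too.
    m_raw = std::malloc(m_capacity + align);
    if (m_raw != nullptr) {
      std::size_t addr =
          static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(m_raw));
      m_base = static_cast<char*>(m_raw) + (round_up(addr, align) - addr);
    }
  }

  ~AlignedPool() { std::free(m_raw); }

  AlignedPool(const AlignedPool&) = delete;
  AlignedPool& operator=(const AlignedPool&) = delete;

  // Hand out the next `bytes` bytes, padded to the alignment.
  // Returns nullptr if the pool cannot satisfy the request.
  void* take(std::size_t bytes)
  {
    std::size_t padded = round_up(bytes, m_align);
    if (m_base == nullptr || padded > m_capacity - m_offset) {
      return nullptr;
    }
    void* p = m_base + m_offset;
    m_offset += padded;
    return p;
  }

  // Number of bytes handed out so far, including alignment padding.
  std::size_t used() const { return m_offset; }

private:
  std::size_t m_align;
  std::size_t m_capacity;
  std::size_t m_offset;
  void* m_raw;   // pointer returned by malloc, passed to free
  char* m_base;  // first aligned address inside the allocation
};
```

With a 64-byte alignment, for instance, a request for 100 bytes followed by a request for 10 bytes yields two pointers 128 bytes apart, both on 64-byte boundaries. Padding every piece keeps each buffer aligned at the cost of some unused bytes between buffers.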

Build changes / improvements:
- The RAJA submodule has been updated to v2025.03.2.
- The BLT submodule has been updated to v0.7.0, which is the version used by the RAJA submodule version.
- The Kokkos submodule has been updated to v3.7.02.

Bug fixes / improvements:
- Fixes for Windows builds.
- Fixed memory issue in SYCL variants of FIR kernel.
- Fixed issues in OpenMP target offload variants of HISTOGRAM and MULTI_REDUCE kernels.
- Fixed issue where multiple Caliper files were erroneously generated for a single run of the Suite.
- Fixed potential race condition issues in how data copies are handled in the kernels.
- The HIP wavefront size is now obtained from the RAJA configuration rather than hard-coded in multiple kernels.
- Fixed hang in HIP custom scan implementation with warp size 32, on consumer cards such as Radeon 7900 XTX.
- Fixed compilation issues related to Kokkos.