A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
Integrating runtime compilation into existing CUDA applications can be tricky and time-consuming. Jitify aims to simplify this process by hiding the complexities behind a simple, high-level interface.
The latest version of Jitify is now available in jitify2.hpp under the jitify2 namespace. The old version is still available in jitify.hpp, but will not be receiving further updates.
#include <jitify2.hpp>
#include <cuda_runtime_api.h>

int main() {
  cudaFree(0);  // Initialize CUDA context
  std::string program_source = R"(
  #include <cmath>
  #include <cuda_fp16.h>
  template <int N, typename T>
  __global__ void my_kernel(T* data) { *data = std::pow(*data, T{N}); }
  )";
  float h_data = 3.f;
  float* d_data;
  cudaMalloc((void**)&d_data, sizeof(float));
  cudaMemcpy(d_data, &h_data, sizeof(float), cudaMemcpyHostToDevice);
  using jitify2::get_cuda_include_dir, jitify2::Program, jitify2::ProgramCache;
  using jitify2::reflection::Template, jitify2::reflection::Type;
  static ProgramCache<> cache(
      /*max_size=*/100,
      *Program("my_program", program_source)
           // Preprocess source code and load all included headers.
           ->preprocess(
               {"-I" + get_cuda_include_dir(), "-arch=sm_80", "-arch=sm_90"}));
  dim3 grid(1), block(1);
  cache
      // Compile, link, and load the program, and obtain the loaded kernel.
      .get_kernel(Template("my_kernel").instantiate(2, Type<float>()))
      // Configure the kernel launch.
      ->configure(grid, block)
      // Launch the kernel.
      ->launch(d_data);
  return cudaDeviceSynchronize() != cudaSuccess;
}
Jitify provides or takes care of the following:
- All NVRTC, nvJitLink, and CUDA Driver API calls.
- Simple kernel instantiation and launch syntax with unmangled names.
- Caching compiled kernels in memory and on disk.
- Convenient offline-preprocessing workflow that makes it easy to load and embed all required runtime headers into your application.
- JIT-safe standard library headers (e.g., float.h, stdint.h, limits).
- Linking to pre-compiled PTX/CUBIN/FATBIN/object/library files.
- Easy error handling with optional exceptions.
- Support for all recent CUDA versions on both Linux and Windows.
- Support for pre-compiled headers, nvcc runtime compilation, source minification, and much more!
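To make the in-memory caching item above concrete, here is a minimal sketch of a bounded kernel cache in plain C++. It is an illustration only, not Jitify's ProgramCache: the names `KernelCache` and `CompiledKernel` are hypothetical, and least-recently-used eviction is an assumption suggested by the `max_size` argument in the example above, not something the text states.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a compiled kernel; a real one would hold a
// loaded module and a function handle.
struct CompiledKernel {
  std::string name;
};

// Minimal LRU cache keyed by instantiation name (illustration only,
// not Jitify's ProgramCache).
class KernelCache {
 public:
  using Compiler = std::function<CompiledKernel(const std::string&)>;

  KernelCache(std::size_t max_size, Compiler compile)
      : max_size_(max_size), compile_(std::move(compile)) {
    assert(max_size_ > 0);
  }

  // Returns the cached kernel, compiling it on a miss.
  const CompiledKernel& get(const std::string& name) {
    auto it = index_.find(name);
    if (it != index_.end()) {
      // Hit: move the entry to the front (most recently used).
      entries_.splice(entries_.begin(), entries_, it->second);
      return it->second->second;
    }
    // Miss: evict the least recently used entry if the cache is full.
    if (entries_.size() == max_size_) {
      index_.erase(entries_.back().first);
      entries_.pop_back();
    }
    entries_.emplace_front(name, compile_(name));
    index_[name] = entries_.begin();
    return entries_.front().second;
  }

  std::size_t size() const { return entries_.size(); }

 private:
  using Entry = std::pair<std::string, CompiledKernel>;
  std::size_t max_size_;
  Compiler compile_;
  std::list<Entry> entries_;
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

The point of the design is that the expensive step (the `compile` callback) runs once per distinct instantiation, and repeated launches of the same kernel only pay for a hash lookup.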
Things you can do with Jitify and NVRTC:
- Rapidly port existing code to use CUDA Runtime Compilation.
- Dramatically reduce code volume and offline-compilation times.
- Increase kernel performance by baking in runtime constants and autotuning.
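As a rough illustration of what "baking in runtime constants" means: a value known only at run time is turned into a template argument before compilation, so the runtime compiler sees it as a compile-time constant. The sketch below is plain C++ with no dependency on Jitify or NVRTC. `instantiation_name` and `bake_constant` are hypothetical helpers; in the example above, `Template("my_kernel").instantiate(2, Type<float>())` plays this role.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Jitify): joins template arguments into
// an instantiation name such as "my_kernel<2, float>".
std::string instantiation_name(const std::string& kernel,
                               const std::vector<std::string>& args) {
  assert(!kernel.empty());
  if (args.empty()) return kernel;
  std::string name = kernel + "<";
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (i != 0) name += ", ";
    name += args[i];
  }
  name += ">";
  return name;
}

// Hypothetical helper: renders a value known only at run time as text so
// it can be used as a template argument.
std::string bake_constant(int value) { return std::to_string(value); }
```

For instance, `instantiation_name("my_kernel", {bake_constant(n), "float"})` with `n == 2` yields `my_kernel<2, float>`. An autotuner could generate several such names with different constants and keep the fastest variant.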
Jitify is just a single header file:
#include <jitify2.hpp>
Link with: -ldl (all CUDA libraries are dynamically loaded at runtime by default)
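The -ldl requirement follows from runtime dynamic loading, which on Linux goes through dlopen and dlsym. The sketch below shows that mechanism in isolation, assuming a POSIX system; `load_symbol` is a hypothetical helper and not Jitify's loader, and Windows uses a different API.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <string>

// Hypothetical helper (not Jitify's loader): opens a shared library at run
// time and looks up one symbol. Pass nullptr as `library` to search the
// symbols already loaded into the current process.
void* load_symbol(const char* library, const char* symbol,
                  std::string* error) {
  assert(symbol != nullptr);
  void* handle = dlopen(library, RTLD_NOW);
  if (handle == nullptr) {
    const char* message = dlerror();
    if (error != nullptr) *error = message != nullptr ? message : "dlopen failed";
    return nullptr;
  }
  dlerror();  // Clear any stale error state before the lookup.
  void* address = dlsym(handle, symbol);
  if (address == nullptr && error != nullptr) {
    const char* message = dlerror();
    *error = message != nullptr ? message : "symbol resolved to null";
  }
  // The handle is intentionally left open so the address stays valid.
  return address;
}
```

A loader built this way would pass a library name such as libnvrtc.so and cast the returned address to the matching function pointer type. Because the lookup happens at run time, the application does not need the library at link time.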
The unit tests can be built and run using CMake as follows:
$ mkdir build && cd build && cmake ..
$ make check -j6
Note that the tests in jitify2_test.cu may also be useful as a form of documentation for many Jitify features.
See jitify2_user_guide.md for a detailed guide on using Jitify and all available options.
Doxygen documentation can be generated by running:
$ mkdir build && cd build && cmake ..
$ make doc
The HTML and LaTeX results are placed into the doc/ subdirectory.
BSD-3-Clause
Ben Barsdell (NVIDIA)
Kate Clark (NVIDIA)