A colleague of mine used C++ variadic templates to mimic a CUDA-style kernel launcher on top of OpenMP:
#include <omp.h>  // omp_get_thread_num

template<typename F, typename... Ts>
void launch2D(const dim3 & numBlocks, const dim3 & blockDim, F & f, Ts&&... ts)
{
    // Blocks run one after another; the threads of a block run in parallel.
    for (int bx = 0; bx < numBlocks.x; ++bx)
        for (int by = 0; by < numBlocks.y; ++by)
        {
            #pragma omp parallel num_threads(blockDim.x*blockDim.y)
            {
                const int tn = omp_get_thread_num();
                // x is the fastest-varying index, so decompose by blockDim.x
                const int tx = tn % blockDim.x;
                const int ty = tn / blockDim.x;
                f(numBlocks, blockDim, {bx,by}, {tx,ty}, ts...);
            }
        }
}
// ...
const dim3 threadsperBlock {BlockSize,BlockSize};
const dim3 numBlocks{N/threadsperBlock.x,N/threadsperBlock.y};
launch2D(numBlocks, threadsperBlock, matrix_multiplication_kernel<BlockSize>, a.data(), b.data(), c.data(), N);
This is loosely analogous to CUDA's triple-chevron launch syntax:
matrix_multiplication_kernel<BlockSize><<<numBlocks, threadsperBlock>>>(a.data(), b.data(), c.data(), N);
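To make the C++ launcher above concrete, here is a self-contained sketch that can be compiled with or without OpenMP. The `dim3` struct, the body of `matrix_multiplication_kernel`, and the `demo_matches_reference` helper are my own assumptions for illustration, not my colleague's code. It also swaps in a `parallel for` over the thread index instead of `num_threads` plus `omp_get_thread_num()`, because OpenMP treats `num_threads` as a request and may deliver fewer threads, which would silently skip some (tx, ty) pairs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CUDA's dim3; the original post does not show its definition.
struct dim3 { int x, y; };

// Launcher sketch: blocks run sequentially, one loop iteration per emulated thread.
template <typename F, typename... Ts>
void launch2D(const dim3& numBlocks, const dim3& blockDim, F& f, Ts&&... ts)
{
    const int threadsPerBlock = blockDim.x * blockDim.y;
    for (int bx = 0; bx < numBlocks.x; ++bx)
        for (int by = 0; by < numBlocks.y; ++by) {
            // Without OpenMP enabled the pragma is skipped and the loop runs serially.
#ifdef _OPENMP
            #pragma omp parallel for
#endif
            for (int tn = 0; tn < threadsPerBlock; ++tn) {
                const int tx = tn % blockDim.x;  // x is the fastest-varying index
                const int ty = tn / blockDim.x;
                f(numBlocks, blockDim, dim3{bx, by}, dim3{tx, ty}, ts...);
            }
        }
}

// Assumed kernel: each emulated thread computes one element of c = a * b
// for row-major n x n matrices.
template <int BlockSize>
void matrix_multiplication_kernel(dim3 /*gridDim*/, dim3 /*blockDim*/,
                                  dim3 blockIdx, dim3 threadIdx,
                                  const double* a, const double* b, double* c, int n)
{
    const int row = blockIdx.y * BlockSize + threadIdx.y;
    const int col = blockIdx.x * BlockSize + threadIdx.x;
    double sum = 0.0;
    for (int k = 0; k < n; ++k)
        sum += a[row * n + k] * b[k * n + col];
    c[row * n + col] = sum;
}

// Runs the launcher on a small problem and compares against a plain triple loop.
bool demo_matches_reference()
{
    constexpr int BlockSize = 2;
    const int N = 4;
    std::vector<double> a(N * N), b(N * N), c(N * N, 0.0);
    for (int i = 0; i < N * N; ++i) {
        a[i] = i;
        b[i] = 2.0 * i + 1.0;
    }

    const dim3 threadsperBlock{BlockSize, BlockSize};
    const dim3 numBlocks{N / threadsperBlock.x, N / threadsperBlock.y};
    launch2D(numBlocks, threadsperBlock, matrix_multiplication_kernel<BlockSize>,
             a.data(), b.data(), c.data(), N);

    for (int row = 0; row < N; ++row)
        for (int col = 0; col < N; ++col) {
            double ref = 0.0;
            for (int k = 0; k < N; ++k)
                ref += a[row * N + k] * b[k * N + col];
            if (c[row * N + col] != ref)
                return false;
        }
    return true;
}
```

Calling `demo_matches_reference()` from `main` should return `true` whether or not the compiler is invoked with its OpenMP flag, since every (block, thread) pair is visited exactly once in both cases.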
I suppose it's possible to do something similar with Fypp, Fortran and OpenMP/OpenACC/CUDA. I came up with the following solution, but it lacks encapsulation:
#:def LAUNCH1D(kernel, n)
block
  integer :: i
  !$omp parallel do simd
  do i = 1, ${n}$
    $:kernel
  end do
  !$omp end parallel do simd
end block
#:enddef
#:call LAUNCH1D
y(i) = a*x(i) + y(i)
#:nextarg
n
#:endcall