Kernel launcher · Issue #39 · aradi/fypp · GitHub
Kernel launcher #39
Open
@ivan-pi

Description

A colleague of mine has used the variadic templates of C++ to mimic a kernel launcher:

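// Emulates a 2D CUDA-style launch on the CPU: blocks run one after another,
// and the threads of each block are mapped onto an OpenMP team.
template<typename F, typename... Ts>
void launch2D(const dim3 & numBlocks, const dim3 & blockDim, F & f, Ts&&... ts)
{
	for (int bx=0;bx<numBlocks.x;++bx)
	for (int by=0;by<numBlocks.y;++by)
	{
		#pragma omp parallel num_threads(blockDim.x*blockDim.y)
		{
			// Unflatten the thread number; x is the fastest-varying index,
			// so both operations use blockDim.x (this also covers non-square blocks).
			const int tn = omp_get_thread_num();
			const int tx = tn % blockDim.x;
			const int ty = tn / blockDim.x;
			f(numBlocks, blockDim, {bx,by}, {tx,ty}, ts...);
		}
	}
}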

// ...

	const dim3 threadsperBlock {BlockSize,BlockSize};
	const dim3 numBlocks{N/threadsperBlock.x,N/threadsperBlock.y};
	launch2D(numBlocks, threadsperBlock, matrix_multiplication_kernel<BlockSize>, a.data(), b.data(), c.data(), N);

This is similar to CUDA's triple-chevron launch syntax:

matrix_multiplication_kernel<BlockSize><<<numBlocks, threadsperBlock>>>(a.data(), b.data(), c.data(), N);
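
The call above leaves `matrix_multiplication_kernel` elided. As a rough illustration of what such a launcher expects from a kernel, here is a self-contained C++ sketch. `Dim2`, `matmul_kernel` and `run_matmul_demo` are made-up names, not the colleague's code, and the kernel is a naive one-element-per-thread product that ignores its `BS` parameter. Unlike the original, the sketch uses an `omp parallel for` over the logical thread index, so every logical thread runs even if the runtime grants fewer OpenMP threads than requested.

```cpp
#include <vector>

// Hypothetical stand-in for CUDA's dim3; only x and y are needed here.
struct Dim2 { int x; int y; };

// CPU emulation of a 2D launch: blocks are visited serially, the logical
// threads of one block are distributed over the OpenMP team (if enabled).
template <typename F, typename... Ts>
void launch2D(const Dim2& numBlocks, const Dim2& blockDim, F& f, Ts&&... ts)
{
    const int nthreads = blockDim.x * blockDim.y;
    for (int bx = 0; bx < numBlocks.x; ++bx)
    for (int by = 0; by < numBlocks.y; ++by)
    {
#ifdef _OPENMP
        #pragma omp parallel for
#endif
        for (int tn = 0; tn < nthreads; ++tn)
        {
            // Unflatten the logical thread number, x varying fastest.
            const Dim2 threadIdx{tn % blockDim.x, tn / blockDim.x};
            f(numBlocks, blockDim, Dim2{bx, by}, threadIdx, ts...);
        }
    }
}

// Assumed kernel signature: grid size, block size, block index, thread index,
// followed by the user arguments. BS is unused in this naive version.
template <int BS>
void matmul_kernel(const Dim2& /*gridDim*/, const Dim2& blockDim,
                   const Dim2& blockIdx, const Dim2& threadIdx,
                   const double* a, const double* b, double* c, int n)
{
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    const int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= n || col >= n) return;   // guard against partial blocks
    double sum = 0.0;
    for (int k = 0; k < n; ++k)
        sum += a[row * n + k] * b[k * n + col];
    c[row * n + col] = sum;
}

// Hypothetical driver: A(i,j) = i + j and B = 2*I, so C should equal 2*A.
std::vector<double> run_matmul_demo()
{
    constexpr int BlockSize = 2;
    const int N = 4;
    std::vector<double> a(N * N), b(N * N, 0.0), c(N * N, 0.0);
    for (int i = 0; i < N; ++i)
    {
        for (int j = 0; j < N; ++j) a[i * N + j] = i + j;
        b[i * N + i] = 2.0;
    }
    const Dim2 threadsPerBlock{BlockSize, BlockSize};
    const Dim2 numBlocks{N / threadsPerBlock.x, N / threadsPerBlock.y};
    launch2D(numBlocks, threadsPerBlock, matmul_kernel<BlockSize>,
             a.data(), b.data(), c.data(), N);
    return c;
}
```

Calling `run_matmul_demo()` from `main` returns the 4x4 product as a flat row-major vector. Without `-fopenmp` (or the equivalent flag) the pragma is skipped and the logical threads simply run serially.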

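I suppose it's possible to do something similar with Fypp, Fortran and OpenMP/OpenACC/CUDA. I came up with the following solution, but it lacks encapsulation (the kernel body has to know that the macro names its loop index `i`, and it picks up `a`, `x` and `y` from the enclosing scope):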

#:def LAUNCH1D(kernel, n)
block
    integer :: i
    !$omp parallel do simd
    do i = 1, ${n}$
        $:kernel
    end do
    !$omp end parallel do simd
end block
#:enddef

#:call LAUNCH1D
y(i) = a*x(i) + y(i)
#:nextarg
n
#:endcall
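
For comparison, a C++ counterpart of `LAUNCH1D` could hand the loop index to the kernel explicitly instead of relying on a name defined inside the macro. The following is only a sketch under that assumption; `launch1D` and `saxpy_demo` are hypothetical names chosen for illustration, and the OpenMP pragma only takes effect when the compiler has OpenMP enabled.

```cpp
#include <vector>

// Hypothetical 1D launcher: calls kernel(i) for i = 0 .. n-1.
// The loop index is an explicit argument, so the kernel does not
// depend on any name declared inside the launcher.
template <typename F>
void launch1D(int n, F&& kernel)
{
#ifdef _OPENMP
    #pragma omp parallel for simd
#endif
    for (int i = 0; i < n; ++i)
        kernel(i);
}

// Demo: y = a*x + y, with the data captured by the lambda.
std::vector<double> saxpy_demo()
{
    const double a = 2.0;
    std::vector<double> x{1.0, 2.0, 3.0, 4.0};
    std::vector<double> y{10.0, 20.0, 30.0, 40.0};
    launch1D(static_cast<int>(x.size()),
             [&](int i) { y[i] = a * x[i] + y[i]; });
    return y;
}
```

Here the lambda's capture list makes the kernel's dependencies on `a`, `x` and `y` visible at the call site, which is the kind of encapsulation the Fypp macro above does not provide.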
