-
Notifications
You must be signed in to change notification settings - Fork 459
Add a new single function to execute serial code on the device #8123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Callbacks: good question now sure what the best thing is. Probably the new callbacks is the right idea. We could do something where if we don't find the new callbacks but we do find the begin/end parallel for we reuse them? Policy: we likely want one for consistency. We don't have any other patterns where you pass in an execution space instance for example. I am not sure that you need to prohibit all the other policy template arguments. And yes it's probably fine to create a RangePolicy for now and then use that construct under the hood. I.e. your implementation looks good to me for the algorithm. |
core/src/Kokkos_Parallel.hpp
Outdated
Kokkos::Tools::Impl::begin_single<ExecPolicy, FunctorType>(policy, str, kpID); | ||
|
||
auto closure = | ||
Kokkos::Impl::construct_with_shared_allocation_tracking_disabled< | ||
Impl::ParallelFor<WrapperType, ExecPolicy>>(functor_wrapper, policy); | ||
closure.execute(); | ||
|
||
Kokkos::Tools::Impl::end_single<FunctorType>(kpID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer just calling Kokkos::parallel_for
so that we can do all checks there and don't have to duplicate them. In think it's fair to use the same hooks instead of creating new ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem I see with using the same hook as parallel_for is that we want 2 different apis, one build over parallel_for
and the other over parallel_reduce
(to retrieve some value from the device).
By using a custom hook, we get the same 'single' hook for both interfaces, while if we reuse existing hooks, single
will appear as potentially two different functions in the tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need an overload that takes an execution space instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is necessary with @crtrott request of using a new policy?
If a call is
single("kernel_name", Kokkos::SinglePolicy<Kokkos::DefaultExecutionSpace, WorkTag>(), functor);
Do you think it would be useful to also offer
single("kernel_name", Kokkos::DefaultExecutionSpace(), functor);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the Kokkos::Graph
then
node, and when we'll have:
then the possibilities will be:
.then("label", Functor{});
.then("label", exec, Functor{});
.then<WorkTag>("label", Functor{})
.then<WorkTag>("label", exec, Functor{})
It seems that the single(...)
you're proposing will also showcase these 4 cases (and maybe the ones without label). Therefore, I'm not sure it's worth adding a SinglePolicy
to a single(...)
function with such few possibilities. Especially because I'm strongly against embedding the execution space instance within the execution policy(*).
(*) I'm strongly against have a SinglePolicy
with the execution space instance embedded in it. It's not the way to do it (i.e. other Kokkos
execution policies should not have had been implemented so, because the execution space instance is the where, the execution policy is the how, and the functor is the what, clearly 3 different things that must be clearly separated - not to mention having the 3 well separated better aligns with e.g. std::execution
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point regarding the exclusion of the ExecutionSpace
from the policy, and I agree, is everyone ok with that?
We also need the .then<WorkTag, ExecutionSpace>("label", Functor{})
I believe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need the
.then<WorkTag, ExecutionSpace>("label", Functor{})
I believe?
No, we don't need this one because the graph is already templated over the execution space, so the then
node is always using the one of its parent graph 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was focused on single
, where it may make sense to offer a single<WorkTag, ExecutionSpace>("label", Functor{})
overload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end, this file should not be modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The linked issue Feature request: Function to execute non-parallel code on the device #8080 mentions
Kokkos::on
. This PR introducesKokkos::single
.
Please explain clearly the rationale behind the naming. - I've already done similar things for the
then
node ofKokkos::Experimental::Graph
. It would be nice to merge this work and yours to avoid duplication.
Edit: I think it's best avoiding the on
as it could lead to misunderstanding issues in the future, e.g. with std::execution::on
(that is not doing the same thing as your - nice BTW - single
proposal).
core/src/Kokkos_Parallel.hpp
Outdated
template <class FunctorType, class WorkTag> | ||
struct Wrapper { | ||
FunctorType f; | ||
|
||
template <class W = WorkTag, | ||
class Enable = std::enable_if_t<!std::is_void_v<W>>> | ||
void KOKKOS_INLINE_FUNCTION operator()(const W& w, int) const { | ||
f(w); | ||
} | ||
|
||
template <class W = WorkTag, | ||
class Enable = std::enable_if_t<std::is_void_v<W>>> | ||
void KOKKOS_INLINE_FUNCTION operator()(int) const { | ||
f(); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please try to align this work with
kokkos/core/src/impl/Kokkos_GraphNodeThenImpl.hpp
Lines 27 to 34 in a51d897
template <typename Functor> | |
struct ThenWrapper { | |
Functor functor; | |
template <typename T> | |
KOKKOS_FUNCTION void operator()(const T) const { | |
functor(); | |
} | |
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out, I wasn't aware of this.
It seems ThenWrapper
doesn't take the potential WorkTag
into account however.
Should it be added or is it never relevant with then
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be relevant to then
one day.
For now, the graph API is such that then
does not take a policy argument, so there is no way to pass it a tag. But I guess it's something that should be possible to do - later.
@dalg24 What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is such a small wrapper that I would define it where it's needed for readability.
It needs to be in an Impl
namespace, though, and chnaging the name to something like Kokkos::Impl::SingleFunctorWrapper
makes it clear what it's supposed to be used for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define whatever you need, we can look at unifying as appropriate later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Within #8190, I've updated the ThenWrapper
to also allow a work tag. I think it's worth we converge on a single "wrapper".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the second interface of single (the one working like parallel_reduce), I need a second wrapper for function with an out parameter:
namespace Impl {
template <class FunctorType>
struct SingleFunctorWrapper {
FunctorType f;
template <class WorkTag>
void KOKKOS_INLINE_FUNCTION operator()(const WorkTag& w, int) const {
f(w);
}
void KOKKOS_INLINE_FUNCTION operator()(int) const { f(); }
};
template <class FunctorType>
struct SingleReductorFunctorWrapper {
FunctorType f;
template <class WorkTag, class ReturnType>
void KOKKOS_INLINE_FUNCTION operator()(const WorkTag& w, int, ReturnType& ret) const {
f(w, ret);
}
template <class ReturnType>
void KOKKOS_INLINE_FUNCTION operator()(int, ReturnType& ret) const {
f(ret);
}
};
}
(this is still WIP, there may be a better solution).
This was discussed in a previous developer meeting, @crtrott preferred to name it |
Signed-off-by: Paul Gannay <paul.gannay@cea.fr>
public: | ||
template <class... OtherProperties> | ||
SinglePolicy(const SinglePolicy<OtherProperties...>& p) | ||
: RangePolicy<Properties...>(p) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to add Kokkos::LaunchBounds<1>
😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review, but do not lose to much time reviewing the recent change, I'm moving code around because the cluster on which I was working is shutting down for maintenance and mistakenly pushed it to a public branch.
This is still WIP.
@@ -263,7 +265,7 @@ struct Kokkos_Profiling_EventSet { | |||
Kokkos_Tools_contextBeginFunction begin_tuning_context; | |||
Kokkos_Tools_contextEndFunction end_tuning_context; | |||
Kokkos_Tools_optimizationGoalDeclarationFunction declare_optimization_goal; | |||
char padding[232 * | |||
char padding[230 * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to change this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this padding is for so I may be wrong, but since I added new pointer in the structures, I tried to keep the alignment the same as before?
See here for a description of what we need.
PR opened to allow for discussion on this subject:
1 - Should we offer new callbacks (
begin_single
andend_single
) for the tools or should we rely on the already existingbegin_parallel_for
?I created the two new ones (not tested though), but we could also depends on
begin_parallel_for
andend_parallel_for
?2 - What kind of API do we want?
Accepting one of the existing policy wouldn't make a lot of sense since we are only launching a single thread. In fact, of the 5 arguments of an ExecutionPolicy (
ExecutionSpace
,Schedule
,IndexType
,LaunchBounds
, andWorkTag
), we only care for 2: theWorkTag
and theExecutionSpace
.One option would be to create a new
SinglePolicy
that takes only this two arguments.An other option would be to take this arguments without a policy directly as template arguments of
single
:single<WorkTag, ExecutionPolicy>("name", lambda);
or
single<ExecutionPolicy, WorkTag>("name", lambda);
(current implementation accepts both).3 - Should we rely on the implementation of
parallel_for
?single
can be seen as a shortcut forparallel_for(RangePolicy(0,1))
, so for the moment, this is what I implemented, but we may also want decouple the implementation by writing backend specific code forsingle
? I'm not sure it is worth the effort but I may not see all the consequences.