Add a new single function to execute serial code on the device #8123

PaulGannay · 2025-05-27T14:45:54Z

crtrott · 2025-05-27T15:06:33Z

core/src/Kokkos_Parallel.hpp

masterleinad · 2025-05-27T15:06:39Z

core/src/Kokkos_Parallel.hpp

+  Kokkos::Tools::Impl::begin_single<ExecPolicy, FunctorType>(policy, str, kpID);
+
+  auto closure =
+      Kokkos::Impl::construct_with_shared_allocation_tracking_disabled<
+          Impl::ParallelFor<WrapperType, ExecPolicy>>(functor_wrapper, policy);
+  closure.execute();
+
+  Kokkos::Tools::Impl::end_single<FunctorType>(kpID);


I would prefer just calling Kokkos::parallel_for so that we can do all checks there and don't have to duplicate them. I think it's fair to use the same hooks instead of creating new ones.

The problem I see with using the same hook as parallel_for is that we want 2 different APIs, one built on parallel_for and the other on parallel_reduce (to retrieve a value from the device).
With a custom hook, we get the same 'single' hook for both interfaces, whereas if we reuse the existing hooks, single will potentially appear as two different functions in the tools.
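
To make the trade-off concrete, here is a minimal host-only sketch. MockTool, single_reusing_hooks and single_dedicated_hook are hypothetical names invented for illustration; none of them are the real Kokkos::Tools API, and the functor is simply called on the host.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a profiling tool: it only records which hook fired.
struct MockTool {
  std::vector<std::string> events;
  void begin(const std::string& kind, const std::string& label) {
    events.push_back("begin_" + kind + ":" + label);
  }
  void end(const std::string& kind) { events.push_back("end_" + kind); }
};

// Variant A: each flavour of single reuses the hook of the pattern it wraps.
template <class F>
void single_reusing_hooks(MockTool& tool, const std::string& label, F f) {
  tool.begin("parallel_for", label);
  f();
  tool.end("parallel_for");
}

template <class F, class R>
void single_reusing_hooks(MockTool& tool, const std::string& label, F f,
                          R& result) {
  tool.begin("parallel_reduce", label);
  f(result);
  tool.end("parallel_reduce");
}

// Variant B: both flavours report through one dedicated 'single' hook.
template <class F>
void single_dedicated_hook(MockTool& tool, const std::string& label, F f) {
  tool.begin("single", label);
  f();
  tool.end("single");
}

template <class F, class R>
void single_dedicated_hook(MockTool& tool, const std::string& label, F f,
                           R& result) {
  tool.begin("single", label);
  f(result);
  tool.end("single");
}
```

With variant A a tool sees one parallel_for event and one parallel_reduce event for what the user wrote as two single calls; with variant B both show up under the same kind.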

masterleinad · 2025-05-27T15:12:08Z

core/src/Kokkos_Parallel.hpp

We also need an overload that takes an execution space instance.

Do you think it is still necessary given @crtrott's request to use a new policy?

If a call is

single("kernel_name", Kokkos::SinglePolicy<Kokkos::DefaultExecutionSpace, WorkTag>(), functor);

Do you think it would be useful to also offer

single("kernel_name", Kokkos::DefaultExecutionSpace(), functor);

For the Kokkos::Graph then node, once we have:

core(graph): allow work tag for then node #8190

then the possibilities will be:

.then("label", Functor{});

.then("label", exec, Functor{});

.then<WorkTag>("label", Functor{})

.then<WorkTag>("label", exec, Functor{})

It seems that the single(...) you're proposing will also cover these 4 cases (and maybe the ones without a label). Therefore, I'm not sure it's worth adding a SinglePolicy for a single(...) function with so few possibilities, especially because I'm strongly against embedding the execution space instance within the execution policy(*).

(*) I'm strongly against having a SinglePolicy with the execution space instance embedded in it. It's not the way to do it: the other Kokkos execution policies should not have been implemented that way, because the execution space instance is the where, the execution policy is the how, and the functor is the what. These are 3 different things that must be clearly separated, and keeping them separate also aligns better with e.g. std::execution.
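
As a rough illustration of that separation, the 4 cases could look like the host-only sketch below. MockExecSpace, CallInfo, DemoTag and this single are hypothetical stand-ins written for this comment; they are not the proposed Kokkos signatures, and the functor just runs on the host.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for an execution space instance (the "where").
struct MockExecSpace {
  int id = 0;
};

// Hypothetical work tag used only to demonstrate the tagged overloads.
struct DemoTag {};

// Records what a call resolved to, so the dispatch can be inspected.
struct CallInfo {
  std::string label;
  int exec_id;
  bool tagged;
};

// Calls the functor (the "what") with or without a work tag.
template <class WorkTag, class Functor>
void invoke(const Functor& f) {
  if constexpr (std::is_void_v<WorkTag>) {
    f();
  } else {
    f(WorkTag{});
  }
}

// single<WorkTag>("label", exec, functor): the instance is a plain argument.
template <class WorkTag = void, class Functor>
CallInfo single(const std::string& label, const MockExecSpace& exec,
                const Functor& f) {
  invoke<WorkTag>(f);
  return {label, exec.id, !std::is_void_v<WorkTag>};
}

// single<WorkTag>("label", functor): falls back to a default instance.
template <class WorkTag = void, class Functor>
CallInfo single(const std::string& label, const Functor& f) {
  return single<WorkTag>(label, MockExecSpace{}, f);
}
```

The work tag stays a template parameter and the instance stays a function argument, so no policy object is needed to carry either of them.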

I see your point about keeping the ExecutionSpace out of the policy, and I agree. Is everyone OK with that?

We also need the .then<WorkTag, ExecutionSpace>("label", Functor{}) I believe?

No, we don't need this one because the graph is already templated over the execution space, so the then node is always using the one of its parent graph 😉

Sorry, I was focused on single, where it may make sense to offer a single<WorkTag, ExecutionSpace>("label", Functor{}) overload.

masterleinad · 2025-05-27T15:13:38Z

core/unit_test/default/TestDefaultDeviceDevelop.cpp

In the end, this file should not be modified.

romintomasetti · 2025-05-27T18:58:43Z

core/src/Kokkos_Parallel.hpp

+template <class FunctorType, class WorkTag>
+struct Wrapper {
+  FunctorType f;
+
+  template <class W      = WorkTag,
+            class Enable = std::enable_if_t<!std::is_void_v<W>>>
+  void KOKKOS_INLINE_FUNCTION operator()(const W& w, int) const {
+    f(w);
+  }
+
+  template <class W      = WorkTag,
+            class Enable = std::enable_if_t<std::is_void_v<W>>>
+  void KOKKOS_INLINE_FUNCTION operator()(int) const {
+    f();
+  }
+};


Please try to align this work with

kokkos/core/src/impl/Kokkos_GraphNodeThenImpl.hpp

Lines 27 to 34 in a51d897

template <typename Functor>
struct ThenWrapper {
  Functor functor;

  template <typename T>
  KOKKOS_FUNCTION void operator()(const T) const {
    functor();
  }
};

Thanks for pointing it out, I wasn't aware of this.

It seems ThenWrapper doesn't take the potential WorkTag into account however.
Should it be added or is it never relevant with then?

It might be relevant to then one day.

For now, the graph API is such that then does not take a policy argument, so there is no way to pass it a tag. But I guess it's something that should be possible to do - later.

@dalg24 What do you think ?

This is such a small wrapper that I would define it where it's needed for readability.
It needs to be in an Impl namespace, though, and changing the name to something like Kokkos::Impl::SingleFunctorWrapper makes it clear what it's supposed to be used for.

Define whatever you need, we can look at unifying as appropriate later.

Within #8190, I've updated the ThenWrapper to also allow a work tag. I think it's worth converging on a single "wrapper".

With the second interface of single (the one working like parallel_reduce), I need a second wrapper for functors with an out parameter:

namespace Impl {

template <class FunctorType>
struct SingleFunctorWrapper {
  FunctorType f;

  template <class WorkTag>
  void KOKKOS_INLINE_FUNCTION operator()(const WorkTag& w, int) const {
    f(w);
  }

  void KOKKOS_INLINE_FUNCTION operator()(int) const { f(); }
};

template <class FunctorType>
struct SingleReductorFunctorWrapper {
  FunctorType f;

  template <class WorkTag, class ReturnType>
  void KOKKOS_INLINE_FUNCTION operator()(const WorkTag& w, int,
                                         ReturnType& ret) const {
    f(w, ret);
  }

  template <class ReturnType>
  void KOKKOS_INLINE_FUNCTION operator()(int, ReturnType& ret) const {
    f(ret);
  }
};

}  // namespace Impl

(this is still WIP, there may be a better solution).
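
For reference, this is how I would expect the two wrappers to be driven by a range of exactly one iteration. It is a host-only sketch: KOKKOS_INLINE_FUNCTION is dropped so it compiles without Kokkos, and DemoTag, run_single and run_single_reduce are hypothetical helpers that stand in for the real dispatch.

```cpp
#include <cassert>
#include <type_traits>

// Host-only copies of the two wrappers, without the Kokkos function macro.
template <class FunctorType>
struct SingleFunctorWrapper {
  FunctorType f;

  template <class WorkTag>
  void operator()(const WorkTag& w, int) const {
    f(w);
  }

  void operator()(int) const { f(); }
};

template <class FunctorType>
struct SingleReductorFunctorWrapper {
  FunctorType f;

  template <class WorkTag, class ReturnType>
  void operator()(const WorkTag& w, int, ReturnType& ret) const {
    f(w, ret);
  }

  template <class ReturnType>
  void operator()(int, ReturnType& ret) const {
    f(ret);
  }
};

// Hypothetical work tag for the tagged calls.
struct DemoTag {};

// Stand-in for a one-iteration parallel_for: the index is always 0 and is
// swallowed by the wrapper, so the user functor never sees it.
template <class WorkTag = void, class F>
void run_single(F f) {
  SingleFunctorWrapper<F> w{f};
  if constexpr (std::is_void_v<WorkTag>) {
    w(0);
  } else {
    w(WorkTag{}, 0);
  }
}

// Stand-in for a one-iteration parallel_reduce: the wrapper forwards the
// result reference as the functor's out parameter.
template <class WorkTag = void, class F, class R>
void run_single_reduce(F f, R& result) {
  SingleReductorFunctorWrapper<F> w{f};
  if constexpr (std::is_void_v<WorkTag>) {
    w(0, result);
  } else {
    w(WorkTag{}, 0, result);
  }
}
```

Only the call operator that is actually used gets instantiated, so a functor that accepts only the tagged form (or only the untagged form) still compiles.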

PaulGannay · 2025-06-10T07:38:10Z

romintomasetti · 2025-06-23T07:57:02Z

core/src/Kokkos_ExecPolicy.hpp

+ public:
+  template <class... OtherProperties>
+  SinglePolicy(const SinglePolicy<OtherProperties...>& p)
+      : RangePolicy<Properties...>(p) {}


You'll need to add Kokkos::LaunchBounds<1> 😉

Thank you for the review, but do not lose too much time reviewing the recent changes: I'm moving code around because the cluster I was working on is shutting down for maintenance, and I mistakenly pushed it to a public branch.
This is still WIP.

romintomasetti · 2025-06-23T08:04:39Z

core/src/impl/Kokkos_Profiling_C_Interface.h

@@ -263,7 +265,7 @@ struct Kokkos_Profiling_EventSet {
  Kokkos_Tools_contextBeginFunction begin_tuning_context;
  Kokkos_Tools_contextEndFunction end_tuning_context;
  Kokkos_Tools_optimizationGoalDeclarationFunction declare_optimization_goal;
-  char padding[232 *
+  char padding[230 *


Any reason to change this ?

I'm not sure what this padding is for, so I may be wrong, but since I added new pointers to the structure, I tried to keep the alignment the same as before.
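
My reading, which is an assumption rather than something stated in this thread, is that the padding reserves room for future callbacks so that the overall size of the struct stays constant: every function pointer added to the struct takes one slot away from the padding. A much smaller, hypothetical stand-in for the event set shows the idea (EventSetV1, EventSetV2 and fn_ptr are made-up names, and the slot counts are arbitrary):

```cpp
#include <cassert>
#include <cstddef>

// Generic function pointer type used as the unit of reserved space.
using fn_ptr = void (*)();

// Hypothetical "before" layout: two callbacks plus six reserved slots.
struct EventSetV1 {
  fn_ptr begin_parallel_for;
  fn_ptr end_parallel_for;
  char padding[6 * sizeof(fn_ptr)];
};

// Hypothetical "after" layout: two new callbacks, two fewer reserved slots.
struct EventSetV2 {
  fn_ptr begin_parallel_for;
  fn_ptr end_parallel_for;
  fn_ptr begin_single;
  fn_ptr end_single;
  char padding[4 * sizeof(fn_ptr)];
};

// The total size is unchanged, and the existing members keep their offsets.
static_assert(sizeof(EventSetV1) == sizeof(EventSetV2),
              "adding callbacks must not change the struct size");
static_assert(offsetof(EventSetV1, end_parallel_for) ==
                  offsetof(EventSetV2, end_parallel_for),
              "existing callbacks must stay where they were");
```

Under that reading, going from 232 to 230 is consistent with two added function pointers.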

PaulGannay added 2 commits May 27, 2025 16:31

Kokkos::single first draft

fe4882c

Correctly write single for device (CUDA)

fe818fb

masterleinad reviewed May 27, 2025

romintomasetti requested changes May 27, 2025

Introduce SinglePolicy

e9cae59

Signed-off-by: Paul Gannay <paul.gannay@cea.fr>

This was referenced Jun 19, 2025

core(graph): enforce unit launch bound #8192

Merged

core(graph): allow work tag for then node #8190

Open

romintomasetti reviewed Jun 23, 2025
