diff --git a/rfcs/proposed/task_group_dynamic_dependencies/README.md b/rfcs/proposed/task_group_dynamic_dependencies/README.md
index b2060831c2..1fe0aba95d 100755
--- a/rfcs/proposed/task_group_dynamic_dependencies/README.md
+++ b/rfcs/proposed/task_group_dynamic_dependencies/README.md
@@ -157,7 +157,7 @@ work's state.
We therefore think predecessors may be in any state when they are added,
as shown below:
-
+
There are a number of possible options to spell a function for adding
a single predecessor. Additionally, we may also want a function to allow
@@ -209,7 +209,7 @@ subsequent recursively generated tasks) must be able to create new tasks
and then update the graph for their outer merge task to wait for the
newly created subtasks to complete.
-
+
A key point of this recursive parallel algorithm is the requirement to change
the predecessors of the merge tasks. However, the merge tasks are already
@@ -260,7 +260,7 @@ to complete.
The dependency graph for this example is:
-
+
#### Predecessors in Unknown States
@@ -290,7 +290,7 @@ regardless of state of the task it represents. Any predecessor that is already
completed when it is added as a predecessor will not delay the start of the dependent
task.
-
+
#### Recursive Decomposition
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/add_dependency.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/add_dependency.png
old mode 100755
new mode 100644
similarity index 100%
rename from rfcs/proposed/task_group_dynamic_dependencies/add_dependency.png
rename to rfcs/proposed/task_group_dynamic_dependencies/assets/add_dependency.png
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/fibonacci.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/fibonacci.png
new file mode 100644
index 0000000000..2e312ad270
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/fibonacci.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/merge_sort.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/merge_sort.png
old mode 100755
new mode 100644
similarity index 100%
rename from rfcs/proposed/task_group_dynamic_dependencies/merge_sort.png
rename to rfcs/proposed/task_group_dynamic_dependencies/assets/merge_sort.png
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_rectangle_split.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_rectangle_split.png
new file mode 100644
index 0000000000..3ec6bc8b7f
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_rectangle_split.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_rectangle_split_graph.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_rectangle_split_graph.png
new file mode 100644
index 0000000000..bbdd3139e1
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_rectangle_split_graph.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_triangle.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_triangle.png
new file mode 100644
index 0000000000..41139c1da5
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_triangle.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_triangle_split.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_triangle_split.png
new file mode 100644
index 0000000000..7f1fd46d56
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_triangle_split.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_triangle_split_graph.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_triangle_split_graph.png
new file mode 100644
index 0000000000..cbe87d1490
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/n_body_triangle_split_graph.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_diagram.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_diagram.png
new file mode 100644
index 0000000000..2a1ab5de6b
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_diagram.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks1.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks1.png
new file mode 100644
index 0000000000..c089b99c98
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks1.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks2.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks2.png
new file mode 100644
index 0000000000..fd18a1b3cb
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks2.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks3.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks3.png
new file mode 100644
index 0000000000..65fdf33012
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks3.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks4.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks4.png
new file mode 100644
index 0000000000..6817c20c49
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks4.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks5.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks5.png
new file mode 100644
index 0000000000..a4cc158563
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks5.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks6.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks6.png
new file mode 100644
index 0000000000..9fa7ab473c
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks6.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks7.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks7.png
new file mode 100644
index 0000000000..752752e77d
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/parser_tasks7.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/predecessor_successor_implementation.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/predecessor_successor_implementation.png
new file mode 100644
index 0000000000..049fc19ca0
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/predecessor_successor_implementation.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/three_task_graph.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/three_task_graph.png
old mode 100755
new mode 100644
similarity index 100%
rename from rfcs/proposed/task_group_dynamic_dependencies/three_task_graph.png
rename to rfcs/proposed/task_group_dynamic_dependencies/assets/three_task_graph.png
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_basic.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_basic.png
new file mode 100644
index 0000000000..06437264af
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_basic.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_merged_tracking.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_merged_tracking.png
new file mode 100644
index 0000000000..20cd7b3ac6
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_merged_tracking.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_merged_tracking_new_successors.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_merged_tracking_new_successors.png
new file mode 100644
index 0000000000..be6296cbae
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_merged_tracking_new_successors.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_separate_tracking.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_separate_tracking.png
new file mode 100644
index 0000000000..8b2a512859
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_separate_tracking.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_separate_tracking_new_successors.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_separate_tracking_new_successors.png
new file mode 100644
index 0000000000..ded782a78e
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/transferring_between_two_separate_tracking_new_successors.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/unknown_states.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/unknown_states.png
old mode 100755
new mode 100644
similarity index 100%
rename from rfcs/proposed/task_group_dynamic_dependencies/unknown_states.png
rename to rfcs/proposed/task_group_dynamic_dependencies/assets/unknown_states.png
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront.png
new file mode 100644
index 0000000000..1d49eb8760
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_dependencies.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_dependencies.png
new file mode 100644
index 0000000000..425cea83f2
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_dependencies.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_classic_dependencies.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_classic_dependencies.png
new file mode 100644
index 0000000000..6ecbf92afa
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_classic_dependencies.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_combined_dependencies.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_combined_dependencies.png
new file mode 100644
index 0000000000..8f8d0c0d97
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_combined_dependencies.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_first_division.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_first_division.png
new file mode 100644
index 0000000000..de28c73341
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_first_division.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_second_division_complete.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_second_division_complete.png
new file mode 100644
index 0000000000..25854d7cd9
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_second_division_complete.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_second_division_north.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_second_division_north.png
new file mode 100644
index 0000000000..82a357fae4
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_second_division_north.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_second_division_west.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_second_division_west.png
new file mode 100644
index 0000000000..d1f6519481
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_second_division_west.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_second_division_west_new_dependencies.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_second_division_west_new_dependencies.png
new file mode 100644
index 0000000000..aa4aa5f371
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_eager_second_division_west_new_dependencies.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_no_dependencies.png b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_no_dependencies.png
new file mode 100644
index 0000000000..5dc9863943
Binary files /dev/null and b/rfcs/proposed/task_group_dynamic_dependencies/assets/wavefront_recursive_no_dependencies.png differ
diff --git a/rfcs/proposed/task_group_dynamic_dependencies/extended_semantics.md b/rfcs/proposed/task_group_dynamic_dependencies/extended_semantics.md
new file mode 100644
index 0000000000..39208065b7
--- /dev/null
+++ b/rfcs/proposed/task_group_dynamic_dependencies/extended_semantics.md
@@ -0,0 +1,1453 @@
+# API and semantics details for task group dynamic dependencies
+
+*Note:* This document is a sub-RFC of the [umbrella RFC for task group dynamic dependencies](README.md).
+
+## Introduction
+
+This document contains a concrete API and semantics proposal for ``task_group`` extensions defined in the parent RFC.
+The following cases are covered:
+* ``task_group`` extensions allowing handling the tasks in various states: created, submitted, executing and completed.
+ Existing API only allows having a non-empty ``task_handle`` object handling the created task and an empty one that does not
+ handle any tasks.
+* API for setting the dependencies between tasks in various states.
+* API for transferring the dependencies from the executing task to the task in various states.
+
+## Semantic extension for handling tasks in various states
+
+The parent proposal for extending the task group with APIs for setting dynamic dependencies defines the following states of the task in the ``task_group``:
+* Created
+* Submitted
+* Executing
+* Completed
+
+Practically, the task is in `created` state when the ``task_group::defer`` was called and the task was registered in the ``task_group``, but one of the submission methods (such as ``task_group::run``) was not yet called.
+
+The task state is changed to `submitted` when one of the submission methods was called and the task may be scheduled for the execution if all the
+dependent tasks are completed.
+
+The state is changed to `executing` when all of the dependent tasks are completed and some thread takes the task for execution is executing the
+corresponding task body.
+
+Once the thread finishes executing the task body, the state of the task is changed to `completed`.
+
+The parent RFC proposes to extend possible tasks states that can be handled by ``task_handle`` object to be tasks in any states defined above. Such a change
+requires changing the semantics of the task submission methods (such as ``task_group::run`` and others) while working with various states handled by ``task_handle``.
+
+Unlike this approach, this document proposes keeping the ``task_handle`` as a unique-owning handle that owns the task in created state or does not owns the task.
+For handling the task in other states, it is proposed to add a new weak-owning handle ``task_tracker`` that can handles the task in any state.
+
+``task_tracker`` object can be obtained by constructing from the ``task_handle`` object owning the created task. ``task_handle`` cannot be constructed using the
+``task_tracker`` argument.
+
+```cpp
+tbb::task_group tg;
+
+tbb::task_handle handle = tg.defer([] {/*...*/}); // task is in created state, handle owns the task
+tbb::task_tracker tracker = handle; // create the tracker for the task owned by the handle
+
+tg.run(std::move(handle)); // task is in submitted state, handle is left in an empty state
+
+// tracker is non empty and can be used to set dependencies
+```
+
+In this case, no semantic changes are needed for the submission methods, because the ``task_handle`` semantics is not changed.
+
+Alternative approaches for handling tasks in different states are described in the [separate section](#alternative-approaches).
+
+## Semantics for setting dependencies between tasks
+
+Lets consider creating a predecessor-successor dependency between the task tracked or owned by ``predecessor``
+and the task handled by ``successor_task_handle`` -
+``task_group::make_edge(predecessor_task_handle, successor_task_handle)``.
+
+As it was stated in the parent RFC document, we would like to allow adding predecessors in any state described above and to limit the successor to be a task in created state since it can be too late to add predecessors to
+the task in executing or completed state.
+
+The second limitation is handled by limiting the successor argument to be only ``task_handle``.
+
+Lets consider the different states of the task tracked or handled by ``predecessor``.
+
+If the predecessor task is in any state except the completed one (created/scheduled/running), the API registers the successor task
+in the list of successors on the predecessor side and increase the corresponding reference counter on the successor side to ensure it
+would not be executed before the predecessor task. The successor task can only start executing once the associated reference counter is equal to 0.
+
+If the predecessor task is in `completed` state, the API has no effect in terms of modifying the list of successors and reference counters since no additional
+dependencies are required and the successor task can be executed if all other dependent tasks are executed as well.
+
+If the predecessor task state has changed while registering the task as a predecessor for any task, the API should react accordingly to make sure
+adding dependencies and increasing the corresponding reference counters are not done for completed tasks.
+
+Implementation-wise, this API requires adding a list of successors into the predecessor task and adding the new vertex instance that corresponds
+to the successor task. This vertex would contain the reference counter and a pointer to the successor task itself. Each element in the task successor list
+is a pointer to the vertex instance of the successor task.
+
+The vertex instance is created once the first task is registered as a predecessor and is reused by any other predecessors.
+
+Once the predecessor task is completed, it should go through the list of successor vertices and decrement the reference counter. Once the successor's
+reference counter is equal to 0, the successor task can be scheduled for execution.
+
+API-wise, the function that decreases the reference counter may also return the pointer to the task. If the reference counter is not equal to 0, the
+returned pointer is ``nullptr``. Otherwise, the successor task pointer is returned. It is required to allow bypassing one of the successor tasks
+if the body of the predecessor task did not return other task that should be bypassed.
+
+This implementation approach is illustrated in the picture below:
+
+
+
+## Semantics for transferring successors from the currently executing task to the other task
+
+Lets consider the use-case where the successors of the task ``current`` are transferred to the task ``target`` owned or tracked by the ``target_handler``.
+In this case, the API ``tbb::task_group::current_task::transfer_successors_to(target_handler)`` should be called from the body of ``current``.
+
+As it was mentioned in the parent RFC, if ``transfer_successors_to`` is called outside of task belonging to the same ``task_group``, the behavior is
+undefined.
+
+Current proposal limits the applicability of this API and allows only transferring from the executing task to the created task.
+Transferring to tasks in `scheduled`, `executing` or `completed` states are described in a [separate section](#transferring-successors-to-the-task-in-any-state)
+and can be added in the future.
+
+The call to `transfer_successors_to` should merge together the successors list of ``current`` and ``target`` and set ``target`` to have the merged
+successors list. If should be thread-safe to add new successors to ``current`` by using the ``make_edge`` API.
+
+
+
+It is clear that while transferring from ``current`` to ``target`` the successors list of ``target`` should contain both previous successors of ``target``
+and the successors of ``current``.
+
+Interesting aspect is what should be done with the successors list of ``current``.
+
+The first option is to consider ``current`` and ``target`` a separate tasks even after the transferring the successors from one to another.
+
+In this case, after the transfer, the task ``current`` will have an empty successors list, and ``target`` will have a merged successors list:
+
+
+
+After the transfer, the successors of ``current`` and ``target`` are tracked separately and adding new successors to one of them would only
+affect the successors list of one task:
+
+
+
+Alternative approach is to keep tracking ``current`` and ``target`` together after transferring. This requires introducing a special state
+of ``task_tracker` - a kind of `proxy` state. If the executing task calls `transfer_successors_to` from the body, all trackers, initially created
+to track this task are changing the state to `proxy`.
+
+Tracker to the ``current`` and ``target`` are sharing the single merged list of successors if ``current`` tracker is in `proxy` state.
+
+
+
+Any changes in the successors list operated on ``current`` or ``target`` will modify the same list of successors - adding or transferring will modify the
+state of both "real" and proxy tasks:
+
+
+
+Racing use-case should be considered for each approach - when new successors are added to task ``A`` while it is transferring it's successors to ``B``.
+
+There are two options how the actual modifications of the ``A`` successors list can be linearized - the new successor can be added before actual transferring
+the entire list to ``B`` or after that.
+
+If the successors of ``A`` and ``B`` are tracked separately (the first option described), if the new successor was added before the transfer, the new successor
+would be transferred to ``B`` together with the entire list.
+
+If the transferring was done before adding the successor - the new successor would be added to ``A`` only and would not appear in the successors list of ``B``.
+
+Hence, the deterministic predecessor-successor relationships cannot be guaranteed in this case, because of the logical race between ``make_edge`` and ``transfer_successors_to``.
+
+If the successors of ``A`` and ``B`` are tracked together (the second option described) in both linearization case the newly added successor will appear in both
+successors list of ``B`` ("real" task) and ``A`` (task in `proxy` state).
+
+As it is stated in [one of the advanced examples topic](#combination-of-eager-and-classic-approaches), there are real-life scenarios where the combined tracking
+should be implemented. Generally, these are examples where one thread is setting a predecessor-successor dependency between a task that can be in the running state
+(using a `task_tracker` to such a task). Other thread can execute the tracked task and can replace it in the task graph with the subtree by calling `transfer_successors_to`.
+
+Hence, the merged successors tracking approach is proposed by this document.
+
+## Functionality removed from the proposal
+
+### Adding successors to the currently executing task
+
+Initially, it was proposed to allow adding successor tasks to the currently executing task:
+
+```cpp
+namespace oneapi {
+namespace tbb {
+class task_group {
+public:
+ struct current_task {
+ static void add_successor(task_handle& successor);
+ };
+};
+} // namespace tbb
+} // namespace oneapi
+```
+
+The usage of such a function is the following:
+
+```cpp
+tbb::task_group tg;
+
+tg.run_and_wait([&tg] {
+ tbb::task_handle succ1 = tg.defer(...);
+ tbb::task_handle succ2 = tg.defer(...);
+
+ tbb::task_group::current_task::add_successor(succ1);
+ tbb::task_group::current_task::add_successor(succ2);
+
+ tg.run(std::move(succ1));
+ tg.run(std::move(succ2));
+
+ // Run current task's computations
+ run_computations();
+});
+```
+
+The `add_successor` functionality was removed from the proposal because the scenario above can be covered by changing the ordering of operations and
+run successors for execution (or even bypass one of them) after running the computations:
+
+```cpp
+tbb::task_group tg;
+
+tg.run_and_wait([&tg] -> tbb::task_handle {
+ tbb::task_handle succ1 = tg.defer(...);
+ tbb::task_handle succ2 = tg.defer(...);
+
+ // Run current task's computations
+ run_computations();
+
+ tg.run(std::move(succ1)); // run
+ return std::move(succ2); // bypass
+})
+```
+
+### Functions for tracking the task progress
+
+The earlier version of this proposal included special member functions for ``task_tracker`` class allowing to get the information about the current state
+of the tracked task:
+
+```cpp
+namespace oneapi {
+namespace tbb {
+class task_tracker {
+public:
+ bool was_submitted() const;
+ bool is_completed() const;
+};
+} // namespace tbb
+} // namespace oneapi
+```
+
+``was_submitted`` returns ``true`` if the tracked task was submitted for execution using one of the APIs in ``task_group`` or ``task_arena`` (e.g. ``task_group::run``).
+``is_completed`` returns ``true`` if the execution of the tracked task is completed by some TBB thread.
+
+These APIs were removed from the proposal because of the following reasons:
+* Lack of use-cases for these functions. All of the considered examples could be implemented without using these functions.
+* Unclear semantics for these function when the tracked task is executing ``task_group::current_task::transfer_successors_to``. Consider a task ``A`` that transfers
+ it's successors to the task ``B`` and exits the body. On the one hand, task ``A`` was already submitted and completed and both described functions should return ``true``.
+ On the other hand, the task ``A`` replaces itself in the task graph with the task ``B`` and it may be expected that the tracker would now track ``B`` (similarly to how
+ `make_edge` is done - new edges added to ``A`` after transferring are added to ``B``).
+
+## Potential future enhancements
+
+### ``empty_task`` API
+
+Since it is only allowed to transfer successors to a single task only, for some use-cases (see [N-body example](#n-bodies-problem) as a reference) it
+is needed to create a single synchronization-only empty task to combine separate end-tasks in the subgraph and transfer successors to this task:
+
+```cpp
+// N-body rectangle split
+tbb::task_handle left_lower_rectangle_task = tg.defer(...);
+tbb::task_handle right_upper_rectangle_task = tg.defer(...);
+tbb::task_handle left_upper_rectangle_task = tg.defer(...);
+tbb::task_handle right_lower_rectangle_task = tg.defer(...);
+
+tbb::task_handle empty_sync_task = tg.defer([] {});
+
+tbb::task_group::make_edge(left_lower_rectangle_task, left_upper_rectangle_task);
+tbb::task_group::make_edge(left_lower_rectangle_task, right_lower_rectangle_task);
+tbb::task_group::make_edge(right_upper_rectangle_task, left_upper_rectangle_task);
+tbb::task_group::make_edge(right_upper_rectangle_task, right_lower_rectangle_task);
+
+tbb::task_group::make_edge(left_upper_rectangle_task, empty_sync_task);
+tbb::task_group::make_edge(right_lower_rectangle_task, empty_sync_task);
+
+tbb::task_group::current_task::transfer_successors_to(empty_sync_task);
+```
+
+In the current TBB implementation, ``empty_sync_task`` would be spawned (or bypassed) and executed anyway that can add some overhead for scheduling.
+
+Possible extension is to add some special API for `task_group` to provide an `empty_task` of a special type that would not be scheduled for execution
+and skipped when the dependent tasks are completed:
+
+```cpp
+auto empty_sync_task = tg.get_empty_task();
+
+...
+
+tbb::task_group::make_edge(left_upper_rectangle_task, empty_sync_task);
+tbb::task_group::make_edge(right_upper_rectangle_task, empty_sync_task);
+
+// empty_sync_task would not be spawned and only used for dependency tracking
+```
+
+### Transferring successors to the task in any state
+
+Current proposal limits the functionality of `transfer_successors_to` by accepting only a task in created state as a recipient. It is done by allowing
+only `task_handle` as an argument to `transfer_successors_to`.
+
+Currently considered use-cases did not shown the necessity to transfer the successors to the task in any state, but if such a use case appear in the future,
+the semantics of `transfer_successors_to` can be extended by providing the new overload:
+
+```cpp
+namespace oneapi {
+namespace tbb {
+
+class task_group {
+public:
+ struct current_task {
+ static void transfer_successors_to(task_tracker& task);
+ };
+};
+
+} // namespace tbb
+} // namespace oneapi
+```
+
+Implementation-wise, it means that merged successors tracking should not only be applied to two tasks ("sender" of the successors and the recipient), but to
+the set of tasks, since one task can transfer it's successors to the running task that also performing `transfer_successors_to` and so on.
+
+### Using a ``task_tracker`` as a successor
+
+Current proposal only allows the tasks in created state as a successors. It is implemented by allowing only
+the ``task_handle`` as a second argument for ``make_edge``:
+
+```cpp
+namespace oneapi {
+namespace tbb {
+class task_group {
+ static void make_edge(task_handle& pred, task_handle& succ);
+ static void make_edge(task_tracker& pred, task_handle& succ);
+};
+} // namespace tbb
+} // namespace oneapi
+```
+
+Two enhancements for this API can be considered:
+* Allow ``task_tracker`` tracking the task in created state to be used as a successor.
+* Allow ``task_tracker`` tracking the task `T` in submitted state to be used as a successor if other non-submitted
+ predecessors of ``T`` prevent it from being scheduled for execution.
+
+These enhancements require the following APIs to be added:
+
+```cpp
+namespace oneapi {
+namespace tbb {
+class task_group {
+ static void make_edge(task_handle& pred, task_tracker& succ);
+ static void make_edge(task_tracker& pred, task_tracker& succ);
+};
+} // namespace tbb
+} // namespace oneapi
+```
+
+If the ``task_tracker`` tracking the task that is scheduled for execution, executing or completed, is used
+as a second argument, the behavior of APIs above is undefined.
+
+## Advanced examples
+
+This section describes the applicability of the proposal to the advanced scenarios that were not considered as part of the parent RFC.
+
+### Recursive Fibonacci
+
+The Recursive Fibonacci is an example of parallel computation of N-th Fibonacci number. Computation of each *N*-th Fibonacci number is done by computing *N-1*-th number and *N-2*-th number
+in parallel and computing a sum of these numbers after completing the computations.
+
+
+
+Possible implementation of this example is part of [oneTBB migration examples](https://github.com/uxlfoundation/oneTBB/tree/master/examples/migration).
+
+The example can be easily implement also using the proposed API. Each task computing *N*-th Fibonacci number should create two subtasks `subtask1` and `subtask2` for computations of *N-1*-th and *N-2*-th numbers
+respectfully, and the `merge_sum` task for calculating the sum of `subtask1` and `subtask2` results.
+
+Since `merge_sum` needs to be executed only after the computations of `subtask1` and `subtask2` are completed, there should be a predecessor-successor dependency between `subtask1` and `merge_sum` and
+between `subtask2` and `merge_sum`.
+
+To preserve the predecessor-successor relations of the entire task graph, currently executing task should transfer it's successors to `merge_sum` to replace itself in the graph with this task.
+
+```cpp
+int serial_fibonacci(int n) {
+ if (n == 0 || n == 1) { return n; }
+ return serial_fibonacci(n - 1) + serial_fibonacci(n - 2);
+}
+
+constexpr int serial_fibonacci_cutoff = 25;
+
+void recursive_fibonacci_number(tbb::task_group& tg, int n, int* result_placeholder) {
+ if (n <= serial_fibonacci_cutoff) {
+ *result_placeholder = serial_fibonacci(n);
+ } else {
+ int* subtask1_placeholder = new int(0);
+ int* subtask2_placeholder = new int(0);
+
+ tbb::task_handle subtask1 = tg.defer([&tg, n, subtask1_placeholder] {
+ recursive_fibonacci_number(tg, n - 1, subtask1_placeholder);
+ });
+
+ tbb::task_handle subtask2 = tg.defer([&tg, n, subtask2_placeholder] {
+ recursive_fibonacci_number(tg, n - 2, subtask2_placeholder);
+ });
+
+ tbb::task_handle merge_sum = tg.defer([=] {
+ *result_placeholder = *subtask1_placeholder + *subtask2_placeholder;
+ delete subtask1_placeholder;
+ delete subtask2_placeholder;
+ });
+
+ tbb::task_group::make_edge(subtask1, merge_sum);
+ tbb::task_group::make_edge(subtask2, merge_sum);
+
+ tbb::task_group::current_task::transfer_successors_to(merge_sum);
+
+ tg.run(std::move(subtask1));
+ tg.run(std::move(subtask2));
+ tg.run(std::move(merge_sum));
+ }
+}
+
+int fibonacci_number(int n) {
+ tbb::task_group tg;
+ int result = 0;
+ tg.run_and_wait([&tg, &result, n] {
+ recursive_fibonacci_number(tg, n, &result);
+ });
+ return result;
+}
+
+```
+
+### N-bodies problem
+
+Parallel N-bodies simulation is an example simulating a set of N planetary bodies drifting in space. The algorithm is computing successive positions of each body
+for successive moments in time. It first computes the gravitational force acting on each body as the sum of forces exerted on it by all of the other bodies, and
+adjusts the velocity of each body according to the computed force.
+
+Given the fact that each *i*th body is acting on the *j*th body with force of the same magnitude as *j*th body acting on *i*th, but with the reversed direction, we can
+consider the initial problem as traversing the triangle shown below and calculating the force for each point in it.
+
+
+
+For effective parallel computations, we can follow the algorithm explained in
+[Parallel Programming with Cilk I](https://dspace.mit.edu/bitstream/handle/1721.1/122680/6-172-fall-2010/contents/projects/MIT6_172F10_proj4_1.pdf) paper.
+
+For parallelizing the computation on the triangle shown above, the algorithm splits it into two smaller triangles and one rectangle as it shown on the picture below:
+
+
+
+As it described in the paper, parallel computations can only be done on non-intersecting *i* and *j* indices. Hence, computations on two yellow triangles can
+be done in parallel and the computations of the blue rectangle can be only started after computations on them.
+
+Each sub-triangle can be divided using the same splitting algorithm for triangles.
+
+Computations on the rectangles can also be parallelized if we split each rectangle as it shown in the picture below:
+
+
+
+Computations on black sub-rectangles can be done in parallel as well as the gray ones.
+
+This problem can also be implemented using the proposed algorithm. Let's consider the triangle split first.
+Each task that is doing the computation on the triangle creates two subtasks `triangle_subtask1` and `triangle_subtask2` to
+do computations on the sub-triangles, and a subtask `rectangle_subtask` to do computation on the sub-rectangle.
+
+As it was mentioned above, only the sub-triangles can be computed in parallel, so a predecessor-successor relationship
+should be set between `triangle_subtask1` and `rectangle_subtask` and between `triangle_subtask2` and `rectangle_subtask`
+to execute the rectangle split only after doing computations on sub-triangles.
+
+Then each task that is doing the split should replace itself with the `rectangle_subtask` by transferring the successors
+to this subtask:
+
+
+
+Sub-tasks computing the triangle split are following the same logic.
+
+Each task computing the rectangle, creates 2 subtasks `left_lower_subtask` and `right_upper_subtask` to do computations
+on the black sub-rectangles on the picture above, and 2 subtasks `left_upper_subtask` and `right_lower_subtask` to do
+computations on the gray sub-rectangles.
+
+To ensure that the black and gray sub-rectangles are not computed in parallel, a predecessor-successor relationship should
+be set between each pair of black and gray sub-tasks.
+
+Similarly to the triangle split, the current task should also replace itself in the task graph with another task that
+synchronize the work. Since current API only allows transferring the successors to the single task, an empty task should be
+created as a synchronization point and the successors of the current task should be transferred to this task:
+
+
+
+```cpp
+class Body;
+void calculate_force(double& fx, double& fy, Body& body1, Body& body2);
+void add_force(Body* body, double fx, double fy);
+
+constexpr int threshold = 16;
+
+void serial_calculate_and_update_forces(int i0, int i1, int j0, int j1,
+ Body* bodies0)
+{
+ for (int i = i0; i < i1; ++i) {
+ for (int j = j0; j < j1; ++j) {
+ // update the force vector on bodies[i] exerted by bodies[j]
+ // and, symmetrically, the force vector on bodies[j] exerted
+ // by bodies[i]
+ if (i == j) continue;
+
+ double fx, fy;
+ calculate_force(&fx, &fy, bodies[i], bodies[j]);
+ add_force(&bodies[i], fx, fy);
+ add_force(&bodies[i], -fx, -fy);
+ }
+ }
+}
+
+// traverse the rectangle i0 <= i <= i1, j0 <= j <= J1
+void n_body_rectangle(tbb::task_group& tg,
+ int i0, int i1, int j0, int j1,
+ Body* bodies)
+{
+ int di = i1 - i0;
+ int dj = j1 - j0;
+ if (di <= threshold || dj <= threshold) {
+ serial_calculate_and_update_forces(i0, i1, j0, j1, bodies);
+ } else {
+ int im = i0 + dj / 2;
+ int jm = j0 + dj / 2;
+
+ tbb::task_handle left_lower_subtask = tg.defer([&tg, =] {
+ n_body_rectangle(tg, i0, im, j0, jm, bodies);
+ });
+
+ tbb::task_handle right_upper_subtask = tg.defer([&tg, =] {
+ n_body_rectangle(tg, im, i1, jm, j1, bodies);
+ });
+
+ tbb::task_handle left_upper_subtask = tg.defer([&tg, =] {
+ n_body_rectangle(tg, i0, im, jm, j1, bodies);
+ });
+
+ tbb::task_handle right_lower_subtask = tg.defer([&tg, =] {
+ n_body_rectangle(tg, im, i1, j0, jm, bodies);
+ });
+
+ tbb::task_handle sync = tg.defer([] {});
+
+ tbb::task_group::make_edge(left_lower_subtask, left_upper_subtask);
+ tbb::task_group::make_edge(left_lower_subtask, right_lower_subtask);
+ tbb::task_group::make_edge(right_upper_subtask, left_upper_subtask);
+ tbb::task_group::make_edge(right_upper_subtask, right_lower_subtask);
+
+ tbb::task_group::make_edge(left_upper_subtask, sync);
+ tbb::task_group::make_edge(right_lower_subtask, sync);
+
+ tbb::task_group::current_task::transfer_successors_to(sync);
+
+ tg.run(std::move(left_lower_subtask));
+ tg.run(std::move(right_upper_subtask));
+ tg.run(std::move(left_upper_subtask));
+ tg.run(std::move(right_lower_subtask));
+ tg.run(std::move(sync));
+ }
+}
+
+// traverse the triangle n0 <= i <= j <= n1
+void n_body_triangle(tbb::task_group& tg, int n0, int n1, Body* bodies) {
+ int dn = n1 - n0;
+ // If dn == 1, do nothing since a single body has no
+ // interaction with itself
+ if (dn != 1) {
+ int nm = n0 + dn / 2;
+ tbb::task_handle triangle_subtask1 = tg.defer([&tg, =] {
+ n_body_triangle(tg, n0, nm, bodies);
+ });
+
+ tbb::task_handle triangle_subtask2 = tg.defer([&tg, =] {
+ n_body_triangle(tg, nm, n1, bodies);
+ });
+
+ tbb::task_handle rectangle_subtask = tg.defer([&tg, =] {
+ n_body_rectangle(tg, n0, nm, nm, n1, bodies);
+ });
+
+ tbb::task_group::make_edge(triangle_subtask1, rectangle_subtask);
+ tbb::task_group::make_edge(triangle_subtask2, rectangle_subtask);
+
+ tbb::task_group::current_task::transfer_successors_to(rectangle_subtask);
+
+ tg.run(std::move(triangle_subtask1));
+ tg.run(std::move(triangle_subtask2));
+ tg.run(std::move(rectangle_subtask));
+ }
+}
+
+void calculate_forces(int n_bodies, Body* bodies) {
+ tbb::task_group tg;
+ tg.run_and_wait([&tg, n_bodies, bodies] {
+ n_body_triangle(tg, 0, n_bodies, bodies);
+ });
+}
+```
+
+### Wavefront
+
+Wavefront is a scientific programming pattern in which data elements are distributed on multidimensional grids. Elements should be computed in order
+because of dependencies between them.
+
+Let's consider a 2 dimensional wavefront example where the computation starts on the upper left corner of the grid:
+
+
+
+Computations of each element *(i,j)* in the grid can start when two dependent sub-elements are completed:
+* North dependency - computation of the element *(i, j - 1)* if present
+* West dependency - computation of the element *(i - 1, j)* if present
+
+Hence all antidiagonal elements in the grid can be computed in parallel due to lack of dependencies between them (showed as cells with the
+same colour on the picture).
+
+#### Non-recursive approach
+
+The easiest approach to implement the wavefront is to create all tasks for computations on the cells and set all of the dependencies between them
+before running the computation.
+
+
+
+```cpp
+void compute_cell(int i, int j);
+
+void non_recursive_wavefront(int in, int jn) {
+ tbb::task_group tg;
+
+ std::vector> cell_tasks;
+
+ cell_tasks.reserve(in);
+
+ for (int i = 0; i < in; ++i) {
+ cell_tasks[i].reserve(jn);
+
+ for (int j = 0; j < jn; ++j) {
+ cell_tasks[i].emplace_back(tg.defer([=] { compute_cell(i, j); }));
+
+ // north dependency
+ if (j != 0) { tbb::task_group::make_edge(cell_tasks[i, j - 1], cell_tasks[i][j]); }
+
+ // west dependency
+ if (i != 0) { tbb::task_group::make_edge(cell_tasks[i - 1][j], cell_tasks[i][j]); }
+ }
+ }
+
+ // Run the graph
+ for (int i = 0; i < in; ++i) {
+ for (int j = 0; j < jn; ++j) {
+ tg.run(std::move(cell_tasks[i][j]));
+ }
+ }
+}
+```
+
+#### Classic recursive approach
+
+Another approach is to implement a wavefront as a divide-and-conquer pattern with recursive splitting of the grid.
+
+In this case, the initial task would split the grid into 4 subtasks, each of which splits each of the sub-area into 4 subtasks, etc:
+
+
+
+In the "classic" recursive approach, each task creates 4 subtasks `north_subtask`, `west_subtask`, `east_subtask` and `south_subtask` and
+sets the following predecessor-successor dependencies:
+* Between `north_subtask` and `west_subtask`,
+* Between `north_subtask` and `east_subtask`,
+* Between `west_subtask` and `south_subtask`,
+* Between `east_subtask` and `south_subtask`.
+
+To support the structure of the task graph, currently executing task should replace itself with `south_subtask` by transferring the successors.
+
+```cpp
+void compute_cell(int i, int j);
+
+void serial_wavefront(int i0, int in, int j0, int jn) {
+ for (int i = i0; i < in; ++i) {
+ for (int j = j0; j < jn; ++j) {
+ compute_cell(i, j);
+ }
+ }
+}
+
+constexpr int threshold = 4;
+
+void classic_recursive_wavefront_task(tbb::task_group& tg, int i0, int in, int j0, int jn) {
+ int di = in - i0;
+ int dj = jn - j0;
+
+ if (di <= threshold || dj <= threshold) {
+ serial_wavefront(i0, in, j0, jn);
+ } else {
+ int im = i0 + di / 2;
+ int jm = j0 + dj / 2;
+
+ tbb::task_handle north_subtask = tg.defer([&tg, =] {
+ classic_recursive_wavefront_task(tg, i0, im, j0, jm);
+ });
+
+ tbb::task_handle west_subtask = tg.defer([&tg, =] {
+ classic_recursive_wavefront_task(tg, i0, im, jm, jn);
+ });
+
+ tbb::task_handle east_subtask = tg.defer([&tg, =] {
+ classic_recursive_wavefront_task(tg, im, in, j0, jn);
+ });
+
+ tbb::task_handle south_subtask = tg.defer([&tg, =] {
+ classic_recursive_wavefront_task(tg, im, in, jm, jn);
+ });
+
+ tbb::task_group::make_edge(north_subtask, west_subtask);
+ tbb::task_group::make_edge(north_subtask, east_subtask);
+ tbb::task_group::make_edge(west_subtask, south_subtask);
+ tbb::task_group::make_edge(east_subtask, south_subtask);
+
+ tbb::task_group::current_task::transfer_successors_to(south_subtask);
+
+ tg.run(std::move(north_subtask));
+ tg.run(std::move(west_subtask));
+ tg.run(std::move(east_subtask));
+ tg.run(std::move(south_subtask));
+ }
+}
+
+void classic_recursive_wavefront(int in, int jn) {
+ tbb::task_group tg;
+ tg.run_and_wait([&tg, =] {
+ classic_recursive_wavefront_task(tg, 0, in, 0, jn);
+ });
+}
+```
+
+The task graph for the classic approach is illustrated in the picture below:
+
+
+
+#### Eager recursive approach
+
+The main downside of the classic approach is setting an extra dependencies between tasks that are not actually depend on each other.
+
+E.g. in the task graph above, the task `21` would start executing only after the task `14` is completed, but algorithm-wise it only depends
+on the task `12` and should not wait until `14` to complete.
+
+The improved approach is described in the [Cache-Oblivious Wavefront](https://people.csail.mit.edu/yuantang/Docs/PPoPP15-yuantang.pdf) paper as eager recursion.
+
+In such a recursion each dividing task knows that it's dependent tasks would split as well and notifies them about the sub-tasks required to make only the real
+dependencies between tasks.
+
+Let's consider a wavefront calculations on the square grid ``[0, n)`` on ``x`` and ``y`` dimensions.
+
+Similarly to the classic algorithm, the eager one splits the calculated area into 4 sub-areas by creating 4
+subtasks - ``north_subtask``, ``wast_subtask``, ``east_subtask`` and ``south_subtask``.
+
+The algorithm also sets the same dependencies as a classic one:
+* ``west_subtask`` and ``east_subtask`` are successors of ``north_subtask``
+* ``south_subtask`` is a successor of ``west_subtask`` and ``east_subtask``.
+
+During the first division, no extra actions needed since the task that splits calculation of the
+whole grid ``[0, n))`` can't have successors.
+
+But during the second division - when the north, west, east and south are executed, each task needs to notify it's
+successors about the subtasks to make dependencies.
+
+Let's look into the second division in details. Initially, we have 4 subtasks created during the first division:
+
+
+
+The north subtask is executed first because dependencies prevents west, east and south tasks from executing.
+Similarly to the previous step, it creates the north, east, west and south subtasks:
+
+
+
+The second step is to notify the successors about the subtasks:
+* ``W`` should be notified about west and south subtasks (``NW`` and ``NS``),
+* ``E`` should be notified about east and south subtasks (``NE`` and ``NS``).
+
+Notifying the successor is made by creating ``task_tracker`` objects tracking the corresponding tasks and storing them
+in the container visible for the successors.
+
+North subtask ``NN`` can start execution and division after completing the split of the previous level and notifying
+the successors.
+
+Since ``N`` task is completed, ``W`` and ``E`` can start executing. ``W`` would also split itself into subtasks and
+notify ``S`` about the successors:
+
+
+
+Additionally, ``W`` should add dependencies between it's subtasks ``WN`` and ``WE`` and the subtasks of ``N`` created
+on the previous step (``NW`` and ``NS``):
+
+
+
+It is important to mention that at the moment of making additional edges, ``NW`` and ``NS`` are at least
+in `submitted` state and even can be completed.
+
+That highlights the importance of ability to make edges between tasks in any state and a task in created state.
+
+``E`` and ``S`` tasks are following the same logic as described above. That results in the following task graph:
+
+
+
+The splitting continues until the desired depth.
+
+The following code shows how it can be implemented using the API proposed:
+
+```cpp
+void compute_cell(std::size_t x, std::size_t y);
+
+void serial_wavefront(std::size_t x0, std::size_t xn, std::size_t y0, std::size_t yn) {
+ for (std::size_t x = x0; x < xn; ++x) {
+ for (std::size_t y = y0; y < yn; ++y) {
+ compute_cell(x, y);
+ }
+ }
+}
+
+class eager_wavefront_task {
+public:
+ static void prepare_predecessors_container(std::size_t num_divisions) {
+ // Reserve element in the vector for each level of division
+ predecessors_container.resize(num_divisions);
+
+ int num_subtasks = 4; // number of subtasks on the first level
+
+ for (std::size_t i = 0; i < num_divisions; ++i) {
+ predecessors_container[i].reserve(num_subtasks);
+ num_subtasks *= 4;
+ }
+ }
+
+ static tbb::task_tracker find_predecessor(std::size_t n_division, std::size_t x_index, std::size_t y_index) {
+ // Number of elements in the grid on each level of division
+ // On the first level - 4 subtasks total (2 on each dimension)
+ // on the second level - 16 subtasks total (4 on each dimension), etc.
+ std::size_t n_grid = (2 << n_division);
+
+ auto it = predecessors[n_division].find(x_index * n_grid + y_index);
+ return it != predecessors[n_division].end() ? it->second : tbb::task_tracker{};
+ }
+
+ static void publish_predecessor(std::size_t n_division, std::size_t x_index, std::size_t y_index,
+ tbb::task_tracker&& pred)
+ {
+ std::size_t n_grid = (2 << n_division>>);
+
+ [[maybe_unused]] auto it = predecessors[n_division].emplace(x_index * n_grid + y_index, std::move(pred));
+ assert(it != predecessors[n_division].end());
+ }
+
+ eager_wavefront_task(tbb::task_group& tg, std::size_t x_index, std::size_t y_index,
+ std::size_t x0, std::size_t xn, std::size_t y0, std::size_t yn,
+ std::size_t n_div)
+ : m_tg(tg), m_x_index(x_index), m_y_index(y_index)
+ , m_x0(x0), m_xn(xn), m_y0(y0), m_yn(yn)
+ , m_n_division(n_div)
+ {}
+
+ void operator()() const {
+ std::size_t x_size = m_xn - m_x0;
+ std::size_t y_size = m_yn - m_y0;
+
+ // Do serial wavefront if the grainsize reached
+ if (i_size <= wavefront_grainsize) {
+ assert(j_size <= wavefront_grainsize);
+ serial_wavefront(m_i0, m_in, m_j0, m_jn);
+ } else {
+ std::size_t x_mid = m_x0 + x_size / 2;
+ std::size_t y_mid = m_y0 + y_size / 2;
+
+ // Calculate indices of subtasks in the next level grid
+ std::size_t north_x_index = m_x0 / (x_mid - m_x0);
+ std::size_t north_y_index = m_y0 / (y_mid - m_y0);
+
+ std::size_t west_x_index = north_x_index;
+ std::size_t west_y_index = north_y_index + 1;
+
+ std::size_t east_x_index = north_x_index + 1;
+ std::size_t east_y_index = north_y_index;
+
+ std::size_t south_x_index = east_x_index;
+ std::size_t south_y_index = west_y_index;
+
+ // Create subtasks
+ tbb::task_handle north_subtask = tg.defer(eager_wavefront_task(m_tg,
+ /*indices in the next level grid = */north_x_index, north_y_index,
+ /*area to process = */m_x0, x_mid, m_y0, y_mid,
+ /*n_division = */m_n_division + 1));
+
+ tbb::task_handle west_subtask = tg.defer(eager_wavefront_task(m_tg,
+ /*indices in the next level grid = */west_x_index, west_y_index,
+ /*area to process = */m_x0, x_mid, y_mid, m_yn,
+ /*n_division = */m_n_division + 1));
+
+ tbb::task_handle east_subtask = tg.defer(eager_wavefront_task(m_tg,
+ /*indices in the next level grid = */east_x_index, east_y_index,
+ /*area to process = */x_mid, m_xn, m_y0, y_mid,
+ /*n_division = */m_n_division + 1));
+
+ tbb::task_handle south_subtask = tg.defer(eager_wavefront_task(m_tg,
+ /*indices in the next level grid = */south_x_index, south_y_index,
+ /*area to process = */x_mid, m_xn, y_mid, m_yn,
+ /*n_division = */m_n_division + 1));
+
+ // Make dependencies between subtasks
+ tbb::task_group::make_edge(north_subtask, west_subtask);
+ tbb::task_group::make_edge(north_subtask, east_subtask);
+ tbb::task_group::make_edge(west_subtask, south_subtask);
+ tbb::task_group::make_edge(east_subtask, south_subtask);
+
+ // Add extra dependencies with predecessor's subtasks
+ tbb::task_tracker west_predecessor_east_subtask;
+ tbb::task_tracker west_predecessor_south_subtask;
+ tbb::task_tracker north_predecessor_west_subtask;
+ tbb::task_tracker north_predecessor_south_subtask;
+
+ if (north_x_index != 0) {
+ west_predecessor_east_subtask = find_predecessor(m_n_division + 1, north_x_index - 1, north_y_index);
+ }
+ if (west_x_index != 0) {
+ west_predecessor_south_subtask = find_predecessor(m_n_division + 1, west_x_index - 1, west_y_index);
+ }
+ if (north_y_index != 0) {
+ north_predecessor_west_subtask = find_predecessor(m_n_division + 1, north_x_index, north_y_index - 1);
+ }
+ if (east_y_index != 0) {
+ north_predecessor_south_subtask = find_predecessor(m_n_division + 1, east_x_index, east_y_index - 1);
+ }
+
+ if (west_predecessor_east_subtask) {
+ tbb::task_group::make_edge(west_predecessor_east_subtask, north_subtask);
+ }
+ if (west_predecessor_south_subtask) {
+ tbb::task_group::make_edge(west_predecessor_south_subtask, west_subtask);
+ }
+ if (north_predecessor_west_subtask) {
+ tbb::task_group::make_edge(north_predecessor_west_subtask, north_subtask);
+ }
+ if (north_predecessor_south_subtask) {
+ tbb::task_group::make_edge(north_predecessor_south_subtask, east_subtask);
+ }
+
+ // Save trackers to subtasks for future publishing
+ tbb::task_tracker north_subtask_tracker = north_subtask;
+ tbb::task_tracker west_subtask_tracker = west_subtask;
+ tbb::task_tracker east_subtask_tracker = east_subtask;
+ tbb::task_tracker south_subtask_tracker = south_subtask;
+
+ // Run subtasks
+ tg.run(std::move(north_subtask));
+ tg.run(std::move(west_subtask));
+ tg.run(std::move(east_subtask));
+ tg.run(std::move(south_subtask));
+
+ // Publish subtasks for successors
+ publish_predecessor(m_n_division + 1, north_x_index, north_y_index, std::move(north_subtask_tracker));
+ publish_predecessor(m_n_division + 1, west_x_index, west_y_index, std::move(west_subtask_tracker));
+ publish_predecessor(m_n_division + 1, east_x_index, east_y_index, std::move(east_subtask_tracker));
+ publish_predecessor(m_n_division + 1, south_x_index, south_y_index, std::move(south_subtask_tracker));
+ }
+ }
+
+private:
+ static constexpr std::size_t wavefront_grainsize = 5;
+
+ // Each element e[i] in the vector represents a map of additional predecessors on the i-th level of division
+ // Each element in the hash table maps linearized index in the grid (x_index * n + y_index) with the
+ // tracker to the corresponding task
+ using predecessors_container_type = std::vector>;
+
+ static predecessors_container_type predecessors;
+
+ tbb::task_group& m_tg;
+
+ // Indices in the grid of current level of division
+ // On the first division level indices are [0, 2) on x and y
+ // On the second division level indices are [0, 4) on x and y, etc.
+ std::size_t m_x_index;
+ std::size_t m_y_index;
+
+ // Begin and end indices of the globally computed area
+ std::size_t m_x0;
+ std::size_t m_xn;
+ std::size_t m_y0;
+ std::size_t m_yn;
+
+ // Division number
+ std::size_t m_n_division;
+}; // class eager_wavefront_task
+
+void eager_wavefront(std::size_t n) {
+ tbb::task_group tg;
+
+ std::size_t num_divisions = log2(n / eager_wavefront_task::wavefront_grainsize) + 1;
+ eager_wavefront_task::prepare_predecessors_container(num_divisions);
+
+ tg.run_and_wait(eager_wavefront_task(tg,
+ /*x_index = */0, /*y_index = */0,
+ /*area to process = */0, n, 0, n,
+ /*n_division = */0));
+}
+```
+
+#### Combination of eager and classic approaches
+
+If we consider a combination of two approaches described above:
+1. First and second-level eager splits
+2. Classic split of each subtask created during the second-level split
+
+The split algorithm is illustrated in the picture below:
+
+
+
+As it is described in the eager wavefront algorithm section above, on the "Eager 2" stage, the dependencies between
+blue and red subtasks are created using the `task_tracker` to the blue subtasks. The blue subtasks can be executed
+during making this dependency (purple arrows on the picture).
+
+During the classic split, each subtask of "Eager 2" replaces itself in the task graph with the south classic subtask
+using transferring the successors.
+
+Because the red subtasks on "Eager 2" knows only a tracker to the original blue subtasks (before transferring
+the successors) and because blue subtasks can be executed before the edge between blue and red subtasks are created,
+red subtask can use a ``task_tracker`` to the originally created blue task *after* this task replaces itself in the
+graph with the "Classic" subtask (by transferring it's successors).
+
+Hence, to create a correct edge on "Eager 2" level, all of the edges added to ``task_tracker`` that is associated
+with the task that transferred it's successors, should be added to the task that receives the successors.
+To maintain such a behavior, the "merged successors tracking" approach, described in the
+[section above](#semantics-for-transferring-successors-from-the-currently-executing-task-to-the-other-task)
+should be implemented.
+
+#### File Parser
+
+File Parser is an example emulating the XML or C++ headers parser. Consider having a set of files *file1...fileN*
+each of them can include one or several files from the set. We assume that include paths cannot have loops.
+
+The parsing starts from one file *fileN*.
+
+Processing of each file is done in two stages:
+* *parsing* the file - includes opening and reading symbols from the file, determining the list of includes and their processing
+* *finalizing* the file - writing the result of processing in the resulting file and other finalize actions
+
+Consider the following example:
+
+
+
+The parsing here starts from *File 5*. Arrows indicates includes for each file.
+
+Serial implementation of example is shown below:
+
+```cpp
+struct FileInfo {
+ using map_type = std::unordered_map;
+ static map_type fileMap;
+
+ FileInfo(std::string name) : m_name(name) {}
+
+ std::string m_name;
+ std::vector m_includes;
+
+ void read();
+ void write();
+
+ FileInfo* try_process() {
+ auto [it, b] = fileMap.emplace(m_name, this);
+ if (b) process();
+ return it->second;
+ }
+
+ void process() {
+ parse();
+ finalize();
+ }
+
+ void parse() {
+ read(); // fills m_includes
+ for (auto& i : m_includes) {
+ FileInfo* maybe = new FileInfo(i);
+ FileInfo* actual = maybe->try_process();
+
+ // i was parsed previously
+ if (maybe != actual) delete maybe;
+ }
+ }
+
+ void finalize() { write(); }
+};
+
+int main() {
+ FileInfo* info = new FileInfo("File 5");
+ info->try_process();
+ delete info;
+}
+```
+
+The class `FileInfo` describes a processing of each file in the system. `fileMap` is a shared hash map that maps
+the name of the file to the `FileInfo` representing it.
+
+Fields `m_name` and `m_includes` indicate the name of the file and the list of includes respectfully.
+
+The function `read()` opens the file, reads the content, determines the list of includes and fills the `m_includes` vector.
+
+Function `write()` opens the result file and writes the output into it (finalization stage).
+
+`try_process()` tries to add current `FileInfo` into a shared map. If the insertion succeeds, it runs processing the file.
+Otherwise, the file was already processed and the function returns the info from the shared map.
+
+Since the implementation is serial, the `process()` function is doing both `parse()` and `finalize()` stages.
+
+`parse()` stage reads the file, traverses the list of includes and runs processing of each include.
+
+The code below shows how the implementation can be modified to be parallel using the API described in this document:
+
+```cpp
+struct FileInfo {
+ using map_type = tbb::concurrent_unordered_map;
+ static map_type fileMap;
+
+ std::string m_name;
+ std::vector m_includes;
+
+ tbb::task_group& m_tg;
+ tbb::task_handle m_parse_task;
+ tbb::task_tracker m_tracker;
+
+ FileInfo(std::string n, tbb::task_group& tg)
+ : m_name(n), m_tg(tg)
+ {
+ m_tracker = m_parse_task = m_tg.defer([this] { parse(); });
+ }
+
+ void read();
+ void write();
+
+ FileInfo* try_process() {
+ auto [it, b] = fileMap.emplace(m_name, this);
+ if (b) process();
+ return it->second;
+ }
+
+ void process() {
+ m_tg.run(std::move(m_parse_task));
+ }
+
+ void parse() {
+ read();
+
+ tbb::task_handle finalize_task = m_tg.defer([this] { finalize(); });
+
+ for (auto& i : m_includes) {
+ FileInfo* maybe = new FileInfo(i);
+ FileInfo* actual = maybe->try_process(); // runs parse task for include
+
+ tbb::task_group::make_edge(actual->m_tracker, finalize_task);
+
+ // i was parsed previously
+ if (maybe != actual) delete maybe;
+ }
+
+ tbb::task_group::current_task::transfer_successors_to(finalize_task);
+ m_tg.run(std::move(finalize_task));
+ }
+
+ void finalize() { write(); }
+};
+
+int main() {
+ tbb::task_group tg;
+ FileInfo* info = new FileInfo("File 5", tg);
+ info->try_process();
+ tg.wait();
+ delete info;
+}
+```
+
+The type of the `map_type` is modified to be `concurrent_unordered_map` to allow concurrent insertion and lookup of `FileInfo`s during
+processing.
+
+Each `FileInfo` contains additional reference to `task_group`, a `task_handle` owning the parsing task and `task_tracker` initially set
+to track parsing task.
+
+`process()` function submits the task owned by `m_parse_task` for execution.
+
+`parse()` function reads the file, creates a task for finalization, transfers the list of dependencies and creates an edge between the
+corresponding `task_tracker` and the finalization task.
+
+After creating all of the edges, the parsing task replaces itself in the task graph with finalize_task by
+calling `transfer_successors_to(finalize_task)`.
+
+Since `maybe->try_process()` runs the parsing task for execution, at the moment of creating the edge, the task may be in one of the following
+states - `submitted`, `executing` or `completed`.
+
+The parser task may also transfer it's successors to the finalize task, so `make_edge` should add a successor to the finalize task that
+illustrates the necessity of merged successors tracking after transferring since `m_tracker` was initially set to track the parsing task.
+
+Let's consider the task and edges while parsing the *File 5* from the picture above.
+
+Processing the *File 5* would create a finalize task `f5f`, two tasks parsing tasks for each include - `f4p` and `f3p`:
+
+
+
+`f4p` and `f3p` are at least in `submitted` state when the edge with `f5f` is created.
+
+Let's assume that the task `f3p` is executed next. It creates the finalize task `f3f` and two tasks for includes - `f2p` and `f1p`:
+
+
+
+After creating the edges between includes and `f3f` are added, `f3p` transfers it's successors to `f3f` and destroys:
+
+
+
+Assume that `f4p` is taken for execution next. It creates finalize task `f4f` and should create edges for with *File 3* and *File 2*.
+
+Task for *File 2* was already created when parsing *File 3*, so the tracker to `f2p` would be found in the `fileMap`.
+
+The tracker for *File 3* in the `fileMap` was initially set to track `f3p`, but it have already transferred it's successors to `f3f`, so
+new edges needed to be added to `f3f`, but using the tracker to `f3p`:
+
+
+
+Similar to `f3p`, `f4p` transfers it's successor `f5f` to finalize task `f4f`:
+
+
+
+When `f2p` is executed, it would create a finalize task `f2f`, connects it to `f1p` and transfers it's successors to `f2f`:
+
+
+
+The last task to be executed is `f1p` that just replaces itself with the finalize task since there are no dependencies:
+
+
+
+After all of the dependencies are made, *File 1* is finalized, than the *File 2*, *File 3*, *File 4* and *File 5*
+
+## API proposal summary
+
+```cpp
+namespace oneapi {
+namespace tbb {
+
+// Existing API
+class task_handle;
+
+// New APIs
+class task_tracker {
+public:
+ task_tracker();
+
+ task_tracker(const task_tracker& other);
+ task_tracker(task_tracker&& other);
+
+ task_tracker(const task_handle& handle);
+
+ ~task_tracker();
+
+ task_tracker& operator=(const task_tracker& other);
+ task_tracker& operator=(task_tracker&& other);
+
+ task_tracker& operator=(const task_handle& handle);
+
+ explicit operator bool() const noexcept;
+}; // class task_tracker
+
+bool operator==(const task_tracker& t, std::nullptr_t) noexcept;
+bool operator==(std::nullptr_t, const task_tracker& t) noexcept;
+
+bool operator!=(const task_tracker& t, std::nullptr_t) noexcept;
+bool operator!=(std::nullptr_t, const task_tracker& t) noexcept;
+
+bool operator==(const task_tracker& t, const task_tracker& rhs) noexcept;
+bool operator!=(const task_tracker& t, const task_tracker& rhs) noexcept;
+
+class task_group {
+ static void make_edge(task_handle& pred, task_handle& succ);
+ static void make_edge(task_tracker& pred, task_handle& succ);
+
+ struct current_task {
+ static void transfer_successors_to(task_handle& h);
+ };
+}; // class task_group
+
+} // namespace tbb
+} // namespace oneapi
+```
+
+### ``task_tracker`` class
+
+#### Construction, destruction and assignment
+
+``task_tracker()``
+
+Creates an empty ``task_tracker`` object that does not track any task.
+
+``task_tracker(const task_tracker& other)``
+
+Copies ``other`` to ``*this``. After this, ``*this`` and ``other`` track the same task.
+
+``task_tracker(task_tracker&& other)``
+
+Moves ``other`` to ``*this``. After this, ``*this`` tracks the task previously tracked by ``other``.
+Other is left in empty state.
+
+``task_tracker(const task_handle& handle)``
+
+Creates ``task_tracker`` object that tracks the task owned by ``handle``.
+
+``~task_tracker()``
+
+Destroys the ``task_tracker`` object.
+
+``task_tracker& operator=(const task_tracker& other)``
+
+Replaces the task tracked by ``*this`` to be a task tracked by ``other``.
+After this, ``*this`` and ``other`` track the same task.
+
+Returns a reference to ``*this``.
+
+``task_tracker& operator=(task_tracker&& other)``
+
+Replaces the task tracked by ``*this`` to be a task tracked by ``other``.
+After this, ``*this`` tracks the task previously tracked by ``other``.
+Other is left in empty state.
+
+Returns a reference to ``*this``.
+
+``task_tracker& operator=(const task_handle& handle)``
+
+Replaces the task tracked by ``*this`` to be a task owned by ``handle``.
+
+Returns a reference to ``*this``.
+
+#### Observers
+
+``explicit operator bool() const noexcept``
+
+Checks if `this`` tracks any task object.
+
+Returns ``true`` if ``*this`` is a non-empty tracker, ``false`` otherwise.
+
+#### Comparison operators
+
+``bool operator==(const task_tracker& t, std::nullptr_t) noexcept``
+
+Returns ``true`` if ``t`` is a non-empty tracker, ``false`` otherwise.
+
+``bool operator==(std::nullptr_t, const task_tracker& t) noexcept``
+
+Equivalent to ``t == nullptr``.
+
+``bool operator!=(const task_tracker& t, std::nullptr_t) noexcept``
+
+Equivalent to ``!(t == nullptr)``.
+
+``bool operator!=(std::nullptr_t, const task_tracker& t) noexcept``
+
+Equivalent to ``!(t == nullptr)``.
+
+``bool operator==(const task_tracker& lhs, const task_tracker& rhs) noexcept``
+
+Returns ``true`` if ``lhs`` tracks the same task as ``rhs``, ``false`` otherwise.
+
+``bool operator!=(const task_tracker& lhs, const task_tracker& rhs) noexcept``
+
+Equivalent to ``!(lhs == rhs)``.
+
+### Member functions of ``task_group`` class
+
+``static void task_group::make_edge(task_handle& pred, task_handle& succ)``
+
+Registers the task owned by ``pred`` to be a predecessor that must complete before the task owned
+by ``succ`` can start executing.
+
+The behavior is undefined in the following cases:
+* ``pred`` is an empty ``task_handle``, or ``succ`` is an empty ``task_handle``,
+* If that task handled by ``pred`` and the task handled by ``succ`` belongs to different ``task_group``s.
+
+It is safe to add multiple predecessors to the same successor and add the same predecessor for multiple successor tasks.
+
+It is safe to add successors to the task that currently transfers it's successors to another task and
+to the task to which the successors are transferred.
+
+``static void task_group::make_edge(task_tracker& pred, task_handle& succ)``
+
+Registers the task tracked by ``pred`` to be a predecessor that must complete before the task owned
+by ``succ`` can start executing.
+
+The behavior is undefined in the following cases:
+* ``pred`` is an empty ``task_tracker``, or ``succ`` is an empty ``task_handle``,
+* If that task tracked by ``pred`` and the task handled by ``succ`` belongs to different ``task_group``s.
+
+It is safe to add multiple predecessors to the same successor and add the same predecessor for multiple successor tasks.
+
+It is safe to add successors to the task that currently transfers it's successors to another task and
+to the task to which the successors are transferred.
+
+### Member functions of ``task_group::current_task``
+
+``static void transfer_successors_to(task_handle& h)``
+
+Transfers all of the successors of the currently executing task to the task handled by ``h``.
+
+The behavior is undefined in any of the following cases:
+* ``h`` is an empty ``task_handle``,
+* The function is called outside of the ``task_group`` task body,
+* The currently executed task and the task handled by ``h`` belongs to different ``task_group``s.
+
+It is safe to transfer successors to the task while adding the successors to ``h`` or while other threads
+are adding successors to the currently executed task.
+
+It is guaranteed that in any case the successors added to the current task would appear as a successors of ``h`` even if
+they are added after calling the ``transfer_successors_to``.
+
+## Alternative approaches
+
+The alternative approaches are to keep only the ``task_handle`` as the only way to track the task, set the
+dependencies and submit the task for execution.
+
+### ``task_handle`` as a unique owner
+
+The first option is to keep the ``task_handle`` to be a unique owner of the task in various states and to allow
+to use non-empty handle for setting or transferring the dependencies.
+
+Since the current API allows submitting the ``task_handle`` for execution only as rvalue, having any usage of ``task_handle`` object after submitting for execution
+(e.g. using ``task_group::run(std::move(task_handle))``) looks misleading even if some guarantees are provided for the referred handle object.
+
+To handle this, it would be needed to extend all functions that take the task handled by the ``task_handle``
+with the new overload taking an non-const lvalue reference and provide the following guarantees:
+* Overloads accepting rvalue reference to ``task_handle`` take a non-empty handle and leave the handle in an empty state in the end (current behavior is preserved).
+* New overloads accepting lvalue references to ``task_handle`` also take a non-empty handle object but does not leave it in an empty state after submission. Hence, the ``task_handle`` can be
+ used after execution of the method to represent a task in submitted, executing or completed state and to set the dependencies on such tasks.
+ Using such a task handle once again as an argument to the submission function results in undefined behavior.
+
+Extension for all of the submission functions would be required:
+* ``task_group::run`` and ``task_group::run_and_wait``
+* ``task_arena::enqueue`` and ``this_task_arena::enqueue``
+
+Also, the submission functions would work only with the handles of the tasks in created states.
+Submitting the ``task_handle`` handling tasks in any other state results in undefined behavior.
+
+When the ``task_group`` preview extensions are enabled, returning a non-empty ``task_handle`` handling a task
+in the state other than created results in undefined behavior.
+
+### ``task_handle`` as a shared owner
+
+Another approach is to have ``task_handle`` to be a shared owner on the task allowing multiple instances of
+``task_handle`` to co-own the task. But since the task can only be submitted for execution once, using a
+``task_handle`` as an argument to one of the submission functions would invalidate all copies or set them
+in the "weak" state that allows only to set dependencies between tasks.
+
+## Exit criteria & open questions:
+* Are concrete names of APIs good enough and reflects the purpose of the methods?
+* The performance targets for this feature should be defined
+* API improvements and enhancements should be considered (may be a criteria for moving the feature to `supported`):
+ * Should comparison functions between ``task_tracker`` and ``task_handle`` be defined?
+ * Should ``task_tracker`` tracking the task in created or submitted state (not always) be allowed as a
+ successor in ``make_edge``? See [separate section](#using-a-task_tracker-as-a-successor) for more details.
+ * Should ``empty_task`` helper API be provided to optimize creating and spawning the empty sync-only tasks. See
+ [separate section](#empty_task-api) for more details.