fix: support repeat policy for nested DAGs #1022
Merged
Summary
This PR fixes issue #1013 where RepeatPolicy did not work properly with nested DAGs (sub-workflows). The bug caused repeated runs to fail after the first execution, with sub-workflows remaining in "not started" status and creating zombie processes.
Changes
- Added a `repeated` flag to `GenerateChildDAGRunID()` that includes randomness when generating IDs for repeated nested DAG runs, preventing ID collisions
- Added a `Repeated` field to `NodeState` to identify when a step is being repeated
- Added a `limit` field in RepeatPolicy to cap the maximum number of executions
Why
The root cause was that repeated nested DAG runs were generating the same run ID on subsequent executions, causing conflicts in the execution tracking system. By adding randomness to the run ID generation for repeated steps, each execution gets a unique identifier, allowing proper tracking and preventing zombie processes.
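The collision and its fix can be illustrated with a minimal sketch. This is not the code from the PR: `childRunID`, its parameters, and the SHA-256 hashing scheme are hypothetical stand-ins for `GenerateChildDAGRunID()`, chosen only to show why a deterministic ID collides on repeat and how a random nonce avoids that.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// childRunID is a hypothetical stand-in for GenerateChildDAGRunID().
// Without the repeated flag, the ID depends only on the parent run ID and
// the step name, so every repeat of the same step yields the same ID.
// With the flag set, a random nonce is mixed in so each repeat gets its own ID.
func childRunID(parentRunID, stepName string, repeated bool) (string, error) {
	h := sha256.New()
	h.Write([]byte(parentRunID))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") hash differently
	h.Write([]byte(stepName))
	if repeated {
		nonce := make([]byte, 8)
		if _, err := rand.Read(nonce); err != nil {
			return "", err
		}
		h.Write(nonce)
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	// First execution: deterministic, so two calls agree.
	a, _ := childRunID("run-1", "sub-workflow", false)
	b, _ := childRunID("run-1", "sub-workflow", false)
	fmt.Println("non-repeated IDs equal:", a == b)

	// Repeated execution: the nonce makes a collision extremely unlikely.
	c, _ := childRunID("run-1", "sub-workflow", true)
	d, _ := childRunID("run-1", "sub-workflow", true)
	fmt.Println("repeated IDs equal:", c == d)
}
```

Keeping the first execution deterministic preserves whatever lookup behavior depends on a stable ID, while only repeats pay the cost of a non-reproducible one. Whether the actual change keeps that split is an assumption of this sketch.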
The repeat limit gives users better control over long-running repeat policies by capping the number of executions, which guards against infinite loops and resource exhaustion.
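A rough sketch of how such a cap might be enforced follows. The `repeatPolicy` type, its `Repeat` and `Limit` fields, and the helper functions are hypothetical and are not taken from the PR. The sketch assumes the limit counts total executions (including the first) and that a value of zero means no cap.

```go
package main

import "fmt"

// repeatPolicy is a hypothetical, simplified model of a repeat policy.
// Limit is assumed to cap the total number of executions; 0 means unlimited.
type repeatPolicy struct {
	Repeat bool
	Limit  int
}

// shouldRepeat reports whether the step should run again after doneCount
// executions have already completed.
func (p repeatPolicy) shouldRepeat(doneCount int) bool {
	if !p.Repeat {
		return false
	}
	if p.Limit > 0 && doneCount >= p.Limit {
		return false
	}
	return true
}

// runWithPolicy executes step at least once and keeps repeating while the
// policy allows it. It returns the number of executions performed.
// Note: with Repeat set and Limit == 0 this loops until the process stops.
func runWithPolicy(p repeatPolicy, step func()) int {
	count := 0
	for {
		step()
		count++
		if !p.shouldRepeat(count) {
			return count
		}
	}
}

func main() {
	capped := runWithPolicy(repeatPolicy{Repeat: true, Limit: 3}, func() {})
	fmt.Println("executions with limit 3:", capped)

	once := runWithPolicy(repeatPolicy{Repeat: false}, func() {})
	fmt.Println("executions without repeat:", once)
}
```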
Reported-by: jeremydelattre59
Github-Issue: #1013