8000 [FIX] Error when using different primary container name in driver and executor by machichima · Pull Request #6363 · flyteorg/flyte · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[FIX] Error when using different primary container name in driver and executor #6363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

machichima
Copy link
Member
@machichima machichima commented Mar 22, 2025

Tracking issue

Why are the changes needed?

There's a bug in Spark after this PR: #6262. For example, When setting primary container name in driver pod named "driver-primary", the error of invalid TaskSpecification, container [driver-primary] not defined is raised.

image

What changes were proposed in this pull request?

Use MergeBasePodSpecOntoTemplate that enables setting primaryContainerName and primaryInitContainerName.

How was this patch tested?

Unittest is added to check if setting primary container name that's not "primary" in driver and executor, spark can perform as expected.

Labels

Please add one or more of the following labels to categorize your PR:

  • added: For new features.
  • changed: For changes in existing functionality.
  • deprecated: For soon-to-be-removed features.
  • removed: For features being removed.
  • fixed: For any bug fixed.
  • security: In case of vulnerabilities

This is important to improve the readability of release notes.

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Summary by Bito

This PR fixes a critical bug related to different primary container names in Spark driver and executor pods causing 'container not defined' errors. It refactors MergeBasePodSpecOntoTemplate to improve conditional logic, adds an else branch for container name comparisons, and consolidates init container merging procedures.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 2

Signed-off-by: machichima <nary12321@gmail.com>
Signed-off-by: machichima <nary12321@gmail.com>
Signed-off-by: machichima <nary12321@gmail.com>
Signed-off-by: machichima <nary12321@gmail.com>
@flyte-bot
Copy link
Collaborator
flyte-bot commented Mar 22, 2025

Code Review Agent Run #6af42f

Actionable Suggestions - 0
Review Details
  • Files reviewed - 4 · Commit Range: 3bc7844..26f210a
    • flyteplugins/go/tasks/pluginmachinery/flytek8s/pod_helper.go
    • flyteplugins/go/tasks/pluginmachinery/flytek8s/pod_helper_test.go
    • flyteplugins/go/tasks/plugins/k8s/spark/spark.go
    • flyteplugins/go/tasks/plugins/k8s/spark/spark_test.go
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

Refer to the documentation for additional commands.

Configuration

This repository uses code_review_bito You can customize the agent settings here or contact your Bito workspace admin at haytham@union.ai.

Documentation & Help

AI Code Review powered by Bito Logo

Copy link
codecov bot commented Mar 22, 2025

Codecov Report

Attention: Patch coverage is 69.69697% with 10 lines in your changes missing coverage. Please review.

Project coverage is 58.49%. Comparing base (2e67b23) to head (b127658).
Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
...ns/go/tasks/pluginmachinery/flytek8s/pod_helper.go 60.00% 8 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6363      +/-   ##
==========================================
+ Coverage   58.48%   58.49%   +0.01%     
==========================================
  Files         940      940              
  Lines       71567    71571       +4     
==========================================
+ Hits        41857    41868      +11     
+ Misses      26526    26521       -5     
+ Partials     3184     3182       -2     
Flag Coverage Δ
unittests-datacatalog 59.03% <ø> (ø)
unittests-flyteadmin 56.28% <ø> (ø)
unittests-flytecopilot 30.99% <ø> (ø)
unittests-flytectl 64.70% <ø> (ø)
unittests-flyteidl 76.12% <ø> (ø)
unittests-flyteplugins 60.94% <69.69%> (+0.05%) ⬆️
unittests-flytepropeller 54.81% <ø> (ø)
unittests-flytestdlib 64.04% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@flyte-bot
Copy link
Collaborator
flyte-bot commented Mar 22, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
Bug Fix - Primary Container Name Bug Fix

pod_helper.go - Renamed the constant for the primary init container and updated merge logic to correctly match container names during pod spec merging.

spark.go - Replaced the outdated MergeOverlay function with MergeBasePodSpecOntoTemplate to properly handle different primary container names in driver and executor pods.

Testing - Enhanced Testing for Container Names

pod_helper_test.go - Adjusted tests to use the updated constant name ensuring the test expectations align with the code changes.

spark_test.go - Added comprehensive unit tests to verify that custom primary container names for driver and executor are handled correctly, with detailed checks on environmental variables and resource configurations.

if templateContainer.Name != container.Name {
continue
}
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems okay to me. Can we have the same behavior for init containers to keep it consistent?

@@ -35,7 +35,7 @@ const unsignedSIGKILL = 247
const defaultContainerTemplateName = "default"
const defaultInitContainerTemplateName = "default-init"
const primaryContainerTemplateName = "primary"
const primaryInitContainerTemplateName = "primary-init"
const PrimaryInitContainerTemplateName = "primary-init"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are reserved container names, I don't think they should be exported and used in the spark plugin logic

@Sovietaced
Copy link
Contributor

There's a bug in Spark after this PR: #6262. For example, When setting primary container name in driver pod named "driver-primary", the error of invalid TaskSpecification, container [driver-primary] not defined is raised.

I'm not sure I understand the issue. The way that the overlay is intended to work is that it matches on container names and does not honor any magic container names.

Are you unable to specify the custom driver and selector pod specs with container names that match the primary driver or executor containers? If those container names are not deterministic and you wanted to support some magic values like primary-driver or primary-executor you could just handle that in the plugin itself and dynamically rename the custom pod specs and use the overlay method. Or the plugin could be modified so that the containers for the driver and executor are deterministic like we do in the Ray plugin (ray-head, ray-worker).

What are you actually trying to do with the custom pod specs?

@machichima
Copy link
Member Author
machichima commented Mar 23, 2025

In Spark plugin, user can set driver and executor pod with PodTemplate, and inside PodTemplate, there's an option to set primary_container_name. This PR is actually fixing that if user set primary_container_name different than "primary", than we should then use the name they set as the primary container name.

Using magic value like primary-driver or primary-executor works too, but as we used PodTemplate, I think we can keep this flexibility of setting the primary_container_name user want.

Also, as only setting primary_container_name is supported, I need to pass the reserved init container name (PrimaryInitContainerTemplateName) to function MergeBasePodSpecOntoTemplate

Example,

@task(
    task_config=Spark(
        driver_pod=PodTemplate(...),
    
8000
    executor_pod=PodTemplate(...),
        ...
    ),
    ...
)

Signed-off-by: machichima <nary12321@gmail.com>
@flyte-bot
Copy link
Collaborator
flyte-bot commented Mar 23, 2025

Code Review Agent Run #1adb75

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 26f210a..b127658
    • flyteplugins/go/tasks/pluginmachinery/flytek8s/pod_helper.go
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

Refer to the documentation for additional commands.

Configuration

This repository uses code_review_bito You can customize the agent settings here or contact your Bito workspace admin at haytham@union.ai.

Documentation & Help

AI Code Review powered by Bito Logo

@Sovietaced
Copy link
Contributor

This PR is actually fixing that if user set primary_container_name different than "primary", than we should then use the name they set as the primary container name.

Why do you need to set the primary container name in the pod template at all? If the goal is to customize the driver or executor pods all you need to do is pass the pod spec with a container name that matches the target container name.

@machichima
Copy link
Member Author

Why do you need to set the primary container name in the pod template at all? If the goal is to customize the driver or executor pods all you need to do is pass the pod spec with a container name that matches the target container name.

Yes, we don’t necessarily need to set the primary_container_name here. However, since Spark already uses a PodTemplate for specifying the driver_pod and executor_pod, which provides a field for users to specify the primary_container_name, I think we could allow users to set it. If they don’t specify it, it would simply default to 'primary', and user can pass pod spec with a container name 'primary'.

It’s more of an additional feature, not strictly necessary, but an option for flexibility.

@Sovietaced
Copy link
Contributor

Yes, we don’t necessarily need to set the primary_container_name here. However, since Spark already uses a PodTemplate for specifying the driver_pod and executor_pod, which provides a field for users to specify the primary_container_name, I think we could allow users to set it. If they don’t specify it, it would simply default to 'primary', and user can pass pod spec with a container name 'primary'.

That makes me think, what is the value in being able to specify the primary_container_name at all. I'm not sure it's relevant in this use case.

But still I'm curious what is your end goal with being able to configure it? What do you expect it to do that simply configuring a container in the pod spec to match the existing driver or executor container name won't do?

@Sovietaced
Copy link
Contributor

What do you think @pingsutw

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0