8000 Add default Ray node label info to Ray Pod environment by ryanaoleary · Pull Request #3699 · ray-project/kuberay · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Add default Ray node label info to Ray Pod environment #3699

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

ryanaoleary
Copy link
Contributor

Add default Ray node label info to Ray Pod env 8000 ironment

Why are these changes needed?

This PR adds market-type information for different cloud providers to the Ray pod environment based on the provided nodeSelector value. This PR also adds environment variables to pass region and zone information using downward API (kubernetes/kubernetes#127092). These environment variables will be used in Ray core to set default Ray node labels.

I'll add a comment below with my manual test results with propagating topology.k8s.io/region and topology.k8s.io/zone on a GKE v1.33 alpha cluster.

Related issue number

ray-project/ray#51564

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@ryanaoleary ryanaoleary self-assigned this May 27, 2025
@ryanaoleary
Copy link
Contributor Author

cc: @MengjinYan

@@ -493,6 +493,9 @@ func BuildPod(ctx context.Context, podTemplateSpec corev1.PodTemplateSpec, rayNo
initLivenessAndReadinessProbe(&pod.Spec.Containers[utils.RayContainerIndex], rayNodeType, creatorCRDType)
}

// add downward API environment variables for Ray default node labels
addDefaultRayNodeLabels(&pod)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we guard this logic with a Ray version check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code doesn't rely on any API change from Ray, it just sets some env vars and the actual node labels get set in Ray core using those vars, but I can add a version guard here (I guess for whatever version ray-project/ray#53360 is included in) if we don't want it setting any unused vars for users on older versions of Ray.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From offline discussion with @MengjinYan we were leaning towards not including a version guard, since users are not required to specify the Ray version they're using in the CR spec

Copy link
@MengjinYan MengjinYan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTGM @kevin85421 Can take a look from Kuberay's perspective?

pod.Spec.Containers[utils.RayContainerIndex].Env,
// used to set the ray.io/market-type node label
corev1.EnvVar{
Name: "RAY_NODE_MARKET_TYPE",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan for Ray Core to check these env vars? Can you link the PR that includes this change?

Copy link
Contributor Author

Yeah that's correct - this is the related PR: ray-project/ray#53360

@ryanaoleary ryanaoleary requested a review from andrewsykim June 3, 2025 22:03
@ryanaoleary
Copy link
Contributor Author

LTGM @kevin85421 Can take a look from Kuberay's perspective?

@kevin85421 Bumping this to see if we can include it in 1.4

@kevin85421
Copy link
Member

@ryanaoleary I chatted with @MengjinYan, and my understanding is that this doesn’t need to be included in v1.4.0. Could you sync with @MengjinYan and let me know if I’m mistaken? Thanks!

@ryanaoleary
Copy link
Contributor Author

ding is that this doesn’t need to be included in v1.4.0. Could you sync with @MengjinYan and let me know if I’m mistaken? Thanks!

Synced offline with @MengjinYan and yeah there's no urgency to include this in v1.4.0, we can wait to include it in the next release. My thought was just that it'd be useful to have this functionality in the soonest stable release for testing, but I can just use the nightly image.

@kevin85421
Copy link
Member

My thought was just that it'd be useful to have this functionality in the soonest stable release for testing, but I can just use the nightly image.

This makes sense. I’ll review it for now. We’ll make a best-effort attempt to include this PR, but there’s no guarantee.


// getRayMarketTypeFromNodeSelector is a helper function to determine the ray.io/market-type label
// based on user-provided Kubernetes nodeSelector values.
func getRayMarketTypeFromNodeSelector(pod *corev1.Pod) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess some schedulers or webhooks may update the node selector after the Pod is created. We should take it into consideration.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that the value of the default labels might change after the ray node started?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might change after the ray node started?

No, I mean it may be changed after KubeRay constructs the Pod spec but before the Pod is created or scheduled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any webhooks that modify the value of cloud.google.com/gke-spot on a Pod? I'm having difficulty finding a reference. I think we should default to adding Ray node labels based on what the user specifies in their Pod spec.

ryanaoleary and others added 7 commits June 6, 2025 05:55
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Add default Ray node label info to Ray Pod environment

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary force-pushed the pass-worker-group-ray-label branch from daebac4 to 604c162 Compare June 6, 2025 05:56
@ryanaoleary ryanaoleary requested a review from andrewsykim June 6, 2025 06:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

5 participants
0