1:1 job to topology assignment · Issue #27 · kubernetes-sigs/jobset · GitHub

1:1 job to topology assignment #27

Closed
ahg-g opened this issue Apr 14, 2023 · 1 comment · Fixed by #36

ahg-g commented Apr 14, 2023

Consider the case where each job in a replicatedJob needs to land on a separate topology domain, for example one job per network domain such as a rack.

This can be achieved using a combination of affinity and anti-affinity constraints on the pod template:

      affinity:
        podAffinity:  # ensures the pods of this job land on the same rack
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: job-name
                operator: In
                values:
                - my-job
            topologyKey: rack
        podAntiAffinity: # ensures only this job lands on the rack
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: job-name
                operator: NotIn
                values:
                - my-job
              - key: job-name
                operator: Exists
            namespaceSelector: {}
            topologyKey: rack

The problem is that the affinity terms above require setting the value of the job-name label explicitly, which in a replicatedJob is different for each job replica.

The MatchLabelKeys enhancement in upstream k8s should address this problem, but it is not yet available.

Until it is available, JobSet can inject these affinity terms itself, triggered by an API field that we add to the ReplicatedJob type and that looks like:

exclusive:
  topologyKey: rack
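
To make the injection idea concrete, here is a minimal sketch of how the affinity terms shown earlier could be generated per child job, with the job-name value filled in for each replica. It is illustrative only and is not JobSet's implementation: the function names `exclusive_affinity` and `child_job_names` are hypothetical, and the `<jobset>-<replicatedJob>-<index>` naming scheme is an assumption.

```python
REQUIRED = "requiredDuringSchedulingIgnoredDuringExecution"


def exclusive_affinity(job_name: str, topology_key: str) -> dict:
    """Build the affinity stanza for one child job (hypothetical helper).

    The structure mirrors the podAffinity/podAntiAffinity terms from the
    issue, with the job-name value substituted for this specific job.
    """
    return {
        # Pods of this job must share a topology domain with each other.
        "podAffinity": {
            REQUIRED: [{
                "labelSelector": {"matchExpressions": [
                    {"key": "job-name", "operator": "In", "values": [job_name]},
                ]},
                "topologyKey": topology_key,
            }],
        },
        # Pods of any other job must not share that domain.
        "podAntiAffinity": {
            REQUIRED: [{
                "labelSelector": {"matchExpressions": [
                    {"key": "job-name", "operator": "NotIn", "values": [job_name]},
                    {"key": "job-name", "operator": "Exists"},
                ]},
                "namespaceSelector": {},
                "topologyKey": topology_key,
            }],
        },
    }


def child_job_names(jobset_name: str, replicated_job_name: str, replicas: int) -> list:
    """Assumed naming scheme for child jobs: <jobset>-<replicatedJob>-<index>."""
    return [f"{jobset_name}-{replicated_job_name}-{i}" for i in range(replicas)]


# One affinity stanza per replica, each keyed to its own job name.
affinities = {
    name: exclusive_affinity(name, "rack")
    for name in child_job_names("demo", "workers", 2)
}
```

Each replica gets its own stanza, so the pod template no longer needs a hard-coded job-name value; the controller would substitute it when creating each child job.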