8000 Push images to GHCR for release-1.9 by saileshd1402 · Pull Request #2491 · kubeflow/trainer · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Push images to GHCR for release-1.9 #2491

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

saileshd1402
Copy link
Contributor

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

  • Docs included if any changes are user facing

@juliusvonkohout
Copy link
Member

/ok-to-test
you can use /retest to trigger the ci/cd again

@juliusvonkohout
Copy link
Member

Please also sign your commits according to the DCO

# Sometimes we forget to sign commits and have to rebase
# sign the latest X commits with HEAD~X
git rebase --signoff HEAD~1
# You have to force push afterwards
git push -f

And please make sure that you sign with the email that is in the signing key that you have uploaded to github.

@juliusvonkohout
Copy link
Member

Should this PR not be against the master branch @andreyvelich and then cherry picked to the release branch?

@andreyvelich
Copy link
Member
andreyvelich commented Mar 10, 2025

Should this PR not be against the master branch @andreyvelich and then cherry picked to the release branch?

You are right. We removed only the Training Operator code from the master branch.

@saileshd1402 Please update the images for Kubeflow Trainer in the master branch, but for Training Operator images you need to create separate PR to update release-1.9 branch.

@coveralls
Copy link
coveralls commented Mar 10, 2025

Pull Request Test Coverage Report for Build 13823612117

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 13635027331: 0.0%
Covered Lines: 85
Relevant Lines: 85

💛 - Coveralls

@saileshd1402
Copy link
Contributor Author
saileshd1402 commented Mar 10, 2025

Should this PR not be against the master branch @andreyvelich and then cherry picked to the release branch?

You are right. We removed only the Training Operator code from the master branch.

@saileshd1402 Please update the images for Kubeflow Trainer in the master branch, but for Training Operator images you need to create separate PR to update release-1.9 branch.

That's what this PR is for correct? I thought this PR would be for 1.9 only. Should I do the same in master as well?

@saileshd1402 saileshd1402 force-pushed the ghcr-images-release-v1.9 branch from 38bf74c to ec3b1aa Compare March 10, 2025 22:45
@saileshd1402 saileshd1402 marked this pull request as ready for review March 10, 2025 22:45
Copy link
Member
@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this @saileshd1402!

id: publish
uses: ./.github/workflows/template-publish-image
with:
image: ghcr.io/kubeflow/${{ inputs.component-name }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we discussed it here, can we keep /trainer/ prefix in the image name: #2455 (comment)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for v1.9, isn't it a good idea to keep training-operator as prefix instead of trainer?

Copy link
Member
@andreyvelich andreyvelich Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, great point!
Yes, let's use ghcr.io/kubeflow/training-operator/...

Copy link
Contributor Author
@saileshd1402 saileshd1402 Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From documentation, looks like ghcr only allows to push to the same repository name (i.e ghcr.io/kubeflow/trainer). But I think there is no way to test publish to ghcr with training-operator or trainer until merge. Should I keep it as trainer itself?

@@ -48,7 +48,18 @@ jobs:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Publish Component ${{ inputs.component-name }}
- name: GHCR Login
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update this in separate PR? I don't think I'm able to push the images from CI without it being merged first

10000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we publish images only when PR is merged ?
I thought on pull-request we only verify build.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you mean integration tests are failing due to invalid images in the manifests ?

Copy link
Contributor Author
@saileshd1402 saileshd1402 Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we publish only on merge, that's why I can't tell exact image tag that will be published, so what should I put in manifest?

Comment on lines -29 to -40
- component-name: training-operator-v2
dockerfile: cmd/training-operator.v2alpha1/Dockerfile
platforms: linux/amd64,linux/arm64,linux/ppc64le
tag-prefix: v2alpha1
- component-name: model-initializer-v2
dockerfile: cmd/initializer_v2/model/Dockerfile
platforms: linux/amd64,linux/arm64
tag-prefix: v2
- component-name: dataset-initializer-v2
dockerfile: cmd/initializer_v2/dataset/Dockerfile
platforms: linux/amd64,linux/arm64
tag-prefix: v2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something. Why do we remove these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the idea was to not include v2 images in the 1.9 branch

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, we are publishing v2 manager as part of KF release as we can see: https://github.com/kubeflow/manifests/tree/master/apps/training-operator/upstream/v2

@andreyvelich Do you want to remove v2 manifests from KF v1.9 release?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think since there are not stable, I prefer to remove them.
We don't have e2e tests or working example in the release branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member
@juliusvonkohout juliusvonkohout Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V2 is not yet installed by default in Kubeflow Platform 1.9.1 or 1.10.0, we are still on V1. So there is no harm in having them there. Especially once I add integration tests for v2 before enabling v2 by default, we need the v2 manifests there anyway.

Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
@saileshd1402 saileshd1402 force-pushed the ghcr-images-release-v1.9 branch from 873597d to 884bfcd Compare March 12, 2025 22:47
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
@saileshd1402 saileshd1402 changed the title Push images to GHCR Push images to GHCR for release-1.9 Mar 12, 2025
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
id: publish-ghcr
uses: ./.github/workflows/template-publish-image
with:
image: ghcr.io/kubeflow/trainer/${{ inputs.component-name }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use the training-operator prefix since we renamed the repo, right ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author
@saileshd1402 saileshd1402 Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use the training-operator prefix since we renamed the repo, right ?

yes, from what I understand. but we can't check unless PR is merged I think

What about this image, can we re-use it ?

I can make the changes, but i'm not sure how to test if it will work

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can try to merge it, and see if the post-merge job succeed.
We ignore the pushing action in the PR.

Let's try to existing repo and see if that works:

ghcr.io/kubeflow/training/training-operator:<SHA>
ghcr.io/kubeflow/training/storage-initializer:<SHA>

Let's also use the v1.9 as a tag prefix, since it would be more easier to understand for users what is the version of Training Operator it is.

@kubeflow/wg-training-leads @thesuperzapper @juliusvonkohout Any objections ?

Copy link
Member
@juliusvonkohout juliusvonkohout Mar 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ghcr.io/kubeflow/training/training-operator:v1.9.1
ghcr.io/kubeflow/training/storage-initializer:v1.9.1

or

ghcr.io/kubeflow/training/training-operator:SHA
ghcr.io/kubeflow/training/storage-initializer:SHA

makes sense, but i am not sure how you want to mix them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes speed is more important if i am supposed to release RC.3 tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay. I have updated the prefix as suggested PTAL

Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Publish Component ${{ inputs.component-name }} to Dockerhub
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consolidate these 2 actions to one similar to this PR: #2537 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for mentioning, I've updated

Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Copy link
Member
@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you
/approve

I will leave final approval on @andreyvelich

Copy link
Member
@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @saileshd1402!
/lgtm
/approve

Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [andreyvelich,tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit f654b1e into kubeflow:release-1.9 Mar 18, 2025
53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0