Insights: kubeflow/trainer
Overview
There hasn’t been any commit activity on kubeflow/trainer in the last week.
1 Pull request merged by 1 person
-
chore(ci): Add more workaround no space left on device
#2677 merged
Jun 20, 2025
1 Pull request opened by 1 person
-
Fix - Add certificate and issuer resources to manifests and helm chart
#2678 opened
Jun 21, 2025
1 Issue closed by 1 person
-
Distributed training with multiple pods, with multi-gpu in each pod
#2456 closed
Jun 18, 2025
2 Issues opened by 1 person
-
Add schedulingGates to PodSpecOverrides
#2680 opened
Jun 23, 2025
-
Mutable PodSpecOverrides for suspended TrainJob
#2679 opened
Jun 23, 2025
14 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
KEP-2437: Support Volcano Scheduler in Kubeflow Trainer V2
#2672 commented on
Jun 24, 2025 • 24 new comments
-
[proposal] GSoC Project 8: JAX Runtime for V2
#2643 commented on
Jun 24, 2025 • 11 new comments
-
Add provision to provide local-queue for the training job in SDKv1 an…
#2636 commented on
Jun 23, 2025 • 9 new comments
-
KEP-2628: Support KAI Scheduler in Kubeflow Trainer
#2663 commented on
Jun 21, 2025 • 1 new comment
-
[GSoC] Project 7: GPU Testing for LLM Blueprints
#2674 commented on
Jun 18, 2025 • 0 new comments
-
Flaky Test: TestDatasetIntegration.test_dataset_download[HuggingFace - Public dataset-huggingface-test_case0]
#2460 commented on
Jun 18, 2025 • 0 new comments
-
Get and Use TrainingRuntime ApplyConfiguration throughout KF PipelineFramework
#2515 commented on
Jun 19, 2025 • 0 new comments
-
Export Models to Kubeflow Model Registry
#2438 commented on
Jun 20, 2025 • 0 new comments
-
KEP-2401: Determine the tag for torchtune trainer & Add support for multiple accelerators
#2518 commented on
Jun 22, 2025 • 0 new comments
-
KEP-2170: Support hundreds and thousands worker nodes for a single training Job
#2318 commented on
Jun 22, 2025 • 0 new comments
-
KEP-2170: Add Kubeflow Trainer Pipeline Framework Concept page to Documentation
#2458 commented on
Jun 23, 2025 • 0 new comments
-
Add migration guide from Training Operator to Kubeflow Trainer V2
#2412 commented on
Jun 24, 2025 • 0 new comments
-
Fix Prometheus metrics counter
#2553 commented on
Jun 22, 2025 • 0 new comments
-
feat(scheduler):add support for kai scheduler
#2649 commented on
Jun 19, 2025 • 0 new comments