Releases: kubeflow/trainer

v2.0.0-rc.0

12 Jun 12:00
Pre-release

This is the Kubeflow Trainer v2.0.0-rc.0 pre-release.

Breaking Changes

New Features

LLM Trainer V2

Runtime Framework

  • feat(runtimes): Support MLX Distributed Runtime with OpenMPI (#2565 by @andreyvelich)
  • feat(runtimes): Support DeepSpeed Runtime with OpenMPI (#2559 by @andreyvelich)
  • feat(runtime): remove needless Launcher chainer. (#2558 by @IRONICBo)
  • Store the TrainingRuntime numNodes as runtime.Info.PodSet.Count (#2539 by @tenzen-y)
  • Add dependencies to RuntimeRegistrar (#2476 by @tenzen-y)
  • KEP-2170: Add CEL validations on TrainingRuntime/ClusterTrainingRuntime CRDs (#2313 by @akshaychitneni)
  • Implement trainer.kubeflow.org/resource-in-use finalizer mechanism to ClusterTrainingRuntime (#2625 by @tenzen-y)
  • Implement trainer.kubeflow.org/resource-in-use finalizer mechanism to TrainingRuntime (#2608 by @tenzen-y)
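The sketch below is a rough Python model of how a finalizer such as `trainer.kubeflow.org/resource-in-use` can block deletion of a runtime while TrainJobs still reference it. It models generic Kubernetes finalizer semantics (an object marked for deletion is only removed once its finalizer list is empty). It assumes the controller adds the finalizer while a runtime is referenced and removes it once nothing refers to the runtime. `Runtime`, `reconcile`, and `request_delete` are hypothetical names, and this is not the Trainer controller code.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Finalizer name taken from the release notes; everything else here is hypothetical.
RESOURCE_IN_USE = "trainer.kubeflow.org/resource-in-use"


@dataclass
class Runtime:
    name: str
    finalizers: Set[str] = field(default_factory=set)
    deletion_requested: bool = False  # stands in for metadata.deletionTimestamp
    deleted: bool = False


def reconcile(runtime: Runtime, referencing_jobs: List[str]) -> None:
    """Simplified 'in use' guard: keep the finalizer while any job references the runtime."""
    if referencing_jobs:
        # At least one TrainJob still uses this runtime, so keep deletion blocked.
        runtime.finalizers.add(RESOURCE_IN_USE)
    else:
        # No more users: drop the finalizer so a pending deletion can complete.
        runtime.finalizers.discard(RESOURCE_IN_USE)
    # Kubernetes removes an object marked for deletion only once no finalizers remain.
    if runtime.deletion_requested and not runtime.finalizers:
        runtime.deleted = True


def request_delete(runtime: Runtime) -> None:
    """Model a delete call: mark for deletion, and remove immediately only if no finalizers."""
    runtime.deletion_requested = True
    if not runtime.finalizers:
        runtime.deleted = True
```

For example, calling `request_delete` after `reconcile` has seen a referencing job leaves `deleted` set to False; the runtime only disappears after a later `reconcile` observes no referencing jobs.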

MPI Plugin

JobSet

New Examples

SDK Updates

Bug Fixes

Misc

v1.9.2

03 May 02:43
bde9c20

This is the Training Operator v1.9.2 release.

New Features

Bug Fixes

v1.9.1 release

31 Mar 23:09
17077e3

This is the Training Operator v1.9.1 release.

Breaking Changes

New Features

  • Add volume and volume mounts arguments to TrainingClient.create_job API (#2449 by @astefanutti)
  • Add configurable QPS and burst settings for kube API client (#2411 by @ronk21runai)
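QPS and burst settings for a Kubernetes API client usually parameterize a token-bucket rate limiter (client-go's default client-side limiter works this way). QPS is the sustained refill rate and burst is the bucket size, i.e. how many requests can be sent back-to-back before throttling starts. The Python sketch below only illustrates those semantics. The `TokenBucket` class is hypothetical, and the actual setting names introduced by #2411 are not shown here.

```python
class TokenBucket:
    """Token bucket that refills at `qps` tokens per second and holds at most `burst` tokens."""

    def __init__(self, qps: float, burst: int, now: float = 0.0) -> None:
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = now

    def try_acquire(self, now: float) -> bool:
        """Return True if a request may be sent at time `now` (in seconds)."""
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `qps=5` and `burst=10`, ten requests at time 0 succeed and the eleventh is throttled; one second later, five more tokens are available.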

Bug Fixes

v1.9.0 release

28 Jan 15:58
6f74c7f

This is the Training Operator v1.9.0 release.

This release introduces a new JAXJob, enabling seamless distributed training with JAX.

Additionally, it adds the managedBy API to streamline the orchestration of training Jobs in multi-cluster environments using MultiKueue.

Breaking Changes

New Features

Distributed JAX

New Examples

Control Plane Updates

SDK Updates

Kubeflow Trainer V2

Bug Fixes

Misc

v1.9.0-rc.0 release

10 Jan 23:27
a0ae3b1
Pre-release

This is the Training Operator v1.9.0-rc.0 pre-release.

Breaking Changes

New Features

Distributed JAX

New Examples

Control Plane Updates

SDK Updates

Kubeflow Training V2

Bug Fixes

Misc

v1.8.1 release

10 Sep 15:14

This is the Training Operator v1.8.1 release.

Bug Fixes

  • [Bug] Finish CleanupJob early if the job is suspended (#2243 by @mszadkow)
  • [SDK] Fix trainer error: Update the version of base image and add "num_labels" for downloading pretrained models (#2230 by @helenxie-bit)
  • Update huggingface_hub Version in the storage initializer to fix ImportError (#2180 by @helenxie-bit)

New Contributors

v1.8.0 release

23 Jul 18:10
f8687ca

This is the Training Operator v1.8.0 release.

This release introduces a new Python API for LLM fine-tuning that simplifies fine-tuning foundation models across distributed PyTorch nodes.

Install the Kubeflow Training SDK as follows to try it:

pip install -U "kubeflow-training[huggingface]"

LLM Fine-Tuning API

Breaking Changes

New Features

Control Plane Updates

SDK Improvements

Bug Fixes

Misc

v1.8.0-rc.0 release

28 Apr 18:37
643af3d
Pre-release

New features

Bug fixes

Misc

v1.7.0 release

01 Nov 07:49
5525468

Breaking Changes

  • Make scheduler-plugins the default gang scheduler. #1747 (Syulin7)
  • Upgrade the kubernetes dependencies to v1.27 #1834 (tenzen-y)
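Gang scheduling means that all pods of a distributed job are admitted together or not at all, so a partially placed job does not hold resources while it waits for its remaining workers. The Python sketch below illustrates only that all-or-nothing idea, using a simple first-fit-decreasing heuristic over free CPU counts. `gang_schedule` is a hypothetical helper, and this is not how the scheduler-plugins coscheduling plugin is implemented.

```python
from typing import Dict, List, Optional


def gang_schedule(pod_cpus: List[int], node_free_cpus: Dict[str, int]) -> Optional[Dict[int, str]]:
    """Return a pod-index -> node mapping for every pod, or None if any pod cannot fit."""
    free = dict(node_free_cpus)  # work on a copy so a failed attempt reserves nothing
    placement: Dict[int, str] = {}
    # Place larger pods first (first-fit decreasing heuristic).
    for idx in sorted(range(len(pod_cpus)), key=lambda i: pod_cpus[i], reverse=True):
        node = next((n for n, cap in free.items() if cap >= pod_cpus[idx]), None)
        if node is None:
            return None  # the gang cannot be completed, so no pod is placed
        free[node] -= pod_cpus[idx]
        placement[idx] = node
    return placement
```

Two 4-CPU workers fit on two nodes with 4 free CPUs each, but a third worker makes the whole gang unschedulable instead of starting two pods that would sit idle.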

New features

Bug fixes

  • Fix a bug where the XGBoostJob Running condition isn't updated when the job is resumed #1866 (tenzen-y)
  • Set a Running condition when the XGBoostJob is completed and doesn't have a Running condition #1789 (tenzen-y)
  • Avoid depending on the local environment when installing the code-generators #1810 (tenzen-y)

Misc

v1.7.0-rc.0 release

07 Aug 13:00
434cef7
Pre-release

Breaking Changes

  • Make scheduler-plugins the default gang scheduler. #1747 (Syulin7)
  • Upgrade the kubernetes dependencies to v1.27 #1834 (tenzen-y)

New features

Bug fixes

  • Fix a bug where the XGBoostJob Running condition isn't updated when the job is resumed #1866 (tenzen-y)
  • Set a Running condition when the XGBoostJob is completed and doesn't have a Running condition #1789 (tenzen-y)
  • Avoid depending on the local environment when installing the code-generators #1810 (tenzen-y)

Misc