A GPipe implementation in PyTorch
Updated Jul 25, 2024 - Python
Very-Low Overhead Checkpointing System
Keras wrapper that autosaves what ModelCheckpoint cannot.
Extending DOLFINx with checkpointing functionality
This Flink project consumes streams from one Azure event hub and produces to a different event hub; it also includes the config files for deploying it on Kubernetes.
Code and tutorial on integrating wandb sweeps with Slurm pre-emption
This is a standalone Flink producer used for testing the contents of the flink-consume-produce-ek repo.
A shared library to help test your code with failure-injection
A digital album face recognition manager that isolates images of a specified person from a digital album.
Koo and Toueg’s checkpointing and recovery protocol
DMTCP scripts to get Python scripts working with SLURM.
A lightweight checkpointing program written in C.