A powerful training toolkit for image generation models using the Flow Matching technique.
- Streaming dataloader that works directly from S3 or your local drive
- Flexible configuration via JSON
- Multi-GPU training support with automatic device detection
- Configurable inference during training
- Wandb and Hugging Face integration
- Parameter-efficient training with layer rotation and offloading
```bash
# Clone the repository
git clone https://github.com/lodestone-rock/flow.git
cd flow

# Install dependencies
pip install -r requirements.txt
```
- Prepare your training data in JSONL format (see the Data Format section).
- Configure your training settings in `training_config.json`.
- Run the training script:

```bash
python train_mp.py  # Automatically detects available GPUs
```
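Before launching, it can help to sanity-check the config against the available hardware. The snippet below is a minimal pre-flight sketch and is not part of the toolkit; it assumes the config file is named `training_config.json` and checks that the dataloader `batch_size` is divisible by the number of GPUs, as required by the dataloader settings described later.

```python
# Minimal pre-flight sketch (an assumption about how you might sanity-check
# a run before launching train_mp.py; not part of the toolkit itself).
import json

import torch

with open("training_config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

num_gpus = max(torch.cuda.device_count(), 1)
batch_size = config["dataloader"]["batch_size"]

# batch_size must be divisible by the number of GPUs (see the dataloader settings).
assert batch_size % num_gpus == 0, (
    f"batch_size={batch_size} is not divisible by num_gpus={num_gpus}"
)
print(f"{num_gpus} GPU(s) -> {batch_size // num_gpus} images per GPU per step")
```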
The trainer uses JSONL files for dataset metadata. Each line represents a single training example:
{"filename": "https://your/s3/url", "caption_or_tags": "your image captions", "width": 816, "height": 1456, "is_tag_based": false, "is_url_based": true, "loss_weight": 0.5}
{"filename": "path/to/image/folder", "caption_or_tags": "tags, for, your, image", "width": 1024, "height": 1456, "is_tag_based": true, "is_url_based": false, "loss_weight": 0.3}
- `filename`: Either an S3 URL or a local path to the image
- `caption_or_tags`: Text description or comma-separated tags for the image
- `width` / `height`: Image dimensions
- `is_tag_based`: Set to `true` if using tags instead of captions
- `is_url_based`: Set to `true` if the image is hosted on S3/URL
- `loss_weight`: Optional weighting factor for this sample (0.0-1.0)
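As a rough illustration, the snippet below generates such a metadata file from a local folder of images with matching `.txt` caption files. The folder layout, file extensions, and helper name are assumptions for the example, not requirements of the toolkit.

```python
# Minimal sketch: build a JSONL metadata file from a local folder of images
# with matching .txt caption files (this layout is an assumption, not
# something the toolkit requires).
import json
from pathlib import Path

from PIL import Image

def build_metadata(image_dir: str, out_path: str, is_tag_based: bool = False) -> None:
    with open(out_path, "w", encoding="utf-8") as out:
        # Assumes .png images paired with same-named .txt caption files.
        for image_path in sorted(Path(image_dir).glob("*.png")):
            caption = image_path.with_suffix(".txt").read_text(encoding="utf-8").strip()
            width, height = Image.open(image_path).size
            record = {
                "filename": str(image_path),
                "caption_or_tags": caption,
                "width": width,
                "height": height,
                "is_tag_based": is_tag_based,
                "is_url_based": False,  # local files, not S3 URLs
                "loss_weight": 1.0,     # optional per-sample weight
            }
            out.write(json.dumps(record) + "\n")

build_metadata("/images", "test_training_data.jsonl")
```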
The trainer is configured via a JSON file with the following sections:
"training": {
"master_seed": 0,
"cache_minibatch": 2,
"train_minibatch": 1,
"offload_param_count": 5000000000,
"lr": 1e-05,
"weight_decay": 0.0001,
"warmup_steps": 1,
"change_layer_every": 3,
"trained_single_blocks": 2,
"trained_double_blocks": 2,
"save_every": 6,
"save_folder": "checkpoints",
"wandb_key": null,
"wandb_project": null,
"wandb_run": "chroma",
"wandb_entity": null,
"hf_repo_id": null,
"hf_token": null
}
| Parameter | Description |
|---|---|
| `master_seed` | Random seed for reproducibility |
| `cache_minibatch` | T5 and VAE latent minibatch size (for text and image embedding generation) |
| `train_minibatch` | Model training minibatch size |
| `offload_param_count` | Frozen parameter count to offload to CPU during optimizer updates |
| `lr` | Learning rate |
| `weight_decay` | Weight decay for regularization |
| `warmup_steps` | Number of warmup steps for the learning rate scheduler |
| `change_layer_every` | Change the randomly trained layers every X parameter updates |
| `trained_single_blocks` | Number of trainable single transformer blocks |
| `trained_double_blocks` | Number of trainable double transformer blocks |
| `save_every` | Save a model checkpoint every X steps |
| `save_folder` | Directory to save model checkpoints |
| `wandb_key` | Weights & Biases API key (optional) |
| `wandb_project` | Weights & Biases project name (optional) |
| `wandb_run` | Weights & Biases run name (optional) |
| `wandb_entity` | Weights & Biases entity name (optional) |
| `hf_repo_id` | Hugging Face repository ID for pushing models (optional) |
| `hf_token` | Hugging Face API token (optional) |
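For intuition, the snippet below shows one common way `lr` and `warmup_steps` interact in a linear warmup schedule; the exact scheduler used by the trainer is an assumption here, not taken from its source.

```python
# Illustrative linear warmup driven by "lr" and "warmup_steps" above.
# The trainer's actual schedule is an assumption here and may differ.
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    # Ramp linearly from 0 to base_lr over warmup_steps updates, then hold constant.
    return base_lr * min(1.0, (step + 1) / max(1, warmup_steps))

# With lr=1e-4 and warmup_steps=4, the first few updates use a reduced rate:
print([round(warmup_lr(s, base_lr=1e-4, warmup_steps=4), 6) for s in range(6)])
# [2.5e-05, 5e-05, 7.5e-05, 0.0001, 0.0001, 0.0001]
```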
"inference": {
"inference_every": 2,
"inference_folder": "inference_folder",
"steps": 20,
"guidance": 3,
"cfg": 1,
"prompts": [
"a cute cat sat on a mat while receiving a head pat from his owner called Matt",
"baked potato, on the space floating orbiting around the earth"
],
"first_n_steps_wo_cfg": -1,
"image_dim": [512, 512],
"t5_max_length": 512
}
| Parameter | Description |
|---|---|
| `inference_every` | Run inference every X training steps |
| `inference_folder` | Directory to save generated images |
| `steps` | Number of sampling steps for generation |
| `guidance` | Guidance scale for classifier-free guidance |
| `cfg` | Classifier-free guidance scale |
| `prompts` | List of text prompts for test generation |
| `first_n_steps_wo_cfg` | Number of initial steps without classifier-free guidance (-1 to disable) |
| `image_dim` | Output image dimensions [width, height] |
| `t5_max_length` | Maximum token length for the text encoder |
You can add multiple inference configurations using the `extra_inference_config` array.
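To make `cfg` and `first_n_steps_wo_cfg` concrete, here is a sketch of how they are typically interpreted in a classifier-free-guidance sampling loop; this mirrors standard CFG and is an assumption about the trainer's sampler, not its actual code.

```python
# Sketch of standard classifier-free guidance, interpreting "cfg" and
# "first_n_steps_wo_cfg" as described above. The trainer's own sampler is an
# assumption here and may differ in detail.
import torch

def guided_prediction(cond_pred: torch.Tensor,
                      uncond_pred: torch.Tensor,
                      step: int,
                      cfg: float,
                      first_n_steps_wo_cfg: int) -> torch.Tensor:
    # cfg == 1 or an early "no-CFG" step: use the conditional prediction only.
    if cfg == 1 or (first_n_steps_wo_cfg >= 0 and step < first_n_steps_wo_cfg):
        return cond_pred
    # Push the prediction away from the unconditional one by the cfg scale.
    return uncond_pred + cfg * (cond_pred - uncond_pred)
```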
"dataloader": {
"batch_size": 8,
"jsonl_metadata_path": "test_training_data.jsonl",
"image_folder_path": "/images",
"base_resolution": [256],
"shuffle_tags": true,
"tag_drop_percentage": 0.0,
"uncond_percentage": 0.0,
"resolution_step": 64,
"num_workers": 2,
"prefetch_factor": 2,
"ratio_cutoff": 2.0,
"thread_per_worker": 2
}
| Parameter | Description |
|---|---|
| `batch_size` | Total batch size per training step (must be divisible by the number of GPUs) |
| `jsonl_metadata_path` | Path to the JSONL metadata file |
| `image_folder_path` | Base path for local images (if used) |
| `base_resolution` | Base training resolution(s) for images (multiple resolutions supported) |
| `shuffle_tags` | Whether to randomly shuffle tags (for tag-based training) |
| `tag_drop_percentage` | Percentage of tags to randomly drop during training |
| `uncond_percentage` | Percentage of samples to use as unconditional samples |
| `resolution_step` | Step size for resolution buckets |
| `num_workers` | Number of dataloader workers |
| `prefetch_factor` | Prefetch factor for the dataloader |
| `ratio_cutoff` | Maximum aspect ratio cutoff |
| `thread_per_worker` | Number of threads per worker |
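The snippet below sketches the kind of aspect-ratio bucketing implied by `base_resolution`, `resolution_step`, and `ratio_cutoff`: enumerate (width, height) pairs on a `resolution_step` grid whose area stays near `base_resolution**2` and whose aspect ratio stays within the cutoff. The exact bucketing scheme used by the dataloader is an assumption here.

```python
# Illustrative aspect-ratio bucketing based on "base_resolution",
# "resolution_step", and "ratio_cutoff". The dataloader's real bucketing
# logic is an assumption here and may differ.
def resolution_buckets(base_resolution: int = 256,
                       resolution_step: int = 64,
                       ratio_cutoff: float = 2.0) -> list[tuple[int, int]]:
    buckets = []
    target_area = base_resolution ** 2
    for w in range(resolution_step, base_resolution * 4, resolution_step):
        # Pick the height on the grid that keeps the area close to the target.
        h = round(target_area / w / resolution_step) * resolution_step
        if h <= 0:
            continue
        ratio = max(w / h, h / w)
        if ratio <= ratio_cutoff and (w, h) not in buckets:
            buckets.append((w, h))
    return buckets

print(resolution_buckets())  # e.g. [(192, 320), (256, 256), (320, 192), ...]
```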
"model": {
"chroma_path": "models/flux/FLUX.1-schnell/chroma-8.9b.safetensors",
"vae_path": "models/flux/ae.safetensors",
"t5_path": "models/flux/text_encoder_2",
"t5_config_path": "models/flux/text_encoder_2/config.json",
"t5_tokenizer_path": "models/flux/tokenizer_2",
"t5_to_8bit": true,
"t5_max_length": 512
}
| Parameter | Description |
|---|---|
| `chroma_path` | Path to the base model checkpoint to resume from |
| `vae_path` | Path to the VAE model |
| `t5_path` | Path to the T5 text encoder model |
| `t5_config_path` | Path to the T5 config file |
| `t5_tokenizer_path` | Path to the T5 tokenizer |
| `t5_to_8bit` | Whether to load T5 in 8-bit precision to save memory |
| `t5_max_length` | Maximum token length for the text encoder |
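For orientation, the snippet below shows how these paths could be consumed with standard `safetensors` and `transformers` calls; how the trainer itself wires the weights together is an assumption, not taken from its source.

```python
# Sketch of loading the files referenced above with standard library calls.
# How the trainer actually builds the model from these weights is an assumption.
from safetensors.torch import load_file
from transformers import AutoTokenizer, BitsAndBytesConfig, T5EncoderModel

chroma_state = load_file("models/flux/FLUX.1-schnell/chroma-8.9b.safetensors")
vae_state = load_file("models/flux/ae.safetensors")

tokenizer = AutoTokenizer.from_pretrained("models/flux/tokenizer_2")
text_encoder = T5EncoderModel.from_pretrained(
    "models/flux/text_encoder_2",
    # Mirrors "t5_to_8bit"; requires the bitsandbytes package.
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
```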
The trainer supports efficient training by focusing on specific transformer blocks:
- `trained_single_blocks`: Number of single transformer blocks to train
- `trained_double_blocks`: Number of double transformer blocks to train
- `change_layer_every`: How often (in parameter updates) to rotate which layers are being trained
This approach allows training large models on limited hardware by only updating a subset of parameters at once.
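A rough sketch of that rotation idea follows; the block attribute names (`single_blocks`, `double_blocks`) and the freezing logic are assumptions for illustration, not the trainer's actual implementation.

```python
# Illustrative layer rotation: freeze everything, then every
# "change_layer_every" updates pick a new random subset of single/double
# blocks to train. Block attribute names are assumptions.
import random

import torch.nn as nn

def rotate_trainable_blocks(model: nn.Module,
                            trained_single_blocks: int = 2,
                            trained_double_blocks: int = 2) -> list[nn.Module]:
    # Freeze every parameter first.
    for p in model.parameters():
        p.requires_grad_(False)

    # Randomly pick a subset of blocks and unfreeze only those.
    chosen = (random.sample(list(model.single_blocks), trained_single_blocks)
              + random.sample(list(model.double_blocks), trained_double_blocks))
    for block in chosen:
        for p in block.parameters():
            p.requires_grad_(True)
    return chosen

# Pseudo-usage inside a training loop:
# if step % change_layer_every == 0:
#     rotate_trainable_blocks(model)
```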
You can set up multiple inference configurations to test different settings during training:
"extra_inference_config":[
{
"inference_every": 2,
"inference_folder": "inference_folder_cfg4",
"steps": 20,
"guidance": 3,
"cfg": 4,
"prompts": [
"a cute cat sat on a mat while receiving a head pat from his owner called Matt",
"baked potato, on the space floating orbiting around the earth"
],
"first_n_steps_wo_cfg": 0,
"image_dim": [512, 512],
"t5_max_length": 512
}
]
Below is a complete example configuration combining all of the sections above:

```json
{
  "training": {
    "master_seed": 0,
    "cache_minibatch": 2,
    "train_minibatch": 1,
    "offload_param_count": 5000000000,
    "lr": 1e-05,
    "weight_decay": 0.0001,
    "warmup_steps": 1,
    "change_layer_every": 3,
    "trained_single_blocks": 2,
    "trained_double_blocks": 2,
    "save_every": 6,
    "save_folder": "testing",
    "wandb_key": null,
    "wandb_project": null,
    "wandb_run": "chroma",
    "wandb_entity": null,
    "hf_repo_id": null,
    "hf_token": null
  },
  "inference": {
    "inference_every": 2,
    "inference_folder": "inference_folder",
    "steps": 20,
    "guidance": 3,
    "cfg": 1,
    "prompts": [
      "a cute cat sat on a mat while receiving a head pat from his owner called Matt",
      "baked potato, on the space floating orbiting around the earth"
    ],
    "first_n_steps_wo_cfg": -1,
    "image_dim": [512, 512],
    "t5_max_length": 512
  },
  "extra_inference_config": [
    {
      "inference_every": 2,
      "inference_folder": "inference_folder",
      "steps": 20,
      "guidance": 3,
      "cfg": 4,
      "prompts": [
        "a cute cat sat on a mat while receiving a head pat from his owner called Matt",
        "baked potato, on the space floating orbiting around the earth"
      ],
      "first_n_steps_wo_cfg": 0,
      "image_dim": [512, 512],
      "t5_max_length": 512
    }
  ],
  "dataloader": {
    "batch_size": 8,
    "jsonl_metadata_path": "test_training_data.jsonl",
    "image_folder_path": "/images",
    "base_resolution": [512],
    "shuffle_tags": true,
    "tag_drop_percentage": 0.0,
    "uncond_percentage": 0.0,
    "resolution_step": 64,
    "num_workers": 2,
    "prefetch_factor": 2,
    "ratio_cutoff": 2.0,
    "thread_per_worker": 100
  },
  "model": {
    "chroma_path": "models/flux/FLUX.1-schnell/chroma-8.9b.safetensors",
    "vae_path": "models/flux/ae.safetensors",
    "t5_path": "models/flux/text_encoder_2",
    "t5_config_path": "models/flux/text_encoder_2/config.json",
    "t5_tokenizer_path": "models/flux/tokenizer_2",
    "t5_to_8bit": true,
    "t5_max_length": 512
  }
}
```
- **Out of Memory Errors**: Try reducing `batch_size`, increasing `offload_param_count`, or reducing `trained_single_blocks` and `trained_double_blocks`.
- **Slow Training**: Check the `num_workers` and `prefetch_factor` settings for your dataloader. Also consider increasing `cache_minibatch` if you have sufficient GPU memory.
- **Poor Generation Quality**: Adjust the `cfg` and `guidance` parameters in your inference settings, or increase the number of `steps`.
Apache 2.0
```bibtex
@misc{rock2025flow,
  author = {Lodestone Rock},
  title = {{Flow}},
  year = {2025},
  note = {Github repository},
  howpublished = {\url{https://github.com/lodestone-rock/flow}},
}
```