diff --git a/.ai/claude.prompt.md b/.ai/claude.prompt.md
new file mode 100644
index 00000000..7f38f575
--- /dev/null
+++ b/.ai/claude.prompt.md
@@ -0,0 +1,9 @@
+## About This File
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## 1. Project Context
+Here is the essential context for our project. Please read and understand it thoroughly.
+
+### Project Overview
+@./context/01-overview.md
diff --git a/.ai/context/01-overview.md b/.ai/context/01-overview.md
new file mode 100644
index 00000000..41133e98
--- /dev/null
+++ b/.ai/context/01-overview.md
@@ -0,0 +1,101 @@
+This file provides an overview and guidance for developers working with the codebase, including setup instructions, architecture details, and common commands.
+
+## Project Architecture
+
+### Core Training Framework
+The codebase is built around a **strategy pattern architecture** that supports multiple diffusion model families:
+
+- **`library/strategy_base.py`**: Base classes for tokenization, text encoding, latent caching, and training strategies
+- **`library/strategy_*.py`**: Model-specific implementations for SD, SDXL, SD3, FLUX, etc.
+- **`library/train_util.py`**: Core training utilities shared across all model types
+- **`library/config_util.py`**: Configuration management with TOML support
+
+### Model Support Structure
+Each supported model family has a consistent structure:
+- **Training script**: `{model}_train.py` (full fine-tuning), `{model}_train_network.py` (LoRA/network training)
+- **Model utilities**: `library/{model}_models.py`, `library/{model}_train_utils.py`, `library/{model}_utils.py`
+- **Networks**: `networks/lora_{model}.py`, `networks/oft_{model}.py` for adapter training
+
+### Supported Models
+- **Stable Diffusion 1.x**: `train*.py`, `library/train_util.py`, `train_db.py` (for DreamBooth)
+- **SDXL**: `sdxl_train*.py`, `library/sdxl_*`
+- **SD3**: `sd3_train*.py`, `library/sd3_*`
+- **FLUX.1**: `flux_train*.py`, `library/flux_*`
+
+### Key Components
+
+#### Memory Management
+- **Block swapping**: CPU-GPU memory optimization via the `--blocks_to_swap` parameter; works together with custom offloading. Only available for models with transformer architectures such as SD3 and FLUX.1.
+- **Custom offloading**: `library/custom_offloading_utils.py` for advanced memory management
+- **Gradient checkpointing**: Memory reduction during training
+
+#### Training Features
+- **LoRA training**: Low-rank adaptation networks in `networks/lora*.py`
+- **ControlNet training**: Conditional generation control
+- **Textual Inversion**: Custom embedding training
+- **Multi-resolution training**: Bucket-based aspect ratio handling
+- **Validation loss**: Real-time training monitoring (currently only for LoRA training)
+
+#### Configuration System
+Dataset configuration uses TOML files with structured validation:
+```toml
+[[datasets]]
+resolution = 1024
+batch_size = 2
+
+  [[datasets.subsets]]
+  image_dir = "path/to/images"
+  caption_extension = ".txt"
+```
+
+## Common Development Commands
+
+### Training Commands Pattern
+All training scripts follow this general pattern:
+```bash
+accelerate launch --mixed_precision bf16 {script_name}.py \
+  --pretrained_model_name_or_path model.safetensors \
+  --dataset_config config.toml \
+  --output_dir output \
+  --output_name model_name \
+  [model-specific options]
+```
+
+### Memory Optimization
+For low-VRAM environments, use block swapping:
+```bash
+# Add to any training command for memory reduction
+--blocks_to_swap 10  # Swap 10 blocks to CPU (adjust the number as needed)
+```
+
+### Utility Scripts
+Located in the `tools/` directory:
+- `tools/merge_lora.py`: Merge LoRA weights into base models
+- `tools/cache_latents.py`: Pre-cache VAE latents for faster training
+- `tools/cache_text_encoder_outputs.py`: Pre-cache text encoder outputs
+
+## Development Notes
+
+### Strategy Pattern Implementation
+When adding support for new models, implement the four core strategies:
+1. `TokenizeStrategy`: Text tokenization handling
+2. `TextEncodingStrategy`: Text encoder forward pass
+3. `LatentsCachingStrategy`: VAE encoding/caching
+4. `TextEncoderOutputsCachingStrategy`: Text encoder output caching
+
+### Testing Approach
+- Unit tests focus on utility functions and model loading
+- Integration tests validate training script syntax and basic execution
+- Most tests use mocks to avoid requiring actual model files
+- Add tests for new model support in `tests/test_{model}_*.py`
+
+### Configuration System
+- Use `config_util.py` dataclasses for type-safe configuration
+- Support both command-line arguments and TOML file configuration
+- Validate configuration early in training scripts to prevent runtime errors
+
+### Memory Management
+- Always consider VRAM limitations when implementing features
+- Use gradient checkpointing for large models
+- Implement block swapping for models with transformer architectures
+- Cache intermediate results (latents, text embeddings) when possible
\ No newline at end of file
diff --git a/.ai/gemini.prompt.md b/.ai/gemini.prompt.md
new file mode 100644
index 00000000..6047390b
--- /dev/null
+++ b/.ai/gemini.prompt.md
@@ -0,0 +1,9 @@
+## About This File
+
+This file provides guidance to Gemini CLI (https://github.com/google-gemini/gemini-cli) when working with code in this repository.
+
+## 1. Project Context
+Here is the essential context for our project. Please read and understand it thoroughly.
+
+### Project Overview
+@./context/01-overview.md
diff --git a/.gitignore b/.gitignore
index e492b1ad..b991f6db 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,3 +6,5 @@ venv
 build
 .vscode
 wandb
+CLAUDE.md
+GEMINI.md
\ No newline at end of file
diff --git a/README.md b/README.md
index 497969ab..149f453b 100644
--- a/README.md
+++ b/README.md
@@ -16,6 +16,9 @@ If you are using DeepSpeed, please install DeepSpeed with `pip install deepspeed
 
 ### Recent Updates
 
+Jul 10, 2025:
+- The [AI Coding Agents](#for-developers-using-ai-coding-agents) section has been added to the README. It provides instructions for developers who use AI coding agents such as Claude and Gemini, so that the agents can understand the project context and coding standards.
+
 May 1, 2025:
 - The error when training FLUX.1 with mixed precision in flux_train.py with DeepSpeed enabled has been resolved. Thanks to sharlynxy for PR [#2060](https://github.com/kohya-ss/sd-scripts/pull/2060). Please refer to the PR for details.
 - If you enable DeepSpeed, please install DeepSpeed with `pip install deepspeed==0.16.7`.
@@ -54,46 +57,30 @@ Jan 25, 2025:
 - It will be added to other scripts as well.
 - As a current limitation, validation loss is not supported when `--block_to_swap` is specified, or when schedule-free optimizer is used.
 
-Dec 15, 2024:
+## For Developers Using AI Coding Agents
+
-- RAdamScheduleFree optimizer is supported. PR [#1830](https://github.com/kohya-ss/sd-scripts/pull/1830) Thanks to nhamanasu!
-  - Update to `schedulefree==1.4` is required. Please update individually or with `pip install --use-pep517 --upgrade -r requirements.txt`.
-  - Available with `--optimizer_type=RAdamScheduleFree`. No need to specify warm up steps as well as learning rate scheduler.
+This repository provides recommended instructions to help AI agents like Claude and Gemini understand our project context and coding standards.
+
-Dec 7, 2024:
+To use them, opt in by creating your own configuration file in the project root.
+
-- The option to specify the model name during ControlNet training was different in each script. It has been unified. Please specify `--controlnet_model_name_or_path`. PR [#1821](https://github.com/kohya-ss/sd-scripts/pull/1821) Thanks to sdbds!
-
-- Fixed an issue where the saved model would be corrupted (pos_embed would not be saved) when `--enable_scaled_pos_embed` was specified in `sd3_train.py`.
+**Quick Setup:**
+
+1. Create a `CLAUDE.md` and/or `GEMINI.md` file in the project root.
+2. Add the following line to your `CLAUDE.md` to import the repository's recommended prompt:
+
-Dec 3, 2024:
+   ```markdown
+   @./.ai/claude.prompt.md
+   ```
+
-- `--blocks_to_swap` now works in FLUX.1 ControlNet training. Sample commands for 24GB VRAM and 16GB VRAM are added [here](#flux1-controlnet-training).
+   or for Gemini:
+
-Dec 2, 2024:
+   ```markdown
+   @./.ai/gemini.prompt.md
+   ```
+
-- FLUX.1 ControlNet training is supported. PR [#1813](https://github.com/kohya-ss/sd-scripts/pull/1813). Thanks to minux302! See PR and [here](#flux1-controlnet-training) for details.
-  - Not fully tested. Feedback is welcome.
-  - 80GB VRAM is required for 1024x1024 resolution, and 48GB VRAM is required for 512x512 resolution.
-  - Currently, it only works in Linux environment (or Windows WSL2) because DeepSpeed is required.
-  - Multi-GPU training is not tested.
+3. You can now add your own personal instructions below the import line (e.g., `Always respond in Japanese.`).
-Dec 1, 2024:
-
-- Pseudo Huber loss is now available for FLUX.1 and SD3.5 training. See PR [#1808](https://github.com/kohya-ss/sd-scripts/pull/1808) for details. Thanks to recris!
-  - Specify `--loss_type huber` or `--loss_type smooth_l1` to use it. `--huber_c` and `--huber_scale` are also available.
-
-- [Prodigy + ScheduleFree](https://github.com/LoganBooker/prodigy-plus-schedule-free) is supported. See PR [#1811](https://github.com/kohya-ss/sd-scripts/pull/1811) for details. Thanks to rockerBOO!
-
-Nov 14, 2024:
-
-- Improved the implementation of block swap and made it available for both FLUX.1 and SD3 LoRA training. See [FLUX.1 LoRA training](#flux1-lora-training) etc. for how to use the new options. Training is possible with about 8-10GB of VRAM.
-- During fine-tuning, the memory usage when specifying the same number of blocks has increased slightly, but the training speed when specifying block swap has been significantly improved.
-- There may be bugs due to the significant changes. Feedback is welcome.
+
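The Quick Setup steps above boil down to writing one import line plus optional personal lines into a local file. A minimal Python sketch of that flow (purely illustrative; the `Always respond in Japanese.` line is the README's own example of a personal instruction):

```python
# Sketch of the README's Quick Setup: create a personal, git-ignored CLAUDE.md
# that imports the repository's recommended prompt.
from pathlib import Path

lines = [
    "@./.ai/claude.prompt.md",      # step 2: import the shared prompt
    "Always respond in Japanese.",  # step 3: optional personal instruction (example)
]
Path("CLAUDE.md").write_text("\n".join(lines) + "\n")
print(Path("CLAUDE.md").read_text())
```

For Gemini, the same pattern applies with `GEMINI.md` and `@./.ai/gemini.prompt.md`.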
+This approach ensures that you have full control over the instructions given to your agent while benefiting from the shared project context. Your `CLAUDE.md` and `GEMINI.md` files are already listed in `.gitignore`, so they won't be committed to the repository.
 
 ## FLUX.1 training
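As an aside on the overview file added above: the four-strategy extension point it describes can be pictured with a skeleton like the following. This is a hypothetical sketch only; the class names come from `01-overview.md`, while the method names, signatures, and the `MyModel` example are invented here and are not the actual interfaces in `library/strategy_base.py`.

```python
# Hypothetical skeleton of the four core strategies named in 01-overview.md.
# The real base classes live in library/strategy_base.py with different signatures.
from abc import ABC, abstractmethod


class TokenizeStrategy(ABC):
    @abstractmethod
    def tokenize(self, text: str) -> list[int]:
        """Turn raw caption text into token ids."""


class TextEncodingStrategy(ABC):
    @abstractmethod
    def encode_tokens(self, tokens: list[int]) -> list[float]:
        """Run the text encoder forward pass."""


class LatentsCachingStrategy(ABC):
    @abstractmethod
    def cache_latents(self, image_path: str) -> None:
        """VAE-encode an image and cache the latents."""


class TextEncoderOutputsCachingStrategy(ABC):
    @abstractmethod
    def cache_outputs(self, text: str) -> None:
        """Cache text encoder outputs for reuse during training."""


# A new model family plugs in by subclassing each strategy, e.g.:
class MyModelTokenizeStrategy(TokenizeStrategy):
    def tokenize(self, text: str) -> list[int]:
        # Toy stand-in for a real tokenizer.
        return [len(word) for word in text.split()]


print(MyModelTokenizeStrategy().tokenize("hello diffusion world"))  # [5, 9, 5]
```

The base classes stay model-agnostic, so training scripts can drive any model family through the same four calls.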