mirror of
https://github.com/kohya-ss/sd-scripts.git
synced 2026-04-06 13:47:06 +00:00
Add prompt guidance files for Claude and Gemini, and update README for AI coding agents
This commit is contained in:
9
.ai/claude.prompt.md
Normal file
9
.ai/claude.prompt.md
Normal file
@@ -0,0 +1,9 @@
|
||||
## About This File
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
|
||||
## 1. Project Context
|
||||
Here is the essential context for our project. Please read and understand it thoroughly.
|
||||
|
||||
### Project Overview
|
||||
@./context/01-overview.md
|
||||
101
.ai/context/01-overview.md
Normal file
101
.ai/context/01-overview.md
Normal file
@@ -0,0 +1,101 @@
|
||||
This file provides the overview and guidance for developers working with the codebase, including setup instructions, architecture details, and common commands.
|
||||
|
||||
## Project Architecture
|
||||
|
||||
### Core Training Framework
|
||||
The codebase is built around a **strategy pattern architecture** that supports multiple diffusion model families:
|
||||
|
||||
- **`library/strategy_base.py`**: Base classes for tokenization, text encoding, latent caching, and training strategies
|
||||
- **`library/strategy_*.py`**: Model-specific implementations for SD, SDXL, SD3, FLUX, etc.
|
||||
- **`library/train_util.py`**: Core training utilities shared across all model types
|
||||
- **`library/config_util.py`**: Configuration management with TOML support
|
||||
|
||||
### Model Support Structure
|
||||
Each supported model family has a consistent structure:
|
||||
- **Training script**: `{model}_train.py` (full fine-tuning), `{model}_train_network.py` (LoRA/network training)
|
||||
- **Model utilities**: `library/{model}_models.py`, `library/{model}_train_utils.py`, `library/{model}_utils.py`
|
||||
- **Networks**: `networks/lora_{model}.py`, `networks/oft_{model}.py` for adapter training
|
||||
|
||||
### Supported Models
|
||||
- **Stable Diffusion 1.x**: `train*.py`, `library/train_util.py`, `train_db.py` (for DreamBooth)
|
||||
- **SDXL**: `sdxl_train*.py`, `library/sdxl_*`
|
||||
- **SD3**: `sd3_train*.py`, `library/sd3_*`
|
||||
- **FLUX.1**: `flux_train*.py`, `library/flux_*`
|
||||
|
||||
### Key Components
|
||||
|
||||
#### Memory Management
|
||||
- **Block swapping**: CPU-GPU memory optimization via `--blocks_to_swap` parameter, works with custom offloading. Only available for models with transformer architectures like SD3 and FLUX.1.
|
||||
- **Custom offloading**: `library/custom_offloading_utils.py` for advanced memory management
|
||||
- **Gradient checkpointing**: Memory reduction during training
|
||||
|
||||
#### Training Features
|
||||
- **LoRA training**: Low-rank adaptation networks in `networks/lora*.py`
|
||||
- **ControlNet training**: Conditional generation control
|
||||
- **Textual Inversion**: Custom embedding training
|
||||
- **Multi-resolution training**: Bucket-based aspect ratio handling
|
||||
- **Validation loss**: Real-time training monitoring, only for LoRA training
|
||||
|
||||
#### Configuration System
|
||||
Dataset configuration uses TOML files with structured validation:
|
||||
```toml
|
||||
[datasets.sample_dataset]
|
||||
resolution = 1024
|
||||
batch_size = 2
|
||||
|
||||
[[datasets.sample_dataset.subsets]]
|
||||
image_dir = "path/to/images"
|
||||
caption_extension = ".txt"
|
||||
```
|
||||
|
||||
## Common Development Commands
|
||||
|
||||
### Training Commands Pattern
|
||||
All training scripts follow this general pattern:
|
||||
```bash
|
||||
accelerate launch --mixed_precision bf16 {script_name}.py \
|
||||
--pretrained_model_name_or_path model.safetensors \
|
||||
--dataset_config config.toml \
|
||||
--output_dir output \
|
||||
--output_name model_name \
|
||||
[model-specific options]
|
||||
```
|
||||
|
||||
### Memory Optimization
|
||||
For low VRAM environments, use block swapping:
|
||||
```bash
|
||||
# Add to any training command for memory reduction
|
||||
--blocks_to_swap 10 # Swap 10 blocks to CPU (adjust number as needed)
|
||||
```
|
||||
|
||||
### Utility Scripts
|
||||
Located in `tools/` directory:
|
||||
- `tools/merge_lora.py`: Merge LoRA weights into base models
|
||||
- `tools/cache_latents.py`: Pre-cache VAE latents for faster training
|
||||
- `tools/cache_text_encoder_outputs.py`: Pre-cache text encoder outputs
|
||||
|
||||
## Development Notes
|
||||
|
||||
### Strategy Pattern Implementation
|
||||
When adding support for new models, implement the four core strategies:
|
||||
1. `TokenizeStrategy`: Text tokenization handling
|
||||
2. `TextEncodingStrategy`: Text encoder forward pass
|
||||
3. `LatentsCachingStrategy`: VAE encoding/caching
|
||||
4. `TextEncoderOutputsCachingStrategy`: Text encoder output caching
|
||||
|
||||
### Testing Approach
|
||||
- Unit tests focus on utility functions and model loading
|
||||
- Integration tests validate training script syntax and basic execution
|
||||
- Most tests use mocks to avoid requiring actual model files
|
||||
- Add tests for new model support in `tests/test_{model}_*.py`
|
||||
|
||||
### Configuration System
|
||||
- Use `config_util.py` dataclasses for type-safe configuration
|
||||
- Support both command-line arguments and TOML file configuration
|
||||
- Validate configuration early in training scripts to prevent runtime errors
|
||||
|
||||
### Memory Management
|
||||
- Always consider VRAM limitations when implementing features
|
||||
- Use gradient checkpointing for large models
|
||||
- Implement block swapping for models with transformer architectures
|
||||
- Cache intermediate results (latents, text embeddings) when possible
|
||||
9
.ai/gemini.prompt.md
Normal file
9
.ai/gemini.prompt.md
Normal file
@@ -0,0 +1,9 @@
|
||||
## About This File
|
||||
|
||||
This file provides guidance to Gemini CLI (https://github.com/google-gemini/gemini-cli) when working with code in this repository.
|
||||
|
||||
## 1. Project Context
|
||||
Here is the essential context for our project. Please read and understand it thoroughly.
|
||||
|
||||
### Project Overview
|
||||
@./context/01-overview.md
|
||||
2
.gitignore
vendored
2
.gitignore
vendored
@@ -6,3 +6,5 @@ venv
|
||||
build
|
||||
.vscode
|
||||
wandb
|
||||
CLAUDE.md
|
||||
GEMINI.md
|
||||
49
README.md
49
README.md
@@ -16,6 +16,9 @@ If you are using DeepSpeed, please install DeepSpeed with `pip install deepspeed
|
||||
|
||||
### Recent Updates
|
||||
|
||||
Jul 10, 2025:
|
||||
- [AI Coding Agents](#for-developers-using-ai-coding-agents) section is added to the README. This section provides instructions for developers using AI coding agents like Claude and Gemini to understand the project context and coding standards.
|
||||
|
||||
May 1, 2025:
|
||||
- The error when training FLUX.1 with mixed precision in flux_train.py with DeepSpeed enabled has been resolved. Thanks to sharlynxy for PR [#2060](https://github.com/kohya-ss/sd-scripts/pull/2060). Please refer to the PR for details.
|
||||
- If you enable DeepSpeed, please install DeepSpeed with `pip install deepspeed==0.16.7`.
|
||||
@@ -54,46 +57,30 @@ Jan 25, 2025:
|
||||
- It will be added to other scripts as well.
|
||||
- As a current limitation, validation loss is not supported when `--block_to_swap` is specified, or when schedule-free optimizer is used.
|
||||
|
||||
Dec 15, 2024:
|
||||
## For Developers Using AI Coding Agents
|
||||
|
||||
- RAdamScheduleFree optimizer is supported. PR [#1830](https://github.com/kohya-ss/sd-scripts/pull/1830) Thanks to nhamanasu!
|
||||
- Update to `schedulefree==1.4` is required. Please update individually or with `pip install --use-pep517 --upgrade -r requirements.txt`.
|
||||
- Available with `--optimizer_type=RAdamScheduleFree`. No need to specify warm up steps as well as learning rate scheduler.
|
||||
This repository provides recommended instructions to help AI agents like Claude and Gemini understand our project context and coding standards.
|
||||
|
||||
Dec 7, 2024:
|
||||
To use them, you need to opt-in by creating your own configuration file in the project root.
|
||||
|
||||
- The option to specify the model name during ControlNet training was different in each script. It has been unified. Please specify `--controlnet_model_name_or_path`. PR [#1821](https://github.com/kohya-ss/sd-scripts/pull/1821) Thanks to sdbds!
|
||||
<!--
|
||||
Also, the ControlNet training script for SD has been changed from `train_controlnet.py` to `train_control_net.py`.
|
||||
- `train_controlnet.py` is still available, but it will be removed in the future.
|
||||
-->
|
||||
**Quick Setup:**
|
||||
|
||||
- Fixed an issue where the saved model would be corrupted (pos_embed would not be saved) when `--enable_scaled_pos_embed` was specified in `sd3_train.py`.
|
||||
1. Create a `CLAUDE.md` and/or `GEMINI.md` file in the project root.
|
||||
2. Add the following line to your `CLAUDE.md` to import the repository's recommended prompt:
|
||||
|
||||
Dec 3, 2024:
|
||||
```markdown
|
||||
@./.ai/claude.prompt.md
|
||||
```
|
||||
|
||||
-`--blocks_to_swap` now works in FLUX.1 ControlNet training. Sample commands for 24GB VRAM and 16GB VRAM are added [here](#flux1-controlnet-training).
|
||||
or for Gemini:
|
||||
|
||||
Dec 2, 2024:
|
||||
```markdown
|
||||
@./.ai/gemini.prompt.md
|
||||
```
|
||||
|
||||
- FLUX.1 ControlNet training is supported. PR [#1813](https://github.com/kohya-ss/sd-scripts/pull/1813). Thanks to minux302! See PR and [here](#flux1-controlnet-training) for details.
|
||||
- Not fully tested. Feedback is welcome.
|
||||
- 80GB VRAM is required for 1024x1024 resolution, and 48GB VRAM is required for 512x512 resolution.
|
||||
- Currently, it only works in Linux environment (or Windows WSL2) because DeepSpeed is required.
|
||||
- Multi-GPU training is not tested.
|
||||
3. You can now add your own personal instructions below the import line (e.g., `Always respond in Japanese.`).
|
||||
|
||||
Dec 1, 2024:
|
||||
|
||||
- Pseudo Huber loss is now available for FLUX.1 and SD3.5 training. See PR [#1808](https://github.com/kohya-ss/sd-scripts/pull/1808) for details. Thanks to recris!
|
||||
- Specify `--loss_type huber` or `--loss_type smooth_l1` to use it. `--huber_c` and `--huber_scale` are also available.
|
||||
|
||||
- [Prodigy + ScheduleFree](https://github.com/LoganBooker/prodigy-plus-schedule-free) is supported. See PR [#1811](https://github.com/kohya-ss/sd-scripts/pull/1811) for details. Thanks to rockerBOO!
|
||||
|
||||
Nov 14, 2024:
|
||||
|
||||
- Improved the implementation of block swap and made it available for both FLUX.1 and SD3 LoRA training. See [FLUX.1 LoRA training](#flux1-lora-training) etc. for how to use the new options. Training is possible with about 8-10GB of VRAM.
|
||||
- During fine-tuning, the memory usage when specifying the same number of blocks has increased slightly, but the training speed when specifying block swap has been significantly improved.
|
||||
- There may be bugs due to the significant changes. Feedback is welcome.
|
||||
This approach ensures that you have full control over the instructions given to your agent while benefiting from the shared project context. Your `CLAUDE.md` and `GEMINI.md` are already listed in `.gitignore`, so it won't be committed to the repository.
|
||||
|
||||
## FLUX.1 training
|
||||
|
||||
|
||||
Reference in New Issue
Block a user