Commit Graph

2523 Commits

Author SHA1 Message Date
Kohya S.
51435f1718 Merge pull request #2303 from kohya-ss/sd3
fix: improve numerical stability by conditionally using float32 in Anima with fp16 training
2026-04-02 12:40:48 +09:00
Kohya S.
fa53f71ec0 fix: improve numerical stability by conditionally using float32 in Anima (#2302)
* fix: improve numerical stability by conditionally using float32 in block computations

* doc: update README for improved stability for fp16 training on Anima in version 0.10.3
2026-04-02 12:36:29 +09:00
Kohya S.
1dae34b0af Merge pull request #2298 from kohya-ss/sd3
Merge development changes into main
v0.10.2
2026-03-30 07:53:22 +09:00
Kohya S.
dd7a666727 Merge pull request #2301 from woct0rdho/resize-lora-logging
Print verbose info while resizing LoRA is running
2026-03-30 07:50:15 +09:00
woctordho
b2c330407b Print verbose info while extracting 2026-03-29 21:36:33 +08:00
Kohya S.
c018765583 Merge pull request #2300 from kohya-ss/doc/re-update-README
doc: update change history in README files to include LECO training
2026-03-29 22:10:53 +09:00
Kohya S
3cb9025b4b doc: update change history in README files to include LECO training support for SD/SDXL 2026-03-29 22:07:52 +09:00
Kohya S.
adf4b7b9c0 Merge pull request #2299 from kohya-ss/docs/fix-missing-docs
Improve clarity of README table of contents and change history
2026-03-29 22:02:43 +09:00
Kohya S
b637c31365 fix: update table of contents and change history in README files for clarity 2026-03-29 21:58:38 +09:00
Kohya S.
7cbae516c1 Merge pull request #2297 from kohya-ss/fix-anima-fp16-nan-issue
fix: AdaLN modulation to use float32 for numerical stability in fp16
2026-03-29 21:31:19 +09:00
Kohya S
5fb3172baf fix: AdaLN modulation to use float32 for numerical stability in fp16 2026-03-29 21:25:53 +09:00
Kohya S.
5cdad10de5 Fix/leco cleanup (#2294)
* feat: add LECO training scripts for SD1.x/2.x and SDXL (#2285)

* Add LECO training script and associated tests

- Implemented `sdxl_train_leco.py` for training with LECO prompts, including argument parsing, model setup, training loop, and weight saving functionality.
- Created unit tests for `load_prompt_settings` in `test_leco_train_util.py` to validate loading of prompt configurations in both original and slider formats.
- Added basic syntax tests for `train_leco.py` and `sdxl_train_leco.py` to ensure modules are importable.

* fix: use getattr for safe attribute access in argument verification

* feat: add CUDA device compatibility validation and corresponding tests

* Revert "feat: add CUDA device compatibility validation and corresponding tests"

This reverts commit 6d3e51431b.

* feat: update predict_noise_xl to use vector embedding from add_time_ids

* feat: implement checkpointing in predict_noise and predict_noise_xl functions

* feat: remove unused submodules and update .gitignore to exclude .codex-tmp

---------

Co-authored-by: Kohya S. <52813779+kohya-ss@users.noreply.github.com>

* fix: format

* fix: address review comments on LECO PR #2285

- Revert the getattr changes in train_util.py/deepspeed_utils.py and add dummy arguments to the LECO parser
- Change the module-level import of sdxl_train_util to a local import
- Make PromptEmbedsCache.__getitem__ raise KeyError on a cache miss
- Change the config file format from YAML to TOML (to match the repository's conventions)
- Consolidate duplicated code (build_network_kwargs, get_save_extension, save_weights) into leco_train_util.py
- Simplify the redundant PromptSettings construction in _expand_slider_target
- Add a dedicated batch_add_time_ids function for add_time_ids

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

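* docs: substantially expand the LECO training guide

Add explanations of all command-line arguments by category, descriptions of every prompt TOML field,
the difference between the two guidance_scale settings, a table of recommended settings, a guide to converting from YAML, and more.
Bilingual layout: English main text with collapsible Japanese sections.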

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: fix dtype mismatch in apply_noise_offset

Fix an issue where latents were implicitly upcast because torch.randn defaults to float32.
Adopt the safe pattern of generating in float32 on the CPU and then converting to the latents' dtype/device.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Umisetokikaze <52318966+umisetokikaze@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 20:41:43 +09:00
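
One of the review fixes in the LECO cleanup commit above makes PromptEmbedsCache.__getitem__ raise KeyError on a cache miss. The repository's actual class is not shown in this log, so the following is only a minimal sketch of that behaviour; the class name PromptEmbedsCacheSketch, its internal dict, and the stored values are assumptions made for illustration.

```python
from typing import Any, Dict


class PromptEmbedsCacheSketch:
    """Hypothetical stand-in for a prompt-embedding cache (not the repository's class)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def __setitem__(self, prompt: str, embeds: Any) -> None:
        self._store[prompt] = embeds

    def __getitem__(self, prompt: str) -> Any:
        # Fail loudly on a miss instead of silently returning None,
        # so a missing embedding is caught where it is requested.
        if prompt not in self._store:
            raise KeyError(f"prompt embeds not cached: {prompt!r}")
        return self._store[prompt]

    def __contains__(self, prompt: str) -> bool:
        return prompt in self._store


cache = PromptEmbedsCacheSketch()
cache["a photo of a cat"] = [0.1, 0.2]  # placeholder for real embeddings

try:
    cache["a photo of a dog"]
    missed = False
except KeyError:
    missed = True
```

Raising on a miss lets callers distinguish "never cached" from "cached but empty", which a None return would blur.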
Kohya S.
89b246f3f6 Merge pull request #2296 from kohya-ss/add-svd-lowrank-niter
Add --svd_lowrank_niter option to resize_lora.py
2026-03-29 20:40:55 +09:00
woctordho
4be0e94fad Merge pull request #2194 from woct0rdho/rank1
Fix the 'off by 1' problem in dynamically resized LoRA rank
2026-03-29 20:35:00 +09:00
Kohya S
0e168dd1eb add --svd_lowrank_niter option to resize_lora.py
Allow users to control the number of iterations for torch.svd_lowrank
on large matrices. Default is 2 (matching PR #2240 behavior). Set to 0
to disable svd_lowrank and use full SVD instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 20:33:33 +09:00
Kohya S.
2723a75f91 Merge pull request #2240 from woct0rdho/svd-lowrank
Use `torch.svd_lowrank` for large matrices in `resize_lora.py`
2026-03-29 19:54:44 +09:00
Kohya S.
5f793fb0f4 Log d*lr for ProdigyPlusScheduleFree (#2289) 2026-03-29 18:47:09 +09:00
Kohya S.
feb38356ea Merge pull request #2291 from kohya-ss/fix-anima-validation-with-text-encoder-output-cache
fix: Anima validation dataset not working with Text Encoder output cache
2026-03-22 22:23:27 +09:00
Kohya S
cdb49f9fe7 fix: Anima validation dataset not working with Text Encoder output caching due to caption dropout 2026-03-22 22:19:47 +09:00
Kohya S
bd19e4c15d Merge branch 'main' into sd3 2026-03-22 21:10:51 +09:00
woctordho
343c929e39 Log d*lr for ProdigyPlusScheduleFree 2026-03-21 11:09:56 +08:00
Kohya S.
b2abe873a5 Merge pull request #2283 from kozistr/deps/pytorch-optimizer
Bump `pytorch-optimizer` into 3.10.0
2026-03-19 09:18:06 +09:00
Kohya S.
7c159291e9 docs: add skip_image_resolution to config README (#2288)
* docs: add skip_image_resolution option to config README

Document the skip_image_resolution dataset option added in PR #2273.
Add option description, multi-resolution dataset TOML example, and
command-line argument entry to both Japanese and English config READMEs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: clarify `skip_image_resolution` functionality in dataset config

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 09:17:29 +09:00
woctordho
1cd95b2d8b Add skip_image_resolution to deduplicate multi-resolution dataset (#2273)
* Add min_orig_resolution and max_orig_resolution

* Rename min_orig_resolution to skip_image_resolution; remove max_orig_resolution

* Change skip_image_resolution to tuple

* Move filtering to __init__

* Minor fix
2026-03-19 08:43:39 +09:00
kozistr
1bd0b0faf1 build(deps): bump pytorch-optimizer into 3.10.0 2026-03-02 14:39:48 +09:00
Kohya S
d633b51126 Merge branch 'dev' into sd3 2026-02-26 08:22:30 +09:00
Kohya S.
1a3ec9ea74 Merge pull request #2280 from kohya-ss/fix-main-wd14-tagger-unbound-local-error
fix: rename character_tags to img_character_tags to fix UnboundLocalError
2026-02-26 08:21:42 +09:00
Kohya S
e1aedceffa fix: rename character_tags to img_character_tags to fix UnboundLocalError 2026-02-26 08:18:45 +09:00
Kohya S.
2217704ce1 feat: Support LoKr/LoHa for SDXL and Anima (#2275)
* feat: Add LoHa/LoKr network support for SDXL and Anima

- networks/network_base.py: shared AdditionalNetwork base class with architecture auto-detection (SDXL/Anima) and generic module injection
- networks/loha.py: LoHa (Low-rank Hadamard Product) module with HadaWeight custom autograd, training/inference classes, and factory functions
- networks/lokr.py: LoKr (Low-rank Kronecker Product) module with factorization, training/inference classes, and factory functions
- library/lora_utils.py: extend weight merge hook to detect and merge LoHa/LoKr weights alongside standard LoRA

Linear and Conv2d 1x1 layers only; Conv2d 3x3 (Tucker decomposition) support will be added separately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Enhance LoHa and LoKr modules with Tucker decomposition support

- Added Tucker decomposition functionality to LoHa and LoKr modules.
- Implemented new methods for weight rebuilding using Tucker decomposition.
- Updated initialization and weight handling for Conv2d 3x3+ layers.
- Modified get_diff_weight methods to accommodate Tucker and non-Tucker modes.
- Enhanced network base to include unet_conv_target_modules for architecture detection.

* fix: rank dropout handling in LoRAModule for Conv2d and Linear layers, see #2272 for details

* doc: add dtype comment for load_safetensors_with_lora_and_fp8 function

* fix: enhance architecture detection to support InferSdxlUNet2DConditionModel for gen_img.py

* doc: update model support structure to include Lumina Image 2.0, HunyuanImage-2.1, and Anima-Preview

* doc: add documentation for LoHa and LoKr fine-tuning methods

* Update networks/network_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update docs/loha_lokr.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: refactor LoHa and LoKr imports for weight merging in load_safetensors_with_lora_and_fp8 function

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-23 22:09:00 +09:00
Kohya S.
f90fa1a89a feat: backward compatibility for SD/SDXL latent cache (#2276)
* fix: improve handling of legacy npz files and add logging for fallback scenarios

* fix: simplify fallback handling in SdSdxlLatentsCachingStrategy
2026-02-23 21:44:51 +09:00
Kohya S.
98a42e4cd6 Merge pull request #2277 from kohya-ss/feat-stability-with-fp16-for-anima
feat: Stability with fp16 for anima
2026-02-23 21:15:49 +09:00
Kohya S
892f8be78f fix: cast input tensor to float32 for improved numerical stability in residual connections 2026-02-23 21:12:57 +09:00
woctordho
50694df3cf Multi-resolution dataset for SD1/SDXL (#2269)
* Multi-resolution dataset for SD1/SDXL

* Add fallback to legacy key without resolution suffix

* Support numpy 2.2
2026-02-23 15:30:36 +09:00
duongve13112002
609d1292f6 Fix the LoRA dropout issue in the Anima model during LoRA training. (#2272)
* Support network_reg_alphas and fix bug when setting rank_dropout in training lora for anima model

* Update anima_train_network.md

* Update anima_train_network.md

* Remove network_reg_alphas

* Update document
2026-02-23 15:13:40 +09:00
Kohya S.
48d368fa55 Merge pull request #2268 from kohya-ss/sd3
merge sd3 to main
2026-02-16 08:07:29 +09:00
Kohya S.
3265f2edfb Merge pull request #2267 from kohya-ss/fix-github-actions-error
fix: `str is not "no"` to `str != "no"`
2026-02-16 08:01:20 +09:00
Kohya S
ef051427df fix: str is not "no" to str != "no" 2026-02-16 07:58:15 +09:00
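
The fix above replaces an identity comparison against a string literal with an equality comparison. The log does not show the affected code, so the sketch below only illustrates the general difference; the names NO, enabled_by_identity, and enabled_by_equality are hypothetical.

```python
NO = "no"  # hypothetical sentinel meaning "option disabled"


def enabled_by_identity(value: str) -> bool:
    # Identity check: asks whether `value` is the very same object as NO,
    # not whether it contains the same characters.
    return value is not NO


def enabled_by_equality(value: str) -> bool:
    # Equality check: compares the characters, which is what is intended.
    return value != NO


# A string with the same contents, built at runtime (for example parsed
# from a config file or the command line), need not be the same object.
built_at_runtime = "".join(["n", "o"])

print(enabled_by_equality(built_at_runtime))  # False
# The identity result depends on the interpreter's string interning,
# so it must not be relied upon.
print(enabled_by_identity(built_at_runtime))
```

Recent Python versions also emit a SyntaxWarning when `is` or `is not` is used directly against a literal, which is one way such code gets flagged in CI.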
Kohya S.
573a7fa06c Merge pull request #2262 from duongve13112002/fix_lumina
Fix bug and optimization for Lumina model
2026-02-16 07:54:49 +09:00
Kohya S.
ae72efb92b Merge pull request #2264 from kohya-ss/release/v0.10.1
Release 0.10.1
v0.10.1
2026-02-13 08:34:42 +09:00
Kohya S
449e70b4cf README: Update change history for version 0.10.1 with Anima model support 2026-02-13 08:31:22 +09:00
Kohya S.
b237b8deb3 Merge pull request #2263 from kohya-ss/sd3
feat: Anima support
2026-02-13 08:20:44 +09:00
Kohya S.
34e7138b6a Add/modify some implementation for anima (#2261)
* fix: update extend-exclude list in _typos.toml to include configs

* fix: exclude anima tests from pytest

* feat: add entry for 'temperal' in extend-words section of _typos.toml for Qwen-Image VAE

* fix: update default value for --discrete_flow_shift in anima training guide

* feat: add Qwen-Image VAE

* feat: simplify encode_tokens

* feat: use unified attention module, add wrapper for state dict compatibility

* feat: loading with dynamic fp8 optimization and LoRA support

* feat: add anima minimal inference script (WIP)

* format: format

* feat: simplify target module selection by regular expression patterns

* feat: kept caption dropout rate in cache and handle in training script

* feat: update train_llm_adapter and verbose default values to string type

* fix: use strategy instead of using tokenizers directly

* feat: add dtype property and all-zero mask handling in cross-attention in LLMAdapterTransformerBlock

* feat: support 5d tensor in get_noisy_model_input_and_timesteps

* feat: update loss calculation to support 5d tensor

* fix: update argument names in anima_train_utils to align with other architectures

* feat: simplify Anima training script and update empty caption handling

* feat: support LoRA format without `net.` prefix

* fix: update to work fp8_scaled option

* feat: add regex-based learning rates and dimensions handling in create_network

* fix: improve regex matching for module selection and learning rates in LoRANetwork

* fix: update logging message for regex match in LoRANetwork

* fix: keep latents 4D except DiT call

* feat: enhance block swap functionality for inference and training in Anima model

* feat: refactor Anima training script

* feat: optimize VAE processing by adjusting tensor dimensions and data types

* fix: wait for all block transfers before switching offloader mode

* feat: update Anima training guide with new argument specifications and regex-based module selection. Thank you Claude!

* feat: support LORA for Qwen3

* feat: update Anima SAI model spec metadata handling

* fix: remove unused code

* feat: split CFG processing in do_sample function to reduce memory usage

* feat: add VAE chunking and caching options to reduce memory usage

* feat: optimize RMSNorm forward method and remove unused torch_attention_op

* Update library/strategy_anima.py

Use torch.all instead of all.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update library/safetensors_utils.py

Fix duplicated new_key for concat_hook.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update anima_minimal_inference.py

Remove unused code.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update anima_train.py

Remove unused import.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update library/anima_train_utils.py

Remove unused import.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: review with Copilot

* feat: add script to convert LoRA format to ComfyUI compatible format (WIP, not tested yet)

* feat: add process_escape function to handle escape sequences in prompts

* feat: enhance LoRA weight handling in model loading and add text encoder loading function

* feat: improve ComfyUI conversion script with prefix constants and module name adjustments

* feat: update caption dropout documentation to clarify cache regeneration requirement

* feat: add clarification on learning rate adjustments

* feat: add note on PyTorch version requirement to prevent NaN loss

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-13 08:15:06 +09:00
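
Several items in the commit above mention regex-based selection of modules and per-module learning rates in create_network. The log gives neither the argument format nor the matching rules, so the following is only a rough sketch of the general idea under stated assumptions: the rule list, the first-match-wins policy, the use of re.search, and the module names are all hypothetical.

```python
import re
from typing import List, Tuple


def pick_learning_rate(
    module_name: str,
    rules: List[Tuple[str, float]],
    default_lr: float,
) -> float:
    """Return the learning rate of the first rule whose pattern matches.

    Assumption: rules are tried in order and the first match wins;
    modules that match no rule fall back to default_lr.
    """
    for pattern, lr in rules:
        if re.search(pattern, module_name):
            return lr
    return default_lr


# Hypothetical rules and module names, for illustration only.
rules = [
    (r"self_attn", 1e-4),
    (r"mlp", 5e-5),
]

attn_lr = pick_learning_rate("blocks.0.self_attn.q_proj", rules, default_lr=1e-5)
mlp_lr = pick_learning_rate("blocks.0.mlp.fc1", rules, default_lr=1e-5)
other_lr = pick_learning_rate("final_layer.linear", rules, default_lr=1e-5)
```

Pattern-based rules avoid listing every module by hand, at the cost of making rule order significant when patterns overlap.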
Kohya S
9144463f7b Merge branch 'dev' into sd3 2026-02-13 08:14:21 +09:00
Duoong
1640e53392 Fix bug and optimization Lumina training 2026-02-12 22:52:28 +07:00
duongve13112002
e21a7736f8 Support Anima model (#2260)
* Support Anima model

* Update document and fix bug

* Fix latent normalization

* Fix typo

* Fix cache embedding

* fix typo in tests/test_anima_cache.py

* Remove redundant argument apply_t5_attn_mask

* Improving caching with argument caption_dropout_rate

* Fix W&B logging bugs

* Fix discrete_flow_shift default value
2026-02-08 10:18:55 +09:00
Kohya S.
8b5ce3e641 Merge pull request #2255 from cgcalatrava/fix-diffusers-unet-import
Fix AttributeError for UNet2DConditionModel with newer diffusers versions
2026-01-20 07:50:04 +09:00
cgcalatrava
da07e4c617 Make UNet2DConditionModel import compatible with old and new diffusers versions 2026-01-19 20:53:00 +01:00
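
The commit above makes an import work across library versions in which a class has moved between modules. The actual change is not shown in this log; the sketch below shows one generic way to do it, with a hypothetical helper named import_first_available. The diffusers module paths in the trailing comment are an assumption about where the class has lived and should be checked against the installed version.

```python
import importlib
import json
from typing import Any, Sequence


def import_first_available(attr: str, module_paths: Sequence[str]) -> Any:
    """Return `attr` from the first module in `module_paths` that provides it."""
    last_error = None
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Module missing, or present but without the attribute:
            # remember the error and try the next candidate.
            last_error = exc
    raise ImportError(
        f"could not import {attr!r} from any of {list(module_paths)}"
    ) from last_error


# Demonstration with the standard library: the first candidate does not
# exist, so the helper falls through to the second one.
dumps = import_first_available("dumps", ["no_such_module_xyz", "json"])

# Assumed usage for the case in the commit (paths not confirmed by this log):
# UNet2DConditionModel = import_first_available(
#     "UNet2DConditionModel",
#     ["diffusers.models.unets.unet_2d_condition",
#      "diffusers.models.unet_2d_condition"],
# )
```

A plain try/except ImportError around two import statements achieves the same result; the helper only makes the candidate order explicit.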
Kohya S.
966e9d7f6b Merge pull request #2254 from kohya-ss/dev
Merge the changes from the sd3 branch into main
v0.10.0
2026-01-19 22:00:25 +09:00
Kohya S.
2a2760e702 Merge pull request #1374 from kohya-ss/sd3
support SD3
2026-01-19 21:50:22 +09:00
Kohya S.
b996440c5f Doc update sd3 branch documentation (#2253)
* doc: move sample prompt file documentation, and remove history for branch

* doc: remove outdated FLUX.1 and SD3 training information from README

* doc: update README and training documentation for clarity and structure
2026-01-19 21:38:46 +09:00