Merge pull request #770 from kohya-ss/dev

update doc and minor fix
2026-04-06 21:52:27 +00:00 · 2023-08-20 18:33:50 +09:00
parent 9947197a84 9be19ad777
commit 014c4b47c9
3 changed files with 13 additions and 2 deletions
--- a/docs/train_lllite_README-ja.md
+++ b/docs/train_lllite_README-ja.md
@@ -32,16 +32,21 @@ conditioning_data_dir = "path/to/conditioning/image/dir"

 現時点の制約として、random_cropは使用できません。

+学習データとしては、元のモデルで生成した画像を学習用画像として、そこから加工した画像をconditioning imageとするのが良いようです。元モデルと異なる画風の画像を学習用画像とすると、制御に加えて、その画風についても学ぶ必要が生じます。ControlNet-LLLiteは容量が少ないため、画風学習には不向きです。
+
+もし生成画像以外を学習用画像とする場合には、後述の次元数を多めにしてください。
+
 ### 学習
 スクリプトで生成する場合は、`sdxl_train_control_net_lllite.py` を実行してください。`--cond_emb_dim` でconditioning image embeddingの次元数を指定できます。`--network_dim` でLoRA的モジュールのrankを指定できます。その他のオプションは`sdxl_train_network.py`に準じますが、`--network_module`の指定は不要です。

+学習時にはメモリを大量に使用しますので、キャッシュやgradient checkpointingなどの省メモリ化のオプションを有効にしてください。また`--full_bf16` オプションで、BFloat16を使用するのも有効です（RTX 30シリーズ以降のGPUが必要です）。24GB VRAMで動作確認しています。
+
 conditioning image embeddingの次元数は、サンプルのCannyでは32を指定しています。LoRA的モジュールのrankは同じく64です。対象とするconditioning imageの特徴に合わせて調整してください。

 （サンプルのCannyは恐らくかなり難しいと思われます。depthなどでは半分程度にしてもいいかもしれません。）

 ### 推論

-
 スクリプトで生成する場合は、`sdxl_gen_img.py` を実行してください。`--control_net_lllite_models` でLLLiteのモデルファイルを指定できます。次元数はモデルファイルから自動取得します。

 `--guide_image_path`で推論に用いるconditioning imageを指定してください。なおpreprocessは行われないため、たとえばCannyならCanny処理を行った画像を指定してください（背景黒に白線）。`--control_net_preps`, `--control_net_weights`, `--control_net_ratios` には未対応です。
--- a/docs/train_lllite_README.md
+++ b/docs/train_lllite_README.md
@@ -37,10 +37,16 @@ conditioning_data_dir = "path/to/conditioning/image/dir"

 At the moment, random_crop cannot be used.

+As a training data, it seems to be better to use the images generated by the original model as training images and the images processed from them as conditioning images. If you use images with a different style from the original model as training images, the model will have to learn not only the control but also the style. ControlNet-LLLite is not suitable for style learning because of its small capacity.
+
+If you use images other than the generated images as training images, please increase the dimension as described below.
+
 ### Training

 Run `sdxl_train_control_net_lllite.py`. You can specify the dimension of the conditioning image embedding with `--cond_emb_dim`. You can specify the rank of the LoRA-like module with `--network_dim`. Other options are the same as `sdxl_train_network.py`, but `--network_module` is not required.

+Since a large amount of memory is used during training, please enable memory-saving options such as cache and gradient checkpointing. It is also effective to use BFloat16 with the `--full_bf16` option (requires RTX 30 series or later GPU). It has been confirmed to work with 24GB VRAM.
+
 For the sample Canny, the dimension of the conditioning image embedding is 32. The rank of the LoRA-like module is also 64. Adjust according to the features of the conditioning image you are targeting.

 (The sample Canny is probably quite difficult. It may be better to reduce it to about half for depth, etc.)
--- a/library/train_util.py
+++ b/library/train_util.py
@@ -1976,7 +1976,7 @@ def debug_dataset(train_dataset, show_input_ids=False):
                    if "conditioning_images" in example:
                        cond_img = example["conditioning_images"][j]
                        print(f"conditioning image size: {cond_img.size()}")
-                        cond_img = (cond_img.numpy() * 255.0).astype(np.uint8)
+                        cond_img = ((cond_img.numpy() + 1.0) * 127.5).astype(np.uint8)
                        cond_img = np.transpose(cond_img, (1, 2, 0))
                        cond_img = cond_img[:, :, ::-1]
                        if os.name == "nt":