From ceb19bebf849df1d9d6d5928eba777c95bfda8c4 Mon Sep 17 00:00:00 2001
From: Kohya S <ykumeykume@gmail.com>
Date: Sun, 13 Apr 2025 22:06:58 +0900
Subject: [PATCH] update docs. sdxl is translated, flux.1 is corrected

---
 docs/flux_train_network.md | 224 ++++++++++++++++++++++++++++++++++---
 docs/sd3_train_network.md  |   8 +-
 docs/sdxl_train_network.md | 195 +++++++++++++++++++++++++++++---
 3 files changed, 388 insertions(+), 39 deletions(-)
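
Note on the block-index syntax: section "4. 学習するブロックの指定" added to docs/flux_train_network.md describes values such as `0,1,8-12,18`, `all`, and `none` for `train_double_block_indices` and `train_single_block_indices`. The following is a minimal sketch of how such a value could be expanded, assuming 0-based inclusive ranges as the doc states. The function name `parse_block_indices` is hypothetical and this is not the parser used by `networks.lora_flux`.

```python
def parse_block_indices(spec: str, num_blocks: int) -> list[int]:
    """Expand a spec such as "0,1,8-12,18", "all" or "none" into sorted 0-based indices."""
    spec = spec.strip()
    if spec == "all":
        return list(range(num_blocks))
    if spec == "none":
        return []
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # "8-12" means blocks 8 through 12 inclusive
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        if not (0 <= start <= end < num_blocks):
            raise ValueError(f"index {part!r} is outside 0-{num_blocks - 1}")
        selected.update(range(start, end + 1))
    return sorted(selected)


# Block counts taken from the doc: 19 double blocks, 38 single blocks.
NUM_DOUBLE_BLOCKS = 19
NUM_SINGLE_BLOCKS = 38

double = parse_block_indices("0,1,8-12,18", NUM_DOUBLE_BLOCKS)
single = parse_block_indices("none", NUM_SINGLE_BLOCKS)
print(double)
print(single)
```

Rejecting out-of-range indices up front is a design choice of this sketch; the real script may handle them differently.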
diff --git a/docs/flux_train_network.md b/docs/flux_train_network.md
index d28d5877..46eee3e7 100644
--- a/docs/flux_train_network.md
+++ b/docs/flux_train_network.md
@@ -6,7 +6,7 @@
 
 `flux_train_network.py`は、FLUX.1モデルに対してLoRAなどの追加ネットワークを学習させるためのスクリプトです。FLUX.1はStable Diffusionとは異なるアーキテクチャを持つ画像生成モデルであり、このスクリプトを使用することで、特定のキャラクターや画風を再現するLoRAモデルを作成できます。
 
-このガイドは、基本的なLoRA学習の手順を理解しているユーザーを対象とし、`train_network.py`での学習経験があることを前提としています。基本的な使い方や共通のオプションについては、[`train_network.py`のガイド](how_to_use_train_network.md)を参照してください。
+このガイドは、基本的なLoRA学習の手順を理解しているユーザーを対象としています。基本的な使い方や共通のオプションについては、[`train_network.py`のガイド](train_network.md)を参照してください。また一部のパラメータは [`sdxl_train_network.py`](sdxl_train_network.md) と同様のものがあるため、そちらも参考にしてください。
 
 **前提条件:**
 
@@ -30,9 +30,9 @@
 1.  **学習スクリプト:** `flux_train_network.py`
 2.  **FLUX.1モデルファイル:** 学習のベースとなるFLUX.1モデルの`.safetensors`ファイル（例: `flux1-dev.safetensors`）。
 3.  **Text Encoderモデルファイル:**
-    *   CLIP-Lモデルの`.safetensors`ファイル。
-    *   T5-XXLモデルの`.safetensors`ファイル。
-4.  **AutoEncoderモデルファイル:** FLUX.1に対応するAEモデルの`.safetensors`ファイル。
+    *   CLIP-Lモデルの`.safetensors`ファイル。例として`clip_l.safetensors`を使用します。
+    *   T5-XXLモデルの`.safetensors`ファイル。例として`t5xxl.safetensors`を使用します。
+4.  **AutoEncoderモデルファイル:** FLUX.1に対応するAEモデルの`.safetensors`ファイル。例として`ae.safetensors`を使用します。
 5.  **データセット定義ファイル (.toml):** 学習データセットの設定を記述したTOML形式のファイル。（詳細は[データセット設定ガイド](link/to/dataset/config/doc)を参照してください）。
 
     *   例として`my_flux_dataset_config.toml`を使用します。
@@ -53,7 +53,7 @@ accelerate launch --num_cpu_threads_per_process 1 flux_train_network.py
  --output_dir="<output directory for training results>" 
  --output_name="my_flux_lora" 
  --save_model_as=safetensors 
- --network_module=networks.lora 
+ --network_module=networks.lora_flux 
  --network_dim=16 
  --network_alpha=1 
  --learning_rate=1e-4 
@@ -64,15 +64,18 @@ accelerate launch --num_cpu_threads_per_process 1 flux_train_network.py
  --save_every_n_epochs=1 
  --mixed_precision="fp16" 
  --gradient_checkpointing 
- --apply_t5_attn_mask 
+ --guidance_scale=1.0 
+ --timestep_sampling="flux_shift" 
  --blocks_to_swap=18
+ --cache_text_encoder_outputs 
+ --cache_latents
 ```
 
 ※実際には1行で書くか、適切な改行文字（`\` または `^`）を使用してください。
 
 ### 4.1. 主要なコマンドライン引数の解説（`train_network.py`からの追加・変更点）
 
-[`train_network.py`のガイド](how_to_use_train_network.md)で説明されている引数に加え、以下のFLUX.1特有の引数を指定します。共通の引数（`--output_dir`, `--output_name`, `--network_module`, `--network_dim`, `--network_alpha`, `--learning_rate`など）については、上記ガイドを参照してください。
+[`train_network.py`のガイド](train_network.md)で説明されている引数に加え、以下のFLUX.1特有の引数を指定します。共通の引数（`--output_dir`, `--output_name`, `--network_module`, `--network_dim`, `--network_alpha`, `--learning_rate`など）については、上記ガイドを参照してください。
 
 #### モデル関連 [必須]
 
@@ -87,26 +90,26 @@ accelerate launch --num_cpu_threads_per_process 1 flux_train_network.py
 
 #### FLUX.1 学習パラメータ
 
-*   `--t5xxl_max_token_length=<integer>`
-    *   T5-XXL Text Encoderで使用するトークンの最大長を指定します。省略した場合、モデルがschnell版なら256、dev版なら512が自動的に設定されます。データセットのキャプション長に合わせて調整が必要な場合があります。
-*   `--apply_t5_attn_mask`
-    *   T5-XXLの出力とFLUXモデル内部（Double Block）のアテンション計算時に、パディングトークンに対応するアテンションマスクを適用します。精度向上が期待できる場合がありますが、わずかに計算コストが増加します。
 *   `--guidance_scale=<float>`
-    *   FLUX.1 dev版は特定のガイダンススケール値で蒸留されているため、学習時にもその値を指定します。デフォルトは`3.5`です。schnell版では通常無視されます。
+    *   FLUX.1 dev版は特定のガイダンススケール値で蒸留されていますが、学習時には `1.0` を指定してガイダンススケールを無効化します。デフォルトは`3.5`ですので、必ず指定してください。schnell版では通常無視されます。
 *   `--timestep_sampling=<choice>`
-    *   学習時に使用するタイムステップ（ノイズレベル）のサンプリング方法を指定します。`sigma`, `uniform`, `sigmoid`, `shift`, `flux_shift` から選択します。デフォルトは `sigma` です。
+    *   学習時に使用するタイムステップ（ノイズレベル）のサンプリング方法を指定します。`sigma`, `uniform`, `sigmoid`, `shift`, `flux_shift` から選択します。デフォルトは `sigma` です。推奨は `flux_shift` です。
 *   `--sigmoid_scale=<float>`
-    *   `timestep_sampling` に `sigmoid` または `shift`, `flux_shift` を指定した場合のスケール係数です。デフォルトは`1.0`です。
+    *   `timestep_sampling` に `sigmoid` または `shift`, `flux_shift` を指定した場合のスケール係数です。デフォルトおよび推奨値は`1.0`です。
 *   `--model_prediction_type=<choice>`
-    *   モデルが何を予測するかを指定します。`raw` (予測値をそのまま使用), `additive` (ノイズ入力に加算), `sigma_scaled` (シグマスケーリングを適用) から選択します。デフォルトは `sigma_scaled` です。
+    *   モデルが何を予測するかを指定します。`raw` (予測値をそのまま使用), `additive` (ノイズ入力に加算), `sigma_scaled` (シグマスケーリングを適用) から選択します。デフォルトは `sigma_scaled` です。推奨は `raw` です。
 *   `--discrete_flow_shift=<float>`
-    *   Flow Matchingで使用されるスケジューラのシフト値を指定します。デフォルトは`3.0`です。
+    *   Flow Matchingで使用されるスケジューラのシフト値を指定します。デフォルトは`3.0`です。`timestep_sampling`に`flux_shift`を指定した場合は、この値は無視されます。
 
 #### メモリ・速度関連
 
 *   `--blocks_to_swap=<integer>` **[実験的機能]**
     *   VRAM使用量を削減するために、モデルの一部（Transformerブロック）をCPUとGPU間でスワップする設定です。スワップするブロック数を整数で指定します（例: `18`）。値を大きくするとVRAM使用量は減りますが、学習速度は低下します。GPUのVRAM容量に応じて調整してください。`gradient_checkpointing`と併用可能です。
     *   `--cpu_offload_checkpointing`とは併用できません。
+*   `--cache_text_encoder_outputs`
+    *   CLIP-LおよびT5-XXLの出力をキャッシュします。これにより、メモリ使用量が削減されます。
+*   `--cache_latents`, `--cache_latents_to_disk`
+    *   AEの出力をキャッシュします。[sdxl_train_network.py](sdxl_train_network.md)と同様の機能です。
 
 #### 非互換・非推奨の引数
 
@@ -116,7 +119,7 @@ accelerate launch --num_cpu_threads_per_process 1 flux_train_network.py
 
 ### 4.2. 学習の開始
 
-必要な引数を設定し、コマンドを実行すると学習が開始されます。基本的な流れやログの確認方法は[`train_network.py`のガイド](how_to_use_train_network.md#32-starting-the-training--学習の開始)と同様です。
+必要な引数を設定し、コマンドを実行すると学習が開始されます。基本的な流れやログの確認方法は[`train_network.py`のガイド](train_network.md#32-starting-the-training--学習の開始)と同様です。
 
 ## 5. 学習済みモデルの利用
 
@@ -124,4 +127,189 @@ accelerate launch --num_cpu_threads_per_process 1 flux_train_network.py
 
 ## 6. その他
 
-`flux_train_network.py`には、サンプル画像の生成 (`--sample_prompts`など) や詳細なオプティマイザ設定など、`train_network.py`と共通の機能も多く存在します。これらについては、[`train_network.py`のガイド](how_to_use_train_network.md#5-other-features--その他の機能)やスクリプトのヘルプ (`python flux_train_network.py --help`) を参照してください。
+`flux_train_network.py`には、サンプル画像の生成 (`--sample_prompts`など) や詳細なオプティマイザ設定など、`train_network.py`と共通の機能も多く存在します。これらについては、[`train_network.py`のガイド](train_network.md#5-other-features--その他の機能)やスクリプトのヘルプ (`python flux_train_network.py --help`) を参照してください。
+
+# FLUX.1 LoRA学習の補足説明
+
+以下は、以上の基本的なFLUX.1 LoRAの学習手順を補足するものです。より詳細な設定オプションなどについて説明します。
+
+## 1. VRAM使用量の最適化
+
+FLUX.1モデルは比較的大きなモデルであるため、十分なVRAMを持たないGPUでは工夫が必要です。以下に、VRAM使用量を削減するための設定を紹介します。
+
+### 1.1 メモリ使用量別の推奨設定
+
+| GPUメモリ | 推奨設定 |
+|----------|----------|
+| 24GB VRAM | 基本設定で問題なく動作します（バッチサイズ2） |
+| 16GB VRAM | バッチサイズ1に設定し、`--blocks_to_swap`を使用 |
+| 12GB VRAM | `--blocks_to_swap 16`と8bit AdamWを使用 |
+| 10GB VRAM | `--blocks_to_swap 22`を使用、T5XXLはfp8形式を推奨 |
+| 8GB VRAM | `--blocks_to_swap 28`を使用、T5XXLはfp8形式を推奨 |
+
+### 1.2 主要なVRAM削減オプション
+
+- **`--blocks_to_swap <数値>`**：
+  CPUとGPU間でブロックをスワップしてVRAM使用量を削減します。数値が大きいほど多くのブロックをスワップし、より多くのVRAMを節約できますが、学習速度は低下します。FLUX.1では最大35ブロックまでスワップ可能です。
+
+- **`--cpu_offload_checkpointing`**：
+  勾配チェックポイントをCPUにオフロードします。これにより最大1GBのVRAM使用量を削減できますが、学習速度は約15%低下します。`--blocks_to_swap`とは併用できません。
+
+- **`--cache_text_encoder_outputs` / `--cache_text_encoder_outputs_to_disk`**：
+  CLIP-LとT5-XXLの出力をキャッシュします。これによりメモリ使用量を削減できます。
+
+- **`--cache_latents` / `--cache_latents_to_disk`**：
+  AEの出力をキャッシュします。メモリ使用量を削減できます。
+
+- **Adafactor オプティマイザの使用**：
+  8bit AdamWよりもVRAM使用量を削減できます。以下の設定を使用してください：
+  ```
+  --optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" --lr_scheduler constant_with_warmup --max_grad_norm 0.0
+  ```
+
+- **T5XXLのfp8形式の使用**：
+  10GB未満のVRAMを持つGPUでは、T5XXLのfp8形式チェックポイントの使用を推奨します。[comfyanonymous/flux_text_encoders](https://huggingface.co/comfyanonymous/flux_text_encoders)から`t5xxl_fp8_e4m3fn.safetensors`をダウンロードできます（`scaled`なしで使用してください）。
+
+## 2. FLUX.1 LoRA学習の重要な設定オプション
+
+FLUX.1の学習には多くの未知の点があり、いくつかの設定は引数で指定できます。以下に重要な引数とその説明を示します。
+
+### 2.1 タイムステップのサンプリング方法
+
+`--timestep_sampling`オプションで、タイムステップ（0-1）のサンプリング方法を指定できます：
+
+- `sigma`：SD3と同様のシグマベース
+- `uniform`：一様ランダム
+- `sigmoid`：正規分布乱数のシグモイド（x-flux、AI-toolkitなどと同様）
+- `shift`：正規分布乱数のシグモイド値をシフト
+- `flux_shift`：解像度に応じて正規分布乱数のシグモイド値をシフト（FLUX.1 dev推論と同様）。この設定では`--discrete_flow_shift`は無視されます。
+
+### 2.2 モデル予測の処理方法
+
+`--model_prediction_type`オプションで、モデルの予測をどのように解釈し処理するかを指定できます：
+
+- `raw`：そのまま使用（x-fluxと同様）【推奨】
+- `additive`：ノイズ入力に加算
+- `sigma_scaled`：シグマスケーリングを適用（SD3と同様）
+
+### 2.3 推奨設定
+
+実験の結果、以下の設定が良好に動作することが確認されています：
+```
+--timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1.0
+```
+
+ガイダンススケールについて：FLUX.1 dev版は特定のガイダンススケール値で蒸留されていますが、学習時には`--guidance_scale 1.0`を指定してガイダンススケールを無効化することを推奨します。
+
+## 3. 各層に対するランク指定
+
+FLUX.1の各層に対して異なるランク（network_dim）を指定できます。これにより、特定の層に対してLoRAの効果を強調したり、無効化したりできます。
+
+以下のnetwork_argsを指定することで、各層のランクを指定できます。0を指定するとその層にはLoRAが適用されません。
+
+| network_args | 対象レイヤー |
+|--------------|--------------|
+| img_attn_dim | DoubleStreamBlockのimg_attn |
+| txt_attn_dim | DoubleStreamBlockのtxt_attn |
+| img_mlp_dim | DoubleStreamBlockのimg_mlp |
+| txt_mlp_dim | DoubleStreamBlockのtxt_mlp |
+| img_mod_dim | DoubleStreamBlockのimg_mod |
+| txt_mod_dim | DoubleStreamBlockのtxt_mod |
+| single_dim | SingleStreamBlockのlinear1とlinear2 |
+| single_mod_dim | SingleStreamBlockのmodulation |
+
+使用例：
+```
+--network_args "img_attn_dim=4" "img_mlp_dim=8" "txt_attn_dim=2" "txt_mlp_dim=2" "img_mod_dim=2" "txt_mod_dim=2" "single_dim=4" "single_mod_dim=2"
+```
+
+さらに、FLUXの条件付けレイヤーにLoRAを適用するには、network_argsに`in_dims`を指定します。5つの数値をカンマ区切りのリストとして指定する必要があります。
+
+例：
+```
+--network_args "in_dims=[4,2,2,2,4]"
+```
+
+各数値は、`img_in`、`time_in`、`vector_in`、`guidance_in`、`txt_in`に対応します。上記の例では、すべての条件付けレイヤーにLoRAを適用し、`img_in`と`txt_in`のランクを4、その他のランクを2に設定しています。
+
+0を指定するとそのレイヤーにはLoRAが適用されません。例えば、`[4,0,0,0,4]`は`img_in`と`txt_in`にのみLoRAを適用します。
+
+## 4. 学習するブロックの指定
+
+FLUX.1 LoRA学習では、network_argsの`train_double_block_indices`と`train_single_block_indices`を指定することで、学習するブロックを指定できます。インデックスは0ベースです。省略した場合のデフォルトはすべてのブロックを学習することです。
+
+インデックスは、`0,1,5,8`のような整数のリストや、`0,1,4-5,7`のような整数の範囲として指定します。
+- double blocksの数は19なので、有効な範囲は0-18です
+- single blocksの数は38なので、有効な範囲は0-37です
+- `all`を指定するとすべてのブロックを学習します
+- `none`を指定するとブロックを学習しません
+
+使用例：
+```
+--network_args "train_double_block_indices=0,1,8-12,18" "train_single_block_indices=3,10,20-25,37"
+```
+
+または：
+```
+--network_args "train_double_block_indices=none" "train_single_block_indices=10-15"
+```
+
+`train_double_block_indices`または`train_single_block_indices`のどちらか一方だけを指定した場合、もう一方は通常通り学習されます。
+
+## 5. Text Encoder LoRAのサポート
+
+FLUX.1 LoRA学習は、CLIP-LとT5XXL LoRAのトレーニングもサポートしています。
+
+- FLUX.1のみをトレーニングする場合は、`--network_train_unet_only`を指定します
+- FLUX.1とCLIP-Lをトレーニングする場合は、`--network_train_unet_only`を省略します
+- FLUX.1、CLIP-L、T5XXLすべてをトレーニングする場合は、`--network_train_unet_only`を省略し、`--network_args "train_t5xxl=True"`を追加します
+
+CLIP-LとT5XXLの学習率は、`--text_encoder_lr`で個別に指定できます。例えば、`--text_encoder_lr 1e-4 1e-5`とすると、最初の値はCLIP-Lの学習率、2番目の値はT5XXLの学習率になります。1つだけ指定すると、CLIP-LとT5XXLの学習率は同じになります。`--text_encoder_lr`を指定しない場合、デフォルトの学習率`--learning_rate`が両方に使用されます。
+
+## 6. マルチ解像度トレーニング
+
+データセット設定ファイルで複数の解像度を定義できます。各解像度に対して異なるバッチサイズを指定することができます。
+
+設定ファイルの例：
+```toml
+[general]
+# 共通設定をここで定義
+flip_aug = true
+color_aug = false
+keep_tokens_separator = "|||"
+shuffle_caption = false
+caption_tag_dropout_rate = 0
+caption_extension = ".txt"
+
+[[datasets]]
+# 最初の解像度の設定
+batch_size = 2
+enable_bucket = true
+resolution = [1024, 1024]
+
+  [[datasets.subsets]]
+  image_dir = "画像ディレクトリへのパス"
+  num_repeats = 1
+
+[[datasets]]
+# 2番目の解像度の設定
+batch_size = 3
+enable_bucket = true
+resolution = [768, 768]
+
+  [[datasets.subsets]]
+  image_dir = "画像ディレクトリへのパス"
+  num_repeats = 1
+
+[[datasets]]
+# 3番目の解像度の設定
+batch_size = 4
+enable_bucket = true
+resolution = [512, 512]
+
+  [[datasets.subsets]]
+  image_dir = "画像ディレクトリへのパス"
+  num_repeats = 1
+```
+
+各解像度セクションの`[[datasets.subsets]]`部分は、データセットディレクトリを定義します。各解像度に対して同じディレクトリを指定してください。
\ No newline at end of file
diff --git a/docs/sd3_train_network.md b/docs/sd3_train_network.md
index d5cc5a75..a5b7a82f 100644
--- a/docs/sd3_train_network.md
+++ b/docs/sd3_train_network.md
@@ -6,7 +6,7 @@
 
 `sd3_train_network.py`は、Stable Diffusion 3/3.5モデルに対してLoRAなどの追加ネットワークを学習させるためのスクリプトです。SD3は、MMDiT (Multi-Modal Diffusion Transformer) と呼ばれる新しいアーキテクチャを採用しており、従来のStable Diffusionモデルとは構造が異なります。このスクリプトを使用することで、SD3/3.5モデルに特化したLoRAモデルを作成できます。
 
-このガイドは、基本的なLoRA学習の手順を理解しているユーザーを対象とし、`train_network.py`での学習経験があることを前提としています。基本的な使い方や共通のオプションについては、[`train_network.py`のガイド](how_to_use_train_network.md)を参照してください。
+このガイドは、基本的なLoRA学習の手順を理解しているユーザーを対象とし、`train_network.py`での学習経験があることを前提としています。基本的な使い方や共通のオプションについては、[`train_network.py`のガイド](train_network.md)を参照してください。
 
 **前提条件:**
 
@@ -68,7 +68,7 @@ accelerate launch --num_cpu_threads_per_process 1 sd3_train_network.py
 
 ### 4.1. 主要なコマンドライン引数の解説（`train_network.py`からの追加・変更点）
 
-[`train_network.py`のガイド](how_to_use_train_network.md)で説明されている引数に加え、以下のSD3/3.5特有の引数を指定します。共通の引数（`--output_dir`, `--output_name`, `--network_module`, `--network_dim`, `--network_alpha`, `--learning_rate`など）については、上記ガイドを参照してください。
+[`train_network.py`のガイド](train_network.md)で説明されている引数に加え、以下のSD3/3.5特有の引数を指定します。共通の引数（`--output_dir`, `--output_name`, `--network_module`, `--network_dim`, `--network_alpha`, `--learning_rate`など）については、上記ガイドを参照してください。
 
 #### モデル関連
 
@@ -111,7 +111,7 @@ accelerate launch --num_cpu_threads_per_process 1 sd3_train_network.py
 
 ### 4.2. 学習の開始
 
-必要な引数を設定し、コマンドを実行すると学習が開始されます。基本的な流れやログの確認方法は[`train_network.py`のガイド](how_to_use_train_network.md#32-starting-the-training--学習の開始)と同様です。
+必要な引数を設定し、コマンドを実行すると学習が開始されます。基本的な流れやログの確認方法は[`train_network.py`のガイド](train_network.md#32-starting-the-training--学習の開始)と同様です。
 
 ## 5. 学習済みモデルの利用
 
@@ -119,4 +119,4 @@ accelerate launch --num_cpu_threads_per_process 1 sd3_train_network.py
 
 ## 6. その他
 
-`sd3_train_network.py`には、サンプル画像の生成 (`--sample_prompts`など) や詳細なオプティマイザ設定など、`train_network.py`と共通の機能も多く存在します。これらについては、[`train_network.py`のガイド](how_to_use_train_network.md#5-other-features--その他の機能)やスクリプトのヘルプ (`python sd3_train_network.py --help`) を参照してください。
+`sd3_train_network.py`には、サンプル画像の生成 (`--sample_prompts`など) や詳細なオプティマイザ設定など、`train_network.py`と共通の機能も多く存在します。これらについては、[`train_network.py`のガイド](train_network.md#5-other-features--その他の機能)やスクリプトのヘルプ (`python sd3_train_network.py --help`) を参照してください。
diff --git a/docs/sdxl_train_network.md b/docs/sdxl_train_network.md
index 8a19f7ae..e1f6e9b9 100644
--- a/docs/sdxl_train_network.md
+++ b/docs/sdxl_train_network.md
@@ -1,14 +1,27 @@
-はい、承知いたしました。`sd-scripts` リポジトリに含まれる `sdxl_train_network.py` を使用した SDXL LoRA 学習に関するドキュメントを作成します。`how_to_use_train_network.md` との差分を中心に、初心者ユーザー向けに解説します。
+# How to Use the SDXL LoRA Training Script `sdxl_train_network.py` / SDXL LoRA学習スクリプト `sdxl_train_network.py` の使い方
 
----
-
-# SDXL LoRA学習スクリプト `sdxl_train_network.py` の使い方
+This document explains the basic procedure for training a LoRA (Low-Rank Adaptation) model for SDXL (Stable Diffusion XL) using `sdxl_train_network.py` included in the `sd-scripts` repository.
 
+<details>
+<summary>日本語</summary>
 このドキュメントでは、`sd-scripts` リポジトリに含まれる `sdxl_train_network.py` を使用して、SDXL (Stable Diffusion XL) モデルに対する LoRA (Low-Rank Adaptation) モデルを学習する基本的な手順について解説します。
+</details>
 
-## 1. はじめに
+## 1. Introduction / はじめに
 
-`sdxl_train_network.py` は、SDXL モデルに対して LoRA などの追加ネットワークを学習させるためのスクリプトです。基本的な使い方は `train_network.py` ([LoRA学習スクリプト `train_network.py` の使い方](how_to_use_train_network.md) 参照) と共通ですが、SDXL モデル特有の設定が必要となります。
+`sdxl_train_network.py` is a script for training additional networks such as LoRA on SDXL models. Its basic usage is the same as that of `train_network.py` (see [How to Use the LoRA Training Script `train_network.py`](train_network.md)), but some SDXL-specific settings are required.
+
+This guide focuses on SDXL LoRA training, explaining the main differences from `train_network.py` and SDXL-specific configuration items.
+
+**Prerequisites:**
+
+* You have cloned the `sd-scripts` repository and set up the Python environment.
+* Your training dataset is ready. (Please refer to the [Dataset Preparation Guide](link/to/dataset/doc) for dataset preparation)
+* You have read [How to Use the LoRA Training Script `train_network.py`](train_network.md).
+
+<details>
+<summary>日本語</summary>
+`sdxl_train_network.py` は、SDXL モデルに対して LoRA などの追加ネットワークを学習させるためのスクリプトです。基本的な使い方は `train_network.py` ([LoRA学習スクリプト `train_network.py` の使い方](train_network.md) 参照) と共通ですが、SDXL モデル特有の設定が必要となります。
 
 このガイドでは、SDXL LoRA 学習に焦点を当て、`train_network.py` との主な違いや SDXL 特有の設定項目を中心に説明します。
 
@@ -16,10 +29,26 @@
 
 *   `sd-scripts` リポジトリのクローンと Python 環境のセットアップが完了していること。
 *   学習用データセットの準備が完了していること。（データセットの準備については[データセット準備ガイド](link/to/dataset/doc)を参照してください）
-*   [LoRA学習スクリプト `train_network.py` の使い方](how_to_use_train_network.md) を一読していること。
+*   [LoRA学習スクリプト `train_network.py` の使い方](train_network.md) を一読していること。
+</details>
 
-## 2. 準備
+## 2. Preparation / 準備
 
+Before starting training, you need the following files:
+
+1. **Training Script:** `sdxl_train_network.py`
+2. **Dataset Definition File (.toml):** A TOML format file describing the training dataset configuration.
+
+### About the Dataset Definition File
+
+The basic format of the dataset definition file (`.toml`) is the same as for `train_network.py`. Please refer to the [Dataset Configuration Guide](link/to/dataset/config/doc) and [How to Use the LoRA Training Script `train_network.py`](train_network.md#about-the-dataset-definition-file).
+
+For SDXL, it is common to use high-resolution datasets and the aspect ratio bucketing feature (`enable_bucket = true`).
+
+In this example, we'll use a file named `my_sdxl_dataset_config.toml`.
+
+<details>
+<summary>日本語</summary>
 学習を開始する前に、以下のファイルが必要です。
 
 1.  **学習スクリプト:** `sdxl_train_network.py`
@@ -27,14 +56,55 @@
 
 ### データセット定義ファイルについて
 
-データセット定義ファイル (`.toml`) の基本的な書き方は `train_network.py` と共通です。[データセット設定ガイド](link/to/dataset/config/doc) および [LoRA学習スクリプト `train_network.py` の使い方](how_to_use_train_network.md#データセット定義ファイルについて) を参照してください。
+データセット定義ファイル (`.toml`) の基本的な書き方は `train_network.py` と共通です。[データセット設定ガイド](link/to/dataset/config/doc) および [LoRA学習スクリプト `train_network.py` の使い方](train_network.md#データセット定義ファイルについて) を参照してください。
 
 SDXL では、高解像度のデータセットや、アスペクト比バケツ機能 (`enable_bucket = true`) の利用が一般的です。
 
 ここでは、例として `my_sdxl_dataset_config.toml` という名前のファイルを使用することにします。
+</details>
 
-## 3. 学習の実行
+## 3. Running the Training / 学習の実行
 
+Training starts by running `sdxl_train_network.py` from the terminal.
+
+Here's a basic command line execution example for SDXL LoRA training:
+
+```bash
+accelerate launch --num_cpu_threads_per_process 1 sdxl_train_network.py \
+ --pretrained_model_name_or_path="<SDXL base model path>" \
+ --dataset_config="my_sdxl_dataset_config.toml" \
+ --output_dir="<output directory for training results>" \
+ --output_name="my_sdxl_lora" \
+ --save_model_as=safetensors \
+ --network_module=networks.lora \
+ --network_dim=32 \
+ --network_alpha=16 \
+ --learning_rate=1e-4 \
+ --unet_lr=1e-4 \
+ --text_encoder_lr1=1e-5 \
+ --text_encoder_lr2=1e-5 \
+ --optimizer_type="AdamW8bit" \
+ --lr_scheduler="constant" \
+ --max_train_epochs=10 \
+ --save_every_n_epochs=1 \
+ --mixed_precision="bf16" \
+ --gradient_checkpointing \
+ --cache_text_encoder_outputs \
+ --cache_latents
+```
+
+Compared with the `train_network.py` execution example, the following points differ:
+
+* The script to execute is `sdxl_train_network.py`.
+* You specify an SDXL base model for `--pretrained_model_name_or_path`.
+* `--text_encoder_lr` is split into `--text_encoder_lr1` and `--text_encoder_lr2` (since SDXL has two Text Encoders).
+* `bf16` or `fp16` is recommended for `--mixed_precision`.
+* `--cache_text_encoder_outputs` and `--cache_latents` are recommended to reduce VRAM usage.
+
+Next, we'll explain the main command line arguments that differ from `train_network.py`. For common arguments, please refer to [How to Use the LoRA Training Script `train_network.py`](train_network.md#31-main-command-line-arguments).
+
+<details>
+<summary>日本語</summary>
 学習は、ターミナルから `sdxl_train_network.py` を実行することで開始します。
 
 以下に、SDXL LoRA 学習における基本的なコマンドライン実行例を示します。
@@ -71,10 +141,78 @@ accelerate launch --num_cpu_threads_per_process 1 sdxl_train_network.py
 *   `--mixed_precision` は `bf16` または `fp16` が推奨されます。
 *   `--cache_text_encoder_outputs` や `--cache_latents` は VRAM 使用量を削減するために推奨されます。
 
-次に、`train_network.py` との差分となる主要なコマンドライン引数について解説します。共通の引数については、[LoRA学習スクリプト `train_network.py` の使い方](how_to_use_train_network.md#31-主要なコマンドライン引数) を参照してください。
+次に、`train_network.py` との差分となる主要なコマンドライン引数について解説します。共通の引数については、[LoRA学習スクリプト `train_network.py` の使い方](train_network.md#31-主要なコマンドライン引数) を参照してください。
+</details>
 
-### 3.1. 主要なコマンドライン引数（差分）
+### 3.1. Main Command Line Arguments (Differences) / 主要なコマンドライン引数（差分）
 
+#### Model Related / モデル関連
+
+* `--pretrained_model_name_or_path="<model path>"` **[Required]**
+  * Specifies the **SDXL model** to be used as the base for training. You can specify a Hugging Face Hub model ID (e.g., `"stabilityai/stable-diffusion-xl-base-1.0"`), a local Diffusers format model directory, or a path to a `.safetensors` file.
+* `--v2`, `--v_parameterization`
+  * These arguments are for SD1.x/2.x. When using `sdxl_train_network.py`, since an SDXL model is assumed, these **typically do not need to be specified**.
+
+#### Dataset Related / データセット関連
+
+* `--dataset_config="<config file path>"`
+  * Same as in `train_network.py`.
+  * For SDXL, it is common to use high-resolution data and the bucketing feature (specify `enable_bucket = true` in the `.toml` file).
+
+#### Output & Save Related / 出力・保存関連
+
+* Same as in `train_network.py`.
+
+#### LoRA Parameters / LoRA パラメータ
+
+* Same as in `train_network.py`.
+
+#### Training Parameters / 学習パラメータ
+
+* `--learning_rate=1e-4`
+  * Overall learning rate. This becomes the default value if `unet_lr`, `text_encoder_lr1`, and `text_encoder_lr2` are not specified.
+* `--unet_lr=1e-4`
+  * Learning rate for LoRA modules in the U-Net part. If not specified, the value of `--learning_rate` is used.
+* `--text_encoder_lr1=1e-5`
+  * Learning rate for LoRA modules in **Text Encoder 1 (CLIP ViT-L/14)**. If not specified, the value of `--learning_rate` is used. A value smaller than the U-Net learning rate is recommended.
+* `--text_encoder_lr2=1e-5`
+  * Learning rate for LoRA modules in **Text Encoder 2 (OpenCLIP ViT-bigG/14)**. If not specified, the value of `--learning_rate` is used. A value smaller than the U-Net learning rate is recommended.
+* `--optimizer_type="AdamW8bit"`
+  * Same as in `train_network.py`.
+* `--lr_scheduler="constant"`
+  * Same as in `train_network.py`.
+* `--lr_warmup_steps`
+  * Same as in `train_network.py`.
+* `--max_train_steps`, `--max_train_epochs`
+  * Same as in `train_network.py`.
+* `--mixed_precision="bf16"`
+  * Mixed precision training setting. For SDXL, `bf16` or `fp16` is recommended. Choose the one supported by your GPU. This reduces VRAM usage and improves training speed.
+* `--gradient_accumulation_steps=1`
+  * Same as in `train_network.py`.
+* `--gradient_checkpointing`
+  * Same as in `train_network.py`. Enabling it is recommended for SDXL because of its high memory consumption.
+* `--cache_latents`
+  * Caches VAE outputs in memory (or on disk when `--cache_latents_to_disk` is specified). By skipping VAE computation, this reduces VRAM usage and speeds up training. Image augmentations (`--color_aug`, `--flip_aug`, `--random_crop`, etc.) are disabled. This option is recommended for SDXL training.
+* `--cache_latents_to_disk`
+  * Used together with `--cache_latents` to store the cache on disk. VAE outputs are written to disk the first time the dataset is loaded. This is recommended when you have a large number of training images, as it allows you to skip VAE computation on subsequent training runs.
+* `--cache_text_encoder_outputs`
+  * Caches Text Encoder outputs in memory (or on disk when `--cache_text_encoder_outputs_to_disk` is specified). By skipping Text Encoder computation, this reduces VRAM usage and speeds up training. Caption augmentations (`--shuffle_caption`, `--caption_dropout_rate`, etc.) are disabled.
+  * **Note:** When using this option, LoRA modules for Text Encoder cannot be trained (`--network_train_unet_only` must be specified).
+* `--cache_text_encoder_outputs_to_disk`
+  * Used together with `--cache_text_encoder_outputs` to store the cache on disk.
+* `--no_half_vae`
+  * Runs VAE in `float32` even when using mixed precision (`fp16`/`bf16`). Since SDXL's VAE can be unstable in `float16`, enable this when using `fp16`.
+* `--clip_skip`
+  * Not normally used for SDXL. No need to specify.
+* `--fused_backward_pass`
+  * Fuses gradient computation and optimizer steps to reduce VRAM usage. Available for SDXL. (Currently only supports the `Adafactor` optimizer)
+
+#### Others / その他
+
+* `--seed`, `--logging_dir`, `--log_prefix`, etc. are the same as in `train_network.py`.
+
+<details>
+<summary>日本語</summary>
 #### モデル関連
 
 *   `--pretrained_model_name_or_path="<モデルのパス>"` **[必須]**
@@ -130,7 +268,7 @@ accelerate launch --num_cpu_threads_per_process 1 sdxl_train_network.py
 *   `--cache_text_encoder_outputs_to_disk`
     *   `--cache_text_encoder_outputs` と併用し、キャッシュ先をディスクにします。
 *   `--no_half_vae`
-    *   混合精度 (`fp16`/`bf16`) 使用時でも VAE を `float32` で動作させます。SDXL の VAE は `float16` で不安定になることがあるため、`fp16` 指定時には有効にしてくだ
+    *   混合精度 (`fp16`/`bf16`) 使用時でも VAE を `float32` で動作させます。SDXL の VAE は `float16` で不安定になることがあるため、`fp16` 指定時には有効にしてください。
 *   `--clip_skip`
     *   SDXL では通常使用しません。指定は不要です。
 *   `--fused_backward_pass`
@@ -139,22 +277,45 @@ accelerate launch --num_cpu_threads_per_process 1 sdxl_train_network.py
 #### その他
 
 *   `--seed`, `--logging_dir`, `--log_prefix` などは `train_network.py` と共通です。
+</details>
 
-### 3.2. 学習の開始
+### 3.2. Starting the Training / 学習の開始
 
+After setting the necessary arguments, execute the command to start training. The training progress will be displayed on the console. The basic flow is the same as with `train_network.py`.
+
+<details>
+<summary>日本語</summary>
 必要な引数を設定し、コマンドを実行すると学習が開始されます。学習の進行状況はコンソールに出力されます。基本的な流れは `train_network.py` と同じです。
+</details>
 
-## 4. 学習済みモデルの利用
+## 4. Using the Trained Model / 学習済みモデルの利用
 
+When training is complete, a LoRA model file (`.safetensors`, etc.) with the name specified by `output_name` will be saved in the directory specified by `output_dir`.
+
+This file can be used with GUI tools that support SDXL, such as AUTOMATIC1111/stable-diffusion-webui and ComfyUI.
+
+<details>
+<summary>日本語</summary>
 学習が完了すると、`output_dir` で指定したディレクトリに、`output_name` で指定した名前の LoRA モデルファイル (`.safetensors` など) が保存されます。
 
 このファイルは、AUTOMATIC1111/stable-diffusion-webui 、ComfyUI などの SDXL に対応した GUI ツールで利用できます。
+</details>
 
-## 5. 補足: `train_network.py` との主な違い
+## 5. Supplement: Main Differences from `train_network.py` / 補足: `train_network.py` との主な違い
 
+* **Target Model:** `sdxl_train_network.py` is exclusively for SDXL models.
+* **Text Encoder:** Since SDXL has two Text Encoders, there are differences in learning rate specifications (`--text_encoder_lr1`, `--text_encoder_lr2`), etc.
+* **Caching Features:** `--cache_text_encoder_outputs` is particularly effective for SDXL and is recommended.
+* **Recommended Settings:** Due to high VRAM usage, mixed precision (`bf16` or `fp16`), `gradient_checkpointing`, and caching features (`--cache_latents`, `--cache_text_encoder_outputs`) are recommended. When using `fp16`, it is recommended to run the VAE in `float32` with `--no_half_vae`.
+
+For other detailed options, please refer to the script's help (`python sdxl_train_network.py --help`) and other documents in the repository.
+
+<details>
+<summary>日本語</summary>
 *   **対象モデル:** `sdxl_train_network.py` は SDXL モデル専用です。
 *   **Text Encoder:** SDXL は 2 つの Text Encoder を持つため、学習率の指定 (`--text_encoder_lr1`, `--text_encoder_lr2`) などが異なります。
 *   **キャッシュ機能:** `--cache_text_encoder_outputs` は SDXL で特に効果が高く、推奨されます。
 *   **推奨設定:** VRAM 使用量が大きいため、`bf16` または `fp16` の混合精度、`gradient_checkpointing`、キャッシュ機能 (`--cache_latents`, `--cache_text_encoder_outputs`) の利用が推奨されます。`fp16` 指定時は、VAE は `--no_half_vae` で `float32` 動作を推奨します。
 
-その他の詳細なオプションについては、スクリプトのヘルプ (`python sdxl_train_network.py --help`) やリポジトリ内の他のドキュメントを参照してください。
\ No newline at end of file
+その他の詳細なオプションについては、スクリプトのヘルプ (`python sdxl_train_network.py --help`) やリポジトリ内の他のドキュメントを参照してください。
+</details>
\ No newline at end of file
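
Note on the SDXL learning-rate arguments: docs/sdxl_train_network.md states that `--unet_lr`, `--text_encoder_lr1`, and `--text_encoder_lr2` each fall back to `--learning_rate` when they are not specified. The sketch below only illustrates that fallback rule. The helper name `resolve_learning_rates` and the returned dictionary keys are hypothetical and are not taken from `sdxl_train_network.py`.

```python
from typing import Optional


def resolve_learning_rates(
    learning_rate: float,
    unet_lr: Optional[float] = None,
    text_encoder_lr1: Optional[float] = None,
    text_encoder_lr2: Optional[float] = None,
) -> dict:
    """Return the effective learning rate per component, using learning_rate as the fallback."""

    def pick(value: Optional[float]) -> float:
        # An explicit value wins; None means "not specified on the command line".
        return learning_rate if value is None else value

    return {
        "unet": pick(unet_lr),
        "text_encoder1": pick(text_encoder_lr1),
        "text_encoder2": pick(text_encoder_lr2),
    }


# Values from the documented example command.
example = resolve_learning_rates(1e-4, unet_lr=1e-4, text_encoder_lr1=1e-5, text_encoder_lr2=1e-5)
# Only --learning_rate given: every component inherits it.
fallback = resolve_learning_rates(1e-4)
print(example)
print(fallback)
```

Using `None` rather than `0.0` as the "not specified" marker is an assumption of this sketch; how the script treats an explicit zero is not covered by the doc.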