rockerBOO
c149cf283b
Add parser args for other trainers.
2025-08-03 00:58:25 -04:00
rockerBOO
d6f158ddf6
Fix incorrect destructuring for load_abritrary_dataset
2025-01-08 18:48:05 -05:00
rockerBOO
9fde0d7972
Handle tuple return from generate_dataset_group_by_blueprint
2025-01-08 18:38:20 -05:00
Kohya S
cc11989755
fix: refactor huber-loss calculation in multiple training scripts
2024-12-01 21:20:28 +09:00
recris
740ec1d526
Fix issues found in review
2024-11-28 20:38:32 +00:00
recris
420a180d93
Implement pseudo Huber loss for Flux and SD3
2024-11-27 18:37:09 +00:00
Kohya S
5fba6f514a
Merge branch 'dev' into sd3
2024-10-25 19:03:27 +09:00
catboxanon
e1b63c2249
Only add a warning for the deprecated v-pred loss scaling
2024-10-21 08:12:53 -04:00
catboxanon
8fc30f8205
Fix training for V-pred and ztSNR
1) Updates debiased estimation loss function for V-pred.
2) Prevents now-deprecated scaling of loss if ztSNR is enabled.
2024-10-21 07:34:33 -04:00
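The debiased-estimation change above amounts to switching the per-timestep loss weight for v-prediction. A minimal sketch of the idea (names like `snr_t` and the clamp value are assumptions, not the repo's exact code):

```python
import torch

def apply_debiased_estimation(loss: torch.Tensor, snr_t: torch.Tensor,
                              v_prediction: bool = False) -> torch.Tensor:
    # Clamp SNR so near-noiseless timesteps don't get huge weights.
    snr_t = torch.minimum(snr_t, torch.full_like(snr_t, 1000.0))
    if v_prediction:
        # v-pred weight 1/(SNR+1) stays finite even at the SNR == 0
        # terminal timestep that ztSNR introduces.
        weight = 1.0 / (snr_t + 1.0)
    else:
        # Original debiased-estimation weight for eps-prediction.
        weight = 1.0 / torch.sqrt(snr_t)
    return weight * loss
```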
Kohya S
2500f5a798
fix latents caching not working, closes #1696
2024-10-15 07:16:34 +09:00
kohya-ss
c80c304779
Refactor caching in train scripts
2024-10-12 20:18:41 +09:00
Kohya S
f2bc820133
support weighted captions for SD/SDXL
2024-10-11 08:48:55 +09:00
Kohya S
d050638571
Merge branch 'dev' into sd3
2024-09-29 10:00:01 +09:00
Kohya S
fe2aa32484
adjust min/max bucket reso to be divisible by reso steps #1632
2024-09-29 09:49:25 +09:00
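Making the bucket bounds divisible by the resolution step is plain integer rounding; a tiny sketch (whether the repo rounds up or down is an assumption):

```python
def round_to_steps(reso: int, reso_steps: int) -> int:
    # Round down to the nearest multiple of reso_steps,
    # e.g. round_to_steps(300, 64) -> 256.
    return (reso // reso_steps) * reso_steps
```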
Plat
a823fd9fb8
Improve wandb logging (#1576)
* fix: wrong training steps were recorded to wandb, and no log was sent when logging_dir was not specified
* fix: checking of whether wandb is enabled
* feat: log images to wandb with their positive prompt as captions
* feat: log sample images' captions for sd3 and flux
* fix: import wandb before use
2024-09-11 22:21:16 +09:00
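The image-captioning part of the PR above boils down to wrapping each sample in wandb.Image; a minimal sketch using wandb's public API (the surrounding names are hypothetical):

```python
import wandb

def log_sample(step: int, image, positive_prompt: str) -> None:
    # wandb.Image accepts a PIL image or ndarray; the caption is shown
    # under the image in the wandb media panel.
    wandb.log({"sample": wandb.Image(image, caption=positive_prompt)}, step=step)
```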
Kohya S
41dee60383
Refactor caching mechanism for latents and text encoder outputs, etc.
2024-07-27 13:50:05 +09:00
Kohya S
c68baae480
add --log_config option to enable/disable outputting the training config
2024-05-19 17:21:04 +09:00
Maatra
2c9db5d9f2
passing filtered hyperparameters to accelerate
2024-04-20 14:11:43 +01:00
kabachuha
90b18795fc
Add option to use Scheduled Huber Loss in all training pipelines to improve resilience to data corruption (#1228)
* add huber loss and huber_c compute to train_util
* add reduction modes
* add huber_c retrieval from timestep getter
* move get timesteps and huber to own function
* add conditional loss to all training scripts
* add cond loss to train network
* add (scheduled) huber_loss to args
* fix up duplicated timesteps retrieval
* PHL-schedule should depend on noise scheduler's num timesteps
* *2 multiplier to Huber loss because of the 1/2·a^2 convergence (see the sketch after this entry).
The Taylor expansion of sqrt near zero gives (1/2)a^2, which differs from the a^2 of the standard MSE loss; this change scales the two better against one another.
* add option for smooth l1 (huber / delta)
* unify huber scheduling
* add snr huber scheduler
---------
Co-authored-by: Kohya S <52813779+kohya-ss@users.noreply.github.com>
2024-04-07 13:54:21 +09:00
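Putting the bullet points above together, a hedged sketch of scheduled pseudo-Huber loss (the formulas follow the Taylor-expansion rationale in the PR; the exact schedule in the repo may differ):

```python
import math
import torch

def conditional_loss(pred: torch.Tensor, target: torch.Tensor,
                     loss_type: str = "l2", huber_c: float = 0.1) -> torch.Tensor:
    if loss_type == "huber":
        # 2*c^2*(sqrt(1 + (a/c)^2) - 1) ~ a^2 near zero, so the *2
        # multiplier matches the scale of plain MSE.
        return 2 * huber_c**2 * (torch.sqrt(1 + ((pred - target) / huber_c) ** 2) - 1)
    if loss_type == "smooth_l1":
        # "huber / delta" variant: grows linearly for large residuals.
        return 2 * huber_c * (torch.sqrt(1 + ((pred - target) / huber_c) ** 2) - 1)
    return (pred - target) ** 2  # per-element MSE ("l2")

def exponential_huber_c(timestep: int, num_train_timesteps: int,
                        base_huber_c: float = 0.1) -> float:
    # One plausible schedule tied to the noise scheduler's timestep count:
    # decay huber_c from 1 at t=0 to base_huber_c at t=T, so high-noise
    # timesteps use a more outlier-robust loss.
    alpha = -math.log(base_huber_c) / num_train_timesteps
    return math.exp(-alpha * timestep)
```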
ykume
cd587ce62c
verify command line args if wandb is enabled
2024-04-05 08:23:03 +09:00
Kohya S
a2b8531627
make each script consistent, fix to work w/o DeepSpeed
2024-03-25 22:28:46 +09:00
Kohya S
fbb98f144e
Merge branch 'dev' into deep-speed
2024-03-20 18:15:26 +09:00
gesen2egee
095b8035e6
save state on train end
2024-03-10 23:33:38 +08:00
Kohya S
e3ccf8fbf7
make deepspeed_utils
2024-02-27 21:30:46 +09:00
Kohya S
eefb3cc1e7
Merge branch 'deep-speed' into deepspeed
2024-02-27 18:57:42 +09:00
Kohya S
f4132018c5
fix to work with cpu_count() == 1, closes #1134
2024-02-24 19:25:31 +09:00
BootsofLagrangian
4d5186d1cf
refactored code; some functions moved into train_utils.py
2024-02-22 16:20:53 +09:00
Kohya S
358ca205a3
Merge branch 'dev' into dev_device_support
2024-02-12 13:01:54 +09:00
Kohya S
e24d9606a2
add clean_memory_on_device and use it from training
2024-02-12 11:10:52 +09:00
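clean_memory_on_device presumably dispatches on the device type; a hedged sketch (the exact set of branches is an assumption):

```python
import gc
import torch

def clean_memory_on_device(device: torch.device) -> None:
    # Collect Python garbage first so dead tensors can actually be freed,
    # then ask the backend allocator to release its cached blocks.
    gc.collect()
    if device.type == "cuda":
        torch.cuda.empty_cache()
    elif device.type == "mps":
        torch.mps.empty_cache()
    elif device.type == "xpu":
        torch.xpu.empty_cache()
```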
BootsofLagrangian
03f0816f86
found the reason grad accum steps were not working: it was because of my accelerate settings
2024-02-09 17:47:49 +09:00
Kohya S
055f02e1e1
add logging args for training scripts
2024-02-08 21:16:42 +09:00
BootsofLagrangian
62556619bd
fix full_fp16 compatibility and train_step
2024-02-07 16:42:05 +09:00
BootsofLagrangian
7d2a9268b9
make offloading method runnable for all trainers
2024-02-05 22:42:06 +09:00
BootsofLagrangian
4295f91dcd
fix VAE handling in all trainers
2024-02-05 20:19:56 +09:00
Kohya S
efd3b58973
Add logging arguments and update logging setup
2024-02-04 20:44:10 +09:00
Yuta Hayashibe
5f6bf29e52
Replace print with logger where the output is a log (#905)
* Add get_my_logger()
* Use logger instead of print
* Fix log level
* Removed line-breaks for readability
* Use setup_logging()
* Add rich to requirements.txt
* Make simple
* Use logger instead of print
---------
Co-authored-by: Kohya S <52813779+kohya-ss@users.noreply.github.com>
2024-02-04 18:14:34 +09:00
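A setup_logging() helper of the kind the PR describes might look like this (a sketch, not the repo's exact code; rich comes from the requirements.txt addition noted above):

```python
import logging

def setup_logging(log_level: int = logging.INFO) -> None:
    # Configure the root logger once; prefer rich's handler when available.
    if logging.root.handlers:
        return  # already configured elsewhere
    try:
        from rich.logging import RichHandler
        handler: logging.Handler = RichHandler()
    except ImportError:
        handler = logging.StreamHandler()
    logging.basicConfig(level=log_level, handlers=[handler], format="%(message)s")

setup_logging()
logger = logging.getLogger(__name__)
logger.info("use logger instead of print")
```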
BootsofLagrangian
dfe08f395f
support deepspeed
2024-02-04 03:12:42 +09:00
Disty0
a6a2b5a867
Fix IPEX support and add XPU device to device_utils
2024-01-31 17:32:37 +03:00
Aarni Koskela
afc38707d5
Refactor memory cleaning into a single function
2024-01-23 14:28:50 +02:00
Aarni Koskela
6f3f701d3d
Deduplicate ipex initialization code
2024-01-19 18:07:36 +02:00
Kohya S
32b759a328
Add wandb_run_name parameter to init_kwargs #1032
2024-01-14 22:02:03 +09:00
Kohya S
912dca8f65
fix duplicated sample generation for every epoch, ref #907
2023-12-07 22:13:38 +09:00
Isotr0py
db84530074
Fix gradient synchronization for multi-GPU training (#989)
* delete DDP wrapper
* fix train_db vae and train_network
* fix train_db vae and train_network unwrap
* network grad sync
---------
Co-authored-by: Kohya S <52813779+kohya-ss@users.noreply.github.com>
2023-12-07 22:01:42 +09:00
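With the DDP wrapper deleted, gradients no longer sync automatically; the "network grad sync" step plausibly averages them by hand. A hedged sketch using accelerate's public reduce API (the PR's actual mechanism may differ):

```python
from accelerate import Accelerator

def sync_network_grads(accelerator: Accelerator, network) -> None:
    # Average each parameter's gradient across processes after backward().
    if accelerator.num_processes <= 1:
        return
    for param in network.parameters():
        if param.grad is not None:
            param.grad = accelerator.reduce(param.grad, reduction="mean")
```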
Kohya S
383b4a2c3e
Merge pull request #907 from shirayu/add_option_sample_at_first
Add option --sample_at_first
2023-12-03 21:00:32 +09:00
feffy380
6b3148fd3f
Fix min-snr-gamma for v-prediction and ZSNR.
This fixes min-snr for vpred+zsnr by dividing directly by SNR+1.
The old implementation did it in two steps: (min-snr/snr) * (snr/(snr+1)), which causes division by zero when combined with --zero_terminal_snr.
2023-11-07 23:02:25 +01:00
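In weight form, the fix replaces min(SNR, γ)/SNR · SNR/(SNR+1), which is undefined at SNR = 0, with the algebraically equal min(SNR, γ)/(SNR+1). A sketch of the corrected weight (names are illustrative):

```python
import torch

def min_snr_weight(snr: torch.Tensor, gamma: float) -> torch.Tensor:
    # Dividing by snr + 1 directly stays finite even at the zero-SNR
    # terminal timestep that --zero_terminal_snr introduces.
    return torch.minimum(snr, torch.full_like(snr, gamma)) / (snr + 1)
```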
Kohya S
6231aa91e2
common LR logging; set ddp_timeout default to None
2023-11-05 19:09:17 +09:00
Yuta Hayashibe
2c731418ad
Added sample_images() for --sample_at_first
2023-10-29 22:08:42 +09:00
Kohya S
96d877be90
support separate LR for Text Encoder for SD1/2
2023-10-29 21:30:32 +09:00
Kohya S
9d6a5a0c79
Merge pull request #899 from shirayu/use_moving_average
Show moving average loss in the progress bar
2023-10-29 14:37:58 +09:00
Yuta Hayashibe
63992b81c8
Fix initialization place of loss_recorder
2023-10-27 21:13:29 +09:00
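The moving-average display implies a small recorder created once, before the training loop (the initialization-place fix above); a hypothetical sketch, since the repo's loss_recorder may differ:

```python
class LossRecorder:
    """Tracks per-step losses and exposes a windowed moving average."""

    def __init__(self, window: int = 100):
        self.window = window
        self.losses: list[float] = []

    def add(self, loss: float) -> None:
        self.losses.append(loss)

    @property
    def moving_average(self) -> float:
        recent = self.losses[-self.window:]
        return sum(recent) / len(recent) if recent else 0.0
```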