I'm getting an OOM error when running
python enhance_a_video.py \
--version v2 \
--up_scale 4 --target_fps 24 --noise_aug 250 \
--solver_mode 'fast' --steps 15 \
--input_path 'prompts/' \
--prompt_path 'prompts/text_prompts.txt' \
--save_dir 'results/' \
--model_path 'ckpts/venhancer_v2.pt'
using the following GPU:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L4                      On  | 00000000:35:00.0 Off |                    0 |
| N/A   47C    P0              20W /  72W |      0MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
Error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.13 GiB (GPU 0; 21.96 GiB total capacity; 18.17 GiB already allocated; 837.06 MiB free; 20.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is it possible to apply BF16 quantization? My approach to running CogVideoX in 24 GB is tiled VAE decoding and sliced VAE decoding, plus CPU offload, with the whole pipeline in BF16. To quantize the model I use torchao:
from torchao.quantization import quantize_, int8_weight_only
from torchao.float8.inference import ActivationCasting, QuantConfig, quantize_to_float8

def quantize_model(part, quantization_scheme):
    # Quantize one pipeline component (e.g. the transformer) in place.
    if quantization_scheme == "int8":
        quantize_(part, int8_weight_only())
    elif quantization_scheme == "fp8":
        quantize_to_float8(part, QuantConfig(ActivationCasting.DYNAMIC))
    return part

Not sure whether this can be applied to your model too.
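
For context, here is roughly how those pieces fit together in my CogVideoX setup, as a minimal sketch built on the standard diffusers memory helpers (enable_model_cpu_offload, enable_tiling, enable_slicing); the model id, prompt, and frame count are illustrative, and quantize_model is the helper above:

import torch
from diffusers import CogVideoXPipeline

# Load the whole pipeline in BF16 (model id is illustrative).
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)

# Optionally quantize the transformer weights with the helper above.
pipe.transformer = quantize_model(pipe.transformer, "int8")

# Keep submodules on the CPU and move each to the GPU only while it runs.
pipe.enable_model_cpu_offload()

# Decode latents tile-by-tile and slice-by-slice to cap peak VRAM.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

video = pipe(prompt="a panda playing guitar", num_frames=49).frames[0]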
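
As an aside, the max_split_size_mb hint from the error message only mitigates allocator fragmentation; it does not shrink the ~18 GiB the weights and activations need. For completeness, it would be set like this (128 is just an example value):

import os

# Allocator hint from the error message; the 128 MiB split size is an
# example value. This must be set before torch initializes CUDA.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so it takes effect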