Hello,
the training process starts without any problem, but after some time it freezes: the console stops showing any new output.
When I check the GPUs at that point, GPU-Util (not memory) is at 100% while the process is frozen, which I think is a clue to the problem.
I have tried changing parameters like batch_size and num_workers, but it doesn't help.
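For reference, the kind of tweak I tried looks roughly like this minimal sketch (the dataset here is a random stand-in, not my actual data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the real one
dataset = TensorDataset(
    torch.randn(512, 3, 64, 64),      # placeholder inputs
    torch.randint(0, 10, (512,)),     # placeholder labels
)

train_loader = DataLoader(
    dataset,
    batch_size=16,    # tried several smaller batch sizes
    num_workers=0,    # tried 0 as well as a few worker counts
    pin_memory=True,
    shuffle=True,
)
```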
Can anyone help?
My environment is managed with miniconda3 and uses CUDA 11.8; the relevant versions are:
PyTorch 2.0.0
PyTorch Lightning 2.0.2