Description
System Info
- lerobot version: 0.3.4
- Platform: Linux-5.15.0-153-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface Hub version: 0.35.3
- Datasets version: 4.1.1
- Numpy version: 2.2.6
- PyTorch version: 2.7.1+cu126
- Is PyTorch built with CUDA support?: True
- Cuda version: 12.6
- GPU model: NVIDIA GeForce RTX 4090
- Using GPU in script?: <fill in>
On a different machine I had the same problem.
Information
- One of the scripts in the examples/ folder of LeRobot
- My own task or dataset (give details below)
Reproduction
After I merge datasets with the merge-datasets tool and run training on the result, I get the following traceback:
File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1515, in _next_data
return self._process_data(data, worker_id)
File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1550, in _process_data
data.reraise()
File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/_utils.py", line 750, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 349, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/lerobot/datasets/lerobot_dataset.py", line 989, in getitem
video_frames = self._query_videos(query_timestamps, ep_idx)
File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/lerobot/datasets/lerobot_dataset.py", line 954, in _query_videos
frames = decode_video_frames(video_path, shifted_query_ts, self.tolerance_s, self.video_backend)
File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/lerobot/datasets/video_utils.py", line 69, in decode_video_frames
return decode_video_frames_torchcodec(video_path, timestamps, tolerance_s)
File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/lerobot/datasets/video_utils.py", line 259, in decode_video_frames_torchcodec
frames_batch = decoder.get_frames_at(indices=frame_indices)
File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torchcodec/decoders/_video_decoder.py", line 227, in get_frames_at
data, pts_seconds, duration_seconds = core.get_frames_at_indices(
File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/_ops.py", line 756, in call
return self._op(*args, **kwargs)
RuntimeError: Invalid frame index=19603 for streamIndex=0; must be less than 1414
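My reading of the error (not a confirmed diagnosis): the chunk file that gets opened only contains 1414 frames, but the frame index seems to be computed against the whole merged episode, so index 19603 is out of range for that single file. A minimal sketch of that mismatch using torchcodec directly (the video path is a placeholder for one of the chunked files):

```python
from torchcodec.decoders import VideoDecoder

# Placeholder path: one of the chunked video files of the merged dataset.
video_path = "videos/observation.images.cam_high/chunk-000/file-000.mp4"

decoder = VideoDecoder(video_path)
num_frames = decoder.metadata.num_frames
print(f"frames in this file: {num_frames}")  # e.g. 1414 in my case

# The dataloader effectively asks for an index that was computed against the
# full merged timeline rather than this single file:
bad_index = 19603
if bad_index >= num_frames:
    print("index exceeds this file's frame count -> RuntimeError: Invalid frame index")
else:
    frames = decoder.get_frames_at(indices=[bad_index])
```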
I should mention that the error only happens if the videos in the dataset are split into more than one chunk per camera; if there is only one video file per camera, everything works.
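To check whether a merged dataset is affected, I count the video files per camera. A rough sketch, assuming the usual on-disk layout with one directory per camera key under videos/ (the exact layout may differ between dataset versions, and the root path is a placeholder):

```python
from collections import Counter
from pathlib import Path

# Placeholder: local root of the merged dataset.
root = Path("~/.cache/huggingface/lerobot/my_user/merged_dataset").expanduser()

# Count .mp4 files per camera directory; assumes videos/<camera_key>/.../*.mp4.
counts = Counter()
for mp4 in (root / "videos").rglob("*.mp4"):
    camera_key = mp4.relative_to(root / "videos").parts[0]
    counts[camera_key] += 1

for camera_key, n in counts.items():
    print(f"{camera_key}: {n} video file(s)")
# In my case, training fails whenever any camera has more than one file.
```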
Any ideas how I can increase the maximum video file size above the default 500 MB to work around the bug for now?
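The workaround I have in mind is roughly the sketch below: raise the file-size limit before running the merge so each camera ends up in a single file. I am guessing at both the constant name and its module (DEFAULT_VIDEO_FILE_SIZE_IN_MB in lerobot.datasets.utils); please correct me if the limit lives elsewhere or is exposed as a proper parameter:

```python
# Hypothetical workaround sketch: I am assuming the 500 MB limit is a
# module-level constant named DEFAULT_VIDEO_FILE_SIZE_IN_MB in
# lerobot.datasets.utils -- both the name and the location are guesses.
import lerobot.datasets.utils as ds_utils

ds_utils.DEFAULT_VIDEO_FILE_SIZE_IN_MB = 5000  # keep each camera in one file

# ...then run the merge/aggregation from this same process so it picks up the
# patched value. Note this only works if the merging code reads the module
# attribute at call time rather than importing the constant by name.
```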
Expected behavior
Training should work with multiple video chunks, or the videos should not be chunked.