Can't train on merged dataset if it contains more than one video file per camera #2328

@Grigorij-Dudnik

Description
System Info

- lerobot version: 0.3.4
- Platform: Linux-5.15.0-153-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface Hub version: 0.35.3
- Datasets version: 4.1.1
- Numpy version: 2.2.6
- PyTorch version: 2.7.1+cu126
- Is PyTorch built with CUDA support?: True
- Cuda version: 12.6
- GPU model: NVIDIA GeForce RTX 4090
- Using GPU in script?: <fill in>


I had the same problem on a different machine.

Information

  • One of the scripts in the examples/ folder of LeRobot
  • My own task or dataset (give details below)

Reproduction

After merging datasets with the merge-datasets tool and running training on the result, I get the following traceback:

  File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1515, in _next_data
    return self._process_data(data, worker_id)
  File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1550, in _process_data
    data.reraise()
  File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/_utils.py", line 750, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 349, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/lerobot/datasets/lerobot_dataset.py", line 989, in __getitem__
    video_frames = self._query_videos(query_timestamps, ep_idx)
  File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/lerobot/datasets/lerobot_dataset.py", line 954, in _query_videos
    frames = decode_video_frames(video_path, shifted_query_ts, self.tolerance_s, self.video_backend)
  File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/lerobot/datasets/video_utils.py", line 69, in decode_video_frames
    return decode_video_frames_torchcodec(video_path, timestamps, tolerance_s)
  File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/lerobot/datasets/video_utils.py", line 259, in decode_video_frames_torchcodec
    frames_batch = decoder.get_frames_at(indices=frame_indices)
  File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torchcodec/decoders/_video_decoder.py", line 227, in get_frames_at
    data, pts_seconds, duration_seconds = core.get_frames_at_indices(
  File "/home/gregor/Experiments/lerobot/venv/lib/python3.10/site-packages/torch/_ops.py", line 756, in __call__
    return self._op(*args, **kwargs)
RuntimeError: Invalid frame index=19603 for streamIndex=0; must be less than 1414

As mentioned above, the error happens when a dataset's videos are split into more than one file per camera; if there is only one video file per camera, everything works.

Any ideas how I can increase the maximum video file size above the default 500 MB to work around the bug for now?
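For context, the traceback reads like a chunk-indexing problem: the decoder receives a frame index that is global across the episode, while the opened video file only contains its own 1414 frames. A minimal pure-Python sketch of the translation that seems to be missing (the function name and frame counts are hypothetical, not LeRobot's actual implementation):

```python
# Sketch: when one camera's recording spans several video files, a global
# frame index must be translated into (file index, local frame index)
# before asking the decoder for frames. Hypothetical helper, not LeRobot API.

def locate_frame(global_idx: int, frames_per_file: list[int]) -> tuple[int, int]:
    """Map a global frame index to (file_index, local_index)."""
    offset = 0
    for file_idx, n_frames in enumerate(frames_per_file):
        if global_idx < offset + n_frames:
            return file_idx, global_idx - offset
        offset += n_frames
    raise IndexError(f"frame {global_idx} beyond total of {offset} frames")

# Example mirroring the traceback: if the first file holds 1414 frames,
# global index 19603 belongs to a later file and must be rebased, rather
# than being passed to file 0's decoder as-is.
frames_per_file = [1414, 1414, 1414, 20000]
print(locate_frame(19603, frames_per_file))  # (3, 15361)
```

If the merge tool rewrites episode metadata but the query path still computes indices against the first file only, that would explain why single-file datasets work and multi-file ones fail.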

Expected behavior

Training should work with multiple video files per camera, or videos should not be chunked at all.


    Labels

    • bug (Something isn't working correctly)
    • dataset (Issues regarding data inputs, processing, or datasets)
