fix(dataset) Fixing video indexing bug when using merged dataset | Fixes #[2328] | (🐛 Bug) | #2438
Conversation
Hey @andras-makany! I tested your PR; however, unfortunately it does not work. I am still receiving the same 'RuntimeError: Invalid frame index=52130 for streamIndex=0; must be less than 19675' when trying to train a model on the output dataset. What I did: I cloned your branch, installed lerobot from it, and ran the merge command. The output dataset is here: https://huggingface.co/datasets/Grigorij/xle_left_arm_merged_filtered_repaired/tree/main
Hey @Grigorij-Dudnik! Thank you for testing my attempt at this fix. Your attached dataset revealed a major problem. My dataset only had a single task, but yours has at least two. It seems that when the task description changes, the merging starts a new file. In this case, the offset was not reset, resulting in your error. I will look into a possible solution on Monday.
…ing occurs or having multiple episodes in one video file
@Grigorij-Dudnik So I found the problems that resulted in your error. The chunk and file indexing forced a single value onto every episode in a dataset. This caused your problem: some of your dataset's episodes were concatenated onto the previous file, but in the metadata they got the new file's index. The second error was due to having multiple episodes in one file. Your datasets had one episode per video, which meant that the episode count was equal to the loop count. Mine had multiple episodes per video (as does the resulting dataset). This caused missing indexes in my dataset when the aggregation logic was corrected for your case. As a solution, instead of a single value for the chunk and file index, I am using a list to keep track of the resulting indexes. The resulting dataset is correct both for your case and for mine. Tests were fine.
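A rough sketch of the bookkeeping change described above, using hypothetical names rather than lerobot's actual aggregation code: keeping one file index per episode in a list means episodes that were concatenated onto an earlier file keep that file's index in the metadata, instead of all episodes being stamped with the last file's index.

```python
# Hypothetical sketch only; names do not mirror lerobot's real aggregation code.
def file_index_per_episode(episode_durations, max_file_duration):
    """Record one file index per episode (a list), instead of overwriting a
    single shared value that would stamp every episode with the last file."""
    per_episode = []        # one entry per aggregated episode
    file_index = 0
    used_duration = 0.0
    for duration in episode_durations:
        if used_duration + duration > max_file_duration and used_duration > 0.0:
            file_index += 1       # this episode starts a new video file
            used_duration = 0.0
        per_episode.append(file_index)
        used_duration += duration
    return per_episode


# Episodes 0 and 1 stay in file 0; episode 2 rolls over into file 1.
print(file_index_per_episode([60.0, 30.0, 50.0], max_file_duration=100.0))  # [0, 0, 1]
```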
@andras-makany no worries, thank you for trying to solve the issue. About the first error, "The chunk and file indexing forced a single value onto every episode in a dataset. This caused your problem: some of your dataset's episodes were concatenated onto the previous file, but in the metadata they got the new file's index." - I didn't understand what you meant here, to be honest. But as I understand it, you managed to fix it? About the second error: saving episodes in different video files was done on purpose; otherwise we had ffmpeg problems during dataset collection. I can test it on Sunday or Monday and confirm whether it works for me.
@Grigorij-Dudnik thank you for your answer! The first problem was that, when aggregating datasets, if the video recordings of some episodes from a single source dataset get merged into one video file of the merged dataset and the videos of its other episodes go into a second video file, the metadata was set incorrectly, as if every episode were in the last used file. Yes, that problem has been solved. As for the second, thank you for the explanation.
What it does
First mentioned in issue #2328. When using a merged dataset (version 3.0), `torch.utils.data.DataLoader` encounters an exception that the given frame index is invalid. The error was due to not resetting the `latest_duration` offset in `aggregate_videos` when creating a new file, resulting in an offset equal to the last file's frame count when writing the second episode of the new file. The solution was to reset `latest_duration` after creating a new file.
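A minimal sketch of that fix, with illustrative names that follow the description above rather than the real `aggregate_videos` implementation: the running duration offset must drop back to zero whenever episodes start going into a fresh video file.

```python
# Illustrative sketch only; variable names follow the PR description, not the
# real aggregate_videos() code.
def episode_offsets(episode_durations, max_file_duration):
    """Timestamp offset of each episode inside the video file it lands in."""
    offsets = []
    latest_duration = 0.0
    for duration in episode_durations:
        if latest_duration + duration > max_file_duration and latest_duration > 0.0:
            latest_duration = 0.0  # the fix: reset when a new file is created
        offsets.append(latest_duration)
        latest_duration += duration
    return offsets


# With a 100 s file cap, the third episode opens a new file, so its offset must
# drop back to 0.0; without the reset it would start at 90.0 and point at
# frames past the end of that file.
print(episode_offsets([60.0, 30.0, 50.0], max_file_duration=100.0))  # [0.0, 60.0, 0.0]
```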
How it was tested
Using the fixed aggregation, I merged a new copy of my dataset containing 75 episodes, with the video file size set to 500 MB, creating 13 video files in the merged dataset.
Viewing the newly merged dataset, the offsets were correct.
Then I executed a model training run with a batch size of 10 and 100 training steps. After multiple tries, no exception was raised, so I assume the problem is solved.
The tests included in the repository passed.
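For reference, a rough sketch of that training-side check, simply iterating a DataLoader over the merged dataset; the `LeRobotDataset` import path and constructor below are assumptions and may differ between lerobot versions, and the repo id is the dataset linked earlier in this thread.

```python
# Hedged sketch: the import path and constructor are assumptions and may differ
# between lerobot versions; the repo id is the dataset from this conversation.
import torch
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("Grigorij/xle_left_arm_merged_filtered_repaired")
loader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True)

# Decoding video frames while iterating is where the original
# "RuntimeError: Invalid frame index ... for streamIndex=0" surfaced.
for step, batch in enumerate(loader):
    if step >= 100:  # roughly mirrors the 100 training steps mentioned above
        break
print("No frame-index errors raised while reading 100 batches")
```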
How to checkout & try? (for the reviewer)
Try merging smaller datasets so that the new dataset contains at least 2 video files. Viewing `meta/episodes/.../file-000.parquet`, the `from_timestamp` and `to_timestamp` values should be sequential with no skips and should reset to 0.0 when a new file starts. Training a model should give no errors as well.
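A small sketch of that parquet check, assuming flat `from_timestamp`/`to_timestamp` columns as named above (the real v3.0 schema may nest them per video key); the `...` in the path is kept from the description and should be replaced with the actual chunk directory of your merged dataset.

```python
# Sketch of the reviewer check; replace "..." with the actual chunk directory.
# Assumes flat from_timestamp/to_timestamp columns, which may not match the
# exact metadata schema.
import pandas as pd

df = pd.read_parquet("meta/episodes/.../file-000.parquet")
ts = df[["from_timestamp", "to_timestamp"]]
print(ts.head(20))

# Expect contiguous spans inside each video file, with from_timestamp dropping
# back to 0.0 whenever a new video file starts.
file_starts = ts.index[ts["from_timestamp"] == 0.0].tolist()
print("rows where a new video file starts:", file_starts)
```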