@gaikwadabhishek gaikwadabhishek commented Nov 21, 2025

  • Introduced environment variable USE_AIS_GET_BATCH to toggle AIS batch loading.
  • Updated PromptedAudioToTextLhotseDataset to utilize AISBatchLoader when enabled.
  • Modified LazyNeMoTarredIterator to handle URL-based recordings when AIS batch loading is active.

What does this PR do?

Implement URL-based audio loading using AIStore's Get-Batch API to improve data pipeline efficiency. This allows batch fetching of multiple audio files without local tar archive extraction, offloading processing to AIStore.
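The batching idea described above can be sketched roughly as follows. This is not the PR's code: `Recording`, `load_minibatch`, `fetch_many`, and the `ais://` URLs are hypothetical stand-ins chosen for illustration, and the real path goes through the `AISBatchLoader` named in this PR. The sketch only shows the shape of the change: collect the URLs for a minibatch and issue one batched request instead of one request per file.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Recording:
    """Hypothetical stand-in for a recording whose audio lives at a URL."""

    recording_id: str
    url: str


def load_minibatch(
    recordings: List[Recording],
    fetch_many: Callable[[List[str]], Dict[str, bytes]],
) -> Dict[str, bytes]:
    """Fetch audio bytes for a whole minibatch with a single batched call.

    `fetch_many` is a hypothetical callable standing in for a batch API:
    it takes a list of URLs and returns a mapping of URL -> raw bytes.
    """
    urls = [r.url for r in recordings]
    payloads = fetch_many(urls)  # one round trip instead of len(urls)
    return {r.recording_id: payloads[r.url] for r in recordings}


# A fake in-memory "store" so the sketch runs without any cluster.
_fake_store = {
    "ais://bucket/shard_0.tar/utt1.wav": b"RIFF-utt1",
    "ais://bucket/shard_0.tar/utt2.wav": b"RIFF-utt2",
}
calls: List[List[str]] = []


def fake_fetch_many(urls: List[str]) -> Dict[str, bytes]:
    """Record each call so we can count round trips."""
    calls.append(list(urls))
    return {u: _fake_store[u] for u in urls}


batch = [
    Recording("utt1", "ais://bucket/shard_0.tar/utt1.wav"),
    Recording("utt2", "ais://bucket/shard_0.tar/utt2.wav"),
]
audio = load_minibatch(batch, fake_fetch_many)
```

The fake fetcher records its calls, so the sketch makes the point directly: two recordings are loaded with one call rather than two.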

Collection: asr

Changelog

  • Updated PromptedAudioToTextLhotseDataset to utilize AISBatchLoader when enabled.
  • Modified LazyNeMoTarredIterator to handle URL-based recordings when AIS batch loading is active.

Usage

  • Set the USE_AIS_GET_BATCH environment variable to enable AIS batch loading. Audio is then loaded from URLs through AIStore's Get-Batch API, which fetches multiple audio files per request without local tar archive extraction and offloads that processing to AIStore.
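A minimal sketch of reading such a toggle, assuming the variable is parsed as a truthy string. The helper name `ais_get_batch_enabled` and the accepted spellings (`1`, `true`, `yes`) are assumptions made for illustration; the exact parsing is defined by the PR's code, not by this example.

```python
import os
from typing import Mapping, Optional


def ais_get_batch_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when USE_AIS_GET_BATCH holds a truthy string.

    The set of accepted values is an assumption for this sketch.
    """
    env = os.environ if env is None else env
    value = env.get("USE_AIS_GET_BATCH", "false")
    return value.strip().lower() in {"1", "true", "yes"}


# Hypothetical usage: set the variable before the data pipeline is built.
os.environ["USE_AIS_GET_BATCH"] = "true"
enabled = ais_get_batch_enabled()
```

Accepting an explicit mapping keeps the helper easy to test without mutating the process environment.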

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@gaikwadabhishek gaikwadabhishek marked this pull request as draft November 21, 2025 02:36
@gaikwadabhishek gaikwadabhishek marked this pull request as ready for review November 21, 2025 02:42
@gaikwadabhishek gaikwadabhishek force-pushed the get-batch branch 3 times, most recently from 6ae836b to 3e60dd6 Compare November 21, 2025 18:01
pzelasko previously approved these changes Nov 21, 2025
Collaborator

@pzelasko pzelasko left a comment

Great work, LGTM

@gaikwadabhishek
Author

Tests are failing because the required lhotse release is not available yet. Will re-run once the new lhotse release is in place.

auto-merge was automatically disabled November 24, 2025 17:51

Head branch was pushed to by a user without write access

@tbartley94 tbartley94 self-requested a review November 24, 2025 19:36
* Introduced environment variable USE_AIS_GET_BATCH to toggle AIS batch loading.
* Updated PromptedAudioToTextLhotseDataset to utilize AISBatchLoader when enabled.
* Modified LazyNeMoTarredIterator to handle URL-based recordings when AIS batch loading is active.

Implement URL-based audio loading using AIStore's Get-Batch API to improve
data pipeline efficiency. This allows batch fetching of multiple audio files
without local tar archive extraction, offloading processing to AIStore.

Signed-off-by: Abhishek Gaikwad <[email protected]>