Labels: enhancement (New feature or request), event gather pipeline (A feature or bugfix relating to event processing)
Description
Feature Description
A clear and concise description of the feature you're requesting.
Add parameters:
- `batch-size`: an optional integer used to iteratively slice the gathered events and run the pipeline on that many events at a time. For example, if the gather for the specified time range finds 50 events but the batch size is 10, the pipeline will run 5 independent times, each with 10 events to process (see the sketch after this list).
- `skip-errored-events-during-processing`: ignore events that raise an error during processing. Enough debug info should be gathered / kept so that the log printed after the pipeline finishes contains the event details and "the thing that errored".
- `skip-errored-events-during-gather`: ignore events that fail to scrape / gather. Similar to the above parameter, enough debug info should be printed after scraping, e.g. "Found 20 events, skipping 2 due to errors".
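A minimal sketch of how the batching and skip-during-processing behavior could work. The `process_event` callable and the plain list of gathered events are hypothetical placeholders, not names from the actual pipeline:

```python
from typing import Any, Callable, Iterator, List, Tuple


def batched(items: List[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield successive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def run_pipeline(
    events: List[Any],
    process_event: Callable[[Any], None],  # hypothetical per-event processing step
    batch_size: int = 10,
    skip_errored_events_during_processing: bool = True,
) -> None:
    skipped: List[Tuple[Any, Exception]] = []
    for batch in batched(events, batch_size):
        # Each batch is an independent pass over at most batch_size events.
        for event in batch:
            try:
                process_event(event)
            except Exception as exc:
                if not skip_errored_events_during_processing:
                    raise
                # Keep the event details and the error so they can be reported
                # in the log after the pipeline finishes.
                skipped.append((event, exc))
    if skipped:
        print(f"Processed {len(events) - len(skipped)} events, skipped {len(skipped)} due to errors:")
        for event, exc in skipped:
            print(f"  {event}: {exc!r}")
```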
It would also be really interesting to allow retrying on certain error types, e.g. `retry-errors=[ConnectionError]`.
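A rough sketch of how an allow-list of retryable errors might be handled; `process_with_retries`, `retry_errors`, and `max_attempts` are hypothetical names used only for illustration:

```python
def process_with_retries(event, process_event, retry_errors=(ConnectionError,), max_attempts=3):
    """Retry only the explicitly allowed error types; re-raise everything else."""
    for attempt in range(1, max_attempts + 1):
        try:
            return process_event(event)
        except retry_errors:
            # Allowed error types are retried; give up after max_attempts.
            if attempt == max_attempts:
                raise
```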
Use Case
Please provide a use case to help us understand your request in context.
I am backfilling a lot of data for certain instances, and it is becoming annoying to process it week by week. This is generally required for a couple of reasons:
- storage space on the machine (GHA runners only have 16 GB of disk, so they can't download and process more than ~4 meeting videos at a time) -- hence batch size
- there are errors in less than 1% of events that aren't random connection errors; these are things like the video page being parsed incorrectly.