[help] Completed dynamic branches not "recorded" when the pipeline restarts #1552
Replies: 1 comment
I've been looking into this and I found the problem. I realized that the missing branches always start appearing at the same time of day, coinciding with a systemd timer that starts the pipeline.
I can try to create a reproducible example if anyone considers this important enough.
Help
Description
I have a fairly long pipeline that iterates over 381 dates using dynamic branching, with `crew` and 2 local workers. The pipeline takes a long time because some targets are time-consuming: calibrations (23 min per date) and interpolations (24 min per date). Ideally, when running every day, it will only process the latest date, as the rest were already processed in previous runs. But the first time, it has to run all dates.
Pipeline code can be found here: https://github.com/emf-creaf/emf_meteorology but as it requires some API keys and hardcoded paths, it is not reproducible, sorry.
After some hours running, if I check the pipeline progress, I can see (first image) that 133 branches were skipped (they were done in a previous run) and 88 branches were completed (221 branches done in total). Checking the `targets` store, I can see the objects for those branches, and the metadata has been updated, as the progress shows.
If for some reason the pipeline is stopped and restarted (e.g. with `Ctrl+C`), what I see (second image) is that only 194 branches were skipped, so 27 branches are missing.
Nothing else has changed: no function code, no inputs, nothing. No already-completed branch should have been invalidated.
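For anyone wanting to pin down exactly which branches go missing, here is a minimal sketch of the comparison described above. It assumes two progress snapshots shaped like the output of `targets::tar_progress()` (columns `name` and `progress`), one saved before the pipeline is stopped and one taken after the restart. The helper name `lost_branches` and the toy branch names are hypothetical, made up for illustration only.

```r
# Hypothetical helper: names of branches that were done (skipped or completed)
# before the stop but are NOT reported as skipped after the restart.
lost_branches <- function(before, after) {
  done_before <- before$name[before$progress %in% c("skipped", "completed")]
  skipped_after <- after$name[after$progress == "skipped"]
  setdiff(done_before, skipped_after)
}

# Toy snapshots with made-up branch names, standing in for tar_progress() output.
before <- data.frame(
  name = c("calib_1", "calib_2", "calib_3", "calib_4"),
  progress = c("skipped", "completed", "completed", "dispatched"),
  stringsAsFactors = FALSE
)
after <- data.frame(
  name = c("calib_1", "calib_2", "calib_3"),
  progress = c("skipped", "skipped", "dispatched"),
  stringsAsFactors = FALSE
)

# calib_3 was completed before the stop but is being rerun after the restart.
lost <- lost_branches(before, after)
print(lost)
```

With a real store, `before` would have to be saved (for example with `saveRDS(targets::tar_progress(), "progress_before.rds")`) before stopping, since progress is reset when a new run starts. The resulting names could then be passed to `targets::tar_meta(names = tidyselect::any_of(lost))` to inspect the recorded hashes and timestamps for just those branches.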
Not only that: if I let the pipeline finish successfully, with no errors (which takes 5-6 days), the next run, which should skip the dates already done, starts with only about half of the dates skipped (a rough estimate; the number is not always the same). Only if I stop the pipeline every 4-6 hours or so are all the completed branches properly "recorded", so that restarting the pipeline results in the expected skipped branches.
How can I start debugging this?