This repository was archived by the owner on Jun 27, 2025. It is now read-only.
Checking batch job status fails #456
Open
Description
I have a batch job that performs a one-time, short-running task. A successful deployment looks like this:
2022-06-29T16:00:17Z |INFO| levant/deploy: triggering a deployment job_id=some_nomad_job_name
2022-06-29T16:00:18Z |INFO| levant/deploy: evaluation e9d76b4c-8f4b-68e5-05e3-eee20a82d225 finished successfully job_id=some_nomad_job_name
2022-06-29T16:00:18Z |DEBU| levant/job_status_checker: running job status checker for job job_id=some_nomad_job_name
2022-06-29T16:00:18Z |INFO| levant/job_status_checker: job has status running job_id=some_nomad_job_name
2022-06-29T16:00:18Z |INFO| levant/job_status_checker: task command in allocation 124b605d-518e-6292-5cd3-8decc4d033ec now in pending state job_id=some_nomad_job_name
2022-06-29T16:00:27Z |INFO| levant/job_status_checker: task command in allocation 124b605d-518e-6292-5cd3-8decc4d033ec now in running state job_id=some_nomad_job_name
2022-06-29T16:00:27Z |INFO| levant/job_status_checker: all allocations in deployment of job are running job_id=some_nomad_job_name
2022-06-29T16:00:27Z |INFO| levant/deploy: job deployment successful job_id=some_nomad_job_name
Today I got an error:
2022-07-06T14:57:01Z |INFO| levant/deploy: triggering a deployment job_id=some_nomad_job_name
2022-07-06T14:57:03Z |INFO| levant/deploy: evaluation ffa905f9-e937-e178-2e1a-d2b3d18ed8a8 finished successfully job_id=some_nomad_job_name
2022-07-06T14:57:03Z |DEBU| levant/job_status_checker: running job status checker for job job_id=some_nomad_job_name
2022-07-06T14:57:07Z |ERRO| levant/job_status_checker: job has status dead job_id=some_nomad_job_name
2022-07-06T14:57:07Z |ERRO| levant/deploy: job deployment failed job_id=some_nomad_job_name
In the successful deployment, the time between "levant/job_status_checker: running job status checker for job" and the first status line is 0 seconds.
In the failed one it is 4 seconds. During that time my job finished successfully and got the status 'dead', but Levant treats the job as simply dead, so it exits with a non-zero code and fails the CI pipeline.
As far as I can see, Levant has some problem communicating with Nomad, and it takes too long to get the job status.
Is it possible to disable the job status check? Asynchronous checking of short-lived tasks may fail unexpectedly.
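To illustrate what I mean, here is a rough sketch (not Levant's actual code) of how a status checker could tell a finished batch job apart from a failed one. It assumes the checker can see the job type, the job status, and each allocation's client status. The names `Allocation` and `classify_job` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Allocation:
    """Hypothetical, minimal view of a Nomad allocation."""
    alloc_id: str
    client_status: str  # e.g. "pending", "running", "complete", "failed", "lost"


def classify_job(job_type: str, job_status: str, allocations: List[Allocation]) -> str:
    """Return "in_progress", "succeeded" or "failed" for a deployed job.

    Assumption: a batch job whose status is "dead" has succeeded if it has
    at least one allocation and every allocation reports "complete".
    """
    if job_status in ("pending", "running"):
        # The job has not finished yet; a real checker would keep polling.
        return "in_progress"

    if job_status == "dead" and job_type == "batch":
        # A short-lived batch job can finish before the first status poll.
        if allocations and all(a.client_status == "complete" for a in allocations):
            return "succeeded"

    # Anything else (dead service job, failed or lost allocations) is a failure.
    return "failed"
```

With a check like this, the failed run above would be reported as a success, because the batch job was already 'dead' with its allocation complete when the first status poll returned.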