Description
We have a large, busy EKS cluster with nodes joining and leaving many times a day (spot instances failing or being replaced). We try to update each ASG separately with the ASG_NAMES setting. The problem is that eks-rolling-update always checks the whole cluster for the node count, so it often fails because the cluster-wide count does not match the expected value.
It should only monitor the selected ASG(s) for the expected instance count.
2021-02-10 16:26:57,425 INFO Current k8s node count is 94
2021-02-10 16:26:57,426 INFO Current k8s node count is 94
2021-02-10 16:26:57,426 INFO Waiting for k8s nodes to reach count 92...
2021-02-10 16:27:18,198 INFO Getting k8s nodes...
2021-02-10 16:27:19,341 INFO Current k8s node count is 94
2021-02-10 16:27:19,342 INFO Current k8s node count is 94
2021-02-10 16:27:19,342 INFO Waiting for k8s nodes to reach count 92...
2021-02-10 16:27:40,119 INFO Getting k8s nodes...
2021-02-10 16:27:41,470 INFO Current k8s node count is 94
2021-02-10 16:27:41,471 INFO Current k8s node count is 94
2021-02-10 16:27:41,471 INFO Waiting for k8s nodes to reach count 92...
...
2021-02-10 16:28:01,472 INFO Validation failed for cluster *****. Didn't reach expected node count 92.
2021-02-10 16:28:01,472 INFO Exiting since ASG healthcheck failed after 2 attempts
2021-02-10 16:28:01,472 ERROR ASG healthcheck failed
2021-02-10 16:28:01,472 ERROR *** Rolling update of ASG has failed. Exiting ***
2021-02-10 16:28:01,472 ERROR AWS Auto Scaling Group processes will need resuming manually
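To illustrate the requested behavior, here is a minimal sketch (not the tool's actual code; the function and field names are hypothetical) of a health check that only counts the nodes belonging to the ASGs listed in ASG_NAMES, so unrelated spot churn elsewhere in the cluster cannot fail it:

```python
def nodes_in_selected_asgs(k8s_nodes, asg_instance_ids):
    """Keep only the k8s nodes whose EC2 instance ID belongs to one
    of the ASGs being updated (ASG_NAMES).

    k8s_nodes: list of dicts with an 'instance_id' key (hypothetical shape).
    asg_instance_ids: set of instance IDs gathered from the selected ASGs.
    """
    return [n for n in k8s_nodes if n["instance_id"] in asg_instance_ids]


def asg_scoped_health_check(k8s_nodes, asg_instance_ids, expected_count):
    """Compare the expected count against the selected ASGs only,
    instead of against the whole cluster."""
    return len(nodes_in_selected_asgs(k8s_nodes, asg_instance_ids)) == expected_count


# Example: a 94-node cluster where the ASG under update owns 10 nodes.
cluster_nodes = [{"instance_id": f"i-{i:04d}"} for i in range(94)]
selected_ids = {f"i-{i:04d}" for i in range(10)}

# A cluster-wide check against 10 would fail (94 != 10),
# but the ASG-scoped check passes because only 10 nodes are counted.
assert asg_scoped_health_check(cluster_nodes, selected_ids, 10)
```

With this scoping, nodes from other ASGs joining or leaving mid-update would no longer affect the count the tool waits for.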