-
Notifications
You must be signed in to change notification settings - Fork 457
OCPBUGS-59925: Fix keepalived SIGTERM handling #5403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Previously the SIGTERM that is sent when kubelet shuts down the pod was being ignored, which caused keepalived to be SIGKILLed and not able to send the priority 0 message so another node takes the VIP immediately. This caused ~5 seconds of delay when keepalived and haproxy were restarted during upgrades, and if this happened on a node where the local apiserver is also unavailable it causes a temporary API outage. There appears to have been two reasons for this: 1) The socat call blocks SIGTERM handling in the bash script 2) Sending SIGTERM to keepalived without waiting for it to complete can cause the container to exit before priority 0 is sent Using the "wait" command seems to make this work as expected. Now socat is started in the background and wait is used to keep the script from exiting. A delay is also added to allow keepalived time to shut down cleanly. Timeouts are not needed for any of these calls because kubelet will already send SIGKILL after the grace period expires.
|
@cybertron: This pull request references Jira Issue OCPBUGS-59925, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@cybertron: This pull request references Jira Issue OCPBUGS-59925, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test e2e-metal-ipi-ovn-dualstack |
|
@cybertron: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/test e2e-metal-ipi-ovn-dualstack Failed on ovnk, probably not related but would still like to see it green. |
| if pid=$(pgrep -o keepalived); then | ||
| kill -s SIGTERM "$pid" | ||
| # Give keepalived time to shut down | ||
| while pgrep -o keepalived; do sleep 1; done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
killall -o -w -s SIGTERM keepalived maybe?
-o,--older-than kill processes older than TIME
-w,--wait wait for processes to die
or do we need the extra sleeps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nevermind. killall doesn't seem to match pgrep -o. Could pkill though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or could we do wait $pid it's a child process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, we don't want to be fast, we want to sleep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked about this with Ross offline, but documenting here for future reference:
The reason I'm looping on pgrep instead of using wait is that sometimes the pid we get back from keepalived isn't a child of the main script and wait will fail. I suspect that may be a bug in itself - we use -o to get the oldest pid, which would presumably be the parent keepalived process that was started by the main script, but it seems that isn't always true. It may be that the oldest pid somehow isn't always the main keepalived process.
In any case, another advantage of using pgrep is we will wait until all of the keepalived processes have exited so we should know for certain that priority 0 was sent by the time that completes. It's a bit inelegant, but it seems to be the safest way to handle this.
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: cybertron, rbbratta The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/verified by @rbbratta |
|
@rbbratta: This PR has been marked as verified by In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Previously the SIGTERM that is sent when kubelet shuts down the pod was being ignored, which caused keepalived to be SIGKILLed and not able to send the priority 0 message so another node takes the VIP immediately. This caused ~5 seconds of delay when keepalived and haproxy were restarted during upgrades, and if this happened on a node where the local apiserver is also unavailable it causes a temporary API outage.
There appears to have been two reasons for this:
can cause the container to exit before priority 0 is sent
Using the "wait" command seems to make this work as expected. Now socat is started in the background and wait is used to keep the script from exiting. A delay is also added to allow keepalived time to shut down cleanly.
Timeouts are not needed for any of these calls because kubelet will already send SIGKILL after the grace period expires.
- What I did
- How to verify it
- Description for the changelog