Leader KIC Controller Pod Becomes Non-Functional When Non-Leader Pod's Node Fails or Disconnects from Cluster #7665

@piyush280

Description

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When the node hosting a non-leader Kong Ingress Controller (KIC) pod fails or is disconnected from the Kubernetes cluster (e.g., through hardware failure, network disconnection, a stopped kubelet, or standalone mode activation), the leader controller pod becomes unresponsive even though it runs on a separate, healthy node.

Specific observations include:

  • The leader pod’s CPU usage drops significantly, which suggests it has stopped processing events.

  • The leader pod stops emitting logs, including:

    • "Successfully synced configuration to Kong" messages

    • Reconciliation and resource processing logs

  • The leader pod does not release its lease, but it also performs no active controller functions.

  • Ingress configuration stops updating across the proxy pods.

  • No crash, OOM, or restart occurs — the pod stays in Running state but is functionally stalled.

This leads to stalled ingress updates and potential traffic disruption, even though the leader pod is still technically alive and holding leadership.
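
For anyone trying to confirm the same state, the condition described above (lease still held, but no recent sync) can be expressed as a small check. This is only a rough sketch: the function name `is_stalled_leader`, the five-minute silence threshold, and the assumption that the lease's holder identity begins with the pod name are all hypothetical and not taken from KIC itself. The inputs would have to be gathered separately, for example from the Lease object and from the timestamp of the last "Successfully synced configuration to Kong" log line.

```python
from datetime import datetime, timedelta, timezone


def is_stalled_leader(holder_identity, pod_name, last_sync_time, now,
                      max_sync_silence=timedelta(minutes=5)):
    """Return True if the pod still holds the lease but has gone quiet.

    Assumptions (hypothetical, not confirmed against KIC):
      - the lease holder identity starts with the pod name
      - more than `max_sync_silence` without a sync log counts as stalled
    """
    holds_lease = holder_identity.startswith(pod_name)
    sync_is_stale = (now - last_sync_time) > max_sync_silence
    return holds_lease and sync_is_stale


# Example with made-up values: the leader last synced 20 minutes ago
# but is still recorded as the lease holder.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stalled = is_stalled_leader(
    holder_identity="kic-controller-abc_1234",  # hypothetical identity
    pod_name="kic-controller-abc",              # hypothetical pod name
    last_sync_time=now - timedelta(minutes=20),
    now=now,
)
print(stalled)
```

A check like this only flags the symptom; it does not explain why the leader stops processing events.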

Expected Behavior

The leader Kong Ingress Controller (KIC) pod should continue functioning normally and without interruption, regardless of failures or disconnections on nodes running non-leader controller pods.

Specifically, the leader pod should continue to:

  • Process Kubernetes resource events

  • Push configuration updates to the proxy pods

  • Emit regular sync logs (e.g., "Successfully synced configuration to Kong")

  • Renew its leadership lease

Failures affecting non-leader pods (such as node disconnection, kubelet shutdown, or standalone mode activation) should not:

  • Stall the leader pod

  • Cause loss of functionality

  • Block config syncs or event processing

Steps To Reproduce

Trigger a failure on a non-leader pod's node

On the node running any non-leader controller pod, simulate one of the following failure scenarios:

Option A (recommended on Bottlerocket):
    apiclient set kubernetes.standalone-mode=true

Option B (manually stop the kubelet):
    systemctl stop kubelet

Kong Ingress Controller version

kong/kubernetes-ingress-controller:3.2

Kubernetes version

v1.27.16-eks-4096722

Anything else?

[Image: CPU metrics]
