
Karpenter repeatedly creating launch templates, exhausting quota, during AWS outage scenario #8686

@jhughes-mc

Description


Observed Behavior:
An AWS outage caused Karpenter to repeatedly create launch templates. This exhausted the account limit and prevented new capacity from being created. The AWS outage began on 10/20 at around 1 AM PDT.

The issue started on version 1.5.0 and persisted after upgrading (1.5.0 > 1.5.5 > 1.8.0).

karpenter-749b756667-dnj5q {"level":"ERROR","time":"2025-10-20T13:30:36.045Z","logger":"controller","message":"Reconciler error","commit":"31fa2ea","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"spot-gpu-tdqhw"},"namespace":"","name":"spot-gpu-tdqhw","reconcileID":"38e77c45-d816-4235-b8eb-2c2f21385517","aws-error-code":"MaxTemplateLimitExceeded","aws-operation-name":"CreateLaunchTemplate","aws-request-id":"11a30a53-4686-4487-88ab-fd866640d4ef","aws-service-name":"EC2","aws-status-code":400,"error":"launching nodeclaim, creating instance, creating nodeclaim, getting launch template configs, getting launch templates, creating launch template, operation error EC2: CreateLaunchTemplate, https response error StatusCode: 400, RequestID: 11a30a53-4686-4487-88ab-fd866640d4ef, api error MaxTemplateLimitExceeded: Account has already reached max permitted templates (aws-error-code=MaxTemplateLimitExceeded, aws-operation-name=CreateLaunchTemplate, aws-request-id=11a30a53-4686-4487-88ab-fd866640d4ef, aws-service-name=EC2, aws-status-code=400)"}
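Each failure is emitted as one structured JSON log entry, so the volume of `CreateLaunchTemplate` failures can be tallied from saved controller logs. The following is a minimal sketch, assuming each line is an optional pod-name prefix followed by a single JSON object (as in the entry above). The helper names `parse_entry` and `count_aws_errors` are hypothetical; the field names `level`, `aws-error-code`, and `aws-operation-name` are taken from the log entry.

```python
import json
from collections import Counter
from typing import Iterable, Optional


def parse_entry(line: str) -> Optional[dict]:
    """Parse one log line: an optional pod-name prefix, then a JSON object."""
    start = line.find("{")
    if start == -1:
        return None  # no JSON payload on this line
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None  # truncated or wrapped line; skip it


def count_aws_errors(lines: Iterable[str]) -> Counter:
    """Count (operation, AWS error code) pairs across ERROR-level entries."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry and entry.get("level") == "ERROR" and "aws-error-code" in entry:
            counts[(entry.get("aws-operation-name"), entry["aws-error-code"])] += 1
    return counts
```

For example, after saving the controller logs to a file, `count_aws_errors(open("karpenter.log"))` returns a `Counter` whose `most_common()` shows which AWS operations are failing, and how often.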
Expected Behavior:
Karpenter recognizes the problem and does not loop endlessly, exhausting the launch template limit.
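One way a controller could recognize this condition is to treat a quota error as its own class and back off much more aggressively than for ordinary transient errors. The following is a hypothetical Python sketch of such a retry policy. It is not Karpenter's implementation, which is written in Go and may handle errors differently. The set of quota error codes and all delay values are illustrative assumptions.

```python
# Hypothetical retry policy for illustration only; NOT Karpenter's actual code.
# Assumption: these error codes mean "quota exhausted; retrying soon won't help".
QUOTA_ERROR_CODES = frozenset({"MaxTemplateLimitExceeded"})

BASE_DELAY_S = 1.0          # starting delay for ordinary transient errors
QUOTA_BASE_DELAY_S = 60.0   # much longer starting delay once quota is exhausted
MAX_DELAY_S = 15 * 60.0     # upper bound on any single delay


def retry_delay(error_code: str, attempt: int) -> float:
    """Return seconds to wait before retry number `attempt` (0-based).

    Uses capped exponential backoff, with a larger base for quota errors.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    base = QUOTA_BASE_DELAY_S if error_code in QUOTA_ERROR_CODES else BASE_DELAY_S
    return min(base * (2 ** attempt), MAX_DELAY_S)
```

Backoff on its own only slows the loop; it does not free quota that has already been consumed. It would need to be paired with reusing existing templates rather than creating new ones.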

Reproduction Steps (Please include YAML):
I'm not sure how to reproduce this. It would require recreating the outage conditions that occurred on the AWS side. An example of our EC2NodeClass and NodePool config is below.

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default-1
  labels:
    app.kubernetes.io/name: default
spec:
  metadataOptions:
    httpPutResponseHopLimit: 2
  detailedMonitoring: true
  amiSelectorTerms:
    - alias: bottlerocket@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        encrypted: true
        volumeSize: 20Gi
        volumeType: gp3
    - deviceName: /dev/xvdb
      ebs:
        encrypted: true
        volumeSize: 100Gi
        volumeType: gp3
NodePool
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    budgets:
      - nodes: 20%
    consolidateAfter: 1m
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    metadata:
      labels:
        lifecycle: OnDemand
        nodegroup-type: ondemand
    spec:
      expireAfter: 720h0m0s
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default-1
      requirements:
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.k8s.aws/instance-hypervisor
          operator: In
          values:
            - xen
            - nitro
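To check how close an account is to the launch template limit while trying to reproduce this, you can page through EC2 `DescribeLaunchTemplates` and count the templates Karpenter appears to own. The following is a minimal sketch, assuming boto3 and valid AWS credentials. The tag key `karpenter.k8s.aws/cluster` is an assumption, so verify it against your own templates. The helper names are hypothetical.

```python
from typing import Iterable, Mapping, Tuple

# Assumption: Karpenter-created launch templates carry this tag key.
# Check the tags on your own templates before relying on it.
KARPENTER_TAG_KEY = "karpenter.k8s.aws/cluster"


def count_tagged(templates: Iterable[Mapping],
                 tag_key: str = KARPENTER_TAG_KEY) -> Tuple[int, int]:
    """Return (total templates, templates carrying `tag_key`)."""
    total = tagged = 0
    for lt in templates:
        total += 1
        if any(t.get("Key") == tag_key for t in lt.get("Tags", [])):
            tagged += 1
    return total, tagged


def fetch_launch_templates(region: str) -> list:
    """Page through EC2 DescribeLaunchTemplates (needs boto3 and credentials)."""
    import boto3  # imported lazily so count_tagged works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    templates = []
    for page in ec2.get_paginator("describe_launch_templates").paginate():
        templates.extend(page.get("LaunchTemplates", []))
    return templates
```

For example, `count_tagged(fetch_launch_templates("<your-region>"))` compares the total number of templates in the account against the number that appear to be Karpenter-managed.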

Versions:

  • Chart Version: 1.8.0
  • Kubernetes Version (kubectl version): 1.33.5
