Description
Observed Behavior:
During the AWS outage that began on 10/20 around 1am PDT, Karpenter repeatedly created launch templates until the account's launch template limit was exhausted, which prevented new capacity from being created.
The issue started on version 1.5.0 and persisted after upgrading: 1.5.0 -> 1.5.5 -> 1.8.0.
karpenter-749b756667-dnj5q {"level":"ERROR","time":"2025-10-20T13:30:36.045Z","logger":"controller","message":"Reconciler error","commit":"31fa2ea","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"spot-gpu-tdqhw"},"namespace":"","name":"spot-gpu-tdqhw","reconcileID":"38e77c45-d816-4235-b8eb-2c2f21385517","aws-error-code":"MaxTemplateLimitExceeded","aws-operation-name":"CreateLaunchTemplate","aws-request-id":"11a30a53-4686-4487-88ab-fd866640d4ef","aws-service-name":"EC2","aws-status-code":400,"error":"launching nodeclaim, creating instance, creating nodeclaim, getting launch template configs, getting launch templates, creating launch template, operation error EC2: CreateLaunchTemplate, https response error StatusCode: 400, RequestID: 11a30a53-4686-4487-88ab-fd866640d4ef, api error MaxTemplateLimitExceeded: Account has already reached max permitted templates (aws-error-code=MaxTemplateLimitExceeded, aws-operation-name=CreateLaunchTemplate, aws-request-id=11a30a53-4686-4487-88ab-fd866640d4ef, aws-service-name=EC2, aws-status-code=400)"}
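For triage, the AWS error fields can be pulled out of a log line like the one above with a few lines of Python. This is a minimal sketch, assuming each entry is a single line made of the pod name followed by one JSON object. The helper names `parse_karpenter_log` and `aws_error_summary` are hypothetical, and the sample is a shortened copy of the log entry.

```python
import json


def parse_karpenter_log(line: str) -> dict:
    """Drop the pod-name prefix and decode the JSON payload (assumed layout)."""
    start = line.index("{")  # the JSON object begins at the first brace
    return json.loads(line[start:])


def aws_error_summary(entry: dict) -> dict:
    """Collect the aws-* fields that identify the failing API call."""
    keys = (
        "aws-error-code",
        "aws-operation-name",
        "aws-service-name",
        "aws-status-code",
    )
    return {key: entry.get(key) for key in keys}


# Shortened copy of the log entry from this issue.
sample = (
    'karpenter-749b756667-dnj5q {"level":"ERROR","message":"Reconciler error",'
    '"controller":"nodeclaim.lifecycle","NodeClaim":{"name":"spot-gpu-tdqhw"},'
    '"aws-error-code":"MaxTemplateLimitExceeded",'
    '"aws-operation-name":"CreateLaunchTemplate",'
    '"aws-service-name":"EC2","aws-status-code":400}'
)

summary = aws_error_summary(parse_karpenter_log(sample))
print(summary)
```

Grouping entries by `aws-error-code` and `aws-operation-name` makes it easier to see how often `CreateLaunchTemplate` failed during the outage window.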
Expected Behavior:
Karpenter recognizes the problem and does not loop, seemingly endlessly, until the account's launch template limit is exhausted.
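One way to picture this expectation is a cool-down that pauses launch template creation once a quota error has been seen. The sketch below is purely illustrative and is not Karpenter's code: the `QuotaCircuitBreaker` class, its method names, and the 300-second cool-down are all assumptions. Only the `MaxTemplateLimitExceeded` error code comes from the log above.

```python
import time

# Error codes treated as quota exhaustion; the code is taken from the log above.
QUOTA_ERROR_CODES = {"MaxTemplateLimitExceeded"}


class QuotaCircuitBreaker:
    """Hypothetical guard that pauses an API call after a quota error."""

    def __init__(self, cooldown_seconds: float = 300.0, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds  # assumed value, not a Karpenter setting
        self.clock = clock
        self.open_until = 0.0

    def allow_request(self) -> bool:
        """Return True when the cool-down (if any) has elapsed."""
        return self.clock() >= self.open_until

    def record_error(self, error_code: str) -> None:
        """Start a cool-down only for quota-style errors."""
        if error_code in QUOTA_ERROR_CODES:
            self.open_until = self.clock() + self.cooldown_seconds


# Usage with a fake clock so the behavior is deterministic.
now = [0.0]
breaker = QuotaCircuitBreaker(cooldown_seconds=300.0, clock=lambda: now[0])

breaker.record_error("MaxTemplateLimitExceeded")
blocked = breaker.allow_request()        # still inside the cool-down window

now[0] = 301.0
allowed_again = breaker.allow_request()  # cool-down has elapsed
```

With a guard like this, a reconciler would skip further `CreateLaunchTemplate` calls during the cool-down and not retry on every reconcile.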
Reproduction Steps (Please include YAML):
Not sure how to reproduce this; it would require recreating the outage scenario that occurred on the AWS side. An example of our nodeclass/nodepool config is below.
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default-1
  labels:
    app.kubernetes.io/name: default
spec:
  metadataOptions:
    httpPutResponseHopLimit: 2
  detailedMonitoring: true
  amiSelectorTerms:
    - alias: bottlerocket@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        encrypted: true
        volumeSize: 20Gi
        volumeType: gp3
    - deviceName: /dev/xvdb
      ebs:
        encrypted: true
        volumeSize: 100Gi
        volumeType: gp3
nodepool
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    budgets:
      - nodes: 20%
    consolidateAfter: 1m
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    metadata:
      labels:
        lifecycle: OnDemand
        nodegroup-type: ondemand
    spec:
      expireAfter: 720h0m0s
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default-1
      requirements:
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.k8s.aws/instance-hypervisor
          operator: In
          values:
            - xen
            - nitro
Versions:
- Chart Version: 1.8.0
- Kubernetes Version (kubectl version): 1.33.5
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment