Open
Labels
- area/ci: Issues or PRs related to CI
- kind/bug: Something isn't working
- lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
- needs-priority: Indicates an issue or PR needs a priority assigning to it
- needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
Description
What happened:
The e2e Move/Pivot test fails due to an unknown containerd issue.
How to run the test:
GINKGO_SKIP="" GINKGO_FOCUS="Pivot" make test-e2e
What happens:
- Very sporadically, when scaling up control plane nodes, the new Machine gets stuck in Provisioning. I can only reproduce this after the Cluster has been moved to itself.
- The container associated with the Machine is Created, but never Started:
> docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f44792628f4e kindest/node:v1.30.4 "/usr/local/bin/entr…" 45 minutes ago Created caprke2-e2e-fnk907-pivot-control-plane-qwzwm
0a0e80f73338 kindest/node:v1.30.4 "/usr/local/bin/entr…" 48 minutes ago Up 48 minutes caprke2-e2e-fnk907-pivot-md-0-2hxwq-s95cg
5911ad690f14 kindest/node:v1.30.4 "/usr/local/bin/entr…" 58 minutes ago Up 58 minutes 127.0.0.1:32851->6443/tcp caprke2-e2e-fnk907-pivot-control-plane-zrlrc
20643d6f3baf kindest/haproxy:v20230606-42a2262b "haproxy -W -db -f /…" 58 minutes ago Up 58 minutes 0.0.0.0:32798->6443/tcp, 0.0.0.0:32799->8404/tcp caprke2-e2e-fnk907-pivot-lb
409f2a2eff0d moby/buildkit:buildx-stable-1 "buildkitd --allow-i…" 5 months ago Up 4 days buildx_buildkit_rancher-turtles0
fdcbbc271ea7 moby/buildkit:buildx-stable-1 "buildkitd --allow-i…" 5 months ago Up 4 days buildx_buildkit_cluster-api-provider-rke20
- CAPD is unable to bootstrap this machine and fails in a loop:
I0701 08:02:30.134274 1 machine.go:392] "Failed running command" controller="dockermachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="DockerMachine" DockerMachine="bootstrap-pivot-cluster-ue8zsw/caprke2-e2e-fnk907-pivot-control-plane-9x22z" namespace="bootstrap-pivot-cluster-ue8zsw" name="caprke2-e2e-fnk907-pivot-control-plane-9x22z" reconcileID="44905eb4-bf56-4e89-b172-0168572bfa3d" Machine="bootstrap-pivot-cluster-ue8zsw/caprke2-e2e-fnk907-pivot-control-plane-qwzwm" Machine="bootstrap-pivot-cluster-ue8zsw/caprke2-e2e-fnk907-pivot-control-plane-qwzwm" Cluster="bootstrap-pivot-cluster-ue8zsw/caprke2-e2e-fnk907-pivot" instance="caprke2-e2e-fnk907-pivot-control-plane-qwzwm" command={"Cmd":"mkdir","Args":["-p","/etc/rancher/rke2"],"Stdin":""} stdout="" stderr="" bootstrap data="IyMgdGVtcGxhdGU6IGppbmphCiNjbG91ZC1jb25maWcKCndyaXRlX2ZpbGVzOgotICAgcGF0aDogL2V0Yy9yYW5jaGVyL3JrZTIvcmVnaXN0cmllcy55YW1sCiAgICBvd25lcjogcm9vdDpyb290CiAgICBwZXJtaXNzaW9uczogJzA2NDAnCiAgICBjb250ZW50OiB8CiAgICAgIGNvbmZpZ3M6IG51bGwKICAgICAgbWlycm9yczoge30KICAgICAgCi0gICBwYXRoOiAvZXRjL3JhbmNoZXIvcmtlMi9jb25maWcueWFtbAogICAgb3duZXI6IHJvb3Q6cm9vdAogICAgcGVybWlzc2lvbnM6ICcwNjQwJwogICAgY29udGVudDogfAogICAgICBkaXNhYmxlLWNsb3VkLWNvbnRyb2xsZXI6IHRydWUKICAgICAgZGlzYWJsZToKICAgICAgICAtIHJrZTItaW5ncmVzcy1uZ2lueAogICAgICBrdWJlLWFwaXNlcnZlci1hcmc6CiAgICAgICAgLSAtLWFub255bW91cy1hdXRoPXRydWUKICAgICAgdGxzLXNhbjoKICAgICAgICAtIDE3Mi4xOC4wLjMKICAgICAgY2x1c3Rlci1jaWRyOiAxMC40NS4wLjAvMTYKICAgICAgc2VydmljZS1jaWRyOiAxMC40Ni4wLjAvMTYKICAgICAga3ViZWxldC1hcmc6CiAgICAgICAgLSBhbm9ueW1vdXMtYXV0aD10cnVlCiAgICAgIHNlcnZlcjogaHR0cHM6Ly8xNzIuMTguMC4zOjkzNDUKICAgICAgdG9rZW46IDRhYmY0MWRiM2MyYjllMGRhZjkzMzhjOTdkNWFmNTgzCiAgICAgIAoKCnJ1bmNtZDoKICAtICdjdXJsIC1zZkwgaHR0cHM6Ly9nZXQucmtlMi5pbyB8IElOU1RBTExfUktFMl9WRVJTSU9OPXYxLjMxLjArcmtlMnIxIHNoIC1zIC0gc2VydmVyJwogIC0gJ3N5c3RlbWN0bCBlbmFibGUgcmtlMi1zZXJ2ZXIuc2VydmljZScKICAtICdzeXN0ZW1jdGwgc3RhcnQgcmtlMi1zZXJ2ZXIuc2VydmljZScKICAtICdta2RpciAtcCAvcnVuL2NsdXN0ZXItYXBpJw
ogIC0gJ2VjaG8gc3VjY2VzcyA+IC9ydW4vY2x1c3Rlci1hcGkvYm9vdHN0cmFwLXN1Y2Nlc3MuY29tcGxldGUnCg=="
I0701 08:02:30.135344 1 machine.go:573] "Got logs from the machine container" controller="dockermachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="DockerMachine" DockerMachine="bootstrap-pivot-cluster-ue8zsw/caprke2-e2e-fnk907-pivot-control-plane-9x22z" namespace="bootstrap-pivot-cluster-ue8zsw" name="caprke2-e2e-fnk907-pivot-control-plane-9x22z" reconcileID="44905eb4-bf56-4e89-b172-0168572bfa3d" Machine="bootstrap-pivot-cluster-ue8zsw/caprke2-e2e-fnk907-pivot-control-plane-qwzwm" Machine="bootstrap-pivot-cluster-ue8zsw/caprke2-e2e-fnk907-pivot-control-plane-qwzwm" Cluster="bootstrap-pivot-cluster-ue8zsw/caprke2-e2e-fnk907-pivot" output=<
Inspected the container:
{"Id":"f44792628f4e98a35e0eeff514cad90def39f18257b592a7bca1338d8ece0a13","Created":"2025-07-01T07:25:20.254689114Z","Path":"/usr/local/bin/entrypoint","Args":["/sbin/init"],"State":{"Status":"created","Running":false,"Paused":false,"Restarting":false,"OOMKilled":false,"Dead":false,"Pid":0,"ExitCode":0,"Error":"","StartedAt":"0001-01-01T00:00:00Z","FinishedAt":"0001-01-01T00:00:00Z"},"Image":"sha256:ea9c9420224022fe8410c8b9d96d80385ec9b0e444c0d1681fd71f118cb6377a","ResolvConfPath":"","HostnamePath":"","HostsPath":"","LogPath":"/var/lib/docker/containers/f44792628f4e98a35e0eeff514cad90def39f18257b592a7bca1338d8ece0a13/f44792628f4e98a35e0eeff514cad90def39f18257b592a7bca1338d8ece0a13-json.log","Name":"/caprke2-e2e-fnk907-pivot-control-plane-qwzwm","RestartCount":0,"Driver":"overlay2","Platform":"linux","MountLabel":"","ProcessLabel":"","AppArmorProfile":"unconfined","ExecIDs":null,"HostConfig":{"Binds":["/var/run/docker.sock:/var/run/docker.sock","/home/andrea/repos/cluster-api-provider-rke2/out/images:/var/lib/rancher/rke2/agent/images","/lib/modules:/lib/modules:ro"],"ContainerIDFile":"","LogConfig":{"Type":"json-file","Config":{"max-file":"5","max-size":"10m"}},"NetworkMode":"kind","PortBindings":{"6443/tcp":[{"HostIp":"127.0.0.1","HostPort":"0"}]},"RestartPolicy":{"Name":"on-failure","MaximumRetryCount":1},"AutoRemove":false,"VolumeDriver":"","VolumesFrom":null,"ConsoleSize":[0,0],"CapAdd":null,"CapDrop":null,"CgroupnsMode":"private","Dns":null,"DnsOptions":null,"DnsSearch":null,"ExtraHosts":null,"GroupAdd":null,"IpcMode":"private","Cgroup":"","Links":null,"OomScoreAdj":0,"PidMode":"","Privileged":true,"PublishAllPorts":false,"ReadonlyRootfs":false,"SecurityOpt":["seccomp=unconfined","apparmor=unconfined","label=disable"],"Tmpfs":{"/run":"","/tmp":""},"UTSMode":"","UsernsMode":"","ShmSize":67108864,"Runtime":"runc","Isolation":"","CpuShares":0,"Memory":0,"NanoCpus":0,"CgroupParent":"","BlkioWeight":0,"BlkioWeightDevice":null,"BlkioDeviceReadBps":null,"BlkioDeviceWri
teBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteIOps":null,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpusetCpus":"","CpusetMems":"","Devices":null,"DeviceCgroupRules":null,"DeviceRequests":null,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":null,"OomKillDisable":false,"PidsLimit":null,"Ulimits":null,"CpuCount":0,"CpuPercent":0,"IOMaximumIOps":0,"IOMaximumBandwidth":0,"MaskedPaths":null,"ReadonlyPaths":null,"Init":false},"GraphDriver":{"Data":{"ID":"f44792628f4e98a35e0eeff514cad90def39f18257b592a7bca1338d8ece0a13","LowerDir":"/var/lib/docker/overlay2/a1174e81ebbb1c950829b4eaffd79f2bb733e0ecccc77648fb0c3e36bd2e4f40-init/diff:/var/lib/docker/overlay2/0f3f34eab8ad0f0ee4c00fad856a13b327a017980e8c7a8fb4d5f3f6bb3f644f/diff:/var/lib/docker/overlay2/06501e4a9f1b13df72873b89dced1ba8d7b7ea87444e5b269771765a46ab9cc8/diff","MergedDir":"/var/lib/docker/overlay2/a1174e81ebbb1c950829b4eaffd79f2bb733e0ecccc77648fb0c3e36bd2e4f40/merged","UpperDir":"/var/lib/docker/overlay2/a1174e81ebbb1c950829b4eaffd79f2bb733e0ecccc77648fb0c3e36bd2e4f40/diff","WorkDir":"/var/lib/docker/overlay2/a1174e81ebbb1c950829b4eaffd79f2bb733e0ecccc77648fb0c3e36bd2e4f40/work"},"Name":"overlay2"},"Mounts":[{"Type":"bind","Source":"/var/run/docker.sock","Destination":"/var/run/docker.sock","Mode":"","RW":true,"Propagation":"rprivate"},{"Type":"bind","Source":"/home/andrea/repos/cluster-api-provider-rke2/out/images","Destination":"/var/lib/rancher/rke2/agent/images","Mode":"","RW":true,"Propagation":"rprivate"},{"Type":"bind","Source":"/lib/modules","Destination":"/lib/modules","Mode":"ro","RW":false,"Propagation":"rprivate"},{"Type":"volume","Name":"be5be41062dc3e50b55c3c56319cad560c142d1e919ec4d7e0d3dcb1780ebcac","Source":"/var/lib/docker/volumes/be5be41062dc3e50b55c3c56319cad560c142d1e919ec4d7e0d3dcb1780ebcac/_data","Destination":"/var","Driver":"local","Mode":"","RW":true,"Propagation":""}],"Config":{"Hostname":"caprke2-e2e-fnk907-pivot-control-plane-qwzwm","Domai
nname":"","User":"","AttachStdin":false,"AttachStdout":false,"AttachStderr":false,"ExposedPorts":{"6443/tcp":{}},"Tty":true,"OpenStdin":false,"StdinOnce":false,"Env":["KUBECONFIG=/etc/kubernetes/admin.conf","PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin","container=docker","HTTP_PROXY=","HTTPS_PROXY=","NO_PROXY="],"Cmd":null,"Image":"kindest/node:v1.30.4","Volumes":{"/var":{}},"WorkingDir":"/","Entrypoint":["/usr/local/bin/entrypoint","/sbin/init"],"OnBuild":null,"Labels":{"io.x-k8s.kind.cluster":"caprke2-e2e-fnk907-pivot","io.x-k8s.kind.role":"control-plane"},"StopSignal":"SIGRTMIN+3"},"NetworkSettings":{"Bridge":"","SandboxID":"","SandboxKey":"","Ports":{},"HairpinMode":false,"LinkLocalIPv6Address":"","LinkLocalIPv6PrefixLen":0,"SecondaryIPAddresses":null,"SecondaryIPv6Addresses":null,"EndpointID":"","Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"IPAddress":"","IPPrefixLen":0,"IPv6Gateway":"","MacAddress":"","Networks":{"kind":{"IPAMConfig":null,"Links":null,"Aliases":null,"MacAddress":"","DriverOpts":null,"NetworkID":"","EndpointID":"","Gateway":"","IPAddress":"","IPPrefixLen":0,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"DNSNames":null}}}}
Got logs from the container:
>
E0701 08:02:30.135741 1 controller.go:316] "Reconciler error" err="failed to exec DockerMachine bootstrap: failed to run cloud config: stdout: stderr: : error creating container exec: Error response from daemon: container f44792628f4e98a35e0eeff514cad90def39f18257b592a7bca1338d8ece0a13 is not running" controller="dockermachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="DockerMachine" DockerMachine="bootstrap-pivot-cluster-ue8zsw/caprke2-e2e-fnk907-pivot-control-plane-9x22z" namespace="bootstrap-pivot-cluster-ue8zsw" name="caprke2-e2e-fnk907-pivot-control-plane-9x22z" reconcileID="44905eb4-bf56-4e89-b172-0168572bfa3d"
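The inspect output above shows the state that makes the exec fail: `Status` is `created`, `Running` is false, and `StartedAt` is still the zero timestamp. A minimal sketch of checking for that state programmatically, assuming the input is a single container object shaped like the one CAPD logged (note that `docker inspect` on the CLI returns a JSON array of such objects). The name `never_started` is hypothetical and is not part of CAPD or Docker:

```python
import json

# Docker reports this zero timestamp for a container that has never been started.
ZERO_TIME = "0001-01-01T00:00:00Z"


def never_started(inspect_doc: dict) -> bool:
    """Return True if the inspected container was created but never started."""
    state = inspect_doc.get("State", {})
    return (
        state.get("Status") == "created"
        and not state.get("Running", False)
        and state.get("StartedAt") == ZERO_TIME
    )


# Trimmed-down copy of the "State" section from the log above.
sample = json.loads(
    '{"State": {"Status": "created", "Running": false, "Pid": 0,'
    ' "StartedAt": "0001-01-01T00:00:00Z"}}'
)
print(never_started(sample))
```

A check like this only identifies the symptom; it does not explain why the container was never started.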
I could not find any relevant information in the logs or in the Docker events.
I have attached all the files I collected below.
It's not clear to me whether this is a CAPD problem or is caused by RKE2 bootstrapping.
In cluster-api, a similar test also performs a rollout on the self-managed cluster, and that seems to work just fine.
Also note that starting the container manually works without issues:
docker start f44792628f4e
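As a stopgap while debugging, the manual start could be scripted. The following is only a sketch of a workaround, not a fix for the underlying problem: it assumes the `docker` CLI is on the PATH, and the helper names (`parse_created`, `list_created`, `start_created`) are hypothetical. It relies on the `status=created` filter and the `--format` option of `docker ps`:

```python
import subprocess


def parse_created(ps_output: str) -> list[tuple[str, str]]:
    """Parse lines of '<id> <name>' into (id, name) pairs, skipping blank lines."""
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            rows.append((parts[0], parts[1]))
    return rows


def list_created() -> list[tuple[str, str]]:
    """Ask Docker for containers that are in the 'created' state."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--filter", "status=created",
         "--format", "{{.ID}} {{.Names}}"],
        check=True, capture_output=True, text=True,
    )
    return parse_created(result.stdout)


def start_created() -> None:
    """Start every container found in the 'created' state (same as `docker start <id>`)."""
    for container_id, name in list_created():
        print(f"starting {name} ({container_id})")
        subprocess.run(["docker", "start", container_id], check=True)
```

Calling `start_created()` on the host would start every container in the Created state, so on a shared machine it may be safer to filter the result of `list_created()` by the cluster name prefix first.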
What did you expect to happen:
How to reproduce it:
Anything else you would like to add:
capd.txt
docker-events.txt
docker-ps-all.txt
docker-version.txt
journal.txt