Description
Hi,
We have a Kubernetes node configured with the stargz snapshotter and kubeconfig-based authentication. At first glance, the snapshotter works and images can be pulled, but after it runs for a couple of hours we see failures pulling large images from private repositories.
When trying to run big images, with sizes up to 120 GiB, we see many errors like:

```
Sep 10 12:27:46 containerd-stargz-grpc[2300160]: {"error":"failed to resolve layer: failed to resolve layer \"sha256:1ecba421368dbe43565a8e2c35edf8c97ccc59e1933cde70bf61c794c80d0a1e\" from \"<registry>/<image>:<tag>\": failed to resolve the blob: failed to resolve the source: cannot resolve layer: failed to redirect (host \"<registry>\", <registry>/<image> ref:\":<tag>\", digest:\"sha256:1ecba421368dbe43565a8e2c35edf8c97ccc59e1933cde70bf61c794c80d0a1e\"): failed to access to the registry with code 401: failed to resolve: failed to resolve target","key":"k8s.io/3028/extract-44511699-zbUS sha256:55b5e97365c06a7a1c9f76e6fd6185dea55480a0fcb96692db186fdcd235c273","level":"warning","msg":"failed to prepare remote snapshot","parent":"sha256:ed2a93970076f33326c48d2d8c21828e8f60808b1e077777a71ac7be3a9a435f","remote-snapshot-prepared":"false","time":"2025-09-10T12:27:46.132357580Z"}
```
The snapshotter has access to the kubeconfig, and some images are pulled successfully, so I'm not sure what the issue is or why the error is `failed to redirect`. These errors cause the image pull to fall back to containerd, which makes the node run out of disk.
In the past, we used CRI-based authentication. Images were pulled successfully, but since those credentials are not persisted on the node, we moved to kubeconfig-based authentication to be resilient across restarts: #1989
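For context, our snapshotter configuration looks roughly like the sketch below. The `[kubeconfig_keychain]` section is the kubeconfig-based keychain described in the stargz-snapshotter docs; the path and the commented-out `[cri_keychain]` alternative are illustrative, not our exact values:

```toml
# /etc/containerd-stargz-grpc/config.toml (sketch; paths are illustrative)

[kubeconfig_keychain]
# Resolve private-registry credentials from dockerconfigjson secrets
# via the kubeconfig below (survives node restarts, unlike CRI creds).
enable_keychain = true
kubeconfig_path = "/etc/kubernetes/snapshotter/kubeconfig"

# Previous setup (CRI-based keychain); credentials were forwarded from
# image pull requests and were not persisted on the node:
# [cri_keychain]
# enable_keychain = true
```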
I found this issue, which seems related: #1584 (comment)