Description
Describe the bug
When deploying the argocd-agent in a multi-cluster KIND environment, the agent repeatedly failed to connect to the Redis instance in the principal cluster. Logs from both argocd-agent-agent (workload cluster) and argocd-agent-principal (control-plane cluster) showed DNS resolution failures for the hostname argocd-redis. The error persisted even though the Redis service was running and reachable within the principal cluster.
Steps to reproduce the behaviour
- Create two KIND clusters: one for the principal (kind-argocd-hub) and one for the agent (agent cluster).
- Deploy Argo CD components (including argocd-redis) in the argocd namespace on the principal cluster.
- Deploy argocd-agent-agent in the agent cluster using mTLS configuration.
- Observe agent logs after startup.
Expected behavior
The agent should establish a successful mTLS connection to the principal and communicate through the Redis proxy without DNS lookup failures.
Additional context
Agent logs:
redis: connection pool: failed to dial after 5 attempts: dial tcp: lookup argocd-redis: i/o timeout
time="..." level=error msg="Failed to get cluster info from cache" error="dial tcp: lookup argocd-redis: i/o timeout"
Principal logs:
failed to get connection info from cluster: dial tcp: lookup argocd-redis: i/o timeout
This occurred because the agent attempted to resolve the Redis hostname from a separate cluster where the DNS record did not exist.
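To check whether a configured Redis address can only work inside one cluster, a small diagnostic along the following lines may help. It is a sketch, not part of argocd-agent: the helper names `is_cluster_local_name` and `can_resolve` are hypothetical, and it assumes the default Kubernetes DNS domain `cluster.local`.

```python
import ipaddress
import socket

# Assumption: the cluster uses the default Kubernetes DNS domain.
CLUSTER_DOMAIN = "cluster.local"


def is_cluster_local_name(host: str) -> bool:
    """Heuristic: True if `host` looks resolvable only via in-cluster DNS.

    Covers bare service names ("argocd-redis") and names ending in ".svc"
    or the cluster domain. Two-label forms such as "service.namespace"
    are not detected by this sketch.
    """
    name = host.rstrip(".").lower()
    if not name:
        return False
    try:
        ipaddress.ip_address(name)
        return False  # literal IPs do not depend on DNS at all
    except ValueError:
        pass
    if "." not in name:
        return True  # bare name, relies on the pod's DNS search path
    return name.endswith(".svc") or name.endswith("." + CLUSTER_DOMAIN)


def can_resolve(host: str, port: int = 6379) -> bool:
    """Return True if the local resolver can resolve `host`."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    host = "argocd-redis"  # hostname taken from the logs above
    print(host, "cluster-local:", is_cluster_local_name(host))
    print(host, "resolves here:", can_resolve(host))
```

Run from a pod in the agent cluster, a name that is flagged as cluster-local and does not resolve would be consistent with the lookup failures shown in the logs.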