redis: connection pool: failed to dial after 5 attempts: dial tcp: lookup argocd-redis: i/o timeout #611

@bianbbc87

Describe the bug
When deploying the argocd-agent in a multi-cluster KIND environment, the agent repeatedly failed to connect to the Redis instance in the principal cluster. Logs from both argocd-agent-agent (workload cluster) and argocd-agent-principal (control-plane cluster) showed DNS resolution failures for the hostname argocd-redis. The error persisted even though the Redis service was running and reachable within the principal cluster.

Steps to reproduce the behaviour

  1. Create two KIND clusters: one for the principal (kind-argocd-hub) and a separate one for the agent.
  2. Deploy Argo CD components (including argocd-redis) in the argocd namespace on the principal cluster.
  3. Deploy argocd-agent-agent in the agent cluster using mTLS configuration.
  4. Observe agent logs after startup.

Expected behavior
The agent should establish a successful mTLS connection to the principal and communicate through the Redis proxy without DNS lookup failures.

Additional context
Agent logs:

redis: connection pool: failed to dial after 5 attempts: dial tcp: lookup argocd-redis: i/o timeout
time="..." level=error msg="Failed to get cluster info from cache" error="dial tcp: lookup argocd-redis: i/o timeout"

Principal logs:

failed to get connection info from cluster: dial tcp: lookup argocd-redis: i/o timeout

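This occurred because the agent attempted to resolve the Redis hostname argocd-redis from the agent cluster, where no DNS record for that name exists. The argocd-redis Service is defined only in the argocd namespace of the principal cluster, and Kubernetes Service names are resolved by each cluster's own DNS, so the lookup cannot succeed from a separate cluster.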
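
To confirm where the name resolves, a small check like the following can be run from a pod in each cluster. This is only a sketch under assumptions: the helper names `service_dns_candidates` and `check_resolution` are made up for illustration, the cluster domain is assumed to be the default `cluster.local`, and port 6379 is assumed because it is the Redis default. It is not part of argocd-agent.

```python
import socket


def service_dns_candidates(service, namespace, cluster_domain="cluster.local"):
    """Return the names cluster DNS normally answers for a Service.

    cluster_domain is an assumption (the Kubernetes default); adjust it
    if the cluster was created with a different domain.
    """
    return [
        service,                                          # same-namespace short name
        f"{service}.{namespace}",                         # cross-namespace, same cluster
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",    # fully qualified
    ]


def check_resolution(hostname, port=6379, resolver=socket.getaddrinfo):
    """Try to resolve hostname; return (ok, detail).

    detail is a comma-separated list of addresses on success, or the
    exception type and message on failure. The resolver argument exists
    only so the function can be exercised without real DNS.
    """
    try:
        infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"{type(exc).__name__}: {exc}"
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return True, ", ".join(addresses)


if __name__ == "__main__":
    for name in service_dns_candidates("argocd-redis", "argocd"):
        ok, detail = check_resolution(name)
        print(f"{name}: {'resolved' if ok else 'FAILED'} ({detail})")
```

If every candidate fails in the agent cluster but resolves in the principal cluster, that matches the logs above: none of these names, including the fully qualified one, is visible outside the cluster that owns the Service.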

Labels: bug
