Monitoring the CloudZero Agent

Note: This document is under construction. We are actively working on ways to improve observability within the CloudZero Agent. This document currently focuses on monitoring the webhook server using Kubernetes API server metrics; we will be expanding this document in the future with additional methods for monitoring other components of the agent.

Monitoring Webhook Performance in the CloudZero Agent

This document explains how to monitor the CloudZero Agent webhook's performance impact on your Kubernetes cluster using Kubernetes API server metrics.

Why Monitor Webhook Performance?

The CloudZero Agent webhook operates as a Kubernetes validating admission controller that intercepts resource operations (CREATE, UPDATE, DELETE) before they're persisted to etcd. While the webhook is designed to be fast and non-blocking, it's important to monitor its performance impact to ensure it doesn't introduce significant latency into your cluster operations.

Key concerns include:

  • API request latency: How much time does the webhook add to resource operations?
  • Network overhead: Time spent on TLS handshakes, network transit, and proxy components
  • Service mesh impact: Additional latency introduced by service meshes (e.g., Istio, Linkerd)
  • Operational visibility: Understanding the webhook's behavior during high-load scenarios

Why API Server Metrics?

Monitoring webhook performance from the webhook server itself only captures part of the picture. The webhook server can measure how long it takes to process a request once received, but it cannot measure:

  • TLS handshake time
  • Network transit time (both directions)
  • Service mesh overhead (Istio, Linkerd, etc.)
  • API server queuing time
  • Connection setup overhead

The API server metrics provide the complete end-to-end latency from when the API server initiates the webhook call to when it receives the response. This is the metric that matters for understanding the webhook's impact on cluster operations.

The Right Metric: apiserver_admission_webhook_admission_duration_seconds

Kubernetes exposes a STABLE metric specifically for tracking admission webhook latency (see the Kubernetes Metrics Reference):

apiserver_admission_webhook_admission_duration_seconds

Metric Details:

  • Type: Histogram
  • Stability: STABLE
  • Labels:
    • name: Webhook name (e.g., "cz-agent-cloudzero-agent-webhook-server-webhook.cloudzero-agent.svc")
    • operation: API operation (CREATE, UPDATE, DELETE, CONNECT)
    • rejected: Whether the request was rejected ("true" or "false")
    • type: Webhook type ("validating" or "admit")
  • Buckets: [0.005, 0.025, 0.1, 0.5, 1, 2.5, 10, 25] seconds

This metric captures the complete round-trip time including:

  • TLS handshake and connection setup
  • Network transit (to webhook server and back)
  • Service mesh proxy processing (if applicable)
  • Webhook server processing time
  • Any queuing or retry logic

Accessing the Metrics

The metric is exposed by the Kubernetes API server on the /metrics endpoint. If you have Prometheus configured to scrape the API server, this metric will be available automatically. You can query it directly using:

kubectl get --raw /metrics | grep apiserver_admission_webhook_admission_duration_seconds

To see metrics specific to the CloudZero webhook, filter by the webhook name in your Prometheus queries.
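
For example, the following PromQL computes the 99th-percentile end-to-end latency for the CloudZero webhook over the last five minutes. The name label value is the example shown above and depends on your Helm release name and namespace; substitute the value that appears in your own metrics.

# p99 end-to-end admission latency for the CloudZero webhook (adjust the name label to match your install)
histogram_quantile(
  0.99,
  sum(rate(apiserver_admission_webhook_admission_duration_seconds_bucket{
    name="cz-agent-cloudzero-agent-webhook-server-webhook.cloudzero-agent.svc"
  }[5m])) by (le)
)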

Webhook Server Metrics (Supplementary)

While API server metrics provide the complete picture, the webhook server itself also exposes metrics that can help diagnose issues:

Available at: http://<webhook-pod-ip>:9090/metrics

Key metrics:

  • http_request_duration_seconds: Server-side processing time (excludes network/TLS)
  • http_requests_total: Request count by status code
  • czo_webhook_types_total: Webhook events by resource type and operation

Note: These metrics only show webhook server processing time and don't include network latency or TLS overhead. They're useful for isolating whether high latency is due to webhook processing or network/infrastructure issues.
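
As a supplementary check, a sketch like the following compares server-side latency against the API server's end-to-end figure. It assumes http_request_duration_seconds is exposed as a standard Prometheus histogram and that the webhook pods are scraped under a job label such as cloudzero-webhook; adjust both to match your scrape configuration.

# p99 server-side processing time (excludes TLS and network); the job label is an assumption
histogram_quantile(
  0.99,
  sum(rate(http_request_duration_seconds_bucket{job="cloudzero-webhook"}[5m])) by (le)
)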

Comparison: API Server vs. Webhook Server Metrics

| Aspect | API Server Metrics | Webhook Server Metrics |
| --- | --- | --- |
| Scope | Complete end-to-end latency | Server processing only |
| Includes TLS | ✅ Yes | ❌ No |
| Includes network | ✅ Yes | ❌ No |
| Includes sidecars | ✅ Yes | ❌ No |
| Granularity | Per webhook name | All requests |
| Recommended for | Performance monitoring | Debugging webhook logic |

Recommendation: Use API server metrics for understanding the webhook's impact on cluster operations. Use webhook server metrics only for debugging specific webhook processing issues.
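
For ongoing monitoring, a simple alert condition on the API server metric can flag sustained latency. The sketch below uses the histogram's _sum and _count series to compute average latency over five minutes; the 0.5-second threshold and the name regex are only examples to tune for your environment.

# Alert condition sketch: average webhook latency above 500ms over 5 minutes (threshold and name matcher are examples)
(
  sum(rate(apiserver_admission_webhook_admission_duration_seconds_sum{name=~".*cloudzero.*"}[5m]))
  /
  sum(rate(apiserver_admission_webhook_admission_duration_seconds_count{name=~".*cloudzero.*"}[5m]))
) > 0.5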

Conclusion

Monitoring webhook performance is essential for maintaining cluster health and understanding the CloudZero Agent's operational impact. The apiserver_admission_webhook_admission_duration_seconds metric provides complete visibility into webhook latency, including all network and infrastructure overhead that affects actual cluster operations.

By monitoring this metric and setting appropriate alerts, you can ensure the CloudZero Agent webhook maintains minimal impact on your cluster's performance while providing valuable cost attribution data.

Monitoring for OOM Kills

This section covers how to detect and alert on Out-Of-Memory (OOM) kills in your Kubernetes cluster, including CloudZero Agent components.

Why Monitor OOM Kills?

OOM kills occur when a container exceeds its memory limit and is terminated by the kernel. This can cause:

  • Service disruption: Pods restart, causing temporary unavailability
  • Data loss: In-flight requests or unsaved state may be lost
  • Performance degradation: Constant restart loops consume cluster resources
  • Cascading failures: Multiple OOM kills can indicate systemic resource issues

Monitoring OOM kills helps you:

  • Detect undersized memory requests/limits
  • Identify memory leaks or inefficient code
  • Right-size workloads for cost optimization
  • Prevent production incidents

OOM Kill Detection Strategy

We suggest a two-tier approach:

  1. Reactive detection: Alert when OOM kills have occurred
  2. Proactive monitoring: Warn when memory usage exceeds requests (before hitting limits)

Why Monitor Against Requests vs Limits?

  • Limits trigger OOM kills: When a container exceeds its memory limit, the kernel's OOM killer terminates it and Kubernetes reports the container as OOMKilled
  • Requests indicate expected usage: When actual usage significantly exceeds requests, it suggests:
    • Requests are undersized (scheduling problems)
    • Memory usage is growing unexpectedly (potential leaks)
    • Workload characteristics have changed

Monitoring against requests provides early warning before containers approach their limits and get killed. This is especially valuable for cost optimization and capacity planning.

Required Metrics Source: kube-state-metrics

kube-state-metrics provides the metrics needed to detect OOM kills. It is a widely deployed Kubernetes component that exposes pod and container state as Prometheus metrics.

Deployment: Most Kubernetes monitoring stacks (kube-prometheus-stack, etc.) include kube-state-metrics by default.

Verification:

# Check if kube-state-metrics is running
kubectl get pods -A | grep kube-state-metrics

# Verify OOM metrics are exposed (adjust -n to the namespace where kube-state-metrics runs)
kubectl port-forward -n kube-system deployment/kube-state-metrics 8080:8080
curl http://localhost:8080/metrics | grep kube_pod_container_status_last_terminated_reason

Metrics for OOM Kill Detection

Reactive: Detecting OOM Kills

Primary metric (from kube-state-metrics):

kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}

This metric indicates that a container's last termination was due to an OOM kill.

Labels:

  • namespace: Kubernetes namespace
  • pod: Pod name
  • container: Container name
  • reason: Termination reason (filter for "OOMKilled")

Example query - Current OOMKilled containers:

kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1

Example query - OOM kill rate over time:

sum(rate(kube_pod_container_status_restarts_total[5m])) by (namespace, pod, container)
  and on(namespace, pod, container)
  kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
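
To focus on CloudZero Agent components specifically, the same check can be restricted to the agent's namespace. The cloudzero-agent namespace below is an assumption; use whatever namespace you installed the chart into.

# OOMKilled containers in the CloudZero Agent's namespace (namespace value is an example)
kube_pod_container_status_last_terminated_reason{namespace="cloudzero-agent", reason="OOMKilled"} == 1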

Proactive: Memory Pressure Detection

Monitoring against requests (from cAdvisor/kubelet):

# Memory usage as a percentage of the memory request
(
  container_memory_working_set_bytes{container!="", container!="POD"}
  / on(namespace, pod, container) group_left()
  kube_pod_container_resource_requests{resource="memory"}
) * 100

Key metrics:

  • container_memory_working_set_bytes: Actual memory usage (what Kubernetes uses for eviction decisions)
  • kube_pod_container_resource_requests{resource="memory"}: Memory request for the container

Why container_memory_working_set_bytes? This metric excludes reclaimable page cache, so it reflects true memory pressure. It is the value the kubelet uses for eviction decisions, which makes it the best proxy for OOM risk.
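
Building on the ratio above, a proactive alert condition might flag containers whose working set has exceeded, say, 90% of their memory request. The 90% threshold is only an example, and the label matching may need adjusting to your scrape configuration.

# Containers whose working set exceeds 90% of their memory request (threshold is an example)
(
  container_memory_working_set_bytes{container!="", container!="POD"}
  / on(namespace, pod, container) group_left()
  kube_pod_container_resource_requests{resource="memory"}
) * 100 > 90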