
Use PodMonitors/ServiceMonitors instead of annotations for configuring Prometheus scraping #623

@hashemi-soroush

The issue with the current Prometheus support

Currently, the operator annotates Redis and Sentinel pods with the following set of annotations and expects whoever runs the operator to write a Prometheus scrape config that uses these annotations to discover the pods and configure scraping.

prometheus.io/scrape: "true"
prometheus.io/port: http
prometheus.io/path: /metrics

Writing a scrape config is tedious and low-level, so most people prefer the high-level PodMonitor and ServiceMonitor alternatives. Moreover, the operator's maintainer sometimes has no access to the Prometheus configuration at all (not even through additional scrape configs), for example when Prometheus setup and maintenance are delegated to another team inside or outside the organization.
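
For reference, consuming these annotations takes a relabeling job along the lines of the classic annotation-based discovery config sketched below. The job name is arbitrary and the rules are the commonly used community pattern, not something shipped with the operator.

- job_name: kubernetes-pods
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Keep only pods annotated with prometheus.io/scrape: "true"
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: "true"
    # Override the metrics path from prometheus.io/path
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    # Override the scrape port from prometheus.io/port
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2

Every team that wants these metrics has to maintain a variant of this block by hand, which is exactly the tedium that PodMonitors remove.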

The alternatives

We can use PodMonitors and ServiceMonitors instead. But there are several ways to use them, and we need to decide which one is best for the target users of this operator.

We can create a single PodMonitor/ServiceMonitor that selects the Redis and Sentinel pods of all RedisFailover instances, but that limits our ability to customize the scraping configuration per RedisFailover instance (maybe some need to be scraped every second, while a 1-minute interval suffices for the rest).
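
If we went this route, the operator (or the user) would create one cluster-wide PodMonitor roughly like the following. The label selector and port name are assumptions for illustration; they depend on the labels the operator actually puts on its pods.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: redis-operator-all
  namespace: monitoring
spec:
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: redis-operator   # assumed label
  podMetricsEndpoints:
    - port: metrics        # assumed exporter port name
      interval: 30s        # one interval for everything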

We can make it more configurable by having one PodMonitor per RedisFailover. That fixes the above problem, but in most cases Sentinels don't need to be scraped as often as the Redis pods.

Creating one PodMonitor for the Sentinels and one for the Redis pods of each RedisFailover fixes that problem (see the sketch below). But maybe we need to scrape the Redis master pod even more often.
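
A sketch of this two-PodMonitor layout for a RedisFailover named myfailover follows. The role and instance labels are assumptions for illustration, and the intervals just show that the two can differ.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myfailover-redis
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: myfailover   # assumed label
      app.kubernetes.io/component: redis       # assumed label
  podMetricsEndpoints:
    - port: metrics
      interval: 15s    # Redis scraped frequently
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myfailover-sentinel
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: myfailover   # assumed label
      app.kubernetes.io/component: sentinel    # assumed label
  podMetricsEndpoints:
    - port: metrics
      interval: 60s    # Sentinels can be scraped less often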

And finally, we can have three PodMonitors per RedisFailover: one for the Redis master, one for the Redis slaves, and one for the Sentinels. The problem with this solution is that it crowds the cluster with objects, especially if, due to an organizational standard, all Redis instances must live in a single designated Kubernetes namespace.

My suggestion

I believe the best course of action is to start with the simplest alternative and then let the community experiment with their ideas and discuss how to mature this feature.

At first sight, the simplest alternative seems to be the single PodMonitor for all RedisFailovers, but it prevents the community from experimenting with customized scraping. We all have critical RedisFailovers that we don't want to touch unless absolutely necessary, and not-so-important RedisFailovers that fall victim to all our experiments, including the playful ones. With a single PodMonitor shared by all RedisFailovers, we won't experiment on it, because any change would also affect the critical ones. So, in order to let the community run all kinds of experiments, we need separate PodMonitor(s) for each RedisFailover.

If we decide to have a single PodMonitor per RedisFailover, we will face a number of problems in the RedisFailover CRD. To make this easier to follow, I've reproduced the RedisFailover spec struct and some of its component structs here.

type RedisFailoverSpec struct {
	Redis          RedisSettings      `json:"redis,omitempty"`
	Sentinel       SentinelSettings   `json:"sentinel,omitempty"`
	Auth           AuthSettings       `json:"auth,omitempty"`
	LabelWhitelist []string           `json:"labelWhitelist,omitempty"`
	BootstrapNode  *BootstrapSettings `json:"bootstrapNode,omitempty"`
}
type RedisSettings struct {
	// ...
	Exporter Exporter `json:"exporter,omitempty"`
	// ...
}
type SentinelSettings struct {
	// ...
	Exporter Exporter `json:"exporter,omitempty"`
	// ...
}
type Exporter struct {
	Enabled bool `json:"enabled,omitempty"`
	// ...
}

In order to have a single PodMonitor for the entire RedisFailover, we would have to add a PodMonitor field to RedisFailoverSpec, since it belongs to the entire RedisFailover, not just the Redis or Sentinel part. The sole function of this PodMonitor would be to connect Prometheus with the endpoints created by the Redis and Sentinel exporters. The rejected shape would look something like the snippet below.
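
Concretely, the top-level field would sit next to the existing settings. This is only an illustration of the shape being argued against; PodMonitorConfig is hypothetical.

type RedisFailoverSpec struct {
	Redis          RedisSettings      `json:"redis,omitempty"`
	Sentinel       SentinelSettings   `json:"sentinel,omitempty"`
	Auth           AuthSettings       `json:"auth,omitempty"`
	LabelWhitelist []string           `json:"labelWhitelist,omitempty"`
	BootstrapNode  *BootstrapSettings `json:"bootstrapNode,omitempty"`
	// Hypothetical: one PodMonitor covering both Redis and Sentinel.
	PodMonitor PodMonitorConfig `json:"podMonitor,omitempty"`
}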

Both of these exporters can be disabled, in which case creating the PodMonitor CR is futile and possibly confusing. Also, if you enable only one of them, the PodMonitor will only connect that one to Prometheus, and the maintainer won't be able to tell which one without checking the RedisFailover spec or the pod containers, which is again a little confusing.

Also, if you disable the PodMonitor and enable both exporters, they won't be connected to Prometheus. Since the functionality of the PodMonitor and the exporters is so intertwined, from a design standpoint it makes more sense to put them closer to each other, and preferably to let them share a few subfields, for example Enabled.

The easiest way to get rid of this poor design and confusion is to have one PodMonitor for the Redis pods and one for the Sentinels, add a PodMonitor subfield to Exporter, and let the Exporter's Enabled field control the creation of the PodMonitor.

type Exporter struct {
	Enabled    bool             `json:"enabled,omitempty"`
	PodMonitor PodMonitorConfig `json:"podMonitor,omitempty"`
	// ...
}
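
PodMonitorConfig itself could start minimal and grow as the community experiments. The following is a hypothetical sketch, not an existing type; every field name here is a suggestion.

type PodMonitorConfig struct {
	// Interval at which Prometheus scrapes the exporter, e.g. "30s".
	Interval string `json:"interval,omitempty"`
	// ScrapeTimeout bounds each individual scrape, e.g. "10s".
	ScrapeTimeout string `json:"scrapeTimeout,omitempty"`
	// AdditionalLabels are added to the generated PodMonitor so it can
	// match a Prometheus instance's podMonitorSelector.
	AdditionalLabels map[string]string `json:"additionalLabels,omitempty"`
}

With this shape, exporter.enabled remains the single switch for both the exporter sidecar and its PodMonitor, which keeps the two intertwined pieces in lockstep.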
