
Use PodMonitors/ServiceMonitors instead of annotations for configuring Prometheus scraping #623

@hashemi-soroush

The issue with the current Prometheus support

Currently, the operator annotates Redis and Sentinel pods with the following set of annotations and expects whoever runs the operator to write a Prometheus scrape config that uses these annotations to discover the pods and configure scraping.

prometheus.io/scrape: "true"
prometheus.io/port: http
prometheus.io/path: /metrics

Writing a scrape config is tedious and low-level, so most people prefer the high-level PodMonitor and ServiceMonitor alternatives. Moreover, the operator's maintainer sometimes has no access to the Prometheus configuration at all (not even through additional scrape configs), for example when Prometheus setup and maintenance are delegated to another team inside or outside the organization.
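
For reference, consuming these annotations takes a relabeling job along the lines of the classic annotation-based discovery config sketched below. The job name is arbitrary and the rules are the commonly used community pattern, not something shipped with the operator.

- job_name: kubernetes-pods
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Keep only pods annotated with prometheus.io/scrape: "true"
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: "true"
    # Override the metrics path from prometheus.io/path
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    # Override the scrape port from prometheus.io/port
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2

Every team that wants these metrics has to maintain a variant of this block by hand, which is exactly the tedium that PodMonitors remove.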

The alternatives

We can use PodMonitors and ServiceMonitors instead. But there are several ways to use them, and we need to decide which one is best for the target users of this operator.

We can create a single PodMonitor/ServiceMonitor that selects the Redis and Sentinel pods of all RedisFailover instances, but that limits our ability to customize the scraping configuration per RedisFailover instance (maybe some need to be scraped every second, while a 1-minute interval suffices for the rest).
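
If we went this route, the operator (or the user) would create one cluster-wide PodMonitor roughly like the following. The label selector and port name are assumptions for illustration; they depend on the labels the operator actually puts on its pods.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: redis-operator-all
  namespace: monitoring
spec:
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: redis-operator   # assumed label
  podMetricsEndpoints:
    - port: metrics        # assumed exporter port name
      interval: 30s        # one interval for everything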

We can make it more configurable by having one PodMonitor per RedisFailover. That fixes the above problem, but in most cases Sentinels don't need to be scraped as often as the Redis pods.

Creating one PodMonitor for the Sentinels and one for the Redis pods of each RedisFailover fixes that problem (see the sketch below). But maybe we need to scrape the Redis master pod even more often.
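
A sketch of this two-PodMonitor layout for a RedisFailover named myfailover follows. The role and instance labels are assumptions for illustration, and the intervals just show that the two can differ.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myfailover-redis
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: myfailover   # assumed label
      app.kubernetes.io/component: redis       # assumed label
  podMetricsEndpoints:
    - port: metrics
      interval: 15s    # Redis scraped frequently
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myfailover-sentinel
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: myfailover   # assumed label
      app.kubernetes.io/component: sentinel    # assumed label
  podMetricsEndpoints:
    - port: metrics
      interval: 60s    # Sentinels can be scraped less often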

And finally, we can have three PodMonitors per RedisFailover: one for the Redis master, one for the Redis slaves, and one for the Sentinels. The problem with this solution is that it crowds the cluster with objects, especially if, due to an organizational standard, all Redis instances must live in a single designated Kubernetes namespace.

My suggestion

I believe the best course of action is to start with the simplest alternative and then let the community experiment with their ideas and discuss how to mature this feature.

At first sight, the simplest alternative seems to be the single PodMonitor for all RedisFailovers, but it prevents the community from experimenting with customized scraping. We all have critical RedisFailovers that we don't want to touch unless absolutely necessary, and not-so-important RedisFailovers that fall victim to all our experiments, including the playful ones. With a single PodMonitor shared by all RedisFailovers, we won't experiment on it, because any change would also affect the critical ones. So, in order to let the community run all kinds of experiments, we need separate PodMonitor(s) for each RedisFailover.

If we decide to have a single PodMonitor per RedisFailover, we will face a number of problems in the RedisFailover CRD. To make this easier to follow, I've reproduced the RedisFailover spec struct and some of its component structs here.

type RedisFailoverSpec struct {
	Redis          RedisSettings      `json:"redis,omitempty"`
	Sentinel       SentinelSettings   `json:"sentinel,omitempty"`
	Auth           AuthSettings       `json:"auth,omitempty"`
	LabelWhitelist []string           `json:"labelWhitelist,omitempty"`
	BootstrapNode  *BootstrapSettings `json:"bootstrapNode,omitempty"`
}
type RedisSettings struct {
	// ...
	Exporter Exporter `json:"exporter,omitempty"`
	// ...
}
type SentinelSettings struct {
	// ...
	Exporter Exporter `json:"exporter,omitempty"`
	// ...
}
type Exporter struct {
	Enabled bool `json:"enabled,omitempty"`
	// ...
}

In order to have a single PodMonitor for the entire RedisFailover, we would have to add a PodMonitor field to RedisFailoverSpec, since it belongs to the entire RedisFailover, not just the Redis or Sentinel part. The sole function of this PodMonitor would be to connect Prometheus with the endpoints created by the Redis and Sentinel exporters. The rejected shape would look something like the snippet below.
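
Concretely, the top-level field would sit next to the existing settings. This is only an illustration of the shape being argued against; PodMonitorConfig is hypothetical.

type RedisFailoverSpec struct {
	Redis          RedisSettings      `json:"redis,omitempty"`
	Sentinel       SentinelSettings   `json:"sentinel,omitempty"`
	Auth           AuthSettings       `json:"auth,omitempty"`
	LabelWhitelist []string           `json:"labelWhitelist,omitempty"`
	BootstrapNode  *BootstrapSettings `json:"bootstrapNode,omitempty"`
	// Hypothetical: one PodMonitor covering both Redis and Sentinel.
	PodMonitor PodMonitorConfig `json:"podMonitor,omitempty"`
}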

Both of these exporters can be disabled, in which case creating the PodMonitor CR is futile and possibly confusing. Also, if you enable only one of them, the PodMonitor will only connect that one to Prometheus, and the maintainer won't be able to tell which one without checking the RedisFailover spec or the pod containers, which is again a little confusing.

Also, if you disable the PodMonitor and enable both exporters, they won't be connected to Prometheus. Since the functionality of the PodMonitor and the exporters is so intertwined, from a design standpoint it makes more sense to put them closer to each other, and preferably to let them share a few subfields, for example Enabled.

The easiest way to get rid of this poor design and confusion is to have one PodMonitor for the Redis pods and one for the Sentinels, add a PodMonitor subfield to Exporter, and let the Exporter's Enabled field control the creation of the PodMonitor.

type Exporter struct {
	Enabled    bool             `json:"enabled,omitempty"`
	PodMonitor PodMonitorConfig `json:"podMonitor,omitempty"`
	// ...
}
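
PodMonitorConfig itself could start minimal and grow as the community experiments. The following is a hypothetical sketch, not an existing type; every field name here is a suggestion.

type PodMonitorConfig struct {
	// Interval at which Prometheus scrapes the exporter, e.g. "30s".
	Interval string `json:"interval,omitempty"`
	// ScrapeTimeout bounds each individual scrape, e.g. "10s".
	ScrapeTimeout string `json:"scrapeTimeout,omitempty"`
	// AdditionalLabels are added to the generated PodMonitor so it can
	// match a Prometheus instance's podMonitorSelector.
	AdditionalLabels map[string]string `json:"additionalLabels,omitempty"`
}

With this shape, exporter.enabled remains the single switch for both the exporter sidecar and its PodMonitor, which keeps the two intertwined pieces in lockstep.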
