Description
The issue with the current Prometheus support
Currently, the operator annotates Redis and Sentinel pods with the following set of annotations and expects whoever maintains the operator deployment to write a Prometheus scrape config that uses these annotations to find the pods and configure scraping:

```yaml
prometheus.io/scrape: "true"
prometheus.io/port: http
prometheus.io/path: /metrics
```

Writing a scrape config is tedious and low-level, so most people prefer the high-level PodMonitor and ServiceMonitor alternatives. Also, the operator maintainer sometimes has no access to the Prometheus configuration (even through additional scrape configs), for example when Prometheus setup and maintenance is delegated to another team inside or outside the organization.
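For reference, the annotation-driven approach requires something like the canonical community scrape config below. This is a sketch of the widely copied pattern, not a config shipped by this operator; the job name is hypothetical.

```yaml
scrape_configs:
  - job_name: kubernetes-pods            # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                        # discover every pod in the cluster
    relabel_configs:
      # keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # override the metrics path from prometheus.io/path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # override the scrape port from prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

Note that the usual numeric-port relabel rule above doesn't even match a named port like `http`, so the config needs further tweaking for this operator's annotations, which illustrates just how low-level this approach is.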
The alternatives
We can use PodMonitors and ServiceMonitors instead. But there are a number of ways we can use them and we need to decide which one is best for the target users of this operator.
1. We can create a single PodMonitor/ServiceMonitor that selects the Redis and Sentinel pods of all RedisFailover instances, but that limits our ability to customize the scraping configuration per RedisFailover instance (some may need to be scraped every second, while a 1-minute interval suffices for the rest).
2. We can make it more configurable by having one PodMonitor per RedisFailover. That fixes the above problem, but in most cases Sentinels don't need to be scraped as often as Redises.
3. Creating one PodMonitor for Sentinels and one for Redises per RedisFailover fixes that problem too. But maybe we need to scrape the Redis master pod more often than the slaves.
4. Finally, we can have three PodMonitors per RedisFailover: one for the Redis master, one for the Redis slaves, and one for the Sentinels. The problem with this solution is that it makes the cluster very crowded, especially if, due to an organization standard, all Redis instances must be created in a single designated Kubernetes namespace.
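For concreteness, a per-component PodMonitor from the third alternative might look like the sketch below. The name, port name, and label selectors are hypothetical; the actual labels the operator sets on Sentinel pods would have to be used instead.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myredis-sentinel        # hypothetical; one per RedisFailover and component
  namespace: myapp
spec:
  selector:
    matchLabels:
      # hypothetical labels; substitute whatever the operator actually sets
      app.kubernetes.io/component: sentinel
      app.kubernetes.io/instance: myredis
  podMetricsEndpoints:
    - port: metrics             # hypothetical exporter port name
      interval: 1m              # Sentinels scraped less often than Redises
```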
My suggestion
I believe the best course of action is to start with the simplest alternative and then let the community experiment with their ideas and discuss how to mature this feature.
At first sight, the simplest alternative seems to be the single PodMonitor for all RedisFailovers, but it prevents the community from experiencing and experimenting with customized scraping. We all have critical RedisFailovers that we don't want to touch unless absolutely necessary, and not-so-important RedisFailovers that are the victims of all our experiments, including the playful ones. If we have a single PodMonitor for all RedisFailovers, we won't experiment on it because it will also affect the critical ones. So, in order to let the community run all kinds of experiments, we need separate PodMonitor(s) for each RedisFailover.
If we decide to have a single PodMonitor per RedisFailover, we will face a number of problems in the RedisFailover CRD. To make it easier to follow, I reproduce the RedisFailoverSpec struct and some of its component structs here.
```go
type RedisFailoverSpec struct {
	Redis          RedisSettings      `json:"redis,omitempty"`
	Sentinel       SentinelSettings   `json:"sentinel,omitempty"`
	Auth           AuthSettings       `json:"auth,omitempty"`
	LabelWhitelist []string           `json:"labelWhitelist,omitempty"`
	BootstrapNode  *BootstrapSettings `json:"bootstrapNode,omitempty"`
}

type RedisSettings struct {
	// ...
	Exporter Exporter `json:"exporter,omitempty"`
	// ...
}

type SentinelSettings struct {
	// ...
	Exporter Exporter `json:"exporter,omitempty"`
	// ...
}

type Exporter struct {
	Enabled bool `json:"enabled,omitempty"`
	// ...
}
```

In order to have a single PodMonitor for the entire RedisFailover, we would have to add a PodMonitor field to RedisFailoverSpec, since it belongs to the entire RedisFailover, not just the Redis or Sentinel part. The sole function of this PodMonitor would be to connect Prometheus with the endpoints created by the Redis and Sentinel exporters.
Both exporters can be disabled, in which case creating the PodMonitor CR is futile and possibly confusing. And if only one of them is enabled, the PodMonitor will connect only that one to Prometheus, and it won't be clear to the maintainer which one unless they check the RedisFailover or the pod's containers; again, a little confusing.
Conversely, if you disable the PodMonitor and enable both exporters, they won't be connected to Prometheus at all. Since the functionality of the PodMonitor and the exporters is so intertwined, from a design standpoint it makes more sense to put them closer together and preferably let them share a few subfields, for example Enabled.
The easiest way to eliminate this poor design and confusion is to have one PodMonitor for Redises and one for Sentinels: add a PodMonitor subfield to Exporter, and let the Exporter's Enabled field control the creation of the PodMonitor.
```go
type Exporter struct {
	Enabled    bool             `json:"enabled,omitempty"`
	PodMonitor PodMonitorConfig `json:"podMonitor,omitempty"`
	// ...
}
```
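With that shape, a RedisFailover could configure scraping per component along these lines. This is an illustrative sketch: the subfields of PodMonitorConfig (such as interval) are not defined in this issue, and the apiVersion assumes the operator's existing CRD group.

```yaml
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: myredis
spec:
  redis:
    exporter:
      enabled: true        # Enabled would also control PodMonitor creation
      podMonitor:
        interval: 15s      # illustrative subfield, not part of the proposal yet
  sentinel:
    exporter:
      enabled: true
      podMonitor:
        interval: 1m       # Sentinels scraped less often
```

One nice property of this design is that a disabled exporter can never leave behind a dangling PodMonitor, since both are governed by the same Enabled flag.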