---
title: December 2025
weight: -202512
outline: 2
---

# Hack The Garden 12/2025 Wrap Up

- 🗓️ **Date:** 08.12.2025 – 12.12.2025
- 📍 **Location:** SAP Center Walldorf
- 👤 **Organizer:** SAP
- 📘 **Topics:** https://hackmd.io/
- 🎤 **Review Meeting Summary:** https://gardener.cloud/community/review-meetings/2025-reviews/

![Group picture](./images/2025-12.jpg)

## ⚙️ OIDC Webhook Garden Class Extension

### Problem Statement

Install the OIDC Webhook Authenticator in the garden cluster as a garden-class extension to enable proper authentication mechanisms.

### Motivation & Benefits

Enable OIDC authentication for the garden cluster through the regular Gardener extension mechanism, improving security and authentication capabilities.

### Achievements

* Extensions were adapted to handle garden-class extensions.
* Successfully installed the OIDC Webhook Authenticator for the garden cluster.
* Identified several gaps in Gardener with respect to garden extensions:
  * The garden namespace is not labelled with the `extension-type: true` label, so the namespace selector of the extension's admission webhooks does not match it.
  * The `EnsureKubeAPIServer` handler needs to support the deployment named `virtual-garden-kube-apiserver`.
  * The health check controller needs to be prepared for the garden use case.
  * The SecretsManager utility needs to be able to create a manager from the `Garden` resource.
  * Collisions must be avoided when a runtime cluster is also a seed and multiple extension controllers work on the same resource.
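
The first gap follows from label-selector semantics: a webhook `namespaceSelector` that uses `matchLabels` only matches namespaces carrying every listed label with exactly that value, so an unlabelled garden namespace is silently skipped and the webhook never fires there. The Python sketch below models only the `matchLabels` part of a Kubernetes label selector (not `matchExpressions`). The label key `extension-type` is copied from the list above and may differ from the key Gardener actually uses; the example namespace labels are assumptions for illustration.

```python
from typing import Mapping


def matches_label_selector(match_labels: Mapping[str, str],
                           namespace_labels: Mapping[str, str]) -> bool:
    """Return True if every key/value pair of match_labels is set on the namespace.

    Only `matchLabels` semantics are modelled; `matchExpressions` are ignored.
    An empty selector matches every namespace, as in Kubernetes.
    """
    return all(namespace_labels.get(key) == value
               for key, value in match_labels.items())


# Hypothetical namespaceSelector of an extension's admission webhook
# (label key taken from the text; label values are always strings).
WEBHOOK_SELECTOR = {"extension-type": "true"}

# Illustrative labels: a namespace that carries the label vs. the unlabelled garden namespace.
seed_namespace_labels = {"extension-type": "true"}
garden_namespace_labels = {"kubernetes.io/metadata.name": "garden"}

if __name__ == "__main__":
    print(matches_label_selector(WEBHOOK_SELECTOR, seed_namespace_labels))    # True
    print(matches_label_selector(WEBHOOK_SELECTOR, garden_namespace_labels))  # False
```

Adding the label to the garden namespace is what makes the second check succeed.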

### Next Steps

* Finalize the changes in the extension (mainly adapting tests).
* Migrate from the LSS component to the extension.

### Code & Pull Requests

* [gardener/gardener#13635](https://github.com/gardener/gardener/issues/13635)
* [gardener/gardener#13662](https://github.com/gardener/gardener/issues/13662)

### Contributors

@Theodora Tosheva, @Tim Usner, @Vladimir Nachev

## 🔍 Use Dummy RBAC Resources and Verbs to Identify DoD/SRE and Auditor

### Problem Statement

The Gardener Dashboard and the Gardener API server rely on RBAC to determine a user's role (e.g., admin).
Today, this is done with checks such as "Can I list all secrets?" and "Can I list all projects?", which must be replaced.

### Motivation & Benefits

Improve RBAC-based role detection by using dedicated dummy resources and verbs instead of relying on permission checks for sensitive resources such as secrets and projects.
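
One way to picture the idea: instead of asking whether the user may `list` all `secrets`, the client submits a `SelfSubjectAccessReview` (API group `authorization.k8s.io/v1`) for a dedicated, otherwise unused resource/verb pair that is only granted to a given role. The sketch below only builds the request bodies and maps the answers to roles. The group `roles.example.gardener.cloud`, the resource `roles`, and the verbs `operate`/`audit` are hypothetical placeholders, not the names used in the proposed branch.

```python
import json
from typing import Dict, List

# Hypothetical dummy API group, resource and role-specific verbs. These are
# placeholders for illustration, NOT the names used in the proposed branch.
DUMMY_GROUP = "roles.example.gardener.cloud"
DUMMY_RESOURCE = "roles"
ROLE_VERBS = {"sre": "operate", "auditor": "audit"}


def self_subject_access_review(group: str, resource: str, verb: str) -> Dict:
    """Build a SelfSubjectAccessReview body for a cluster-scoped resource/verb pair."""
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SelfSubjectAccessReview",
        "spec": {"resourceAttributes": {"group": group, "resource": resource, "verb": verb}},
    }


def role_reviews() -> Dict[str, Dict]:
    """One review per role, each asking only about that role's dedicated dummy verb."""
    return {role: self_subject_access_review(DUMMY_GROUP, DUMMY_RESOURCE, verb)
            for role, verb in ROLE_VERBS.items()}


def detect_roles(allowed_by_role: Dict[str, bool]) -> List[str]:
    """Return the roles whose review was answered with status.allowed == true."""
    return sorted(role for role, allowed in allowed_by_role.items() if allowed)


if __name__ == "__main__":
    # Body that would be POSTed to /apis/authorization.k8s.io/v1/selfsubjectaccessreviews.
    print(json.dumps(role_reviews()["auditor"], indent=2))
    # Pretend the API server allowed only the auditor verb.
    print(detect_roles({"sre": False, "auditor": True}))  # ['auditor']
```

Because the dummy verbs grant no access to real data, they can be bound to a role through a ClusterRole without also handing out the ability to list all secrets, which the current checks effectively require.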

### Achievements

* The topic was explored and the problem statement was defined.
* A solution proposal to improve the current state was developed.

### Next Steps

* Gather input from the community on the best approach.

### Code & Pull Requests

* Issue: [gardener/gardener#13657](https://github.com/gardener/gardener/issues/13657)
* Proposed solution: [dimityrmirchev/gardener:virtualrbac](https://github.com/dimityrmirchev/gardener/tree/virtualrbac)

### Contributors

@Vladimir Nachev, @Dimitar Mirchev

## 🔒 Private API Servers

### Problem Statement

Gardener exposes the API servers of shoot clusters to the internet on all infrastructures except SAP Cloud Infrastructure, where this is opt-in.
While being able to reach the API server from anywhere is very convenient, it has security implications: it makes it easier for malicious actors to attack the API server.

### Motivation & Benefits

With technical features such as Private Link or Private Service Connect, it should be possible to provide private API servers for shoot clusters, reducing the attack surface for security-conscious stakeholders.

### Achievements

Implementation of private API server support is in progress.

### Next Steps

Continue implementing and testing the private API server functionality.

### Contributors

@Konstantinos Angelopoulos, @Aleksandar Savchev, @Johannes Scheerer

## 🔐 Use GitHub OIDC Federation to Get Rid of GitHub Apps (Again)

### Problem Statement

GitHub Actions (GHA) pipelines can use trust-based authentication (for Gardener-CICD, we already do this for shoot clusters and hyperscalers).
However, there is no built-in support for accessing other GitHub (Enterprise) instances, or even for cross-repository or cross-organisation access.

### Motivation & Benefits

Eliminate the need for GitHub Apps for cross-GitHub authentication by leveraging OIDC federation, improving security and reducing maintenance overhead.

### Achievements

* Explored existing PoCs from OSPO and Chainguard's Secure Token Service ([octo-sts](https://github.com/octo-sts)).
* Evaluated the feasibility of setting up a custom instance with an internet-facing token exchange endpoint.
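
At the core of such a Security Token Service is a trust check: the GitHub Actions OIDC token (issuer `https://token.actions.githubusercontent.com`) is verified, and its claims are compared against a trust policy before a short-lived GitHub token is minted. The sketch below covers only the claim-matching step. The policy shape (`issuer`, `subject_pattern`) is loosely modeled on octo-sts trust policies and simplified; JWT signature, audience and expiry verification as well as the token minting are intentionally left out, and the repository name is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Any, Mapping

GITHUB_ACTIONS_ISSUER = "https://token.actions.githubusercontent.com"


@dataclass(frozen=True)
class TrustPolicy:
    """Simplified trust policy, loosely modeled on octo-sts (illustrative only)."""
    issuer: str
    subject_pattern: str  # regular expression that must match the entire `sub` claim


def is_trusted(policy: TrustPolicy, claims: Mapping[str, Any]) -> bool:
    """Compare already-verified token claims with the policy.

    Assumes the token's signature, audience and expiry were verified beforehand;
    only the `iss` and `sub` claims are checked here.
    """
    if claims.get("iss") != policy.issuer:
        return False
    subject = claims.get("sub")
    if not isinstance(subject, str):
        return False
    # fullmatch anchors the pattern at both ends, so it cannot match just a part of the subject.
    return re.fullmatch(policy.subject_pattern, subject) is not None


# Hypothetical policy: only workflows on the main branch of one repository are trusted.
policy = TrustPolicy(
    issuer=GITHUB_ACTIONS_ISSUER,
    subject_pattern=r"repo:example-org/pipelines:ref:refs/heads/main",
)

if __name__ == "__main__":
    main_claims = {"iss": GITHUB_ACTIONS_ISSUER,
                   "sub": "repo:example-org/pipelines:ref:refs/heads/main"}
    feature_claims = {**main_claims, "sub": "repo:example-org/pipelines:ref:refs/heads/feature"}
    print(is_trusted(policy, main_claims))     # True
    print(is_trusted(policy, feature_claims))  # False
```

Supporting another GitHub (Enterprise) instance would mean trusting an additional issuer, which is why a custom instance is needed.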

### Next Steps

* Set up a custom instance, including support for `github.wdf.sap.corp`.
* Complete the implementation, with the ultimate goal of eliminating the use of GitHub Apps for cross-GitHub authentication.

### Code & Pull Requests

* Chainguard STS: [github.com/apps/octo-sts](https://github.com/apps/octo-sts)
* Source code: [github.com/octo-sts](https://github.com/octo-sts)

### Contributors

@Christian Cwienk, @Jonas Brand

## ✅ Provide a Validation Job for Kyverno Policies Without Kyverno

### Problem Statement

Validating Kubernetes resources against common policies is widely considered a good health measure for Kubernetes deployments, and it increases security.
Today's options are limited: Kyverno provides this functionality, but it can be problematic for cluster health and requires significant effort to roll out.
Kubernetes admission policies only apply to resources when they are created or updated, not to existing ones.

### Motivation & Benefits

Provide lightweight jobs that regularly check all resources in the cluster and report violations based on common (Kyverno) policies, without requiring a full Kyverno installation.

### Achievements

* Validated that the Kyverno CLI can verify policies against a target cluster, with some limitations (e.g., it cannot check CRDs).
* Identified CLI deficiencies:
  * The report command does not produce valid YAML; its output includes free text that requires special parsing.
  * The command exits with an error on policy violations, making it hard to distinguish command errors from failing policies.
* Created a PoC integration of the Kyverno CLI into DIKI.
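
The second deficiency comes down to conflating two outcomes: a validation job needs a report that keeps "the scan failed" apart from "a policy failed". The sketch below is a hypothetical, Kyverno-independent illustration of such a report: it evaluates a simple rule (require a label) over already existing resources and records evaluation errors separately from violations. The rule, the report fields, and the exit-code convention are assumptions, not Kyverno semantics or the DIKI report format.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Mapping

Resource = Mapping[str, Any]
Rule = Callable[[Resource], bool]


@dataclass
class ScanReport:
    passed: List[str] = field(default_factory=list)
    violations: List[str] = field(default_factory=list)
    errors: Dict[str, str] = field(default_factory=dict)


def resource_id(resource: Resource) -> str:
    meta = resource.get("metadata") or {}
    return f"{resource.get('kind', '?')}/{meta.get('namespace', '')}/{meta.get('name', '?')}"


def require_label(key: str) -> Rule:
    """Hypothetical rule: the resource must carry the given label."""
    def rule(resource: Resource) -> bool:
        labels = resource["metadata"].get("labels") or {}
        return key in labels
    return rule


def scan(resources: List[Resource], rule: Rule) -> ScanReport:
    """Evaluate the rule on existing resources, keeping errors apart from violations."""
    report = ScanReport()
    for resource in resources:
        rid = resource_id(resource)
        try:
            ok = rule(resource)
        except (KeyError, TypeError, AttributeError) as exc:
            # A resource the rule cannot evaluate is a scan error, not a policy violation.
            report.errors[rid] = repr(exc)
            continue
        (report.passed if ok else report.violations).append(rid)
    return report


def exit_code(report: ScanReport) -> int:
    """Hypothetical convention: 2 = scan error, 1 = policy violations, 0 = all passed."""
    if report.errors:
        return 2
    return 1 if report.violations else 0


resources = [
    {"kind": "Deployment", "metadata": {"namespace": "default", "name": "web", "labels": {"app": "web"}}},
    {"kind": "Deployment", "metadata": {"namespace": "default", "name": "worker"}},
    {"kind": "ConfigMap"},  # malformed on purpose: no metadata
]
report = scan(resources, require_label("app"))

if __name__ == "__main__":
    print(report)
    print(exit_code(report))  # 2
```

Keeping the three outcomes in separate fields lets a caller alert on broken scans without treating every policy violation as a job failure.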

### Next Steps

* Address the identified CLI limitations.
* Complete the integration into DIKI and prepare it for production use.

### Code & Pull Requests

* PoC branch: [dimityrmirchev/diki:kyverno-ruleset-poc](https://github.com/dimityrmirchev/diki/tree/kyverno-ruleset-poc)

### Contributors

@Dirk Marwinski, @Linus Roepert, @Dimitar Mirchev

## 📡 Observability Signals Externalization

### Problem Statement

Currently, the observability signals (logs and metrics) of client clusters are persisted in the shoot control-plane namespace in the seed.
GEP-34 laid the foundations for OpenTelemetry support in Gardener, making it possible for signals to be collected and processed by OpenTelemetry collectors.
However, the persisted observability signals still have a fixed retention policy that clients cannot control.

### Motivation & Benefits

Allowing cluster owners to forward control-plane signals such as metrics and logs to their own OpenTelemetry-compatible endpoint, and to persist them there, would let them configure and manage their own retention policies.

### Achievements

* The OTel extension is functional, and a local setup is ready.
* ServiceMonitors in the shoot control-plane namespace are discovered by the Target Allocator (TA) and scraped by the Collector.
* Implemented authentication between the OTel exporter and the remote OTel receiver.
* Implemented mTLS between the Target Allocator and the Collector using the Gardener secrets manager.
* Defined the extension API.
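
Mutual TLS means that both sides present certificates signed by a CA the other side trusts. As an illustration of the required TLS settings only (not of how the Target Allocator or the Collector are actually configured), the sketch below uses Python's standard `ssl` module to build a server context that rejects clients without a valid certificate and a client context that presents one. Treating the Target Allocator as server and the Collector as client is an assumption, and the file paths are hypothetical; in the setup described above, the certificates come from the Gardener secrets manager.

```python
import ssl
from typing import Optional


def server_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side (e.g. the Target Allocator): only accepts clients with a trusted certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED turns plain TLS into mutual TLS: the client must present a certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signed the client certificates
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # server's own certificate
    return ctx


def client_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side (e.g. the Collector): verifies the server and presents its own certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verifies certificate and hostname by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signed the server certificate
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # client certificate
    return ctx
```

In a real deployment, the arguments would point to files mounted from the generated secrets, e.g. `server_context("/etc/tls/ca.crt", "/etc/tls/tls.crt", "/etc/tls/tls.key")` (hypothetical paths).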

### Next Steps

* Polish the implementation (complete TODO items, clean up the API).
* Increase test coverage.
* Add support for shipping signals to an OTLP gRPC endpoint.
* Add support for shipping logs.
* Follow up on upstream issues related to mTLS.

### Contributors

@Rafael Franzke, @Marin Nikolov, @Nikolai Dokovski

## 🛰️ Perses Dashboards & Explore Plugin

### Problem Statement

Version [v0.53.0-beta.2](https://github.com/perses/perses/releases/tag/v0.53.0-beta.2) of Perses supports VictoriaLogs as a data source, which aligns with [GEP-35](https://github.com/gardener/gardener/pull/13242), where VictoriaLogs will replace Vali as the log backend for all clusters.
[Plutono](https://github.com/credativ/plutono) is the current UI for displaying metrics from Prometheus and logs from Vali, but we aim to replace it in the same timeframe as Vali.
However, Perses has not reached feature parity yet, because it does not have an explore plugin for logs.

### Motivation & Benefits

Fully replace Plutono with Perses: configure Prometheus and VictoriaLogs as data sources, migrate all dashboards, and achieve full feature parity with the current solution.

### Achievements

* Evaluated what would be needed to fully replace Plutono with Perses.
* Explored existing plugins (Prometheus, Pyroscope, and Tempo) that can serve as a frame of reference for a Log Explorer plugin.
* Identified operator shortcomings that need to be addressed.

### Next Steps

* Contribute the Log Explorer plugin to remove one of the roadblocks on the way to replacing Plutono.
* Address operator shortcomings:
  * Add `emptyDir` support to avoid relying on PersistentVolumes for config files.
  * Find solutions for creating the global data sources needed for explore mode.
* Complete the dashboard migration and the feature parity assessment.

### Contributors

@Jeremy Rickards, @Radoslav Hubenov, @Luca Bernstein, @Alexander Hebel

## 🔭 Gardener OpenTelemetry Receiver

### Problem Statement

The gardener-metrics-exporter provides information about the condition of Shoots, Seeds, and other Gardener resources in the Prometheus metrics format.
There is an ongoing initiative to replace gardener-metrics-exporter with kube-state-metrics and its custom resource state metrics feature, to avoid maintaining a custom exporter.
The OpenTelemetry Collector introduced the concept of receivers, which are an alternative to Prometheus exporters and can provide OTLP metrics without any conversion.

### Motivation & Benefits

Develop a custom receiver for the OpenTelemetry Collector that provides information similar to the gardener-metrics-exporter, and evaluate whether our planned migration to kube-state-metrics is still the right choice given our Observability 2.0 strategy.
Stakeholders could use the custom receiver to monitor the condition of their Shoots and ingest the metrics into a managed backend like Dynatrace or the BTP Cloud Logging Service, requiring only a (lightweight) OpenTelemetry Collector instead of a full Prometheus instance.

### Achievements

* Implemented an OTel receiver prototype by following the OTel community tutorials for building a receiver and building a custom collector.
* The prototype emits metrics with shoot and seed information.
* Created a demo setup that pushes Gardener metrics to Prometheus using its OTLP API.
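
"Emitting metrics with shoot information" essentially means reading Shoot resources periodically and turning each status condition into gauge data points with descriptive attributes. The sketch below shows only that mapping in plain Python (the actual prototype is a Collector receiver, which is typically written in Go). The metric name, the attribute keys, and the encoding are hypothetical choices, not those of gardener-metrics-exporter or the prototype; the condition types used in the example are standard Shoot conditions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping

# Statuses a Gardener condition can report.
CONDITION_STATUSES = ("True", "False", "Unknown", "Progressing")


@dataclass
class DataPoint:
    """Simplified stand-in for an OTLP gauge data point."""
    name: str
    attributes: Dict[str, str]
    value: int


def shoot_condition_points(shoot: Mapping[str, Any]) -> List[DataPoint]:
    """Emit one point per (condition, possible status); value 1 marks the current status.

    The metric name and attribute keys are hypothetical.
    """
    meta = shoot["metadata"]
    base = {
        "gardener.shoot.name": meta["name"],
        "gardener.shoot.namespace": meta["namespace"],
        "gardener.seed.name": shoot.get("spec", {}).get("seedName", ""),
    }
    points = []
    for condition in shoot.get("status", {}).get("conditions", []):
        for status in CONDITION_STATUSES:
            attributes = {**base,
                          "gardener.condition.type": condition["type"],
                          "gardener.condition.status": status}
            value = 1 if condition.get("status") == status else 0
            points.append(DataPoint("gardener.shoot.condition", attributes, value))
    return points


shoot = {
    "metadata": {"name": "dev", "namespace": "garden-demo"},
    "spec": {"seedName": "aws-eu1"},
    "status": {"conditions": [
        {"type": "APIServerAvailable", "status": "True"},
        {"type": "EveryNodeReady", "status": "Progressing"},
    ]},
}
points = shoot_condition_points(shoot)

if __name__ == "__main__":
    for point in points:
        if point.value:
            print(point.name, point.attributes)
```

Encoding the status as a set of 0/1 points (the pattern kube-state-metrics uses for enum-like states) is one option when rethinking the metric model: sums and alerts work directly on the values, whereas a single numeric code per condition would have to be decoded by every consumer.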

### Next Steps

* Align within the monitoring team on whether to continue with this approach.
* Add missing metrics to reach feature parity with gardener-metrics-exporter.
* Rethink the metric modeling to follow best practices of the OTel ecosystem.
* Bring the prototype into proper shape (e.g., proper use of the k8s client with informers, adding tests, updating to the latest collector dependencies).

### Contributors

@Victor Herrero Otal, @Christoph Kleineweber, @Bozhidar Atanasov

## 🔄 PoC: Replace Prometheus Exporters with OpenTelemetry Receivers in the Gardener Monitoring Stack

### Problem Statement

The Gardener monitoring stack uses various upstream Prometheus exporters to collect typical metrics of a Kubernetes cluster, including node_exporter, kube-state-metrics, and cAdvisor.
The OpenTelemetry project introduced the receiver concept in the Collector component to gather telemetry data.
The opentelemetry-collector-contrib GitHub repository already contains multiple receivers that serve a similar purpose to the Prometheus exporters mentioned above.

### Motivation & Benefits

Build a PoC setup based purely on native OpenTelemetry components to monitor a Kubernetes cluster, and gather experience about the best telemetry data sources for implementing the Observability 2.0 strategy.

### Achievements

* Built a PoC setup using native OpenTelemetry components.
* Compared the telemetry data provided by Prometheus exporters and by OpenTelemetry receivers.

### Next Steps

* Identify potential gaps in the metrics provided by the OpenTelemetry receivers.
* Complete the comparison and document the findings.
* Decide on the adoption strategy for Observability 2.0.
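
The gap analysis can be approached as a set comparison: list the exporter metrics that dashboards and alerts rely on, map each to its counterpart from an OpenTelemetry receiver, and report what is unmapped or not emitted. The sketch below assumes a hand-maintained mapping; the entries in `EXPORTER_TO_OTEL` and the toy inputs are illustrative examples to be verified against the receivers' documentation, not results of the PoC.

```python
from typing import Dict, Iterable, Set

# Illustrative mapping from Prometheus exporter metrics to OpenTelemetry receiver
# metrics. The entries are examples and must be verified against the receivers' docs.
EXPORTER_TO_OTEL: Dict[str, str] = {
    "node_cpu_seconds_total": "system.cpu.time",                # node_exporter -> hostmetrics
    "kube_pod_status_phase": "k8s.pod.phase",                   # kube-state-metrics -> k8s_cluster
    "container_cpu_usage_seconds_total": "container.cpu.time",  # cAdvisor -> kubeletstats
}


def find_gaps(required: Iterable[str], observed_otel: Set[str],
              mapping: Dict[str, str] = EXPORTER_TO_OTEL) -> Dict[str, str]:
    """Return each required exporter metric without an observed OTel counterpart, with a reason."""
    gaps: Dict[str, str] = {}
    for metric in sorted(required):
        target = mapping.get(metric)
        if target is None:
            gaps[metric] = "not in mapping"
        elif target not in observed_otel:
            gaps[metric] = f"mapped to {target!r}, but not emitted"
    return gaps


# Toy input: metrics used by dashboards/alerts vs. metrics seen from the PoC collector.
required = {"node_cpu_seconds_total", "kube_pod_status_phase",
            "container_cpu_usage_seconds_total", "node_load1"}
observed = {"system.cpu.time", "k8s.pod.phase"}
gaps = find_gaps(required, observed)

if __name__ == "__main__":
    for metric, reason in gaps.items():
        print(f"{metric}: {reason}")
```

Separating "not in mapping" from "mapped but not emitted" distinguishes missing research from missing receiver configuration.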

### Contributors

@Plamen Kokanov, @Stoyan Vitanov

## 🔕 PoC: Investigate Alertmanager Silence-Operator

### Problem Statement

Investigate the operational cost benefit of eliminating the PersistentVolumes (PVs) that store Alertmanager silences, and assess whether it is worth introducing the silence operator for Alertmanager.

### Motivation & Benefits

Reduce operational costs and complexity by potentially eliminating the need for PersistentVolumes for Alertmanager silences.
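
A silence operator makes silences declarative, so Alertmanager's state can be rebuilt from them after a restart instead of being persisted on a volume. The sketch below shows a minimal reconciliation step under stated assumptions: silences owned by the operator are recognized by a hypothetical `createdBy` value, and the key of each desired silence is stored in `comment`. The silence fields (`matchers`, `startsAt`, `endsAt`, `createdBy`, `comment`) follow the Alertmanager v2 API, and the resulting actions would map to `POST /api/v2/silences` and `DELETE /api/v2/silence/{silenceID}`. This is not the implementation of any existing silence operator.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping, Tuple

# Hypothetical value the operator writes into the silence's createdBy field.
OPERATOR = "example-silence-operator"

Matcher = Tuple[str, str]  # (label name, exact value); regex matchers are omitted for brevity


@dataclass
class Plan:
    create: List[str] = field(default_factory=list)  # desired silence keys to POST
    expire: List[str] = field(default_factory=list)  # existing silence IDs to DELETE


def plan(desired: Mapping[str, List[Matcher]], active: List[Mapping[str, Any]]) -> Plan:
    """Diff desired silences (e.g. from custom resources) against active Alertmanager silences.

    Only silences created by the operator are touched; `active` must contain active silences only.
    """
    result = Plan()
    owned = {s["comment"]: s for s in active if s.get("createdBy") == OPERATOR}
    for key, matchers in sorted(desired.items()):
        current = owned.get(key)
        if current is None:
            result.create.append(key)
        elif sorted((m["name"], m["value"]) for m in current["matchers"]) != sorted(matchers):
            # This sketch replaces changed silences: expire the old one and create a new one.
            result.expire.append(current["id"])
            result.create.append(key)
    for key, silence in sorted(owned.items()):
        if key not in desired:
            result.expire.append(silence["id"])
    return result


def to_api_body(key: str, matchers: List[Matcher], starts_at: str, ends_at: str) -> Dict[str, Any]:
    """Request body for POST /api/v2/silences."""
    return {
        "matchers": [{"name": n, "value": v, "isRegex": False, "isEqual": True} for n, v in matchers],
        "startsAt": starts_at,
        "endsAt": ends_at,
        "createdBy": OPERATOR,
        "comment": key,
    }


desired = {
    "maintenance-eu1": [("seed", "aws-eu1")],
    "noisy-alert": [("alertname", "ExampleAlert")],
}
active = [
    {"id": "a1", "createdBy": OPERATOR, "comment": "maintenance-eu1",
     "matchers": [{"name": "seed", "value": "aws-eu1", "isRegex": False}]},
    {"id": "b2", "createdBy": OPERATOR, "comment": "old-window",
     "matchers": [{"name": "seed", "value": "gcp-us1", "isRegex": False}]},
    {"id": "c3", "createdBy": "jane", "comment": "manual", "matchers": []},
]

if __name__ == "__main__":
    print(plan(desired, active))  # Plan(create=['noisy-alert'], expire=['b2'])
```

Manually created silences (like `c3`) are left untouched; only those would still need persistence if they must survive restarts.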

### Achievements

* Investigated the operational cost benefits.
* Assessed the feasibility of introducing the silence operator.
* Documented the results in HackMD.

### Next Steps

* Review the findings with the team.
* Decide on an adoption strategy based on the cost-benefit analysis.

### Code & Pull Requests

* Results: HackMD

### Contributors

@Marc Vornetran, @Vladimir Nachev, @Viktor Kostov

## 🔀 PoC: Investigate OpenTelemetry as Replacement for Federation to Garden from Aggregate Prometheus

### Problem Statement

The current federation from the aggregate Prometheus to the garden Prometheus requires access from the runtime cluster to the seeds.
In Gardener, connections typically originate from the seeds towards the runtime cluster.
Federation therefore introduces an undesired communication direction.

### Motivation & Benefits

Assess the feasibility of replacing the current federation by using OpenTelemetry to push metrics into the garden Prometheus, effectively reversing the communication direction.
Also compare this approach with Prometheus-native remote write.
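
The change in communication direction is visible in the endpoints involved. With federation, the garden Prometheus scrapes the `/federate` endpoint of each aggregate Prometheus, so connections go from the runtime cluster to the seeds. With push-based approaches, a sender in the seed calls the garden Prometheus instead: `/api/v1/write` for remote write or `/api/v1/otlp/v1/metrics` for OTLP, each of which requires the corresponding receiver to be enabled in Prometheus. The sketch below only builds these URLs and records who initiates the connection; the host names and the `match[]` selector are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable
from urllib.parse import urlencode


@dataclass(frozen=True)
class Flow:
    initiator: str  # side that opens the connection
    url: str        # endpoint that is called


def federation_flow(aggregate_base_url: str, selectors: Iterable[str]) -> Flow:
    """Pull: the garden Prometheus in the runtime cluster scrapes the seed's /federate endpoint."""
    query = urlencode([("match[]", selector) for selector in selectors])
    return Flow("runtime cluster", f"{aggregate_base_url}/federate?{query}")


def remote_write_flow(garden_base_url: str) -> Flow:
    """Push: a sender in the seed calls the garden Prometheus remote-write receiver."""
    return Flow("seed", f"{garden_base_url}/api/v1/write")


def otlp_flow(garden_base_url: str) -> Flow:
    """Push: an OpenTelemetry Collector in the seed calls the garden Prometheus OTLP receiver."""
    return Flow("seed", f"{garden_base_url}/api/v1/otlp/v1/metrics")


# Hypothetical endpoints, for illustration only.
AGGREGATE = "https://aggregate-prometheus.seed.example"
GARDEN = "https://garden-prometheus.runtime.example"

if __name__ == "__main__":
    for flow in (federation_flow(AGGREGATE, ['{job="example"}']),
                 remote_write_flow(GARDEN),
                 otlp_flow(GARDEN)):
        print(f"{flow.initiator:>15} -> {flow.url}")
```

Both push variants only require the seed to reach the garden Prometheus, which matches the usual seed-to-runtime direction.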

### Achievements

* Assessed the feasibility of using OpenTelemetry as a replacement for federation.
* Compared the OpenTelemetry approach with Prometheus-native remote write.
* Documented the results in HackMD.

### Next Steps

* Review the findings with the team.
* Decide on the implementation approach based on the comparison.
* Plan a migration strategy if the approach is adopted.

### Code & Pull Requests

* Results: HackMD

### Contributors

@Viktor Kostov, @Marc Vornetran, @Victor Herrero Otal