Allow configurable MaxConcurrentReconciles in Rufio controllers

## Problem

Currently, all Rufio controllers use controller-runtime's [default value](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.21.0/pkg/controller/controller.go#L48-L49) for `MaxConcurrentReconciles`. Since that default is 1, each Rufio controller processes only one reconcile request at a time, so BMC operations are effectively handled one after another.

When the Tinkerbell stack provisions multiple nodes concurrently, this becomes a significant bottleneck for operations such as netboot jobs.

## Proposed Solution

Make the number of concurrent reconciles configurable in the Rufio deployment so that multiple BMC operations can proceed in parallel and latency improves. This could be implemented as:

1. A CLI flag: `--max-concurrent-reconciles`
2. An environment variable: `RUFIO_MAX_CONCURRENT_RECONCILES`
3. Both options for flexibility
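
As a rough illustration of option 3, here is a minimal sketch that resolves the setting from the flag, then the environment variable, then the controller-runtime default of 1. The helper name `resolveMaxConcurrentReconciles` and the precedence order (flag over environment variable) are assumptions made for this example, not existing Rufio code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// Environment variable name as proposed in this issue.
const envMaxConcurrentReconciles = "RUFIO_MAX_CONCURRENT_RECONCILES"

// Mirrors controller-runtime's default of 1.
const defaultMaxConcurrentReconciles = 1

// resolveMaxConcurrentReconciles is a hypothetical helper: the CLI flag wins,
// then the environment variable, then the default.
func resolveMaxConcurrentReconciles(args []string, getenv func(string) string) (int, error) {
	def := defaultMaxConcurrentReconciles
	if v := getenv(envMaxConcurrentReconciles); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil {
			return 0, fmt.Errorf("invalid %s=%q: %w", envMaxConcurrentReconciles, v, err)
		}
		// The environment value becomes the flag's default.
		def = n
	}

	fs := flag.NewFlagSet("rufio", flag.ContinueOnError)
	n := fs.Int("max-concurrent-reconciles", def, "maximum number of concurrent reconciles per controller")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *n < 1 {
		return 0, fmt.Errorf("max-concurrent-reconciles must be >= 1, got %d", *n)
	}
	return *n, nil
}

func main() {
	n, err := resolveMaxConcurrentReconciles(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// In Rufio, this value would then be handed to each controller, for
	// example through controller.Options{MaxConcurrentReconciles: n}.
	fmt.Println("max concurrent reconciles:", n)
}
```

With controller-runtime's builder, the resolved value could be applied per controller using `WithOptions(controller.Options{MaxConcurrentReconciles: n})`. Where exactly that belongs in Rufio's setup code is left open here.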

## Performance Impact

I ran a benchmark script that creates netboot jobs for 30 nodes at once and measures how long they take to complete at different concurrency levels. The results are striking:

| Concurrency | Total Jobs | Completed Jobs | Failed Jobs | Min Duration (s) | Max Duration (s) | Avg Duration (s) | Median Duration (s) | Total Duration (s) |
|-------------|------------|---------------|------------|-----------------|-----------------|-----------------|-------------------|-------------------|
| 1           | 30         | 30            | 0          | 469             | 620             | 568.13          | 571.5             | 647               |
| 5           | 30         | 30            | 0          | 102             | 147             | 126.77          | 127.5             | 181               |
| 10          | 30         | 30            | 0          | 64              | 96              | 81.17           | 85                | 124               |

**Improvement Analysis (based on Total Duration):**
- Concurrency 5 vs 1: 72.02% lower total duration (647 s to 181 s)
- Concurrency 10 vs 1: 80.83% lower total duration (647 s to 124 s)
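
For reference, the percentages can be reproduced from the Total Duration column of the table. The snippet below is only a small arithmetic check; `percentReduction` is a helper name introduced for this example.

```go
package main

import "fmt"

// percentReduction returns how much lower value is than baseline, in percent.
func percentReduction(baseline, value float64) float64 {
	return (baseline - value) / baseline * 100
}

func main() {
	const baseline = 647.0 // total duration (s) at concurrency 1, from the table

	// Total durations (s) at concurrency 5 and 10, from the table.
	fmt.Printf("concurrency 5 vs 1:  %.2f%%\n", percentReduction(baseline, 181))
	fmt.Printf("concurrency 10 vs 1: %.2f%%\n", percentReduction(baseline, 124))
}
```

Using average job duration instead of total duration as the baseline gives somewhat larger reductions (568.13 s to 126.77 s and to 81.17 s).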

## Test Script

The benchmark was performed using [this script](https://gist.github.com/rahulbabu95/922e4facf63faf1f6f52d3b3574e907e) which creates netboot jobs for multiple machines and measures completion time with different concurrency settings.

