Skip to content

Conversation

@marquiz
Copy link
Contributor

@marquiz marquiz commented Aug 26, 2025

This PR adds support for directly adjusting the intelRdt object of the container config.

Previously, we have the concept of RDT class, which provides indirect manipulation of the intelRdt.closID field. There the higher level runtime (containerd, cri-o) registers a resolver function which the NRI adaptation uses to translate the provided "RDT class" to actual closID (which is basically the resctrl group in the Linux resctrl filesystem).

The API extensions in this PR provides direct control of the relevant fields of the intelRdt object. The PR also includes a sample plugin to test/demonstrate the functionality (including deployment files and CI integration to build/publish images)

REFS:

NOTES:

OPEN QUESTION/TODO:

In this PR there is no mechanism for removing the intelRdt object from the config. That is probably a desirable feature.

@marquiz
Copy link
Contributor Author

marquiz commented Aug 26, 2025

github.com/onsi/ginkgo/v2 v2.19.1
github.com/onsi/gomega v1.34.0
github.com/opencontainers/runtime-spec v1.1.0
github.com/opencontainers/runtime-spec v1.2.2-0.20250818071321-383cadbf08c0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marquiz Do we know what is the expected timeframe for cutting the first such runtime-spec release which would put 383cadbf08c0 commit behind a tag ?

I am asking because we have been reluctant lately to pull in untagged versions of the runtime spec, because that would cause extra hassle/work if we wanted to or were forced to cut a new release and bump containerd and cri-o to pull in that new version. For instance #157 and #166 are current drafts exactly for this reason: the runtime-spec they require is not available yet in a tagged release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sure. I don't think we need to hurry with this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marquiz @mikebrow @chrishenzie In principle there is one way how we could get this partially in already now, to avoid having to keep rebasing this with a guaranteed conflict whenever we add something to LinuxContainer or LinuxContainerAdjustment.

It would be to split this up to 2 stacked PRs:

  • PR#1: The bulk of this PR, without the OCI Spec bump + a single change to Generator.Adjust(): return an error if adjust.GetLinux().GetRdt() != nil
  • draft PR#2: All the commits in PR#1 + the OCI Spec bump (eventually to a tagged version) + a Generator.Adjust() updated to do RDT adjustment if requested

Then we could merge #PR1 (and rebase PR#2 when that happens), and keep the OCI Spec-dependent bits pending until the necessary bits get tagged. We could apply the same to #157 and #166...

Maybe not worth the effort though, and we can just mark this a draft until we get a tagged OCI Spec.

@klihub
Copy link
Member

klihub commented Aug 26, 2025

nit: typo in the body of commit message for 'api: add rdt': repeated 'the' in "... the the IntelRdt message is sepate from LinuxResources. This is to ..."

a.Linux.Resources.Unified = make(map[string]string)
}
}
func (a *ContainerAdjustment) initLinuxRdt() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: #118 has a bunch of queued similar changes, updating the result-handling code so that there is no need to always a priory create a fully constructed but empty adjustment. It would be nice to get that finally in, so I'll rebase it that on the latest main/HEAD then update with any missing init*() bits necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I'll review when that's updated. And then rebase when that's merged

@klihub
Copy link
Member

klihub commented Aug 26, 2025

@marquiz @mikebrow @chrishenzie This overall LGTM. My biggest question is the timeline for tagging a new opencontainers/runtime-spec so that bits we rely on here become available via a tagged release.

@klihub klihub requested review from chrishenzie and mikebrow August 26, 2025 13:35
@marquiz marquiz force-pushed the devel/rdt branch 2 times, most recently from b2a5d93 to c089e9c Compare August 26, 2025 14:05
@marquiz
Copy link
Contributor Author

marquiz commented Aug 26, 2025

My biggest question is the timeline for tagging a new opencontainers/runtime-spec so that bits we rely on here become available via a tagged release.

I'm on the same side. I think there's no immediate hurry with this, especially because even the runc/crun implementation(s) are not there, yet. I've been working on this area so I wanted to get this one off my shoulders. That said, I think this is ready for review. I could've submitted this as a draft, but decided not to because from my perspective this is complete.

@marquiz
Copy link
Contributor Author

marquiz commented Aug 26, 2025

I just recalled that there's actually one aspect/feature missing from the new extensions: ability to remove the linux.intelRdt from the container config (set it to nil/null). Suggestions for how to implement this are warmly welcome.

I couldn't come up with any nice solutions, I played with two alternatives but didn't commit either of them:

  1. Have a separate field (like drop) in the NRI API
  2. Have a well documented, otherwise invalid value for the closID field in the NRI API (e.g. -/) that causes the config to be dropped

Copy link

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@klihub
Copy link
Member

klihub commented Aug 27, 2025

I just recalled that there's actually one aspect/feature missing from the new extensions: ability to remove the linux.intelRdt from the container config (set it to nil/null). Suggestions for how to implement this are warmly welcome.

I couldn't come up with any nice solutions, I played with two alternatives but didn't commit either of them:

  1. Have a separate field (like drop) in the NRI API
  2. Have a well documented, otherwise invalid value for the closID field in the NRI API (e.g. -/) that causes the config to be dropped

@marquiz I'd vote with a mild bias for 2 above (because it's how we handle other similar cases where removal is supported). But with the addition that on the wire-protocol this is the convention we use, but in the user-facing API we provide explicit functions for clearing any LinuxRdt, with something along the lines of this:

func (a *ContainerAdjustment) RemoveLinuxRDT() {
    a.initLinux()
    a.Linux.Rdt = &LinuxRdt{
        ClosId = MarkForRemoval("") // or MarkForRemoval("/") if you think that's better
    }
}

And then have the corresponding handling for this in intermediate result-calculation and final Spec generation.

@marquiz
Copy link
Contributor Author

marquiz commented Aug 27, 2025

@klihub thanks for the feedback. The MarkForRemoval thingie was a good hint, I didn't have that in my early prototype. I'll cook up something along the lines you suggested and then we can mull over that.

Signed-off-by: Markus Lehtonen <[email protected]>
Add new message for directly managing the linux.intelRdt object of the
runtime-spec.

NOTE: the old RdtClass field resides under LinuxResources message but
the IntelRdt message is sepate from LinuxResources. This is to maintain
parity with the OCI runtime-spec. It is also much more logical because
the LinuxResources message is used in many places (like pod overhead,
update resources etc) where the RDT fields don't make sense or cannot be
applied. Also note that the Rdt object cannot modified on container
update (missing from UpdateContainerRequest) as this functionality is
missing from the runtime spec and runtimes (runc/crun).

Signed-off-by: Markus Lehtonen <[email protected]>
Add support for direct manipulation of the linuxRdt object in the
container config.

Signed-off-by: Markus Lehtonen <[email protected]>
Add support for direct manipulation of the intelRdt object of the
container config.

Signed-off-by: Markus Lehtonen <[email protected]>
Signed-off-by: Markus Lehtonen <[email protected]>
Makes it possible to remove/override the linux.intelRdt object from the
container configuration. Extend the API by adding new 'Remove' field to
the LinuxRdt message which is used as a marker to fully delete/override
the IntelRdt configuration. This patch also updates the adaptation
and runtime-tools correspondingly.

Signed-off-by: Markus Lehtonen <[email protected]>
@marquiz
Copy link
Contributor Author

marquiz commented Aug 28, 2025

Updated:

  • rebased (on top of the containerized-protobuild commit)
  • added unit test for pkg/runtime-tools
  • small change in pkg/adaptation/result.go (rely on initAdjustRdt() in initializing Container.Linux.Rdt)
  • fixed commit message
  • added support for removing RDT config (as a separate commit)

nit: typo in the body of commit message for 'api: add rdt': repeated 'the' in "... the the IntelRdt message is sepate from LinuxResources. This is to ..."

Thx for spotting this. Fixed

@marquiz
Copy link
Contributor Author

marquiz commented Aug 28, 2025

I'd vote with a mild bias for 2 above (because it's how we handle other similar cases where removal is supported). But with the addition that on the wire-protocol this is the convention we use, but in the user-facing API we provide explicit functions for clearing any LinuxRdt,

I ended up with adding a separate field. I implemented both variants and the "prefix closid with -/" variant was an ugly fragile mess tbh. So much cleaner to just have a separate field for this purpose. For envs and other guys it's a bit different when we're updating a map or slice and you selectively remove some items. But here we're managing the whole object. (if interested, I have the still lying around at https://github.com/marquiz/nri/commits/devel/rdt-remove-2/)

@klihub
Copy link
Member

klihub commented Aug 28, 2025

I'd vote with a mild bias for 2 above (because it's how we handle other similar cases where removal is supported). But with the addition that on the wire-protocol this is the convention we use, but in the user-facing API we provide explicit functions for clearing any LinuxRdt,

I ended up with adding a separate field. I implemented both variants and the "prefix closid with -/" variant was an ugly fragile mess tbh. So much cleaner to just have a separate field for this purpose. For envs and other guys it's a bit different when we're updating a map or slice and you selectively remove some items. But here we're managing the whole object.

@marquiz Yes, and sorry about the long sidetrack in our offline discussion earlier. I was in the totally wrong mental model, associated with most of the existing bits (slices and maps). You were absolutely right that we can't reasonably handle removal with an inline notation, because then it would not be possible to indicate a removal prior to an addition to avoiding conflict with an earlier plugin. I agree a separate dedicated field in the wire-protocol is the way to go here.

@klihub klihub marked this pull request as draft September 22, 2025 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants