Skip to content

Conversation

@askervin
Copy link
Contributor

This patch depends on:

TODO:

  • Drop commit 7371298. It's there only for testing with containerd.

go.sum Outdated
Comment on lines 89 to 100
<<<<<<< HEAD
golang.org/x/sys v0.0.0-20191026070338-33540a1f6037/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191115151921-52ab43148777/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
=======
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
>>>>>>> 7dec8f9 (DO NOT MERGE: Revert "api: remove NRI/CRI resource conversion functions.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An unresolved conflict crept in...

Comment on lines 150 to 170
// SetLinuxMemoryPolicyMode records setting the memory policy mode for a container.
func (a *ContainerAdjustment) SetLinuxMemoryPolicyMode(mode string) {
a.initLinuxMemoryPolicy()
a.Linux.MemoryPolicy.Mode = mode
}

// SetLinuxMemoryPolicyNodes records setting memory policy nodes for a container.
func (a *ContainerAdjustment) SetLinuxMemoryPolicyNodes(nodes string) {
a.initLinuxMemoryPolicy()
a.Linux.MemoryPolicy.Nodes = nodes
}

// SetLinuxMemoryPolicyFlags records setting memory policy flags for a container.
func (a *ContainerAdjustment) SetLinuxMemoryPolicyFlags(flags []string) {
a.initLinuxMemoryPolicy()
a.Linux.MemoryPolicy.Flags = flags
}

Copy link
Member

@klihub klihub Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that both Mode and Nodes are required (in the corresponding OCI Spec memory policy setting), we should at least consider having a single function to set both. This could be achieved, for instance by adding a func (a *ContainerAdjustment) SetLinuxMemoryPolicy(mode, nodes string, flags ...string) (or using enum types instead of strings as mentioned on the other review comment...). Or having the new function directly take an *api.LinuxMemoryPolicy as an argument...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up separating all three (Mode, Nodes, Flags) into separate functions because MPOL_DEFAULT and MPOL_LOCAL require that Nodes must be not be defined. Or more exactly, the nodemask must be 0.

Of course, this does not prevent having SetLinuxMemoryPolicy(mode, nodes string, flags ...string) where nodes just must be empty string.

For type safety, I'd actually like to use SetLinuxMemoryPolicy(MemoryPolicyModeType, string, ...MemoryPolicyFlagType) here, but as this api does not import runtime-spec, so it would look a bit strange to have it for this case only.

Anyway, I like your "flags... string" better than "flags []string".

Let's discuss tomorrow face-to-face what would be the best combo out of all these.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For type safety, I'd actually like to use SetLinuxMemoryPolicy(MemoryPolicyModeType, string, ...MemoryPolicyFlagType) here, but as this api does not import runtime-spec, so it would look a bit strange to have it for this case only.

Actually, we can and do import opencontainers/runtime-spec/specs-go in pkg/api. What we can't do nicely is pass runtime-spec defined types over the wire as a protobuf message. AFAICT, we'd need to pass it as a protobuf/types.Any (which I think essentially would end up being a byte slice of marshalled data).

So we define corresponding types and those types typically provide a ToOCI() functions which returns the corresponding runtime-specs/specs-go representation of the same data, and that's why we already have that import. If the same data type can also be used as input, we typically also provide a FromOCI${DATA_TYPE}(in *runtimespecs.${DATA_TYPE}) *${DATA_TYPE} "conversion constructor" to do the conversion in the other direction. That makes it than bearably easy to do forth and back conversion at the edge interfaces...

Comment on lines 454 to 456
string mode = 1;
string nodes = 2;
repeated string flags = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have two comparable PRs open, #155 and #160,. They are very much comparable, apart from the fact that her you are also in the process of adding the necessary changes/bits to opencontainers/runtime-spec, opencontainers/runtime-tools, and runc, whereas thoseadd controls for already existing OCI Spec features to NRI. But this difference is irrelevant in the scope of this PR (but would be more relevant in the scope of the corresponding OCI Spec PR).

Anyway, in those PR we don't use strings for things like kernel supported policies/mode or flags for those policies. Instead we define an explicit enumeration with the supported values, and IIRC the OCI Spec is doing the same for the controls in those PRs, by defining a custom string types and then consts for it. That is less flexible in case the kernel gains new modes/policies, but it provides a way to flag usage errors early rather than late (when someone tries to use an unknown policy or flag instead of when runc/crun tries to apply it to the container).

I'm not sure whether one would be a clear winner approach here, but I suspect that unless the kernel is gaining new policies/modes or flags frequently, the inflexibility disadvantage of using enums is negligible, while it is more self-documenting (no need to guess whether mode should be MPOL_INTERLEAVE, INTERLEAVE, or interleave, for instance).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, and I checked and that's what you also have (custom string types and const 'enums') in the OCI Spec additions, so maybe we could follow suit with the other PRs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning for having memory policy modes and flags as strings in the OCI spec was following what seemed to be the practice there. LinuxSchedulerPolicy, LinuxPersonalityDomain and flags, Seccomp stuff, syscalls, etc. are all strings yet they could be integers. It made sense to me, as VM-based runtimes might even use different kernels than the container runtime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning for having memory policy modes and flags as strings in the OCI spec was following what seemed to be the practice there. LinuxSchedulerPolicy, LinuxPersonalityDomain and flags, Seccomp stuff, syscalls, etc. are all strings yet they could be integers. It made sense to me, as VM-based runtimes might even use different kernels than the container runtime.

Well, in the OCI Spec/runtime-spec golang package they are not direct, native string types, but custom types with a string as the underlying type, with set of predefined consts that limit what you can (easily) pass in as value (unless you shotgun-typecast arbitrary strings to the custom type). For all of the other existing custom types, IOPrioClass, LinuxSchedulerPolicy, LinuxSchedulerFlags, maybe a few others as well, on the NRI side of the abstraction we opted to use an enum, which AFAICT is as close as you can get to the custom type+predefined const values as you can get with a proto message. You can see these aforementioned abstraction here for I/O priority and here for scheduler policy attributes.

So basically the essence of my comment and main point was here that, provided we think those other abstractions are fine, and if we want to be consistent within the realm of NRI (which TBH might very well be a tall order when it comes to bits that I have added because nowadays I feel like the only thing I can do consistently is being inconsistent) then we at least should consider using protobuf enums for the suitable memory policy bits as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question regarding prefixing enum values in api.proto.

Following the pattern in above PRs, we'll get:

// Memory policy of a container being created.
message LinuxMemoryPolicy {
  MpolMode mode = 1;
  string nodes = 2;
  repeated MpolFlag flags = 3;
}

enum MpolMode {
    MPOL_MODE_DEFAULT = 0;
    MPOL_MODE_PREFERRED = 1;
    MPOL_MODE_BIND = 2;
    MPOL_MODE_INTERLEAVE = 3;
    MPOL_MODE_LOCAL = 4;
    MPOL_MODE_PREFERRED_MANY = 5;
    MPOL_MODE_WEIGHTED_INTERLEAVE = 6;
}

From which protobuf generates api.pb.go lines:

type MpolMode int32

const (
	MpolMode_MPOL_MODE_DEFAULT             MpolMode = 0
	MpolMode_MPOL_MODE_PREFERRED           MpolMode = 1
	MpolMode_MPOL_MODE_BIND                MpolMode = 2
	MpolMode_MPOL_MODE_INTERLEAVE          MpolMode = 3
	MpolMode_MPOL_MODE_LOCAL               MpolMode = 4
	MpolMode_MPOL_MODE_PREFERRED_MANY      MpolMode = 5
	MpolMode_MPOL_MODE_WEIGHTED_INTERLEAVE MpolMode = 6
)

...and then a NRI plugin that modifies memory policy would do something like:

ca.SetLinuxMemoryPolicy(api.MpolMode_MPOL_MODE_PREFERRED, "0-1", api.MpolFlag_MPOL_FLAG_STATIC_NODES)

@klihub, would you see problems if we'd use unprefixed enums (like DEFAULT instead of MPOL_MODE_DEFAULT), because protobuf seems to generate type prefix to the enums anyway? Double prefixes are a bit clumsy, but I don't know if dropping prefixes would cause name collisions elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klihub, after sleeping over the weekend, I think I should name MpolMode enum values exactly as in linux/mempolicy.h. Then I can use generated MpolMode_name and _value functions for converting NRI wire values into OCI strings directly. Probably something you suggested already. :D Sorry about the noise.

Copy link
Member

@klihub klihub May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we want to avoid that double-prefix due to enum values getting 'namespaced' by their enum type, then I think calling the enum MpolMode then have the enum values without the MPOL_MODE_ prefix maybe sounds to most appealing/natural, so they then would end up being MpolMode_DEFAULT instead of MpolMode_MPOL_MODE_DEFAULT. It would be close enough to the corresponding original.

An maybe slightly ugly alternative would be to call the enum type MPOL_MODE instead, so have something like

 // Memory policy of a container being created.
message LinuxMemoryPolicy {
  MPOL_MODE mode = 1;
  string nodes = 2;
  repeated MpolFlag flags = 3;
}

enum MPOL_MODE {
    DEFAULT = 0;
    PREFERRED = 1;
    BIND = 2;
    INTERLEAVE = 3;
    LOCAL = 4;
    PREFERRED_MANY = 5;
    WEIGHTED_INTERLEAVE = 6;
}

Then if we want to offset it a bit and allow the enum type to be called MpolMode, we could add pkg/api/memory-policy.go (which I think anyway will be needed to provide ToOCI and FromOCI helpers) with an alias like

// MpolMode is an alias for MPOL_MODE which we prefer to use in hand-written code.
type MpolMode = MPOL_MODE

We can then use MpolMode all over in our hand-written code instead of MPOL_MODE but have the MPOL_MODE_ prefix for the enum values. If I had to chose, probably this would look nicer to my eyes. But this is just my subjective opinion. Someone else might consider not having an alias a better alternative.

I haven't done this in my PRs, so I have double prefixes now. I could try doing something similar for the I/O Priority PR (#155) where the naming is similar and rather consistent.

But unfortunately for the scheduling parameters PR (#160) I think I can't have something like that...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@askervin Okay, I tried how this would look like with the I/O-prio PR (#155) mentioned above. Here is the commit on top of the current PR: klihub@9b5a7ed

Dunno... Clearly doable... but I don't know if it really makes sense to have kludgery this way, and if we'd be able to do it consistently in the future if/when we add more similar features.

I suspect that the easiest to stay consistent about is one of two alternatives. Either a straightforward naive conversion of the corresponding OCI Spec types (which is what you have here and what I have in #155 and #160 which then results in some cases having repetitive prefixes). Or omit repetitive prefixes wrt. the enum type name from the enum values. So then here we'd call the enum type MpolMode and omit the repetitive MPOL_MODE_ prefix from the values, ending up with MpolMode_DEFAULT instead of MPOL_MODE_DEFAULT, which is close enough.

Needed due to runtime-spec update.

Signed-off-by: Antti Kervinen <[email protected]>
The Prestart hook and the LinuxMemory.Kernel field should not be
used. Still keep handling their contents in NRI without breaking those
who still use them.

Signed-off-by: Antti Kervinen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants