Add Resource Bin Packing documentation for kube-scheduler #53169
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request needs approval from an approver in each of these files; approvers can indicate their approval by writing `/approve` in a comment. The full list of commands accepted by this bot can be found here.
Hey @tengqm, can I get a review on this one if you have a moment?
I recommend getting a tech review from SIG Scheduling. Once we have that, it LGTM and I'd be happy to apply it. @kubernetes/sig-scheduling-approvers FYI
Review thread on this added paragraph, just after the `name: NodeResourcesFit` example:
This configuration focuses bin packing *only* on the extended resources `intel.com/foo` and `intel.com/bar`. By explicitly listing only these resources (and omitting the default CPU and memory), the scheduler makes placement decisions based primarily on how full these extended resources are. The `shape` function specifies that a node with 0% utilization of these resources gets a score of 0, while a node at 100% utilization gets a score of 10. This encourages bin packing behavior, where the scheduler prefers nodes already heavily utilizing these resources. Meanwhile, CPU and memory scheduling behavior falls back to the default, decoupling extended resource bin packing from general compute resource behavior. The weights (3 for both intel.com/foo and intel.com/bar) ensure these resources carry equal influence in the final scoring decision.
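For reference, here is a minimal sketch of the kind of KubeSchedulerConfiguration the paragraph is describing, reconstructed from the snippets quoted in this thread (not copied verbatim from the PR diff):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: RequestedToCapacityRatio
        # Only the extended resources are listed; cpu and memory are
        # intentionally omitted so they keep the default behavior.
        resources:
        - name: intel.com/foo
          weight: 3
        - name: intel.com/bar
          weight: 3
        requestedToCapacityRatio:
          shape:
          - utilization: 0
            score: 0
          - utilization: 100
            score: 10
```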
Max score will be 100 if `requestedToCapacityRatio.shape.score` = 10.
Suggested change: "…while a node at 100% utilization gets a score of 10" → "…while a node at 100% utilization gets a score of 100". (The rest of the paragraph is unchanged.)
Maybe worth explaining that explicitly
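A sketch of the normalization the two comments above are pointing at, assuming the current kube-scheduler behavior in which shape scores are capped at 10 and scaled up to the framework's 0-100 node-score range:

$$
\text{nodeScore}(u) = \text{shape}(u) \times \frac{\text{MaxNodeScore}}{\text{MaxCustomPriorityScore}} = \text{shape}(u) \times \frac{100}{10}
$$

So a shape score of 10 at 100% utilization surfaces as a node score of 100, which is what the suggested wording reflects.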
Review comment on this added paragraph:
In this example, the scheduler gives more weight to the extended resources `intel.com/foo` and `intel.com/bar` (weight 3) compared to CPU and memory (weight 1). This means the bin packing decision is *more influenced* by how full these extended resources are on each node. If you have scarce or expensive extended resources, increasing their weight helps pack pods more densely onto nodes that already have higher utilization of those resources. This is useful when the extended resources are the constraint—for instance, specialized hardware accelerators or custom vendor resources.
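A hedged sketch of the weighting that paragraph describes; the paragraph does not name the scoring strategy, so MostAllocated is assumed here (the same `resources` weighting applies to RequestedToCapacityRatio as well):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated
        resources:
        # Extended resources dominate the weighted average of the
        # per-resource scores; cpu and memory still contribute.
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
        - name: intel.com/foo
          weight: 3
        - name: intel.com/bar
          weight: 3
```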
Please split these really long lines into shorter ones. It's hard to review or change slightly in the future when the whole paragraph has to be referenced/updated.
A further comment on the extended-resources paragraph quoted earlier:
What do you mean by "Meanwhile, CPU and memory scheduling behavior falls back to the default, decoupling extended resource bin packing from general compute resource behavior."?
Something I'd love more clarity on would be an example where MostAllocated and RequestedToCapacityRatio result in different behaviors. I would also suggest that …
Answering the questions to make it easier for an author
With

```yaml
requestedToCapacityRatio:
  shape:
  - utilization: 0
    score: 0
  - utilization: 100
    score: 10
```

But, if we add more shape points to the function:

```yaml
requestedToCapacityRatio:
  shape:
  - utilization: 0
    score: 0
  - utilization: 50
    score: 2
  - utilization: 100
    score: 10
```

We can get a scoring that will score 0-20 for utilizations 0-50, but then score 20-100 for utilizations 50-100, making differences in the higher utilization range have more impact on the final score.
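To make the two-segment slope concrete, assuming linear interpolation between shape points and the 10x scaling to the 0-100 node-score range noted earlier (utilizations 25 and 75 are just illustrative picks):

$$
\text{shape}(25) = 0 + \frac{25 - 0}{50 - 0}(2 - 0) = 1 \;\Rightarrow\; 10, \qquad
\text{shape}(75) = 2 + \frac{75 - 50}{100 - 50}(10 - 2) = 6 \;\Rightarrow\; 60
$$

A 25-point utilization change below the knee moves the final score by 10, while the same change above the knee moves it by 40.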
Configuration of spreading/binpacking (LeastAllocated/MostAllocated) is done per scheduler profile. So, using a single profile, the scheduler will either spread or binpack all configured resources. If you need to binpack some pods and spread out others, you can use two scheduler profiles, one with the MostAllocated and the other with the LeastAllocated strategy. Then, you can select the desired profile for a pod by specifying its `.spec.schedulerName`.
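A minimal sketch of that two-profile setup (the profile names `binpack-scheduler` and `spread-scheduler` are made up for illustration):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: binpack-scheduler   # packs pods onto busy nodes
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
- schedulerName: spread-scheduler    # spreads pods across idle nodes
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: LeastAllocated
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
```

A pod then opts into a profile via its spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: packed-pod
spec:
  schedulerName: binpack-scheduler
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```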
In my case, given some pods, I want to binpack them for the GPU resource, but spread or not binpack them for memory/CPU/storage (the same pods, not different deployments that could use different schedulers). As far as I understand, setting the weight to 0 or -1 for the other resources will accomplish this?
You might well have seen it, but I have a separate plan to improve that page: #50664. Can I just check: would you be happy for me to incorporate this change into the series of commits there? If you are, I will.
According to the documentation, "Allowed weights go from 1 to 100". If you don't want to use binpacking based on CPU/memory, etc., simply do not specify these parameters in the config. This will result in binpacking based solely on the GPU, so when two nodes have the same number of available GPUs, other plugins will have the final say on the spreading. Ultimately, the choice of node when many have the same score is random.
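A sketch of what that answer implies, with a hypothetical GPU extended resource name (`nvidia.com/gpu` here; substitute your vendor's resource):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated
        # cpu, memory, etc. are deliberately absent, so scoring
        # binpacks on the GPU resource alone.
        resources:
        - name: nvidia.com/gpu
          weight: 1
```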
Fixes #53149
Added explanations to the Resource Bin Packing docs clarifying how intel.com/foo and intel.com/bar resources influence scheduling decisions. Updated examples to describe the impact of weights and scoring behavior for better clarity.