-
Notifications
You must be signed in to change notification settings - Fork 30
feat: Add kubectl plugin support and enhance EKS aperf script #323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| --namespace) dest="NAMESPACE";; | ||
| --aperf_options) dest="APERF_OPTIONS";; | ||
| --aperf_image) dest="APERF_IMAGE";; | ||
| --cpu-request) dest="CPU_REQUEST";; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why you remove such options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because aperf is going to be as short-lived pod, we don't need the node to guarantee the resources it requests ... aperf won't be restricted by that and will be scheduled to the node I want. The default values were requesting too much resources anyway, isn't aperf just consuming around ~5% of CPU?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's change the default CPU_REQUEST and MEMORY_REQUEST values so we do not need to remove the feature.
Do we have some better values to set to them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to disagree here. When using Karpenter, the nodes you get are the ones your application need, usually you don't have too much room left to schedule a pod in that node. Karpenter's purpose is to launch a node with as less waste as possible. Even if we lower down the requests, users will face problems when the node launched was too efficient that there's no room guaranteed for another pod, the kube-scheduler will fail to schedule a pod. If not Karpenter but managed node groups, the pod might or not might fit into the node. This of course is related to requested resources, not utilization, that's why I removed this setup because in this case you'll be treating aperf pod as another process in the node as you'd normally do in a server without containers. Once scheduled, the aperf pod will use the resources it needs.
We need to make sure the aperf pod can be scheduled in the node, we don't really need to guarantee resources for the pod. In a Karpenter setup, if we define a number of resources here, we'll have to re-schedule the app you need to profile along with affinities to the aperf pod to make sure both pods are scheduled in the same node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see but some customers can be worry about the impact APerf can have on the node in a production env.
To not remove this feature, can we consider to not set up any request/limit values/setting for both memory and CPU and add them only when customer explicitly pass these options to the command line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good to me, I've pushed a new change to make these values optional
|
|
||
| # Get node taints and generate tolerations | ||
| echo -e "${BOLD}Checking node taints...${NC}" | ||
| TAINTS=$(kubectl get node ${NODE_NAME} -o jsonpath='{.spec.taints[*]}' 2>/dev/null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably need to add a check if the NODE exists before check the taint, otherwise if user enter a wrong node, it fails without a clear reason. What do you think?
| if [[ "$OSTYPE" == "darwin"* ]]; then | ||
| # macOS | ||
| open "$INDEX_FILE" | ||
| elif [[ "$OSTYPE" == "linux-gnu"* ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows support?
kubectl-aperf
Outdated
| POD_NAME="aperf-pod-${NODE_NAME//[.]/-}" | ||
|
|
||
| # Get node taints and generate tolerations | ||
| echo -e "${BOLD}Checking node taints...${NC}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This goes to a new line, can we have the result on a single line?
bash ./kubectl-aperf --aperf_image="${APERF_ECRREPO}:latest" --node="i-087ae37512508cca5"
Checking node taints...
No taints found on node
kubectl-aperf
Outdated
| grep "$(kubectl get pods --all-namespaces --field-selector spec.nodeName=${NODE_NAME} -o jsonpath='{range .items[*]}{.metadata.name}{" "}{end}' | sed 's/[[:space:]]*$//' | sed 's/[[:space:]]/\\|/g')" /tmp/allpods.out --color=never || echo " No pods found on this node" | ||
| rm /tmp/allpods.out 2>/dev/null || true | ||
| else | ||
| echo " ${YELLOW}Note: kubectl top not available (metrics-server may not be installed)${NC}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting is wrong here and can we have it on a single line?
Resource usage for pods on i-087ae375125081ba5:
\033[0;33mNote: kubectl top not available (metrics-server may not be installed)\033[0m
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the new line ... also, I'm not sure why you get the color numbers ... those variables were already in the script (from line 19 to 25) and I just use them.
chrismld
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@salvatoredipietro thanks for reviewing!! I've made a few changes to address your feedback, let me know your thoughts
| --namespace) dest="NAMESPACE";; | ||
| --aperf_options) dest="APERF_OPTIONS";; | ||
| --aperf_image) dest="APERF_IMAGE";; | ||
| --cpu-request) dest="CPU_REQUEST";; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because aperf is going to be as short-lived pod, we don't need the node to guarantee the resources it requests ... aperf won't be restricted by that and will be scheduled to the node I want. The default values were requesting too much resources anyway, isn't aperf just consuming around ~5% of CPU?
kubectl-aperf
Outdated
| grep "$(kubectl get pods --all-namespaces --field-selector spec.nodeName=${NODE_NAME} -o jsonpath='{range .items[*]}{.metadata.name}{" "}{end}' | sed 's/[[:space:]]*$//' | sed 's/[[:space:]]/\\|/g')" /tmp/allpods.out --color=never || echo " No pods found on this node" | ||
| rm /tmp/allpods.out 2>/dev/null || true | ||
| else | ||
| echo " ${YELLOW}Note: kubectl top not available (metrics-server may not be installed)${NC}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the new line ... also, I'm not sure why you get the color numbers ... those variables were already in the script (from line 19 to 25) and I just use them.
Summary
This PR transforms the EKS aperf script into a kubectl plugin and adds several enhancements to improve usability and automation.
Changes
1. Kubectl Plugin Support
eks-aperf.sh→kubectl-aperfto follow kubectl plugin naming conventionskubectl aperf [options]2. Enhanced Script Features
Automatic Taint Handling
kubectl get nodeandjqConfigurable Report Names
--report-nameparameter (default:aperf_record)--report-name="loadtest_run1"Automatic Report Extraction and Browser Opening
index.htmlin the default browser (configurable via--open-browser)open) and Linux (xdg-open)--open-browser=trueImproved Error Handling
kubectl topis unavailable3. Documentation Updates
eks-aperf.shtokubectl-aperfNew Parameters
--report-nameaperf_record--open-browsertrueTesting
Tested on:
✅ macOS (Darwin) with tainted EKS nodes
✅ Automatic browser opening on macOS
✅ Custom report names
✅ Nodes with multiple taints
Breaking Changes
None. The script maintains backward compatibility with all existing parameters.
Future Work
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.