
Conversation

@chrismld

Summary

This PR transforms the EKS aperf script into a kubectl plugin and adds several enhancements to improve usability and automation.

Changes

1. Kubectl Plugin Support

  • Renamed eks-aperf.sh to kubectl-aperf to follow kubectl plugin naming conventions
  • Users can now install and run as: kubectl aperf [options]
  • Added installation instructions to README-EKS.md

2. Enhanced Script Features

Automatic Taint Handling

  • Automatically detects node taints using kubectl get node and jq
  • Dynamically generates tolerations in the pod spec
  • Enables aperf to run on tainted nodes without manual configuration
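The taint-to-toleration step can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name is an assumption, and it parses the compact JSON objects that `kubectl get node <name> -o jsonpath='{.spec.taints[*]}'` prints (the query the script uses).

```shell
# Convert one taint (as printed by the jsonpath query above) into a
# YAML toleration block. A taint with a value maps to operator
# "Equal"; a value-less taint maps to operator "Exists".
taint_to_toleration() {
  local taint="$1" key value effect
  key=$(echo "$taint" | sed -n 's/.*"key":"\([^"]*\)".*/\1/p')
  value=$(echo "$taint" | sed -n 's/.*"value":"\([^"]*\)".*/\1/p')
  effect=$(echo "$taint" | sed -n 's/.*"effect":"\([^"]*\)".*/\1/p')
  if [ -n "$value" ]; then
    printf -- '- key: "%s"\n  operator: "Equal"\n  value: "%s"\n  effect: "%s"\n' \
      "$key" "$value" "$effect"
  else
    printf -- '- key: "%s"\n  operator: "Exists"\n  effect: "%s"\n' "$key" "$effect"
  fi
}
```

The generated YAML is then appended to the tolerations list in the pod spec, so the aperf pod schedules onto the tainted node without any manual edits.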

Configurable Report Names

  • Added --report-name parameter (default: aperf_record)
  • Allows users to specify custom names for better organization
  • Example: --report-name="loadtest_run1"

Automatic Report Extraction and Browser Opening

  • Automatically extracts the tar.gz report after collection
  • Opens index.html in the default browser (configurable via --open-browser)
  • Cross-platform support for macOS (open) and Linux (xdg-open)
  • Default behavior: --open-browser=true
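The extract-and-open flow can be sketched like this (function names and the fallback message are assumptions; the macOS/Linux branching mirrors the snippet quoted later in this review):

```shell
# Unpack the collected archive and locate the report entry point.
extract_report() {
  local archive="$1" outdir="$2"
  mkdir -p "$outdir"
  tar -xzf "$archive" -C "$outdir"
  find "$outdir" -name index.html | head -n 1
}

# Open the report: "open" on macOS, "xdg-open" on Linux, otherwise
# just print the path for the user.
open_in_browser() {
  if [[ "$OSTYPE" == "darwin"* ]]; then
    open "$1"
  elif [[ "$OSTYPE" == "linux-gnu"* ]] && command -v xdg-open >/dev/null; then
    xdg-open "$1"
  else
    echo "Open $1 in your browser"
  fi
}
```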

Improved Error Handling

  • Added graceful handling for missing metrics-server
  • Shows helpful messages when kubectl top is unavailable
  • Better error messages with debug output
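The metrics-server fallback amounts to probing `kubectl top` first and degrading to a note instead of a raw error. A sketch (the wrapper function is an assumption; the note text matches the snippet quoted later in this review):

```shell
# Show per-pod resource usage if metrics-server is answering;
# otherwise print a friendly note instead of a raw error.
show_top_or_note() {
  if kubectl top pods --all-namespaces >/dev/null 2>&1; then
    kubectl top pods --all-namespaces
  else
    echo "Note: kubectl top not available (metrics-server may not be installed)"
  fi
}
```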

3. Documentation Updates

  • Updated all script references from eks-aperf.sh to kubectl-aperf
  • Added kubectl plugin installation section
  • Added Krew plugin manager documentation (for future distribution)
  • Updated usage examples throughout README-EKS.md

New Parameters

Parameter        Type     Default        Description
--report-name    string   aperf_record   Custom name for the aperf record/report
--open-browser   boolean  true           Automatically open the report in the browser
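The two new flags could slot into the script's existing `--flag) dest="VAR";;` argument parser along these lines (a sketch only; the actual parser in the script may differ in detail):

```shell
REPORT_NAME="aperf_record"   # default per the table above
OPEN_BROWSER="true"          # default per the table above

parse_args() {
  local dest=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --report-name=*)  REPORT_NAME="${1#*=}";;
      --open-browser=*) OPEN_BROWSER="${1#*=}";;
      --report-name)    dest="REPORT_NAME";;
      --open-browser)   dest="OPEN_BROWSER";;
      *) if [ -n "$dest" ]; then printf -v "$dest" '%s' "$1"; dest=""; fi;;
    esac
    shift
  done
}
```

Both `--flag value` and `--flag=value` forms are handled, which keeps backward compatibility with the existing two-token style.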

Testing

Tested on:

✅ macOS (Darwin) with tainted EKS nodes
✅ Automatic browser opening on macOS
✅ Custom report names
✅ Nodes with multiple taints

Breaking Changes

None. The script maintains backward compatibility with all existing parameters.

Future Work

  • Submit to official Krew index for wider distribution
  • Add support for multiple nodes in a single run
  • Add report comparison features

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@chrismld chrismld requested a review from a team as a code owner October 15, 2025 15:59
--namespace) dest="NAMESPACE";;
--aperf_options) dest="APERF_OPTIONS";;
--aperf_image) dest="APERF_IMAGE";;
--cpu-request) dest="CPU_REQUEST";;
Contributor

Any reason why you removed these options?

Author

Because aperf is going to run as a short-lived pod, we don't need the node to guarantee the resources it requests; aperf won't be restricted by that and will still be scheduled onto the node I want. The default values were requesting too many resources anyway; isn't aperf only consuming around ~5% of CPU?

Contributor

Let's change the default CPU_REQUEST and MEMORY_REQUEST values so we don't need to remove the feature.
Do we have better values to set for them?

Author

Sorry to disagree here. When using Karpenter, the nodes you get are sized for your application, so you usually don't have much room left to schedule another pod on them; Karpenter's purpose is to launch nodes with as little waste as possible. Even if we lower the requests, users will hit problems when the node launched is so tightly packed that there's no room guaranteed for another pod, and the kube-scheduler will fail to schedule it. With managed node groups instead of Karpenter, the pod might or might not fit on the node. This is about requested resources, not utilization; that's why I removed this setup. In this case you're treating the aperf pod like any other process on the node, as you would on a server without containers. Once scheduled, the aperf pod will use the resources it needs.

We need to make sure the aperf pod can be scheduled on the node; we don't really need to guarantee resources for it. In a Karpenter setup, if we set resource requests here, we'd have to re-schedule the app being profiled, with affinities to the aperf pod, to make sure both pods land on the same node.

Contributor

I see, but some customers may be worried about the impact APerf can have on a node in a production environment.
To avoid removing this feature, can we consider not setting any request/limit values for CPU and memory by default, and adding them only when the customer explicitly passes these options on the command line?

Author

Sounds good to me. I've pushed a new change to make these values optional.
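The agreed behavior can be sketched as a helper that emits a resources: block only when the user passed the flags (function name and YAML indentation are assumptions, not the script's actual code):

```shell
# Emit a pod-spec resources section only if the user set requests;
# with no flags passed, the pod spec carries no requests at all.
resources_yaml() {
  local cpu="$1" mem="$2"
  if [ -z "$cpu" ] && [ -z "$mem" ]; then
    return 0
  fi
  echo "    resources:"
  echo "      requests:"
  [ -n "$cpu" ] && echo "        cpu: ${cpu}"
  [ -n "$mem" ] && echo "        memory: ${mem}"
  return 0
}
```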


# Get node taints and generate tolerations
echo -e "${BOLD}Checking node taints...${NC}"
TAINTS=$(kubectl get node ${NODE_NAME} -o jsonpath='{.spec.taints[*]}' 2>/dev/null)
Contributor

@salvatoredipietro Oct 27, 2025

We probably need to check that the node exists before checking its taints; otherwise, if the user enters a wrong node name, the script fails without a clear reason. What do you think?
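The suggested pre-check could look like this (a sketch; the function name and error wording are assumptions):

```shell
# Fail fast with a clear message if the node doesn't exist,
# instead of silently getting empty taint output later.
check_node_exists() {
  if ! kubectl get node "$1" >/dev/null 2>&1; then
    echo "Error: node '$1' not found in the current cluster" >&2
    return 1
  fi
}
```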

if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS
open "$INDEX_FILE"
elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
Contributor

Windows support?
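One possible answer to this, treating Git Bash/Cygwin detection via $OSTYPE and the `start` command as assumptions rather than tested behavior:

```shell
# Opens the file and returns 0 on a Windows shell, returns 1 otherwise
# so the caller can fall through to the macOS/Linux branches.
open_on_windows() {
  if [[ "$OSTYPE" == "msys"* || "$OSTYPE" == "cygwin"* ]]; then
    start "" "$1"   # "start" opens the default browser under Git Bash
    return 0
  fi
  return 1
}
```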

kubectl-aperf Outdated
POD_NAME="aperf-pod-${NODE_NAME//[.]/-}"

# Get node taints and generate tolerations
echo -e "${BOLD}Checking node taints...${NC}"
Contributor

This output wraps onto a new line; can we have the result on a single line?

bash ./kubectl-aperf --aperf_image="${APERF_ECRREPO}:latest"  --node="i-087ae37512508cca5"
Checking node taints...
  No taints found on node
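One way to get the requested single-line result is to print the status and the outcome in the same echo (a sketch; BOLD/NC are the script's existing color variables and TAINTS comes from the kubectl query above):

```shell
# Status and result on one line instead of two.
report_taints() {
  if [ -z "$TAINTS" ]; then
    echo -e "${BOLD}Checking node taints...${NC} no taints found on node"
  else
    echo -e "${BOLD}Checking node taints...${NC} generating tolerations"
  fi
}
```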

kubectl-aperf Outdated
grep "$(kubectl get pods --all-namespaces --field-selector spec.nodeName=${NODE_NAME} -o jsonpath='{range .items[*]}{.metadata.name}{" "}{end}' | sed 's/[[:space:]]*$//' | sed 's/[[:space:]]/\\|/g')" /tmp/allpods.out --color=never || echo " No pods found on this node"
rm /tmp/allpods.out 2>/dev/null || true
else
echo " ${YELLOW}Note: kubectl top not available (metrics-server may not be installed)${NC}"
Contributor

The formatting is wrong here; also, can we have it on a single line?

Resource usage for pods on i-087ae375125081ba5:
  \033[0;33mNote: kubectl top not available (metrics-server may not be installed)\033[0m

Author

I removed the new line. Also, I'm not sure why you get the raw color codes; those variables were already defined in the script (lines 19 to 25) and I'm just using them.
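If the raw `\033[0;33m` codes ever show up literally again, one likely cause is an `echo` running without `-e` somewhere in the path (a guess from the symptom, not a confirmed diagnosis). A portable alternative is `printf '%b'`, which expands backslash escapes regardless of the echo implementation:

```shell
# printf '%b' expands escape sequences in its argument, so the
# color variables render even where "echo -e" behaves differently.
note() {
  printf '%b\n' "  ${YELLOW}Note: kubectl top not available (metrics-server may not be installed)${NC}"
}
```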

Author

@chrismld left a comment

@salvatoredipietro thanks for reviewing! I've made a few changes to address your feedback; let me know your thoughts.


