CI - Generate and Upload IRPA files #2640
base: main
Conversation
PhaneeshB commented Nov 4, 2025
- Add steps for llama_8b_fp8 model
| --account-name sharkpublic \
| --sas-token "$AZURE_SAS_TOKEN" \
| --container-name ossci \
| --name "$AZURE_BLOB_PATH/${IRPA_FILENAME%.irpa}-TEST.irpa" \
The -TEST suffix is intentional until all testing is completed (to verify IRPA files are being uploaded to the correct location), because the upload overwrites the existing file. Only remove it once the PR is ready to merge.
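For context, the upload step presumably resembles the following reconstruction. Only the flags are visible in the hunk above, so the `az storage blob upload` command name and the `--file`/`--overwrite` arguments are inferred assumptions, not confirmed by the diff:

```bash
# Hedged reconstruction of the upload step; the command name, --file, and
# --overwrite below are assumptions (only the other flags appear in the hunk).
az storage blob upload \
  --account-name sharkpublic \
  --sas-token "$AZURE_SAS_TOKEN" \
  --container-name ossci \
  --name "$AZURE_BLOB_PATH/${IRPA_FILENAME%.irpa}-TEST.irpa" \
  --file "$IRPA_FILENAME" \
  --overwrite
```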
Pull request overview
This PR consolidates the CI workflow for generating and uploading IRPA (Intermediate Representation Package Archive) files for various Llama model configurations. The primary focus is adding support for FP8 quantized models, particularly the llama_8b_fp8 variant, while refactoring the workflow into a more maintainable structure.
Key Changes:
- Added a new consolidated script (irpa_export_validate_upload.sh) that handles the complete workflow of exporting, compiling, validating, and uploading IRPA files (sketched below)
- Extended model support to include FP8 and FP4 quantized variants (8B, 70B, 405B, and Mistral models)
- Split the CI workflow into two separate jobs: one for smaller models (8B/70B/Mistral) and one for the 405B model requiring specialized hardware
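As a rough sketch of that consolidated flow (the stage order comes from this PR's description; the flags and paths below are illustrative assumptions, not the script's actual interface):

```bash
#!/usr/bin/env bash
set -euo pipefail
# Hypothetical outline of scripts/irpa_export_validate_upload.sh;
# flag names and paths are assumptions for illustration only.
MODEL="${1:?usage: $0 <model-name>}"

bash scripts/download_export_irpa.sh --model "$MODEL"            # 1. export the IRPA
bash scripts/export_and_compile.sh --irpa "output/$MODEL.irpa"   # 2. compile
bash scripts/validate_numerics.sh --model "$MODEL"               # 3. validate numerics
# 4. upload to Azure blob storage, only reached if validation succeeded
```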
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| scripts/irpa_export_validate_upload.sh | New consolidated script that orchestrates the export, compile, validate, and upload workflow for IRPA files with comprehensive parameter handling |
| scripts/validate_numerics.sh | Improved error handling by introducing FINAL_RESULT to accumulate errors across all validation prompts (see the sketch after this table) |
| scripts/setenv.sh | Fixed typo in comment (amdsharktank → sharktank) |
| scripts/merge_safetensors.py | Fixed typo in logging message (Marging → Merging) |
| scripts/download_export_irpa.sh | Added support for FP8 and FP4 model variants with flexible repository mapping and IRPA export logic |
| amdsharktank/amdsharktank/models/llama/tools/import_quark_dataset.py | Updated model base choices to include mistral-70b and corrected layer count for 70b models from 40 to 80 |
| .github/workflows/ci_generate_irpa.yml | Restructured workflow into two jobs with dedicated steps for each model configuration, including proper environment setup and cleanup |
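The FINAL_RESULT pattern mentioned for validate_numerics.sh is an accumulate-and-continue loop; a minimal hypothetical sketch (the prompt list and per-prompt check are placeholders, not the script's actual code):

```bash
#!/usr/bin/env bash
# Hypothetical sketch of accumulating failures across validation prompts.
PROMPTS=("prompt one" "prompt two")
FINAL_RESULT=0

validate_prompt() {
  # Placeholder for the real per-prompt numeric validation.
  [[ -n "$1" ]]
}

for p in "${PROMPTS[@]}"; do
  if ! validate_prompt "$p"; then
    echo "validation failed for: $p" >&2
    FINAL_RESULT=1   # record the failure but keep checking remaining prompts
  fi
done
exit "$FINAL_RESULT"   # non-zero if any prompt failed
```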
| config_json_path: Path = args.config_json
| params_path: Path = args.params
| # TODO: find a way to get this programatically so we don't have to flag for it
| split_sizes = [4096, 4096, 4096] if args.model_base == "7b" else [8192, 1024, 1024]
Copilot AI commented Dec 8, 2025
The layer count for "70b" model base has been changed from 40 to 80. This is a significant change that should be verified. If this is a bug fix (i.e., the original value of 40 was incorrect for 70b models), it could affect existing workflows or previously generated models. Please ensure this change is intentional and properly tested.
Suggested change:
  split_sizes = [4096, 4096, 4096] if args.model_base == "7b" else [8192, 1024, 1024]
+ # Layer counts for model bases:
+ # - "7b": 32 layers (Llama-2 7B)
+ # - "70b": 80 layers (Llama-2 70B, see: https://github.com/facebookresearch/llama/blob/main/docs/model_card.md)
+ # - "mistral-70b": 40 layers (Mistral 70B, see: https://docs.mistral.ai/models/)
+ # - "405b": 126 layers (custom model)
+ # The value for "70b" was updated from 40 to 80 to match the official Llama-2 70B architecture.
| echo "--model : Model to run (Llama-3.1-8B-Instruct, Llama-3.1-70B-Instruct, Mistral-Nemo-Instruct-2407-FP8 )" | ||
| echo "--hf-token : Hugging face token with access to gated flux models" | ||
| echo "--model : Model to run (Llama-3.1-8B-Instruct, Llama-3.1-70B-Instruct, Llama-3.1-8B-Instruct-FP8-KV, Llama-3.1-70B-Instruct-FP8-KV, Llama-3.1-405B-Instruct-MXFP4-Preview, Mistral-Nemo-Instruct-2407-FP8 )" | ||
| echo "--hf-token : Hugging face token with access to gated flux models" |
Copilot AI commented Dec 8, 2025
Typo in help message: "Hugging face" should be capitalized as "HuggingFace" (or "Hugging Face" with space), and "gated flux models" is misleading since this script is for Llama/Mistral models, not Flux models.
| echo "--hf-token : Hugging face token with access to gated flux models" | |
| echo "--hf-token : Hugging Face token with access to gated Llama/Mistral models" |
| --account-name sharkpublic \
| --sas-token "$AZURE_SAS_TOKEN" \
| --container-name ossci \
| --name "$AZURE_BLOB_PATH/${IRPA_FILENAME%.irpa}-TEST.irpa" \
Copilot AI commented Dec 8, 2025
The upload logic uploads files with -TEST.irpa suffix instead of the original filename. This appears to be intentional for testing purposes, but the comment on line 213 says "Upload current version (overwrite)" which suggests this should be the production version. Consider clarifying whether this is the intended behavior or if it should upload to $IRPA_FILENAME instead.
Suggested change:
- --name "$AZURE_BLOB_PATH/${IRPA_FILENAME%.irpa}-TEST.irpa" \
+ --name "$AZURE_BLOB_PATH/$IRPA_FILENAME" \
| if [[ $MODEL = "Llama-3.1-8B-Instruct-FP8-KV" || $MODEL = "Llama-3.1-70B-Instruct-FP8-KV" || $MODEL = "Llama-3.1-405B-Instruct-MXFP4-Preview" ]]; then
| # Determine model-specific parameters
| if [[ $MODEL = "Llama-3.1-8B-Instruct-FP8-KV" ]]; then
| IRPA_OUTPUT="instruct_8b_fp8_e4m3fnuz.irpa"
Copilot AI commented Dec 8, 2025
[nitpick] The model base value "7b" is used for the 8B model. While this may be intentional for historical reasons or shared configuration, the inconsistency between the model size name (8B) and the model base identifier (7b) could be confusing. Consider adding a comment explaining why "7b" is used for 8B models.
Suggested change:
  IRPA_OUTPUT="instruct_8b_fp8_e4m3fnuz.irpa"
+ # Note: The model base value "7b" is used for the 8B model due to historical reasons and shared configuration with downstream tools.
| --model Mistral-Nemo-Instruct-2407-FP8 \
| --hf-token ${HF_TOKEN}
| rm -rf Llama-3.1-8B-Instruct Llama-3.1-70B-Instruct Llama-3.1-8B-Instruct-FP8-KV output_artifacts
| rm -f instruct_8b_fp8_e4m3fn.irpa instruct_8b_fp8_e4m3fn_previous.irpa instruct_70b_fp16_previous.irpa instruct_8b_fp16_previous.irpa
Copilot AI commented Dec 8, 2025
The cleanup step attempts to remove instruct_8b_fp8_e4m3fn.irpa but the actual file generated is instruct_8b_fp8_e4m3fnuz.irpa (with "uz" suffix). This cleanup command won't remove the correct file.
Suggested change:
- rm -f instruct_8b_fp8_e4m3fn.irpa instruct_8b_fp8_e4m3fn_previous.irpa instruct_70b_fp16_previous.irpa instruct_8b_fp16_previous.irpa
+ rm -f instruct_8b_fp8_e4m3fnuz.irpa instruct_8b_fp8_e4m3fn_previous.irpa instruct_70b_fp16_previous.irpa instruct_8b_fp16_previous.irpa
| --tokenizer_config "$TOKENIZER_CONFIG_PATH" \
| --steps "$STEPS" \
| --kv-cache-dtype "$KV_CACHE_DTYPE" 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_run_llm_vmfb.log"
| VALIDATION_EXIT_CODE=${PIPESTATUS[0]}
Copilot AI commented Dec 8, 2025
[nitpick] The script uses PIPESTATUS[0] to capture the exit code from the piped command. While this works in bash, it would be clearer and more maintainable to capture the validation exit code before piping to tee, or use a temporary file for logging. The current approach is correct but could be fragile if the command structure changes.
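One way to avoid PIPESTATUS, as the comment suggests, is to log to a file first so that $? reflects the validation command itself. A minimal sketch with placeholder names (run_validation and the log path stand in for the real command and artifact path):

```bash
#!/usr/bin/env bash
# Illustrative alternative to PIPESTATUS; run_validation and the log path
# are placeholders, not the actual command from this PR.
LOG="output_artifacts/run_llm_vmfb.log"
mkdir -p "$(dirname "$LOG")"

run_validation() { echo "validating..."; }   # stand-in for the real command

run_validation > "$LOG" 2>&1
VALIDATION_EXIT_CODE=$?   # exit code of run_validation itself, no pipe involved
cat "$LOG"                # still surface the log in CI output
```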
| COMPILE_CMD="bash scripts/export_and_compile.sh --irpa $IRPA_PATH --bs-prefill $BS_PREFILL --bs-decode $BS_DECODE --iree-hip-target $IREE_HIP_TARGET"
| if [ -n "$DTYPE" ]; then
| COMPILE_CMD="$COMPILE_CMD --dtype $DTYPE"
| fi
| $COMPILE_CMD 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_export_and_compilation.log" || { echo "Compilation failed"; exit 1; }
Copilot AI commented Dec 8, 2025
The COMPILE_CMD variable is built by concatenating unquoted user-controlled arguments (e.g., IRPA_PATH, BS_PREFILL, BS_DECODE, IREE_HIP_TARGET, DTYPE) into a single shell command string and then executed via $COMPILE_CMD, which allows command injection if any of these parameters contain shell metacharacters (e.g., ;, &&). An attacker who can influence the script arguments (for example via CI parameters or wrapper scripts) could inject arbitrary shell commands that will run with the privileges of this script. To fix this, avoid constructing a command string and instead invoke the script directly with properly quoted arguments, e.g. bash scripts/export_and_compile.sh --irpa "$IRPA_PATH" --bs-prefill "$BS_PREFILL" ... and pass --dtype "$DTYPE" conditionally as an additional argument rather than concatenating to a string.
Suggested change:
- COMPILE_CMD="bash scripts/export_and_compile.sh --irpa $IRPA_PATH --bs-prefill $BS_PREFILL --bs-decode $BS_DECODE --iree-hip-target $IREE_HIP_TARGET"
- if [ -n "$DTYPE" ]; then
-   COMPILE_CMD="$COMPILE_CMD --dtype $DTYPE"
- fi
- $COMPILE_CMD 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_export_and_compilation.log" || { echo "Compilation failed"; exit 1; }
+ # Directly invoke the script with properly quoted arguments to prevent command injection
+ if [ -n "$DTYPE" ]; then
+   bash scripts/export_and_compile.sh \
+     --irpa "$IRPA_PATH" \
+     --bs-prefill "$BS_PREFILL" \
+     --bs-decode "$BS_DECODE" \
+     --iree-hip-target "$IREE_HIP_TARGET" \
+     --dtype "$DTYPE" 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_export_and_compilation.log" || { echo "Compilation failed"; exit 1; }
+ else
+   bash scripts/export_and_compile.sh \
+     --irpa "$IRPA_PATH" \
+     --bs-prefill "$BS_PREFILL" \
+     --bs-decode "$BS_DECODE" \
+     --iree-hip-target "$IREE_HIP_TARGET" 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_export_and_compilation.log" || { echo "Compilation failed"; exit 1; }
+ fi
| echo "Downloading from repository: $REPO" | ||
| hf download $REPO --local-dir $MODEL |
Copilot AI commented Dec 8, 2025
In download_model, the REPO and MODEL values are derived from command-line arguments (HF_REPO/MODEL) and then interpolated unquoted into the hf download $REPO --local-dir $MODEL command, which enables shell command injection if an attacker supplies a value containing shell metacharacters like ; or &&. This would let an attacker who can control --model or --hf-repo execute arbitrary commands with the privileges of this script (and potentially access the configured Hugging Face credentials). To mitigate this, always quote these variables (e.g., hf download "$REPO" --local-dir "$MODEL") and avoid accepting arbitrary repository strings unless they are validated against an allowlist of expected model names.
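A minimal sketch of that mitigation, quoting the expansions and gating MODEL on an allowlist (the allowlist entries are the model names already used in this PR; the surrounding script structure is assumed):

```bash
# Sketch only: validate MODEL against known-good names, then quote expansions
# so they are treated as data rather than shell code.
case "$MODEL" in
  Llama-3.1-8B-Instruct|Llama-3.1-70B-Instruct|Llama-3.1-8B-Instruct-FP8-KV|Llama-3.1-70B-Instruct-FP8-KV|Llama-3.1-405B-Instruct-MXFP4-Preview|Mistral-Nemo-Instruct-2407-FP8)
    ;;   # known-good model name, proceed
  *)
    echo "Unsupported --model value: $MODEL" >&2
    exit 1
    ;;
esac

hf download "$REPO" --local-dir "$MODEL"   # quoted: no command injection
```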
| python scripts/merge_safetensors.py $MODEL
| sudo mv merged.safetensors $MODEL/merged.safetensors
Copilot AI commented Dec 8, 2025
In the new FP8/MXFP4 branches, MODEL is taken directly from the --model argument and then used unquoted in commands like python scripts/merge_safetensors.py $MODEL and sudo mv merged.safetensors $MODEL/merged.safetensors, which allows a malicious model name containing shell metacharacters (e.g., ;, &&) to inject additional shell commands. An attacker who can control the --model parameter could leverage this to run arbitrary commands with the privileges of the script (and in the case of sudo, potentially escalate or modify privileged files). To fix this, treat MODEL as data by quoting it everywhere it is used in shell commands (e.g., python scripts/merge_safetensors.py "$MODEL", sudo mv merged.safetensors "$MODEL/merged.safetensors") and consider validating MODEL against a strict pattern or allowlist of expected names.
| --config-json $MODEL/config.json --model-base="70b" --weight-dtype=float16
| IRPA_OUTPUT=mistral_nemo_2407_fp8_e4m3fnuz.irpa
| python -m amdsharktank.models.llama.tools.import_quark_dataset --params merged.safetensors --output-irpa-file=$IRPA_OUTPUT \
| --config-json $MODEL/config.json --model-base="mistral-70b" --weight-dtype=float16
Copilot AI commented Dec 8, 2025
In the updated Mistral branch, MODEL remains unquoted in the --config-json $MODEL/config.json argument to the python -m amdsharktank.models.llama.tools.import_quark_dataset command, so a crafted --model value containing shell metacharacters could break out of the argument and inject additional commands. This allows arbitrary command execution with the privileges of this script if an attacker can influence the --model parameter. To remediate, always quote MODEL when building paths or arguments (e.g., --config-json "$MODEL/config.json") and, ideally, restrict MODEL to an allowlist of known-good model identifiers.
Suggested change:
- --config-json $MODEL/config.json --model-base="mistral-70b" --weight-dtype=float16
+ --config-json "$MODEL/config.json" --model-base="mistral-70b" --weight-dtype=float16