CI - Generate and Upload IRPA files #2640
base: main
Conversation
PhaneeshB commented Nov 4, 2025
- Add steps for llama_8b_fp8 model
| --account-name sharkpublic \
| --sas-token "$AZURE_SAS_TOKEN" \
| --container-name ossci \
| --name "$AZURE_BLOB_PATH/${IRPA_FILENAME%.irpa}-TEST.irpa" \
The -TEST suffix is intentional until all testing is completed (to verify IRPA files are being uploaded to the correct location), because the upload overwrites the existing file. Only remove it once the PR is ready to merge.
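For context, the upload step presumably resembles the following reconstruction. Only the flags are visible in the hunk above, so the `az storage blob upload` command name and the `--file`/`--overwrite` arguments are inferred assumptions, not confirmed by the diff:

```bash
# Hedged reconstruction of the upload step; the command name, --file, and
# --overwrite below are assumptions (only the other flags appear in the hunk).
az storage blob upload \
  --account-name sharkpublic \
  --sas-token "$AZURE_SAS_TOKEN" \
  --container-name ossci \
  --name "$AZURE_BLOB_PATH/${IRPA_FILENAME%.irpa}-TEST.irpa" \
  --file "$IRPA_FILENAME" \
  --overwrite
```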
Pull request overview
This PR consolidates the CI workflow for generating and uploading IRPA (Intermediate Representation Package Archive) files for various Llama model configurations. The primary focus is adding support for FP8 quantized models, particularly the llama_8b_fp8 variant, while refactoring the workflow into a more maintainable structure.
Key Changes:
- Added a new consolidated script (irpa_export_validate_upload.sh) that handles the complete workflow of exporting, compiling, validating, and uploading IRPA files (sketched below)
- Extended model support to include FP8 and FP4 quantized variants (8B, 70B, 405B, and Mistral models)
- Split the CI workflow into two separate jobs: one for smaller models (8B/70B/Mistral) and one for the 405B model requiring specialized hardware
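As a rough sketch of that consolidated flow (the stage order comes from this PR's description; the flags and paths below are illustrative assumptions, not the script's actual interface):

```bash
#!/usr/bin/env bash
set -euo pipefail
# Hypothetical outline of scripts/irpa_export_validate_upload.sh;
# flag names and paths are assumptions for illustration only.
MODEL="${1:?usage: $0 <model-name>}"

bash scripts/download_export_irpa.sh --model "$MODEL"            # 1. export the IRPA
bash scripts/export_and_compile.sh --irpa "output/$MODEL.irpa"   # 2. compile
bash scripts/validate_numerics.sh --model "$MODEL"               # 3. validate numerics
# 4. upload to Azure blob storage, only reached if validation succeeded
```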
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| scripts/irpa_export_validate_upload.sh | New consolidated script that orchestrates the export, compile, validate, and upload workflow for IRPA files with comprehensive parameter handling |
| scripts/validate_numerics.sh | Improved error handling by introducing FINAL_RESULT to accumulate errors across all validation prompts (see the sketch after this table) |
| scripts/setenv.sh | Fixed typo in comment (amdsharktank → sharktank) |
| scripts/merge_safetensors.py | Fixed typo in logging message (Marging → Merging) |
| scripts/download_export_irpa.sh | Added support for FP8 and FP4 model variants with flexible repository mapping and IRPA export logic |
| amdsharktank/amdsharktank/models/llama/tools/import_quark_dataset.py | Updated model base choices to include mistral-70b and corrected layer count for 70b models from 40 to 80 |
| .github/workflows/ci_generate_irpa.yml | Restructured workflow into two jobs with dedicated steps for each model configuration, including proper environment setup and cleanup |
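The FINAL_RESULT pattern mentioned for validate_numerics.sh is an accumulate-and-continue loop; a minimal hypothetical sketch (the prompt list and per-prompt check are placeholders, not the script's actual code):

```bash
#!/usr/bin/env bash
# Hypothetical sketch of accumulating failures across validation prompts.
PROMPTS=("prompt one" "prompt two")
FINAL_RESULT=0

validate_prompt() {
  # Placeholder for the real per-prompt numeric validation.
  [[ -n "$1" ]]
}

for p in "${PROMPTS[@]}"; do
  if ! validate_prompt "$p"; then
    echo "validation failed for: $p" >&2
    FINAL_RESULT=1   # record the failure but keep checking remaining prompts
  fi
done
exit "$FINAL_RESULT"   # non-zero if any prompt failed
```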
| config_json_path: Path = args.config_json
| params_path: Path = args.params
| # TODO: find a way to get this programatically so we don't have to flag for it
| split_sizes = [4096, 4096, 4096] if args.model_base == "7b" else [8192, 1024, 1024]
Copilot AI commented Dec 8, 2025
The layer count for "70b" model base has been changed from 40 to 80. This is a significant change that should be verified. If this is a bug fix (i.e., the original value of 40 was incorrect for 70b models), it could affect existing workflows or previously generated models. Please ensure this change is intentional and properly tested.
Suggested change:
  split_sizes = [4096, 4096, 4096] if args.model_base == "7b" else [8192, 1024, 1024]
+ # Layer counts for model bases:
+ # - "7b": 32 layers (Llama-2 7B)
+ # - "70b": 80 layers (Llama-2 70B, see: https://github.com/facebookresearch/llama/blob/main/docs/model_card.md)
+ # - "mistral-70b": 40 layers (Mistral 70B, see: https://docs.mistral.ai/models/)
+ # - "405b": 126 layers (custom model)
+ # The value for "70b" was updated from 40 to 80 to match the official Llama-2 70B architecture.
| echo "--model : Model to run (Llama-3.1-8B-Instruct, Llama-3.1-70B-Instruct, Mistral-Nemo-Instruct-2407-FP8 )" | ||
| echo "--hf-token : Hugging face token with access to gated flux models" | ||
| echo "--model : Model to run (Llama-3.1-8B-Instruct, Llama-3.1-70B-Instruct, Llama-3.1-8B-Instruct-FP8-KV, Llama-3.1-70B-Instruct-FP8-KV, Llama-3.1-405B-Instruct-MXFP4-Preview, Mistral-Nemo-Instruct-2407-FP8 )" | ||
| echo "--hf-token : Hugging face token with access to gated flux models" |
Copilot AI commented Dec 8, 2025
Typo in help message: "Hugging face" should be capitalized as "HuggingFace" (or "Hugging Face" with space), and "gated flux models" is misleading since this script is for Llama/Mistral models, not Flux models.
| echo "--hf-token : Hugging face token with access to gated flux models" | |
| echo "--hf-token : Hugging Face token with access to gated Llama/Mistral models" |
| --account-name sharkpublic \
| --sas-token "$AZURE_SAS_TOKEN" \
| --container-name ossci \
| --name "$AZURE_BLOB_PATH/${IRPA_FILENAME%.irpa}-TEST.irpa" \
Copilot AI commented Dec 8, 2025
The upload logic uploads files with -TEST.irpa suffix instead of the original filename. This appears to be intentional for testing purposes, but the comment on line 213 says "Upload current version (overwrite)" which suggests this should be the production version. Consider clarifying whether this is the intended behavior or if it should upload to $IRPA_FILENAME instead.
Suggested change:
- --name "$AZURE_BLOB_PATH/${IRPA_FILENAME%.irpa}-TEST.irpa" \
+ --name "$AZURE_BLOB_PATH/$IRPA_FILENAME" \
| if [[ $MODEL = "Llama-3.1-8B-Instruct-FP8-KV" || $MODEL = "Llama-3.1-70B-Instruct-FP8-KV" || $MODEL = "Llama-3.1-405B-Instruct-MXFP4-Preview" ]]; then
| # Determine model-specific parameters
| if [[ $MODEL = "Llama-3.1-8B-Instruct-FP8-KV" ]]; then
| IRPA_OUTPUT="instruct_8b_fp8_e4m3fnuz.irpa"
Copilot AI commented Dec 8, 2025
[nitpick] The model base value "7b" is used for the 8B model. While this may be intentional for historical reasons or shared configuration, the inconsistency between the model size name (8B) and the model base identifier (7b) could be confusing. Consider adding a comment explaining why "7b" is used for 8B models.
Suggested change:
  IRPA_OUTPUT="instruct_8b_fp8_e4m3fnuz.irpa"
+ # Note: The model base value "7b" is used for the 8B model due to historical reasons and shared configuration with downstream tools.
| --model Mistral-Nemo-Instruct-2407-FP8 \
| --hf-token ${HF_TOKEN}
| rm -rf Llama-3.1-8B-Instruct Llama-3.1-70B-Instruct Llama-3.1-8B-Instruct-FP8-KV output_artifacts
| rm -f instruct_8b_fp8_e4m3fn.irpa instruct_8b_fp8_e4m3fn_previous.irpa instruct_70b_fp16_previous.irpa instruct_8b_fp16_previous.irpa
Copilot AI commented Dec 8, 2025
The cleanup step attempts to remove instruct_8b_fp8_e4m3fn.irpa but the actual file generated is instruct_8b_fp8_e4m3fnuz.irpa (with "uz" suffix). This cleanup command won't remove the correct file.
Suggested change:
- rm -f instruct_8b_fp8_e4m3fn.irpa instruct_8b_fp8_e4m3fn_previous.irpa instruct_70b_fp16_previous.irpa instruct_8b_fp16_previous.irpa
+ rm -f instruct_8b_fp8_e4m3fnuz.irpa instruct_8b_fp8_e4m3fn_previous.irpa instruct_70b_fp16_previous.irpa instruct_8b_fp16_previous.irpa
| --tokenizer_config "$TOKENIZER_CONFIG_PATH" \
| --steps "$STEPS" \
| --kv-cache-dtype "$KV_CACHE_DTYPE" 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_run_llm_vmfb.log"
| VALIDATION_EXIT_CODE=${PIPESTATUS[0]}
Copilot AI commented Dec 8, 2025
[nitpick] The script uses PIPESTATUS[0] to capture the exit code from the piped command. While this works in bash, it would be clearer and more maintainable to capture the validation exit code before piping to tee, or use a temporary file for logging. The current approach is correct but could be fragile if the command structure changes.
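One way to avoid PIPESTATUS, as the comment suggests, is to log to a file first so that $? reflects the validation command itself. A minimal sketch with placeholder names (run_validation and the log path stand in for the real command and artifact path):

```bash
#!/usr/bin/env bash
# Illustrative alternative to PIPESTATUS; run_validation and the log path
# are placeholders, not the actual command from this PR.
LOG="output_artifacts/run_llm_vmfb.log"
mkdir -p "$(dirname "$LOG")"

run_validation() { echo "validating..."; }   # stand-in for the real command

run_validation > "$LOG" 2>&1
VALIDATION_EXIT_CODE=$?   # exit code of run_validation itself, no pipe involved
cat "$LOG"                # still surface the log in CI output
```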
| COMPILE_CMD="bash scripts/export_and_compile.sh --irpa $IRPA_PATH --bs-prefill $BS_PREFILL --bs-decode $BS_DECODE --iree-hip-target $IREE_HIP_TARGET"
| if [ -n "$DTYPE" ]; then
| COMPILE_CMD="$COMPILE_CMD --dtype $DTYPE"
| fi
| $COMPILE_CMD 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_export_and_compilation.log" || { echo "Compilation failed"; exit 1; }
Copilot AI commented Dec 8, 2025
The COMPILE_CMD variable is built by concatenating unquoted user-controlled arguments (e.g., IRPA_PATH, BS_PREFILL, BS_DECODE, IREE_HIP_TARGET, DTYPE) into a single shell command string and then executed via $COMPILE_CMD, which allows command injection if any of these parameters contain shell metacharacters (e.g., ;, &&). An attacker who can influence the script arguments (for example via CI parameters or wrapper scripts) could inject arbitrary shell commands that will run with the privileges of this script. To fix this, avoid constructing a command string and instead invoke the script directly with properly quoted arguments, e.g. bash scripts/export_and_compile.sh --irpa "$IRPA_PATH" --bs-prefill "$BS_PREFILL" ... and pass --dtype "$DTYPE" conditionally as an additional argument rather than concatenating to a string.
Suggested change:
- COMPILE_CMD="bash scripts/export_and_compile.sh --irpa $IRPA_PATH --bs-prefill $BS_PREFILL --bs-decode $BS_DECODE --iree-hip-target $IREE_HIP_TARGET"
- if [ -n "$DTYPE" ]; then
-   COMPILE_CMD="$COMPILE_CMD --dtype $DTYPE"
- fi
- $COMPILE_CMD 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_export_and_compilation.log" || { echo "Compilation failed"; exit 1; }
+ # Directly invoke the script with properly quoted arguments to prevent command injection
+ if [ -n "$DTYPE" ]; then
+   bash scripts/export_and_compile.sh \
+     --irpa "$IRPA_PATH" \
+     --bs-prefill "$BS_PREFILL" \
+     --bs-decode "$BS_DECODE" \
+     --iree-hip-target "$IREE_HIP_TARGET" \
+     --dtype "$DTYPE" 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_export_and_compilation.log" || { echo "Compilation failed"; exit 1; }
+ else
+   bash scripts/export_and_compile.sh \
+     --irpa "$IRPA_PATH" \
+     --bs-prefill "$BS_PREFILL" \
+     --bs-decode "$BS_DECODE" \
+     --iree-hip-target "$IREE_HIP_TARGET" 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_export_and_compilation.log" || { echo "Compilation failed"; exit 1; }
+ fi
| echo "Downloading from repository: $REPO" | ||
| hf download $REPO --local-dir $MODEL |
Copilot AI commented Dec 8, 2025
In download_model, the REPO and MODEL values are derived from command-line arguments (HF_REPO/MODEL) and then interpolated unquoted into the hf download $REPO --local-dir $MODEL command, which enables shell command injection if an attacker supplies a value containing shell metacharacters like ; or &&. This would let an attacker who can control --model or --hf-repo execute arbitrary commands with the privileges of this script (and potentially access the configured Hugging Face credentials). To mitigate this, always quote these variables (e.g., hf download "$REPO" --local-dir "$MODEL") and avoid accepting arbitrary repository strings unless they are validated against an allowlist of expected model names.
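A minimal sketch of that mitigation, quoting the expansions and gating MODEL on an allowlist (the allowlist entries are the model names already used in this PR; the surrounding script structure is assumed):

```bash
# Sketch only: validate MODEL against known-good names, then quote expansions
# so they are treated as data rather than shell code.
case "$MODEL" in
  Llama-3.1-8B-Instruct|Llama-3.1-70B-Instruct|Llama-3.1-8B-Instruct-FP8-KV|Llama-3.1-70B-Instruct-FP8-KV|Llama-3.1-405B-Instruct-MXFP4-Preview|Mistral-Nemo-Instruct-2407-FP8)
    ;;   # known-good model name, proceed
  *)
    echo "Unsupported --model value: $MODEL" >&2
    exit 1
    ;;
esac

hf download "$REPO" --local-dir "$MODEL"   # quoted: no command injection
```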
| python scripts/merge_safetensors.py $MODEL
| sudo mv merged.safetensors $MODEL/merged.safetensors
Copilot AI commented Dec 8, 2025
In the new FP8/MXFP4 branches, MODEL is taken directly from the --model argument and then used unquoted in commands like python scripts/merge_safetensors.py $MODEL and sudo mv merged.safetensors $MODEL/merged.safetensors, which allows a malicious model name containing shell metacharacters (e.g., ;, &&) to inject additional shell commands. An attacker who can control the --model parameter could leverage this to run arbitrary commands with the privileges of the script (and in the case of sudo, potentially escalate or modify privileged files). To fix this, treat MODEL as data by quoting it everywhere it is used in shell commands (e.g., python scripts/merge_safetensors.py "$MODEL", sudo mv merged.safetensors "$MODEL/merged.safetensors") and consider validating MODEL against a strict pattern or allowlist of expected names.
| --config-json $MODEL/config.json --model-base="70b" --weight-dtype=float16
| IRPA_OUTPUT=mistral_nemo_2407_fp8_e4m3fnuz.irpa
| python -m amdsharktank.models.llama.tools.import_quark_dataset --params merged.safetensors --output-irpa-file=$IRPA_OUTPUT \
| --config-json $MODEL/config.json --model-base="mistral-70b" --weight-dtype=float16
Copilot AI commented Dec 8, 2025
In the updated Mistral branch, MODEL remains unquoted in the --config-json $MODEL/config.json argument to the python -m amdsharktank.models.llama.tools.import_quark_dataset command, so a crafted --model value containing shell metacharacters could break out of the argument and inject additional commands. This allows arbitrary command execution with the privileges of this script if an attacker can influence the --model parameter. To remediate, always quote MODEL when building paths or arguments (e.g., --config-json "$MODEL/config.json") and, ideally, restrict MODEL to an allowlist of known-good model identifiers.
Suggested change:
- --config-json $MODEL/config.json --model-base="mistral-70b" --weight-dtype=float16
+ --config-json "$MODEL/config.json" --model-base="mistral-70b" --weight-dtype=float16