
Conversation

@PhaneeshB
Contributor

  • Add steps for llama_8b_fp8 model

@github-actions
Contributor

github-actions bot commented Nov 4, 2025

Coverage report

This PR does not seem to contain any modification to coverable code.

@PhaneeshB force-pushed the gen_upload_irpa branch 2 times, most recently from b9dc6dd to e61dacf on November 12, 2025 16:23
@PhaneeshB force-pushed the gen_upload_irpa branch 10 times, most recently from f124137 to 0f29976 on November 25, 2025 10:23
Copilot AI review requested due to automatic review settings December 8, 2025 04:54
--account-name sharkpublic \
--sas-token "$AZURE_SAS_TOKEN" \
--container-name ossci \
--name "$AZURE_BLOB_PATH/${IRPA_FILENAME%.irpa}-TEST.irpa" \
Contributor Author


The -TEST suffix is intentional until all testing is completed (to verify that irpa files are being uploaded to the correct location), because the upload overwrites the existing file.
Remove it only once the PR is ready to merge.
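
For reference, the -TEST name comes from standard bash suffix removal; a minimal illustration using one of the IRPA filenames from this PR:

  IRPA_FILENAME=instruct_8b_fp8_e4m3fnuz.irpa
  echo "${IRPA_FILENAME%.irpa}-TEST.irpa"
  # prints: instruct_8b_fp8_e4m3fnuz-TEST.irpa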


Copilot AI left a comment


Pull request overview

This PR consolidates the CI workflow for generating and uploading IRPA (Intermediate Representation Package Archive) files for various Llama model configurations. The primary focus is adding support for FP8 quantized models, particularly the llama_8b_fp8 variant, while refactoring the workflow into a more maintainable structure.

Key Changes:

  • Added a new consolidated script (irpa_export_validate_upload.sh) that handles the complete workflow of exporting, compiling, validating, and uploading IRPA files (a rough sketch of this flow follows the list)
  • Extended model support to include FP8 and FP4 quantized variants (8B, 70B, 405B, and Mistral models)
  • Split the CI workflow into two separate jobs: one for smaller models (8B/70B/Mistral) and one for the 405B model requiring specialized hardware
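
A hedged sketch of that end-to-end flow, pieced together from the hunks quoted later in this review (the individual script names and flags appear there; the exact CLI of the consolidated wrapper may differ):

  # 1. Download weights and export the .irpa
  bash scripts/download_export_irpa.sh --model Llama-3.1-8B-Instruct-FP8-KV --hf-token "$HF_TOKEN"
  # 2. Export and compile for the target GPU
  bash scripts/export_and_compile.sh --irpa "$IRPA_PATH" --bs-prefill "$BS_PREFILL" \
    --bs-decode "$BS_DECODE" --iree-hip-target "$IREE_HIP_TARGET"
  # 3. Validate numerics (FINAL_RESULT accumulates per-prompt failures; arguments omitted)
  bash scripts/validate_numerics.sh
  # 4. Upload to Azure blob storage (account sharkpublic, container ossci), as in the upload hunk below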

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Summary per file:

  • scripts/irpa_export_validate_upload.sh: New consolidated script that orchestrates the export, compile, validate, and upload workflow for IRPA files with comprehensive parameter handling
  • scripts/validate_numerics.sh: Improved error handling by introducing FINAL_RESULT to accumulate errors across all validation prompts
  • scripts/setenv.sh: Fixed typo in comment (amdsharktank → sharktank)
  • scripts/merge_safetensors.py: Fixed typo in logging message (Marging → Merging)
  • scripts/download_export_irpa.sh: Added support for FP8 and FP4 model variants with flexible repository mapping and IRPA export logic
  • amdsharktank/amdsharktank/models/llama/tools/import_quark_dataset.py: Updated model base choices to include mistral-70b and corrected layer count for 70b models from 40 to 80
  • .github/workflows/ci_generate_irpa.yml: Restructured workflow into two jobs with dedicated steps for each model configuration, including proper environment setup and cleanup


config_json_path: Path = args.config_json
params_path: Path = args.params
# TODO: find a way to get this programmatically so we don't have to flag for it
split_sizes = [4096, 4096, 4096] if args.model_base == "7b" else [8192, 1024, 1024]

Copilot AI Dec 8, 2025


The layer count for the "70b" model base has been changed from 40 to 80. This is a significant change that should be verified. If this is a bug fix (i.e., the original value of 40 was incorrect for 70b models), it could affect existing workflows or previously generated models. Please ensure this change is intentional and properly tested.

Suggested change
split_sizes = [4096, 4096, 4096] if args.model_base == "7b" else [8192, 1024, 1024]
split_sizes = [4096, 4096, 4096] if args.model_base == "7b" else [8192, 1024, 1024]
# Layer counts for model bases:
# - "7b": 32 layers (Llama-2 7B)
# - "70b": 80 layers (Llama-2 70B, see: https://github.com/facebookresearch/llama/blob/main/docs/model_card.md)
# - "mistral-70b": 40 layers (Mistral 70B, see: https://docs.mistral.ai/models/)
# - "405b": 126 layers (custom model)
# The value for "70b" was updated from 40 to 80 to match the official Llama-2 70B architecture.

echo "--model : Model to run (Llama-3.1-8B-Instruct, Llama-3.1-70B-Instruct, Mistral-Nemo-Instruct-2407-FP8 )"
echo "--hf-token : Hugging face token with access to gated flux models"
echo "--model : Model to run (Llama-3.1-8B-Instruct, Llama-3.1-70B-Instruct, Llama-3.1-8B-Instruct-FP8-KV, Llama-3.1-70B-Instruct-FP8-KV, Llama-3.1-405B-Instruct-MXFP4-Preview, Mistral-Nemo-Instruct-2407-FP8 )"
echo "--hf-token : Hugging face token with access to gated flux models"

Copilot AI Dec 8, 2025


Typo in help message: "Hugging face" should be "Hugging Face" (or "HuggingFace"), and "gated flux models" is misleading since this script is for Llama/Mistral models, not Flux models.

Suggested change
echo "--hf-token : Hugging face token with access to gated flux models"
echo "--hf-token : Hugging Face token with access to gated Llama/Mistral models"

--account-name sharkpublic \
--sas-token "$AZURE_SAS_TOKEN" \
--container-name ossci \
--name "$AZURE_BLOB_PATH/${IRPA_FILENAME%.irpa}-TEST.irpa" \

Copilot AI Dec 8, 2025


The upload logic uploads files with -TEST.irpa suffix instead of the original filename. This appears to be intentional for testing purposes, but the comment on line 213 says "Upload current version (overwrite)" which suggests this should be the production version. Consider clarifying whether this is the intended behavior or if it should upload to $IRPA_FILENAME instead.

Suggested change
--name "$AZURE_BLOB_PATH/${IRPA_FILENAME%.irpa}-TEST.irpa" \
--name "$AZURE_BLOB_PATH/$IRPA_FILENAME" \

if [[ $MODEL = "Llama-3.1-8B-Instruct-FP8-KV" || $MODEL = "Llama-3.1-70B-Instruct-FP8-KV" || $MODEL = "Llama-3.1-405B-Instruct-MXFP4-Preview" ]]; then
# Determine model-specific parameters
if [[ $MODEL = "Llama-3.1-8B-Instruct-FP8-KV" ]]; then
IRPA_OUTPUT="instruct_8b_fp8_e4m3fnuz.irpa"

Copilot AI Dec 8, 2025


[nitpick] The model base value "7b" is used for the 8B model. While this may be intentional for historical reasons or shared configuration, the inconsistency between the model size name (8B) and the model base identifier (7b) could be confusing. Consider adding a comment explaining why "7b" is used for 8B models.

Suggested change
IRPA_OUTPUT="instruct_8b_fp8_e4m3fnuz.irpa"
IRPA_OUTPUT="instruct_8b_fp8_e4m3fnuz.irpa"
# Note: The model base value "7b" is used for the 8B model due to historical reasons and shared configuration with downstream tools.

--model Mistral-Nemo-Instruct-2407-FP8 \
--hf-token ${HF_TOKEN}
rm -rf Llama-3.1-8B-Instruct Llama-3.1-70B-Instruct Llama-3.1-8B-Instruct-FP8-KV output_artifacts
rm -f instruct_8b_fp8_e4m3fn.irpa instruct_8b_fp8_e4m3fn_previous.irpa instruct_70b_fp16_previous.irpa instruct_8b_fp16_previous.irpa

Copilot AI Dec 8, 2025


The cleanup step attempts to remove instruct_8b_fp8_e4m3fn.irpa but the actual file generated is instruct_8b_fp8_e4m3fnuz.irpa (with "uz" suffix). This cleanup command won't remove the correct file.

Suggested change
rm -f instruct_8b_fp8_e4m3fn.irpa instruct_8b_fp8_e4m3fn_previous.irpa instruct_70b_fp16_previous.irpa instruct_8b_fp16_previous.irpa
rm -f instruct_8b_fp8_e4m3fnuz.irpa instruct_8b_fp8_e4m3fn_previous.irpa instruct_70b_fp16_previous.irpa instruct_8b_fp16_previous.irpa

--tokenizer_config "$TOKENIZER_CONFIG_PATH" \
--steps "$STEPS" \
--kv-cache-dtype "$KV_CACHE_DTYPE" 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_run_llm_vmfb.log"
VALIDATION_EXIT_CODE=${PIPESTATUS[0]}

Copilot AI Dec 8, 2025


[nitpick] The script uses PIPESTATUS[0] to capture the exit code from the piped command. While this works in bash, it would be clearer and more maintainable to capture the validation exit code before piping to tee, or use a temporary file for logging. The current approach is correct but could be fragile if the command structure changes.
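
For illustration, a minimal sketch of the temp-file alternative (run_validation is a hypothetical stand-in for the truncated invocation above; the if/else keeps the status capture safe under set -e):

  LOG="$(pwd)/output_artifacts/${MODEL_TAG}_run_llm_vmfb.log"
  # run_validation is hypothetical; substitute the actual command
  if run_validation --steps "$STEPS" --kv-cache-dtype "$KV_CACHE_DTYPE" > "$LOG" 2>&1; then
    VALIDATION_EXIT_CODE=0
  else
    VALIDATION_EXIT_CODE=$?
  fi
  cat "$LOG"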

Comment on lines +150 to +154
COMPILE_CMD="bash scripts/export_and_compile.sh --irpa $IRPA_PATH --bs-prefill $BS_PREFILL --bs-decode $BS_DECODE --iree-hip-target $IREE_HIP_TARGET"
if [ -n "$DTYPE" ]; then
COMPILE_CMD="$COMPILE_CMD --dtype $DTYPE"
fi
$COMPILE_CMD 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_export_and_compilation.log" || { echo "Compilation failed"; exit 1; }

Copilot AI Dec 8, 2025


The COMPILE_CMD variable is built by concatenating unquoted user-controlled arguments (e.g., IRPA_PATH, BS_PREFILL, BS_DECODE, IREE_HIP_TARGET, DTYPE) into a single shell command string and then executed via $COMPILE_CMD, which allows command injection if any of these parameters contain shell metacharacters (e.g., ;, &&). An attacker who can influence the script arguments (for example via CI parameters or wrapper scripts) could inject arbitrary shell commands that will run with the privileges of this script. To fix this, avoid constructing a command string and instead invoke the script directly with properly quoted arguments, e.g. bash scripts/export_and_compile.sh --irpa "$IRPA_PATH" --bs-prefill "$BS_PREFILL" ... and pass --dtype "$DTYPE" conditionally as an additional argument rather than concatenating to a string.

Suggested change
COMPILE_CMD="bash scripts/export_and_compile.sh --irpa $IRPA_PATH --bs-prefill $BS_PREFILL --bs-decode $BS_DECODE --iree-hip-target $IREE_HIP_TARGET"
if [ -n "$DTYPE" ]; then
COMPILE_CMD="$COMPILE_CMD --dtype $DTYPE"
fi
$COMPILE_CMD 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_export_and_compilation.log" || { echo "Compilation failed"; exit 1; }
# Directly invoke the script with properly quoted arguments to prevent command injection
if [ -n "$DTYPE" ]; then
bash scripts/export_and_compile.sh \
--irpa "$IRPA_PATH" \
--bs-prefill "$BS_PREFILL" \
--bs-decode "$BS_DECODE" \
--iree-hip-target "$IREE_HIP_TARGET" \
--dtype "$DTYPE" 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_export_and_compilation.log" || { echo "Compilation failed"; exit 1; }
else
bash scripts/export_and_compile.sh \
--irpa "$IRPA_PATH" \
--bs-prefill "$BS_PREFILL" \
--bs-decode "$BS_DECODE" \
--iree-hip-target "$IREE_HIP_TARGET" 2>&1 | tee "$(pwd)/output_artifacts/${MODEL_TAG}_export_and_compilation.log" || { echo "Compilation failed"; exit 1; }
fi

Comment on lines +33 to +34
echo "Downloading from repository: $REPO"
hf download $REPO --local-dir $MODEL

Copilot AI Dec 8, 2025


In download_model, the REPO and MODEL values are derived from command-line arguments (HF_REPO/MODEL) and then interpolated unquoted into the hf download $REPO --local-dir $MODEL command, which enables shell command injection if an attacker supplies a value containing shell metacharacters like ; or &&. This would let an attacker who can control --model or --hf-repo execute arbitrary commands with the privileges of this script (and potentially access the configured Hugging Face credentials). To mitigate this, always quote these variables (e.g., hf download "$REPO" --local-dir "$MODEL") and avoid accepting arbitrary repository strings unless they are validated against an allowlist of expected model names.
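
A hedged sketch of the quoted call plus a simple allowlist (the model names listed are the ones mentioned in this PR; the script may accept others):

  case "$MODEL" in
    Llama-3.1-8B-Instruct|Llama-3.1-70B-Instruct|Llama-3.1-8B-Instruct-FP8-KV|Mistral-Nemo-Instruct-2407-FP8)
      ;;
    *)
      echo "Unsupported model: $MODEL" >&2
      exit 1
      ;;
  esac
  hf download "$REPO" --local-dir "$MODEL"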

Comment on lines +96 to +97
python scripts/merge_safetensors.py $MODEL
sudo mv merged.safetensors $MODEL/merged.safetensors

Copilot AI Dec 8, 2025


In the new FP8/MXFP4 branches, MODEL is taken directly from the --model argument and then used unquoted in commands like python scripts/merge_safetensors.py $MODEL and sudo mv merged.safetensors $MODEL/merged.safetensors, which allows a malicious model name containing shell metacharacters (e.g., ;, &&) to inject additional shell commands. An attacker who can control the --model parameter could leverage this to run arbitrary commands with the privileges of the script (and in the case of sudo, potentially escalate or modify privileged files). To fix this, treat MODEL as data by quoting it everywhere it is used in shell commands (e.g., python scripts/merge_safetensors.py "$MODEL", sudo mv merged.safetensors "$MODEL/merged.safetensors") and consider validating MODEL against a strict pattern or allowlist of expected names.
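
For concreteness, the quoted form of the two lines above:

  python scripts/merge_safetensors.py "$MODEL"
  sudo mv merged.safetensors "$MODEL/merged.safetensors"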

--config-json $MODEL/config.json --model-base="70b" --weight-dtype=float16
IRPA_OUTPUT=mistral_nemo_2407_fp8_e4m3fnuz.irpa
python -m amdsharktank.models.llama.tools.import_quark_dataset --params merged.safetensors --output-irpa-file=$IRPA_OUTPUT \
--config-json $MODEL/config.json --model-base="mistral-70b" --weight-dtype=float16

Copilot AI Dec 8, 2025


In the updated Mistral branch, MODEL remains unquoted in the --config-json $MODEL/config.json argument to the python -m amdsharktank.models.llama.tools.import_quark_dataset command, so a crafted --model value containing shell metacharacters could break out of the argument and inject additional commands. This allows arbitrary command execution with the privileges of this script if an attacker can influence the --model parameter. To remediate, always quote MODEL when building paths or arguments (e.g., --config-json "$MODEL/config.json") and, ideally, restrict MODEL to an allowlist of known-good model identifiers.

Suggested change
--config-json $MODEL/config.json --model-base="mistral-70b" --weight-dtype=float16
--config-json "$MODEL/config.json" --model-base="mistral-70b" --weight-dtype=float16
