@kahmed10
Collaborator

Motivation

Architectures such as the MI300 series can expose a varying number of chiplets depending on the compute partition mode.

Technical Details

Use the HSA runtime to query the number of chiplets, similar to what rocMLIR does in native mode. rocMLIR does not consume this value yet, but support is being added, and that work does not block this PR.

Changelog Category

    • Added: New functionality.
    • Changed: Changes to existing functionality.
    • Removed: Functionality or support that has been removed. (Compared to a previous release)
    • Optimized: Component performance that has been optimized or improved.
    • Resolved Issues: Known issues from a previous version that have been resolved.
    • Not Applicable: This PR is not to be included in the changelog.

@kahmed10 requested a review from causten as a code owner on December 10, 2025, 20:38
bool found;
};

hsa_status_t status = hsa_init();

// Callback function for hsa_iterate_agents
// GPUs are enumerated in the same order as HIP device IDs
auto agent_callback = [](hsa_agent_t agent, void* data) -> hsa_status_t {
auto* info = static_cast<agent_info*>(data);

I find it confusing that we also have an info variable on line 238.

info.found = false;

// Callback function for hsa_iterate_agents
// GPUs are enumerated in the same order as HIP device IDs

Is this always the case? Is there a link to the docs?

};

// Iterate through all HSA agents to find matching GPU
status = hsa_iterate_agents(agent_callback, &info);

Check the returned status.

};

hsa_status_t status = hsa_init();
if(status != HSA_STATUS_SUCCESS)

I would refactor this into a macro called RET_IF_HSA_ERR and reuse it
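
A minimal sketch of what such a macro could look like. It uses stand-in types (demo_status, demo_init, demo_iterate) instead of the real hsa_status_t and HSA calls so that it builds without ROCm. The fallback parameter and the fallback value of 1 are assumptions; the thread does not say what the function should return on failure.

```cpp
#include <cstddef>

// Stand-ins for hsa_status_t and HSA_STATUS_SUCCESS (assumption: used only
// so this sketch compiles without the HSA headers).
enum demo_status
{
    DEMO_STATUS_SUCCESS = 0,
    DEMO_STATUS_ERROR   = 1
};

// Hypothetical macro: evaluates expr once and returns fallback from the
// enclosing function when the status is not success.
#define RET_IF_HSA_ERR(expr, fallback)            \
    do                                            \
    {                                             \
        if((expr) != DEMO_STATUS_SUCCESS)         \
            return (fallback);                    \
    } while(false)

// Stand-in for hsa_init.
inline demo_status demo_init(bool ok) { return ok ? DEMO_STATUS_SUCCESS : DEMO_STATUS_ERROR; }

// Stand-in for the agent enumeration; writes a chiplet count on success.
inline demo_status demo_iterate(bool ok, std::size_t& out)
{
    if(!ok)
        return DEMO_STATUS_ERROR;
    out = 8;
    return DEMO_STATUS_SUCCESS;
}

inline std::size_t demo_get_num_chiplets(bool init_ok, bool iterate_ok)
{
    std::size_t n = 0;
    RET_IF_HSA_ERR(demo_init(init_ok), 1);
    RET_IF_HSA_ERR(demo_iterate(iterate_ok, n), 1);
    return n;
}
```

With the real API, the same macro would wrap hsa_init and hsa_iterate_agents, so every call site gets the same check.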

std::size_t target_device_id;
std::size_t gpu_count;
uint32_t num_chiplets;
bool found;

Looks like found is set but not used. Do we want to check if !found at the end of the agent enumeration and throw an error in that case?
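
One way the check could look, as a sketch rather than the PR's actual code. The HSA types are replaced by stand-ins (demo_status, demo_agent, demo_iterate_agents) so it builds without ROCm. The agent_info fields are the ones shown in the diff. The local variable is named query here only to avoid a second info. The error messages and the choice to throw std::runtime_error are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-ins for hsa_status_t and hsa_agent_t (assumption: illustration only).
enum demo_status
{
    DEMO_SUCCESS,
    DEMO_INFO_BREAK,
    DEMO_ERROR
};

struct demo_agent
{
    bool is_gpu;
    std::uint32_t num_chiplets;
};

// Fields as shown in the diff.
struct agent_info
{
    std::size_t target_device_id;
    std::size_t gpu_count;
    std::uint32_t num_chiplets;
    bool found;
};

using demo_callback = demo_status (*)(demo_agent, void*);

// Stand-in for hsa_iterate_agents: stops at the first non-success status
// returned by the callback and hands that status back to the caller.
inline demo_status
demo_iterate_agents(const std::vector<demo_agent>& agents, demo_callback cb, void* data)
{
    for(const auto& agent : agents)
    {
        demo_status s = cb(agent, data);
        if(s != DEMO_SUCCESS)
            return s;
    }
    return DEMO_SUCCESS;
}

inline std::uint32_t find_num_chiplets(const std::vector<demo_agent>& agents,
                                       std::size_t device_id)
{
    agent_info query{device_id, 0, 0, false};
    // A capture-less lambda converts to a plain function pointer.
    auto cb = [](demo_agent agent, void* data) -> demo_status {
        auto* q = static_cast<agent_info*>(data);
        if(!agent.is_gpu)
            return DEMO_SUCCESS; // skip CPU agents
        if(q->gpu_count == q->target_device_id)
        {
            q->num_chiplets = agent.num_chiplets;
            q->found        = true;
            return DEMO_INFO_BREAK; // stop early, not an error
        }
        ++q->gpu_count;
        return DEMO_SUCCESS;
    };
    demo_status status = demo_iterate_agents(agents, cb, &query);
    if(status != DEMO_SUCCESS && status != DEMO_INFO_BREAK)
        throw std::runtime_error("agent enumeration failed");
    if(!query.found)
        throw std::runtime_error("no GPU agent matches the requested device id");
    return query.num_chiplets;
}
```

Checking both the iteration status and found separates "enumeration failed" from "enumeration succeeded but the device was not there".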

// HSA is only available on non-Windows platforms
#ifndef _WIN32
#include "hsa/hsa.h"
#include "hsa/hsa_ext_amd.h"

Includes should use angle brackets <..>.

status = hsa_iterate_agents(agent_callback, &info);

hsa_shut_down();
return info.num_chiplets;

Can we move all the HSA code into a separate function in a .cpp file? This header gets included by everyone.

// Iterate through all HSA agents to find matching GPU
status = hsa_iterate_agents(agent_callback, &info);

hsa_shut_down();

Do we need to call init and shut_down every time? That might be expensive. We should probably collect the chiplet counts for all devices and store them in a vector so we only query once.

Also, if this is necessary (I am not sure it is, since HIP still needs to use HSA), then the shutdown should happen in a destructor so it is always called. We could make a class that calls hsa_init in the constructor and hsa_shut_down in the destructor.
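
A rough sketch of both suggestions together: a guard class that pairs init with shut down, and a one-time query cached in a vector. The names runtime_guard, demo_runtime, query_all_chiplet_counts and the hard-coded counts are hypothetical stand-ins so the example builds without ROCm. The real version would call hsa_init, hsa_iterate_agents and hsa_shut_down instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-ins for hsa_init / hsa_shut_down, with counters so the pairing
// can be observed (assumption: illustration only).
struct demo_runtime
{
    static int& init_calls()
    {
        static int n = 0;
        return n;
    }
    static int& shutdown_calls()
    {
        static int n = 0;
        return n;
    }
    static bool init()
    {
        ++init_calls();
        return true;
    }
    static void shut_down() { ++shutdown_calls(); }
};

// Hypothetical guard: init in the constructor, shut down in the destructor,
// so early returns and exceptions still release the runtime.
class runtime_guard
{
    public:
    runtime_guard()
    {
        if(!demo_runtime::init())
            throw std::runtime_error("runtime init failed");
    }
    ~runtime_guard() { demo_runtime::shut_down(); }
    runtime_guard(const runtime_guard&)            = delete;
    runtime_guard& operator=(const runtime_guard&) = delete;
};

// Stand-in for the agent enumeration: returns one chiplet count per device.
// The values are made up for the example.
inline std::vector<std::uint32_t> query_all_chiplet_counts()
{
    runtime_guard guard;
    return {8, 8, 4};
}

inline std::uint32_t get_num_chiplets(std::size_t device_id)
{
    // Function-local static: initialized once, on first use.
    static const std::vector<std::uint32_t> counts = query_all_chiplet_counts();
    return counts.at(device_id); // throws std::out_of_range for unknown ids
}
```

One caveat with caching: the counts are fixed after the first call, so a change of compute partition mode during the lifetime of the process would not be picked up.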

@pfultz2
Collaborator

pfultz2 commented Dec 11, 2025

CMake needs to be updated to call find_package and link against hsa. It works now because hsa is installed in the same directory as the other dependencies, but that is not always the case.
