
Conversation

@sredman
Contributor

@sredman sredman commented Dec 16, 2025

Description

This PR implements the ability to select which version of ROCm to use when building llama.cpp, in a similar fashion to how we are able to select which version of CUDA to use.
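
At a high level, the idea is to expose the ROCm version through build arguments, mirroring the existing CUDA_MAJOR_VERSION / CUDA_MINOR_VERSION pattern. A minimal sketch of the intended usage (the default values and the build command below are only illustrative):

```Dockerfile
# Sketch only: mirrors the CUDA_MAJOR_VERSION / CUDA_MINOR_VERSION pattern;
# the defaults here are examples, not necessarily the values in the diff.
ARG ROCM_MAJOR_VERSION=6
ARG ROCM_MINOR_VERSION=4.3
# The two parts form the version path used by AMD's apt repo
# (https://repo.radeon.com/rocm/apt/), e.g. 6.4.3.
# To target a different ROCm release, override at build time, for example:
#   docker build --build-arg ROCM_MAJOR_VERSION=7 --build-arg ROCM_MINOR_VERSION=1.1 ...
```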

Notes for Reviewers

The motivation is that perfectly usable AMD cards are unsupported by newer ROCm versions. The Vulkan build works excellently on these cards, but it is not a good fit for distributed inference: according to the docs and my testing, it cannot process layers in parallel, so layers end up being processed sequentially. Some say it is also less performant.

I have some questions about how to finish this PR, and I have not yet been able to test the actual backends in my setup, so I am leaving this PR as a draft for now.

I have built with both version 6.3.3 and 7.1.1 (latest). I will only be able to test with 6.3.3, since I only have Vega 10 GPUs available.

Some questions for maintainers:

  • Should I include these flags in the base-level Dockerfile? Since backends are now built/shipped separately, I assume the usage in the base-level Dockerfile is no longer necessary.
  • Should I make similar changes to the other backend Dockerfiles? I likely will only be able to test llama-cpp.
  • Would you accept changing the github pipelines to build a v6 and v7 version of ROCm, similar to how we have a v11 and v12 build for CUDA?
    • Assuming the answer to the above question is "yes", would it be OK to make a breaking change to the ROCm backend name/tag, to better match the CUDA build? So we'd have something like *-gpu-amd-rocm-6-llama-cpp and *-gpu-amd-rocm-7-llama-cpp.

Signed commits

  • Yes, I signed my commits.

@netlify

netlify bot commented Dec 16, 2025

Deploy Preview for localai ready!

🔨 Latest commit: 4941f13
🔍 Latest deploy log: https://app.netlify.com/projects/localai/deploys/694865145d40560008c91d7c
😎 Deploy Preview: https://deploy-preview-7615--localai.netlify.app

@sredman sredman force-pushed the work/sredman/rocm-version-control branch from 55c9137 to 3540ea3 on December 16, 2025 19:28
@sredman
Contributor Author

sredman commented Dec 17, 2025

Unfortunately my testing for this has been unsuccessful. LocalAI loads the model, then at the point where it says it's doing a dry run to warm the caches it gives a generic error saying it has failed. I will debug some more, but any tips are welcome!

@sredman
Contributor Author

sredman commented Dec 17, 2025

Unfortunately my testing for this has been unsuccessful. LocalAI loads the model, then at the point where it says it's doing a dry run to warm the caches it gives a generic error saying it has failed. I will debug some more, but any tips are welcome!

The same shape of crash happens even with the current release of the ROCm backend! Curious. It looks like the release version is actually using a v6 version of ROCm so should regardless be compatible with my GPUs. I'll see if I can figure out what is going wrong.

@sredman sredman force-pushed the work/sredman/rocm-version-control branch 2 times, most recently from 9004896 to aa2a232 on December 18, 2025 15:09
@sredman
Contributor Author

sredman commented Dec 19, 2025

Responding to myself:

Should I include these flags in the base-level Dockerfile? Since backends are now built/shipped separately, I assume the usage in the base-level Dockerfile is no longer necessary.

Yes. The base-level Dockerfile needs to contain all the runtime libraries.
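
To make that concrete, here is a minimal sketch of what I mean (illustrative only; the package names and repo setup are my assumptions, not the actual LocalAI base image):

```Dockerfile
# Illustrative only, not the actual LocalAI base image.
FROM ubuntu:22.04
ARG ROCM_MAJOR_VERSION=6
ARG ROCM_MINOR_VERSION=4.3
# hipblas/rocblas are not in the stock Ubuntu repos, so the runtime libraries
# have to come from AMD's repo for the selected version (a real setup would
# verify AMD's signing key instead of using trusted=yes).
RUN echo "deb [trusted=yes] https://repo.radeon.com/rocm/apt/${ROCM_MAJOR_VERSION}.${ROCM_MINOR_VERSION} jammy main" \
        > /etc/apt/sources.list.d/rocm.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends hipblas rocblas && \
    rm -rf /var/lib/apt/lists/*
```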

... ROCm 6.3.3 is the last release which supports GCN5.0 (Vega 10) cards ...

I don't know where I read this, but it is apparently false. I don't know which release was the last to support GCN5.0, but it was quite old (prior to ROCm 5.0.0, released in 2022).

Given the lack of support, it seems likely that a llama.cpp change is trying to use a feature which my cards do not support. LocalAI v2.29.0 works fine with the ROCm backend, while the LocalAI ROCm backend tagged v3.2.0 throws an error indicating "MUL_MAT" is not supported. I will experiment a bit more, but I think the best move for me is to switch to Vulkan.

Dockerfile Outdated
ARG CUDA_MAJOR_VERSION=12
ARG CUDA_MINOR_VERSION=0
ARG ROCM_MAJOR_VERSION=5
ARG ROCM_MINOR_VERSION=5.1 # ROCm version to append to the major version, in the format of their apt repo (https://repo.radeon.com/rocm/apt/). Like `0_alpha` or `3.4`.
Contributor Author

The goal is to update the ROCM_MAJOR_VERSION and ROCM_MINOR_VERSION to match whatever is deployed in LocalAI today. I haven't 100% figured this out. I believe it is v5.5.1, as that is the version in the Ubuntu 22.04 repo.

Contributor Author

AMD has tricked me again :)
hipblas-dev and rocblas-dev are not in the Ubuntu repos. This Dockerfile gets them because the build overrides BASE_IMAGE with BASE_IMAGE="rocm/dev-ubuntu-22.04:{tag}", which determines which version of the ROCm libraries becomes available.
As of this writing, v3.8.0 of LocalAI was built with v6.4.3 of ROCm. That's what I'll use for the defaults here.
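
In other words, for this backend the ROCm version is effectively selected by the base image tag. A minimal sketch of the mechanism (6.4.3 follows the note above; 7.1.1 is just an example of a newer release):

```Dockerfile
# Sketch: the ROCm libraries come from the base image, so the ROCm version is
# chosen by overriding BASE_IMAGE at build time.
ARG BASE_IMAGE=rocm/dev-ubuntu-22.04:6.4.3
FROM ${BASE_IMAGE}
# e.g. to build against ROCm 7.1.1 instead:
#   docker build --build-arg BASE_IMAGE=rocm/dev-ubuntu-22.04:7.1.1 .
```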

@sredman
Contributor Author

sredman commented Dec 19, 2025

Vega 10 cards aside, I think this PR still has value. I have tested it with the latest release (7.1.1) on my laptop with an integrated 780M (gfx1103). It fails because it is the primary GPU for my system, so ROCm ends up fighting with my display over memory, but it does appear to basically work, so I suspect this PR would enable support for other newer GPUs as well.

@sredman sredman force-pushed the work/sredman/rocm-version-control branch from a7f7e9c to 2c45d57 on December 19, 2025 15:32
@mudler mudler marked this pull request as ready for review December 19, 2025 20:31
@mudler mudler marked this pull request as draft December 19, 2025 20:31
@mudler
Owner

mudler commented Dec 19, 2025

Some questions for maintainers:

* Should I include these flags in the base-level Dockerfile? Since backends are now built/shipped separately, I assume the usage in the base-level Dockerfile is no longer necessary.

Correct, unless there are runtime deps that are really required.

* Should I make similar changes to the other backend Dockerfiles? I likely will only be able to test llama-cpp.

I would suggest just making the changes that you are able to test. We can take a phased approach for the other backends as necessary.

* Would you accept changing the github pipelines to build a v6 and v7 version of ROCm, similar to how we have a v11 and v12 build for CUDA?

Yes absolutely!

  * Assuming the answer to the above question is "yes", would it be OK to make a breaking change to the ROCm backend name/tag, to better match the CUDA build? So we'd have something like `*-gpu-amd-rocm-6-llama-cpp` and `*-gpu-amd-rocm-7-llama-cpp`.

Yes, sounds good. We are actually really close to the next release and there are already a few changes to the images (introducing CUDA 13, for example), so this fits perfectly in line with that.

@sredman sredman force-pushed the work/sredman/rocm-version-control branch from 238e1a8 to 9bc5fba on December 20, 2025 01:34
@sredman sredman force-pushed the work/sredman/rocm-version-control branch from 9bc5fba to 104376a on December 20, 2025 01:34
@sredman
Copy link
Contributor Author

sredman commented Dec 20, 2025

Slight commit history disaster due to the merge 😄

I will mark this PR as non-draft now. I have made the CI changes and the breaking changes to the image names. While I cannot reliably test this change, I believe it will at least do no harm: the v6 images should be identical to before, and if the v7 images are in some way broken, consumers can choose not to use them.

@sredman sredman marked this pull request as ready for review December 20, 2025 01:36
@sredman sredman force-pushed the work/sredman/rocm-version-control branch from 104376a to e8f17b7 on December 20, 2025 01:38
context: "./backend"
ubuntu-version: '2204'
- build-type: 'hipblas'
cuda-major-version: ""
Owner

Contributor Author

I am not sure I understand this comment correctly. Do you mean simply add rocm-*-version around L15 of backend_build.yaml? I have done that. If you mean something else, please guide me 😄

@sredman sredman force-pushed the work/sredman/rocm-version-control branch from 4ce161b to cfeeff8 on December 21, 2025 21:08
@github-actions github-actions bot added the kind/documentation (Improvements or additions to documentation) label on Dec 21, 2025