feat(ROCm): Allow selecting ROCm version when building llama.cpp backend #7615
Conversation
(force-pushed from 55c9137 to 3540ea3)
Unfortunately my testing for this has been unsuccessful. LocalAI loads the model, but at the point where it reports doing a dry run to warm the caches, it fails with a generic error. I will debug some more, but any tips are welcome!
The same shape of crash happens even with the current release of the ROCm backend! Curious. It looks like the release version is actually using a v6 version of ROCm, so it should be compatible with my GPUs regardless. I'll see if I can figure out what is going wrong.
(force-pushed from 9004896 to aa2a232)
Responding to myself:
Yes. The base-level Dockerfile needs to contain all the runtime libraries.
I don't know where I read that my GCN 5.0 (Vega 10) cards were still supported, but it is apparently false. I don't know what the last ROCm version to support GCN 5.0 was, but it was quite ancient (before ROCm 5.0.0, which was released in 2022). Given the lack of support, it seems likely that a llama.cpp change is trying to use a feature which my cards do not support. LocalAI v2.29.0 works fine with the ROCm backend, while the LocalAI ROCm backend tagged v3.2.0 throws an error indicating "MUL_MAT" is not supported. I will experiment a bit more, but I think the best move for me is to switch to Vulkan.
Dockerfile (outdated diff)
ARG CUDA_MAJOR_VERSION=12
ARG CUDA_MINOR_VERSION=0
ARG ROCM_MAJOR_VERSION=5
ARG ROCM_MINOR_VERSION=5.1 # ROCm version to append to the major version, in the format of their apt repo (https://repo.radeon.com/rocm/apt/). Like `0_alpha` or `3.4`.
The goal is to update the ROCM_MAJOR_VERSION and ROCM_MINOR_VERSION to match whatever is deployed in LocalAI today. I haven't 100% figured this out. I believe it is v5.5.1, as that is the version in the Ubuntu 22.04 repo.
AMD has tricked me again :)
hipblas-dev and rocblas-dev are not in the Ubuntu repos. This Dockerfile gets them because the build overrides BASE_IMAGE with BASE_IMAGE="rocm/dev-ubuntu-22.04:{tag}", and that base image is what determines which version of the ROCm libraries becomes available.
As of this writing, v3.8.0 of LocalAI was built with ROCm v6.4.3, so that is what I'll use for the defaults here.
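To make the mechanism concrete, here is a minimal sketch of how the version selection is expected to flow through the build. This is not the actual LocalAI Dockerfile; the ubuntu:22.04 default base and the exact build command are illustrative assumptions.

```Dockerfile
# Illustrative sketch only: the ROCM_* args track the desired ROCm release,
# while the BASE_IMAGE override is what actually provides that release's
# hipblas-dev/rocblas-dev, since those packages are not in Ubuntu's own repos.
ARG BASE_IMAGE=ubuntu:22.04
ARG ROCM_MAJOR_VERSION=6
ARG ROCM_MINOR_VERSION=4.3

# For the hipblas variant, the build would override the base image, e.g.:
#   docker build \
#     --build-arg BASE_IMAGE="rocm/dev-ubuntu-22.04:6.4.3" \
#     --build-arg ROCM_MAJOR_VERSION=6 \
#     --build-arg ROCM_MINOR_VERSION=4.3 \
#     .
FROM ${BASE_IMAGE} AS builder
```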
Vega 10 cards aside, I think this PR still has value. I have tested it with the latest release (7.1.1) on my laptop with an integrated 780M (gfx1103). It fails because that GPU is the primary one for my system, so ROCm ends up fighting with my display over memory, but it does appear to basically work. I suspect this PR would enable support for other newer GPUs as well.
(force-pushed from a7f7e9c to 2c45d57)
Correct, unless there are runtime deps that are really required.
I would suggest just doing the changes that you are able to test. We can take a phased approach for the other backends as necessary.
Yes absolutely!
Yes, sounds good. We are actually really close to the next release and there are already a few changes to the images (introducing CUDA 13, for example), so this fits perfectly in line with that.
(force-pushed from 238e1a8 to 9bc5fba)
(force-pushed from 9bc5fba to 104376a)
Slight commit history disaster due to the merge 😄 I will mark this PR as non-draft now. I have made the CI changes and the breaking changes to the image names. While I cannot reliably test this change, I believe it will at least do no harm: the v6 images should be identical to before, and if the v7 images are broken in some way, consumers can choose not to use them.
(force-pushed from 104376a to e8f17b7)
  context: "./backend"
  ubuntu-version: '2204'
- build-type: 'hipblas'
  cuda-major-version: ""
we should actually add the new rocm arguments here: https://github.com/mudler/LocalAI/blob/master/.github/workflows/backend_build.yml#L15
I am not sure I understand this comment correctly. Do you mean to simply add rocm-*-version arguments around L15 of backend_build.yml? I have done that. If you mean something else, please guide me 😄
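For reference, here is a hedged sketch of the kind of change being discussed. The input names, defaults, and the trivial job body are illustrative assumptions, not the actual backend_build.yml.

```yaml
# Sketch of a reusable backend build workflow accepting ROCm version inputs,
# mirroring the existing cuda-*-version inputs. Names and defaults are illustrative.
on:
  workflow_call:
    inputs:
      build-type:
        type: string
        default: ''
      cuda-major-version:
        type: string
        default: ''
      cuda-minor-version:
        type: string
        default: ''
      rocm-major-version:
        type: string
        default: '6'
      rocm-minor-version:
        type: string
        default: '4.3'

jobs:
  backend-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # The real workflow uses its existing image build steps; this only shows
      # the new inputs being forwarded as Docker build args.
      - name: Build backend image
        run: |
          docker build \
            --build-arg ROCM_MAJOR_VERSION='${{ inputs.rocm-major-version }}' \
            --build-arg ROCM_MINOR_VERSION='${{ inputs.rocm-minor-version }}' \
            ./backend
```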
(force-pushed from 4ce161b to cfeeff8)
Description
This PR implements the ability to select which version of ROCm to use when building the llama.cpp backend, in a similar fashion to how we are already able to select which version of CUDA to use.
Notes for Reviewers
The motivation is that perfectly usable AMD cards are unsupported by newer ROCm versions. The Vulkan build works excellently on these cards, but it is not a good fit for distributed inference: according to the docs and my testing, it cannot do parallel processing across devices, so layers are processed sequentially. Some say it is also not as performant.
I have some questions about how to finish this PR and have not yet been able to test the actual backends in my setup, so I am leaving this PR as a draft for now.
I have built with both version 6.3.3 and 7.1.1 (latest). I will only be able to test with 6.3.3, since I only have Vega 10 GPUs available.
Some questions for maintainers:
Image naming: the new images are tagged *-gpu-amd-rocm-6-llama-cpp and *-gpu-amd-rocm-7-llama-cpp.

Signed commits