
Conversation

@jeffbolznv (Collaborator)
Just need to add the fusion detection logic; this is a combination of existing modes (early softmax, bias, norm, scale) and is covered by the existing backend tests.
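For context, here is a minimal, self-contained sketch of what pattern-based fusion detection in a backend graph pass can look like. All types and names below (`Node`, `can_fuse_scale_softmax`, `fuse_pass`) are hypothetical illustrations, not the actual ggml/ggml-vulkan code; the real change combines existing fusion modes (early softmax, bias, norm, scale) rather than introducing a new pass.

```cpp
// Hypothetical sketch of pattern-based op fusion detection. None of these
// types are ggml's; they only illustrate scanning a topologically ordered
// graph for an adjacent op pair (here SCALE -> SOFT_MAX) and replacing it
// with one fused node when the intermediate result has no other consumers.
#include <cstddef>
#include <vector>

enum class Op { Scale, SoftMax, FusedScaleSoftMax, Other };

struct Node {
    Op op = Op::Other;
    Node* src = nullptr;   // single producer, for simplicity
    int n_consumers = 0;   // how many nodes read this node's output
};

// True if nodes[i] and nodes[i+1] form a fusable SCALE -> SOFT_MAX pair:
// the softmax must read the scale's output, and nothing else may read it.
static bool can_fuse_scale_softmax(const std::vector<Node*>& nodes, size_t i) {
    if (i + 1 >= nodes.size()) return false;
    Node* a = nodes[i];
    Node* b = nodes[i + 1];
    return a->op == Op::Scale &&
           b->op == Op::SoftMax &&
           b->src == a &&
           a->n_consumers == 1;   // intermediate is not used elsewhere
}

// Walk the node list; when a fusable pair is found, emit one fused node
// (reusing the consumer) and skip the producer that was folded in.
static std::vector<Node*> fuse_pass(std::vector<Node*>& nodes) {
    std::vector<Node*> out;
    for (size_t i = 0; i < nodes.size(); ++i) {
        if (can_fuse_scale_softmax(nodes, i)) {
            nodes[i + 1]->op  = Op::FusedScaleSoftMax;
            nodes[i + 1]->src = nodes[i]->src;  // read the original input
            out.push_back(nodes[i + 1]);
            ++i;  // skip the folded-in scale node
        } else {
            out.push_back(nodes[i]);
        }
    }
    return out;
}

int main() {
    Node x;                        // some upstream tensor
    Node s{Op::Scale, &x, 1};      // scale(x), read only by the softmax
    Node m{Op::SoftMax, &s, 1};    // softmax(scale(x))
    std::vector<Node*> g = {&s, &m};
    return fuse_pass(g).size() == 1 ? 0 : 1;  // pair collapses to one node
}
```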

before:

```
Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench -m c:\models\GLM-4.7-Flash-Q4_K_M.gguf -r 10 -fa 1 -p 512 -n 128
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
load_backend: loaded Vulkan backend from Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo\ggml-vulkan.dll
load_backend: loaded CPU backend from Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo\ggml-cpu.dll
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 ?B Q4_K - Medium     |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           pp512 |      8434.22 ± 37.67 |
| deepseek2 ?B Q4_K - Medium     |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |       185.26 ± 16.12 |
```

after:

```
Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench -m c:\models\GLM-4.7-Flash-Q4_K_M.gguf -r 10 -fa 1 -p 512 -n 128
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
load_backend: loaded Vulkan backend from Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo\ggml-vulkan.dll
load_backend: loaded CPU backend from Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo\ggml-cpu.dll
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 ?B Q4_K - Medium     |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           pp512 |      8504.07 ± 57.02 |
| deepseek2 ?B Q4_K - Medium     |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |       206.38 ± 16.00 |
```
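From these numbers, the fused path is roughly a 0.8% prompt-processing gain (8504.07 / 8434.22 ≈ 1.008) and about an 11% token-generation gain (206.38 / 185.26 ≈ 1.114), though the tg128 standard deviations (±16 t/s on both runs) overlap substantially.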

@jeffbolznv (Collaborator, Author)

This may not be needed after #18980; I'll check once that lands.
