mps: add nf4 dequantize/quantize kernel #1790
Open
This ports the CUDA NF4 support to Metal.
So far I've targeted NF4 quantize/dequantize, because it's one of the least-accessible formats for Mac users.
We use uint8 storage under the hood, since Metal (and the underlying hardware) lacks native fp8/fp4 support.
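For reference, here is a minimal NumPy sketch of what blockwise NF4 quantize/dequantize computes, independent of the Metal kernel. The codebook values and the high-nibble-first packing are my reading of the bitsandbytes CUDA implementation, so treat them as assumptions rather than a spec:

```python
import numpy as np

# NF4 codebook (16 NormalFloat4 levels); values taken from my reading of
# the bitsandbytes/QLoRA reference -- treat as an assumption.
NF4_CODE = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
], dtype=np.float32)

def quantize_nf4(x, blocksize=64):
    """Blockwise quantize: normalize each block by its absmax, snap each
    value to the nearest codebook entry, and pack two 4-bit indices per
    uint8 (first element in the high nibble -- assumed layout)."""
    x = np.asarray(x, dtype=np.float32).reshape(-1, blocksize)
    absmax = np.abs(x).max(axis=1, keepdims=True)
    absmax = np.where(absmax == 0, 1.0, absmax)  # avoid div-by-zero
    normalized = x / absmax
    idx = np.abs(normalized[:, :, None] - NF4_CODE[None, None, :]) \
            .argmin(axis=-1).astype(np.uint8)
    flat = idx.reshape(-1)
    packed = (flat[0::2] << 4) | flat[1::2]
    return packed, absmax.squeeze(1)

def dequantize_nf4(packed, absmax, blocksize=64):
    """Unpack the two 4-bit indices per byte, look them up in the
    codebook, and rescale each block by its stored absmax."""
    idx = np.empty(packed.size * 2, dtype=np.uint8)
    idx[0::2] = packed >> 4
    idx[1::2] = packed & 0x0F
    out = NF4_CODE[idx].reshape(-1, blocksize) * absmax[:, None]
    return out.reshape(-1)
```

Values that already sit on codebook entries round-trip exactly; everything else lands on the nearest of the 16 levels, scaled by the per-block absmax.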
Performance has not been the focus of this effort; most of the time went into working out how to plug the metallib into bitsandbytes and build it correctly.
I'd welcome feedback on this approach: given my inexperience with your build toolchain, it's likely I've done things in ways that can be improved.
I'm building on lessons learned while writing a PyTorch custom op for universal-metal-flash-attention, in particular how MTLBuffers are retrieved from torch MPSGraph objects, which required using the torch headers.