Conversation

@xhr15 xhr15 commented Oct 21, 2025

Logits processing is a powerful tool, particularly for using smaller language models for tasks such as named entity recognition. @seanmor5 started work in this area with #354.

Whatever the approach, it will require some kind of state.

This pull request is a proposal to allow logits processors to be stateful.

This would enable the use of deterministic finite automata (DFAs) or pushdown automata (PDAs) for enforcing constrained grammars during logits processing. bitcrowd#6 shows how this would be used. We will follow up on this PR if this approach is favoured.

Member

@jonatanklosko jonatanklosko left a comment

Hey @xhr15 and @joelpaulkoch, thanks for the PR!

I dropped a few comments, but the main one is about the API. I know it's a bit more involved, but probably worth it. Let me know what you think, and if you have any concerns!

context =
  put_in(
    context,
    [:logits_processor_state, :next_suppressed_token_id],
Member

With the current API, the state is always initialized to %{}, and the first invocation of the processor then adds a key, here %{next_suppressed_token_id: %Nx.Tensor{...}}.

This can be problematic in a defn while loop, which requires the accumulated state to always have the same shape. In other words, the initial state should already include :next_suppressed_token_id with the default tensor. It is possible that this didn't come up during your tests because, depending on the model/input, we do the first generation step outside of the while loop, and that first call would initialize the state. However, if we are going to support stateful processors, I would rather do it in a more robust way.
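
A minimal, out-of-context illustration of that constraint (a hypothetical module, not part of this PR): the while accumulator has to keep the same structure and shape on entry and on every iteration, so a state map that only gains its key on the first call would not compile.

```elixir
defmodule WhileShapeDemo do
  import Nx.Defn

  defn sum_below(limit) do
    # The accumulator {i, acc} keeps the same shape/type on every iteration;
    # starting from %{} and adding a key later would be rejected here.
    {_, acc} =
      while {i = 0, acc = 0}, i < limit do
        {i + 1, acc + i}
      end

    acc
  end
end
```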

Given the above, a stateful logits processor would involve two steps (functions):

  1. Building an initial state.
  2. Performing logits processing, which receives logits and state, and returns updated logits and state.

This way we can call (1) when initializing the generation context, and for the actual processing we call (2).

The behaviour can be similar to Bumblebee.Scheduler. Something like this:

defmodule Bumblebee.LogitsProcessor do
  @moduledoc """
  An interface for configuring and using logits processors.

  Logits processors are used during autoregressive generation to modify
  predicted scores at each generation step. This allows for applying
  certain rules to the model output to control which tokens are picked
  at each generation step, and which are not.

  Every module implementing this behaviour is expected to also define
  a configuration struct.
  """

  @type t :: Bumblebee.Configurable.t()

  @type state :: Nx.Container.t()

  @doc """
  Initializes state for a new logits processor.

  Returns `state`, which is an opaque `Nx.Container`, and it is then
  passed to and returned from `process/4`.

  Oftentimes logits processors are stateless, in which case this
  function can return an empty container, such as `{}`.
  """
  @callback init(t(), context) :: state()
            when context: %{
                   prng_key: Nx.Tensor.t()
                 }

  @doc """
  Processes logits, applying specific rules.
  """
  @callback process(
              t(),
              state(),
              logits :: Nx.Tensor.t(),
              context :: context
            ) :: {state(), logits :: Nx.Tensor.t()}
            when context: %{
                   sequence: Nx.Tensor.t(),
                   length: Nx.Tensor.t(),
                   input_length: Nx.Tensor.t()
                 }
end

Technically, the :logits_processors option is public API, but we can make it backward-compatible. For example, we can define %Bumblebee.Text.Generation.StatelessLogitsProcessor{fun: fun}, where the state is always empty and process just invokes the fun. I would even use that for the built-in processors, so that we don't need to define a bunch of new modules.
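
A rough sketch of what that backward-compatible wrapper could look like, assuming the Bumblebee.LogitsProcessor behaviour proposed above; the module name and :fun field follow the suggestion, the rest is illustrative rather than final:

```elixir
defmodule Bumblebee.Text.Generation.StatelessLogitsProcessor do
  @moduledoc false

  # Wraps a plain logits-transforming function in the proposed behaviour.
  defstruct [:fun]

  @behaviour Bumblebee.LogitsProcessor

  @impl true
  def init(_processor, _context), do: {}

  @impl true
  def process(%__MODULE__{fun: fun}, state, logits, context) do
    # The wrapped function only transforms logits; the empty state is
    # passed through unchanged.
    {state, fun.(logits, context)}
  end
end
```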

Author

@jonatanklosko Thank you very much for your comments! I think especially the two-step call makes sense. We'll move in that direction :)

Author

@jonatanklosko
as an afterthought:

What is the use case for context here:

@callback init(t(), context) :: state()
            when context: %{
                   prng_key: Nx.Tensor.t()
                 }

Later in the loop, context holds:

context = %{
  sequences: sequences,
  input_length: length,
  length: length
}

I am wondering how those would influence the initialisation of the logits processors?

Or are you planning on using additional keys? E.g. from the state as returned when initializing the sequences:

%{
  sequences: sequences,
  input_length: length,
  length: length,
  finished_length: finished_length,
  ignored: Nx.broadcast(0, {batch_size})
}

If that was the case, we should probably rename the parameter to state or initial_state.

Wdyt?

Member

> What is the use case for context here:

I picked "context" in both functions as a generic name for state/metadata that may be relevant to the logits processor. You can see that in my snippet the context type is different for init and process. Technically all of the context fields could be separate arguments, but keeping it as a map makes the signature more manageable, and more importantly allows us to add more fields in the future without breaking compatibility.

Does that make sense?

xhr15 commented Oct 24, 2025

@jonatanklosko Before we add more tests and do further refactoring: do you think this goes in the right direction? Please let me know if you have concerns or anything could be improved.

@joelpaulkoch
Contributor

We might not want to vectorize all of the logits processors' state. E.g. when we want to read from a shared state tensor while processing the vectorized logits, we would otherwise have to duplicate the shared state tensor across the vectorized axis, right?
We can instead vectorize only the state that needs vectorization inside the logits processor.

That's basically the reason for 2ba5e0a. I'm not entirely sure if this is alright or if it has negative implications for defn.
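
A tiny illustration of that split outside defn (hypothetical values): the shared table stays flat, only the per-sequence position carries the vectorized axis, and Nx broadcasts the flat tensor across it:

```elixir
# The table is shared across the batch; the position is per-sequence.
table = Nx.tensor([[1, 0, 1], [0, 1, 0]])
position = Nx.vectorize(Nx.tensor([0, 1, 0]), :batch)

# Nx broadcasts the non-vectorized table across the :batch axis, yielding
# one allowed-token row per sequence without duplicating the table.
allowed = Nx.take(table, position)
```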

@xhr15 xhr15 requested a review from jonatanklosko November 3, 2025 22:18
Comment on lines 403 to 404
Enum.reduce(processors, %{}, fn processor, state_acc ->
  state = Bumblebee.logits_processor_init(processor, context)
Member

Currently each processor needs to be careful to namespace its state and avoid conflicts. Ideally the processor would not need to care about it, so in your example:

def init(logits_processor, _context) do
  initial_enforced_token_ids =
    Enum.map(logits_processor.initial_enforced_token_ids, &List.wrap(&1))

  initial_enforced_batch_token_id =
    Nx.tensor(initial_enforced_token_ids)

-  %{
-    sfp_state: %{
-      next_enforced_token_id: initial_enforced_batch_token_id
-    }
-  }
+  %{
+    next_enforced_token_id: initial_enforced_batch_token_id
+  }
end

To do this, instead of having a single map and Map.merge-ing into it, we can have a list of processor states. We init them separately, and zip processors with their states for updates. Something like this:

init_fun = fn context ->
  processors
  |> Enum.map(fn processor ->
    Bumblebee.logits_processor_init(processor, context)
  end)
  |> List.to_tuple()
end

process_fun = fn logits, context, processor_states ->
  {processor_states, logits} =
    processors
    |> Enum.zip(Tuple.to_list(processor_states))
    |> Enum.map_reduce(logits, fn {processor, processor_state}, logits ->
      Bumblebee.logits_processor_process(processor, processor_state, logits, context)
    end)
    
  {List.to_tuple(processor_states), logits}
end

Note that we want to keep the states as a tuple instead of a list, so that it is a valid Nx container and can be passed to while and across defn calls.
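
As a small illustration (hypothetical values) of why the tuple matters: a tuple of maps is a valid Nx container, so the combined processor states can be passed through jit/defn, and hence a while loop, as a single value:

```elixir
states = {%{pos: Nx.tensor(0)}, %{next_id: Nx.tensor(7)}}

# jit accepts the tuple as one container argument and returns a container
# with the same structure; a plain list would not work here.
bump = Nx.Defn.jit(fn {a, b} -> {%{pos: Nx.add(a.pos, 1)}, b} end)
bump.(states)
```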

Author

Thank you @jonatanklosko, we were wondering if having this kind of interface to reach into other processors would be a good or bad thing :)

We'll take it out!

Bumblebee.Text.Generation.build_generate(model, spec, generation_config,
  logits_processors: [
    Bumblebee.configure(Bumblebee.Text.GenerationTest.StatefulLogitsProcessing,
      initial_enforced_token_ids: [78, 20]
Member

This is an arbitrary example, but I don't think we would ever have a different value for each entry in the batch, because the batch entries should generally be interchangeable. So it should rather be initial_enforced_token_id: 78, and then you check that it is enforced in the same way for both batch entries.
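
For reference, a sketch of the adjusted test configuration following that suggestion (hypothetical, not the final test code):

```elixir
Bumblebee.configure(Bumblebee.Text.GenerationTest.StatefulLogitsProcessing,
  initial_enforced_token_id: 78
)
```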

@jonatanklosko
Member

> We might not want to vectorize all of the logits processors' state. E.g. when we want to read from a shared state tensor while processing the vectorized logits, we would otherwise have to duplicate the shared state tensor across the vectorized axis, right? We can instead vectorize only the state that needs vectorization inside the logits processor.
>
> That's basically the reason for 2ba5e0a. I'm not entirely sure if this is alright or if it has negative implications for defn.

Correct, Bumblebee should not call vectorize on the logits processor state. Ideally we want vectorization to happen automatically.

For example, schedulers have a similar init; here's one of them:

def init(scheduler, num_steps, sample_template, _prng_key) do
  timesteps =
    timesteps(
      scheduler.num_train_steps,
      num_steps,
      scheduler.timesteps_offset,
      scheduler.reduce_warmup
    )

  alpha_bars = init_parameters(scheduler: scheduler)

  empty = Nx.fill(sample_template, 0)
alpha_bars is generated as a flat tensor and is shared state (not duplicated across the batch). On the other hand, the caller (Bumblebee) can pass sample_template with a vectorized axis, and then empty = Nx.fill(sample_template, 0) would be vectorized state. What's nice is that the scheduler is not aware of the vectorization, and a non-vectorized input works just fine.
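
As a hypothetical usage sketch (shapes illustrative): the caller decides about vectorization through the template it passes in, and the scheduler code above never has to mention a batch axis:

```elixir
# A batch of 4 samples, vectorized on :batch by the caller (Bumblebee).
sample_template = Nx.vectorize(Nx.broadcast(0.0, {4, 64, 64, 3}), :batch)

# The same call as in init/4 above now yields vectorized per-sample state,
# while alpha_bars (a flat tensor) remains shared across the batch.
empty = Nx.fill(sample_template, 0)
```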

For this to work automatically though, we need something to derive the state off of (like sample_template), so that it gets automatically vectorized. I'm not yet sure how it would look for the processor; I need to think more about this.

@jonatanklosko
Member

Sorry for the late reply, I was off last week :)

@jonatanklosko
Member

> For this to work automatically though, we need something to derive the state off of (like sample_template), so that it gets automatically vectorized. I'm not yet sure how it would look for the processor; I need to think more about this.

Let's just pass sequence: Nx.vectorize(state.sequences, :batch) in the init context too. Depending on what per-sequence state the user creates, they may need to take special care to make it vectorization-friendly (e.g. Nx.iota({2, 2}, vectorized_axes: sequence.vectorized_axes), or using Nx.broadcast_vectors), but I think it's fine.
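
As a rough sketch (assuming a :sequence key in the init context as suggested above, and the single initial_enforced_token_id option from the earlier review comment), an init could derive per-sequence state like this:

```elixir
def init(logits_processor, context) do
  # context.sequence is assumed to carry the vectorized :batch axis;
  # broadcast_vectors gives the scalar state the same vectorized axes,
  # making it per-sequence without hard-coding a batch dimension.
  [next_enforced_token_id, _] =
    Nx.broadcast_vectors([
      Nx.tensor(logits_processor.initial_enforced_token_id),
      context.sequence
    ])

  %{next_enforced_token_id: next_enforced_token_id}
end
```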

xhr15 and others added 4 commits November 14, 2025 23:26
```
** (RuntimeError) unexpected vectorized axes in evaluator for operation :add: #Nx.Tensor<
       vectorized[batch: 1]
       s32[1]

       Nx.Defn.Expr
       tensor a        s32[1]
       b = reshape a   s32[1][1]
```
@xhr15 xhr15 force-pushed the task/sample-6-add-state-to-logits-processing branch from 604b60e to 572b748 on November 14, 2025 22:54
@xhr15 xhr15 force-pushed the task/sample-6-add-state-to-logits-processing branch from 572b748 to ce92584 on November 14, 2025 22:57

xhr15 commented Nov 14, 2025

@jonatanklosko thank you for the late-night review. Please let me know what you think. I added two livebooks about logits processing in the last commit. They are not strictly related to state, but I found them useful for explaining logits processing in talks. I could open up a separate PR for them if you like; it was just too tempting to include them :)

@xhr15 xhr15 requested a review from jonatanklosko November 14, 2025 23:48