
Conversation

@taha-yassine
Contributor

@taha-yassine taha-yassine commented Nov 13, 2025

What does this PR do?

Fixes #2897

As mentioned in the original issue, special tokens can be especially useful when computing reward functions.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@qgallouedec
Member

qgallouedec commented Nov 13, 2025

Thanks for the PR. I went through the issue. I understand that the motivation is to compute a reward that depends on the special tokens.
Would computing the reward based on the completion_ids (ints) instead of on the completion (strs) do the trick?

@taha-yassine
Contributor Author

Would computing the reward based on the completion_ids (ints) instead of on the completion (strs) do the trick?

I think completion_ids is useful when the reward depends on whether some specific, predetermined tokens are present. For example, it can be used to check that the policy respects a given answer format.
Now say we're training a policy to answer questions and to put the result between <answer> and </answer>, and say these are special tokens. completion_ids would let us detect whether the completion contains those tokens, but not precisely locate which part of the completion they surround. Also, we can't manually decode completion_ids because we don't have access to the tokenizer inside the reward functions.
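To make the span-location point concrete, here is a minimal sketch of a reward function that only works if the decoded completion still contains the (hypothetical) <answer> and </answer> special tokens; with skip_special_tokens=True the tags vanish and the reward is always zero:

```python
import re

def answer_span_reward(completions, **kwargs):
    """Reward 1.0 when the completion wraps exactly one non-empty answer
    in <answer>...</answer> tags; 0.0 otherwise. Assumes `completions`
    are decoded WITHOUT skipping special tokens."""
    rewards = []
    for text in completions:
        matches = re.findall(r"<answer>(.*?)</answer>", text, flags=re.DOTALL)
        rewards.append(1.0 if len(matches) == 1 and matches[0].strip() else 0.0)
    return rewards

print(answer_span_reward(["The result is <answer>42</answer>."]))  # [1.0]
# If the tags were stripped during decoding, the same completion scores 0:
print(answer_span_reward(["The result is 42."]))                   # [0.0]
```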

@taha-yassine
Contributor Author

Hi @qgallouedec , just checking in to see if you've had a chance to look at my last comment. Lmk what you think.

@qgallouedec
Member

qgallouedec commented Nov 21, 2025

Actually, I'm still wondering about this feature. In your example, in my opinion, <answer> shouldn't be a special token in the first place. In most models, these kinds of tokens are not special.

Another difficulty I foresee is that we will soon be relying on "response parsing" (see #4300) instead of simple decoding. And in that case, I don't even see how we could include special tokens; in fact, I don't think this question would even make sense.

I recommend the following:

  • re-consider the choice of making a token "special" (in the vast majority of cases, only BOS, EOS and PAD should be special), or
  • have a tokenizer in the reward function:
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(...)

    def reward_func(completion_ids, **kwargs):
        text = tokenizer.decode(completion_ids, skip_special_tokens=False)
        ...
        return rewards
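One way to structure that workaround so it stays testable is to close the reward function over any decode callable (e.g. functools.partial(tokenizer.decode, skip_special_tokens=False)). A sketch with a toy stand-in decoder, since the real tokenizer and tag names are assumptions here:

```python
import re

def make_special_token_reward(decode):
    """Build a GRPO-style reward function around any `decode(ids) -> str`
    callable that preserves special tokens. Rewards completions that
    close their <answer> tag."""
    def reward_func(completion_ids, **kwargs):
        rewards = []
        for ids in completion_ids:
            text = decode(ids)
            ok = re.search(r"<answer>.*?</answer>", text, flags=re.DOTALL)
            rewards.append(1.0 if ok else 0.0)
        return rewards
    return reward_func

# Toy decoder standing in for tokenizer.decode (vocab is made up):
vocab = {0: "<answer>", 1: "42", 2: "</answer>"}
decode = lambda ids: "".join(vocab[i] for i in ids)
reward_func = make_special_token_reward(decode)
print(reward_func([[0, 1, 2], [1]]))  # [1.0, 0.0]
```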

@taha-yassine
Contributor Author

I agree that it doesn't make sense to have these kinds of tokens marked as special. The issue is that some models still do that (example). And for some more context, the model I struggled with the most, and that led me to open this PR, was gpt-oss because of its harmony response format. Extracting the content of each channel requires access to the special tokens.

Another difficulty I foresee is that we will soon be relying on "response parsing" (see #4300) instead of simple decoding. And in that case, I don't even see how we could include special tokens; in fact, I don't think this question would even make sense.

I wasn't aware of this new response parsing feature. It seems useful, and it would indeed solve a lot of the headache around extracting the different parts of the response for reward design. My only concern is how long it would take to land in TRL and to be adopted across past and future models.

On the other hand, what issues do you see with this PR in its current state?

In the meantime, while having a globally available tokenizer that one could use for rewards isn't the most elegant solution, I guess it's an acceptable workaround.


Development

Successfully merging this pull request may close these issues.

GRPO completions skip special tokens?
