-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Make skip_special_tokens configurable
#4521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thanks for the PR. I went through the issue. I understand that the motivation is to compute a reward that depends on the special tokens. |
I think |
|
Hi @qgallouedec , just checking in to see if you've had a chance to look at my last comment. Lmk what you think. |
|
Actually, I'm still wondering about this feature. In your example, in my opinion,
Another difficulty I foresee is that we will soon be relying on "response parsing" (see #4300) instead of simple decoding. And in that case, I don't even see how we could include special tokens; in fact, I don't think this question would even make sense. I recommend the following:
|
|
I agree that it doesn't make sense to have these kinds of tokens marked as special. The issue is that some models still do that (example). And for some more context, the model I struggled with the most and that led me to open this PR was gpt-oss because of their harmony response format. Extracting the content of each channel requires access to the special tokens.
I wasn't aware of this new response parsing feature. It seems useful and it would indeed solve a lot of the headache around extracting the different parts of the response for reward design. My only concern is the time it would take for it to make it into TRL and for general adoption by past and future models. On the other hand, what issues do you see with this PR in its current? In the mean time, while having a globally available tokenizer that one could use for rewards isn't the most elegant of solutions, I guess it's an acceptable workaround. |
What does this PR do?
Fixes #2897
As mentioned in the original issue, special tokens can be especially useful when computing reward functions.
Before submitting
Pull Request section?
to it if that's the case.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.