Conversation

@krissetto (Contributor)

Calculate the max_tokens value for Anthropic requests based on the available context, using the official token-counting API.

The PR fixes errors such as:

400 Bad Request

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "input length and `max_tokens` exceed context limit: 178959 + 64000 > 200000, decrease input length or `max_tokens` and try again"
  },
  "request_id": "req_011CUB2XUHWkFnQzxcKBbjpF"
}

Closes #569

@krissetto krissetto requested a review from a team as a code owner October 21, 2025 12:56
@krissetto krissetto added kind/bug Something isn't working area/anthropic labels Oct 21, 2025
@krissetto (Contributor, Author)

Right now I've implemented this calculation using the official Anthropic token-counting API.

Another option would be to approximate the count with an external library and use that estimate instead of making an additional API call.
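The approach described above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual PR code (the function names and limits here are assumptions); `count_input_tokens` stands in for a call to Anthropic's token-counting endpoint, and the clamp itself is shown as a pure function so the arithmetic is clear.

```python
CONTEXT_WINDOW = 200_000      # assumed context limit, matching the error above
DEFAULT_MAX_TOKENS = 64_000   # assumed default completion budget

def count_input_tokens(messages) -> int:
    # Placeholder for the official token-counting API call.
    # A real implementation would send the messages to the counting
    # endpoint and return the reported input token count; this crude
    # word count only keeps the sketch self-contained.
    return sum(len(m["content"].split()) for m in messages)

def effective_max_tokens(input_tokens: int,
                         requested: int = DEFAULT_MAX_TOKENS) -> int:
    """Shrink max_tokens so input_tokens + max_tokens fits the window."""
    available = CONTEXT_WINDOW - input_tokens
    return max(1, min(requested, available))
```

With the failing request from the error above (178,959 input tokens), this would request 200,000 - 178,959 = 21,041 completion tokens instead of the default 64,000, avoiding the 400 response.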

…context using official token counting api

Signed-off-by: Christopher Petito <[email protected]>
@krissetto krissetto force-pushed the set-max-tokens-based-on-remaining-ctx branch from b3c61b1 to c3167b2 Compare October 24, 2025 10:30
@krissetto krissetto self-assigned this Oct 24, 2025
@rumpl (Member) commented Oct 24, 2025

Rather than doing double calls all the time, can we parse the error and do it only when needed?

@krissetto (Contributor, Author)

> Rather than doing double calls all the time, can we parse the error and do it only when needed?

Sure, I'll take a look at that before merging
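The suggested improvement could look roughly like this: skip the up-front counting call, and only when the request fails parse the numbers out of Anthropic's error message and retry once with a corrected max_tokens. This is a hypothetical Python sketch; the regex matches the message format quoted earlier in this thread, which is an assumption, since the error wording is not a stable API contract.

```python
import re

# Matches: "input length and `max_tokens` exceed context limit:
#           178959 + 64000 > 200000, ..."
ERR_RE = re.compile(
    r"input length and `max_tokens` exceed context limit: "
    r"(\d+) \+ (\d+) > (\d+)"
)

def corrected_max_tokens(error_message: str):
    """Return a max_tokens value that fits, or None if the error
    doesn't match the expected format or no room is left."""
    m = ERR_RE.search(error_message)
    if not m:
        return None
    input_len, _requested, limit = map(int, m.groups())
    available = limit - input_len
    return available if available > 0 else None
```

A caller would issue the request normally, and on a 400 invalid_request_error feed the message through `corrected_max_tokens`; a non-None result means one retry with the shrunken budget, so the extra round trip only happens when the context actually overflows.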

@dgageot (Member) commented Oct 25, 2025

@krissetto I'll merge it first. Please look at the improvement in a second PR.

@dgageot dgageot merged commit 6dbff96 into docker:main Oct 25, 2025
5 checks passed
Linked issue: Anthropic - input tokens + max_tokens > context length