I have discovered that running the same model with the same parameters from llm (gguf branch) and from llama.cpp results in different behavior. llm does not appear to be respecting the EOS token, so the model keeps generating output until the max token limit is reached.
Here is the output from llama.cpp:
And the same model run through llm:
According to a discussion on Discord, this might indeed be a bug.
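For reference, below is a minimal sketch of the kind of EOS check that the generation loop appears to be missing. All names and values here are hypothetical and do not reflect llm's actual internals; it only illustrates why skipping the check leads to generation running until max tokens.

```rust
// Hypothetical sketch of a generation loop with an EOS stop condition.
// None of these names correspond to llm's real API.

const EOS_TOKEN_ID: u32 = 2; // llama-family models commonly use id 2 for </s>

// Stand-in for whatever token sampling the backend performs each step.
fn sample_next_token(step: usize) -> u32 {
    // Pretend the model emits EOS on the fourth step.
    if step == 3 { EOS_TOKEN_ID } else { 100 + step as u32 }
}

fn main() {
    let max_tokens = 16;
    let mut generated: Vec<u32> = Vec::new();

    for step in 0..max_tokens {
        let token = sample_next_token(step);

        // The key check: stop as soon as the sampled token is EOS.
        // The behavior reported above suggests llm (gguf branch) never
        // takes this branch, so generation only stops at max_tokens.
        if token == EOS_TOKEN_ID {
            break;
        }

        generated.push(token);
    }

    println!("generated {} tokens: {:?}", generated.len(), generated);
}
```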