I have discovered that running the same model with the same parameters from llm (gguf branch) and from llama.cpp results in different behavior. llm does not appear to be respecting the EOS token, so the model keeps generating output until the max token limit is reached.
Here is the output from llama.cpp:
And the same model run through llm:
According to a discussion on Discord, this might indeed be a bug.
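For reference, below is a minimal sketch of the kind of EOS check that the generation loop appears to be missing. All names and values here are hypothetical and do not reflect llm's actual internals; it only illustrates why skipping the check leads to generation running until max tokens.

```rust
// Hypothetical sketch of a generation loop with an EOS stop condition.
// None of these names correspond to llm's real API.

const EOS_TOKEN_ID: u32 = 2; // llama-family models commonly use id 2 for </s>

// Stand-in for whatever token sampling the backend performs each step.
fn sample_next_token(step: usize) -> u32 {
    // Pretend the model emits EOS on the fourth step.
    if step == 3 { EOS_TOKEN_ID } else { 100 + step as u32 }
}

fn main() {
    let max_tokens = 16;
    let mut generated: Vec<u32> = Vec::new();

    for step in 0..max_tokens {
        let token = sample_next_token(step);

        // The key check: stop as soon as the sampled token is EOS.
        // The behavior reported above suggests llm (gguf branch) never
        // takes this branch, so generation only stops at max_tokens.
        if token == EOS_TOKEN_ID {
            break;
        }

        generated.push(token);
    }

    println!("generated {} tokens: {:?}", generated.len(), generated);
}
```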