Tied word embeddings #339

@jonatanklosko

Description

Many models have an option to share parameters between the input embedding layer and an output dense layer. We need a solution for that in Axon, but I'm opening this issue here since we already have a lot of TODOs in the code pointing at this specific problem.
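
For context, here is a minimal plain-Nx sketch of what tying means: the output projection reuses the input embedding matrix (transposed), so both layers share a single parameter tensor. Shapes and values are illustrative, not taken from any actual model:

```elixir
# {vocab_size, hidden_size} embedding matrix (illustrative values)
embedding = Nx.iota({4, 3}, type: :f32)

# Input side: look up token embeddings by id
token_ids = Nx.tensor([0, 2, 3])
hidden = Nx.take(embedding, token_ids)            # {3, hidden_size}

# Output side: project hidden states back onto the vocabulary with
# the same matrix, transposed, instead of a separate dense kernel
logits = Nx.dot(hidden, Nx.transpose(embedding))  # {3, vocab_size}
```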

The reason loading currently works is that the PyTorch .bin export includes both layers, and both point to the same tensor. In case of safetensors, only one of the layers may be present, so this issue is a prerequisite for defaulting to safetensors (#255).
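
For illustration, one workaround in the spirit of the existing TODOs could be a post-load fixup that aliases the tensor when only one copy is present in the checkpoint. A rough sketch with hypothetical parameter names (not Bumblebee's real layer names):

```elixir
# Hypothetical fixup after loading a safetensors checkpoint: if only
# the input embedding is shipped, derive the output layer's kernel
# from it. "token_embedding" and "language_modeling_head" are made-up
# names for illustration.
params =
  if Map.has_key?(params, "language_modeling_head") do
    params
  else
    %{"token_embedding" => %{"kernel" => kernel}} = params
    # Dense kernels are typically {in, out} while embeddings are
    # {vocab, hidden}, hence the transpose
    Map.put(params, "language_modeling_head", %{"kernel" => Nx.transpose(kernel)})
  end
```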

For additional discussion see #263.

Labels

kind:chore (Internal improvements), note:upstream (The issue must be tackled upstream)
