Tied word embeddings #339

@jonatanklosko

Description

Many models have an option to share parameters between the input embedding layer and an output dense layer. We need a solution for that in Axon, but I'm opening this issue here since we already have a lot of TODOs in the code pointing at this specific problem.
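
For context, here is a minimal plain-Nx sketch of what tying means: the output projection reuses the input embedding matrix (transposed), so both layers share a single parameter tensor. Shapes and values are illustrative, not taken from any actual model:

```elixir
# {vocab_size, hidden_size} embedding matrix (illustrative values)
embedding = Nx.iota({4, 3}, type: :f32)

# Input side: look up token embeddings by id
token_ids = Nx.tensor([0, 2, 3])
hidden = Nx.take(embedding, token_ids)            # {3, hidden_size}

# Output side: project hidden states back onto the vocabulary with
# the same matrix, transposed, instead of a separate dense kernel
logits = Nx.dot(hidden, Nx.transpose(embedding))  # {3, vocab_size}
```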

The reason loading currently works is that the PyTorch .bin export includes both layers, and both point to the same tensor. In case of safetensors, only one of the layers may be present, so this issue is a prerequisite for defaulting to safetensors (#255).
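
For illustration, one workaround in the spirit of the existing TODOs could be a post-load fixup that aliases the tensor when only one copy is present in the checkpoint. A rough sketch with hypothetical parameter names (not Bumblebee's real layer names):

```elixir
# Hypothetical fixup after loading a safetensors checkpoint: if only
# the input embedding is shipped, derive the output layer's kernel
# from it. "token_embedding" and "language_modeling_head" are made-up
# names for illustration.
params =
  if Map.has_key?(params, "language_modeling_head") do
    params
  else
    %{"token_embedding" => %{"kernel" => kernel}} = params
    # Dense kernels are typically {in, out} while embeddings are
    # {vocab, hidden}, hence the transpose
    Map.put(params, "language_modeling_head", %{"kernel" => Nx.transpose(kernel)})
  end
```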

For additional discussion see #263.

Labels

kind:chore (Internal improvements), note:upstream (The issue must be tackled upstream)
