38 changes: 38 additions & 0 deletions serverless/endpoints/model-caching.mdx

@@ -59,6 +59,44 @@ flowchart TD
```
</div>

## Where models are stored

Cached models are stored on the worker container's local disk, separate from any attached network volumes. Runpod automatically manages this internal storage to optimize loading speed.

The cache persists across requests on the same worker, so once a worker initializes, model loading performance stays consistent. Because the models live on the worker's local disk, they won't appear on any attached network volumes.

## Accessing cached models

Cached models are stored at `/runpod-volume/huggingface-cache/hub/`. The directory structure follows the Hugging Face cache convention, where the forward slash (`/`) in the model name is replaced with a double dash (`--`).

The path structure follows this pattern:

```
/runpod-volume/huggingface-cache/hub/models--{organization}--{model-name}/
```

For example, `meta-llama/Llama-3.2-1B-Instruct` would be stored at:

```
/runpod-volume/huggingface-cache/hub/models--meta-llama--Llama-3.2-1B-Instruct/
```
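
As a quick sanity check, you can build the expected path from a model ID and confirm it exists on the worker. This is a minimal sketch that just mirrors the naming convention above; `cached_model_path` is a hypothetical helper, not part of any Runpod SDK:

```python
import os

# Cache root described above.
CACHE_ROOT = "/runpod-volume/huggingface-cache/hub"

def cached_model_path(model_id: str) -> str:
    """Return the expected cache directory for a model ID like 'org/name'."""
    return os.path.join(CACHE_ROOT, "models--" + model_id.replace("/", "--"))

path = cached_model_path("meta-llama/Llama-3.2-1B-Instruct")
print(path, "exists:", os.path.isdir(path))
```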

## Using cached models in applications

You can access cached models from your application in two ways:

**Direct configuration**: Configure your application to load models directly from `/runpod-volume/huggingface-cache/hub/`. Many frameworks and tools let you specify a custom cache directory for Hugging Face models.
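
For example, the Hugging Face `transformers` library accepts a `cache_dir` argument pointing at the directory that contains the `models--{organization}--{model-name}` folders. This is a minimal sketch, assuming the cache path described above and that `transformers` is installed in your worker image:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Point transformers at the Runpod model cache instead of its default location.
# This assumes the cache path documented above.
CACHE_DIR = "/runpod-volume/huggingface-cache/hub"
MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID,
    cache_dir=CACHE_DIR,
    local_files_only=True,  # fail fast instead of downloading if the model isn't cached
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    cache_dir=CACHE_DIR,
    local_files_only=True,
)
```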

**Symbolic links**: Create symbolic links from your application's expected model directory to the cache location. This is particularly useful for applications like ComfyUI that expect models in specific directories.

For example, create a symbolic link like this:

```bash
ln -s /runpod-volume/huggingface-cache/hub/models--meta-llama--Llama-3.2-1B-Instruct/ /workspace/models/llama-3.2
```

This lets your application access cached models without modifying its configuration.
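
If you'd rather not create links by hand, you could set them up when the worker starts, before your application loads models. This is a rough sketch; the source and target paths are just the examples from above, so adjust them for your model and application layout:

```python
import os

# Example paths from the symbolic link command above.
src = "/runpod-volume/huggingface-cache/hub/models--meta-llama--Llama-3.2-1B-Instruct"
dst = "/workspace/models/llama-3.2"

# Create the parent directory and link only if the link isn't already in place.
os.makedirs(os.path.dirname(dst), exist_ok=True)
if not os.path.islink(dst) and not os.path.exists(dst):
    os.symlink(src, dst)
```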

## Enabling cached models

Follow these steps to select and add a cached model to your Serverless endpoint: