38 changes: 38 additions & 0 deletions serverless/endpoints/model-caching.mdx

@@ -59,6 +59,44 @@ flowchart TD
```
</div>

## Where models are stored

Cached models are stored on the worker container's local disk, separate from any attached network volumes. Runpod automatically manages this internal storage to optimize loading speed.

The cache persists across requests on the same worker, so once a worker initializes, model loading performance stays consistent. Because the models live on the worker's local disk, they won't appear on any attached network volumes.

## Accessing cached models

Cached models are stored at `/runpod-volume/huggingface-cache/hub/`. The directory structure follows the Hugging Face cache convention, where the forward slash (`/`) in the model name is replaced with a double dash (`--`).

The path structure follows this pattern:

```
/runpod-volume/huggingface-cache/hub/models--{organization}--{model-name}/
```

For example, `meta-llama/Llama-3.2-1B-Instruct` would be stored at:

```
/runpod-volume/huggingface-cache/hub/models--meta-llama--Llama-3.2-1B-Instruct/
```
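
As a quick sanity check, you can build the expected path from a model ID and confirm it exists on the worker. This is a minimal sketch that just mirrors the naming convention above; `cached_model_path` is a hypothetical helper, not part of any Runpod SDK:

```python
import os

# Cache root described above.
CACHE_ROOT = "/runpod-volume/huggingface-cache/hub"

def cached_model_path(model_id: str) -> str:
    """Return the expected cache directory for a model ID like 'org/name'."""
    return os.path.join(CACHE_ROOT, "models--" + model_id.replace("/", "--"))

path = cached_model_path("meta-llama/Llama-3.2-1B-Instruct")
print(path, "exists:", os.path.isdir(path))
```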

## Using cached models in applications

You can access cached models from your application in two ways:

**Direct configuration**: Configure your application to load models directly from `/runpod-volume/huggingface-cache/hub/`. Many frameworks and tools let you specify a custom cache directory for Hugging Face models.
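
For example, the Hugging Face `transformers` library accepts a `cache_dir` argument pointing at the directory that contains the `models--{organization}--{model-name}` folders. This is a minimal sketch, assuming the cache path described above and that `transformers` is installed in your worker image:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Point transformers at the Runpod model cache instead of its default location.
# This assumes the cache path documented above.
CACHE_DIR = "/runpod-volume/huggingface-cache/hub"
MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID,
    cache_dir=CACHE_DIR,
    local_files_only=True,  # fail fast instead of downloading if the model isn't cached
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    cache_dir=CACHE_DIR,
    local_files_only=True,
)
```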

**Symbolic links**: Create symbolic links from your application's expected model directory to the cache location. This is particularly useful for applications like ComfyUI that expect models in specific directories.

For example, create a symbolic link like this:

```bash
ln -s /runpod-volume/huggingface-cache/hub/models--meta-llama--Llama-3.2-1B-Instruct/ /workspace/models/llama-3.2
```

This lets your application access cached models without modifying its configuration.
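
If you'd rather not create links by hand, you could set them up when the worker starts, before your application loads models. This is a rough sketch; the source and target paths are just the examples from above, so adjust them for your model and application layout:

```python
import os

# Example paths from the symbolic link command above.
src = "/runpod-volume/huggingface-cache/hub/models--meta-llama--Llama-3.2-1B-Instruct"
dst = "/workspace/models/llama-3.2"

# Create the parent directory and link only if the link isn't already in place.
os.makedirs(os.path.dirname(dst), exist_ok=True)
if not os.path.islink(dst) and not os.path.exists(dst):
    os.symlink(src, dst)
```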

## Enabling cached models

Follow these steps to select and add a cached model to your Serverless endpoint: