# Ollama Offline Agent

This project provides a containerized, offline-capable LLM API using Ollama. It automatically pulls and serves a chosen model on first run and exposes a simple REST API for interaction. It is designed to be portable and ready for homelab or production environments.
## Features

- Fully offline after initial model pull
- Configurable model selection (`llama3`, `mistral`, `phi3`, etc.)
- Dockerized for consistent deployment
- Health checks and persistent model storage
- Simple REST API powered by FastAPI (see the sketch below)
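
How the pieces fit together: the FastAPI app forwards each prompt to the Ollama daemon running inside the same container. The sketch below is illustrative rather than the repository's actual application code; the file name, the `OLLAMA_URL` variable, and the handler details are assumptions, while `OLLAMA_MODEL` and the endpoint shapes match the rest of this README.

```python
# app.py — illustrative sketch of a FastAPI wrapper around the local Ollama daemon
import os

import requests
from fastapi import FastAPI
from pydantic import BaseModel

OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")  # Ollama's default port
MODEL = os.environ.get("OLLAMA_MODEL", "llama3")

app = FastAPI()


class Ask(BaseModel):
    prompt: str


@app.get("/")
def root():
    return {"message": "Ollama Offline Agent is running."}


@app.post("/ask")
def ask(body: Ask):
    # Forward the prompt to Ollama's generate endpoint and return the
    # complete (non-streamed) response.
    r = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": MODEL, "prompt": body.prompt, "stream": False},
        timeout=300,
    )
    r.raise_for_status()
    return {"prompt": body.prompt, "response": r.json()["response"]}
```

Setting `"stream": False` makes Ollama return the whole completion as a single JSON object, which keeps the `/ask` response shape simple.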
 
## Prerequisites

- Docker
- Docker Compose
 
## Quick Start

```bash
git clone https://github.com/yourusername/ollama-offline-agent.git
cd ollama-offline-agent
docker compose up --build
```

On first run, the container will:
- Start the Ollama daemon.
 - Pull the specified model if it is not already cached.
 - Launch the API server on port 8000.
 
Subsequent runs will skip the model pull if the model is already present.
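
A rough Python sketch of that startup sequence is shown below. It is illustrative only: the repository's real entrypoint may be a shell script, and the `app:app` module path passed to `uvicorn` is an assumption. The `ollama serve`, `ollama list`, and `ollama pull` commands and the daemon's default port `11434` are standard Ollama behaviour.

```python
# entrypoint sketch (illustrative; not necessarily the container's real entrypoint)
import os
import subprocess
import time

import requests

MODEL = os.environ.get("OLLAMA_MODEL", "llama3")
OLLAMA_URL = "http://localhost:11434"

# 1. Start the Ollama daemon in the background.
ollama = subprocess.Popen(["ollama", "serve"])

# 2. Wait until the daemon answers HTTP requests.
for _ in range(60):
    try:
        requests.get(OLLAMA_URL, timeout=1)
        break
    except requests.ConnectionError:
        time.sleep(1)

# 3. Pull the model only if it is not already cached
#    ("ollama list" prints the locally available models).
cached = subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout
if MODEL not in cached:
    subprocess.run(["ollama", "pull", MODEL], check=True)

# 4. Launch the API server on port 8000.
subprocess.run(
    ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"],
    check=True,
)
```
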
## Configuration

The default model is `llama3`. To change it, edit `docker-compose.yml`:

```yaml
environment:
  - OLLAMA_MODEL=mistral
```

Rebuild the container to apply changes:

```bash
docker compose up --build
```

## API Endpoints

### GET /

Response:

```json
{
  "message": "Ollama Offline Agent is running."
}
```

### POST /ask
`Content-Type: application/json`

Body:

```json
{
  "prompt": "What is the capital of France?"
}
```

Response:

```json
{
  "prompt": "What is the capital of France?",
  "response": "The capital of France is Paris."
}
```

The Docker volume `models` ensures that downloaded models persist across container rebuilds.
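
The `/ask` endpoint documented above can be exercised from the host with any HTTP client. A minimal Python example, assuming the default `8000` port mapping and the third-party `requests` package (adjust the host if the container runs elsewhere):

```python
# Ask the running agent a question from the host machine.
import requests

resp = requests.post(
    "http://localhost:8000/ask",
    json={"prompt": "What is the capital of France?"},
    timeout=120,  # generation can be slow on CPU-only hosts
)
resp.raise_for_status()
print(resp.json()["response"])
```
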
The container includes a Docker healthcheck that monitors the API’s availability.
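
Conceptually, the check only needs to confirm that `GET /` responds. A hypothetical probe is sketched below; the actual image may instead call `curl` or `wget` from its healthcheck definition.

```python
# healthcheck.py — hypothetical probe; exit code 0 = healthy, 1 = unhealthy
import sys
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:8000/", timeout=5) as r:
        sys.exit(0 if r.status == 200 else 1)
except Exception:
    sys.exit(1)
```
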
This project is licensed under the MIT License. See LICENSE for details.