# Canonical Multimodal Workloads: A Sandbox for Structured Outputs with Daft
Featuring Hugging Face, vLLM, Gemma 3n, and OpenAI
## Core Deliverable

### Project Content
- `references/` contains reference examples from Ray, vLLM, and SGLang on structured outputs, as well as a full suite of `llm_generate` inference calls across the most common structured output methods (see the sketch after this list).
- `friction/` contains the original (giant) "Scaling Multimodal Structured Outputs with Gemma-3, vLLM, and Daft" notebook, as well as notebooks focused on individual pain points, separated for easier review.
- `workload/` contains both a full walkthrough notebook and an atomic Python script for evaluating multimodal model performance on image understanding.
- Integration tests for OpenAI and `llm_generate` structured outputs usage patterns.
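The reference suite centers on Daft's `llm_generate`. As a minimal sketch of the call shape (the `model` and `provider` keyword names follow the Daft docs but are version-dependent assumptions, not this repo's exact code):

```python
# Minimal sketch of an llm_generate call; `model`/`provider` kwargs follow
# the Daft docs but are version-dependent assumptions, not this repo's code.
import daft
from daft.functions import llm_generate

df = daft.from_pydict({"prompt": ["Describe this product as a JSON object."]})
df = df.with_column(
    "response",
    llm_generate(df["prompt"], model="google/gemma-3n-e4b-it", provider="vllm"),
)
df.show()
```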
## Requirements

- Python: 3.12+
- uv: fast Python package/venv manager. Install with `pip install uv`.

Clone this repository and then run:

```bash
cd daft-structured-outputs
uv venv && uv sync
```

- This creates a local `.venv` and syncs dependencies from `pyproject.toml`.
- Prefer running commands with `uv run` without activating the venv.
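As a quick sanity check that the sync worked (assuming `daft` and `openai` are among the dependencies declared in `pyproject.toml`), the imports can be exercised with `uv run`:

```python
# sanity_check.py -- hypothetical helper script, not part of the repo.
# Run with: uv run python sanity_check.py
import daft
import openai

print("daft", daft.__version__)
print("openai", openai.__version__)
```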
## Environment Variables

These are read by tests and examples. A `.env.examples` file is provided as a template.

- `OPENAI_API_KEY`: any non-empty value when using a local vLLM server (e.g., `none`).
- `OPENAI_BASE_URL`: defaults to `None`; vLLM examples default to `localhost:8000`.
- `HF_TOKEN`: Hugging Face token for model pulls. If not set, use `make hf-auth`.
- `MODEL_ID`: model identifier used by the integration tests and CI.
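For illustration, the examples and tests can resolve these variables with defaults along the following lines (a sketch mirroring the documented defaults, not the repo's exact code):

```python
# Sketch of configuration resolution; defaults mirror the values
# documented above and in the Tests section below.
import os

OPENAI_BASE_URL = os.getenv("OPENAI_BASE_URL", "http://0.0.0.0:8000/v1")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "none")  # any non-empty value for local vLLM
MODEL_ID = os.getenv("MODEL_ID", "google/gemma-3n-e4b-it")
HF_TOKEN = os.getenv("HF_TOKEN")  # optional; use `make hf-auth` if unset
```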
## Running the vLLM Server

Defaults are aligned with the project notebooks:
```bash
uv run python -m vllm.entrypoints.openai.api_server \
  --model google/gemma-3n-e4b-it \
  --enable-chunked-prefill \
  --guided-decoding-backend guidance \
  --dtype bfloat16 \
  --gpu-memory-utilization 0.85 \
  --host 0.0.0.0 --port 8000
```

You will need to authenticate with Hugging Face to access Gemma 3:
```bash
hf auth login
```
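With the server up, structured outputs can be requested through any OpenAI-compatible client. A minimal sketch using vLLM's `guided_json` extension via the OpenAI SDK's `extra_body` pass-through (the JSON schema here is illustrative, not from the repo):

```python
# Sketch: schema-constrained JSON generation against the local vLLM server.
# `guided_json` is vLLM's OpenAI-server extension for guided decoding; the
# schema below is illustrative only.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("OPENAI_BASE_URL", "http://0.0.0.0:8000/v1"),
    api_key=os.getenv("OPENAI_API_KEY", "none"),
)

schema = {
    "type": "object",
    "properties": {
        "caption": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["caption", "tags"],
}

resp = client.chat.completions.create(
    model=os.getenv("MODEL_ID", "google/gemma-3n-e4b-it"),
    messages=[{"role": "user", "content": "Describe a photo of a red bicycle as JSON."}],
    extra_body={"guided_json": schema},
)
print(resp.choices[0].message.content)
```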
## Running Examples

- Python scripts (example): `uv run python workload/daft_mm_so_gemma3.py`
- Notebooks: open in your IDE or Jupyter and ensure the environment variables above are set in the session.
## Tests

Run against a live vLLM server (tests skip if the server is unreachable):

```bash
uv run pytest -q tests/test_openai_vllm_integration.py
```

Environment variables used by the tests:

- `OPENAI_BASE_URL` (default `http://0.0.0.0:8000/v1`)
- `OPENAI_API_KEY` (default `none`)
- `MODEL_ID` (default `google/gemma-3n-e4b-it`)
- `TEST_IMAGE_URL` (optional; enables the vision test)
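For reference, the optional vision path follows the standard multimodal chat-completion pattern, gated on `TEST_IMAGE_URL` (a sketch of the pattern, not the test itself):

```python
# Sketch of the vision-test pattern: a multimodal chat completion with an
# image_url content part, attempted only when TEST_IMAGE_URL is set.
import os
from openai import OpenAI

image_url = os.getenv("TEST_IMAGE_URL")
if image_url:
    client = OpenAI(
        base_url=os.getenv("OPENAI_BASE_URL", "http://0.0.0.0:8000/v1"),
        api_key=os.getenv("OPENAI_API_KEY", "none"),
    )
    resp = client.chat.completions.create(
        model=os.getenv("MODEL_ID", "google/gemma-3n-e4b-it"),
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    print(resp.choices[0].message.content)
```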
## Troubleshooting

- vLLM server not reachable: ensure `make vllm-serve` is running; confirm `OPENAI_BASE_URL` and `PORT`.
- HF auth required: run `hf auth login` to authenticate if `HF_TOKEN` is not set.
- GPU memory: adjust `GPU_MEM_UTIL` in `make vllm-serve` for your hardware.
- Dependencies: re-run `uv sync` after modifying `pyproject.toml`.