# Canonical Multimodal Workloads: A Sandbox for Structured Outputs with Daft
Featuring Hugging Face, vLLM, Gemma 3n, and OpenAI
## Core Deliverable

### Project Content
- `references/` contains reference examples from Ray, vLLM, and SGLang on structured outputs, as well as a full suite of `llm_generate` inference calls across the most common structured output methods (see the sketch after this list).
- `friction/` contains the original (giant) "Scaling Multimodal Structured Outputs with Gemma-3, vLLM, and Daft" notebook, as well as notebooks focused on individual pain points, separated for easier review.
- `workload/` contains both a full walkthrough notebook and an atomic Python script for evaluating multimodal model performance on image understanding.
- Integration tests for OpenAI and `llm_generate` structured outputs usage patterns.
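The reference suite centers on Daft's `llm_generate`. As a minimal sketch of the call shape (the `model` and `provider` keyword names follow the Daft docs but are version-dependent assumptions, not this repo's exact code):

```python
# Minimal sketch of an llm_generate call; `model`/`provider` kwargs follow
# the Daft docs but are version-dependent assumptions, not this repo's code.
import daft
from daft.functions import llm_generate

df = daft.from_pydict({"prompt": ["Describe this product as a JSON object."]})
df = df.with_column(
    "response",
    llm_generate(df["prompt"], model="google/gemma-3n-e4b-it", provider="vllm"),
)
df.show()
```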
## Requirements

- Python: 3.12+
- uv: fast Python package/venv manager. Install with `pip install uv`.

Clone this repository and then run:

```bash
cd daft-structured-outputs
uv venv && uv sync
```

- This creates a local `.venv` and syncs dependencies from `pyproject.toml`.
- Prefer running commands with `uv run` without activating the venv.
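As a quick sanity check that the sync worked (assuming `daft` and `openai` are among the dependencies declared in `pyproject.toml`), the imports can be exercised with `uv run`:

```python
# sanity_check.py -- hypothetical helper script, not part of the repo.
# Run with: uv run python sanity_check.py
import daft
import openai

print("daft", daft.__version__)
print("openai", openai.__version__)
```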
## Environment Variables

These are read by tests and examples. A `.env.examples` file is provided as a template.

- `OPENAI_API_KEY`: any non-empty value when using a local vLLM server (e.g., `none`).
- `OPENAI_BASE_URL`: defaults to `None`; vLLM examples default to `localhost:8000`.
- `HF_TOKEN`: Hugging Face token for model pulls. If not set, use `make hf-auth`.
- `MODEL_ID`: model identifier used by the integration tests and CI.
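For illustration, the examples and tests can resolve these variables with defaults along the following lines (a sketch mirroring the documented defaults, not the repo's exact code):

```python
# Sketch of configuration resolution; defaults mirror the values
# documented above and in the Tests section below.
import os

OPENAI_BASE_URL = os.getenv("OPENAI_BASE_URL", "http://0.0.0.0:8000/v1")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "none")  # any non-empty value for local vLLM
MODEL_ID = os.getenv("MODEL_ID", "google/gemma-3n-e4b-it")
HF_TOKEN = os.getenv("HF_TOKEN")  # optional; use `make hf-auth` if unset
```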
## Running the vLLM Server

Defaults are aligned with the project notebooks:
```bash
uv run python -m vllm.entrypoints.openai.api_server \
  --model google/gemma-3n-e4b-it \
  --enable-chunked-prefill \
  --guided-decoding-backend guidance \
  --dtype bfloat16 \
  --gpu-memory-utilization 0.85 \
  --host 0.0.0.0 --port 8000
```

You will need to authenticate with Hugging Face to access Gemma 3:
```bash
hf auth login
```
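With the server up, structured outputs can be requested through any OpenAI-compatible client. A minimal sketch using vLLM's `guided_json` extension via the OpenAI SDK's `extra_body` pass-through (the JSON schema here is illustrative, not from the repo):

```python
# Sketch: schema-constrained JSON generation against the local vLLM server.
# `guided_json` is vLLM's OpenAI-server extension for guided decoding; the
# schema below is illustrative only.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("OPENAI_BASE_URL", "http://0.0.0.0:8000/v1"),
    api_key=os.getenv("OPENAI_API_KEY", "none"),
)

schema = {
    "type": "object",
    "properties": {
        "caption": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["caption", "tags"],
}

resp = client.chat.completions.create(
    model=os.getenv("MODEL_ID", "google/gemma-3n-e4b-it"),
    messages=[{"role": "user", "content": "Describe a photo of a red bicycle as JSON."}],
    extra_body={"guided_json": schema},
)
print(resp.choices[0].message.content)
```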
## Running Examples

- Python scripts (example): `uv run python workload/daft_mm_so_gemma3.py`
- Notebooks: open in your IDE or Jupyter and ensure the environment variables above are set in the session.
## Tests

Run against a live vLLM server (tests skip if the server is unreachable):

```bash
uv run pytest -q tests/test_openai_vllm_integration.py
```

Environment variables used by the tests:

- `OPENAI_BASE_URL` (default `http://0.0.0.0:8000/v1`)
- `OPENAI_API_KEY` (default `none`)
- `MODEL_ID` (default `google/gemma-3n-e4b-it`)
- `TEST_IMAGE_URL` (optional; enables the vision test)
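For reference, the optional vision path follows the standard multimodal chat-completion pattern, gated on `TEST_IMAGE_URL` (a sketch of the pattern, not the test itself):

```python
# Sketch of the vision-test pattern: a multimodal chat completion with an
# image_url content part, attempted only when TEST_IMAGE_URL is set.
import os
from openai import OpenAI

image_url = os.getenv("TEST_IMAGE_URL")
if image_url:
    client = OpenAI(
        base_url=os.getenv("OPENAI_BASE_URL", "http://0.0.0.0:8000/v1"),
        api_key=os.getenv("OPENAI_API_KEY", "none"),
    )
    resp = client.chat.completions.create(
        model=os.getenv("MODEL_ID", "google/gemma-3n-e4b-it"),
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    print(resp.choices[0].message.content)
```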
## Troubleshooting

- vLLM server not reachable: ensure `make vllm-serve` is running; confirm `OPENAI_BASE_URL` and `PORT`.
- HF auth required: run `hf auth login` to authenticate if `HF_TOKEN` is not set.
- GPU memory: adjust `GPU_MEM_UTIL` in `make vllm-serve` for your hardware.
- Dependencies: re-run `uv sync` after modifying `pyproject.toml`.