A production-grade platform that schedules GPU jobs, streams live telemetry to a React dashboard, and exposes an OpenAPI-first API. Built as a demonstration project for the Planet Software Engineer position.
Overfly demonstrates full-stack development capabilities with a focus on:
- Real-time data streaming and event-driven architecture
- OpenAPI-first API design with type generation
- Modern cloud-native deployment (Kubernetes on GKE)
- Comprehensive testing (80%+ backend coverage)
- Production DevOps practices (CI/CD, monitoring)
Status: Milestone 1 Complete ✅ (Core infrastructure, CRUD operations, deployment, testing)
# Install dependencies
brew install postgresql@18
brew services start postgresql@18
# Setup database
createdb planet
make venv
make db-upgrade
# Start API
cd apps/api
uvicorn src.main:app --reload
# Start Web (new terminal)
cd apps/web
npm install
npm run dev
# Visit http://localhost:3000

See the Getting Started Guide for detailed instructions.
- Overview - Project purpose and alignment with Planet
- Getting Started - Installation and setup
- Architecture - System design and tech stack
- Development - Development workflow and testing
- Deployment - Production deployment (GKE, Render, etc.)
- Roadmap - Future milestones and features
- Planet Job Description - Role alignment
- NATS Integration - NEW! Real-time messaging with JetStream + SSE
Older detailed guides are preserved in docs/old/ for reference.
- Frontend: React + TypeScript, Next.js, TanStack Query (data), Emotion CSS, WebSocket client. Optional: TanStack Router, TanStack Table.
- Backend (Python): FastAPI, Pydantic v2, asyncio + process pool, SQLAlchemy 2.0/SQLModel (if needed), OpenTelemetry.
- Database (primary): Postgres 18 for orchestration metadata and system of record.
- Events/RT: NATS for streams; WebSockets from API to UI.
- Go service: chi + pgx + OpenTelemetry; aggregates usage summaries.
- Observability: Prometheus, Grafana, Tempo/Jaeger, structured logs (correlation IDs).
- Spatial (optional): PostGIS on Postgres for advanced spatial; later analytics via DuckDB + `duckdb_spatial` if needed.
- TanStack Query for data fetching/caching; SSR where appropriate.
- WebSocket client for live job state/logs and GPU metrics.
- Optional: TanStack Router for file‑based routes; TanStack Table for job/usage grids.
- Later: Explore TanStackDB once it stabilizes for local data/state modeling.
- Postgres 18 strengths:
- Production‑grade concurrency and transactions, roles/permissions, rich indexing, mature ecosystem.
- Excellent ORM support, queue patterns (e.g., SKIP LOCKED), native logical replication.
- PostGIS for advanced spatial use cases and indexing.
- DuckDB strengths (for later analytics):
- Embedded, zero‑ops, extremely fast OLAP/columnar analytics; Parquet/Arrow native.
- Ideal for ad‑hoc analytics, dashboards over result sets, local or server‑side analytical flows.
- Spatial via `duckdb_spatial` for many `ST_*` functions.
- Primary (production‑like): Postgres 18 for job metadata/orchestration and all transactional data.
- Jobs/runs/gpus/events in Postgres; API and worker coordinate via DB + NATS.
- UI reads metadata from Postgres via the API; real‑time updates via WebSockets from NATS‑backed streams.
- Later analytics: Introduce DuckDB for analytical queries over result artifacts (e.g., Parquet) and aggregated tables.
- Maintain ETL/ELT from Postgres → Parquet/Arrow → DuckDB for BI‑style panels.
- `jobs(id UUID PK, name TEXT, params JSONB, priority INT, state TEXT, created_at TIMESTAMPTZ, submitted_by TEXT)`
- `runs(id UUID PK, job_id UUID FK, gpu_id UUID FK, started_at TIMESTAMPTZ, finished_at TIMESTAMPTZ, state TEXT, exit_code INT, metrics JSONB)`
- `gpus(id UUID PK, uuid TEXT UNIQUE, name TEXT, total_mem_mb INT, compute_capability TEXT)`
- `events(id BIGSERIAL PK, type TEXT, job_id UUID NULL, run_id UUID NULL, payload JSONB, ts TIMESTAMPTZ DEFAULT now())`
- Optional spatial: `areas(id UUID PK, name TEXT, geom GEOGRAPHY/GEOMETRY)` using PostGIS.
Notes:
- Use `JSONB` for flexible payloads; validate at the app layer.
- Add indexes with `CREATE INDEX` on frequent filters (e.g., `jobs(state)`, `runs(job_id)`); include partial indexes where helpful.
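As an illustration of those notes, the index DDL might look like the following; the partial index on PENDING jobs is an assumption about the worker's hot path:

```python
# Illustrative DDL for the indexes suggested above (migration fragment).
INDEX_DDL = """
CREATE INDEX IF NOT EXISTS idx_jobs_state ON jobs (state);
CREATE INDEX IF NOT EXISTS idx_runs_job_id ON runs (job_id);
-- Partial index: the queue only ever scans PENDING jobs.
CREATE INDEX IF NOT EXISTS idx_jobs_pending
    ON jobs (priority DESC, created_at)
    WHERE state = 'PENDING';
"""
```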
- Writes: API and worker both write to Postgres 18 with transactional consistency.
- Readers: UI and Go service read via the API; Go service may read directly for batch summaries.
- Eventing: NATS carries real‑time signals; DB remains the source of truth for durable state.
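A hedged sketch of the eventing side, assuming the `nats-py` client and an illustrative subject scheme; the pure `encode_event` helper keeps serialization testable without a running server:

```python
# Sketch of publishing job events to NATS JetStream (nats-py assumed).
import json


def encode_event(event_type: str, payload: dict) -> tuple[str, bytes]:
    """Build (subject, body) for a job event; subject scheme is an assumption."""
    return f"jobs.events.{event_type}", json.dumps(payload).encode()


async def publish_event(event_type: str, payload: dict) -> None:
    """Connect, publish one event to JetStream, and close."""
    import nats  # assumed dependency: pip install nats-py

    nc = await nats.connect("nats://localhost:4222")
    try:
        js = nc.jetstream()
        subject, body = encode_event(event_type, payload)
        await js.publish(subject, body)
    finally:
        await nc.close()
```

In the real worker the connection would be long-lived rather than per-event; this shape just shows where the DB write and the event publish diverge.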
- `POST /jobs`: submit a job
- `GET /jobs/{id}`: job status
- `GET /jobs/{id}/logs`: stream logs (SSE), or `WS /stream`: multiplexed channel
- `GET /gpus`: inventory + current utilization
- `GET /usage/accounts/{id}` (Go service)
- `GET /usage/leaderboard` (Go service)
Codegen:
- Generate TS client (`openapi-typescript`) and Go client from the same spec.
- Python worker selects devices via NVML (`pynvml`) + `CUDA_VISIBLE_DEVICES`.
- Parallel/serial modes, per‑GPU queues, retries/backoff, idempotent runs.
- Demo workloads:
- CNN inference on small satellite tiles (PyTorch), record per‑tile metrics.
- Write per‑run metrics to Postgres; push events for real‑time updates via NATS.
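One possible shape for the NVML discovery step, with a simulated fallback so the demo also runs on machines without an NVIDIA driver; the dict field names are illustrative:

```python
# Sketch of GPU discovery via pynvml with a simulated fallback.
def discover_gpus() -> list[dict]:
    """Enumerate GPUs via NVML, falling back to a simulated inventory."""
    try:
        import pynvml  # assumed dependency (NVIDIA NVML bindings)

        pynvml.nvmlInit()
        gpus = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older bindings return bytes
                name = name.decode()
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            gpus.append({
                "index": i,
                "name": name,
                "total_mem_mb": mem.total // (1024 * 1024),
            })
        pynvml.nvmlShutdown()
        return gpus
    except Exception:
        # No driver or no pynvml: simulate so the rest of the stack still works.
        return [{"index": 0, "name": "Simulated GPU", "total_mem_mb": 16384}]
```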
- OpenTelemetry traces: UI → API → worker → DB.
- Metrics: request latency, queue depth, GPU utilization, job lifecycle counters.
- Dashboards in Grafana; traces in Tempo/Jaeger.
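As one stdlib-only sketch of the correlation-ID idea (names are assumptions), a contextvar can carry the ID into every structured log line without threading it through function signatures:

```python
# Sketch of correlation-ID propagation for structured logs via contextvars.
import contextvars
import json
import uuid

correlation_id = contextvars.ContextVar("correlation_id", default=None)


def new_request_context() -> str:
    """Mint a correlation ID at the edge (e.g., per HTTP request)."""
    cid = uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


def log_event(message: str, **fields) -> str:
    """Render one structured log line carrying the current correlation ID."""
    record = {"msg": message, "correlation_id": correlation_id.get(), **fields}
    return json.dumps(record)
```

In practice the same ID would also ride along as OpenTelemetry trace context; this just shows the log side.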
- Use PostGIS for AOI footprints (GeoJSON import) and spatial joins.
- UI: deck.gl/MapLibre to render processed coverage; click through to runs/jobs.
- Purpose: fast OLAP over aggregated results and Parquet artifacts without burdening Postgres.
- Storage: export selected Postgres tables or materialized views to Parquet; keep a versioned dataset.
- Engine: DuckDB with `duckdb_spatial` for geospatial analytics.
- Access: API endpoints that run read‑only analytical queries in DuckDB and return datasets to the UI.
- Concurrency: rely on Postgres 18 transactional semantics; use SKIP LOCKED for queues.
- Real‑time: rely on NATS for instant updates; DB refresh on intervals for reconciliation.
- Migrations: maintain SQL migration files (Alembic) and an app-level migration runner.
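The SKIP LOCKED queue decision can be sketched as a worker-side job picker that claims one PENDING job atomically, so concurrent workers never grab the same row. The SQL follows the schema above; the `ORDER BY` and the psycopg 3 connection shape are assumptions:

```python
# Sketch of a SKIP LOCKED job picker (psycopg 3 connection assumed).
PICK_JOB_SQL = """
    UPDATE jobs
       SET state = 'RUNNING'
     WHERE id = (
            SELECT id FROM jobs
             WHERE state = 'PENDING'
             ORDER BY priority DESC, created_at
               FOR UPDATE SKIP LOCKED
             LIMIT 1
           )
 RETURNING id, name, params;
"""


def pick_next_job(conn):
    """Claim and return one job row, or None when the queue is empty."""
    with conn.cursor() as cur:
        cur.execute(PICK_JOB_SQL)
        row = cur.fetchone()
    conn.commit()
    return row
```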
- Milestone 1 (week 1): OpenAPI draft, Postgres 18 schema & migrations, worker skeleton, GPU discovery, WebSocket event stream backed by NATS, minimal dashboard with TanStack Query.
- Milestone 2 (week 2): Full job lifecycle, live logs, usage summaries in Go, metrics + dashboards, tests.
- Milestone 3 (week 3): Spatial view (PostGIS), load tests, polish docs and screencast.
- Future: TanStackDB exploration; add DuckDB analytics layer (ETL to Parquet + DuckDB queries).
- `make dev`: runs API, worker, UI, NATS, Grafana/Prometheus.
- `make test`: runs unit + e2e tests.
- `make demo`: seeds jobs and opens the dashboard.
- 2025‑10‑13: Adopt React + TanStack Query; Postgres 18 as primary DB; NATS for events; plan DuckDB analytics later.
Focus: Execute Milestone 1 Excellence Over Scope Creep
The architecture is solid and production-grade. Rather than spreading thin across all three milestones, prioritize delivering an impressive, complete Milestone 1 that demonstrates senior-level full-stack capabilities for the Planet application.
- OpenAPI Spec First (2-3 hours)
  - Define 5-6 core endpoints to drive codegen:
    - `POST /jobs`: submit job with params
    - `GET /jobs`: list jobs with filters (state, submitted_by)
    - `GET /jobs/{id}`: job detail with run history
    - `GET /gpus`: GPU inventory with utilization
    - `WS /stream`: WebSocket for real-time events
  - Generate TS/Python clients immediately to verify the workflow
- FastAPI Backend (Core Features)
  - GPU discovery with `pynvml` (simulate if no GPU available)
  - Job submission → Postgres persistence
  - State machine: PENDING → RUNNING → COMPLETED/FAILED
  - WebSocket endpoint multiplexing NATS streams
- React Dashboard (TanStack Query)
  - Job submission form (name, params, priority)
  - Job list with real-time status updates
  - GPU status cards (utilization, memory, temp)
  - Live logs via WebSocket connection
  - Use Emotion CSS for theming
- Worker Implementation
  - Job picker using the `SELECT ... FOR UPDATE SKIP LOCKED` pattern
  - Simple demo workload (doesn't require a real GPU)
  - Publish events to NATS for real-time UI updates
  - Store metrics in the `runs` table
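The PENDING → RUNNING → COMPLETED/FAILED lifecycle can be sketched as a transition table; the states come from the plan above, while the function name is an assumption:

```python
# Minimal sketch of the job state machine.
ALLOWED_TRANSITIONS = {
    "PENDING": {"RUNNING"},
    "RUNNING": {"COMPLETED", "FAILED"},
    "COMPLETED": set(),  # terminal
    "FAILED": set(),     # terminal
}


def transition(current: str, target: str) -> str:
    """Return the new state, raising on an illegal transition."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

Centralizing the table makes the unit tests for state transitions (mentioned under testing below) a few assertions rather than a DB fixture.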
Satellite Imagery Simulation (Bonus Points)
- Frame jobs as "Process satellite tile at (lat, lon)"
- Demo workload: image classification on sample tiles (use small dataset)
- Store tile footprints in PostGIS as `GEOGRAPHY(POLYGON)`
- Display processed coverage on a deck.gl/MapLibre map
- Add spatial query: "Find jobs within 100km of Berlin"
This directly demonstrates understanding of Planet's satellite imaging domain and shows spatial data expertise.
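The "within 100 km of Berlin" query might look like the following PostGIS SQL; it assumes the optional `areas` table and an invented `params->>'area_id'` link from jobs to footprints:

```python
# Hypothetical PostGIS spatial query (parameterized for psycopg-style binding).
BERLIN_LON, BERLIN_LAT = 13.405, 52.52

JOBS_NEAR_BERLIN_SQL = """
    SELECT j.id, j.name
      FROM jobs j
      JOIN areas a ON a.id = (j.params->>'area_id')::uuid
     WHERE ST_DWithin(
             a.geom,
             ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography,
             100000  -- meters, i.e. 100 km
           );
"""
```

`ST_DWithin` on `geography` operates in meters, which keeps the "100 km" intent explicit in the query.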
Defer to Post-Demo:
- Go usage service (nice-to-have, not critical for initial demo)
- DuckDB analytics layer (already marked as "later")
- Full observability stack (keep Prometheus + basic metrics; skip Tempo/Jaeger initially)
- TanStackDB exploration (experimental, not production-ready yet)
Keep Simple:
- 1-2 demo workloads maximum
- Focus on real-time updates (your differentiator)
- Ensure `make demo` is impressive with seed data
Don't over-test for a demo, but show best practices:
- Unit tests for critical paths (job state transitions, queue logic)
- One Playwright e2e: submit job → observe real-time completion
- OpenAPI contract validation tests
- k6 load test script (even if not run in demo)
For Interview/Application:
- Screencast (2-3 min): Job submission → real-time updates → GPU metrics → map view
- Architecture diagram (one page): Data flow from UI → API → Worker → NATS → WebSocket → UI
- README: Clear `make demo` that seeds data and opens the dashboard
- Emphasize OpenAPI-first approach (Planet uses this extensively)
Your tech stack is nearly perfect for Planet's role:
- ✅ React + Python + Go (exact match)
- ✅ OpenAPI/REST codegen (job requirement)
- ✅ Complex data visualization (TanStack Query, real-time dashboards)
- ✅ Event-driven pipelines (NATS + WebSocket)
- ✅ Spatial data (PostGIS + deck.gl - huge bonus for satellite company)
- ✅ Customer telemetry focus (usage summaries, aggregation)
- Multi-tenant ready: Jobs have a `submitted_by` field (shows production thinking)
- Usage telemetry page: Aggregate stats by user (aligns with Planet's customer reporting focus)
- Correlation IDs: OpenTelemetry trace context through entire stack
- Idempotent operations: Job runs are idempotent with retry logic
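The retry logic behind idempotent runs can be sketched as a small exponential-backoff helper; the attempt count and delays are illustrative defaults, not values from the repo:

```python
# Illustrative retry helper with exponential backoff.
import time


def retry(fn, attempts: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying with exponential backoff on failure.

    Safe only because job runs are idempotent: re-executing a claimed
    run must converge to the same recorded result.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```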
- Fix configuration issues (✅ Done: Postgres port 5432)
- Design & implement OpenAPI spec
- Generate TS/Python clients and verify
- Build FastAPI with 3 core endpoints
- Create React dashboard (job list + submit)
- Implement worker with dummy workload
- Add WebSocket streaming for real-time
- Polish with demo data and `make demo`
- (Bonus) Add one spatial feature if time permits
A successful demo shows:
- Submit job via UI → immediately see "PENDING" state
- Worker picks up job → status updates to "RUNNING" in real-time
- Logs stream to UI as job executes
- Job completes → metrics visible, state updates to "COMPLETED"
- GPU utilization visible throughout
- (Bonus) Processed tiles appear on map
This demonstrates: full-stack skills, real-time systems, async coordination, production patterns, and domain understanding.
Top-level monorepo using workspaces and shared tooling. Targets one-command dev via Make and Docker Compose.
planet/
apps/
api/ # FastAPI service (OpenAPI-first), Postgres 18, NATS, WebSockets
src/
tests/
pyproject.toml
worker/ # GPU job runner (Python), NVML/CUDA, subscribes to NATS
src/
tests/
pyproject.toml
web/ # Next.js + React + TS, TanStack Query, WebSocket client, dashboards
src/
e2e/
package.json
usage-go/ # Go microservice for usage summaries/aggregation, exposes small API
cmd/usagego/
internal/
go.mod
packages/
shared-types/ # OpenAPI-generated TS/Go/Python clients & shared DTOs
ts/
go/
py/
ui/ # Shared UI components (TS), charts/tables, Emotion theme
src/
package.json
opentelemetry/ # Shared OTEL setup for Python/Go/TS
python/
go/
ts/
db/
migrations/ # Alembic migrations for Postgres 18 (DDL/DML)
seeds/ # Seed data for demo
alembic.ini
README.md
analytics/ # DuckDB later: Parquet datasets, SQL notebooks, ETL scripts
datasets/
etl/
notebooks/
README.md
openapi/
spec.yaml # Single source of truth; codegen to packages/shared-types
README.md
infra/
docker-compose.yml # Postgres 18, NATS, Grafana, Prometheus, Tempo/Jaeger, app services
Dockerfiles/
k6/ # Load test scripts
grafana/ # Dashboards
prometheus/
.github/
workflows/ # CI: lint, test, build, codegen, preview envs
Makefile # make dev/test/demo/format/lint
PROJECT_SCOPE.md # Living scope doc (this file)
README.md
Conventions:
- Single OpenAPI spec in `openapi/spec.yaml`; codegen pipelines output TS/Go/Python clients into `packages/shared-types`, consumed by `apps/*`.
- Env via `.env` files per app; compose overrides for local dev. Secrets kept out of VCS.
- Testing: unit tests colocated; e2e tests in `apps/web/e2e` (Playwright) and `infra/k6`.
- Observability: common OTEL setup in `packages/opentelemetry` to enforce consistent tracing/metrics.
- Staging: https://staging.api.overfly.ing
- Production: https://production.api.overfly.ing
- API Docs: https://production.api.overfly.ing/docs
- ✅ FastAPI backend with full CRUD operations
- ✅ React frontend with TanStack Query
- ✅ PostgreSQL 18 with Alembic migrations
- ✅ Comprehensive test suite (pytest + Playwright)
- ✅ OpenAPI specification with TypeScript codegen
- ✅ Kubernetes deployment on GKE
- ✅ CI/CD pipeline with GitHub Actions
- ✅ Custom domains with SSL/TLS
- 🔜 Worker service for GPU job execution
- 🔜 Real-time updates via WebSocket
- 🔜 NATS event streaming
- 🔜 Live log streaming
- 🔮 Spatial features (PostGIS + deck.gl)
- 🔮 Go usage service for customer reporting
- 🔮 OpenTelemetry observability
- 🔮 DuckDB analytics layer
See Roadmap for details.
planet/
├── apps/
│ ├── api/ # FastAPI service
│ ├── web/ # Next.js frontend
│ ├── worker/ # GPU job runner (future)
│ └── usage-go/ # Go usage service (future)
├── packages/
│ ├── shared-types/ # OpenAPI-generated types
│ ├── ui/ # Shared UI components
│ └── opentelemetry/# Observability setup
├── db/
│ ├── migrations/ # Alembic migrations
│ └── seeds/ # Demo data
├── k8s/
│ ├── api/ # Kubernetes manifests
│ └── ingress/ # Ingress + cert-manager
├── docs/ # Documentation
└── infra/ # Infrastructure as code
This project directly aligns with Planet's Software Engineer role requirements:
| Requirement | Implementation |
|---|---|
| React + Python + Go | ✅ Next.js, FastAPI, Go service (planned) |
| OpenAPI codegen | ✅ OpenAPI-first with automated type generation |
| Relational databases | ✅ PostgreSQL 18 with complex queries |
| Spatial data | 🔜 PostGIS + deck.gl (Milestone 4) |
| Event-driven pipelines | ✅ Architecture designed for NATS + WebSocket |
| CI/CD | ✅ GitHub Actions with staging/production |
| Customer telemetry | 🔜 Go usage service (Milestone 5) |
See Planet Job Description for detailed alignment.
# 1. Make API changes
# 2. Check drift
make openapi-diff
# 3. Update spec
make openapi-sync
# 4. Generate types
make codegen
# 5. Use in frontend (automatic type safety!)

- Backend: 80%+ coverage with pytest, PostgreSQL test database
- Frontend: Unit tests (Vitest) + E2E tests (Playwright)
- CI/CD: Automated testing on every push
- Kubernetes Autopilot on GKE
- Staging + Production environments
- Automatic SSL/TLS with cert-manager
- Blue/green deployments via GitHub Actions
This is a portfolio project, but suggestions and feedback are welcome!
- Create feature branch
- Write tests first
- Implement feature
- Update documentation
- Create pull request
See Development Guide for workflow details.