This project demonstrates a multi-LLM orchestration system written in Go. It shows how to:
- Orchestrate multiple LLMs (OpenAI GPT-4o-mini models) with different prompts.
- Stream answers to the caller using Server-Sent Events (SSE).
- Enrich answers with domain data (a MongoDB collection of fictional flight data).
- Support multilingual queries (English and Spanish).
- Run everything locally with Docker Compose.
```mermaid
graph TD
    subgraph Client
        A[Browser / curl]
    end
    subgraph Server
        H[HTTP Handler /api]
        H -->|SSE| A
        H --> O[Orchestrator]
        O -->|prompt| L1[(LLM 1)]
        O -->|prompt| L2[(LLM 2)]
        O -->|aggregate| L3[(LLM 3)]
        O -->|query| DB[(MongoDB flights)]
    end
    subgraph Docker
        DB ---|network| Server
    end
```
- LLM 1 – concise, formal replies (or a list of flights when the question is about flights).
- LLM 2 – verbose, friendly replies (or duration & cost when the question is about flights).
- LLM 3 – aggregation layer that combines the LLM1 and LLM2 responses.
When the user's question mentions flights (in English or Spanish), the orchestrator:
- Extracts origin / destination city names using a simple synonym map.
- Queries MongoDB for matching flights (case-insensitive, supports wildcard searches).
- Feeds the flight list to both LLMs with different prompts.
- LLM3 aggregates both responses into a unified, well-formatted answer.
- Streams back the final aggregated response via SSE.
For non-flight questions, LLM1/LLM2 are given the user's question with their respective style prompts, and LLM3 combines the formal and friendly perspectives into one balanced response.
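As a sketch of the MongoDB lookup step, a case-insensitive query with the official Go driver could look like the following; the field names (`origin`, `destination`, `price`) are assumptions for illustration, not taken from the repo:

```go
package main

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/primitive"
	"go.mongodb.org/mongo-driver/mongo"
)

// Flight mirrors one document in flightdb.flights (field names assumed).
type Flight struct {
	Origin      string  `bson:"origin"`
	Destination string  `bson:"destination"`
	Price       float64 `bson:"price"`
}

// findFlights matches origin/destination case-insensitively; an anchored
// regex doubles as a wildcard search when only a prefix is known.
func findFlights(ctx context.Context, coll *mongo.Collection, origin, dest string) ([]Flight, error) {
	filter := bson.M{}
	if origin != "" {
		filter["origin"] = primitive.Regex{Pattern: "^" + origin, Options: "i"}
	}
	if dest != "" {
		filter["destination"] = primitive.Regex{Pattern: "^" + dest, Options: "i"}
	}
	cur, err := coll.Find(ctx, filter)
	if err != nil {
		return nil, err
	}
	var flights []Flight
	if err := cur.All(ctx, &flights); err != nil {
		return nil, err
	}
	return flights, nil
}
```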
The system supports queries in both English and Spanish:
- Flight queries work in both languages (e.g., "flights to London" / "vuelos a Londres")
- General questions are processed in the language they're asked
- City name variations are automatically mapped (e.g., "Londres" → "London", "Madrid" → "Madrid")
- Airport codes are supported (e.g., "JFK" → "New York", "MAD" → "Madrid")
The LLMs maintain the original language in their responses, providing a seamless multilingual experience.
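A minimal sketch of the city-normalization step, assuming a plain Go map; the entries below are illustrative, following the examples above (Londres → London, JFK → New York):

```go
package main

import "strings"

// citySynonyms maps Spanish names and IATA airport codes to the
// canonical city names stored in MongoDB (illustrative subset).
var citySynonyms = map[string]string{
	"londres":    "London",
	"nueva york": "New York",
	"jfk":        "New York",
	"mad":        "Madrid",
}

// normalizeCity returns the canonical city name for a user-supplied token.
func normalizeCity(token string) string {
	key := strings.ToLower(strings.TrimSpace(token))
	if canonical, ok := citySynonyms[key]; ok {
		return canonical
	}
	// Unknown tokens pass through unchanged (e.g. "Madrid" → "Madrid").
	return strings.TrimSpace(token)
}
```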
- Go 1.22+
- Docker & Docker Compose (optional but easiest)
- OpenAI API key (required; set the `OPENAI_API_KEY` environment variable)
- Internet connection (to access the OpenAI API)
```bash
# 1. Clone the repo
$ git clone https://github.com/Cris245/go-llm-chat.git
$ cd go-llm-chat

# 2. Set your OpenAI API key (choose one method):
# Option A: Export environment variable
$ export OPENAI_API_KEY="sk-…"

# Option B: Create .env file
$ echo "OPENAI_API_KEY=sk-…" > .env

# 3. Start everything
$ docker-compose up --build
```

Docker Compose spins up:
- MongoDB on `mongodb://mongo:27017` (aliased as `MONGO_URI`).
- Go server on `http://localhost:8080`.
On first start the server seeds the `flightdb.flights` collection with a set of 20 sample flights (Madrid ↔ Paris, London ↔ Berlin, Tokyo → LA, …). Seeding is done via upsert, so restarts won't duplicate data.
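A minimal sketch of what the upsert-based seeding could look like with the official Go MongoDB driver; the field names and the origin/destination natural key are assumptions:

```go
package main

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// Flight mirrors one seed document (field names assumed).
type Flight struct {
	Origin      string  `bson:"origin"`
	Destination string  `bson:"destination"`
	Price       float64 `bson:"price"`
}

// seedFlights upserts each sample flight: an existing document is
// updated in place, a missing one is inserted, so restarts never duplicate.
func seedFlights(ctx context.Context, coll *mongo.Collection, flights []Flight) error {
	opts := options.Update().SetUpsert(true)
	for _, f := range flights {
		// The origin/destination pair acts as the natural key.
		filter := bson.M{"origin": f.Origin, "destination": f.Destination}
		update := bson.M{"$set": f}
		if _, err := coll.UpdateOne(ctx, filter, update, opts); err != nil {
			return err
		}
	}
	return nil
}
```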
- Start MongoDB locally (`brew services start mongodb-community@7` or similar).
- Export environment variables:

```bash
export MONGO_URI="mongodb://localhost:27017"
export OPENAI_API_KEY="sk-…"
```

- Run the server:

```bash
go run ./cmd/server
```
`POST /api` with a plain-text body. The response is an SSE stream.

| Event Type | Meaning | Example Data |
|---|---|---|
| `Status` | Internal status update (invoking LLM) | `Invoking LLM 1` |
| `Message` | Final aggregated answer | See example below |
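For illustration, a minimal SSE handler emitting both event types might look like the sketch below; the helper names are hypothetical, not the project's `internal/sse` API:

```go
package main

import (
	"fmt"
	"net/http"
)

// sendEvent writes one SSE event and flushes it so the client sees it immediately.
func sendEvent(w http.ResponseWriter, event, data string) {
	fmt.Fprintf(w, "event: %s\ndata: %s\n\n", event, data)
	if f, ok := w.(http.Flusher); ok {
		f.Flush()
	}
}

func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	sendEvent(w, "Status", "Invoking LLM 1")
	// ... orchestration runs here ...
	sendEvent(w, "Message", "final aggregated answer")
}
```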
List all flights:

```bash
curl -N -X POST -d "Que vuelos hay en general?" http://localhost:8080/api
```

Ask for a specific route:

```bash
curl -N -X POST -d "hay vuelos a londres?" http://localhost:8080/api
```

General (non-flight) question:

```bash
curl -N -X POST -d "Explain quantum teleportation in simple terms" http://localhost:8080/api
```

The `-N` flag keeps the connection open so you see the `Status` events followed by the `Message`.
"Incorrect API key provided" error:
- Ensure your OpenAI API key is valid and has sufficient credits
- Check that the key is properly set in your environment or `.env` file
- Verify the key doesn't have extra characters or line breaks
"No flights found" for valid queries:
- The system includes price filtering - try removing price constraints
- Check that city names are spelled correctly
- Supported cities: Madrid, Paris, London, Barcelona, Valencia, Seville, Tokyo, New York, Los Angeles, Berlin, Rome
Connection refused on localhost:8080:
- Ensure Docker containers are running: `docker-compose ps`
- Check container logs: `docker-compose logs app`
- Verify port 8080 is not used by another application
The project includes a load testing script to validate concurrent request handling:
```bash
# Test 5 simultaneous requests
./scripts/load_test.sh 5

# Test 10 simultaneous requests
./scripts/load_test.sh 10
```

The script sends multiple concurrent requests and validates that:
- All requests receive responses
- No requests are lost or dropped
- Server maintains performance under load
- SSE streams work correctly with multiple clients
Challenge: Coordinating three LLMs with different roles while maintaining response quality.
Solution: Implemented parallel processing with `sync.WaitGroup` and channels, ensuring LLM1 and LLM2 run concurrently, then LLM3 aggregates their results.
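A sketch of this fan-out/fan-in pattern; `callLLM` is a hypothetical stand-in for the project's `llmclient` wrapper, and the prompts are illustrative:

```go
package main

import (
	"context"
	"sync"
)

// orchestrate runs the two styled prompts concurrently, then asks a
// third call to aggregate both answers.
func orchestrate(ctx context.Context, question string,
	callLLM func(ctx context.Context, prompt string) (string, error)) (string, error) {

	prompts := []string{
		"Answer concisely and formally: " + question,
		"Answer verbosely and in a friendly tone: " + question,
	}
	answers := make([]string, len(prompts))
	errs := make([]error, len(prompts))

	var wg sync.WaitGroup
	for i, p := range prompts {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			answers[i], errs[i] = callLLM(ctx, p) // LLM1 and LLM2 in parallel
		}(i, p)
	}
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return "", err
		}
	}

	// LLM3 aggregates both results into one response.
	return callLLM(ctx, "Combine these answers into one balanced reply:\n1) "+
		answers[0]+"\n2) "+answers[1])
}
```

Writing to distinct slice indices avoids a mutex, and `wg.Wait()` guarantees both results are visible before aggregation reads them.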
Challenge: Extracting flight parameters (cities, dates, prices) from natural language in multiple languages. Solution: Built comprehensive regex patterns and city synonym maps supporting airport codes (JFK → New York), multi-language variations (Londres → London), and price constraints.
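For example, a price-constraint extractor along these lines; the patterns shown are illustrative, not the orchestrator's actual regexes:

```go
package main

import (
	"regexp"
	"strconv"
)

// priceRe matches phrases like "under 200", "less than 150", "menos de 99.50".
var priceRe = regexp.MustCompile(`(?i)(?:under|below|less than|menos de|por debajo de)\s+(\d+(?:\.\d+)?)`)

// maxPrice returns the upper price bound found in a query, if any.
func maxPrice(query string) (float64, bool) {
	m := priceRe.FindStringSubmatch(query)
	if m == nil {
		return 0, false
	}
	p, err := strconv.ParseFloat(m[1], 64)
	if err != nil {
		return 0, false
	}
	return p, true
}
```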
Challenge: Providing live status updates during multi-step LLM processing. Solution: Implemented Server-Sent Events with proper event channels, allowing clients to see real-time progress through each LLM invocation.
Challenge: Ensuring the system can handle multiple simultaneous users without losing requests or mixing responses. Solution: Used Go's goroutines and proper channel management to handle concurrent requests independently, with each request getting its own SSE stream.
Challenge: Managing flight data with upsert operations and ensuring data consistency across restarts. Solution: Implemented upsert-based seeding that prevents duplicates while ensuring fresh data on each startup.
Challenge: Building a robust system that gracefully handles LLM failures, network issues, and partial responses. Solution: Implemented fallback mechanisms where if LLM3 aggregation fails, the system combines LLM1 and LLM2 responses directly.
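A sketch of that fallback path; the function names are illustrative:

```go
package main

import "context"

// aggregate merges the two styled answers via LLM3, falling back to a
// direct combination of both when the aggregation call fails.
func aggregate(ctx context.Context,
	callLLM func(context.Context, string) (string, error), formal, friendly string) string {

	combined, err := callLLM(ctx, "Combine these answers into one balanced reply:\n"+
		formal+"\n"+friendly)
	if err != nil {
		// Fallback: surface both perspectives rather than failing the request.
		return "Formal answer:\n" + formal + "\n\nFriendly answer:\n" + friendly
	}
	return combined
}
```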
```
cmd/
  server/            # main.go – HTTP + SSE + orchestration wiring
internal/
  db/                # MongoDB client, models & seed data
  llmclient/         # Thin wrapper around OpenAI ChatCompletion
  orchestrator/      # Core logic (detect flights, prompt LLMs, merge)
  sse/               # Minimal SSE helper
scripts/
  load_test.sh       # Concurrent request testing script
Dockerfile           # Builds the Go binary for prod
docker-compose.yml   # Mongo + server services
```
- Ability to connect to an actual GDS (Global Distribution System) and PSS (Passenger Service System) to get real flight information.
- Hot reload for Docker development so each code change is redeployed automatically.
- Simple web UI to make the system more user-friendly and visually clear.
- API rate monitoring.