RAG App — Minimal Speech Bot

Building for Ubuntu 24, driver versio 550.163.01.

NOTE: This is not a realtime model-based app

This is a minimal speech-in/speech-out bot using LiveKit Agents, modeled after conversify. It wires STT -> LLM -> TTS using OpenAI-compatible endpoints.

Prerequisites

Python 3.10+
LiveKit Server running (local or cloud)
An OpenAI-compatible LLM endpoint (e.g., SGLang, vLLM, Ollama w/ openai API)

Setup

Set local env:

cp .env.example .env.local

We will create a common docker image/container for both LLM and app.

cd rag_app
sudo docker build -t rapp:v0 .

But the base image I used, is missing cudnn (find / -name libcudnn_ops_infer.so* returns nothing!). We need to install it. Inside container

sudo docker run -it --gpus all --network host docker.io/library/rapp:v0
apt-get update
# try simple installation
apt install nvidia-cudnn

if you get "forbidden IP" errors, its probably because the ubuntu repositories are mentioned as http links and not https. We fix this manually.

# backup
cp /etc/apt/sources.list /etc/apt/sources.list.backup
nano /etc/apt/sources.list
# -> now replace every http with https 
apt update
apt install nvidia-cudnn
find / -name libcudnn_ops_infer.so*
# should return 
# /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8
# /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.4
# /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so
# /tmp/nvidia-cudnn/cuda/lib64/libcudnn_ops_infer.so.8
# /tmp/nvidia-cudnn/cuda/lib64/libcudnn_ops_infer.so.8.2.4
# /tmp/nvidia-cudnn/cuda/lib64/libcudnn_ops_infer.so

I wanted to make this part of Dockerfile but editing the sources.list via script would make it harder for future reader (including me).. I want to commit these changes into an image.

We don't setup open source & local instances for STT or TTS for now.

Auto (for TTS)

Using easiest method by directly using "Sarvam.ai" for TTS. So I signed-up and got the API-KEY. The plugin installation is handled in requirements.txt. Just save the KEY in .env.local. My final env file looks like this

LIVEKIT_API_KEY=APIBJ2NC...4
LIVEKIT_API_SECRET=0fOS0QOJ7...vikNlnyWjsinEOHe9Gv
LIVEKIT_URL=wss://test-os3..z5c.livekit.cloud
ASSEMBLYAI_API_KEY=f1cc818ea2..5d1e2cc36
SARVAM_API_KEY=sk_e5zjorg7_6...mMCMIut

FasterWhisper for STT

We write a wrapper (MinimalWhisperSTT) for the FasterWhisper method such that it inherits the structure of LiveKit's stt.STT.

As we use (Faster)Whisper-STT, or any other non-streaming models, we need to add a VAD or Voice Activity Detection model (e.g. silero here).

Update config.yaml with your endpoints and model names.

Now we'll launch 2 terminals - for LLM serving and main app.

Run

SGLang (for LLM) - Terminal 1

Added instructions for linux to Dockerfile. Build this

# Launch using
sudo docker run -it --gpus all --network host docker.io/library/rapp:v0
# if docker image exists, use 
# sudo docker run -it --gpus all --network host rapp_min_env:v1
# Inside container
bash scripts/run_llm.sh 
# Runs at http://127.0.0.1:30000

Note that --network host only works for linux. For other OS, -p 30000:30000 should work.

Test it using python+curl

$ python3 test_sglang.py
{'id': '2513443d7e2b480b97f5612772e3561e', 'object': 'chat.completion', 'created': 1759490544, 'model': 'qwen/qwen2.5-0.5b-instruct', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "<think>\nOkay, the user is asking about the capital of France. I know France's capital is Paris. But wait, maybe they're confused with another country? Let me confirm. The capital of France is indeed Paris. No, that's correct. I should state that clearly and maybe explain why it's the capital. Oh, and maybe mention that it's also the most populous city in the world. But the user just asked for the answer, so maybe just state the capital without extra details. Let me check the facts again. Yes, Paris is the capital and most populous city. So the answer is Paris.\n</think>\n\nThe capital of France is **Paris**.", 'reasoning_content': None, 'tool_calls': None}, 'logprobs': None, 'finish_reason': 'stop', 'matched_stop': 151645}], 'usage': {'prompt_tokens': 15, 'total_tokens': 152, 'completion_tokens': 137, 'prompt_tokens_details': None, 'reasoning_tokens': 0}, 'metadata': {'weight_version': 'default'}}

LiveKit - Terminal 2

Mounting local code for now

sudo docker run -it --gpus all -v /home/prashant/Documents/asmr/rag_app:/root/apps/rag_app --network host rapp_min_env:v1

Create a token (example):

lk token create \
  --api-key devkey --api-secret secret \
  --join --room my-first-room --identity user1 --valid-for 24h
# Returns
#Project URL: http://localhost:7880
#Access token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3NTk0MDMxOTIsImlkZW50aXR5IjoidXNlcjEiLCJpc3MiOiJkZXZrZXkiLCJuYW1lIjoidXNlcjEiLCJuYmYiOjE3NTkzMTY3OTIsInN1YiI6InVzZXIxIiwidmlkZW8iOnsicm9vbSI6Im15LWZpcnN0LXJvb20iLCJyb29tSm9pbiI6dHJ1ZX19.NBT7Dtna2T1jC8SdSQAQrXUjxjpE7R1_IqhgOoz1USQ
# Not sure where to use it. Just keep it for now.

Start playground

Sign up on https://cloud.livekit.io/
Next, start playground room here https://agents-playground.livekit.io/
Find the room id on right (e.g. playground-37zS-5BS9)
Obtain LIVEKIT credentials: Go to docs and on any of the code snippets, click on "Reveal API Key and secret". See below SS. Not sure if this is the best way.

Now paste the Livekit credentials in .env.local.

Run

# conda activate parakeet
# You have 4 options - 
lk cloud auth # Link your LiveKit Cloud project to the CLI
# You will be given a link to authenticate
# Rather using the same container 'rapp'
sudo docker run -it --gpus all --network host rapp_min_env:v1
python3 run.py connect --room=playground-kpLt-B1xk

Then visit the LiveKit Agents Playground and join the room to talk with the bot.

Notes

LLM uses OpenAI chat completions API. Configure llm.base_url, llm.api_key, and llm.model in config.yaml.
The default model of Whisper sucks. See SS below The one that actually listens and understands is "deepdml/faster-whisper-large-v3-turbo-ct2".

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

RAG App — Minimal Speech Bot

Prerequisites

Setup

Auto (for TTS)

FasterWhisper for STT

Run

SGLang (for LLM) - Terminal 1

LiveKit - Terminal 2

Run

Notes

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
scripts		scripts
.DS_Store		.DS_Store
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
config.yaml		config.yaml
requirements.txt		requirements.txt
run.py		run.py
test_sglang.py		test_sglang.py

License

pra-dan/minimal_livekit_app

Folders and files

Latest commit

History

Repository files navigation

RAG App — Minimal Speech Bot

Prerequisites

Setup

Auto (for TTS)

FasterWhisper for STT

Run

SGLang (for LLM) - Terminal 1

LiveKit - Terminal 2

Run

Notes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages