VoiceRAG #374

@placerda

Why are we doing this?
Voice is a natural way to interact with AI. By adding real-time voice to gpt-rag, we make retrieval-augmented assistants more engaging, accessible, and useful in scenarios like meetings, customer support, and live collaboration where hands-free or multilingual interaction is essential.

What does it do?

  • Voice-enabled RAG – Adds “speech in, speech out” to gpt-rag, letting users query enterprise knowledge by voice and receive spoken, retrieval-grounded responses.

  • Phone integration – Lets users call a phone number to talk to the assistant, and lets the assistant place outbound calls.

  • Realtime reasoning – Uses the Azure OpenAI GPT Realtime API for low-latency transcription, retrieval, and response synthesis over enterprise data sources.

  • Use cases – Meeting assistants, customer service bots, live Q&A in Teams, and multilingual knowledge agents.

  • Nice to have: Teams integration – Lets VoiceRAG join Microsoft Teams calls, capture live audio queries, and provide contextual answers in real time.
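
The "speech in, speech out" flow described above can be sketched as a single turn with four stages: transcribe, retrieve, generate, synthesize. This is only an illustrative sketch, not the gpt-rag implementation. The names `VoiceTurn`, `voice_rag_turn`, `DOCS`, and the `fake_*` and `keyword_retrieve` helpers are all hypothetical stand-ins, and it assumes the stages run sequentially, whereas a realtime API would typically stream audio within one session.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VoiceTurn:
    transcript: str       # what the user said
    sources: List[str]    # passages retrieved to ground the answer
    answer_text: str      # grounded answer as text
    answer_audio: bytes   # the same answer, synthesized to audio


def voice_rag_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    synthesize: Callable[[str], bytes],
) -> VoiceTurn:
    """Run one speech-in, speech-out turn through four pluggable stages."""
    transcript = transcribe(audio_in)
    sources = retrieve(transcript)
    answer_text = generate(transcript, sources)
    return VoiceTurn(transcript, sources, answer_text, synthesize(answer_text))


# Toy stand-ins (hypothetical) so the sketch runs without any external service.
DOCS = [
    "Support hours are 9am to 5pm on weekdays.",
    "Refunds are processed within 10 business days.",
]


def tokens(text: str) -> set:
    # Lowercase words with basic punctuation stripped.
    return {w.strip("?.,").lower() for w in text.split()}


def fake_transcribe(audio: bytes) -> str:
    # Assumption: the "audio" bytes are just the spoken words in UTF-8.
    return audio.decode("utf-8")


def keyword_retrieve(query: str) -> List[str]:
    # Return the single document sharing the most words with the query.
    q = tokens(query)
    return [max(DOCS, key=lambda d: len(q & tokens(d)))]


def fake_generate(question: str, sources: List[str]) -> str:
    # Stands in for the model: answer only from the retrieved passage.
    return f"According to our records: {sources[0]}"


def fake_synthesize(text: str) -> bytes:
    # Stands in for text-to-speech.
    return text.encode("utf-8")


turn = voice_rag_turn(
    b"When are refunds processed?",
    fake_transcribe, keyword_retrieve, fake_generate, fake_synthesize,
)
print(turn.answer_text)
```

Keeping each stage behind a plain callable means a phone or Teams front end could reuse the same turn logic by supplying only a different audio source and sink.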

Technical Guidelines

High Level Solution Architecture

Image

References

Image

Other
