Description
Why are we doing this?
Voice is a natural way to interact with AI. By adding real-time voice to gpt-rag, we make retrieval-augmented assistants more engaging, accessible, and useful in scenarios like meetings, customer support, and live collaboration where hands-free or multilingual interaction is essential.
What does it do?
- Voice-enabled RAG – Adds "speech in, speech out" to gpt-rag, letting users query enterprise knowledge by voice and receive spoken, retrieval-grounded responses.
- Phone Integration – Lets users call a phone number to interact with the assistant, and supports assistant-initiated outbound calls.
- Realtime reasoning – Uses the Azure OpenAI GPT Realtime API for low-latency transcription, retrieval, and response synthesis over enterprise data sources.
- Use cases – Meeting assistants, customer service bots, live Q&A in Teams, and multilingual knowledge agents.
- Nice to have: Teams integration – Lets VoiceRAG join Microsoft Teams calls, capture live audio queries, and provide contextual answers in real time.
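The realtime flow above could be wired up by configuring a Realtime API session that exposes retrieval as a callable tool. A minimal sketch follows, assuming the app sends a `session.update` event over the realtime connection; the tool name `search_knowledge_base` and its schema are illustrative assumptions, not an existing gpt-rag interface.

```python
import json


def build_session_update(instructions: str) -> dict:
    """Build a `session.update` event for a GPT Realtime API session.

    The retrieval tool below (`search_knowledge_base`) is a hypothetical
    function that the server-side app would implement against the gpt-rag
    retrieval layer; its name and schema are assumptions for illustration.
    """
    return {
        "type": "session.update",
        "session": {
            # Speech in, speech out: accept and produce both audio and text.
            "modalities": ["audio", "text"],
            "instructions": instructions,
            # Let the service detect the end of user speech (server-side VAD).
            "turn_detection": {"type": "server_vad"},
            # Expose retrieval as a tool the model can call mid-conversation,
            # so spoken answers stay grounded in enterprise content.
            "tools": [
                {
                    "type": "function",
                    "name": "search_knowledge_base",
                    "description": "Search enterprise documents and return grounded passages.",
                    "parameters": {
                        "type": "object",
                        "properties": {"query": {"type": "string"}},
                        "required": ["query"],
                    },
                }
            ],
        },
    }


event = build_session_update("Answer only from retrieved enterprise content.")
payload = json.dumps(event)  # sent as a text frame over the realtime connection
```

When the model emits a tool call for `search_knowledge_base`, the app would run the existing gpt-rag retrieval pipeline and stream the result back as a tool response, after which the model synthesizes the spoken answer.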
Technical Guidelines
High Level Solution Architecture
References
- Related IP:
Other
- https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/upgrade-your-voice-agent-with-azure-ai-voice-live-api/4458247
- Integrate Microsoft Teams real-time media bots via Graph Cloud Communications API or Bot Framework.
- https://learn.microsoft.com/en-us/microsoftteams/platform/bots/calls-and-meetings/calls-meetings-bots-overview
- https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/realtime-audio-webrtc