
Commit eacb892

Merge pull request #177 from jamesrochabrun/claude/update-openai-api-011CUydfJyQkbmowkGChjeoK
Update app to latest OpenAI API
2 parents b4d4745 + 178164b commit eacb892

33 files changed (+2955 −37 lines)

.gitignore

Lines changed: 2 additions & 0 deletions

@@ -6,7 +6,9 @@ DerivedData/
.swiftpm/configuration/registries.json
.swiftpm/xcode/package.xcworkspace/contents.xcworkspacedata
.netrc
+ Package.resolved

# Xcode Swift Package Manager
**/xcshareddata/swiftpm/
**/project.xcworkspace/xcshareddata/swiftpm/
+ **/xcshareddata/IDEWorkspaceChecks.plist

.swiftpm/xcode/package.xcworkspace/xcshareddata/IDEWorkspaceChecks.plist

Lines changed: 0 additions & 8 deletions
This file was deleted.

Examples/RealtimeExample/README.md

Lines changed: 267 additions & 0 deletions

# OpenAI Realtime API Example

This example demonstrates how to use SwiftOpenAI's Realtime API for bidirectional voice conversations with OpenAI's GPT-4o realtime models.

## Features

- Real-time bidirectional audio streaming
- Voice Activity Detection (VAD) for automatic turn-taking
- Audio transcription of both user and AI speech
- Function calling support
- Interrupt handling when the user starts speaking

## Requirements

- iOS 15+, macOS 12+, watchOS 9+
- Microphone permissions
- OpenAI API key

## Setup

### 1. Add Microphone Permission

Add the following to your `Info.plist`:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>We need access to your microphone for voice conversations with AI</string>
```

### 2. macOS Sandbox Configuration

If targeting macOS, enable the following in your target's Signing & Capabilities (the corresponding entitlement keys are sketched after this list):

- **App Sandbox**:
  - Outgoing Connections (Client) ✓
  - Audio Input ✓
- **Hardened Runtime**:
  - Audio Input ✓
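
For reference, ticking those boxes corresponds to the following standard Apple entitlement keys in the target's `.entitlements` file. Xcode writes these for you, so this is only a checklist, not something to add by hand:

```xml
<!-- App Sandbox with outgoing network connections and audio input -->
<key>com.apple.security.app-sandbox</key>
<true/>
<key>com.apple.security.network.client</key>
<true/>
<key>com.apple.security.device.audio-input</key>
<true/>
```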

## Usage

### Basic Example

```swift
import SwiftUI
import SwiftOpenAI

struct ContentView: View {
    let realtimeManager = RealtimeManager()
    @State private var isActive = false

    var body: some View {
        Button(isActive ? "Stop" : "Start") {
            isActive.toggle()
            if isActive {
                Task {
                    try? await realtimeManager.startConversation()
                }
            } else {
                Task {
                    await realtimeManager.stopConversation()
                }
            }
        }
    }
}

@RealtimeActor
final class RealtimeManager {
    private var session: OpenAIRealtimeSession?
    private var audioController: AudioController?

    func startConversation() async throws {
        // Initialize service
        let service = OpenAIServiceFactory.service(apiKey: "your-api-key")

        // Configure session
        let config = OpenAIRealtimeSessionConfiguration(
            inputAudioFormat: .pcm16,
            inputAudioTranscription: .init(model: "whisper-1"),
            instructions: "You are a helpful assistant",
            modalities: [.audio, .text],
            outputAudioFormat: .pcm16,
            voice: "shimmer"
        )

        // Create session
        session = try await service.realtimeSession(
            model: "gpt-4o-mini-realtime-preview-2024-12-17",
            configuration: config
        )

        // Setup audio
        audioController = try await AudioController(modes: [.playback, .record])

        // Handle microphone input
        Task {
            let micStream = try audioController!.micStream()
            for await buffer in micStream {
                if let base64Audio = AudioUtils.base64EncodeAudioPCMBuffer(from: buffer) {
                    await session?.sendMessage(
                        OpenAIRealtimeInputAudioBufferAppend(audio: base64Audio)
                    )
                }
            }
        }

        // Handle AI responses
        Task {
            for await message in session!.receiver {
                switch message {
                case .responseAudioDelta(let audio):
                    audioController?.playPCM16Audio(base64String: audio)
                case .inputAudioBufferSpeechStarted:
                    audioController?.interruptPlayback()
                default:
                    break
                }
            }
        }
    }

    func stopConversation() {
        audioController?.stop()
        session?.disconnect()
    }
}
```
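
If you prefer not to hard-code the key, a minimal alternative is to read it from the process environment. The `OPENAI_API_KEY` name below is just a convention, not something SwiftOpenAI requires:

```swift
import Foundation
import SwiftOpenAI

// A sketch: pull the API key from the environment instead of embedding it in source.
// Set OPENAI_API_KEY in your Xcode scheme's environment variables for local runs.
let apiKey = ProcessInfo.processInfo.environment["OPENAI_API_KEY"] ?? ""
let service = OpenAIServiceFactory.service(apiKey: apiKey)
```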

## Configuration Options

### Voice Options

- `alloy` - Neutral and balanced
- `echo` - Friendly and warm
- `shimmer` - Gentle and calming
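
To switch voices, pass the name you want when building the session configuration. This simply repeats the configuration from the Basic Example above with a different `voice` value:

```swift
// Same configuration as the Basic Example, selecting the "echo" voice instead.
let config = OpenAIRealtimeSessionConfiguration(
    inputAudioFormat: .pcm16,
    inputAudioTranscription: .init(model: "whisper-1"),
    instructions: "You are a helpful assistant",
    modalities: [.audio, .text],
    outputAudioFormat: .pcm16,
    voice: "echo"
)
```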

### Turn Detection

#### Server VAD (Voice Activity Detection)

```swift
turnDetection: .init(type: .serverVAD(
    prefixPaddingMs: 300,   // Audio to include before speech
    silenceDurationMs: 500, // Silence duration to detect end
    threshold: 0.5          // Activation threshold (0.0-1.0)
))
```

#### Semantic VAD

```swift
turnDetection: .init(type: .semanticVAD(
    eagerness: .medium // .low, .medium, or .high
))
```

### Modalities

```swift
modalities: [.audio, .text] // Both audio and text
modalities: [.text]         // Text only (disables audio)
```

## Handling Different Events

```swift
for await message in session.receiver {
    switch message {
    case .error(let error):
        print("Error: \(error ?? "Unknown")")

    case .sessionCreated:
        print("Session started")

    case .sessionUpdated:
        // Trigger first response if AI speaks first
        await session.sendMessage(OpenAIRealtimeResponseCreate())

    case .responseAudioDelta(let base64Audio):
        audioController.playPCM16Audio(base64String: base64Audio)

    case .inputAudioBufferSpeechStarted:
        // User started speaking, interrupt AI
        audioController.interruptPlayback()

    case .responseTranscriptDone(let transcript):
        print("AI said: \(transcript)")

    case .inputAudioTranscriptionCompleted(let transcript):
        print("User said: \(transcript)")

    case .responseFunctionCallArgumentsDone(let name, let args, let callId):
        print("Function \(name) called with: \(args)")
        // Handle function call and return result

    default:
        break
    }
}
```

## Function Calling

Add tools to your configuration:

```swift
let config = OpenAIRealtimeSessionConfiguration(
    tools: [
        .init(
            name: "get_weather",
            description: "Get the current weather for a location",
            parameters: [
                "type": "object",
                "properties": [
                    "location": [
                        "type": "string",
                        "description": "City name"
                    ]
                ],
                "required": ["location"]
            ]
        )
    ],
    toolChoice: .auto
)
```

Handle function calls in the message loop:

```swift
case .responseFunctionCallArgumentsDone(let name, let args, let callId):
    // Parse arguments and execute function
    let result = handleFunction(name: name, args: args)

    // Return result to OpenAI
    await session.sendMessage(
        OpenAIRealtimeConversationItemCreate(
            item: .init(role: "function", text: result)
        )
    )
```
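
`handleFunction` above is not part of SwiftOpenAI; it stands in for whatever your app does with the call. A minimal sketch, assuming the arguments arrive as a JSON string, might look like this:

```swift
import Foundation

// Hypothetical helper: decode the JSON arguments string and return a plain-text
// result to send back to the model. Replace the weather stub with real logic.
func handleFunction(name: String, args: String) -> String {
    guard
        let data = args.data(using: .utf8),
        let arguments = try? JSONSerialization.jsonObject(with: data) as? [String: Any]
    else {
        return "Could not parse arguments"
    }

    switch name {
    case "get_weather":
        let location = arguments["location"] as? String ?? "unknown location"
        return "It is 21°C and sunny in \(location)." // stubbed weather lookup
    default:
        return "Unsupported function: \(name)"
    }
}
```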

## Troubleshooting

### No Audio Output

- Check that `.playback` mode is included in AudioController initialization
- Verify audio permissions are granted
- Ensure `outputAudioFormat` is set to `.pcm16`

### No Microphone Input

- Check that `.record` mode is included in AudioController initialization
- Verify microphone permissions in Info.plist
- Check System Settings > Privacy & Security > Microphone
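
If the permissions look right but no audio ever arrives, it can help to query the system directly. This sketch uses plain AVFoundation (not a SwiftOpenAI API) to check, and if necessary request, microphone access:

```swift
import AVFoundation

// Check the current microphone authorization and request it if undetermined.
func checkMicrophoneAccess() {
    switch AVCaptureDevice.authorizationStatus(for: .audio) {
    case .authorized:
        print("Microphone access granted")
    case .notDetermined:
        AVCaptureDevice.requestAccess(for: .audio) { granted in
            print(granted ? "Microphone access granted" : "Microphone access denied")
        }
    default:
        print("Microphone access denied - enable it in System Settings")
    }
}
```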

### WebSocket Connection Fails

- Verify API key is correct
- Check that `openai-beta: realtime=v1` header is included (SwiftOpenAI handles this automatically)
- Ensure you're using a compatible model (gpt-4o-mini-realtime-preview or newer)

## Resources

- [OpenAI Realtime API Documentation](https://platform.openai.com/docs/api-reference/realtime)
- [SwiftOpenAI GitHub](https://github.com/jamesrochabrun/SwiftOpenAI)
