
Conversation

@drewdrewthis (Collaborator) commented on Dec 3, 2025:

  • Add AudioData type and utilities (audioFromFile, audioFromBase64, audioFromBuffer)
  • Add textToSpeech utility for TTS conversion
  • Add scenario.user.speak() and scenario.agent.speak() for TTS messages
  • Add voice option to userSimulatorAgent for audio output
  • Add audio option to judgeAgent for multimodal evaluation
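
As a rough sketch, the utilities listed above might be used like this; the import path, the async signatures, and the `voice` option name are assumptions, not the final API:

```ts
import { audioFromFile, audioFromBase64, textToSpeech } from "@langwatch/scenario";

// Build AudioData from a file on disk, from a base64 payload, or via TTS.
// (audioFromBuffer would work the same way for an in-memory Buffer.)
const clip = await audioFromFile("./fixtures/greeting.wav");
const decoded = audioFromBase64("UklGRiQAAABXQVZF..."); // truncated placeholder WAV payload
const synthesized = await textToSpeech("Hello! Can you help me with something?", {
  voice: "nova", // assumed option, mirroring the userSimulatorAgent example below
});
```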

Proposed API:

```ts
const result = await scenario.run({
  name: "fixed voice messages",
  description: "Test with predetermined voice messages",
  agents: [
    new VoiceAgent(),
    scenario.userSimulatorAgent({
      voice: "nova",
    }), // Simulator with voice output (not used in this fixed script)
    scenario.judgeAgent({
      criteria: ["Agent responds appropriately to greeting"],
      // audio: "transcribe" | true | undefined,
    }),
  ],
  script: [
    // Fixed user voice message via TTS
    scenario.user.speak("Hello! Can you help me with something?"),
    scenario.agent(), // Agent generates an audio response
    async (ctx) => {
      await saveConversationAudio(
        ctx,
        path.join(outputPath, `${StringUtils.kebabCase(ctx.config.name)}.wav`)
      );
    },
    scenario.judge(),
  ],
  setId,
});
```

- Add scenario.audio() script primitive for injecting audio
- multimodal-audio-to-audio: use audioFromFile() + message()
- multimodal-audio-to-text: use audioFromFile() + message()
- multimodal-voice-to-voice: showcase all voice patterns
  - scenario.user.speak() for fixed TTS
  - scenario.userSimulatorAgent({ voice }) for generated audio
  - scenario.judgeAgent({ audio: true }) for multimodal eval
- vegetarian-recipe-realtime: use audio judge option
- scenario-expert-realtime: use audio judge option
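
A minimal sketch of the audioFromFile() + message() pattern mentioned in the list above, assuming an AI SDK-style content-part shape for the injected message and reusing the VoiceAgent from the proposed-API snippet; none of these shapes are confirmed by this PR:

```ts
import * as scenario from "@langwatch/scenario";

// Prerecorded clip to inject as the user's turn.
const clip = await scenario.audioFromFile("./fixtures/question.wav");

const result = await scenario.run({
  name: "audio file injection",
  description: "User sends a prerecorded audio question",
  agents: [
    new VoiceAgent(), // the example agent under test from the snippet above
    scenario.judgeAgent({ criteria: ["Agent answers the spoken question"] }),
  ],
  script: [
    // Inject the clip as a fixed user message via message()
    scenario.message({
      role: "user",
      content: [{ type: "audio", data: clip }], // assumed content-part shape
    }),
    scenario.agent(),
    scenario.judge(),
  ],
});
```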

- Remove scenario.audio(); users should use message() for file injection
- Add audio mode options: 'transcribe' (Whisper) and true (multimodal)
- Use OpenAI directly when audio content is detected, for tool support
- Convert AI SDK tools to OpenAI function format (sketched below)
- Add StringUtils.kebabCase utility
- Update voice-to-voice conversation examples
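
For illustration, here is one way the AI SDK tool conversion and the kebabCase helper could look; this is a sketch of the general technique, not the PR's actual code (the structural tool type and the use of zod-to-json-schema are assumptions):

```ts
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

// Minimal structural view of an AI SDK tool (the real type lives in the `ai` package).
type AiSdkTool = { description?: string; parameters: z.ZodTypeAny };

// Convert an AI SDK tool into the { type: "function", function: { ... } } shape
// expected by OpenAI's chat.completions `tools` parameter.
function toOpenAIFunction(name: string, tool: AiSdkTool) {
  return {
    type: "function" as const,
    function: {
      name,
      description: tool.description,
      parameters: zodToJsonSchema(tool.parameters), // Zod schema -> JSON Schema
    },
  };
}

// One possible StringUtils.kebabCase implementation ("Fixed Voice Messages" -> "fixed-voice-messages").
const kebabCase = (s: string) =>
  s
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2")
    .replace(/[\s_]+/g, "-")
    .toLowerCase();
```
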
Review comment on the judge configuration:

```ts
scenario.judgeAgent({
  model: openai("gpt-4o"),
  criteria: ["The agent guesses the voice gender"],
  audio: true,
```
drewdrewthis (Collaborator, Author) suggested:

```diff
- audio: true,
+ transcribeOnly?: true
```
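
Purely as an illustration of the two option shapes being weighed here, neither of which is confirmed as the final API:

```ts
// Option A: a single `audio` field that doubles as a mode switch.
type JudgeAudioOptionA = {
  audio?: "transcribe" | true; // "transcribe" = judge a Whisper transcript; true = full multimodal eval
};

// Option B (the suggestion above): a dedicated flag instead of a mode union.
type JudgeAudioOptionB = {
  transcribeOnly?: boolean;
};
```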

Another review comment, on a doc-comment example:

```ts
 * ],
 * script: [
 *   user(),
 *   user("Help me with billing"), // Fixed text
```
drewdrewthis (Collaborator, Author) suggested:

```diff
- * user("Help me with billing"), // Fixed text
+ * user.text("Help me with billing"), // Fixed text
```

Otherwise it uses its default?
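
A small sketch of the distinction the suggestion is drawing; the helper names are still under discussion and not final:

```ts
import * as scenario from "@langwatch/scenario";

const script = [
  scenario.user(), // simulator-generated turn; falls back to the simulator's default modality
  scenario.user.text("Help me with billing"), // fixed text turn (the suggested spelling)
  scenario.user.speak("Help me with billing"), // fixed voice turn via TTS (added in this PR)
  scenario.agent(),
  scenario.judge(),
];
```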

@rogeriochaves force-pushed the main branch 2 times, most recently from 77a92af to 9fdb87c on December 16, 2025 at 15:54.

Successfully merging this pull request may close these issues.

Design and propose library API primitives for making voice a first-class citizen in Scenario
