
Conversation

@drewdrewthis (Collaborator) commented on Dec 3, 2025:

  • Add AudioData type and utilities (audioFromFile, audioFromBase64, audioFromBuffer)
  • Add textToSpeech utility for TTS conversion
  • Add scenario.user.speak() and scenario.agent.speak() for TTS messages
  • Add voice option to userSimulatorAgent for audio output
  • Add audio option to judgeAgent for multimodal evaluation
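
As a rough sketch, the utilities listed above might be used like this; the import path, the async signatures, and the `voice` option name are assumptions, not the final API:

```ts
import { audioFromFile, audioFromBase64, textToSpeech } from "@langwatch/scenario";

// Build AudioData from a file on disk, from a base64 payload, or via TTS.
// (audioFromBuffer would work the same way for an in-memory Buffer.)
const clip = await audioFromFile("./fixtures/greeting.wav");
const decoded = audioFromBase64("UklGRiQAAABXQVZF..."); // truncated placeholder WAV payload
const synthesized = await textToSpeech("Hello! Can you help me with something?", {
  voice: "nova", // assumed option, mirroring the userSimulatorAgent example below
});
```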

Proposed API:

```ts
const result = await scenario.run({
  name: "fixed voice messages",
  description: "Test with predetermined voice messages",
  agents: [
    new VoiceAgent(),
    scenario.userSimulatorAgent({
      voice: "nova",
    }), // Simulator with voice output (not used in this fixed script)
    scenario.judgeAgent({
      criteria: ["Agent responds appropriately to greeting"],
      // audio: "transcribe" | true | undefined,
    }),
  ],
  script: [
    // Fixed user voice message via TTS
    scenario.user.speak("Hello! Can you help me with something?"),
    scenario.agent(), // Agent generates an audio response
    async (ctx) => {
      await saveConversationAudio(
        ctx,
        path.join(outputPath, `${StringUtils.kebabCase(ctx.config.name)}.wav`)
      );
    },
    scenario.judge(),
  ],
  setId,
});
```

- Add scenario.audio() script primitive for injecting audio
- multimodal-audio-to-audio: use audioFromFile() + message()
- multimodal-audio-to-text: use audioFromFile() + message()
- multimodal-voice-to-voice: showcase all voice patterns
  - scenario.user.speak() for fixed TTS
  - scenario.userSimulatorAgent({ voice }) for generated audio
  - scenario.judgeAgent({ audio: true }) for multimodal eval
- vegetarian-recipe-realtime: use audio judge option
- scenario-expert-realtime: use audio judge option
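
A minimal sketch of the audioFromFile() + message() pattern mentioned in the list above, assuming an AI SDK-style content-part shape for the injected message and reusing the VoiceAgent from the proposed-API snippet; none of these shapes are confirmed by this PR:

```ts
import * as scenario from "@langwatch/scenario";

// Prerecorded clip to inject as the user's turn.
const clip = await scenario.audioFromFile("./fixtures/question.wav");

const result = await scenario.run({
  name: "audio file injection",
  description: "User sends a prerecorded audio question",
  agents: [
    new VoiceAgent(), // the example agent under test from the snippet above
    scenario.judgeAgent({ criteria: ["Agent answers the spoken question"] }),
  ],
  script: [
    // Inject the clip as a fixed user message via message()
    scenario.message({
      role: "user",
      content: [{ type: "audio", data: clip }], // assumed content-part shape
    }),
    scenario.agent(),
    scenario.judge(),
  ],
});
```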

- Remove scenario.audio(); users should use message() for file injection
- Add audio mode options: 'transcribe' (Whisper) and true (multimodal)
- Use OpenAI directly when audio content is detected, for tool support
- Convert AI SDK tools to OpenAI function format (sketched below)
- Add StringUtils.kebabCase utility
- Update voice-to-voice conversation examples
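
For illustration, here is one way the AI SDK tool conversion and the kebabCase helper could look; this is a sketch of the general technique, not the PR's actual code (the structural tool type and the use of zod-to-json-schema are assumptions):

```ts
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

// Minimal structural view of an AI SDK tool (the real type lives in the `ai` package).
type AiSdkTool = { description?: string; parameters: z.ZodTypeAny };

// Convert an AI SDK tool into the { type: "function", function: { ... } } shape
// expected by OpenAI's chat.completions `tools` parameter.
function toOpenAIFunction(name: string, tool: AiSdkTool) {
  return {
    type: "function" as const,
    function: {
      name,
      description: tool.description,
      parameters: zodToJsonSchema(tool.parameters), // Zod schema -> JSON Schema
    },
  };
}

// One possible StringUtils.kebabCase implementation ("Fixed Voice Messages" -> "fixed-voice-messages").
const kebabCase = (s: string) =>
  s
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2")
    .replace(/[\s_]+/g, "-")
    .toLowerCase();
```
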
Review comment on the judge configuration:

```ts
scenario.judgeAgent({
  model: openai("gpt-4o"),
  criteria: ["The agent guesses the voice gender"],
  audio: true,
```
drewdrewthis (Collaborator, Author) suggested:

```diff
- audio: true,
+ transcribeOnly?: true
```
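
Purely as an illustration of the two option shapes being weighed here, neither of which is confirmed as the final API:

```ts
// Option A: a single `audio` field that doubles as a mode switch.
type JudgeAudioOptionA = {
  audio?: "transcribe" | true; // "transcribe" = judge a Whisper transcript; true = full multimodal eval
};

// Option B (the suggestion above): a dedicated flag instead of a mode union.
type JudgeAudioOptionB = {
  transcribeOnly?: boolean;
};
```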

Another review comment, on a doc-comment example:

```ts
 * ],
 * script: [
 *   user(),
 *   user("Help me with billing"), // Fixed text
```
drewdrewthis (Collaborator, Author) suggested:

```diff
- * user("Help me with billing"), // Fixed text
+ * user.text("Help me with billing"), // Fixed text
```

Otherwise it uses its default?
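
A small sketch of the distinction the suggestion is drawing; the helper names are still under discussion and not final:

```ts
import * as scenario from "@langwatch/scenario";

const script = [
  scenario.user(), // simulator-generated turn; falls back to the simulator's default modality
  scenario.user.text("Help me with billing"), // fixed text turn (the suggested spelling)
  scenario.user.speak("Help me with billing"), // fixed voice turn via TTS (added in this PR)
  scenario.agent(),
  scenario.judge(),
];
```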

@rogeriochaves force-pushed the main branch 2 times, most recently from 77a92af to 9fdb87c on December 16, 2025 at 15:54.

Successfully merging this pull request may close these issues.

Design and propose library API primitives for making voice a first-class citizen in Scenario
