Voice Agents Overview¶
The Swarms framework supports the creation of interactive, voice-enabled agents through integration with the voice-agents package. These agents can perceive and respond using human-like speech, enabling hands-free interaction and more natural user experiences.
Core Features¶
- Real-time Streaming TTS: Leveraging OpenAI's TTS models to speak responses sentence-by-sentence as they are generated, minimizing latency.
- Differentiated Voices: Multiple voice profiles (alloy, onyx, nova, etc.) to give each agent in a swarm a unique personality.
- Speech-to-Text (STT): Integration for voice-based task inputs, allowing users to talk directly to their agent swarms.
- Seamless Integration: Works with the standard
Agentclass and complex multi-agent architectures like debates and sequential workflows.
Available Tutorials¶
In this section, you will find step-by-step guides on implementing voice capabilities:
- Single Speech Agent: Learn how to add real-time text-to-speech to a single standalone agent.
- Multi-Agent Speech Debate: A complex example showing two agents debating a topic using different voices, with optional voice-to-text input.
Getting Started¶
To use these features, you'll need to install the voice-agents package alongside swarms:
You'll also need a valid OPENAI_API_KEY to access the TTS and STT models.