Overview
Synthesizers are used to convert text into speech; this guide will show you how to configure and use synthesizers in Vocode.

Supported Synthesizers
Vocode currently supports the following synthesizers:

- Azure (Microsoft)
- Eleven Labs
- Rime
- Play.ht
- Coqui TTS
- GTTS (Google Text-to-Speech)
- Stream Elements
- Bark
Each synthesizer is configured with its corresponding SynthesizerConfig class.
Configuring Synthesizers
To use a synthesizer, you need to create a configuration object for the synthesizer you want to use. Here are some examples of how to create configuration objects for different synthesizers:

Example 1: Using Eleven Labs with a phone call
The ElevenLabsSynthesizerConfig.from_telephone_output_device() method creates a configuration object for the Eleven Labs synthesizer. It hardcodes values such as the sampling_rate and audio_encoding for compatibility with telephone output devices.
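A minimal sketch of what this looks like, assuming the module path and the api_key / voice_id parameters from the Vocode Python package (the voice ID below is a placeholder):

```python
import os

from vocode.streaming.models.synthesizer import ElevenLabsSynthesizerConfig

# from_telephone_output_device() hardcodes the sampling_rate and audio_encoding
# expected by telephone output devices, so only voice settings are passed here.
synthesizer_config = ElevenLabsSynthesizerConfig.from_telephone_output_device(
    api_key=os.environ["ELEVEN_LABS_API_KEY"],  # assumed environment variable name
    voice_id="YOUR_VOICE_ID",                   # placeholder voice ID
)
```

The resulting configuration object is then passed to whatever component sets up the phone call.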
Example 2: Using Azure in StreamingConversation locally
The AzureSynthesizerConfig.from_output_device() method creates a configuration object for the Azure synthesizer. It takes a speaker_output object as an argument and extracts the sampling_rate and audio_encoding from that output device.
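A minimal sketch, assuming the helper function and module paths from the Vocode Python package (Azure credentials are assumed to be configured separately):

```python
from vocode.helpers import create_streaming_microphone_input_and_speaker_output
from vocode.streaming.models.synthesizer import AzureSynthesizerConfig

# Use the default local microphone and speaker devices.
microphone_input, speaker_output = create_streaming_microphone_input_and_speaker_output(
    use_default_devices=True,
)

# from_output_device() extracts the sampling_rate and audio_encoding
# from the speaker_output device.
synthesizer_config = AzureSynthesizerConfig.from_output_device(speaker_output)
```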
When to Use Configs vs. Synthesizer Objects
- For everything except StreamingConversation, you must use configuration objects.
- For StreamingConversation, you can use the actual synthesizer object, but you still need to initialize it with a configuration object (see the sketch below).
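A minimal sketch of that second case, reusing the speaker_output from Example 2 and assuming the AzureSynthesizer import path and constructor from the Vocode Python package:

```python
from vocode.streaming.models.synthesizer import AzureSynthesizerConfig
from vocode.streaming.synthesizer.azure_synthesizer import AzureSynthesizer

# The synthesizer object is still built from a configuration object.
synthesizer_config = AzureSynthesizerConfig.from_output_device(speaker_output)
synthesizer = AzureSynthesizer(synthesizer_config)

# This synthesizer object is what you would hand to StreamingConversation;
# everywhere else (e.g. telephony), you pass only the configuration object.
```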
Synthesizer Comparisons
| Provider | Latency | Voice Cloning | Natural Sounding | Notes |
|---|---|---|---|---|
| Azure (Microsoft) | Low | No | | |
| Eleven Labs | High | Yes | | |
| Rime | Low | No | | |
| Play.ht | High | Yes | | |
| Coqui TTS | | | | Open source |
| GTTS | | | | |
| Stream Elements | | | | |
| Bark | | | | |
