Overview

Synthesizers are used to convert text into speech; this guide will show you how to configure and use synthesizers in Vocode.

Supported Synthesizers

Vocode currently supports the following synthesizers:

Azure (Microsoft)
Google
Eleven Labs
Rime
Play.ht
Coqui TTS
GTTS (Google Text-to-Speech)
Stream Elements
Bark

These synthesizers are defined using their respective configuration classes, which are subclasses of the SynthesizerConfig class.

Configuring Synthesizers

To use a synthesizer, you need to create a configuration object for the synthesizer you want to use. Here are some examples of how to create configuration objects for different synthesizers:

Example 1: Using Eleven Labs with a phone call

from vocode.streaming.telephony.hosted.inbound_call_server import InboundCallServer
from vocode.streaming.models.synthesizer import ElevenLabsSynthesizerConfig

server = InboundCallServer(
    ...
    synthesizer_config=ElevenLabsSynthesizerConfig.from_telephone_output_device(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        voice_id=os.getenv("YOUR VOICE ID")
    )
    ...
)

In this example, the ElevenLabsSynthesizerConfig.from_telephone_output_device() method is used to create a configuration object for the Eleven Labs synthesizer. The method hardcodes some values like the sampling_rate and audio_encoding for compatibility with telephone output devices.

Example 2: Using Azure in StreamingConversation locally

from vocode.streaming.models.synthesizer import AzureSynthesizerConfig
from vocode.helpers import create_microphone_input_and_speaker_output

microphone_input, speaker_output = create_microphone_input_and_speaker_output(
        streaming=True, use_default_devices=False
)

conversation = StreamingConversation(
    ...
    synthesizer=AzureSynthesizer(
        AzureSynthesizerConfig.from_output_device(speaker_output)
    ),
    ...
)

In this example, the AzureSynthesizerConfig.from_output_device() method is used to create a configuration object for the Azure synthesizer. The method takes a speaker_output object as an argument, and extracts the sampling_rate and audio_encoding from the output device.

When to Use Configs vs. Synthesizer Objects

For everything except StreamingConversation, you must use configuration objects.
For StreamingConversation, you can use the actual synthesizer object, but you still need to initialize it with a configuration object.

Synthesizer Comparisons

Provider	Latency	Voice Cloning	Notes
Azure (Microsoft)	Low	No
Google	Low	No
Eleven Labs	High	Yes
Rime	Low	No
Play.ht	High	Yes
Coqui TTS			Open source
GTTS
Stream Elements
Bark

Guides

Python

Using synthesizers

Overview

Supported Synthesizers

Configuring Synthesizers

Example 1: Using Eleven Labs with a phone call

Example 2: Using Azure in StreamingConversation locally

When to Use Configs vs. Synthesizer Objects

Synthesizer Comparisons

Guides

Python

​Overview

​Supported Synthesizers

​Configuring Synthesizers

​Example 1: Using Eleven Labs with a phone call

​Example 2: Using Azure in StreamingConversation locally

​When to Use Configs vs. Synthesizer Objects

​Synthesizer Comparisons

Overview

Supported Synthesizers

Configuring Synthesizers

Example 1: Using Eleven Labs with a phone call

Example 2: Using Azure in StreamingConversation locally

When to Use Configs vs. Synthesizer Objects

Synthesizer Comparisons