How it works - Vocode

Conversation orchestration as a service

To hold a back-and-forth conversation, a system has to do several things:

Stream and receive audio asynchronously
Generate responses and decide when to generate them
Handle inaccuracies and interruptions

All of this is done by orchestrating:

Speech Recognition
AI/NLU Layer
Speech Synthesis

Vocode conveniently abstracts away much of the complexity while giving developers the flexibility to control every piece of the conversation.
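To make the orchestration concrete, here is a minimal sketch of the three layers running concurrently and passing work through queues. It is not Vocode's implementation or API: the function names (`transcribe`, `respond`, `synthesize`, `run_pipeline`) are hypothetical, and plain strings stand in for audio so the example runs without any provider.

```python
import asyncio


async def transcribe(audio_in: asyncio.Queue, text_out: asyncio.Queue) -> None:
    # Stand-in for speech recognition: each "audio chunk" is already a string.
    while (chunk := await audio_in.get()) is not None:
        await text_out.put(chunk.strip().lower())
    await text_out.put(None)  # propagate end-of-stream to the next stage


async def respond(text_in: asyncio.Queue, reply_out: asyncio.Queue) -> None:
    # Stand-in for the AI/NLU layer: produce one reply per utterance.
    while (utterance := await text_in.get()) is not None:
        await reply_out.put(f"you said: {utterance}")
    await reply_out.put(None)


async def synthesize(reply_in: asyncio.Queue, spoken: list) -> None:
    # Stand-in for speech synthesis: wrap the reply in a fake audio marker.
    while (reply := await reply_in.get()) is not None:
        spoken.append(f"<audio:{reply}>")


async def run_pipeline(chunks: list) -> list:
    audio_q, text_q, reply_q = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    spoken = []
    for chunk in chunks:
        audio_q.put_nowait(chunk)
    audio_q.put_nowait(None)  # None marks the end of the input stream
    # All three stages run concurrently, each waiting on its input queue.
    await asyncio.gather(
        transcribe(audio_q, text_q),
        respond(text_q, reply_q),
        synthesize(reply_q, spoken),
    )
    return spoken


result = asyncio.run(run_pipeline([" Hello ", "BYE"]))
```

Because each stage only waits on its own queue, a slow stage does not block the others from receiving input, which is the property that asynchronous streaming needs. Interruption handling is left out of this sketch.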

Our core abstraction: the Conversation

Vocode breaks down a Conversation into 5 core pieces:

Transcriber (used for speech recognition)
Agent (AI/NLU layer)
Synthesizer (used for speech synthesis)
Input Device (microphone for audio in)
Output Device (speaker for audio out)

To run an entire conversation, developers specify each of these 5 pieces using the types Vocode provides. For example, there are several Transcriber options (e.g. DeepgramTranscriber, AssemblyAITranscriber, GoogleTranscriber) that let you choose which provider to use and set its parameters. Once every piece is specified, Vocode handles everything else needed to run the conversation.
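The sketch below illustrates the idea of assembling a conversation from five swappable pieces. It deliberately does not use Vocode's real classes or signatures: `Conversation`, `Utf8Transcriber`, `EchoAgent`, and `Utf8Synthesizer` are hypothetical stand-ins, and byte strings take the place of audio so the example is self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Protocol


class Transcriber(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Agent(Protocol):
    def respond(self, text: str) -> str: ...


class Synthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class Utf8Transcriber:
    # Hypothetical transcriber: treats the "audio" bytes as UTF-8 text.
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoAgent:
    # Hypothetical agent: repeats what it heard.
    def respond(self, text: str) -> str:
        return f"Echo: {text}"


class Utf8Synthesizer:
    # Hypothetical synthesizer: encodes the reply text back into bytes.
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


@dataclass
class Conversation:
    input_device: Iterable[bytes]           # stands in for a microphone
    output_device: Callable[[bytes], None]  # stands in for a speaker
    transcriber: Transcriber
    agent: Agent
    synthesizer: Synthesizer

    def run(self) -> None:
        # Audio in -> text -> reply -> audio out, once per input chunk.
        for audio in self.input_device:
            text = self.transcriber.transcribe(audio)
            reply = self.agent.respond(text)
            self.output_device(self.synthesizer.synthesize(reply))


played: List[bytes] = []
conversation = Conversation(
    input_device=[b"hello"],
    output_device=played.append,
    transcriber=Utf8Transcriber(),
    agent=EchoAgent(),
    synthesizer=Utf8Synthesizer(),
)
conversation.run()
```

Swapping a provider in this sketch means passing a different object for one field; the rest of the conversation is unchanged.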
