Conversation orchestration as a service
In order to have a back-and-forth conversation, you have to do several things:
- Stream and receive audio asynchronously
- Generate responses and understand when to generate them
- Handle inaccuracies and interruptions
All of this has to be orchestrated across:
- Speech Recognition
- AI/NLU Layer
- Speech Synthesis
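As a rough illustration of what chaining speech recognition, an AI/NLU layer, and speech synthesis asynchronously can look like, here is a minimal sketch built on Python's `asyncio` queues. The functions `recognize`, `respond`, and `synthesize`, and the helpers `stage` and `run_pipeline`, are hypothetical stand-ins invented for this example; they are not part of Vocode's API, and the sketch does not attempt interruption handling.

```python
import asyncio
from typing import Callable, List, Optional


# Hypothetical stand-ins for the three stages (not Vocode's API).
def recognize(audio_chunk: bytes) -> str:
    """Toy speech recognition: pretend the audio bytes are UTF-8 text."""
    return audio_chunk.decode("utf-8")


def respond(transcript: str) -> str:
    """Toy AI/NLU layer: echo the transcript back."""
    return f"You said: {transcript}"


def synthesize(text: str) -> bytes:
    """Toy speech synthesis: pretend the encoded text is audio."""
    return text.encode("utf-8")


async def stage(fn: Callable, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = await inbox.get()
        if item is None:
            # Sentinel: pass the shutdown signal downstream and stop.
            await outbox.put(None)
            return
        await outbox.put(fn(item))


async def run_pipeline(chunks: List[bytes]) -> List[bytes]:
    """Push audio chunks through the three stages, which run concurrently."""
    audio_in: asyncio.Queue = asyncio.Queue()
    transcripts: asyncio.Queue = asyncio.Queue()
    replies: asyncio.Queue = asyncio.Queue()
    audio_out: asyncio.Queue = asyncio.Queue()

    workers = [
        asyncio.create_task(stage(recognize, audio_in, transcripts)),
        asyncio.create_task(stage(respond, transcripts, replies)),
        asyncio.create_task(stage(synthesize, replies, audio_out)),
    ]

    for chunk in chunks:
        await audio_in.put(chunk)
    await audio_in.put(None)  # signal end of input
    await asyncio.gather(*workers)

    # All stages have finished, so audio_out holds every reply plus the sentinel.
    spoken: List[bytes] = []
    while True:
        item: Optional[bytes] = audio_out.get_nowait()
        if item is None:
            break
        spoken.append(item)
    return spoken


if __name__ == "__main__":
    print(asyncio.run(run_pipeline([b"hello", b"goodbye"])))
```

Each stage only talks to its neighbours through a queue, so a slow stage does not block the others from accepting work. A real system would additionally need a way to cancel in-flight synthesis when the speaker interrupts.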
Our core abstraction: the Conversation
Vocode breaks down a Conversation into 5 core pieces:
- Transcriber (used for speech recognition)
- Agent (AI/NLU layer)
- Synthesizer (used for speech synthesis)
- Input Device (microphone for audio in)
- Output Device (speaker for audio out)
Vocode provides Transcriber options (e.g. DeepgramTranscriber, AssemblyAITranscriber, GoogleTranscriber) that allow you to specify which providers you would like to use and their parameters.
After you specify all of the types, Vocode handles everything else necessary
to have the conversation.
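To make the idea of "specify the types, and the conversation handles the rest" concrete, here is a minimal sketch of a conversation composed from five swappable pieces. Every class below (`EchoTranscriber`, `UppercaseAgent`, `BytesSynthesizer`, `ListInput`, `ListOutput`, and this toy `Conversation`) is a hypothetical illustration written for this example; none of them reflects Vocode's actual classes, constructors, or parameters.

```python
from typing import Iterable, List, Protocol


# Interfaces for the five pieces. These are assumptions for the sketch only.
class Transcriber(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Agent(Protocol):
    def respond(self, transcript: str) -> str: ...


class Synthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class InputDevice(Protocol):
    def read(self) -> Iterable[bytes]: ...


class OutputDevice(Protocol):
    def play(self, audio: bytes) -> None: ...


# Toy implementations standing in for real providers.
class EchoTranscriber:
    def __init__(self, encoding: str = "utf-8") -> None:
        self.encoding = encoding  # provider-specific parameter

    def transcribe(self, audio: bytes) -> str:
        return audio.decode(self.encoding)


class UppercaseAgent:
    def respond(self, transcript: str) -> str:
        return transcript.upper()


class BytesSynthesizer:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class ListInput:
    def __init__(self, chunks: List[bytes]) -> None:
        self.chunks = chunks

    def read(self) -> Iterable[bytes]:
        return iter(self.chunks)


class ListOutput:
    def __init__(self) -> None:
        self.played: List[bytes] = []

    def play(self, audio: bytes) -> None:
        self.played.append(audio)


class Conversation:
    """Wires the five pieces together; callers only choose the pieces."""

    def __init__(self, input_device: InputDevice, transcriber: Transcriber,
                 agent: Agent, synthesizer: Synthesizer,
                 output_device: OutputDevice) -> None:
        self.input_device = input_device
        self.transcriber = transcriber
        self.agent = agent
        self.synthesizer = synthesizer
        self.output_device = output_device

    def run(self) -> None:
        for chunk in self.input_device.read():
            transcript = self.transcriber.transcribe(chunk)
            reply = self.agent.respond(transcript)
            self.output_device.play(self.synthesizer.synthesize(reply))


speaker = ListOutput()
conversation = Conversation(
    input_device=ListInput([b"hi there"]),
    transcriber=EchoTranscriber(encoding="utf-8"),
    agent=UppercaseAgent(),
    synthesizer=BytesSynthesizer(),
    output_device=speaker,
)
conversation.run()
```

Because `Conversation` depends only on the small interfaces, swapping one provider for another means changing a single constructor argument while the orchestration loop stays untouched.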