Introduction

You can use Vocode to interact with open-source transcription, large language, and synthesis models. Many of these models have been optimized to run on CPU, which means you can have a conversation with an AI locally, without an Internet connection (and thus for free!). Disclaimer: many of these models are optimized for Apple Silicon, so this may work best on an M1 or M2 Mac.

Setting up the conversation

Start by copying the StreamingConversation quickstart. This example uses Deepgram for transcription, ChatGPT as the LLM, and Azure for synthesis; we'll replace each piece with a corresponding open-source model.
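In the quickstart, microphone_input, speaker_output, and logger are set up before the conversation is constructed. Here is a condensed sketch of that setup; the import paths follow the pattern used later in this guide, and the exact helper parameters may differ between Vocode versions, so treat the quickstart file itself as the source of truth:
import logging

from vocode.helpers import create_streaming_microphone_input_and_speaker_output
from vocode.streaming.streaming_conversation import StreamingConversation
from vocode.streaming.models.message import BaseMessage
from vocode.streaming.models.transcriber import DeepgramTranscriberConfig
from vocode.streaming.models.agent import ChatGPTAgentConfig
from vocode.streaming.models.synthesizer import AzureSynthesizerConfig
from vocode.streaming.transcriber.deepgram_transcriber import DeepgramTranscriber
from vocode.streaming.agent.chat_gpt_agent import ChatGPTAgent
from vocode.streaming.synthesizer.azure_synthesizer import AzureSynthesizer

logging.basicConfig()
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

# Use the system default microphone and speaker for the conversation's audio I/O.
microphone_input, speaker_output = create_streaming_microphone_input_and_speaker_output(
    use_default_devices=True,
)
With those pieces in place, the quickstart constructs the conversation: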
conversation = StreamingConversation(
    output_device=speaker_output,
    transcriber=DeepgramTranscriber(
        DeepgramTranscriberConfig.from_input_device(microphone_input)
    ),
    agent=ChatGPTAgent(
        ChatGPTAgentConfig(
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="The AI is having a pleasant conversation about life"
        )
    ),
    synthesizer=AzureSynthesizer(
        AzureSynthesizerConfig.from_output_device(speaker_output)
    ),
    logger=logger,
)

Whisper.cpp

Follow the steps in the whisper.cpp repo to download one of the models. As of now (2023/05/01), here's an example flow to do this:
  1. Clone the whisper.cpp repo
  2. From the whisper.cpp directory, run:
./models/download-ggml-model.sh tiny.en
make
Find your (absolute) paths for the whisper.cpp shared library file and the model you've just downloaded; you can sanity-check both with the snippet after this list. If whisper.cpp is cloned at /whisper.cpp, the paths from the previous example would be:
  • /whisper.cpp/libwhisper.so
  • /whisper.cpp/models/ggml-tiny.bin
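Before wiring these into Vocode, it's worth confirming that both paths resolve. Here's a quick check using only the Python standard library (substitute your own paths):
import ctypes
import os

libname = "/whisper.cpp/libwhisper.so"             # shared library path
fname_model = "/whisper.cpp/models/ggml-tiny.bin"  # downloaded model path

assert os.path.isfile(fname_model), f"model not found: {fname_model}"
ctypes.CDLL(libname)  # raises OSError if the library cannot be loaded
print("whisper.cpp library and model look good")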
Set up your streaming WhisperCPPTranscriber in StreamingConversation as follows:
from vocode.streaming.models.transcriber import WhisperCPPTranscriberConfig
from vocode.streaming.transcriber.whisper_cpp_transcriber import WhisperCPPTranscriber

StreamingConversation(
    ...
    transcriber=WhisperCPPTranscriber(
        WhisperCPPTranscriberConfig.from_input_device(
            microphone_input,
            libname="/whisper.cpp/libwhisper.so",
            fname_model="/whisper.cpp/models/ggml-tiny.bin",
        )
    ),
    ...
)

GPT4All

Install the pygpt4all package by running:
pip install pygpt4all
Download the latest GPT4All-J model from the pygpt4all repo. As of today (2023/05/01), you can download it by visiting: https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin
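If you'd rather script the download, here's a minimal sketch using only the Python standard library (the URL is the one above; the destination filename is your choice):
import urllib.request

MODEL_URL = "https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin"
MODEL_PATH = "ggml-gpt4all-j-v1.3-groovy.bin"  # multi-gigabyte download

# Fetch the model weights to the local path above.
urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
print(f"saved model to {MODEL_PATH}")
Set up your agent in StreamingConversation as follows: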
from vocode.streaming.models.agent import GPT4AllAgentConfig
from vocode.streaming.agent.gpt4all_agent import GPT4AllAgent

StreamingConversation(
    ...
    agent=GPT4AllAgent(
        GPT4AllAgentConfig(
            model_path="path/to/ggml-gpt4all-j-...-.bin",
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="The AI is having a pleasant conversation about life"
        )
    ),
    ...
)

Llama.cpp

You can use any model supported by llama.cpp with Vocode. This includes LLaMA, Alpaca, Vicuna, Koala, WizardLM, and more. We will use NousResearch/Nous-Hermes-13b in this example because it currently ranks highly on HuggingFace's Open LLM Leaderboard. Our implementation is built on top of langchain, which integrates with llama.cpp through llama-cpp-python. Install llama-cpp-python by running the following:
pip install llama-cpp-python
or run the following to install it with support for offloading model layers to a GPU via CUDA:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
llama-cpp-python provides installation commands for other BLAS backends as well; see its README for details. Set up your agent in StreamingConversation as follows:
from vocode.streaming.models.agent import LlamacppAgentConfig
from vocode.streaming.agent.llamacpp_agent import LlamacppAgent

StreamingConversation(
    ...
    agent=LlamacppAgent(
        LlamacppAgentConfig(
            prompt_preamble="The AI is having a pleasant conversation about life",
            llamacpp_kwargs={"model_path": "path/to/nous-hermes-13b.ggmlv3.q4_0.bin", "verbose": True},
            prompt_template="alpaca",
            initial_message=BaseMessage(text="Hello!"),
        )
    ),
    ...
)
You can add the key n_gpu_layers to the llamacpp_kwargs to offload some of the model's layers to a GPU.
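To confirm the install (and GPU offloading, if you enabled it) works independently of Vocode, here's a minimal standalone sketch against llama-cpp-python's Llama class; the model path and layer count are placeholders, so adjust them for your setup:
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/nous-hermes-13b.ggmlv3.q4_0.bin",
    n_gpu_layers=32,  # 0 keeps all layers on the CPU
)
# Nous-Hermes-13b uses Alpaca-style prompts, matching prompt_template="alpaca" above.
output = llm("### Instruction:\nSay hello.\n\n### Response:\n", max_tokens=32)
print(output["choices"][0]["text"])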

Coqui TTS

Install the Coqui TTS package by running:
pip install TTS
See the Coqui TTS repo for more instructions in case you run into bugs. Find which open-source speech synthesis model you'd like to use. One way to do this is to run:
tts --list_models
For this example, we'll use Tacotron2. Set up your synthesizer in StreamingConversation as follows:
from vocode.streaming.models.synthesizer import CoquiTTSSynthesizerConfig
from vocode.streaming.synthesizer.coqui_tts_synthesizer import CoquiTTSSynthesizer

StreamingConversation(
    ...
    synthesizer=CoquiTTSSynthesizer(
        CoquiTTSSynthesizerConfig.from_output_device(
            speaker_output,
            tts_kwargs={
                "model_name": "tts_models/en/ljspeech/tacotron2-DDC_ph"
            }
        )
    ),
    ...
)
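To hear the model on its own before plugging it into the conversation, you can synthesize a clip directly with the TTS Python API (a sketch; the API surface may vary between TTS versions):
from TTS.api import TTS

# Load the same model passed to the synthesizer above and write a test clip.
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC_ph")
tts.tts_to_file(text="Hello from Coqui TTS!", file_path="hello.wav")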

Run the conversation

Putting this all together, our StreamingConversation instance looks like:
conversation = StreamingConversation(
    output_device=speaker_output,
    transcriber=WhisperCPPTranscriber(
        WhisperCPPTranscriberConfig.from_input_device(
            microphone_input,
            libname="path/to/whisper.cpp/libwhisper.so",
            fname_model="path/to/whisper.cpp/models/ggml-tiny.bin",
        )
    ),
    agent=GPT4AllAgent(
        GPT4AllAgentConfig(
            model_path="path/to/ggml-...-.bin",
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="The AI is having a pleasant conversation about life"
        )
    ),
    synthesizer=CoquiTTSSynthesizer(
        CoquiTTSSynthesizerConfig.from_output_device(
            speaker_output,
            tts_kwargs={
                "model_name": "tts_models/en/ljspeech/tacotron2-DDC_ph"
            }
        )
    ),
    logger=logger,
)
Start the conversation by running:
python quickstarts/streaming_conversation.py