[Beta] Benchmarking script

The benchmarking script is located at playground/streaming/benchmark.py. You can run it via the CLI to evaluate and compare transcribers, agents, and synthesizers. It is primarily intended for benchmarking latency, but it can also be used to compare the quality of different providers. The feature is in Beta and will continue to be improved, so feel free to open an issue with any ideas.

Using the CLI

To access the options of the benchmarking script, run

python playground/streaming/benchmark.py --help

This will display all available options.

To conduct multiple trials and get averaged results, set the number of cycles with one of the following options:

--{transcriber,agent,synthesizer}_num_cycles 3  # component specific
--all_num_cycles 3                              # all components
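
For example, to run each synthesizer trial three times (using the --synthesizers flag described below):

python playground/streaming/benchmark.py --synthesizers Google Azure --synthesizer_num_cycles 3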

To perform a comprehensive test across all supported transcribers, agents, and synthesizers, use the --all flag.
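
For example, to benchmark every supported component with three cycles each:

python playground/streaming/benchmark.py --all --all_num_cycles 3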

With the CLI, you can view raw output, write results to a file, and create graphs. Results are stored in the benchmark_results directory by default; you can change this location with the --results_dir and --results_file options. To generate visual graphs, add the --create_graphs option when running your test.
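
For example, to run the full suite, write results to a custom directory (my_benchmark_results is just an illustrative name), and generate graphs:

python playground/streaming/benchmark.py --all --results_dir my_benchmark_results --create_graphs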

Example: comparing synthesizers

To compare different synthesizers, use the --synthesizers flag followed by the names of the synthesizers you wish to compare. For instance,

python playground/streaming/benchmark.py --synthesizers Google Azure --synthesizer_text "Your text here"

Example: comparing transcribers

To compare different transcribers, use the --transcribers flag followed by the names of the transcribers you wish to compare. For example,

python playground/streaming/benchmark.py --transcribers deepgram assemblyai --transcriber_audio sample.wav

You can pass --transcriber_use_mic instead of --transcriber_audio to use your microphone as the audio source.
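
For example, to benchmark a transcriber against live microphone input:

python playground/streaming/benchmark.py --transcribers deepgram --transcriber_use_mic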

Example: comparing agents

To compare different agents, use the --agents flag followed by the names of the agents you want to compare. For example,

python playground/streaming/benchmark.py --agents openai anthropic

You can set the prompt preamble with the --agent_prompt_preamble argument and the first input with the --agent_first_input option.
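
For example (the prompt and first-input values here are only illustrative):

python playground/streaming/benchmark.py --agents openai anthropic --agent_prompt_preamble "You are a helpful assistant" --agent_first_input "Hello there"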

Tracing your application

At the top of quickstarts/streaming_conversation.py, include the following code:

from collections import defaultdict

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    SimpleSpanProcessor,
    SpanExporter,
    SpanExportResult,
)
from opentelemetry.sdk.resources import Resource


class PrintDurationSpanExporter(SpanExporter):
    def __init__(self):
        super().__init__()
        # Map each span name to the list of its durations in seconds
        self.spans = defaultdict(list)

    def export(self, spans):
        # Record the duration of each completed span, grouped by span name
        for span in spans:
            duration_ns = span.end_time - span.start_time
            duration_s = duration_ns / 1e9
            self.spans[span.name].append(duration_s)
        return SpanExportResult.SUCCESS

    def shutdown(self):
        # Print the average duration (in seconds) for each span name
        for name, durations in self.spans.items():
            print(f"{name}: {sum(durations) / len(durations)}")


trace.set_tracer_provider(TracerProvider(resource=Resource.create({})))
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(PrintDurationSpanExporter())
)

This will print the average duration of each traced operation after the conversation ends.