Building voice agents has become significantly easier with our new LiveKit integration. What does this mean for developers? You can now develop and deploy voice agents in Yoruba, Igbo, Hausa, Amharic, or English in under five minutes, bringing natural-sounding African voice capabilities to your LiveKit applications.
In this post, we’ll walk through how to integrate Spitch’s STT (speech-to-text) and TTS (text-to-speech) models into LiveKit. Before we dive in, here are the prerequisites you’ll need to follow along.
Prerequisites
A Spitch API key
A LiveKit API key, API secret, and server URL
A Python IDE
Optional: an OpenAI API key (used for the LLM in the final pipeline)
Setting up your environment variables and modules
Install the following Python packages on your local machine:

pip install spitch \
  "livekit-agents[spitch,openai,silero,turn-detector]~=1.0" \
  "livekit-plugins-noise-cancellation~=0.2" \
  "python-dotenv"
Set up your environment variables (for example, in a .env file) with the following secrets:
SPITCH_API_KEY=<Your Spitch API Key>
LIVEKIT_API_KEY=<your API Key>
LIVEKIT_API_SECRET=<your API Secret>
LIVEKIT_URL=<your URL>
OPENAI_API_KEY=<Your OpenAI API Key>
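If you want to confirm these variables are being picked up before wiring anything together, here is a quick sanity check (assuming the values live in a .env file in your working directory, which python-dotenv loads):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
for key in ("SPITCH_API_KEY", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL"):
    assert os.getenv(key), f"{key} is not set"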
Speech‑to‑Text (STT) Integration
Spitch STT currently supports five languages (Yoruba, Igbo, Hausa, Amharic, and English), with more on the way. To get started with our STT API in LiveKit, use the code sample below, and feel free to check out our docs for more information.
from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import (
    spitch,
    noise_cancellation,
)

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=spitch.STT(language="en")
        # , llm, tts, ...
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=RoomInputOptions(
            # LiveKit Cloud enhanced noise cancellation
            # - If self-hosting, omit this parameter
            # - For telephony applications, use `BVCTelephony` for best results
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
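Switching the transcription language is a one-line change. Assuming the language codes from our docs ("yo" for Yoruba, "ig" for Igbo, "ha" for Hausa, "am" for Amharic, "en" for English), a Yoruba transcriber would look like this:

session = AgentSession(
    stt=spitch.STT(language="yo")  # transcribe Yoruba speech instead of English
    # , llm, tts, ...
)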
Text‑to‑Speech (TTS) Integration
Spitch TTS supports the same five languages and 22 voices, with more languages and voices coming soon. To get started with our TTS API in LiveKit, use the code sample below, and feel free to check out our docs for more information.
from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import (
    spitch,
    noise_cancellation,
)

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        tts=spitch.TTS(language="en", voice="kani")
        # , stt, llm, ...
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=RoomInputOptions(
            # LiveKit Cloud enhanced noise cancellation
            # - If self-hosting, omit this parameter
            # - For telephony applications, use `BVCTelephony` for best results
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
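The voice parameter selects one of the 22 available voices, and each voice belongs to a particular language. For example, to have the agent speak Yoruba (the voice name below is illustrative; see our docs for the current voice list per language):

session = AgentSession(
    tts=spitch.TTS(language="yo", voice="sade")  # "sade" is an example voice name
    # , stt, llm, ...
)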
Integrating an STT-LLM-TTS pipeline with Spitch, OpenAI, and LiveKit

This pipeline takes voice as input and uses Spitch STT to transcribe the audio. The transcript is then passed to an LLM for processing through predefined prompts, and the LLM’s output is converted back to speech with Spitch TTS.
This pipeline can power multilingual voice agents across many use cases simply by changing the LLM prompt and the language. To implement it, use the code below.
import sys
import asyncio

# On Windows, force the selector event loop policy; the default proactor
# loop can cause issues with some audio tooling. This is a no-op elsewhere.
if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import (
    spitch,
    openai,
    noise_cancellation,
    silero,
)
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=spitch.STT(language="en"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=spitch.TTS(language="en", voice="kani"),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=RoomInputOptions(
            # LiveKit Cloud enhanced noise cancellation
            # - If self-hosting, omit this parameter
            # - For telephony applications, use `BVCTelephony` for best results
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
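To try the pipeline locally, save the script (as agent.py, say; the filename is just an example) and use the CLI that agents.cli.run_app provides:

python agent.py download-files   # fetch model weights for the turn detector and VAD
python agent.py console          # chat with the agent directly in your terminal
python agent.py dev              # connect the agent to your LiveKit project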
Concluding Remarks
Now that you are armed with all the resources needed to build voice agents with Spitch, it's time to put that knowledge into production! Feel free to experiment with the different voices and languages on our platform, and don't be a stranger to our docs. We can’t wait to see what you build!