Building voice agents has become significantly easier with our new LiveKit integration. What does this mean for developers? You can now develop and deploy voice agents in Yoruba, Igbo, Hausa, Amharic, or English in under five minutes, bringing natural-sounding African voice capabilities to your LiveKit applications.
In this post, we’ll walk through how to integrate Spitch’s STT (speech-to-text) and TTS (text-to-speech) models into LiveKit. Before we dive in, here are the prerequisites you’ll need to follow along.
Prerequisites
A Spitch API key
A LiveKit API key, API secret, and server URL
A Python IDE
Optional: an OpenAI API key (used for the LLM in the final pipeline)
Setting up your environment variables and modules
Install the following Python packages on your local machine:

pip install spitch \
  "livekit-agents[spitch,openai,silero,turn-detector]~=1.0" \
  "livekit-plugins-noise-cancellation~=0.2" \
  "python-dotenv"
Set up your environment variables (for example, in a .env file) with the following secrets:
SPITCH_API_KEY=<Your Spitch API Key>
LIVEKIT_API_KEY=<your API Key>
LIVEKIT_API_SECRET=<your API Secret>
LIVEKIT_URL=<your URL>
OPENAI_API_KEY=<Your OpenAI API Key>
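If you want to confirm these variables are being picked up before wiring anything together, here is a quick sanity check (assuming the values live in a .env file in your working directory, which python-dotenv loads):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
for key in ("SPITCH_API_KEY", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL"):
    assert os.getenv(key), f"{key} is not set"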
Speech‑to‑Text (STT) Integration
Spitch STT currently supports five languages (Yoruba, Igbo, Hausa, Amharic, and English), with more on the way. To get started with our STT API in LiveKit, use the code sample below, and feel free to check out our docs for more information.
from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import (
    spitch,
    noise_cancellation,
)

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=spitch.STT(language="en")
        # , llm, tts, ...
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=RoomInputOptions(
            # LiveKit Cloud enhanced noise cancellation
            # - If self-hosting, omit this parameter
            # - For telephony applications, use `BVCTelephony` for best results
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
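Switching the transcription language is a one-line change. Assuming the language codes from our docs ("yo" for Yoruba, "ig" for Igbo, "ha" for Hausa, "am" for Amharic, "en" for English), a Yoruba transcriber would look like this:

session = AgentSession(
    stt=spitch.STT(language="yo")  # transcribe Yoruba speech instead of English
    # , llm, tts, ...
)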
Text‑to‑Speech (TTS) Integration
Spitch TTS supports the same five languages and 22 voices, with more languages and voices coming soon. To get started with our TTS API in LiveKit, use the code sample below, and feel free to check out our docs for more information.
from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import (
    spitch,
    noise_cancellation,
)

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        tts=spitch.TTS(language="en", voice="kani")
        # , stt, llm, ...
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=RoomInputOptions(
            # LiveKit Cloud enhanced noise cancellation
            # - If self-hosting, omit this parameter
            # - For telephony applications, use `BVCTelephony` for best results
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
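The voice parameter selects one of the 22 available voices, and each voice belongs to a particular language. For example, to have the agent speak Yoruba (the voice name below is illustrative; see our docs for the current voice list per language):

session = AgentSession(
    tts=spitch.TTS(language="yo", voice="sade")  # "sade" is an example voice name
    # , stt, llm, ...
)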
Integrating an STT-LLM-TTS pipeline with Spitch, OpenAI, and LiveKit

This pipeline takes voice as input and uses Spitch STT to transcribe the audio. The transcript is then passed to an LLM for processing through predefined prompts, and the LLM’s output is converted back to speech with Spitch TTS.
This pipeline can power multilingual voice agents across many use cases simply by changing the LLM prompt and the language. To implement it, use the code below.
import sys
import asyncio

# On Windows, force the selector event loop policy; the default proactor
# loop can cause issues with some audio tooling. This is a no-op elsewhere.
if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import (
    spitch,
    openai,
    noise_cancellation,
    silero,
)
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=spitch.STT(language="en"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=spitch.TTS(language="en", voice="kani"),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_input_options=RoomInputOptions(
            # LiveKit Cloud enhanced noise cancellation
            # - If self-hosting, omit this parameter
            # - For telephony applications, use `BVCTelephony` for best results
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
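To try the pipeline locally, save the script (as agent.py, say; the filename is just an example) and use the CLI that agents.cli.run_app provides:

python agent.py download-files   # fetch model weights for the turn detector and VAD
python agent.py console          # chat with the agent directly in your terminal
python agent.py dev              # connect the agent to your LiveKit project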
Concluding Remarks
Now that you are armed with all the resources needed to build voice agents with Spitch, it's time to put that knowledge into production! Feel free to experiment with the different voices and languages on our platform, and don't be a stranger to our docs. We can’t wait to see what you build!