Use Cases

Blog

Voice chat

API Doc

Get Started for free

Use Cases

Blog

Voice chat

API Doc

Get Started for free

Use Cases

Blog

Voice chat

API Doc

Get Started for free

Use Cases

Blog

Voice chat

API Doc

Get Started for free

Back to blog

Research

Building AI-Powered Speech Transcription App: A Step-by-Step Guide

Learn how to build an AI-powered speech transcription app for real-time audio-to-text conversion using Spitch.

Ifeoluwa Oduwaiye

Mar 14, 2025

Introduction

The global speech transcription market has experienced immense growth with the rise of the adoption of AI (Artificial Intelligence) and is estimated to exceed $56 billion by 2030. Over the years, we have gone from manually converting text in one language to another, to having Microsoft Teams automatically transcribe meeting audio. Awesome, right?

But there is a major issue with these commercial systems. They focus on the more profitable high-resource languages and ignore the low-resource ones. This leaves a lot of people dissatisfied and unreachable by businesses.

Spitch is tackling this challenge by creating small language models for African languages. It now offers a feature that transcribes audio in Yoruba, Igbo, Hausa, and English, with more languages coming soon. These models are built for businesses looking to connect with African audiences and developers who want to develop more inclusive applications using a reliable API.

This blog is here to guide developers on how to create a speech transcription service or application using Spitch.

Pre-requisites

This blog is designed for all developers regardless of their experience level but there are some specific things you would need to participate in this demonstration.

Intermediate knowledge of Python programming
A Streamlit account
A GitHub account
Any IDE of your choice like VS Code, etc

Development Workflow

To build our speech transcription app, we’ll be going through the following steps:

Setting up the Python environment
Develop the application with Streamlit
Deploy the application to Streamlit Community Cloud

Setting up the Python environment

A Python environment would be needed to install all the requirements for this project. Either a conda or virtual environment would suffice, but if you are a bit unsure of how to create a Python environment, please check out this tutorial over here. The modules we’ll be needing for this project are:

Spitch
Streamlit
Dotenv (Optional for handling API keys)

You can install these three modules at once by running:

pip install spitch streamlit dotenv

Once that’s done, you can head over to the next part of this tutorial, writing the code for the application.

Develop the application with Streamlit

In this part of this part of the tutorial, we will be developing the Streamlit application that handles both the backend and UI of our speech transcription app. Streamlit was chosen for its simplicity and ease of use. Feel free to make use of any other module you prefer such as Flask, Django, or FastAPI.

The first thing we’ll want to do is to integrate the Spitch API. If you haven’t yet done so, head to spitch.app, sign up and save your API keys in a secure place. You’ll need that later.

Spitch Playground

Once that’s done, we can write the code to accept input audio from a user and the audio language, process and transcribe it using Spitch, and output the transcribed text to the user. You can find more details about the transcribe function in our documentation here.

# Import the required modules
import streamlit as st
from spitch import Spitch
import os
import tempfile


def transcribe_audio(audio_file, lang):
    # Instantiate a Spitch client
    os.environ["SPITCH_API_KEY"] = "YOUR-API-KEY"
    client = Spitch()


    # Save the uploaded audio file to a temporary file
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as temp_file:
        temp_file.write(audio_file.read())
        temp_path = temp_file.name


    # Transcribe the uploaded audio file to text using the Spitch transcribe function
    with open(temp_path, "rb") as f:
        response = client.speech.transcribe(
            language=lang,
            content=f.read()
        )
    return response.text


def main():
    st.title("My Spitch Transcription App")
    st.write("Upload an audio file to transcribe it to text.")
    audio_file = st.file_uploader("Upload Audio", type=["wav", "mp3"])
   
    if audio_file is not None:
        st.audio(audio_file, format='audio/wav')
        st.write("Select the language of the audio:")
        language = st.selectbox("Language", ["English", "Yoruba", "Igbo", "Hausa"])


        # Encode selected language
        lang = {'English': 'en', 'Yoruba': 'yo', 'Igbo': 'ig'}.get(language, 'ha')


        if st.button("Transcribe"):
            with st.spinner("Transcribing..."):
                transcript = transcribe_audio(audio_file, lang)
                st.success("Transcription completed!")
                st.text_area("Transcript", transcript, height=200)
               
                # Make the transcript downloadable as a txt file
                with open("transcript.txt", "w", encoding="utf-8") as f:
                    f.write(transcript)
                st.download_button("Download Transcript", data=transcript, file_name="transcript.txt", mime="text/plain")


if __name__ == "__main__":
    main()

Deployment

Streamlit streamlines the deployment process for developers into four easy steps listed below.

Deploy speech transcription app on Streamlit cycle

Speech Transcription App Deployment Cycle

Create a Streamlit account: Sign up on Streamlit Community Cloud to access free app hosting. This will allow you to deploy and share your speech transcription app with ease.
Host your code on GitHub: Push your project files, including your Python script and requirements file, to a public or private GitHub repository. This ensures that Streamlit can access and deploy your app directly from GitHub.
Connect your account to GitHub: Link your Streamlit account to GitHub by granting necessary permissions. This enables you to select the repository containing your speech transcription app for deployment.
Deploy your app: In the Streamlit interface, choose your GitHub repository, branch, and main script file, then click Deploy. Streamlit will automatically set up your app, making it accessible via a shareable URL.

If you need any extra guidance while deploying your application to Streamlit, feel free to check out their documentation here.

Some Real-world Applications of Speech Transcription Apps

Now that you have the technical foundation to build a speech transcription application with Spitch, here are some practical use cases where low-resource language transcription applications can be applied or optimized for real-world impact.

Education & E-Learning: With Spitch, texts, literature, and academic materials can be automatically transcribed into native languages to make education more accessible for African learners.
Media & Journalism: This application can also allow news organizations to transcribe and translate interviews, speeches, and reports from and into local languages.
Customer Support & Call Centers: Speech transcription applications can enhance multilingual customer service by automating real-time audio transcription and analysis of conversations in local dialects, thus enabling businesses to reach a wider market with their products.
Healthcare & Telemedicine: Have you ever considered how much important medical information gets lost because doctors and patients don't speak the same language? Speech transcription apps can help convert speech into text, making it easier for doctors to communicate with non-English-speaking patients and keep accurate medical records.
Accessibility for the Deaf & Hard of Hearing: Accessibility has always been an issue of concern for most organizations. Large organizations invest huge amounts of money to ensure their products are accessible to their target audience but it still isn’t enough. STT applications can provide real-time captioning in local languages and promote the inclusivity of auditory-impaired people around the world.

Common Challenges in Speech Transcription

When building speech transcription applications or features within applications, here are a few things you should look out for as a developer.

Dialects & Variability: Many African and indigenous languages have multiple dialects with significant pronunciation differences. These dialects often vary by location and could pose difficulty when training and getting inferences from these small language models.
No Standardized Spelling: Some languages are primarily oral, making consistent text representation difficult. For example, the name Timileyin can be spelt as Timilehin, but both variations have the same meaning.
Noisy Environments: Poor audio quality and background noise affect audio transcription accuracy, especially in real-world use cases. You should account for this possibility through noise filter systems.
Language-Switching: In the real world, speakers often mix languages (e.g., English and Yoruba), making it harder for language models to transcribe speech accurately. Spitch is tackling this head-on with our advanced ASR (Automatic Speech Recognition) technology for African languages. We’re also developing multilingual speech-to-text models and automated language detection, allowing users to seamlessly transcribe audio with multiple languages without manual language selection.

Concluding Remarks

To wrap up, building a speech transcription app with Spitch opens up a world of possibilities for developers and businesses looking to create more inclusive and accessible applications. With support for African languages like Yoruba, Igbo, and Hausa, Spitch is breaking barriers and ensuring more people can interact with technology in their native tongue.

Now, it’s your turn to bring this vision to life! Whether you’re building a real-time audio transcription service, integrating speech-to-text into an existing app, or exploring new AI-powered solutions, the Spitch API is designed to make the process seamless. Head over to our developer portal to get started. The code used in this demo can be accessed here.

Explore more

Research

Top Speech Generation Models for Agentic AI Use Cases

Speech generation can make or break your Agentic AI solution. Explore popular TTS models, metrics

Ifeoluwa Oduwaiye

May 16, 2025

Company

Spitch Agents: The AI Workforce Your Business Needs

Spitch Agents presents businesses with voice Agentic AI for customer support, education, operations

Ifeoluwa Oduwaiye

May 9, 2025

Resources

Making Media Smarter: How Real-Time Speech Tech is Changing Broadcasting

Explore speech AI for media & broadcasting firms like captioning, transcription & speech generation

Ifeoluwa Oduwaiye

Apr 15, 2025

Research

Top Speech Generation Models for Agentic AI Use Cases

Speech generation can make or break your Agentic AI solution. Explore popular TTS models, metrics

Ifeoluwa Oduwaiye

May 16, 2025

Company

Spitch Agents: The AI Workforce Your Business Needs

Spitch Agents presents businesses with voice Agentic AI for customer support, education, operations

Ifeoluwa Oduwaiye

May 9, 2025

Research

Top Speech Generation Models for Agentic AI Use Cases

Speech generation can make or break your Agentic AI solution. Explore popular TTS models, metrics

Ifeoluwa Oduwaiye

May 16, 2025

Company

Spitch Agents: The AI Workforce Your Business Needs

Spitch Agents presents businesses with voice Agentic AI for customer support, education, operations

Ifeoluwa Oduwaiye

May 9, 2025

Speak to the Future Africa

Our AI voice technology is built to understand, speak, and connect with Africa like never before.

Terms & Conditions