
Odunola Jenrola
Aug 13, 2025
Today, we're excited to unveil Mansa.v1, a 1.4B-parameter preview model and the first in our series of automatic speech recognition (ASR) models designed with a singular focus: delivering exceptional performance on African content. While global ASR giants have made impressive strides, they often stumble over the rich linguistic diversity of African names, places, and cultural references. Mansa.v1 is our first step towards changing this.
What Makes Mansa.v1 Different
Mansa.v1 excels at recognising African named entities within English speech. It transcribes names and places verbatim with impressive accuracy.
Key Features at a Glance:
African named entity recognition in English contexts
Custom spelling guidance for names and specialised terms
Sentence- or word-level timestamps for audio up to 30 minutes (25 MB)
Performance Comparison
Here’s how our model stacks up against other commercial STT solutions!
Comparative Transcription Results
Original Audio Content: "After the handshake formalities, the Bayelsa governor takes the kickoff at the Samson Siaisia Stadium, where the All Stars International Football Club of Yenagoa took off to a flying start with Charles Gabriel getting the curtain raiser."
Model | Output | Named Entity Accuracy |
Whisper Large-v3 | after the handshake formalities the bayota governor takes the kickoff at the samson cscs stadium where the all-stars international football club of yanagawa took off to a flying start with charles gabriel getting the curtain raiser few minutes later | 0/4 (0%) |
ElevenLabs Scribe | After the handshake formalities, the Bayelsa governor takes the kickoff at the Sampson Siyasiya Stadium, where the All Stars International Football Club of Yenagoa took off to a flying start with Charles Gabriel getting the curtain raiser. | 3/4 (75%) |
GPT-4o Transcribe | After the handshake formalities, the Bayelsa governor takes the kickoff at the Samson Siasia Stadium where the All Stars International Football Club of Yenagoa took off to a flying start with Charles Gabriel getting the curtain raiser. | 4/4 (100%) |
Mansa.v1 | After the handshake formalities, the Bayelsa governor takes the kickoff at the Samson Siaisia Stadium, where the All Stars International Football Club of Yenagoa took off to a flying start with Charles Gabriel getting the curtain raiser. | 4/4 (100%) |
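This post does not spell out how the named entity accuracy column is scored. As a rough illustration only, the sketch below counts how many reference entities appear verbatim (case-sensitive) in a transcript. The entity list and the function name `named_entity_accuracy` are assumptions made for this example, with entities spelled as in the reference transcript above.

```python
def named_entity_accuracy(transcript: str, entities: list[str]) -> tuple[int, int]:
    """Return (hits, total): how many entities occur verbatim in the transcript."""
    hits = sum(1 for entity in entities if entity in transcript)
    return hits, len(entities)


# Assumed entity list for the sample sentence (not published in this post).
ENTITIES = ["Bayelsa", "Samson Siaisia Stadium", "Yenagoa", "Charles Gabriel"]

# ElevenLabs Scribe output, copied from the table above.
scribe_output = (
    "After the handshake formalities, the Bayelsa governor takes the kickoff "
    "at the Sampson Siyasiya Stadium, where the All Stars International "
    "Football Club of Yenagoa took off to a flying start with Charles Gabriel "
    "getting the curtain raiser."
)

hits, total = named_entity_accuracy(scribe_output, ENTITIES)
print(f"{hits}/{total} ({hits / total:.0%})")  # prints 3/4 (75%)
```

Exact substring matching is deliberately strict: a lowercased or respelled name counts as a miss, which is the behaviour you want when names must appear verbatim.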
Advanced Features
Named Entity Specification: You're in Control
Mansa.v1 lets you guide the model's spelling of specific terms. This is useful for content creators, journalists, and businesses working with nuanced African words, such as place names, personal names, and cultural concepts, that language models often mistranscribe.
How It Works
To use this feature, provide a list of special words before transcription. The feature is available through both the playground and our SDKs. See our docs for details on integrating with the SDKs.
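As a minimal sketch of preparing such a list, the snippet below tidies a word list before it is sent along with a transcription request. It does not call the Spitch SDK: `prepare_special_words`, `build_request`, and the `special_words` field name are hypothetical and only illustrate the idea. Check the docs for the actual parameter names.

```python
def prepare_special_words(words: list[str]) -> list[str]:
    """Collapse whitespace, drop empty entries, and remove
    case-insensitive duplicates while keeping the original order."""
    seen = set()
    cleaned = []
    for word in words:
        term = " ".join(word.split())
        key = term.casefold()
        if term and key not in seen:
            seen.add(key)
            cleaned.append(term)
    return cleaned


def build_request(audio_path: str, special_words: list[str]) -> dict:
    """Assemble a request payload. The field names are hypothetical."""
    return {
        "audio": audio_path,
        "language": "en",
        # Hypothetical field name, not taken from the Spitch docs.
        "special_words": prepare_special_words(special_words),
    }


request = build_request(
    "match_report.wav",
    ["Bayelsa", " Yenagoa ", "Charles  Gabriel", "bayelsa", ""],
)
print(request["special_words"])  # prints ['Bayelsa', 'Yenagoa', 'Charles Gabriel']
```

When two entries differ only in case, the first spelling wins, so list the preferred spelling first.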

Timestamps: Precision Down to the Second
Mansa.v1 provides word-level timestamps for audio files up to 30 minutes, making it perfect for:
Subtitle generation with precise synchronisation
Content navigation for podcasts and interviews
Quote extraction for journalists and researchers
Accessibility compliance with accurate closed captions
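To illustrate the subtitle use case, here is a small sketch that groups word-level timestamps into SRT cues. The `Word` structure (text plus start and end times in seconds) is an assumption about what a word-level result might look like, not the actual response schema, and the fixed words-per-cue grouping is just one simple choice.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group words into cues of at most max_words and render them as SRT."""
    cues = []
    for index, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i : i + max_words]
        line = " ".join(w.text for w in chunk)
        span = f"{srt_time(chunk[0].start)} --> {srt_time(chunk[-1].end)}"
        cues.append(f"{index}\n{span}\n{line}\n")
    return "\n".join(cues)


# Hypothetical word timings, made up for illustration.
words = [
    Word("The", 0.0, 0.2), Word("Bayelsa", 0.2, 0.8), Word("governor", 0.8, 1.3),
    Word("takes", 1.3, 1.6), Word("the", 1.6, 1.7), Word("kickoff", 1.7, 2.3),
]
print(words_to_srt(words, max_words=3))
```

A production subtitle pipeline would usually also break cues on punctuation and pauses rather than on a fixed word count.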

Timestamp Accuracy Comparison
Our timestamp alignment consistently matches or beats that of competitors, with an average deviation of just ±0.1 seconds, compared to ±0.3-0.5 seconds for other models.
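The post does not say how this deviation is measured. One plausible definition, shown here purely as an assumption, is the mean absolute difference between predicted and hand-labelled word start times. The function name and the sample numbers below are made up for illustration.

```python
def mean_abs_deviation(predicted: list[float], reference: list[float]) -> float:
    """Mean absolute difference, in seconds, between paired timestamps."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one timestamp pair is required")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Hypothetical word start times in seconds (not real benchmark data).
predicted_starts = [0.0, 0.5, 1.2]
reference_starts = [0.1, 0.4, 1.1]

deviation = mean_abs_deviation(predicted_starts, reference_starts)
print(f"{deviation:.2f} s")  # prints 0.10 s
```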
Real-World Applications
Here are some real-world applications of Mansa.v1.
Film & Media Production
Creating accurate subtitles for Nollywood films or African documentaries becomes much easier with Mansa.v1. Provide cast member names and location spellings upfront, and Mansa.v1 keeps them consistent throughout the transcript.
Corporate Communications
Transcribing earnings calls or board meetings with African executives? Pre-load company names, executive names, and technical terms for flawless transcripts.
News & Journalism
Covering African elections, business news, or cultural events? Mansa.v1 ensures every politician's name, every constituency, every cultural reference is captured correctly.
Getting Started
Visit our API Documentation to get your API key and start transcribing. All new users get $1 worth of Spitch credits on sign-up to try out our services.
For questions or partnership inquiries: info@spitch.app
Current Limitations
In our internal tests, we noticed some scenarios where the model's performance degrades:
Extreme code-switching, which can make the model enter a repetition loop
Highly spontaneous audio and overlapping speakers
These will be addressed in our upcoming multilingual release.
What's Next
Here are some exciting features to look out for in the next version of our model.
Multi-language support: Native transcription for Yoruba, Swahili, Amharic, and more
Code-switching: Multiple languages in one sequence
Speaker diarization: Identifying who said what in multi-speaker scenarios
Real-time streaming: Live transcription for broadcasts and events