
Ifeoluwa Oduwaiye
Apr 8, 2025
Introduction
Diacritics are the glyphs or small marks added to letters. The term originates from a Greek word meaning “to distinguish,” which makes sense, since diacritics distinguish the pronunciation of letters in different languages. In tonal languages, where the pitch of a syllable can change the meaning of a word, diacritics are commonly used to mark tone.

Source: Merriam-Webster
Diacritics play a huge role in written language and distance communication. The same sequence of letters can have different meanings depending on the marks that appear on it. This is seen in languages like Yorùbá, where "owó" means "money," while "òwò" means "trade." Without these marks, costly errors can occur.
In this blog post, we will discuss the importance of diacritics in African languages, the challenges of restoring diacritics, and how linguists, developers, and enthusiasts can use the tone-marking feature on Spitch to restore them.
A Brief History of Diacritics in the Digital Space
With over 67.9% of the world’s population using the internet, a large chunk of distance communication occurs digitally. This trend has had an adverse effect on the use of diacritics. Most computer keyboards are laid out for specific languages, which makes diacritics and other specialized glyphs harder to type.
Unlike handwriting, where adding diacritics is straightforward and intuitive, digital keyboards often require additional steps, such as using key combinations or switching language settings. These restrictions not only slow down typing but can also discourage consistent use of diacritics, which may lead to their gradual omission in digital communication.
The American Standard Code for Information Interchange (ASCII) is a character encoding standard that represents text in computers by assigning each character a unique numerical value. ASCII supports only 128 characters, covering the unaccented Latin alphabet, digits, punctuation, and control codes, and this has limited the representation of diacritics in digital systems and documents. Developers often omitted diacritics in digital texts to maintain compatibility, and many legacy systems remain ASCII-based today.
The Unicode Consortium has been at the forefront of standardizing digital text representation since its inception in 1991. The Unicode Standard currently encodes more than 100,000 characters, including the extensive diacritic marks essential for accurately representing global languages, and is continuously being updated to include more scripts and languages.
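The difference between the two standards is easy to see in practice. The short sketch below uses only Python's standard library and the Yorùbá word "owó" from the introduction. It shows that ASCII cannot encode the accented letter, that UTF-8 can, and that Unicode offers two equivalent spellings of the same accented letter (precomposed, or base letter plus combining mark).

```python
import unicodedata

word = "ow\u00f3"  # "owó", written with the precomposed letter ó (U+00F3)

# ASCII has no code for "ó", so encoding the word fails.
try:
    word.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8, a Unicode encoding, round-trips the word without loss.
utf8_bytes = word.encode("utf-8")
round_trip_ok = utf8_bytes.decode("utf-8") == word

# NFC keeps "ó" as one code point; NFD splits it into "o" + combining acute.
nfc = unicodedata.normalize("NFC", word)
nfd = unicodedata.normalize("NFD", word)

print(ascii_ok)            # False
print(round_trip_ok)       # True
print(len(nfc), len(nfd))  # 3 4
```

Because the two forms look identical on screen but compare as different strings, it usually helps to normalize text to a single form before comparing or storing it.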
The African Network for Localization (ANLoc) emerged around 2012 as a collaborative initiative aimed at bridging the digital language gap in Africa. ANLoc has worked closely with international standards bodies like the Unicode Consortium to incorporate diacritic support for numerous African languages into digital platforms. Their efforts have significantly enhanced the digital representation of African scripts, ensuring that local languages are accurately and respectfully maintained online.
Spitch is a language company with a similar mission: it is passionate about the inclusion of African languages in the tech ecosystem. Since its inception in 2022, Spitch has developed four features (tone marking or diacritization, speech generation, speech transcription, and language translation) that currently support four languages (Hausa, Ìgbò, Yorùbá, and English), with more expansions coming soon.
Diacritics are crucial in many African languages for indicating tone and meaning. However, many existing speech and text processing systems overlook these critical markers, leading to misinterpretations and loss of meaning. By developing models that accurately handle diacritics, Spitch ensures that speech-to-text and text-to-speech applications can faithfully represent the nuances of African languages.
The Importance of Diacritics in African Languages
In tonal languages, diacritics play a crucial role in defining the meaning and pronunciation of words. Without diacritics, the same sequence of letters can become ambiguous, leading to confusion in spoken and written communication. This is particularly significant in languages like Ìgbò, where tone is integral to conveying the intended message.

For example, in Ìgbò, the letters "akwa" can mean "cry" (ákwá), "cloth" (ákwà), "egg" (àkwá), or "bed" (àkwà), depending on the tones. Removing or misplacing diacritics in such cases can result in a loss of essential meaning and even lead to miscommunication. Here are some key reasons why diacritics play an important role in African languages:
Preserving Meaning and Avoiding Ambiguity: Many African languages use diacritics to distinguish words that are otherwise spelled the same. For example, in Yorùbá, òjò (rain) and Òjó (a personal name) have different meanings. Without diacritics, sentences can become unclear or misleading.
Accurate Pronunciation and Phonetics: Diacritics indicate tone and pronunciation, which are crucial in tonal languages like Ìgbò, Hausa, and Yorùbá. Mispronunciation due to missing diacritics can lead to communication breakdowns or misunderstandings.
Effective Language Processing in AI and NLP Applications: For AI-based transcription and translation tools, diacritics help reduce the ambiguity in words and improve natural language processing (NLP) accuracy. Without them, automatic systems can misinterpret context, leading to incorrect translations or speech recognition errors.
Enhancing Language Documentation and Preservation: Many African languages rely on diacritics for their written form. If diacritics are lost in digital communication and AI processing, it can hinder the preservation of these languages and impact future generations' ability to learn them properly.
Challenges in Diacritics Restoration
There are many challenges facing the representation of diacritics in this digital world. For instance, many speech-to-text systems ignore diacritics because many models and training datasets focus on high-resource languages where these marks are not consistently used.
Additionally, reconstructing diacritics from spoken language requires sophisticated models that can grasp context and tone. This need increases the computational complexity and data requirements of digital systems. Some other challenges facing diacritics restoration in today’s digital world are:
Loss of Diacritics in Text-Based Digital Communication: Most standard keyboards and digital platforms do not support diacritics by default, leading to their omission in everyday typing and messaging. Over time, this contributes to their decline in written communication.
Limited Availability of Large Annotated Datasets: Unlike English and other widely spoken languages, African languages often lack extensive, high-quality datasets with diacritic annotations. This makes training AI models for diacritic restoration more challenging.
Complexity of Tone Marking and Contextual Ambiguity: Unlike simple spelling corrections, diacritic restoration requires understanding tonal variations and context. AI models must analyze entire sentences to determine the correct diacritics, which is computationally intensive.
Variability Across Dialects and Regional Pronunciations: Many African languages have multiple dialects with slight tonal variations. A diacritic restoration model trained on one dialect may struggle with another, making universal restoration more difficult.
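One common way to work around scarce annotated data is to start from text that already carries tone marks and strip the marks to create (input, target) training pairs. The sketch below illustrates that idea with Python's standard library. The helper names are hypothetical, and this is not a description of how Spitch builds its models. It also shows why context matters: the two Yorùbá words from the introduction collapse to the same unmarked input.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(text: str) -> str:
    """Remove combining marks, leaving only the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_training_pairs(marked_texts):
    """Pair each stripped input with its fully marked target (NFC form)."""
    pairs = []
    for text in marked_texts:
        target = unicodedata.normalize("NFC", text)
        pairs.append((strip_diacritics(target), target))
    return pairs


def ambiguous_inputs(pairs):
    """Group targets by stripped input; keep inputs with several targets."""
    candidates = defaultdict(set)
    for source, target in pairs:
        candidates[source].add(target)
    return {src: sorted(tgts) for src, tgts in candidates.items() if len(tgts) > 1}


# "owó" (money) and "òwò" (trade) both become "owo" once the marks are removed.
pairs = make_training_pairs(["owó", "òwò"])
print(pairs)
print(ambiguous_inputs(pairs))
```

Since a single unmarked input can map to several valid targets, a lookup table alone cannot choose between them, which is why restoration models need the surrounding sentence. Note that this helper also strips the dot below in letters such as ọ and ẹ, which may or may not be what a given dataset calls for.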
How to use the Tone Marking Feature on Spitch
There are two ways to use the tone marking feature on Spitch: through the studio or through our API. If you’re a developer, you will likely gravitate toward the API, while non-technical users may prefer the studio. In this section, we explain how to use the feature through both options.
Through the Studio
To get started with tone marking, head over to studio.spitch.app and sign up on our platform. All new users get $1 worth of credits to try out our services, so you don’t have to worry about paying upfront. Once you’re done registering, you should land on the starting page, where you can select the Diacritics option from the right-hand panel, as shown below.

Select your preferred language and enter your text in the text box. Finally, click the Transcribe button to generate the markings on your text. And that’s it!

Through the API
To get started with our tone marking API, head over to our docs here to get acquainted with our services. For this step, you will need a Spitch API key. To generate yours, head to our developer studio here and sign up. For security reasons, the full API key is shown only once, so save it in a secure place. If you lose the key, you can always generate a new one, so don’t fret.
If Python is your vibe, you can make use of the sample code below.
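Here is a minimal sketch of what a tone-marking call might look like from Python. The package name `spitch`, the `Spitch` client class, the `text.tone_mark` method, its `text` and `language` parameters, the `"yo"` language code, the `.text` field on the response, and the `SPITCH_API_KEY` environment variable are all assumptions made for illustration, so check them against the API reference before relying on them. The wrapper function `restore_tone_marks` is a hypothetical helper, not part of any SDK.

```python
import os


def restore_tone_marks(text, language="yo", client=None):
    """Send unmarked text to the tone-marking service and return the result."""
    if client is None:
        # Assumed SDK: installed with `pip install spitch`, exposing `Spitch`.
        from spitch import Spitch

        # Read the key from the environment rather than hard-coding it.
        client = Spitch(api_key=os.environ["SPITCH_API_KEY"])

    # Assumed call shape; confirm method and parameter names in the docs.
    response = client.text.tone_mark(text=text, language=language)
    return response.text
```

With an API key exported as `SPITCH_API_KEY`, calling `restore_tone_marks("owo", language="yo")` would send the unmarked word to the service and return whatever marked text it responds with. Passing a `client` explicitly makes the function easy to exercise with a stand-in object in tests.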
If you prefer another language, we also support cURL, JavaScript, PHP, Go, and Java. Head over to our API references to get started.
Conclusion
Diacritics are crucial in many African languages as they indicate tone and meaning, ensuring accurate communication and preserving linguistic heritage. However, most speech-to-text systems overlook these subtle markers due to technical challenges in reconstructing them from audio signals.
Spitch addresses this problem with its powerful and effective tone-marking feature, specifically designed to capture and restore diacritics accurately, thus preserving the integrity of African languages in digital communication. Embracing Spitch’s innovative technology not only enhances customer engagement and data quality but also supports cultural preservation.
Try Spitch’s tone marking feature today and experience the difference in accuracy and inclusivity.