Lexi Pronunciation Dictionaries
Make our AI systems speak more like a human :)
When working with text-to-speech (TTS), models sometimes fail to pronounce specific words accurately. Commonly mispronounced words include brands, names, and locations, but even very common words can be off. This comes down to voice modeling, but luckily there's an easy fix within api.audio.
Introducing Lexi
In api.audio you can create dictionaries and save alternate pronunciations for specific words in them. Every time the word comes up again it will be fixed by Lexi.
A word of caution for all use cases:
Whenever you change the pronunciation of a word, please be aware that the synthetic voice will apply the new rule every time that word appears. For example, if you apply a fix to the word "wind" (as in the flow of air) to make it sound like "wind" (as in winding up a clock), every later occurrence of the noun will be affected as well, which might create issues down the road.
Also, there is no distinction between lowercase and uppercase. This is particularly important when working with acronyms.
Before we dive into the code, let's have a look at why you'll want to use this.
Use cases - What do you need this for?
1. Fixing a single word
This is the most common use case. All you need to do is rewrite the word the way you want the voice to pronounce it. For example, the Irish female name "Aoife" could be rewritten as "Eefa" for an English voice or "Ihfa" for a German voice.
2. Fixing a single word in a foreign language
Quite often we deal with foreign names, brands, or expressions. An example would be the car maker "Porsche". The correct way to pronounce that word would be "Porr shay" using an English voice. Note that you can have several dictionaries per language.
3. Fixing Acronyms
There are two possible issues to fix when it comes to abbreviations. Most synthetic voices already handle the first one and read "CIA" as "C I A" instead of "sia". If a voice does not, you can simply replace "CIA" with "C I A". The second, more common issue is an acronym that should always be read as full words. For example, you could replace the acronym "AP" with "Associated Press". Keep in mind, however, that whenever the word "AP" occurs in any text (in lowercase or uppercase!), it will from then on be read as "Associated Press".
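The side effect of a case-insensitive acronym rule can be previewed locally before registering anything. The following is a minimal sketch that only imitates the behaviour described above; `apply_dictionary` is a hypothetical helper, not part of the apiaudio SDK, and it assumes whole-word matching.

```python
import re

def apply_dictionary(text, dictionary):
    """Replace whole words, ignoring case (hypothetical local preview helper)."""
    for word, replacement in dictionary.items():
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        # A function is used as the replacement so that backslashes in the
        # replacement text are never interpreted as regex escapes.
        text = pattern.sub(lambda match, r=replacement: r, text)
    return text

dictionary = {"AP": "Associated Press"}

intended = apply_dictionary("The AP reported the story.", dictionary)
side_effect = apply_dictionary("Set the router to ap mode.", dictionary)

print(intended)     # The Associated Press reported the story.
print(side_effect)  # Set the router to Associated Press mode.
```

The second sentence shows why acronym rules deserve a quick check against typical scripts: the lowercase "ap" is rewritten too.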
4. Words with more than one pronunciation
Most voice models will be able to distinguish the most basic cases but every once in a while, you might come across this issue. For example, take the French city of "Nice" and the word "nice". In order to make this work you'll need a slightly different approach because you'll need to determine if the city or the adjective is meant. More on that later.
5. Fixing multiple words
As of today, you can only fix single words. However, we're working on a solution for this.
Often, when it comes to names or very specific groups of words, the default pronunciation of two words together might be wrong. Let's say a person is called "Adam Wind" (as in the flow of air) and your voice pronounces that as "Adam Wind" (as in winding up a clock). You'll want this correction to apply only when the words "Adam" and "Wind" occur in that order. Otherwise, your correction might end up creating other issues in the future.
6. Using phonemes (IPA) to fix words
This is an advanced way to create correct pronunciations of specific words. Instead of plain characters, you can use the International Phonetic Alphabet (IPA), which is a lot more versatile because it covers all possible speech sounds (with certain limitations). This is particularly useful when you want foreign words to be pronounced correctly. However, it's often hard to find the correct phonetic spelling of a word, and building it requires linguistic expertise. We recommend not exposing this to your users unless they are trained linguists.
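As a rough illustration, an IPA entry can be assembled from the parameters of the register_custom_word method documented further down this page (word, replacement, lang, contentType). `build_ipa_entry` is a hypothetical helper, and the transcription of "Porsche" is an illustrative assumption; whether a given voice renders every IPA symbol is not guaranteed.

```python
def build_ipa_entry(word, ipa, lang):
    """Assemble keyword arguments for an IPA replacement (hypothetical helper)."""
    if not word or not ipa:
        raise ValueError("word and ipa must be non-empty")
    return {
        "word": word,
        "replacement": ipa,
        "lang": lang,
        "contentType": "ipa",  # marks the replacement as phonetic rather than basic
    }

# Illustrative transcription; verify it with a linguist or a trusted dictionary.
entry = build_ipa_entry("Porsche", "ˈpɔʁʃə", "en")
print(entry)

# With a configured API key, the entry could then be registered:
# import apiaudio
# apiaudio.Lexi.register_custom_word(**entry)
```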
How does it work?
Pronunciation dictionary methods are:
list()
Lists the publicly available dictionaries and their words
Parameters:
- none
# returns a list of public dictionaries
dictionaries = apiaudio.Lexi.list()
list_custom_dicts()
Lists the custom dictionaries and their respective words
Parameters:
- none
# returns a list of custom dictionaries
custom_dicts = apiaudio.Lexi.list_custom_dicts()
list_custom_words()
Lists all the words contained in a custom dictionary.
Parameters:
- lang [required] (string) - Language family, e.g. en or es - use global to list language-agnostic words.
# lists all words in the dictionary along with their replacements
words = apiaudio.Lexi.list_custom_words(lang="en")
register_custom_word()
Adds a new word to a custom dictionary.
Parameters:
- lang [required] (string) - Language family, e.g. en or es - use global to register a word globally.
- word [required] (string) - The word that will be replaced.
- replacement [required] (string) - The replacement token. Can be either a plain string or an IPA token.
- contentType [optional] (string) - The content type of the supplied replacement; can be either basic (default) or ipa for phonetic replacements.
- specialization [optional] (string) - By default, the supplied replacement applies regardless of the voice, language code, or provider. Edge cases can be supplied, however: a specialization can be a valid provider name, language code (e.g. en-gb), or voice name.
# correct the word sapiens
r = apiaudio.Lexi.register_custom_word(word="sapiens", replacement="saypeeoons", lang="en")
print(r)
In order to "switch on" the Lexi dictionary, add this to your speech call(s):
speech = apiaudio.Speech.create(scriptId=script["scriptId"], voice=voice, useDictionary=True)
Words with more than one pronunciation
The Lexi flag works in a similar way to SSML. For example, adding <!peadar> instead of Peadar (who is one of our founders) to your script will cause the model to produce an alternative pronunciation of this name. This is particularly useful when a word can have multiple pronunciations, for example, the cities "Reading" and "Nice". In this instance, placing <!nice> in the script ensures that the city name is pronounced correctly while the adjective is left alone:
"The city of <!nice> is a really nice place in the south of France."
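To check which words in a script carry a flag before sending it off, a small local helper can scan for the `<!word>` syntax shown above. This is only a sketch: `find_flagged_words` and `flag` are hypothetical names, and the pattern assumes a flag is a single token without spaces.

```python
import re

# Matches the <!word> flag syntax; assumes flags contain no whitespace.
FLAG_PATTERN = re.compile(r"<!([^<>\s]+)>")

def flag(word):
    """Wrap a word in the flag syntax."""
    return f"<!{word}>"

def find_flagged_words(script_text):
    """Return the flagged words in the order they appear."""
    return FLAG_PATTERN.findall(script_text)

script_text = f"The city of {flag('nice')} is a really nice place in the south of France."
flagged = find_flagged_words(script_text)

print(script_text)
print(flagged)  # ['nice']
```

Only the flagged occurrence is reported; the plain adjective "nice" is left untouched.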
Full example (German) including input (before) and output (after) audio files to check pronunciation:
import apiaudio
# set your api key here
#apiaudio.api_key = "paste your api key here!"
# inputs
problematic_word = "TabulaRaaza"
replacement_word = "Tabuhla Rahsa"
lang = "de"
voice = "matthias"
# register this word in the dictionary
r = apiaudio.Lexi.register_custom_word(word=problematic_word, replacement=replacement_word, lang=lang, specialization=voice)
print(r)
# generate a preview of this
preview_text = f"Bisherige Aussprache: {problematic_word}. Neue Aussprache: {replacement_word}."
script = apiaudio.Script.create(scriptText=preview_text)
print(script)
speech = apiaudio.Speech.create(scriptId=script["scriptId"], voice=voice)
print(speech)
r = apiaudio.Speech.download(scriptId=script["scriptId"])
print("Downloaded for preview: ", r)
# uncomment this line to delete a word
#apiaudio.Lexi.delete_custom_word(word=problematic_word, lang=lang)
# in production make sure you set useDictionary to be true
#speech = apiaudio.Speech.create(scriptId=script["scriptId"], voice=voice, useDictionary=True)
Beware of our precedence
For each language, only a single entry per word is permitted. However, each word can have multiple specializations. When a word is first registered, a default specialization is always created from the replacement that was passed in. Subsequent calls with different specializations will only update the given specialization.
The exact replacement that will be used is determined by the following order of preference: voice name > language dialect > provider name > default
For example, a replacement specified for voice name sara will be picked over a replacement specified for provider azure.
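The order of preference can be pictured with a small local lookup. This sketch is not Lexi's actual implementation; `resolve_replacement`, the replacement strings, and the names other_voice and other_provider are made up for illustration.

```python
def resolve_replacement(specializations, voice=None, dialect=None, provider=None):
    """Pick a replacement by precedence: voice > dialect > provider > default."""
    for key in (voice, dialect, provider, "default"):
        if key is not None and key in specializations:
            return specializations[key]
    return None  # the word has no entry at all

# Hypothetical specializations for one word.
specializations = {
    "default": "saypeeoons",
    "azure": "say pee ens",
    "sara": "sap ee ens",
}

chosen = resolve_replacement(
    specializations, voice="sara", dialect="en-gb", provider="azure"
)
fallback = resolve_replacement(
    specializations, voice="other_voice", dialect="en-us", provider="other_provider"
)

print(chosen)    # sap ee ens
print(fallback)  # saypeeoons
```

The voice-level entry for sara wins over the provider-level entry for azure, and a request that matches no specialization falls back to the default.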