Voice Intelligence

Text normalization is a vital step in converting text into audio, especially when creating content at scale. Unfortunately, it doesn't work the same way across different speech providers. Voice Intelligence fixes that.

What is normalization?
In short, normalization converts text items, such as symbols, into words. When we write, we use shorthand such as symbols and abbreviations, and we convert it into the right words in our head or when we read out loud.

Here are some things that require normalization when converted into speech:

Abbreviations
Measurements
Time formats
Date formats
Currency formats
Fractions
Number formats (cardinal: 1, 2, ..., 1,234)
Ordinal formats (1st, 2nd, 3rd)
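
To make the idea concrete, here is a minimal sketch of a rule-based normalizer for a few of the categories above. It is not how Voice Intelligence is implemented; the lookup tables, the function names and the English examples are assumptions chosen for illustration, and it only spells out numbers from 0 to 99.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Hypothetical, deliberately tiny lookup tables (illustration only).
ABBREVIATIONS = {"Dr.": "Doctor", "e.g.": "for example"}
ORDINALS = {"1st": "first", "2nd": "second", "3rd": "third"}
UNITS = {"km": "kilometer", "kg": "kilogram"}


def spell_number(n: int) -> str:
    """Spell out an integer between 0 and 99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def _counted(n: int, noun: str) -> str:
    """Return e.g. 'one dollar' or 'five dollars'."""
    return f"{spell_number(n)} {noun}" + ("" if n == 1 else "s")


def normalize(text: str) -> str:
    """Expand a few abbreviations, currency amounts, measurements and ordinals."""
    for short, spoken in ABBREVIATIONS.items():
        text = text.replace(short, spoken)
    # Currency: "$20" -> "twenty dollars"; skips "$200" and "$20.50".
    text = re.sub(r"\$(\d{1,2})(?![.,]?\d)",
                  lambda m: _counted(int(m.group(1)), "dollar"), text)
    # Measurements: "5 km" -> "five kilometers"; skips "1,234 km".
    unit_pattern = r"(?<![\d.,])(\d{1,2}) ?(" + "|".join(UNITS) + r")\b"
    text = re.sub(unit_pattern,
                  lambda m: _counted(int(m.group(1)), UNITS[m.group(2)]), text)
    # Ordinals: "2nd" -> "second".
    text = re.sub(r"\b(1st|2nd|3rd)\b", lambda m: ORDINALS[m.group(1)], text)
    return text


print(normalize("Dr. Smith ran 5 km on the 2nd day and paid $20."))
# Doctor Smith ran five kilometers on the second day and paid twenty dollars.
```

Anything the rules do not recognize, such as "$200" here, passes through unchanged, which is exactly the kind of gap that makes providers sound different from one another.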

How do different voice providers handle this?
You would think voice providers would take normalization into account, and to be fair, they try. However, the results are not consistent, not even across different voices from a single provider.

What does Voice Intelligence do?
When you work with api.audio, we want you to get the same speech output, no matter the provider and voice.
Voice Intelligence either adds normalization or overrides the provider's existing normalization. The result is a consistent pronunciation of text that includes symbols or commonly used regular expressions.
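
The effect of normalizing before the text reaches a provider can be sketched as follows. Both "providers" below are made-up stand-ins with deliberately different built-in rules, and `shared_normalize` is a hypothetical one-rule normalizer; none of this reflects the behavior of a real provider or of Voice Intelligence itself.

```python
import re


def provider_a_reads(text: str) -> str:
    """Made-up provider A: reads '%' as 'percent'."""
    return text.replace("%", " percent")


def provider_b_reads(text: str) -> str:
    """Made-up provider B: reads '%' as 'percent sign'."""
    return text.replace("%", " percent sign")


def shared_normalize(text: str) -> str:
    """Hypothetical shared rule, applied before any provider sees the text."""
    return re.sub(r"(\d)%", r"\1 percent", text)


text = "Save 50% today"

# Without a shared step, each provider applies its own rule.
print(provider_a_reads(text))  # Save 50 percent today
print(provider_b_reads(text))  # Save 50 percent sign today

# With a shared step, no symbol is left for the providers to interpret.
normalized = shared_normalize(text)
print(provider_a_reads(normalized))  # Save 50 percent today
print(provider_b_reads(normalized))  # Save 50 percent today
```

Because the symbol is already replaced by words, the provider-specific rules never fire, and both voices receive identical input.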

How can you use Voice Intelligence in api.audio?
To use Voice Intelligence, add the useTextNormalizer parameter to your speech call and set it to True. Here is how it works:

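import apiaudio

# `script` is assumed to be the response of an earlier script creation call.
# Create speech for that script with Voice Intelligence enabled.
r = apiaudio.Speech().create(
    scriptId=script["scriptId"],
    voice="liam",
    speed=100,
    silence_padding=0,
    useTextNormalizer=True,  # turns on Voice Intelligence
)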

Note: At the time of writing (27 July 2022), Voice Intelligence is only available in German. We'll update this page as new languages, first and foremost English, are added.

Want to learn more? Read this blog post.