Text-to-Speech With Multiple Speakers

API.audio enables you to create speech with multiple AI speakers from several different providers for dynamic and personalised audio content. Choose from our catalogue of 180+ voices from all major providers. Mix and match these speakers in your script and create beautiful audio tracks.

Getting Started

To use multiple speakers in your script text, you first need to create a script annotated with sections.

A script can be divided into script sections. Script sections not only help you organise and manage your content, but also let you make it more dynamic by changing parameter settings within a script (such as speakers, effects, speed, etc.). The section tag itself behaves like a comment: it is included in the script text but is ignored (not spoken) when the script is processed.

If you do not explicitly define script sections, the whole script is treated as a single section with the default settings.

A sample script with two sections can look something like this:

scriptText="<<sectiongreeting>> Hello world.<<sectionName::goodbye>> It was great to meet you."

After defining your script, you can move on to creating speech. To assign a different speaker to each of your script sections, you need to define the sections parameter in the speech call. (See more here)

sections is a dictionary of key-value pairs, where each key is a section name and each value is another dictionary with that section's configuration. Valid parameters are: voice, speed, effect and silence_padding.

sections={
    "question": {
        "voice": "Matthew",
        "speed": 110,
        "silence_padding": 100,
        "effect": "dark_father"
    },
    "answer": {
        "voice": "en-GB-RyanNeural",
        "speed": 100
    }
}

📘

See more here

Example

Let's create an audio file with two different speakers in it.

1️⃣ Let's create a script with two sections
Name them <<sectionName::question>> and <<sectionName::answer>>. We want to use a different voice from a different provider for each of these sections, to create a dynamic, dialogue-style conversation. (More on script creation here)

text = "<<sectionName::question>> Hey do you know we support multiple voices from different providers in the same script? I am a google voice now. <<sectionName::answer>> Yes! I am a polly voice from Amazon. Great, right?"
script = aflr.Script().create(scriptText=text, scriptName="multiple_speakers")
print(script)
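
The printed response includes the script's id under scriptId; that is the value the speech call in the next step needs (the end-to-end example below passes it as script["scriptId"]):

# Keep the script id for the speech call in step 2.
script_id = script["scriptId"]
print(script_id)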

2️⃣ Let's turn that text into speech!
In the speech call, define the two sections you created in your script with the sections parameter.
You can define a voice and speed, apply an effect, or even add silence_padding to each section.
If a section is not listed here, it automatically inherits the voice, speed, effect and silence_padding values you define at the top level of the call (or the defaults if you don't provide them).

See an example below with two sections and different configuration parameters in use.

response = aflr.Speech().create(
    scriptId="id-1234",  # use the scriptId returned when you created the script
    # top-level values apply to any section not configured below
    voice="Matthew",
    speed=100,
    effect="dark_father",
    silence_padding=1000,
    audience=[{"username": "Elon", "lastname": "Musk"}],
    sections={
        "question": {
            "voice": "Matthew",
            "speed": 110,
            "silence_padding": 100,
            "effect": "dark_father"
        },
        "answer": {
            # only voice is set, so speed, effect and silence_padding
            # are inherited from the top-level values above
            "voice": "en-GB-RyanNeural",
        }
    }
)
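
Once the speech has been generated you will usually want to fetch the audio itself. A minimal sketch, assuming your SDK version also exposes Speech().retrieve() and Speech().download() alongside Speech().create() (method and parameter names are assumptions here, not taken from this page):

# Assumed helpers: list the generated audio files for the script,
# then download them to the current directory.
audio_files = aflr.Speech().retrieve(scriptId="id-1234")
print(audio_files)
aflr.Speech().download(scriptId="id-1234", destination=".")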

End-to-end Example

Create a script, define multiple sections, assign different configuration parameters, master it with a background track of your choice, and voilà! You have a fully produced beautiful audio track.

import aflr
aflr.api_key = "APIKEY"

# 1. Create a script with two sections: "question" and "answer"
text = "<<sectionName::question>> Hey do you know we support multiple voices from different providers in the same script? I am a Polly voice from Amazon. <<sectionName::answer>> Yes! I am an Azure voice from Microsoft. Nice to meet you!"
script = aflr.Script().create(scriptText=text, scriptName="multiple_speakers")
print(script)

# 2. Create speech, giving each section its own voice and configuration
r = aflr.Speech().create(
    scriptId=script["scriptId"],
    voice="Joanna",
    speed=90,
    silence_padding=0,
    sections={
        "question": {
            "voice": "Amy",
            "speed": 110,
            "silence_padding": 1000
        },
        "answer": {
            "voice": "en-AU-WilliamNeural",
            "speed": 100
        }
    }
)
print(r)

# 3. Master the speech with a background track
r = aflr.Mastering().create(scriptId=script["scriptId"], backgroundTrackId="full__deepsea.wav")
print(r)

# 4. Retrieve the mastered audio
r = aflr.Mastering().retrieve(scriptId=script["scriptId"])
print(r)
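
To save the mastered track locally rather than just printing the response, something like the following should work, assuming the Mastering resource also provides a download() helper mirroring retrieve() (an assumption, not shown on this page):

# Assumed helper: download the mastered audio to the current directory.
aflr.Mastering().download(scriptId=script["scriptId"], destination=".")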
