The Basics

The latest generation of AI models can transform written text into high-quality speech and offers many ways to make that speech expressive. However, making synthetic speech sound convincing for a given use case takes specialist knowledge (for example, an understanding of machine learning) and careful work: it often requires annotation, parameter tuning, content editing to work around shortcomings in the voice, and additional processing of the generated speech.

api.audio not only offers a wide variety of speech models from various voice providers, but also provides guardrails that make the production of synthetic speech robust and scalable. These guardrails have been honed over nearly 2 years of development and over 20 user research interviews.

api.audio also makes it easy to create a frontend through which your users interact with synthetic speech, assures quality for well-known TTS challenges such as the pronunciation of names, and lets you connect directly to an established source to minimise the failure rate.

Create speech from text script

When to create a script

We strongly recommend creating a script and rendering speech from that script. That is, rather than calling the api.audio text-to-speech API directly, you first create a script using the script API and then issue a production request that creates speech, or even a fully produced audio track, based on this script. Working from a script has several benefits:


  • Additional guardrails.
  • Additional annotation: lets you switch speakers within the same track and use batteries-included parameters to personalise content or make it dynamic.
  • Content is logically decoupled from speech creation.
  • A script can be connected to a SoundDesign template, which allows you to automate production.
  • You can use our input connectors to conveniently create content directly from a source.
  • You can use api.audio’s tooling and best practices to build a frontend that lets your users craft great-sounding audio from text.

You can create a script with the "create script" API call.
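
The script-first workflow described above can be illustrated with a small, self-contained sketch. It is a toy in-memory model, not the api.audio SDK: the class `ToyScriptService`, its methods `create_script` and `request_production`, the voice names, and the `{{username}}` placeholder syntax are all hypothetical and chosen only for illustration. It shows the idea of storing content once and then issuing separate production requests that refer to the script by its identifier.

```python
import re
import uuid
from dataclasses import dataclass, field


@dataclass
class Script:
    """Text content stored independently of any rendered audio."""
    script_id: str
    text: str


@dataclass
class ToyScriptService:
    """Hypothetical in-memory stand-in for a script API (not the api.audio SDK)."""
    scripts: dict = field(default_factory=dict)

    def create_script(self, text: str) -> Script:
        # Step 1: store the content once and hand back an identifier.
        script = Script(script_id=uuid.uuid4().hex, text=text)
        self.scripts[script.script_id] = script
        return script

    def request_production(self, script_id: str, voice: str, params: dict) -> dict:
        # Step 2: a production request refers to the script by id only.
        script = self.scripts[script_id]
        # Fill {{name}} placeholders (assumed syntax); unknown ones are left untouched.
        rendered = re.sub(
            r"\{\{(\w+)\}\}",
            lambda m: str(params.get(m.group(1), m.group(0))),
            script.text,
        )
        # A real service would synthesise audio here; the toy returns a description.
        return {"script_id": script_id, "voice": voice, "text": rendered}


service = ToyScriptService()
script = service.create_script("Hello {{username}}, welcome back!")

# The same script is reused for two productions with different voices and parameters.
first = service.request_production(script.script_id, voice="voice_a", params={"username": "Ada"})
second = service.request_production(script.script_id, voice="voice_b", params={"username": "Grace"})

print(first["text"])
print(second["text"])
```

Because the content lives in the script and the voice and parameters live in the production request, changing a voice or personalising a message does not require touching the text itself.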

Synchronous and Asynchronous text-to-speech

