Coqui

Introducing Prompt-to-Voice - Describe It to Hear It

Eren Gölge and Kelly Davis

Feb 17, 2022

Table of Contents

Coqui Studio: The Studio for Generative AI Voice
What is Prompt-to-Voice?
How Does Prompt-to-Voice Work?
Why is Prompt-to-Voice Useful?
Creativity
Efficiency
Flexibility
Give It a Try

Coqui Studio: The Studio for Generative AI Voice

Coqui Studio is an AI voice directing platform that allows users to generate, clone, and control AI voices for video games, audio post-production, dubbing, and more. It features a large set of generative AI voices, an advanced editor for tuning each voice, tools for managing projects and scripts, and tools for editing timelines, all to help users streamline their workflow. We’re now beta-launching “prompt-to-voice”, an unprecedented way to generate unique voices.

What is Prompt-to-Voice?

Prompt-to-voice is a feature that allows users to generate realistic and expressive AI voices from natural language prompts. A prompt is a short text that describes what kind of voice or speech you want to generate. For example:

An older man with a British accent and a pleasing, deep voice.

A female senior with a guttural voice and an American accent. Their voice should be hoarse and have a boisterous intensity.

A female young adult with a childlike voice and an American accent. Their voice should be polished and breathy and have a ringing but feathery intensity with a calming delivery.

By using prompts, users can easily specify the characteristics and style of the desired voice. Coqui Studio uses state-of-the-art generative AI technology to turn prompts into high-quality AI voices that are ready for your project.

How Does Prompt-to-Voice Work?

Prompt-to-voice works by using generative AI models that are trained on large amounts of speech data. These models learn how to map text inputs to a space of possible voices. Then they can use this mapping to generate novel and diverse AI voices that match the input specifications.

One way to understand how prompt-to-voice works is by analogy to image generation models like DALL-E or Midjourney. These models can generate realistic and creative images from natural language prompts. Similarly, prompt-to-voice can generate realistic and creative voices from natural language prompts.
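To make the idea of "mapping text inputs to a space of possible voices" a little more concrete, here is a toy sketch in Python. It is not Coqui's model or API: the function names, the 8-dimensional space, and the hash-based word vectors are all assumptions made for illustration, standing in for what a trained text encoder would learn from speech data. It only shows the shape of the idea, where a prompt becomes a point in a vector space and a voice generator could then be conditioned on that point.

```python
import hashlib
import math
from typing import List

DIM = 8  # size of the toy "voice space" (an arbitrary choice for this sketch)


def word_vector(word: str) -> List[float]:
    """Deterministic pseudo-embedding for one word.

    A real system would use a learned text encoder; hashing is only a
    stand-in so the example runs without any model or dependency.
    """
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    # Map the first DIM bytes from [0, 255] to [-1.0, 1.0].
    return [(b / 255.0) * 2.0 - 1.0 for b in digest[:DIM]]


def prompt_to_voice_vector(prompt: str) -> List[float]:
    """Map a text prompt to a unit-length point in the toy voice space."""
    words = [w.strip(".,") for w in prompt.split()]
    words = [w for w in words if w]
    if not words:
        raise ValueError("prompt must contain at least one word")
    summed = [0.0] * DIM
    for w in words:
        for i, x in enumerate(word_vector(w)):
            summed[i] += x
    # Normalize so that prompts of different lengths are comparable.
    norm = math.sqrt(sum(x * x for x in summed)) or 1.0
    return [x / norm for x in summed]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    a = prompt_to_voice_vector("An older man with a British accent and a pleasing, deep voice.")
    b = prompt_to_voice_vector("A female senior with a guttural voice and an American accent.")
    print("similarity between the two prompts:", round(cosine(a, b), 3))
```

In a trained system the vectors would carry meaning, so that prompts describing similar voices land near each other and a small change to the prompt moves the voice a small distance. The hash-based vectors here have no such property; they only make the data flow visible.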

Why is Prompt-to-Voice Useful?

Prompt-to-voice is useful because it enables users to create unique and customized AI voices for various purposes. Some of the benefits of prompt-to-voice are:

Creativity

Prompt-to-voice allows users to explore their creativity and imagination by generating voices that are not limited by existing recordings or templates. Users can experiment with different combinations of attributes, styles, characteristics, etc., and discover new possibilities for their projects.

Efficiency

Prompt-to-voice saves users time and effort by generating voices instantly from text inputs. Creators do not need to spend hours casting the right voice for their content, and they avoid the fees that come with casting.

Flexibility

Prompt-to-voice gives users more control and flexibility over their voice outputs. Users can easily modify or update their prompts according to their needs or preferences. They can also adjust parameters such as pitch, speed, volume, etc., to fine-tune their results once they find the voice they desire.

Give It a Try

Sign up for Coqui Studio (you get 30 free minutes when you sign up) and start using prompt-to-voice in your next project. If you have feedback, you can find us on Discord.
