How to use text-to-speech in Kapwing

Last updated on Dec 13, 2024

You've seen viral videos on TikTok and Instagram in which an automated voice reads text aloud, and you want to know how it's done. On Kapwing, you can automatically add a synthetic voiceover to a project with the AI Voice feature. This tutorial explains what the feature is and how to use it, so you can create and share more stories.

What is text-to-speech?
How do I generate text-to-speech on Kapwing?
How do I make text-to-speech sound more natural?
How much text-to-speech can I generate on Kapwing?
Voice Cloning
What languages does Kapwing support for text-to-speech?

What is text-to-speech?

Text-to-speech (TTS) is a technology that converts written text into spoken words using AI-generated voices. It allows users to create audio from text input, often with customizable options for voice style, pitch, speed, and tone.

On Kapwing, TTS tools let users generate voiceovers for their projects. Users can choose from a wide variety of voices with different attributes, including language, gender, and age (e.g., American, young, male or female).

What are the uses for text-to-speech?

TTS is a versatile tool with a wide range of applications. Here are some popular ways creators and businesses leverage this technology:

Video Content Creation: Add voiceovers to YouTube videos, social media clips, and tutorials without recording your own voice.
Accessibility: Make content more inclusive by providing audio for people with visual impairments or reading difficulties.
E-Learning and Training: Create engaging audio for online courses, corporate training, and educational materials.
Marketing and Advertising: Use voiceovers in ads, promotional videos, and product demos to engage audiences.
Podcasts and Audiobooks: Generate spoken versions of written content to reach a wider audience.

Whether you're a content creator, educator, or marketer, TTS can save time and enhance the reach of your projects.

How do I generate text-to-speech on Kapwing?

Navigate to the "AI Voice" tab in the left sidebar
Enter the text you want to convert into audio
Choose your preferred voice based on the language of your text input
Note: Business and Enterprise users can use their own voice clones. Learn more in the Voice Cloning section below
Optionally, add a persona for an AI avatar to narrate your text-to-speech, or select "None" if you only need audio
Click "Add layer" to generate your text-to-speech layer, which will appear on the timeline once the generation is finished

How do I make text-to-speech sound more natural?

1. Add Emphasis with Quotation Marks

Emphasizing key words can change the meaning and impact of your sentences. Wrap words or phrases in quotation marks (" ") to draw attention.

Example:

Without emphasis: What do you think?

With emphasis: What do “you” think?

This technique is great for highlighting important points or making your message more engaging.
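
If you draft scripts outside Kapwing before pasting them into the AI Voice tab, the quotation-mark technique can be applied with a small text helper. The sketch below is an illustration only: `emphasize` is a hypothetical function, not part of Kapwing, and all it does is prepare the text you would then paste in yourself.

```python
import re


def emphasize(text: str, word: str) -> str:
    """Wrap each whole-word occurrence of `word` in double quotation marks.

    Hypothetical helper for preparing script text; it does not call Kapwing.
    Occurrences already touching a straight double quote are left alone, so
    running it twice does not double the quotes.
    """
    pattern = re.compile(
        rf'(?<!")\b{re.escape(word)}\b(?!")',
        flags=re.IGNORECASE,
    )
    # Re-insert the matched text unchanged so the original casing is kept.
    return pattern.sub(lambda m: f'"{m.group(0)}"', text)


print(emphasize("What do you think?", "you"))
```

Matching on whole words means that emphasizing "you" leaves words such as "your" untouched.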

2. Show Emotion with Punctuation

Kapwing’s AI voice responds to punctuation cues to adjust tone and intensity. Use exclamation points (!) or question marks (?) to add emotion, and double them (!!, ??) for extra emphasis.

Examples:

Neutral: What are you doing here?