ElevenLabs, a voice cloning and synthesis startup founded by former Google and Palantir employees, has announced the launch of AI Dubbing, a dedicated product that can translate any speech, including long-form content, into more than 20 languages. The offering aims to overhaul dubbing for audio and video content, a process that has traditionally been slow and manual.
One of the key advantages of AI Dubbing is its ability to break language barriers for smaller content creators who lack the resources to hire human translators, though Mati Staniszewski, CEO and co-founder of ElevenLabs, sees a far wider audience: “We see huge potential for independent creatives – such as those creating video content and podcasts – all the way through to film and TV studios.”
High-Quality Translations with Original Voice Characteristics
The AI Dubbing feature developed by ElevenLabs promises to deliver high-quality translated audio in minutes. The tool retains the original voice of the speaker, complete with their emotions and intonation. Unlike traditional speech-to-speech translation methods that involve multiple labor-intensive steps, AI Dubbing simplifies the process for users. They can easily select the AI Dubbing tool, choose the source and target languages, and upload the content file.
Once the content is uploaded, the tool automatically detects the number of speakers and actively works on the translation. The progress bar on the screen provides users with real-time updates on the process. Upon completion, the translated and dubbed file can be downloaded and used.
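The select-languages, upload, poll-progress, download flow described above is a common pattern for asynchronous media jobs. The sketch below models it with an in-memory stub; `StubDubbingJob` and its methods are hypothetical stand-ins, not ElevenLabs' actual client API.

```python
# Minimal sketch of the upload -> poll -> download flow described above,
# with an in-memory stub standing in for the dubbing service.
# StubDubbingJob is a hypothetical class, not ElevenLabs' client API.

class StubDubbingJob:
    def __init__(self, source_lang, target_lang, filename):
        self.source_lang = source_lang
        self.target_lang = target_lang
        self.filename = filename
        self.progress = 0  # percent complete, shown as a progress bar

    def poll(self):
        """Advance and report progress; a real client would ask a server."""
        self.progress = min(100, self.progress + 50)
        return self.progress

    def download(self):
        if self.progress < 100:
            raise RuntimeError("dubbing still in progress")
        return f"[{self.target_lang} dub of {self.filename}]"

# Choose the source and target languages, then upload the content file.
job = StubDubbingJob("en", "es", "podcast.mp3")
while job.poll() < 100:
    pass  # a real client would sleep between polls
dubbed_file = job.download()
```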
“Behind the scenes, the tool works by tapping ElevenLabs’ proprietary method to remove background noise, differentiating music and noise from actual dialogue from speakers. It recognizes which speakers speak when, keeping their voices distinct, and transcribes what they say in their original language using a speech-to-text model. Then, this text is translated, adapted (so lengths match) and voiced in the target language to produce the desired speech while retaining the speaker’s original voice characteristics. Finally, the translated speech is synced back with the music and background noise originally removed from the file, preparing the dubbed output for use.”
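The quoted pipeline can be sketched end to end. In the toy code below, strings stand in for audio and every function is an illustrative placeholder, not ElevenLabs' proprietary method; it only mirrors the separate, diarize, transcribe, translate, voice, and re-mix sequence.

```python
# Toy, runnable sketch of the pipeline quoted above. Strings stand in
# for audio tracks; none of this is ElevenLabs' actual implementation.

def separate_audio(tracks):
    """Split the file into dialogue and background (music/noise)."""
    dialogue = [t for t in tracks if t["kind"] == "speech"]
    background = [t for t in tracks if t["kind"] != "speech"]
    return dialogue, background

def dub(tracks, translate):
    dialogue, background = separate_audio(tracks)
    dubbed = []
    # Diarization stand-in: each segment already carries its speaker id,
    # so voices stay distinct through the pipeline.
    for seg in dialogue:
        text = seg["text"]            # speech-to-text stand-in
        translated = translate(text)  # translate + length-adapt stand-in
        # Voice the translation while keeping the original speaker.
        dubbed.append({"kind": "speech", "speaker": seg["speaker"],
                       "text": translated})
    # Sync the translated speech back with the untouched background.
    return dubbed + background

tracks = [
    {"kind": "speech", "speaker": "A", "text": "Hello"},
    {"kind": "music", "speaker": None, "text": None},
    {"kind": "speech", "speaker": "B", "text": "Goodbye"},
]
output = dub(tracks, translate={"Hello": "Hola", "Goodbye": "Adiós"}.get)
```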
Expanding Possibilities for Content Globalization
ElevenLabs’ AI Dubbing feature supports over 20 languages, including Hindi, Portuguese, Spanish, Japanese, Ukrainian, Polish, and Arabic. This wide language support empowers content creators to globalize their content and reach broader audiences worldwide.
Previously, ElevenLabs offered separate tools for voice cloning and text-to-speech synthesis. With the new end-to-end AI Dubbing interface, creators can seamlessly translate their audio content, such as podcasts, into different languages without the need for additional steps. The platform’s integration of voice cloning, text and audio processing, and multilingual speech synthesis enables a streamlined and efficient dubbing process.
Staniszewski confirms that while the AI Dubbing feature will be available to all users, it is subject to character limits similar to those on the text-to-speech feature: approximately one minute of dubbed audio consumes 3,000 characters.
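Under that equivalence, converting a character quota into dubbing minutes is simple division. The 30,000-character quota below is an arbitrary example, not a specific ElevenLabs plan.

```python
CHARS_PER_MINUTE = 3_000  # ElevenLabs' stated rough equivalence

def dubbing_minutes(character_quota):
    """Estimate how many minutes of dubbing a character quota covers."""
    return character_quota / CHARS_PER_MINUTE

# A hypothetical 30,000-character quota covers about 10 minutes of audio.
minutes = dubbing_minutes(30_000)
```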
It’s worth noting that ElevenLabs is not alone in the AI-based voicing field. Other notable players include MURF.AI, Play.ht, and WellSaid Labs. Furthermore, Meta recently launched SeamlessM4T, an open-source multilingual foundational model capable of understanding nearly 100 languages and generating real-time translations from speech or text.
According to Market US, the global market for AI-powered voice and speech synthesis tools reached $1.2 billion in 2022 and is projected to reach nearly $5 billion by 2032, a compound annual growth rate (CAGR) of roughly 15.4%.
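The projection is internally consistent: compounding $1.2 billion at roughly 15.4% per year over the ten years from 2022 to 2032 lands near the $5 billion estimate.

```python
# Sanity check on the Market US figures cited above: $1.2B in 2022
# compounding at ~15.4% a year for the ten years to 2032.
start_billion = 1.2
cagr = 0.154
years = 10
projected_billion = start_billion * (1 + cagr) ** years  # roughly 5.0
```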