Create your own avatar and free yourself from the editing studio! Translate all your videos without re-recording a thing! ... The promises of artificial intelligence (AI) in the world of video are numerous and, above all, attractive. But what is the reality? Behind the fine words, is AI really up to the task of producing quality videos?
Our expert Lotfi deciphers a new AI use case for you: Is it really possible to create videos in different languages with just a few clicks, and how?
At a time when new AI-based solutions are emerging daily, interest in automated video production is growing fast.
Against this backdrop, using digital avatars and voice cloning for video creation is an appealing option. I believe it's a technological innovation that could radically transform viewer engagement.
Being able to create highly personalized videos that address viewers directly, in their own language and in a way tailored to them, opens up a whole new field of possibilities. Communication can become far more targeted, reaching people more effectively and more personally.
From a practical point of view, for media outlets and companies alike, the appeal of cutting costs and production time while scaling up content output is undeniable. So is the prospect of building closer ties with audiences.
Imagine being able to express yourself not just through words or images, but through a virtual version of yourself or your brand: a version capable of speaking, interacting and even stirring emotions.
This is precisely what digital avatars offer, marking the beginning of an era where your digital presence can be as rich and nuanced as your physical presence.
HeyGen caught my attention while I was looking for avatar-creation tools, after YouTube videos showed me what it could do. The voice-to-video synchronization, both realistic and engaging, persuaded me to give it a try.
To create an avatar on HeyGen, you first need to film yourself talking to the camera for at least two minutes. The instructions must be followed precisely: use a high-resolution camera and record in a quiet, well-lit location. Look directly into the camera, pause between sentences, and avoid raising your hands above your torso.
It's also advisable to avoid cuts in the video, background noise and shadows on your face. The aim is to capture your expressions as accurately as possible, so that the avatar comes out natural and expressive.
You also have to give consent by recording a short statement with your webcam and microphone. Once that's done, HeyGen creates your digital avatar in just a few minutes.
Your avatar can then be used directly in the tool's studio. The tool can generate the voice from a text you supply, or clone the voice from an audio or video file.
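For completeness, the same kind of generation can be triggered outside the studio through HeyGen's API. Treat the sketch below as illustrative only: the v2 video-generation endpoint and payload fields reflect my reading of the documentation and may differ from the current version, and the avatar and voice IDs are placeholders.

```python
import requests

HEYGEN_API_KEY = "YOUR_HEYGEN_API_KEY"  # placeholder

# Illustrative payload: the avatar speaking a supplied text.
# Field names follow HeyGen's v2 video-generation API as I understand it;
# verify against the official docs before relying on them.
payload = {
    "video_inputs": [
        {
            "character": {
                "type": "avatar",
                "avatar_id": "YOUR_AVATAR_ID",   # the avatar created earlier
                "avatar_style": "normal",
            },
            "voice": {
                "type": "text",
                "input_text": "Hello, this is my digital avatar speaking.",
                "voice_id": "YOUR_VOICE_ID",      # a HeyGen voice, or a cloned one
            },
        }
    ],
    "dimension": {"width": 1280, "height": 720},
}

response = requests.post(
    "https://api.heygen.com/v2/video/generate",
    headers={"X-Api-Key": HEYGEN_API_KEY, "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())  # should contain a video ID to poll for the rendered file
```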
However, some adjustments were necessary: the avatar sometimes exaggerates head movements or looks away, which calls for touch-ups to keep the visuals consistent. On the voice side, text-based synthesis sounds less natural than a voice generated from a direct audio recording.
The quality of the text-to-speech offered by HeyGen didn't fully meet my expectations, which pushed me to look for other voice-cloning solutions. After exploring various options, I discovered ElevenLabs, a speech synthesis service. To use it, all I had to do was provide a few audio clips of myself discussing various subjects, so the service could analyze and faithfully reproduce the sound of my voice.
ElevenLabs uses these clips to build a digital imitation of my voice, capable of reading out any text I give it. This greatly simplifies my projects, particularly video and tutorial production, by removing the need to record my voice for each new piece of content.
ElevenLabs offers two voice-cloning methods: an instant option, which needs just over a minute of voice samples, and a more elaborate option, which can take up to a month of work for a flawless result.
I opted for the instant solution and was impressed by the quality of the result: clear pronunciation and a natural voice close to my own.
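For those who prefer scripting to the web interface, instant cloning can also be driven through the ElevenLabs REST API. This is only a sketch: the /v1/voices/add endpoint and its multipart fields are as I understand them from the documentation, and the clip file names and voice name are placeholders to replace with your own.

```python
import requests

ELEVEN_API_KEY = "YOUR_ELEVENLABS_API_KEY"  # placeholder


def audio_part(path):
    """Read a local clip and package it as a multipart field."""
    with open(path, "rb") as f:
        return ("files", (path, f.read(), "audio/mpeg"))


# Instant voice cloning: upload a few clean recordings of your voice.
# Endpoint and field names per my reading of the ElevenLabs docs; verify before use.
response = requests.post(
    "https://api.elevenlabs.io/v1/voices/add",
    headers={"xi-api-key": ELEVEN_API_KEY},
    data={"name": "My cloned voice", "description": "Instant clone for tutorials"},
    files=[audio_part("clip1.mp3"), audio_part("clip2.mp3")],
    timeout=120,
)
response.raise_for_status()
voice_id = response.json()["voice_id"]  # keep this ID for text-to-speech calls
print(voice_id)
```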
As an added bonus, ElevenLabs integrates natively with HeyGen, so you can generate videos of yourself with a voice very close to your own.
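And once the cloned voice exists, generating a voice-over from text is a single call. Here is a minimal sketch against the ElevenLabs text-to-speech endpoint; the voice ID comes from the cloning step, and the model name and voice settings are assumptions to adapt to your own account.

```python
import requests

ELEVEN_API_KEY = "YOUR_ELEVENLABS_API_KEY"  # placeholder
VOICE_ID = "YOUR_CLONED_VOICE_ID"           # returned by the cloning step

# Text-to-speech with the cloned voice. The model_id and voice_settings
# below are assumptions; adjust them to whatever your plan offers.
response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": ELEVEN_API_KEY, "Content-Type": "application/json"},
    json={
        "text": "Welcome to this tutorial on our video tools.",
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
    timeout=120,
)
response.raise_for_status()

# The endpoint returns the audio bytes directly (MP3 by default).
with open("voiceover.mp3", "wb") as f:
    f.write(response.content)
```

The resulting MP3 can then be used as the audio track for the avatar video in HeyGen.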
Yuzzit serves a diverse clientele, including a significant number of Spanish speakers. Our challenge was to create resources this audience could understand without having to find native speakers to produce them.
With this in mind, we planned to reuse the content generation process described above, adding one extra step: voice dubbing, also handled by ElevenLabs.
The process is simple: you supply an audio or video track, and the artificial intelligence behind the solution analyzes the file, translates the speech and synthesizes the translated text into a new audio track that keeps the tone and style of the original voice. The result is a dubbed version of your video that sounds natural and authentic.
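For the record, this dubbing step can also be scripted rather than done through the interface. The sketch below follows the ElevenLabs dubbing endpoints as I understand them (create a job, poll it, download the dubbed track); routes, field names and status values should be checked against the current documentation, and the file names and language codes are examples.

```python
import time
import requests

ELEVEN_API_KEY = "YOUR_ELEVENLABS_API_KEY"  # placeholder
headers = {"xi-api-key": ELEVEN_API_KEY}

# 1) Create a dubbing job from a source video, French to Spanish.
#    Routes and fields per my reading of the ElevenLabs docs; verify before use.
with open("tutorial_fr.mp4", "rb") as f:
    response = requests.post(
        "https://api.elevenlabs.io/v1/dubbing",
        headers=headers,
        data={"source_lang": "fr", "target_lang": "es"},
        files={"file": ("tutorial_fr.mp4", f, "video/mp4")},
        timeout=300,
    )
response.raise_for_status()
dubbing_id = response.json()["dubbing_id"]

# 2) Poll until the job is finished.
while True:
    status = requests.get(
        f"https://api.elevenlabs.io/v1/dubbing/{dubbing_id}",
        headers=headers,
        timeout=60,
    ).json()
    if status.get("status") == "dubbed":  # completion value as documented; verify
        break
    time.sleep(10)

# 3) Download the dubbed Spanish audio track.
audio = requests.get(
    f"https://api.elevenlabs.io/v1/dubbing/{dubbing_id}/audio/es",
    headers=headers,
    timeout=300,
)
audio.raise_for_status()
with open("tutorial_es.mp3", "wb") as f:
    f.write(audio.content)
```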
However, creating content in Spanish revealed a major challenge: the correct translation and pronunciation of technical terms specific to our field, such as "RTMP". These terms, crucial to understanding our tutorials and content, don't lend themselves easily to direct translation.
The final phase of the translation process involved synchronizing the video with the Spanish narration produced. This simply involved returning to the HeyGen platform, uploading our video and applying our audio file.
The software then syncs the lip movements in the video with the Spanish narration. The result? A video that looks as if it had originally been created in Spanish, making its content clear and accessible to our Spanish-speaking users.
After this end-to-end test of avatar creation and automatic video translation, I'm still very enthusiastic about the possibilities AI offers for content creation.
In our case, my avatar has proved genuinely useful for my projects, especially the voice functionality, which I can use to improve and translate my voice-overs and tutorial videos.
However, despite the encouraging results, AI is not without its challenges and areas for improvement.
My first reservation concerns cost: these tools can add up quickly, especially when, as in this case, you have to combine several paid services for a single project, and the results are still limited.
My second reservation, beyond output quality, which varies with the ambition of the project, is that I remain lukewarm about the promise of simplified creation with professional results. The workflow required using and synchronizing several different AIs (HeyGen and ElevenLabs), which ultimately makes the process complex and could put some users off.