NVIDIA’s Latest Tech Makes AI Voices More Expressive and Realistic

Although the voices of Amazon’s Alexa, Google Assistant, and other AI assistants are far better than those of older GPS devices, they still lack the rhythm, intonation, and other qualities that make speech sound human. At the Interspeech 2021 conference, NVIDIA presented new research and tools that can capture those natural speech qualities by letting you train the AI system with your own voice. To improve its AI voice synthesis, NVIDIA’s text-to-speech research team developed RAD-TTS, a model that was a winning entry in a competition at the NAB broadcast convention to develop the most lifelike avatar.

An individual can use the system to train a text-to-speech model with their own voice, including its tempo, intonation, timbre, and other characteristics. Another RAD-TTS feature is voice conversion, which delivers one speaker’s words in another speaker’s voice. The RAD-TTS interface also lets you adjust the pitch, duration, and energy of a synthesized voice at the frame level. NVIDIA’s researchers used this technology to create more conversational-sounding narration for the company’s own I Am AI video series, instead of using human narrators. The goal was for the narration to match the tone and style of the videos, which has not always been the case with AI-narrated videos.

Although the results still sound a little artificial, they are far better than any AI narration I’ve ever heard.

“Our video producer could use this interface to record himself reading the video script, and then use the AI model to translate his speech into the female narrator’s voice. The producer may then instruct the AI like a voice actor, altering the synthesized speech to stress keywords and changing the narration’s cadence to better portray the video’s tone,” according to NVIDIA.
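
To make "frame-level" control more concrete, here is a minimal Python sketch of the idea. It is not RAD-TTS or any NVIDIA API: the `Frame` structure and the `emphasize` and `stretch` helpers are hypothetical names invented for illustration, and a real system would operate on model features rather than a plain list. It assumes only that a synthesized utterance can be described as a sequence of short frames, each with a pitch and an energy value.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Frame:
    """One short slice of synthesized speech (hypothetical structure)."""
    pitch_hz: float  # fundamental frequency of the frame
    energy: float    # relative loudness of the frame


def emphasize(frames: List[Frame], start: int, end: int,
              pitch_scale: float = 1.0,
              energy_scale: float = 1.0) -> List[Frame]:
    """Scale pitch and energy for frames[start:end]; leave the rest as-is."""
    return [
        replace(f,
                pitch_hz=f.pitch_hz * pitch_scale,
                energy=f.energy * energy_scale)
        if start <= i < end else f
        for i, f in enumerate(frames)
    ]


def stretch(frames: List[Frame], start: int, end: int,
            factor: float) -> List[Frame]:
    """Change the duration of frames[start:end] by repeating or dropping frames.

    factor > 1 slows the segment down, factor < 1 speeds it up.
    """
    segment = frames[start:end]
    if not segment:
        return list(frames)
    new_len = max(1, round(len(segment) * factor))
    # Nearest-neighbour resampling of the segment to its new length.
    resampled = [segment[i * len(segment) // new_len] for i in range(new_len)]
    return frames[:start] + resampled + frames[end:]


# A flat, monotone utterance of 10 frames.
flat = [Frame(pitch_hz=120.0, energy=0.5) for _ in range(10)]

# Stress a "keyword" spanning frames 3..5: raise its pitch and loudness...
stressed = emphasize(flat, 3, 6, pitch_scale=1.2, energy_scale=1.5)

# ...then slow that same keyword down to twice its duration.
slowed = stretch(stressed, 3, 6, factor=2.0)
```

Stressing a keyword, in this toy picture, means raising pitch and energy over the frames that cover the word, and changing cadence means lengthening or shortening a run of frames.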

NVIDIA is making some of this research open source through NVIDIA NeMo, a Python toolkit for GPU-accelerated conversational AI, which is available on the company’s NGC hub of containers and other software. It’s optimized to run efficiently on NVIDIA GPUs, of course.

“Several of the models have been trained on NVIDIA DGX systems with tens of thousands of hours of audio data. Using mixed-precision computing on NVIDIA Tensor Core GPUs, developers can fine-tune any model for specific use cases, speeding up training,” the company noted.
