A recent free AI-powered audio processing tool from Adobe can improve some subpar voice recordings by eliminating background noise and enhancing speech. When it works, the end product sounds just like a recording done with a top-notch microphone in a professional sound booth.
The new technology, called Enhance Speech, first appeared in Project Shasta, an AI research project that Adobe recently renamed Adobe Podcast.
Enhance Speech is best used through a desktop web browser, and it requires signing up for an Adobe account. Once registered, you can upload MP3 or WAV files that are up to 1GB in size or an hour long. After a short wait for processing, you can either listen to the result in your browser or download the cleaned-up audio.
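For readers who want to check a recording against those limits before uploading, here is a minimal Python sketch. It assumes that both limits apply at once and that "1GB" means 1024³ bytes, neither of which Adobe has confirmed. The function name `wav_within_limits` is hypothetical, and the check covers only WAV files because Python's standard library cannot read MP3 durations.

```python
import os
import wave

# Assumed interpretation of the stated limits; Adobe has not published exact values.
MAX_BYTES = 1024 ** 3     # "1GB", read here as 1 GiB
MAX_SECONDS = 60 * 60     # one hour

def wav_within_limits(path):
    """Return (ok, size_bytes, duration_seconds) for a WAV file (hypothetical helper)."""
    size_bytes = os.path.getsize(path)
    with wave.open(path, "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        duration = wav.getnframes() / wav.getframerate()
    ok = size_bytes <= MAX_BYTES and duration <= MAX_SECONDS
    return ok, size_bytes, duration
```

Calling `wav_within_limits("interview.wav")` on a local file (the filename is only an example) returns whether it fits, along with the measured size and duration.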
In our tests, Enhance Speech performed best on audio that had a voice without significant noise or crosstalk. For instance, we used the built-in microphone of an iMac to record a person standing 10 feet away, with fan noise in the background. After we processed the recording with Enhance Speech, the result sounded as if it had been recorded up close in a quiet studio with a professional microphone.
How does it work? Adobe has not disclosed any details, but we suspect that a deep-learning model was trained on a large amount (perhaps thousands of hours) of both clean and noisy audio. The model might then “learn” to recognize the frequencies of human speech and synthesize a clean version that closely resembles the original. We have contacted the company for comment, but until Adobe shares more technical information, this is just conjecture.
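To make that conjecture concrete, here is a minimal Python sketch of one common way to build a clean/noisy training pair: mixing noise into a clean signal at a chosen signal-to-noise ratio. This is not Adobe's method, which remains undisclosed. The function names `rms` and `mix_at_snr` are hypothetical, and the sine tone and Gaussian noise are synthetic stand-ins for real speech and real background noise.

```python
import math
import random

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean audio, scaled so the mix has the requested SNR in dB."""
    # Choose a gain so that rms(clean) / rms(gain * noise) == 10 ** (snr_db / 20).
    gain = rms(clean) / (rms(noise) * 10 ** (snr_db / 20))
    return [c + gain * n for c, n in zip(clean, noise)]

# Synthetic stand-ins: one second of a 220 Hz tone and of Gaussian noise at 8 kHz.
rate = 8000
clean = [math.sin(2 * math.pi * 220 * t / rate) for t in range(rate)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(rate)]

# A model could be trained to map `noisy` back to `clean`.
noisy = mix_at_snr(clean, noise, snr_db=5)
```

A speech-enhancement model trained on many such pairs sees the noisy version as input and the clean version as the target, which is one plausible route to the behavior described above.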
On that note, some Hacker News commenters have reported hallucinated results from extremely noisy audio (such as speech recorded beside a waterfall) or from non-English sources, including unexpected output like phantom voices where the AI misinterprets the input audio. That suggests Enhance Speech is doing more than basic noise reduction.
Enhance Speech is not the first product to offer this kind of AI-backed noise reduction. For instance, a commercial service named Audo Studio and an open-source program called mayavoz both perform a similar function.
Enhance Speech is one component of a wider collection of AI-powered podcasting tools from Adobe, which also includes a Mic Check tool (now offered for free) and a transcript-based audio-editing tool that is currently in an invitation-only beta test.