The Sound of Artificial Reality
Adobe's brilliant foray into the soundscapes that will help AI video seem more authentic
I’m sitting inside a massive conference hall in Miami awaiting the arrival of special guest Awkwafina (Crazy Rich Asians), and a cohort of software engineers who will present Sneaks, Adobe’s annual preview of new software innovations. The elite stagecraft of the MAX event has become almost rote for the tech giant, so it’s tempting to frame the company’s latest look behind the digital curtain as mere corporate theatricality.
But smoke and mirrors isn’t on the menu today. Super Sonic, the Photoshop pioneer’s new ambient AI sound generation tool, may represent a meaningful new milestone in AI.
Much of the power of AI lies in replicating reality, so let’s think about what makes a lie believable. Mimesis is the art that artists have practiced for centuries: getting us to believe in the reality of invented worlds and taking us on a journey through their mind’s eye. With the ascent of generative AI, the power to pull off this aesthetic prestidigitation has increased severalfold, yet the focus over the last 24 months has largely been on visuals. It is often the auditory that closes the loop, convincing us that what we’re seeing is real and persuading us to consent to immersion in the creator’s surreality.
Decades of research have established the fundamental importance of sound in human perception. The 1976 Nature study by McGurk and MacDonald (“Hearing Lips and Seeing Voices”), which gave rise to the term “the McGurk effect,” demonstrated how visual input can alter auditory perception: what we see can change what we think we hear. Their findings, replicated countless times over the decades, reveal our brain's deep reliance on audio-visual integration for constructing reality.
More recent work by Dr. Huriye Atilgan, who studies auditory neuroscience, showed that the temporal synchronization between audio and visual inputs helps determine how we process reality. Adobe's algorithms, by generating synchronized ambient sound, exploit neural mechanisms that evolution has spent eons fine-tuning.
The competing technologies in this space (Meta’s AudioCraft, Google's AudioLM, ElevenLabs, Suno, and others) represent important developments that are nevertheless still nascent and on the margins. While multiple companies dabble in audio generation, none boast Super Sonic’s deep integration with Adobe’s dominant creative software ecosystem.
Bending Reality’s Edge
In general, most of the generative AI space is rightfully focused on the technology’s rise as the next frontier of video. But when generative audio is added to the mix, we are witnessing the birth and industrialization of sensory emulation on an unprecedented scale.
During the Super Sonic demonstration, Adobe’s Justin Salamon, head of the company’s Sound Design AI Group, put the tool through its paces. Under the unforgiving gaze of a live audience, he took little time creating everything from ambient forest sounds to accompany an AI video scene to surprisingly effective flying saucer sounds paired with an AI-generated video of the same. Currently, the tool can add only up to 10 seconds of generated audio to video footage. But as with many things AI, this will likely improve soon.
Using Super Sonic, a filmmaker crafting a scene of 1920s Montmartre, Paris could dial in not just generic “street sounds” but the specific timbre of early Citroën engines echoing off zinc rooftops, the clatter of wooden cart wheels on cobblestones, the particular acoustic signature of absinthe glasses clinking in small cafés. Horror game designers could generate soundscapes that shift based on user biometrics or movements, creating Kubrickian psychological experiences. The footsteps behind you could subtly change their acoustic properties as your anxiety rises, the room tone imperceptibly shifting to match your mounting dread.
For music production, Super Sonic could deliver new kinds of auditory precision. Need the exact room tone of Sun Studios circa 1954? The specific microphone bleed of Abbey Road's Studio Two? The characteristic tape hiss of a 1960s Ampex machine? All now can become adjustable parameters in your sonic palette.
Special AI Effects
“Our primary goals were to…ensure that the quality of the AI model met professional standards and could seamlessly integrate with Adobe’s audio and video product ecosystem. We also wanted to democratize sound design for all video editors…so that even beginners could enhance their projects with professional-quality sound effects,” said Adolfo Hernandez Santisteban, one of Super Sonic’s developers, during the launch of the new tool.
“Video editors often spend considerable time searching for the right sound effects to match their visuals…we aimed to streamline the process by enabling on-demand generative AI sound effects.”
Based on what we’re seeing from Adobe, it appears that the future will not only be seen through AI's generative lens but also heard through its algorithmically attuned “ears.” This kind of granular control of sound promises to rival the precision pixel manipulation we now take for granted in Photoshop.
It’s fun to imagine what film score master Bernard Herrmann (Taxi Driver, Vertigo, Psycho) might have done with tools like these to reach our auditory imaginations. And while some may lament the advance of AI into the realm of cinematic reality-bending, it’s also possible that Herrmann, like many newer artists, would view AI as just another instrument in humanity's multifaceted palette of creative expression.