AI headphones driven by Apple M2 can translate multiple speakers at once

Google’s Pixel Buds wireless earbuds have offered a fantastic real-time translation feature for a while now. Over the past few years, brands such as Timekettle have offered similar earbuds for business customers. However, all these solutions can handle only one audio stream at a time for translation.

The folks over at the University of Washington (UW) have developed something truly remarkable in the form of AI-driven headphones that can translate the voices of multiple speakers at once. Think of it as a polyglot in a crowded bar, able to understand the people around them, each speaking a different language, all at once.

The team is referring to their innovation as Spatial Speech Translation, and it comes to life courtesy of binaural headphones. For the unaware, binaural audio tries to reproduce sound the way human ears naturally perceive it. To record it, mics are placed on either side of a dummy head, spaced the same distance apart as human ears.

The approach is crucial because our ears don’t just hear sound; they also help us gauge the direction it comes from. The overarching goal is to produce a natural soundstage with a stereo effect that can provide a live concert-like feel. Or, in the modern context, spatial listening.
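To make the direction cue concrete, here is a minimal Python sketch of one well-known binaural cue, the interaural time difference: a sound from the left reaches the left mic a fraction of a millisecond before the right one, and that delay hints at the angle. This is a textbook illustration, not the UW team's method. The function names, the 0.18 m mic spacing, and the brute-force search are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature
EAR_SPACING = 0.18      # m, assumed distance between the two mics


def best_lag(left, right, max_lag):
    """Return how many samples `right` trails `left` (negative if it leads),
    found by brute-force cross-correlation over lags in [-max_lag, max_lag]."""
    best, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def angle_from_lag(lag, sample_rate):
    """Convert a lag in samples to a rough angle in degrees.
    Positive means the source is toward the left ear."""
    itd = lag / sample_rate  # interaural time difference in seconds
    # Clamp so that measurement noise cannot push asin out of its domain.
    x = max(-1.0, min(1.0, itd * SPEED_OF_SOUND / EAR_SPACING))
    return math.degrees(math.asin(x))
```

Feeding `best_lag` a short burst in the left channel and the same burst a few samples later in the right channel yields a positive lag, which `angle_from_lag` maps to a source somewhere on the listener's left.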

The work comes courtesy of a team led by Professor Shyam Gollakota, whose prolific repertoire includes apps that can put underwater GPS on smartwatches, tech that turns beetles into photographers, brain implants that can interact with electronics, a mobile app that can hear infection, and more.

How does multi-speaker translation work?

“For the first time, we’ve preserved the sound of each person’s voice and the direction it’s coming from,” explains Gollakota, currently a professor at the university’s Paul G. Allen School of Computer Science & Engineering.

The team likens their stack to a radar, as it kicks into action by identifying the number of speakers in the surroundings and updating that number in real time as people move in and out of the listening range. The whole approach works on-device and doesn’t involve sending user voice streams to a cloud server for translation. Yay, privacy!

In addition to speech translation, the kit also “maintains the expressive qualities and volume of each speaker’s voice.” Moreover, directional and audio intensity adjustments are made as the speaker moves across the room. Interestingly, Apple is also said to be developing a system that allows the AirPods to translate audio in real time.
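For a feel of what directional and intensity adjustments can look like in code, the Python sketch below places a mono voice at a given angle by delaying and softening the signal for the ear farther from the speaker. It is a simplified illustration under stated assumptions, not the team's renderer: `render_binaural`, the 0.18 m ear spacing, and the 50 percent maximum attenuation are all made up for the example.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, approximate value for air at room temperature
EAR_SPACING = 0.18       # m, assumed distance between the ears
MAX_FAR_EAR_CUT = 0.5    # assumed: far ear loses up to 50% level at 90 degrees


def render_binaural(mono, angle_deg, sample_rate):
    """Return (left, right) sample lists that place `mono` at `angle_deg`.
    Positive angles are toward the left ear, negative toward the right."""
    s = math.sin(math.radians(angle_deg))
    # Time cue: the far ear hears the sound a few samples later.
    delay = round(abs(s) * EAR_SPACING / SPEED_OF_SOUND * sample_rate)
    # Level cue: the far ear hears it a little quieter.
    far_gain = 1.0 - MAX_FAR_EAR_CUT * abs(s)

    pad = [0.0] * delay
    near = list(mono) + pad                      # arrives first, full level
    far = pad + [far_gain * x for x in mono]     # arrives late, attenuated

    if s >= 0:
        return near, far   # source on the left: left ear is the near ear
    return far, near       # source on the right: right ear is the near ear
```

Calling `render_binaural` again with a new angle each time the tracked speaker moves would shift the voice across the soundstage, which is the effect described above.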

How does it all come to life?

The UW team tested the AI headphones’ translation capabilities in nearly a dozen outdoor and indoor settings. As far as performance goes, the system can take in, process, and produce translated audio within 2-4 seconds. Test participants appeared to prefer a delay of 3-4 seconds, but the team is working to speed up the translation pipeline.

So far, the team has only tested Spanish, German, and French language translations, but they’re hopeful of adding more to the pool. Technically, they condensed blind source separation, localization, real-time expressive translation, and binaural rendering into a single flow, which is quite an impressive feat.

As far as the system goes, the team developed a speech translation model capable of real-time inference on Apple’s M2 silicon. Audio duties were handled by a pair of Sony’s noise-cancelling WH-1000XM4 headphones and a Sonic Presence SP15C binaural USB mic.

And here’s the best part. “The code for the proof-of-concept device is available for others to build on,” says the institution’s press release. That means the scientific and open-source tinkering community can learn and base more advanced projects on the foundations laid out by the UW team. 
