Transforming conference calls by understanding the human brain

Everyone knows that traditional conference calls are often frustrating experiences—tedious, confusing, and unproductive. But to fix conference-call technology, my team at Dolby had to figure out why these calls work so poorly.

The answer, we found, lies in evolution and the science of the human brain. We did not evolve to follow conversations in which every voice comes from the same place, in which it’s hard to distinguish one voice from another, and in which we lose subtle audio cues.

It’s no wonder that when we have to contend with the unnatural, mono audio stream from conventional conferencing tools, we struggle to have fully productive meetings.

At Dolby, we’ve applied our deep understanding of human brain function and audio engineering to develop Dolby® Voice™, which uses sophisticated signal processing to create a conference-call experience that seems to your brain like an in-person meeting. The technology is debuting exclusively in the BT® MeetMe with Dolby Voice service.

To understand how—and why—Dolby Voice technology works, it helps to start with an understanding of why face-to-face meetings work.

When we enter a conference room for a face-to-face meeting, our brain quickly completes a complicated analysis. Within 30 seconds, we’ve created a model of the room in our heads—not only do we know exactly where everyone is, we even evaluate the room’s reverberations to further pinpoint sounds.

As the meeting goes on, we can distinguish subtle audio cues, such as someone clearing their throat because they want to speak. We can tell who’s chuckling and who’s signaling disagreement. And we can do it all without even thinking about it. We’re barely aware we’re doing it.

Traditional conference calls destroy most of the information that we instinctively rely on. All the voices come from one place, so we have trouble distinguishing who’s talking. The audio quality is terrible, so it’s hard to understand what is being said. And we can hear only one person at a time, so we lose all those audio cues that signal when someone disagrees or wants to be heard.

Without that data, our brains have to work overtime just to keep track of what’s being said and by whom. For example, one of your colleagues suggests a novel and interesting strategy. But you’re so busy trying to figure out who has spoken and exactly what they said that you have limited brainpower left to evaluate the idea itself.

Restoring information

With Dolby Voice, we’ve restored much of the information that the brain needs to make conference calls feel like face-to-face meetings. One key element is voice separation. On a BT MeetMe with Dolby Voice call, you’ll hear each participant’s voice coming from what feels like a different place in the room, so it’s much easier to keep track of who’s speaking.
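As a rough illustration of the general idea of voice separation, the sketch below spreads talkers across a stereo field with constant-power panning. This is a generic textbook technique, not Dolby Voice's actual processing, which is not described in this post. The function names (`pan_positions`, `constant_power_gains`, `mix_stereo`) and the even left-to-right spacing are assumptions made purely for illustration.

```python
import math


def pan_positions(n):
    """Spread n talkers evenly from hard left (-1.0) to hard right (+1.0)."""
    if n == 1:
        return [0.0]
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def constant_power_gains(position):
    """Map a position in [-1, 1] to (left, right) gains.

    The gains satisfy left**2 + right**2 == 1, so a voice keeps roughly
    the same perceived loudness wherever it is placed.
    """
    angle = (position + 1.0) * math.pi / 4.0  # 0 (left) .. pi/2 (right)
    return math.cos(angle), math.sin(angle)


def mix_stereo(mono_streams):
    """Mix equal-length mono sample lists into a list of (left, right) pairs."""
    if not mono_streams:
        return []
    gains = [constant_power_gains(p) for p in pan_positions(len(mono_streams))]
    mixed = []
    for t in range(len(mono_streams[0])):
        left = sum(g[0] * s[t] for g, s in zip(gains, mono_streams))
        right = sum(g[1] * s[t] for g, s in zip(gains, mono_streams))
        mixed.append((left, right))
    return mixed


if __name__ == "__main__":
    # Three hypothetical talkers, one sample each, placed left, center, right.
    for position in pan_positions(3):
        print(position, constant_power_gains(position))
```

Simple panning like this only hints at direction on headphones. A service that makes voices feel like they come from different places in a room would presumably do considerably more.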

We’ve raised the audio fidelity of the call, so people’s words are much clearer and more distinct. At the same time, our technology screens out the extraneous noises—the clicking of a keyboard, for instance, or the crinkling of paper—that can drown out the conversation on traditional conference calls.

And with Dolby Voice, the conversation can no longer be hijacked by one loud person who won’t stop talking. On a Dolby Voice conference call, you can hear more than one person speak at a time, allowing the kind of natural interaction that happens face-to-face. People can interject important pieces of information or politely signal their desire to speak, and everyone can hear them.

With these changes, conference calls become much more pleasant, natural experiences where participation, spontaneity, and collaboration become commonplace. We believe the effects will be profound.

In our testing, we see that people who take part in Dolby Voice conversations relax. Suddenly, the strain of understanding what’s being said is gone, and people can truly process the conversation, consider each other’s points, and make thoughtful contributions.

Conference calls go from being something people dread to a tool that allows significant collaboration among teammates across the city or across the globe. And that unlocks the full potential of an organization in a way that we believe will be transformative.

Dolby is just pure awesomeness. I LOVE the contributions the company makes such as this. Dolby voice technology is yet another breakthrough that makes me want to join the team even more. I’ll stop ranting, as this is just a comment XD

The biggest problem with conference calls is the delay caused by the Voice Over IP and other processing technology that most of these services use. This very often causes a 2-4 second silent pause after you stop speaking, which is just long enough that everyone feels compelled to jump in at once. It then takes 2-4 more seconds for everyone to realize that they are all talking at once, at which point everyone stops talking, and the cycle repeats. This usually happens dozens of times during a typical hour-long telecon. By doing extra processing, I assume that this system will slow things down even more, and exacerbate this problem. No thanks Dolby.

“It is true that excessive delay can be a big problem for natural conversational flow – especially for fast moving exchanges – and this is something that Dolby understands very well in the context of natural meeting experiences. The good news is that Dolby Voice is engineered to introduce the least delay possible and is better (has less latency) than other VoIP solutions. Our experts have been able to engineer a solution that can perform sophisticated signal-processing in near-real time. This is combined with advanced server-side processing and network-transport optimization to deliver effective remote meetings that effortlessly include the natural back and forth of a dynamic conversation.”

I’m not even sure it’s a network delay – there’s just a delay while people wait to see if someone else is about to speak, and then they all decide at once that no one else is going to and bingo, they all do.

I can see this working for multiple people on individual laptops or personal devices, where Dolby processing software ‘places’ each participant in the stereo mix and provides high-bandwidth audio and good-quality noise suppression. This will work great on stereo headphones. How then does this work when connecting in to a meeting room full of people? I can see a ‘dolby array microphone’ provision at some point gathering the positional information of the participants in the room so they can be placed in the mix for the remote participants, but what about the other way round? Is there just a stereo feed back to the room loudspeakers – you can’t expect the people in the room to be wearing headphones? Does the Dolby Voice ‘Room system’ exist yet? That’s where the big change will come in my world.

“Thank you for your interest and your insightful question Jason. Including the “meeting room full of people” use case in the Dolby Voice experience is indeed essential. We all know how painful it can be to be remote from the meeting room with conventional conferencing — often you are unable to hear all that is said or pick up who just agreed, or who didn’t. BT and Dolby have already announced plans to deliver client applications for PC, Mac, iOS and Android devices. We have also discussed our intent to deliver a powerful new class of conference phone. This device will truly combine the remote and room participants into one meeting and deliver the Dolby Voice experience to all attendees. We will be sharing more details in the first half of 2014.”

For me it will probably take more than a ‘super’ conference phone to deal with some of the larger rooms that I work on, but that will depend on how ‘super’ it is and what room size it can cope with. Typically current high performance conference phones as single units are OK in rooms to 14 people max. A few more if extension microphones are used. We often do rooms with up to 50-60 people seated round a table, and a more integrated ‘room system’ will be required. I can see how Dolby could take some elements of ‘Atmos’ technology and apply them to these situations to create physical ‘placement’ of remote callers in the soundspace of the room. Combined with multiple microphone arrays placing room participants in the stereo mix for remote users, that could become a really powerful and truly gamechanging solution.

Am I right in assuming that this can be called a disruptive technology in the world of audio conferencing?

I sincerely appreciate and admire that science and technology are continuously making things better.

Would you confirm that this innovation is first of its kind or first of its kind to be commercialised?

Just a thought: would there be any way to select or deselect the Dolby experience (or selected features) during a conference call? Maybe I want to hear that keyboard typing to keep a check or something.

We think Dolby Voice is a “disruptive” technology innovation because it delivers an experience that will transform the way people view communications between mobile and distributed teams. It also has the potential to lower costs. BT MeetMe with Dolby Voice is the first commercial service to deliver this type of a conferencing experience.

We try to give users the best experience available and feel that most users will prefer to have background noises reduced and all participants more easily heard and identified.

The problems Dolby says it has solved are significant. With the audio portion of a meeting now under control, maybe we can begin to focus on the two other — and perhaps bigger — problems of conference calls: lack of visual interaction, and boring content. The first problem is a technological one; the second, unfortunately, is all too human.
Actually, the two may be connected: I was on a conference call yesterday in which we took 40+ minutes to address an issue that we could have resolved in less than 5 minutes if we had met face to face. The lack of visual interaction and visual cues (a critically important parallel channel of communication in a meeting) caused one of the meeting members to launch into 300-word sentences in an attempt to explain a point — and then to reexplain it several times in succeeding 300-word sentences (and no, I’m not exaggerating).
Dolby’s singular achievement, described above, will certainly help. But until we address the other two issues, we will continue to cram 3 minutes of content into 1 hour of teleconference.

Instead of everyone thinking outside of the box, what we need is a box large enough to accommodate all the people that actually need to be in these conference calls, but no larger than that. They all can then occupy this box at the same point in time. I will refer to this box idea as an ‘office’. Having all the people in close proximity will totally remove the latency whilst using this Dolby technology, thus solving the delay issue some people have been banging on about. Also solved are the visual cues that are missed by people not within this same ‘office’. Reduced wiring costs are also an added benefit given the short distance between each teleconference terminal.

That, plus quantum polarized fibre optic cables and open source router software (OpenWRT, DD-WRT, etc.) for the copper endpoints of the network, supporting the full range of network spectra (Ethernet, 1901 AC powerline, 802.11abgn and .ac WiFi, etc.), could bring as much as 5 gigabits with low latency to homes in the near term, with 10 gigabit connections to local services and perhaps as much as 2-5 gigabits to the world.

This software would at least be useful *then*.

South Korea will get it first. North America will only get it if we have a telecom revolution – I mean a revolution of the Russian sort, building terabit backbones and pushing commercial providers out to the edges with no ability to monopolize backhaul to any given region. There is plenty of money to be made out where no one has fiber yet. But we can no longer let them strangle the places that do.

Have you considered interfacing with existing open source conferencing? My project, FreeSWITCH, has a conferencing module that many companies are using to try to innovate conferencing. I created the conferencing module to make it easier for our development team to work while on the phone all day from different parts of the world. Over the years, I have developed keen skills to overcome the issues mentioned in the post but I still recognise when a less experienced guy calls in and is utterly confused. I have daily meetings on my conference with 17 or more people and I wonder how it would change things to use some of the features you describe.

Improving audio-only conferencing seems a tad like making a better buggy whip, but to be fair we’re probably still a decade out from pervasive video where the kids of tomorrow say “huh?” when we talk about audio-only. How does your technology compare to some of the spatial audio from high-end telepresence systems? Is the proposition similar performance at fractional price?
KA

Thank you for your comment. It’s been said that 90% of a good video experience is the audio. We agree that the audio experience is critical and feel that properly configured video solutions running over high quality, managed networks can deliver outstanding audio and video experiences. In these cases, the audio experience can be comparable to some aspects of the Dolby Voice experience, although we are doing things to enhance the multiparty experience that you will not find in any multiparty video systems. Organizations need a range of different solutions: audio, web, and video. Audio conferencing delivers convenience, costs cents/minute (not dollars/minute like video), can be accessed anywhere, and is very simple to use. The result is that people use 100s of times more multiparty audio than multiparty video. Because organizations will always use far more audio conferencing, it is critical to give users the best solution.

The challenge is that in today’s mobile world it is highly unlikely that you will have a room full of people on both sides. The challenge is how to improve the quality (including delays, dead sound, etc.) in a real-world environment where people are dialing in from all over the planet.

We agree. Dolby Voice-based conferencing services are designed for meetings that include any combination of people meeting together in a room and individuals joining from a mobile device or their PC/Mac.

I think what Dolby is doing is great, but I also feel that the real question still is how these meetings are handled. If you have someone that is leading the meeting then there shouldn’t be multiple people speaking at once. As for the clearing of the throat and visual cues, I believe that is more a matter of people needing to speak up or “raise their hand” so to speak when they want to say something, whether it be to agree, disagree or debate a topic. Regardless, I’m looking forward to hearing more about it going forward.

This is a start. If you combine Dolby Voice with video, you can see the person’s facial reactions as well as the sound format provided. Now that we have improved sound quality, we should improve video as well. Looking at a screen still overcomes the value of Dolby Voice to a percent. So what happens if you have a meeting room that is completely white? Have a 360 projector on the ceiling. As each user joins the meeting, he or she is projected to a corner in the meeting room along with the voice of the person from the same projection point. This then gives you a true feel with all our senses intact. Like Dolby says, you get to see the faces, get a feel of tone and reaction as well as quality voice. Perfect. Soon, with technology, we could even have the video of the person in the meeting projected as a 3D image on the conference table. Now that is the future.

As a hearing-impaired person, I welcome anything that improves the auditory hell that conference calls can become. My biggest bugbear is inconsiderate people who use handsfree when there is no need to, and/or call in with really noisy background going on. If you can help address that I’ll be really impressed.

I’ve done some rudimentary analysis of the recordings presented in the online example. It would seem that Dolby Voice presents an 8 kHz audio path consistent with the use of a standard wideband codec like G.722. Of course it could be iSAC, Speex, AMR-WB or Opus. Do you not see any value in full bandwidth audio for voice? Most video conference systems pass at least 14 kHz in stereo.

Further, how do you envision a traditional meeting room engaging in such a meeting? It seems to me that it’s less than ideal that participants only derive the most significant benefits of the approach one-by-one by way of a computer or mobile device. That would seem to limit the application to a certain niche.

Thanks for your comments. As you know, regular phones and mobiles are narrowband, with less than 4 kHz audio bandwidth. The Dolby technology can provide wideband or super-wideband audio, giving 8 kHz or 16 kHz audio bandwidth depending on how the service is configured. We use our own codec, and not one of the standard ones you mention, to give us high quality audio at relatively low data-rates and because our codec also offers other advantages as part of our advanced signal processing chain.

Your comments about the importance of including rooms in the conference make perfect sense and we plan to launch a conference phone product later this year. The new device is revolutionary and captures all the audio in the room for remote participants and helps to separate multiple remote talkers for those in the room. The experience is fully integrated with the soft-client experience already launched on Macs and PCs and which will be available on smartphones in a few weeks.
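
For readers comparing the bandwidth figures in this exchange, the short sketch below relates audio bandwidth to the minimum sampling rate via the Nyquist criterion (sampling rate of at least twice the highest audio frequency). The bandwidth values come from the reply above. The helper name `min_sample_rate_hz` is hypothetical, and the sampling rates Dolby's own codec actually uses are not stated in this thread.

```python
def min_sample_rate_hz(audio_bandwidth_hz):
    """Nyquist criterion: sample at no less than twice the audio bandwidth."""
    return 2 * audio_bandwidth_hz


# Audio bandwidths quoted in the reply above (upper limits, in Hz).
BANDS = {
    "narrowband": 4000,
    "wideband": 8000,
    "super-wideband": 16000,
}

if __name__ == "__main__":
    for name, bandwidth in BANDS.items():
        rate = min_sample_rate_hz(bandwidth)
        print(f"{name}: {bandwidth} Hz bandwidth needs at least {rate} Hz sampling")
```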