The quality of the sound we hear, the images we see and the emotions that speech conveys all matter greatly for the creation of valid meaning. This complexity is often lost when we communicate through the internet and with machines.

Lost in translation

Whether it is a Skype call or a voice command given to your smartphone, communicating through technology can be tricky.

Anything that disrupts our ordinary speech rhythms, or the way we process tone of voice, facial expression and other physiological cues, can affect how a speech act is interpreted and transform its meaning.

Understanding the complex ways in which we communicate can help us develop technologies that improve online exchanges and reduce misunderstandings. Engineers and researchers such as Harte have therefore focused on two ways to improve digital communication:

Improving speech quality

In many situations where humans and gadgets try to talk to each other – through dialogue systems, e-books, tablets, mobile phones and computer games – it is the machine that struggles to understand spoken cues and then to formulate an intelligible, natural-sounding response.

Researchers are trying to improve human-to-human communication by making the internet's multimedia capabilities, such as audio and video, work better together. At Trinity College, Harte is working on a project to improve what is known as “audio-visual speech recognition”.

This means using visual data, such as tracked lip movements, to help a system recognise what is being said rather than relying on the audio signal alone. Using similar mechanisms, the research group Sigmedia hopes to improve human-to-machine communication by having machines sense lip and eye movements, gestures and voice.
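One common way to combine the two streams is called late fusion. In this approach, an audio recogniser and a lip-reading model each score the candidate words, and the two scores are merged using a weight. That weight can be shifted towards the lips when the audio is noisy. The sketch below is a minimal illustration of that idea. The function name, the log-linear weighting and the probabilities are all hypothetical and do not describe how Harte's or Sigmedia's systems actually work.

```python
import math


def fuse_audio_visual(audio_probs, visual_probs, audio_weight):
    """Pick the most likely word by combining two models' probabilities.

    Hypothetical late-fusion sketch: each candidate's score is a weighted sum
    of its log-probability under an audio model and under a lip-reading
    (visual) model. audio_weight runs from 0 to 1; lowering it when the audio
    is noisy lets the lips count for more. Both dicts must contain the same
    candidate words, and every probability must be greater than zero.
    """
    visual_weight = 1.0 - audio_weight
    scores = {
        word: audio_weight * math.log(audio_probs[word])
        + visual_weight * math.log(visual_probs[word])
        for word in audio_probs
    }
    # The candidate with the highest combined score wins.
    return max(scores, key=scores.get)


# Toy numbers: in a noisy room the audio model can barely separate "bat" from
# "fat", but the lips (closed for "b", teeth on lower lip for "f") are clear.
audio = {"bat": 0.55, "fat": 0.45}
visual = {"bat": 0.1, "fat": 0.9}

print(fuse_audio_visual(audio, visual, audio_weight=1.0))  # bat (audio only)
print(fuse_audio_visual(audio, visual, audio_weight=0.5))  # fat (audio + lips)
```

With the audio weight set to 1, the sketch behaves like an audio-only recogniser and picks "bat". Giving the lips an equal say flips the answer to "fat", which is the kind of correction visual information can offer when the sound alone is ambiguous.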

Transmitting emotions

The other major challenge to improving speech technologies is related to the emotional content of speech.

Designers and researchers know this – but the field of affective computing (computing that relates to, arises from or deliberately influences emotion) is relatively young. At present, most speech systems are poor at both recognising and conveying emotion.

This makes these applications both less user-friendly and less effective.

For these systems to improve, research is needed into a framework for classifying emotional signals, particularly as these signals vary greatly across cultures. Fusing audio and visual cues and accounting for cultural and situational variation are key to this process.

Researchers must ask how well their systems perform when speech is informal, when people are speaking in a second language, or when speakers are in a heightened emotional state.

The importance of non-verbal cues

Understanding a spoken message also depends on what we see at the time.

To illustrate this point, Harte pairs an identical sound with two videos of lips mouthing different phrases.

Although the sound remains the same, the audience believes they have heard two different words, even after she explains the trick.

Essentially, the same sound will be heard differently depending on the visual signal. This phenomenon is called the “McGurk effect”, and shows that speech is seen as well as heard.

This example points to a fact that anthropologists, psychologists, mothers and salespeople know well: non-verbal cues such as tone of voice and gesture texture our understanding of any speech act.

This has to be taken into account when communicating digitally.

Even speech itself is hard to understand without context, and how we interpret that context depends on our cultural background. The tempo and rhythm of our speech, how long we pause, and how long we wait after someone has spoken before responding all differ across cultures.

In some languages, speakers wait much longer before responding than in others, where it is customary to overlap one's response with the end of the other person's statement. These patterns shape communication so strongly that a native English speaker who chooses to speak Spanish will largely keep to English timing patterns, and vice versa.

A time lag during a Skype voice call can thus intensify misunderstanding and dissonance in intercultural communication, and poor audio or video quality may exaggerate the effect further.
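The arithmetic behind this is simple. When speaker A finishes a sentence, the end of it takes the network's one-way delay to reach B. B then waits for whatever pause their conversational habits dictate, and B's reply takes the same delay to travel back. A therefore hears a silence equal to B's usual pause plus twice the one-way delay. The sketch below assumes the delay is the same in both directions. Its timings are made up purely for illustration and are not measurements of any real call or culture.

```python
def perceived_gap_ms(reply_gap_ms, one_way_delay_ms):
    """Silence speaker A hears between finishing and hearing B's reply.

    reply_gap_ms is how long B waits after hearing A stop; a negative value
    means B habitually starts talking slightly before A has finished.
    Assumes the network delay is the same in both directions.
    """
    # A's last word travels to B, B pauses, and B's reply travels back to A.
    return one_way_delay_ms + reply_gap_ms + one_way_delay_ms


# Hypothetical timings, for illustration only.
for reply_gap in (-100, 200):
    for delay in (0, 150):
        gap = perceived_gap_ms(reply_gap, delay)
        heard = f"{gap} ms of silence" if gap >= 0 else f"{-gap} ms of overlap"
        print(f"habitual gap {reply_gap} ms, one-way delay {delay} ms: A hears {heard}")
```

With no delay, a speaker who habitually overlaps produces 100 ms of overlap. Add a 150 ms one-way delay, and the same habit reaches the other person as 200 ms of silence. That is the same pause a slower-paced speaker would produce on a perfect line, so the timing cues each side relies on are distorted by the connection itself.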

Does digital communication have a place in business?

There is a general belief that investment in communications technology can cut the cost of international business and collaboration.

As a result, Google Hangouts, Skype and Facebook video are increasingly used for professional purposes such as conferences, international meetings, student lessons and supervision.

There are many documented examples of successes, failures and misunderstandings.

Successful international business will probably continue to rely on handshakes, given the importance of physical presence in conveying emotion, creating trust and building empathy.

However, businesses also run on efficiency.

If technology improves to the point where it can process non-verbal cues (such as lip and eye movements), then the reduced travel costs will continue to make digital communication a lucrative area of academic research, technological investment and business practice.

One thing is certain: improvements will increasingly depend on the synthesis of multimedia capabilities and on recognising our cultural differences in communicating, interpreting and understanding one another.

Philippa Nicole Barr does not work for, consult to, own shares in or receive funding from any company or organisation that would benefit from this article, and has no relevant affiliations.

Philippa is a PhD candidate in Architecture at University of Sydney, studying architectural atmospheres and sensory and communication technologies. She has extensive experience in media and design in Italy, Germany, and Australia.
