Research Panel Discussion

The Future of Audio Multimedia

6th November, 2pm-3:30pm, Palm 2

Summary

Media sharing sites on the Internet and the one-click upload
capability of smartphones have led to a deluge of online multimedia
content. In one month, more video content is uploaded to YouTube than
all the US media companies produced in 60 years, creating an
ever-growing demand for methods that make it easier to retrieve,
search, and index. While visual information is a very important part
of a video, acoustic information often complements it. This is
especially true for the analysis of consumer-produced, “unconstrained”
videos from social media networks, such as YouTube uploads or Flickr
content. In past years, however, the content track of ACM
Multimedia has focused primarily on the machine vision tasks of
video and image analysis. It is time to shift this emphasis and
introduce audio as an equally weighted focus. By audio,
we mean any audio that a computer might encounter in YouTube videos or
on mobile device microphones: speech, music, environmental sounds, and
noise. Collectively, the analysis of these signals is known as
“machine listening” or “computational audition” and is a growing area
of research. This panel will bring together experts in machine
listening to discuss the future of the field and its relationship to
multimedia analysis. It will be aimed at the general multimedia
audience, including both researchers who study machine listening and
those who study image and video analysis, with the hope of seeding new
cross-disciplinary ideas and collaborations. Panelists will
specifically address the information complementary to video that is
easily obtained from audio, including the linguistic and emotional
content of speech, characteristics of talkers, characteristics of
acoustic scenes and events, and characteristics of wildlife and
natural events.

Potential Discussion Topics:

The promise of audio analysis on YouTube and mobile phones

Emerging topics in audio-driven interfaces for multimedia systems and applications

What are the most synergistic topics and techniques between audio and vision?

How could audio research be presented more usefully to ACM Multimedia?

How could ACM Multimedia attract more audio researchers?

What are the killer apps of audio multimedia?

Looking for Emotional and Social signals in Multimedia: Where art thou?

4th November, 2pm-3:30pm, Palm 2

The panel discussion will focus on the following questions:

Where and What? Where are emotional and social signals in multimedia? What are they? Do the two areas, namely Emotional and Social Signals in MM and Social Media and Presence, cover all the different types of emotional and social signals in MM?

Context or Content? What if the meaning of the content can be better obtained from the context surrounding it? Does introducing emotional and social dimensions help? Could we simply use social media to solve the content analysis problem? For images, for example, rather than exploiting GPS, timestamp, or Bluetooth information, should we be exploiting socially or emotionally relevant context, such as who you were with when you took the photo?

The Scale Issue: At what scale are we formulating and providing solutions to the problem of emotional and social behaviour analysis: the individual, small groups, or large populations? Does the treatment of emotional and social signals at all scales address essentially the same research problems, or are the expertise and techniques fundamentally different? Are the expected findings and underlying theory different, and why?

Closing the Gap: Has anything changed since 2006? Could or should MM systems be designed to incorporate emotional and social signals? Where are the gaps? What problems should the community be addressing? What topics should we recommend that our PhD students focus on?

The Future: If you were starting a PhD in this area, what would you choose as a thesis topic (or topics)?