Dick Bulterman, Ph.D.

Prof.dr. Dick C.A. Bulterman was President and CEO of FX Palo Alto Laboratory (FXPAL) from 2013 to 2015. He is also a full professor at the Vrije Universiteit Amsterdam. Prof. Bulterman, who received his Ph.D. from Brown University, has been active in the areas of multimedia systems and user interfaces for over three decades. He has published extensively on user and systems issues and has written two popular books on multimedia presentation structuring. He is a principal architect of the W3C SMIL language, Vice Chair of ACM SIGWEB, and an associate editor of leading journals in the multimedia field. In 2013, he received the prestigious ACM SIGMM Lifetime Achievement Award. Prior to joining FXPAL in 2013, he was head of the Distributed and Interactive Systems group at CWI: Centrum Wiskunde en Informatica in Amsterdam, The Netherlands.

Abstract

In this paper we report on our efforts to define a set of document extensions to Cascading Style Sheets (CSS) that allow for structured timing and synchronization of elements within a Web page. Our work considers a scenario in which the temporal structure can be decoupled from the content of the Web page, in a similar way that CSS decouples layout, colors, and fonts. Based on the SMIL (Synchronized Multimedia Integration Language) temporal model, we propose CSS document extensions and discuss the design and implementation of a proof of concept that realizes our contributions. As HTML5 seems to move away from technologies such as Flash and XML (eXtensible Markup Language), we believe our approach provides a flexible declarative solution for specifying rich media experiences that is better aligned with current Web practices.
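The SMIL temporal model that the proposed extensions build on can be illustrated with a small sketch. The following Python fragment is illustrative only: the element names and durations are invented, and the actual proposal expresses this timing in CSS syntax rather than Python. It computes a timeline for SMIL-style `seq` (sequential) and `par` (parallel) containers:

```python
# Sketch of SMIL timing semantics: "seq" containers play their children one
# after another; "par" containers start all children together and end with
# the longest one. Element names and durations below are invented examples.

def schedule(node, start=0.0, timeline=None):
    """Recursively assign (start, end) times to a seq/par tree.

    node is either ("media", name, duration) or (container, [children])
    with container in {"seq", "par"}. Returns (end_time, timeline).
    """
    if timeline is None:
        timeline = {}
    kind = node[0]
    if kind == "media":
        _, name, dur = node
        timeline[name] = (start, start + dur)
        return start + dur, timeline
    _, children = node
    if kind == "seq":                      # each child begins when the previous ends
        t = start
        for child in children:
            t, _ = schedule(child, t, timeline)
        return t, timeline
    if kind == "par":                      # children begin together; the container
        end = start                        # ends when the longest child ends
        for child in children:
            child_end, _ = schedule(child, start, timeline)
            end = max(end, child_end)
        return end, timeline
    raise ValueError(f"unknown container: {kind}")

# A 5 s intro followed by a video and a shorter caption playing in parallel.
doc = ("seq", [("media", "intro", 5.0),
               ("par", [("media", "video", 10.0),
                        ("media", "caption", 3.0)])])
end, timeline = schedule(doc)
```

Under these semantics the caption starts together with the video at t = 5 s but ends earlier, and the whole document lasts 15 s.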

Abstract

In this paper we discuss communication problems in video-mediated small-group discussions. We present results from a study in which ad-hoc groups of five people, with a moderator, solved a quiz-style select-the-answer task over a video-conferencing system. The task was performed under different delay conditions, with up to 2000 ms of additional one-way delay. Even with a delay of up to 2000 ms, we could not observe any effect on the achieved quiz scores. In contrast, subjective satisfaction was severely negatively affected. While we would have expected a clear conversational breakdown at such a high delay, groups adapted their communication style and thus still managed to solve the task. That is, most groups decided to switch to a more explicit turn-taking scheme.
We argue that future video-conferencing systems can provide a better experience if they are aware of the current conversational situation and can provide compensation mechanisms. We therefore provide an overview of which cues are relevant, how they are affected by the video-conferencing system, and how recent advancements in computational social science can be leveraged. Further, we provide an analysis of the suitability of ordinary webcam data for recognizing such cues. Based on our observations, we suggest strategies that can be implemented to alleviate the problems.

Abstract

As commercial, off-the-shelf services enable people to easily connect with friends and relatives, video-mediated communication is filtering into our daily activities. With the proliferation of broadband and powerful devices, multi-party gatherings are becoming a reality in home environments. With the technical infrastructure in place and accepted by a large user base, researchers and system designers are concentrating on understanding and optimizing the Quality of Experience (QoE) for participants. Theoretical foundations for QoE have identified three crucial factors for understanding the impact on the individual's perception: system, context, and user. While most current research tends to focus on the system factors (delay, bandwidth, resolution), in this paper we offer a more complete analysis that takes context and user factors into consideration. In particular, we investigate the influence of delay (a constant system factor) on the QoE of multi-party conversations. Regarding context, we extend the typical one-to-one condition to explore conversations between small groups (up to five people). In terms of user factors, we take into account conversation analysis, turn-taking, and role theory for a better understanding of the impact of different user profiles. Our investigation allows us to report a detailed analysis of how delay influences the QoE, concluding that the actual interactivity pattern of each participant in the conversation results in different noticeability thresholds for delay. Such results have a direct impact on how we should design and construct video-communication services for multi-party conversations, where user activity should be considered a prime adaptation and optimization parameter.

Abstract

Delay has been found to be one of the most crucial factors determining the Quality of Experience (QoE) in synchronous video-mediated communication. The effect has been extensively studied for dyadic conversations, and recently the study of small-group communication has become a focus of the research community. Contrary to dyads, in which delay is perceived symmetrically, this is not the case for groups. Due to the heterogeneous structure of the Internet, asymmetric delays between participants are likely to occur.

Abstract

Live 3D reconstruction of a human as a 3D mesh with commodity electronics is becoming a reality. Immersive applications (e.g., cloud gaming, tele-presence) benefit from effective transmission of such content over a bandwidth-limited link. In this paper we outline different approaches for compressing live reconstructed mesh geometry, based on distributing mesh reconstruction functions between sender and receiver, and we evaluate the rate-performance-complexity trade-offs of different configurations. First, we investigate 3D mesh compression methods (dynamic and static) from MPEG-4. Second, we evaluate the option of using octree-based point cloud compression and receiver-side surface reconstruction.

Abstract

Creating compelling multimedia content is a difficult task. It involves not only the creative process of developing a compelling media-based story, but it also requires significant technical support for content editing, management and distribution. This has been true for printed, audio and visual presentations for centuries. It is certainly true for broadcast media such as radio and television.
The talk will survey several approaches to describing and managing media interactions. We will focus on the temporal modeling of context-sensitive, personalized interactions among complex collections of independent media objects. Using the concepts of ‘togetherness’ employed in the EU’s FP-7 project TA2: Together Anywhere, Together Anytime, we will follow the process of media capture, profiling, composition, sharing, and end-user manipulation. We will consider the promise of using automated tools and contrast this with the reality of letting real users manipulate presentation semantics in real time.
The talk will not present a closed-form solution; rather, it will present a series of topics and problems that can spur the development of a new generation of systems for stimulating social media interaction.

Abstract

3D Tele-Immersion enables participants in remote locations to share an activity in real time. It offers users interactive and immersive experiences, but it challenges current media streaming solutions. Past work has mainly focused on the efficient delivery of image-based 3D videos and on realistic rendering and reconstruction of geometry-based 3D objects. The contribution of this paper is a real-time streaming component for 3D Tele-Immersion with dynamically reconstructed geometry. This component includes both a novel fast compression method and a rateless packet-protection scheme specifically designed for the requirements imposed by real-time transmission of live-reconstructed mesh geometry. Tests on a large dataset show an encoding speed-up of up to 10 times at comparable compression ratio and quality, when compared to the high-end MPEG-4 SC3DMC mesh encoders. The implemented rateless code ensures complete packet-loss protection of the triangle mesh object and a delivery delay within interactive bounds. Contrary to most linear fountain codes, the designed codec enables real-time progressive decoding, allowing partial decoding each time a packet is received. The approach is compared to transmission over TCP under packet-loss rates and latencies typical of managed WAN and MAN networks, and heavily outperforms it in terms of end-to-end delay. The streaming component has been integrated into a larger 3D Tele-Immersive environment that includes state-of-the-art 3D reconstruction and rendering modules. This resulted in a prototype that can capture, compress, transmit, and render triangle mesh geometry in real time under realistic Internet conditions, as shown in experiments. Compared to alternative methods, it achieves lower end-to-end delay and frame rates over three times higher.
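The progressive decoding property that distinguishes the scheme from most linear fountain codes can be sketched in miniature. The following Python fragment is illustrative only: the packet construction is hand-picked for clarity and the symbols are invented bytes, not the paper's actual code design. It shows "peeling" decoding, where every received packet can immediately resolve symbols instead of waiting for a full block:

```python
# Toy rateless XOR coding with progressive "peeling" decoding. Each packet is
# (index_set, payload) where payload is the XOR of the listed source symbols.
# Degree-1 packets resolve a symbol directly; known symbols are substituted
# into other packets, which may reduce them to degree 1 in turn.

def peel(packets):
    """Progressively decode source symbols from (index_set, payload) packets."""
    recovered = {}
    pending = [[set(idx), payload] for idx, payload in packets]
    progress = True
    while progress:
        progress = False
        for pkt in pending:
            idx, payload = pkt
            for i in idx & recovered.keys():   # substitute known symbols
                payload ^= recovered[i]
                idx.discard(i)
            pkt[1] = payload
            if len(idx) == 1:                  # degree-1: resolves a new symbol
                (i,) = idx
                recovered[i] = payload
                idx.clear()
                progress = True
    return recovered

source = [0x11, 0x22, 0x33, 0x44]              # invented source symbols
# Simulated loss: the degree-1 packet for symbol 2 is lost, but a surviving
# repair packet (XOR of symbols 1 and 2) lets peeling recover it anyway.
received = [({0}, 0x11), ({1}, 0x22), ({3}, 0x44),
            ({1, 2}, 0x22 ^ 0x33)]
decoded = peel(received)
```

Each loop iteration here corresponds to partial decoding after a packet arrival: symbols 0, 1, and 3 are available immediately, and the repair packet then yields the lost symbol 2 without any retransmission.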

Abstract

Geometry-based 3D Tele-Immersion is a novel emerging media application that involves on-the-fly reconstructed 3D mesh geometry. To enable real-time communication of such live reconstructed mesh geometry over a bandwidth-limited link, fast dynamic geometry compression is needed. However, most tools and methods have been developed for compressing synthetically generated graphics content. These methods achieve good compression rates by exploiting topological and geometric properties that typically do not hold for reconstructed mesh geometry. Live reconstructed dynamic geometry is causal and often non-manifold, open, non-oriented, and time-inconsistent. Based on our experience developing a prototype for 3D Tele-Immersion based on live reconstructed geometry, we discuss currently available tools. We then present our approach for dynamic compression, which better exploits the fact that the 3D geometry is reconstructed and achieves state-of-the-art rate-distortion performance under stringent real-time constraints.
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6854788&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6854788

Abstract

With the growing popularity of video communication systems, more people are using group video chat rather than only one-to-one video calls. In such multi-party sessions, remote participants compete for the available screen space and bandwidth. A common solution is showing the current speaker prominently. Bandwidth limitations may not allow all streams to be sent at high resolution at all times, especially with many participants in a call. This can be mitigated by only switching on higher resolutions when they are required. This switching encounters delays due to latency and the properties of encoded video streams. In this paper, we analyse and improve the switching delay of our video-conferencing system. Our server-centric system offers a next-generation video chat solution, providing end-to-end video encoding. To evaluate our system we use a testbed that allows us to emulate different network conditions. We measure the video switching delay between three clients, each connected via a different network profile. Our results show that missing Intra-Frames in the transmission have a strong influence on the switching delay. Based on this, we provide an optimization mechanism that improves those delays by resending Intra-Frames.
http://dl.acm.org/citation.cfm?id=2579472
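The benefit of resending Intra-Frames can be sketched with a toy delay model. The following Python fragment is an assumption-laden illustration, not the paper's measurement setup: the function, the GOP length, and the delay values are all invented. It contrasts a decoder that must wait for the next scheduled Intra-Frame with a server that pushes a fresh one as soon as the switch request arrives:

```python
# Toy model of stream-switching delay (all parameters invented). A decoder can
# only start displaying a newly selected stream at an Intra-Frame, which in
# this model is scheduled every `gop` milliseconds at multiples of `gop`.

def switching_delay(t_request, gop, one_way, resend_iframe):
    """Milliseconds from the switch request until the new stream can display."""
    if resend_iframe:
        # Server emits a fresh Intra-Frame on receiving the request:
        # one uplink trip for the request, one downlink trip for the frame.
        return 2 * one_way
    arrive = t_request + one_way          # switch request reaches the server
    next_i = -(-arrive // gop) * gop      # next scheduled Intra-Frame (ceil)
    return (next_i - t_request) + one_way # wait for it, plus downlink delay
```

With an invented 1000 ms GOP, 50 ms one-way delay, and a request at t = 120 ms, the decoder waits 930 ms for the scheduled Intra-Frame but only 100 ms (one round trip) when the server resends one, which is the kind of gap the optimization mechanism targets.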

Abstract

With the massive amount of captured multimedia, authoring is more relevant than ever. Multimedia content is available in many settings, including the web, mobile devices, desktop applications, games, and interactive TV. The authoring and production of multimedia documents demands attention to many issues related to the structure and synchronization of the media components, to the specification of the document and of the interaction, to the roles of authors and end users, as well as to reuse and digital rights management. Several complementary approaches to supporting the authoring of multimedia documents have been reported in the literature, and in many cases they have been studied via authoring tools and applications. One aim of this special issue is to assess current approaches, tools, and applications, discussing how they tackle the main issues related to the process of authoring, as well as their limitations.

Abstract

Creating compelling multimedia productions is a nontrivial task. This is as true for creating professional content as it is for nonprofessional editors. During the past 20 years, authoring networked content has been a part of the research agenda of the multimedia community. Unfortunately, authoring has been seen as an initial enterprise that occurs before ‘real’ content processing takes place. This limits the options open to authors and to viewers of rich multimedia content for creating and receiving focused, highly personal media presentations. This article reflects on the history of multimedia authoring. We focus on the particular task of supporting socially-aware multimedia, in which the relationships within particular social groups among authors and viewers can be exploited to create highly personal media experiences. We provide an overview of the requirements and characteristics of socially-aware multimedia authoring within the context of exploiting community content. We continue with a short historical perspective on authoring support for these types of situations. We then present an overview of a current system for supporting socially-aware multimedia authoring in a community content setting. We conclude with a discussion of the issues that we feel can provide a fruitful basis for future multimedia authoring support. We argue that providing support for socially-aware multimedia authoring can have a profound impact on the nature and architecture of the entire multimedia information processing pipeline.