Stefanov, Kalin

KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.

2018 (English) Doctoral thesis, comprehensive summary (Other academic)

Abstract [en]

Nonverbal communication is essential for natural and effective face-to-face human-human interaction. It is the process of communicating by sending and receiving wordless (mostly visual, but also auditory) signals between people. Consequently, natural and effective face-to-face human-machine interaction requires machines (e.g., robots) to understand and produce such human-like signals. Many types of nonverbal signals are used in this form of communication, including body postures, hand gestures, facial expressions, eye movements, touch, and the use of space. This thesis investigates two of these nonverbal signals: hand gestures and eye-gaze. The main goal of the thesis is to propose computational methods for real-time recognition and generation of these two signals in order to facilitate natural and effective human-machine interaction.

The first topic addressed in the thesis is the real-time recognition of hand gestures and its application to the recognition of isolated sign language signs. Hand gestures can also provide important cues during human-robot interaction; for example, emblems are a type of hand gesture with a specific meaning, used to substitute for spoken words. The thesis has two main contributions with respect to the recognition of hand gestures: 1) a newly collected dataset of isolated Swedish Sign Language signs, and 2) a real-time hand gesture recognition method.

The second topic addressed in the thesis is the general problem of real-time speech activity detection in noisy and dynamic environments and its application to socially-aware language acquisition. Speech activity can also provide important information during human-robot interaction; for example, the current active speaker's hand gestures and eye-gaze direction or head orientation can play an important role in understanding the state of the interaction. The thesis has one main contribution with respect to speech activity detection: a real-time vision-based speech activity detection method.

The third topic addressed in the thesis is the real-time generation of eye-gaze direction or head orientation and its application to human-robot interaction. Eye-gaze direction or head orientation can provide important cues during human-robot interaction; for example, they can regulate who is allowed to speak when, and coordinate changes in the roles on the conversational floor (e.g., speaker, addressee, and bystander). The thesis has two main contributions with respect to the generation of eye-gaze direction or head orientation: 1) a newly collected dataset of face-to-face interactions, and 2) a real-time eye-gaze direction or head orientation generation method.

Place, publisher, year, edition, pages

KTH Royal Institute of Technology, 2018. p. 54

Series

TRITA-EECS-AVL ; 2018:46

National Category

Computer and Information Sciences

Identifiers

urn:nbn:se:kth:diva-227986 (URN)
978-91-7729-810-6 (ISBN)

Public defence

2018-06-07, Hörsal K2, Teknikringen 28, Stockholm, 14:00 (English)

Opponent

Potamianos, Gerasimos

University of Thessaly, Volos, GR

Supervisors

Beskow, Jonas

KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.