Intention (WP3)

Overview

One of the pillars of human-human communication is intention. The ability to predict and understand what others will do gives humans clear advantages over other species [1]. The same holds for a robot involved in human-robot interaction: “reading the mind” of the human makes it possible for the robot to be proactive and to act on incomplete or incorrect information. It also makes it possible for a social robot to assist a human without explicit commands. Robots capable of this will give rise to “a fundamentally new kind of collaboration between humans and robots” [2]. Intention recognition for robots in general has been thoroughly addressed in our earlier research, both at the sensorimotor level [3] and at higher levels [4], with several interaction modalities. Speech is often a preferred mode of interaction for applications with robots in health and eldercare [5]. In an example scenario, an older adult wants to eat, gets up from the chair and moves towards the kitchen. The person may use an explicit imperative command such as “Make me a sandwich”, or an implicit declarative sentence such as “I am hungry”. Both sentences could represent the same underlying intention, asking for help with making a sandwich, but they require very different inference mechanisms. The robot could also use video image analysis and draw the same conclusion by observing the human moving towards the kitchen. The work in WP3 will examine how such varying character and quality of interaction can be adjusted for and combined in order to infer human intention in advanced and novel ways. The ESRs will first conduct data collection and a user study at ARPAL, to investigate how older adults’ preference for implicit versus explicit interaction depends on factors such as the robot’s capabilities, both real and as perceived by the human. They will then develop techniques for recognising intention from speech and vision.
ESR6 will incorporate inferred intention at a higher level in the dialogue manager, with the aim of dealing with certain age-related dialogue phenomena such as changes of topic and repetitions. A potential application for the research in WP3 is robots that support activities of daily living (ADL), such as PAL-R’s TIAGo or FHG’s Care-O-bot. Equipping this kind of robot with a multimodal system for advanced intention recognition has the potential to improve interaction, user satisfaction, and task performance.
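
The overview and T3.3 call for combining evidence from speech and vision. One simple way to picture such a combination is naive-Bayes fusion: each modality supplies a likelihood for every candidate intention, and these are multiplied with a prior and normalised. The sketch below is a minimal illustration, assuming the modalities are conditionally independent given the intention; the function `fuse_intentions`, the intention labels and all numbers are hypothetical, and this is not the method WP3 will develop.

```python
from math import prod


def fuse_intentions(prior, *modality_likelihoods):
    """Combine a prior P(intention) with per-modality likelihoods
    P(observation | intention), assuming conditional independence of the
    modalities given the intention (naive-Bayes assumption)."""
    scores = {}
    for intention, p in prior.items():
        # An intention missing from a modality's table is treated as uninformative (1.0).
        scores[intention] = p * prod(lik.get(intention, 1.0) for lik in modality_likelihoods)
    total = sum(scores.values())
    if total == 0:
        raise ValueError("evidence rules out every candidate intention")
    return {intention: s / total for intention, s in scores.items()}


# Hypothetical numbers for the sandwich scenario.
prior = {"wants_food": 0.2, "wants_drink": 0.2, "wants_rest": 0.6}
speech = {"wants_food": 0.7, "wants_drink": 0.2, "wants_rest": 0.1}  # "I am hungry"
vision = {"wants_food": 0.6, "wants_drink": 0.6, "wants_rest": 0.1}  # walking to the kitchen

posterior = fuse_intentions(prior, speech, vision)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))  # → wants_food 0.737
```

Speech alone already favours `wants_food`; the visual cue of walking towards the kitchen mainly suppresses `wants_rest`, which illustrates how a comparatively weak modality can still sharpen the estimate.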

Tasks and Deliverables

Tasks
T3.1 Development of algorithms for inference of intention from natural language analysis combined with planning (ESR4)
T3.2 Development of algorithms for inference of intention from visual sensors (ESR5)
T3.3 Combined methods for inference of intention from natural language and vision (ESR4/ESR5/ESR6)
T3.4 Dialogue management with varying Interaction Quality (ESR6)

Involved ESRs

ESR4 will develop algorithms to infer human intention from sentences of varying linguistic types, representing vastly different Interaction Quality. For imperative sentences, we have previously shown [6] how semantic roles can be mapped to intention using machine learning techniques; this work will be extended by using dependency parsing to analyse compound noun phrases. Grounding in physical objects will be done by extending earlier work [7], in which priming in semantic networks was shown to reduce perceptual ambiguity in intention recognition. For declarative sentences, a task planner will be incorporated. Planners have previously been used in robotics together with imperative sentences that define new planning goals [8],[9]. In our approach, the planner will be equipped with static goals (such as keeping the user satiated), and uttered declarative sentences will add facts to the current state. The planner will then generate a plan that leads from the current state to the goal state. In this way, the planner is used as a reasoning tool for implicit intention recognition, rather than as a problem-solving task planner. In the example above, the declarative statement “I am hungry” would evoke a plan of making a sandwich, in order to maintain the goal of keeping the human satiated. ESR4 will use data collected by ESR5 and ESR6 at ARPAL. During a secondment to ESR5@FHG, visual-sensor-based intention recognition will be integrated. During an industrial secondment to PAL-R, results will be implemented and evaluated on the TIAGo robotic platform (this is a tentative plan that may be adjusted to best fit the actual research).
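
The idea of using a planner as a reasoning tool for implicit intention can be sketched with a minimal STRIPS-style breadth-first forward search. This is an illustration under stated assumptions, not the planner ESR4 will incorporate: the mapping from recognised declarative sentences to state updates is a hand-written table, and all fact, action and function names (such as `user_hungry`, `make_sandwich` and `hear`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts made true by the action
    delete: frozenset  # facts made false by the action


def plan(state, goal, actions):
    """Breadth-first forward search: return the shortest list of action names
    reaching a state that contains every goal fact, or None if none exists."""
    start = frozenset(state)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        current, steps = frontier.popleft()
        if goal <= current:
            return steps
        for action in actions:
            if action.pre <= current:
                successor = (current - action.delete) | action.add
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, steps + [action.name]))
    return None


def act(name, pre=(), add=(), delete=()):
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


# Hypothetical domain: the robot cooks in the kitchen and serves in the living room.
ACTIONS = [
    act("go_to_kitchen", pre={"robot_in_living_room"},
        add={"robot_in_kitchen"}, delete={"robot_in_living_room"}),
    act("go_to_living_room", pre={"robot_in_kitchen"},
        add={"robot_in_living_room"}, delete={"robot_in_kitchen"}),
    act("make_sandwich", pre={"robot_in_kitchen", "bread_available"},
        add={"sandwich_ready"}),
    act("serve_sandwich", pre={"sandwich_ready", "robot_in_living_room", "user_hungry"},
        add={"user_satiated"}, delete={"sandwich_ready", "user_hungry"}),
]
STATIC_GOAL = frozenset({"user_satiated"})  # the robot's standing goal

# Hypothetical mapping: declarative sentence -> (facts added, facts removed).
DECLARATIVES = {"I am hungry": ({"user_hungry"}, {"user_satiated"})}


def hear(state, utterance):
    """Update the current state with the facts conveyed by a declarative sentence."""
    add, delete = DECLARATIVES.get(utterance, (set(), set()))
    return (frozenset(state) - delete) | add


state = frozenset({"user_satiated", "robot_in_living_room", "bread_available"})
print(plan(state, STATIC_GOAL, ACTIONS))  # goal already holds → []
state = hear(state, "I am hungry")
print(plan(state, STATIC_GOAL, ACTIONS))
# → ['go_to_kitchen', 'make_sandwich', 'go_to_living_room', 'serve_sandwich']
```

While the static goal holds, the planner returns an empty plan. Once “I am hungry” removes `user_satiated`, the shortest plan restoring it is the sandwich plan, so the implicit intention is read off the plan rather than from an explicit command.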

ESR5 (FHG) Implicit intention recognition using visual sensors

ESR5 will develop techniques by which the robot infers the intention and activity of a human. The basis will be localisation of human body parts and objects using an RGB-D sensor. We will make use of previous work in the ACCOMPANY project. There, human activity recognition was based on a graphical model that recognises activity sequences from RGB-D videos [10]. It used latent variables to exploit sub-level semantics of human activities, and it outperformed the state-of-the-art approach [11]. The object recognition system contains methods that combine different sensor modalities into a fast, scale-invariant local feature descriptor for recognising textured objects. An extension for detection uses a global, combined 2D/3D feature descriptor [12]. Additional work dealt with the development of fast global 3D shape descriptors [13] as well as preliminary work on human-comprehensible texture and 3D shape descriptors [14].
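
To make concrete what a global 3D shape descriptor computes, the sketch below implements the classic D2 shape distribution of Osada et al.: a histogram of distances between randomly sampled point pairs, normalised by the largest sampled distance so that it does not depend on object size. This is a swapped-in textbook example, not the descriptors of [12] or [13]; the function name `d2_descriptor`, the bin count, the number of pairs and the synthetic point clouds are arbitrary, hypothetical choices.

```python
import math
import random


def d2_descriptor(points, n_pairs=2000, n_bins=16, seed=0):
    """D2 shape distribution: normalised histogram of distances between randomly
    sampled point pairs, each distance divided by the largest sampled distance."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)  # fixed seed: same pairs are drawn for same-sized clouds
    dists = [math.dist(*rng.sample(points, 2)) for _ in range(n_pairs)]
    d_max = max(dists) or 1.0  # guard against a cloud of identical points
    hist = [0] * n_bins
    for d in dists:
        # d / d_max lies in [0, 1]; the maximum distance falls into the last bin.
        hist[min(int(d / d_max * n_bins), n_bins - 1)] += 1
    return [count / n_pairs for count in hist]


# Synthetic clouds: points filling a unit cube, and points along a rod ten times longer.
gen = random.Random(1)
box = [(gen.random(), gen.random(), gen.random()) for _ in range(500)]
rod = [(gen.random() * 10.0, 0.0, 0.0) for _ in range(500)]

print([round(v, 2) for v in d2_descriptor(box)])
print([round(v, 2) for v in d2_descriptor(rod)])
```

Because only pairwise distances enter the histogram, the descriptor is also unaffected by rotation and translation, so it can be computed without first estimating the object’s pose.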

ESR5 will combine these two technologies in order to classify typical human actions and the objects involved in them. For example, when the user is preparing a sandwich, the robot would identify the location of the user (kitchen) and the relevant objects he/she is interacting with (a piece of bread, a knife), and classify the activity (preparing food on the table) using spatio-temporal relations and previously learned activity classes. It could then offer suitable assistance (e.g. fetching specific ingredients from another room or listing recipes on the screen) to support the user. The user then does not have to command the robot for detailed assistance. In this way, the system is capable of recognising and acting upon varying Interaction Quality caused by a user losing interest in verbally commanding the robot. ESR5 will implement suitable software components to solve this task using the Care-O-bot 4 assistive robot. Initially, ESR5 will, together with ESR6, conduct a user study during a secondment to ESR14@UWE. During a secondment to ESR1@HAM, algorithms for emotion recognition will be integrated to support intention recognition. During an industrial secondment to ABB, results will be implemented on their YuMi robot (this is a tentative plan that may be adjusted to best fit the actual research).
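
A heavily simplified stand-in for this classification step is sketched below. Per-frame hand positions and static object positions (assumed to come from the RGB-D localisation) are turned into qualitative “hand near object” relations; how long each relation holds over the clip, together with the user’s location, forms a feature vector that is assigned to the most similar learned class prototype by cosine similarity. The threshold, object names, prototypes, assistance offers and function names are all hypothetical, and temporal order is ignored; the latent-variable graphical model of [10] is considerably richer.

```python
import math
from collections import Counter

NEAR_M = 0.15  # hypothetical distance threshold (metres) for "hand near object"


def frame_relations(hand, objects):
    """Qualitative spatial relations in one frame: the objects the hand is near."""
    return {f"hand_near_{name}" for name, pos in objects.items()
            if math.dist(hand, pos) < NEAR_M}


def clip_features(hands, objects, location):
    """Count over all frames how long each relation holds, plus the user's location."""
    feats = Counter({f"in_{location}": len(hands)})
    for hand in hands:
        feats.update(frame_relations(hand, objects))
    return feats


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical prototypes, e.g. averaged feature vectors of labelled training clips.
PROTOTYPES = {
    "preparing_food": {"in_kitchen": 10, "hand_near_bread": 6, "hand_near_knife": 6},
    "washing_dishes": {"in_kitchen": 10, "hand_near_plate": 8, "hand_near_tap": 8},
    "watching_tv": {"in_living_room": 10, "hand_near_remote": 5},
}
# Hypothetical assistance offers per recognised activity.
OFFERS = {
    "preparing_food": "fetch ingredients or show recipes",
    "washing_dishes": "put away clean dishes",
    "watching_tv": "offer to change the channel",
}


def classify_activity(hands, objects, location):
    feats = clip_features(hands, objects, location)
    return max(PROTOTYPES, key=lambda label: cosine(feats, PROTOTYPES[label]))


# Static object positions (metres) and a short clip of hand positions.
objects = {"bread": (0.5, 0.2, 0.9), "knife": (0.7, 0.2, 0.9)}
hands = [(0.5, 0.25, 0.9), (0.52, 0.2, 0.95), (0.68, 0.2, 0.9), (0.7, 0.22, 0.92)]
activity = classify_activity(hands, objects, "kitchen")
print(activity, "->", OFFERS[activity])  # → preparing_food -> fetch ingredients or show recipes
```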

ESR6 (UMU) Intention-driven dialogue management

Dialogue management is responsible for deciding on the next appropriate robot action (including verbal utterances) based on user intention, current state, dialogue history, context, and purpose of the dialogue. Dialogue management approaches are traditionally divided into knowledge-based approaches (i.e. hand-crafted finite-state and planning approaches) and data-driven approaches [15]. Recent hybrid approaches to dialogue management combine the benefits of both traditional approaches while avoiding their disadvantages [16]. Phenomena such as sudden changes of topic, need for clarification, ambiguity, turn-taking, misunderstandings, and non-understandings influence the character and quality of dialogue-based human-robot interaction. A social robot must be able to identify and adapt to such varying Interaction Quality effectively and efficiently. In the context of eldercare, specific problems arise. For instance, accepted norms for dialogue, such as Grice’s conversational maxims, are not always followed due to various age-related impairments [17] (e.g. unmotivated topic changes).

ESR6 will investigate to what extent dialogue phenomena are caused by, or correlated with, violations of Grice’s maxims. The end goal is to develop hybrid dialogue management approaches that 1) detect breaches of accepted norms for dialogue (e.g. Grice’s maxims) and 2) adapt dialogue management to the varying Interaction Quality. Implicit intention (recognised from language and vision) and emotion will be used as input to the dialogue manager to support decision-making. Our hybrid dialogue management approach will use novel finite-state-based methods, similar to our previously developed automata that include memory [18], and novel graph-transformation approaches for dialogue management defined as logical interfaces, thus extending our earlier work [19],[20].
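
The following minimal sketch, which does not reproduce the automata of [18], shows how a finite-state dialogue manager extended with memory could react to two of the phenomena mentioned above. A stack remembers interrupted topics, so an unannounced topic change (treated here as a possible breach of Grice’s maxim of relation) pauses the current topic instead of discarding it, and a closing utterance resumes the paused one; an immediately repeated utterance is answered again rather than treated as new input. Keyword-based topic detection, the topic names and the class `TopicStackDM` are hypothetical placeholders for the intention recognisers of ESR4 and ESR5.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical keyword sets standing in for the topic/intention recognisers.
TOPICS = {
    "meal": {"hungry", "eat", "sandwich", "lunch"},
    "medication": {"pill", "pills", "medicine"},
    "family": {"daughter", "son", "grandchildren"},
}
CLOSINGS = {"thanks", "thank you", "ok", "done"}


def detect_topic(utterance: str) -> Optional[str]:
    """Return the first topic whose keywords occur in the utterance, else None."""
    words = set(utterance.replace("?", " ").replace(".", " ").split())
    for topic, keywords in TOPICS.items():
        if words & keywords:
            return topic
    return None


@dataclass
class TopicStackDM:
    """Finite-state dialogue manager whose memory is a stack of paused topics."""
    current: Optional[str] = None
    paused: List[str] = field(default_factory=list)
    last_utterance: str = ""

    def step(self, utterance: str) -> str:
        norm = " ".join(utterance.lower().split())
        if norm == self.last_utterance:
            # Immediate repetition: answer again on the current topic.
            return f"repeat_answer:{self.current}"
        self.last_utterance = norm
        if norm.strip(".!") in CLOSINGS:
            # Current topic finished: resume the most recently paused one, if any.
            if self.paused:
                self.current = self.paused.pop()
                return f"resume:{self.current}"
            self.current = None
            return "idle"
        topic = detect_topic(norm)
        if topic is None:
            return "clarify"
        if self.current is None or topic == self.current:
            self.current = topic
            return f"continue:{topic}"
        # Unannounced topic change (possible breach of the maxim of relation):
        # pause the current topic instead of discarding it.
        self.paused.append(self.current)
        self.current = topic
        return f"switch:{topic}"


dm = TopicStackDM()
turns = ["I am hungry", "Did my daughter call?", "Did my daughter call?",
         "Thanks", "Thanks.", "What time is it"]
for turn in turns:
    print(f"{turn!r:28} -> {dm.step(turn)}")
```

In a hybrid approach, the keyword lookup would be replaced by recognised intention (and possibly emotion), while the explicit stack keeps the manager’s behaviour on digressions predictable and inspectable.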

ESR6 will first, together with ESR5, conduct a user study during a secondment to ESR14@UWE, in order to investigate the occurrence of dialogue phenomena. Dialogues between staff and older adults will be recorded and analysed. Using implicit intention as input to the dialogue manager will be investigated in close collaboration with ESR4. Intention from vision will be considered during a secondment to ESR5@FHG. The ability to deal with varying Interaction Quality will be evaluated in a series of dialogue simulations. A prototype dialogue management system will be developed and, during a secondment to PAL-R, implemented on the TIAGo robot (this is a tentative plan that may be adjusted to best fit the actual research).

[5] Teixeira, A., A critical analysis of speech-based interaction in healthcare robots: making a case for increased use of speech in medical and assistive robots, in Speech and Automata in Health Care, A. Neustein (ed.), 2014.