Scientific Achievements

SEVEN YEARS OF POETICON RESEARCH

A core part of CSRI research and development has been formulated and enabled through the POETICON project series. POETICON is an interdisciplinary, European-funded project in the field of Cognitive Systems and Robotics. It started in January 2008, continued as POETICON++ in January 2012 and is currently in its seventh year of focused research. POETICON started with an ambitious basic research objective: to explore and model the “poetics of everyday life”, i.e. the synthesis of sensorimotor representations and natural language in everyday human interaction. This relates to an old problem in Artificial Intelligence, namely how meaning emerges, which was approached from an embodied and enactive cognition perspective. Basic tools for language, vision and action parsing were developed, and the modeling of their integration dynamics was explored. Experimental research fed the technology development, while a humanoid platform was used for a proof-of-concept demonstration of what could be achieved in Cognitive Systems through real integration of individual cognitive modules.

POETICON++ builds on these preliminary results, arguing that robots need natural language for controlled generalisation of learned behaviours and for creativity. Its main objective is the development of an innovative computational mechanism for robust generalisation of motor programs and visual experiences for robots, one that will utilise the hierarchical and generative nature of language for ‘indexing’ (labelling) sensorimotor experiences at different levels of abstraction. The mechanism will integrate natural language and visual action and object recognition tools with advanced manipulation and mobility skills, affordance-based self-exploration abilities and a bio-inspired action-language learning module for: (a) behaviour generation through verbal instruction, and (b) visual scene understanding by a humanoid.

The POETICON project series is coordinated by the CSRI Director and brings together an international team of PIs and their teams, including Prof. Yiannis Aloimonos (University of Maryland, USA), Prof. Giulio Sandini, Prof. Luciano Fadiga and Prof. Giorgio Metta (Italian Institute of Technology, Italy), Prof. Angelo Cangelosi (University of Plymouth, U.K.) and Prof. Jose Santos Victor (Instituto Superior Técnico, Portugal).

THE MINIMALIST GRAMMAR OF ACTION

Language and action have been found to share a common neural basis and, in particular, a common ‘syntax’: an analogous hierarchical and compositional organization. While grammatical formalisms and associated discriminative or generative computational models exist for language structure, such formalisms and models for the structure of action are still elusive. However, structuring action has important implications for action learning and generalization, in both human cognition research and computation. We have developed a minimalist grammar of action: a formal specification of the structure of action that employs the Chomskyan generative transformational grammar paradigm and is corroborated by neurobiological evidence. Though there are a variety of grammars for describing the structure of language, we chose the Chomskyan approach and its latest evolution into the Minimalist Program. The main reason is that this framework is the culmination of an attempt to describe and explain language syntax in terms of principles and parameters that are not tied to the idiosyncrasies of the human language system, but instead may have counterparts in other biological systems. Thus, this perspective allows one to look for universals not only within the structures of different human languages, but also across natural language and non-symbolic sensorimotor spaces, such as human action.

Our generative action grammar comprises a set of terminals, features, non-terminals, and production rules in the sensorimotor domain. The need to fill in the values of the action features drives the merging of action constituents into binary structures, organised hierarchically into temporal sequences of actions of increasing complexity. These driving features are the tool of an action, the affected object and its goal. A minimal set of production rules (in which the tool and affected-object complements of an action have a primary role) applies recursively in generating the denoted action tree(s), while a parser builds such trees bottom-up, dealing with both tail and true recursion (i.e. discontinuous action structures and long dependencies). Recursion, merge and move are shown to be mechanisms that manifest themselves not only in human language, but in human action too.
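The feature-driven binary merge described above can be illustrated with a small sketch. It is not the POETICON grammar or parser: the class names (`Terminal`, `ActionNode`), the functions (`merge`, `parse`, `bracket`) and the "cut / knife / bread" example are all hypothetical, and the sketch covers only left-to-right merging of complements, with no move operation and no embedded sub-actions.

```python
from dataclasses import dataclass
from typing import List, Set, Union

# Assumption: the three driving features named in the text, used as plain strings.
FEATURES = ("tool", "object", "goal")


@dataclass(frozen=True)
class Terminal:
    label: str
    kind: str  # "movement", or one of FEATURES


@dataclass(frozen=True)
class ActionNode:
    head: "Constituent"   # the action built so far
    complement: Terminal  # the constituent merged in
    filled: str           # which feature this merge filled


Constituent = Union[Terminal, ActionNode]


def filled_features(node: Constituent) -> Set[str]:
    """Collect the features already filled along the head spine."""
    if isinstance(node, Terminal):
        return set()
    return {node.filled} | filled_features(node.head)


def merge(head: Constituent, complement: Terminal) -> ActionNode:
    """Binary merge: the complement fills one open feature of the head."""
    if complement.kind not in FEATURES:
        raise ValueError(f"{complement.label!r} cannot fill an action feature")
    if complement.kind in filled_features(head):
        raise ValueError(f"feature {complement.kind!r} is already filled")
    return ActionNode(head, complement, complement.kind)


def parse(tokens: List[Terminal]) -> Constituent:
    """Build a binary tree bottom-up, starting from the movement terminal."""
    if not tokens or tokens[0].kind != "movement":
        raise ValueError("a sequence must start with a movement terminal")
    tree: Constituent = tokens[0]
    for token in tokens[1:]:
        tree = merge(tree, token)
    return tree


def bracket(node: Constituent) -> str:
    """Render the tree in labelled-bracket form."""
    if isinstance(node, Terminal):
        return node.label
    return f"[{bracket(node.head)} {bracket(node.complement)}]"


tree = parse([
    Terminal("cut", "movement"),
    Terminal("knife", "tool"),
    Terminal("bread", "object"),
])
print(bracket(tree))  # [[cut knife] bread]
```

Each merge produces exactly one binary node, so the nesting depth of the brackets mirrors the order in which features were filled.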

POETICON LITHIC TOOL EXPERIMENTS

What are the common uses of objects? Which attributes are characteristic of them? Object affordance and attribute knowledge is part of our common sense knowledge and is thus vital in the development of intelligent systems and the exploration of human learning mechanisms. However, such information has never been compiled on a large scale and with minimal bias on the part of the experiment designer. We designed and ran a series of cognitive experiments through which we elicited such information on a large scale. In the experiments we used both visual and tactile stimuli comprising lithic tools, i.e., objects novel to the modern man, made for particular functions that formed part of the everyday activities of another age; as such, they proved to be ideal for eliciting rich information on everyday activities of the modern man in a completely unguided/unbiased way.

The participants used free speech (verbal reports) to answer a simple question: which objects were they presented with, and what could they use them for? The experiment had various conditions, including visual presentation of the stimuli on a computer screen versus active manipulation of a lithic tool collection. All sessions were recorded with two cameras (en face and profile). The resulting data comprise more than 95 hours of verbal reports from more than 120 participants. They include verbally expressed semantic information comprising object attributes, affordances and related argumentation, as well as a rich set of pantomimes, gestures and exploratory acts.

THE PRAXICON SEMANTIC MEMORY

The need for structured knowledge in intelligent systems across domains and applications has led to the development of a number of knowledge bases and ontologies in the form of relational databases and semantic association networks. Neuroscience findings and theories point to a multisensory and distributed semantic memory in the human brain; however, from a cognitive perspective, our common sense knowledge bases and ontologies still remain static stores of mostly lexical concepts, with semantic associations created ad hoc. From an engineering perspective, large-scale generalisation and common sense reasoning in intelligent systems and robotics still stumble on the semantic gap between high-level symbolic representations and low-level visuomotor experiences. We have developed the PRAXICON, a dynamic, recursive and referential semantic network with a biological basis that aims to address the representation and integration challenges of embodied and enactive cognition.

Concepts in the PRAXICON may have a concrete (physical) or abstract reference pertaining to entities, movements and features, with no domain constraints. They have multiple representations (symbolic, perceptual, motoric). All of them are important, but some are more important than others for generalisation and reasoning. Associations between concepts in the PRAXICON are simple or recursive in nature; they pertain to a finite set of pragmatic relation types that provide constraints on the production of new associations without, however, restricting such production. The PRAXICON draws a clear line between language and semantic memory; it considers language an additional modality that contributes (significantly and uniquely) to the acquisition and use of generalised knowledge through dynamic interaction with perception and action, rather than an efficient means of representing such knowledge. We argue that the PRAXICON is necessary for intelligent systems, so that they move beyond one-shot learning to large-scale generalisation and common sense reasoning.
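The structure described above (concepts carrying several representations, linked by associations drawn from a finite set of relation types, where an association may itself take part in another association) can be sketched as a small data structure. This is an illustration only, not the PRAXICON implementation or its API: the relation names in `Relation`, the `SemanticNetwork` class and the example concepts are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple, Union


class Relation(Enum):
    # Hypothetical relation types; the actual PRAXICON set is not given here.
    HAS_TOOL = "has_tool"
    AFFECTS = "affects"
    HAS_GOAL = "has_goal"


@dataclass
class Concept:
    name: str
    # modality ("symbolic", "perceptual", "motoric") -> placeholder payload
    representations: Dict[str, str] = field(default_factory=dict)


# An endpoint is a concept name or a previously created association.
Endpoint = Union[str, Tuple]


class SemanticNetwork:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        self.associations: List[Tuple] = []

    def add_concept(self, name: str, **representations: str) -> None:
        self.concepts[name] = Concept(name, dict(representations))

    def _require(self, endpoint: Endpoint) -> None:
        if isinstance(endpoint, str):
            known = endpoint in self.concepts
        else:
            known = endpoint in self.associations
        if not known:
            raise KeyError(f"unknown endpoint: {endpoint!r}")

    def associate(self, source: Endpoint, relation: Relation,
                  target: Endpoint) -> Tuple:
        """Add an association; only the predefined relation types are allowed."""
        if not isinstance(relation, Relation):
            raise TypeError("relation must be a member of Relation")
        self._require(source)
        self._require(target)
        edge = (source, relation, target)
        self.associations.append(edge)
        return edge

    def related(self, source: Endpoint, relation: Relation) -> List[Endpoint]:
        return [t for s, r, t in self.associations
                if s == source and r is relation]


net = SemanticNetwork()
net.add_concept("cut", symbolic="cut", motoric="cutting-motor-program")
net.add_concept("knife", symbolic="knife", perceptual="knife-visual-features")
net.add_concept("bread", symbolic="bread", perceptual="bread-visual-features")
net.add_concept("slice", symbolic="slice")

net.associate("cut", Relation.HAS_TOOL, "knife")
cut_bread = net.associate("cut", Relation.AFFECTS, "bread")
# Recursive association: an existing association is the source of a new one.
net.associate(cut_bread, Relation.HAS_GOAL, "slice")

print(net.related("cut", Relation.HAS_TOOL))  # ['knife']
```

Restricting `relation` to an enumeration constrains which kinds of association can be produced, while leaving the number of associations per concept open.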

CROSS‐MODAL RELATIONS IN MULTIMEDIA

There is a vast amount of multimedia data, created by professionals or laymen, and its production is increasing rapidly: TV productions, illustrated documents (such as newspapers, books, blogs, and encyclopedias), captioned photo albums (in social media or within official archives, e.g., in crime scene investigation), homemade videos, surveillance videos, education- or cultural-heritage-related audiovisual archives, and verbally or gesturally commanded video games are just some examples. As we process such messages, we employ our cognitive system to trace the integration of their modalities and make sense of it, predicting and interpreting continuously as the message (or its processing) evolves dynamically in time. But what is it that we trace? In other words, what do we see as we listen, or what do we read as we see? How is speech/text associated with accompanying images/video of objects and actions and corresponding sounds? Understanding semantic association processes in integrating language, images, and sounds can contribute radically to critical thinking, both when processing information created by others and when generating audiovisual messages ourselves.

COSMOROE is a theoretical framework for modeling the semantic interplay between different means of expression when formulating multimodal messages. COSMOROE identifies a number of semantic association types through which integration of modalities is served in multimodal message formation processes. Verbal and visual representations of objects, agents, movements, gestures, events, and abstract concepts engage in a semantic interplay that ranges from simple one-to-one equivalence relations to forced equivalence (metonymic and metaphoric cases), contradiction, and complementarity. The framework has been used in a variety of multimodal contexts, such as the analysis of TV travel series, newspaper caricatures and Hollywood movies; examples of such analysis are available online through the COSMOROE Search Engine. Recently, COSMOROE relations have been used to formulate a hypothesis that drives experimental research exploring language-image modulation of object saliency during the perception of complex stimuli.
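As a rough illustration of how the relation types named above could be recorded when annotating a multimodal message, the sketch below pairs a verbal span with a visual element and labels the pair with one relation. It lists only the relation types mentioned in this section; the `Annotation` record, the `relation_counts` helper and the example annotations are hypothetical and do not reflect the COSMOROE annotation scheme or the COSMOROE Search Engine.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class CrossModalRelation(Enum):
    EQUIVALENCE = "equivalence"
    METONYMY = "forced equivalence (metonymic)"
    METAPHOR = "forced equivalence (metaphoric)"
    CONTRADICTION = "contradiction"
    COMPLEMENTARITY = "complementarity"


@dataclass(frozen=True)
class Annotation:
    verbal: str  # transcribed word or phrase
    visual: str  # description of the co-occurring image region or action
    relation: CrossModalRelation


def relation_counts(annotations: Iterable[Annotation]) -> Counter:
    """Count how often each relation type occurs in a set of annotations."""
    return Counter(a.relation for a in annotations)


# Invented examples for a travel-programme clip.
annotations = [
    Annotation("the harbour", "wide shot of a harbour",
               CrossModalRelation.EQUIVALENCE),
    Annotation("Athens", "shot of the Acropolis",
               CrossModalRelation.METONYMY),
    Annotation("we arrive", "a ferry docking",
               CrossModalRelation.COMPLEMENTARITY),
    Annotation("the boat", "a ferry docking",
               CrossModalRelation.EQUIVALENCE),
]

for relation, count in relation_counts(annotations).items():
    print(f"{relation.value}: {count}")
```

Keeping the relation as an enumeration rather than free text makes annotations from different analysts directly comparable.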

TIMING AND TIME PERCEPTION

Following up on the activities of the TIMELY research network, Argiro Vatakis (Athens, Greece), along with Hedderik van Rijn (Groningen, The Netherlands) and Warren Meck (Durham, NC, USA), initiated and established the first multidisciplinary journal and online review series devoted to timing and time perception:

“Timing is ever-present in our everyday life – from the ringing sounds of the alarm clock to our ability to walk, dance, remember, and communicate with others. This intimate relationship has led scientists from different disciplines to investigate time and to explore how individuals perceive, process, and effectively use timing in their daily activities. Timing & Time Perception aims to become the forum for all psychophysical, neuroimaging, pharmacological, computational, and theoretical advances on the topic of timing and time perception in humans and other animals. We envision a multidisciplinary approach to the topics covered, including the synergy of: Neuroscience and Philosophy for understanding the concept of time, Cognitive Science and Artificial Intelligence for adapting basic research to artificial agents, Psychiatry, Neurology, Behavioral and Computational Sciences for neuro-rehabilitation and modeling of the disordered brain, to name just a few.”

“The journal Timing & Time Perception (Brill Publishers) was initiated with the realization that the study of ‘timing and time perception’ is growing exponentially with interest from fields as diverse as cognitive science, computer science, economics, philosophy, psychology, robotics, and neuroscience … to name just a few. As with any scientific endeavor, once a sufficient empirical base has been established it becomes both necessary and desirable to support such a rapidly growing enterprise with a platform for publishing integrative and multidisciplinary reviews... We are pleased to announce that Timing & Time Perception Reviews (a joint publication of the University of Groningen and Brill Publishers) is being launched as a diamond open-access journal with that goal firmly in mind...” (Timing & Time Perception Reviews, vol.1, 2014).