Plenary Talks

Isabelle Guyon

Transporting electricity across states, countries, or continents is vital to modern societies. We take for granted that electricity is available at all times, but reliably managing power grids, and in particular avoiding “blackouts” (catastrophic cascading failures), is a difficult problem that requires skilled engineers to control operations around the clock. With the advent of renewable energies and the globalization of electricity markets, the problem is increasing in complexity. In this context, there are opportunities for neural networks and machine learning methods to help automate the system. The contributions of neural networks can range from replacing existing physical simulators of the grid with faster neural network proxies to suggesting preventive or curative actions that protect lines from overheating by rebalancing the flow in the power grid. The latter problem may be amenable to reinforcement learning. We will compare two neural network approaches that we developed to speed up power flow computations. The first one, the LEAP net (Latent Encoding of Atypical Perturbation), implements a form of transfer learning: it is trained on a few source domains (grid topology perturbations) and then generalizes to new target domains (combinations of perturbations) without learning on any example from those domains. We evaluate the viability of this technique for rapidly assessing curative actions that human operators take in emergency situations, using real historical data from the French high-voltage power grid. The second one, the Graph Neural Solver (GNS), overcomes the LEAP net's limitation of working only in the vicinity of a fixed grid topology by implementing an iterative approximation of the physics equations. Finally, to go beyond the simple prediction of flows and move towards assisting operators in controlling the grid, we present the competition program "learning to run a power network", of which a first edition ran this year as part of the IJCNN competition program.

Dr. C. Lee Giles

Neural networks are often considered to be black box models. However, discrete time recurrent neural networks (RNNs), which are among the most commonly used networks, have properties that make them similar to automata and formal grammars and that thus lend themselves to the extraction and insertion of grammar rules. Assume that we have a discrete time RNN that has been trained on sequential data. For each discrete step in time, or a collection thereof, an input can be associated with the RNN's current and previous activations. We can then cluster these activations into states to obtain a transition from a previous state to a current state that is governed by an input. From a formal grammar perspective, these state-to-state transitions can be considered to be production rules. Once the rules are extracted, a minimal unique set of states can be readily obtained. It can be shown that, for learning known production rules of regular grammars, the extracted rules are stable and independent of initial conditions and, at times, outperform the trained source neural network in terms of classification accuracy. Theoretical work has also shown that regular expression production rules can be easily inserted into certain types of RNNs, and has proved that the resulting systems are stable. Since black box models are not acceptable in many problem areas such as finance, medicine, and security, the methods discussed here have the potential to uncover what the trained RNN is doing from a regular grammar and finite state machine perspective. We will discuss the strengths, weaknesses, and issues associated with using these and related methods, as well as applications such as verification.
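
The extraction step described above (cluster activations into states, then read off input-driven transitions) can be sketched in a few lines. This is only an illustrative toy, not the speaker's method: the one-unit multiplicative network `rnn_step`, its weight `GAIN`, and the bin-based clustering in `quantize` are all assumptions chosen so that the example is small and deterministic. The network is hand-wired rather than trained, and it tracks the parity of the 1s seen so far.

```python
import math
from itertools import product

GAIN = 3.0  # hypothetical recurrent weight of the toy network


def rnn_step(h, x):
    """One step of a hand-wired one-unit RNN: input 1 flips the sign of h."""
    return math.tanh(GAIN * h * (1 - 2 * x))


def quantize(h, bins=2):
    """Cluster an activation in [-1, 1] into one of `bins` equal-width states."""
    return min(int((h + 1.0) / 2.0 * bins), bins - 1)


def extract_transitions(strings, h0=1.0):
    """Record (previous state, input) -> current state over sample strings."""
    transitions = {}
    for s in strings:
        h = h0
        for x in s:
            prev = quantize(h)
            h = rnn_step(h, x)
            # A real extraction would also check that the same key never
            # maps to two different states; this toy network never does.
            transitions[(prev, x)] = quantize(h)
    return transitions


def run_automaton(rules, start, s):
    """Run the extracted rules on an input string and return the final state."""
    state = start
    for x in s:
        state = rules[(state, x)]
    return state


# All binary strings of length 1 to 4 serve as the sample data.
strings = [list(bits) for n in range(1, 5) for bits in product([0, 1], repeat=n)]
rules = extract_transitions(strings)
start = quantize(1.0)
```

Here `rules` ends up as a four-entry transition table over two states, which is the two-state parity automaton. A string with an even number of 1s, such as `[1, 1, 0]`, brings `run_automaton` back to `start`. With a trained multi-unit network, the clustering would operate on activation vectors, and the resulting automaton would usually be minimized afterwards.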

Věra Kůrková

Title: Limitations of Shallow Networks

Věra Kůrková, Institute of Computer Science of the Czech Academy of Sciences

Although biologically inspired neural networks were originally introduced as multilayer computational models, shallow networks were dominant in applications until the recent renewal of interest in deep architectures. Experimental evidence and successful applications of deep networks raise a theoretical question: When and why are deep networks better than shallow ones?

This lecture will present recent mathematical results describing high-dimensional tasks which either cannot be computed by reasonably sparse shallow networks or whose computation is unstable. As minimization of the number of units in a shallow network is a hard nonconvex problem, we will focus on approximate measures of network sparsity defined in terms of suitable norms. We will show how geometrical properties of high-dimensional spaces imply lower bounds on network complexity. The bounds depend on sizes and covering numbers of dictionaries of computational units. Combining the general results with estimates of sizes of common dictionaries, we will derive large lower bounds on the complexity of shallow networks needed for computation of almost any function on a sufficiently large domain. We will also consider nonuniform distributions modeling the relevance of computational tasks and derive consequences for choices of dictionaries of computational units suitable for efficient computation. To complement probabilistic results with constructive ones, we will present classes of functions built using Hadamard matrices and pseudo-noise sequences. We will use them to obtain examples of functions which can be computed by two-hidden-layer perceptron networks of considerably smaller model complexity than by networks with one hidden layer. Finally, we will discuss connections with the No Free Lunch Theorem and the central paradox of coding theory.
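
For readers unfamiliar with Hadamard matrices, the following minimal sketch builds one and checks its defining property. The abstract does not say which construction the lecture uses; the Sylvester (doubling) construction below is an assumption, chosen because it is the simplest standard one. The function names are hypothetical.

```python
def sylvester_hadamard(k):
    """Return the 2^k x 2^k Sylvester-type Hadamard matrix as a list of rows."""
    h = [[1]]
    for _ in range(k):
        # Doubling step: H_2n = [[H_n, H_n], [H_n, -H_n]].
        top = [row + row for row in h]
        bottom = [row + [-v for v in row] for row in h]
        h = top + bottom
    return h


def gram(matrix):
    """Return the matrix of pairwise inner products of the rows."""
    return [[sum(a * b for a, b in zip(r, s)) for s in matrix] for r in matrix]


H = sylvester_hadamard(3)  # 8 x 8 matrix with entries +1 and -1
G = gram(H)                # equals 8 times the identity matrix
```

All entries of `H` are +1 or -1, and any two distinct rows are orthogonal, so `G` has 8 on the diagonal and 0 elsewhere. Read as a function on pairs of indices, such a matrix has a sign pattern in which every pair of rows disagrees on exactly half of the columns.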

Erkki Oja

Title: Forty Years of Unsupervised Machine Learning

Unsupervised learning is a classical approach in artificial neural networks, pattern recognition and data analysis. Its importance is growing today, due to increasing data volumes and the difficulty of obtaining labelled training data of sufficient quantity and quality for supervised learning. The talk looks at the basic approaches of the past forty years, especially from the perspective of neural networks and machine learning. One widely used family of methods consists of linear latent variable models, such as principal component analysis, independent component analysis, and nonnegative matrix factorizations. All can be presented as decompositions of the data matrix containing the unlabeled samples. Another widely used classical methodology is clustering, which is also related to matrix factorizations. In self-organizing maps, the clusters are ordered in a specific way. In deep learning, nonlinear latent variables can be found by autoencoders. Lately, the use of unsupervised adversarial networks for image synthesis has gained popularity.
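
As a small illustration of a linear latent variable model, the sketch below finds the first principal component of a tiny unlabeled data set. It is a toy under stated assumptions and is not taken from the talk: the function name `first_principal_component` is hypothetical, the data are made up, and plain power iteration on the sample covariance matrix is used instead of a library eigenvalue solver so that the example needs only the standard library.

```python
import math


def first_principal_component(samples, iterations=100):
    """Estimate the leading eigenvector of the sample covariance matrix."""
    n, d = len(samples), len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in samples]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    w = [1.0] * d  # deterministic start; must not be orthogonal to the answer
    for _ in range(iterations):
        # Power iteration: multiply by the covariance matrix, then renormalize.
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


# Made-up unlabeled samples lying exactly on the line y = 2x.
samples = [(-2.0, -4.0), (-1.0, -2.0), (0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
component = first_principal_component(samples)
```

For these samples the component is the unit vector along (1, 2), roughly (0.447, 0.894). Projecting each centered sample onto it gives a single latent variable per sample, and the outer product of those latent values with the component reconstructs the centered data matrix, which is the decomposition view mentioned above.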

Adam Miklósi

Title: Ethorobotics as an emerging discipline for building better social agents

Miklósi Ádám, Eötvös Loránd University, Budapest

Ethology is the biological study of animal behaviour, including that of humans. In recent years, social robotics has aimed to build autonomous agents that co-habit with humans in various social groups in the workplace, in hospitals, or in homes for the elderly. It is therefore time to establish a new interdisciplinary approach that relies on more than 100 years of biological knowledge of animal behaviour and facilitates the construction of hardware and software for social robots.

Ethorobotics is thus defined as the science of applying animal social behavioural rules to the design of social robots that interact with living beings (animals or humans). This means that ethorobotics has strong roots in biology: it looks at the function of the behaviour and often treats the embodiment (shape and form) as a consequence of achieving the best performance under given conditions.

The key example for ethorobotics is the family dog, which has a long history of domestication and has proved to be very successful in human communities during the last 20-30 thousand years, despite being rather different from humans in shape and in behavioural and cognitive performance.

After studying human-dog interaction for many years, we came to the conclusion that this relationship could provide a very good initial model for human-robot interaction. We consider the dog to be man’s first “biorobot”. Thus we suggest that the social robots of the future should by no means be similar to man but should represent a “new species”.

We aim to present evidence of how ethorobotics could promote building better social robots. Based on the detailed study of the behavioural aspects of the human-dog relationship, we can make proposals for the behavioural capacities of social robots. These would include social skills such as attachment, faithfulness, emotional responsiveness, and social monitoring.

The talk argues and demonstrates that the third generation of artificial neural networks, the spiking neural networks (SNN), can be used to design brain-inspired architectures that are not only capable of deep learning of temporal or spatio-temporal data, but also enable the extraction of deep knowledge representation from the learned data. Similarly to how the brain learns time-space data, these SNN models do not need to be restricted in the number of layers, the number of neurons in each layer, etc., as is the case with traditional deep neural network architectures. The presented approach is illustrated on an exemplar SNN architecture, NeuCube (free and open source software available from www.kedri.aut.ac.nz/neucube), and on case studies of brain and environmental data modelling and knowledge representation using incremental and transfer learning algorithms. These include predictive modelling of EEG and fMRI data measuring cognitive processes and response to treatment, AD prediction, BCI, human-human and human-VR communication, and others. More details can be found in the recent book: Time-Space, Spiking Neural Networks and Brain-Inspired Artificial Intelligence, Springer, 2019, https://www.springer.com/gp/book/9783662577134.

Danil Prokhorov

Title: Machine learning in the automotive world: from powertrains to autonomous vehicles and beyond

Machine learning in general and artificial neural networks in particular have always been a fascinating area of automotive R&D. Perhaps this fascination reflects the great contrast between the traditionally slow advancement of a very conservative, regulated business in which hardware costs dominate and the rapid growth of software and high tech, which are increasingly becoming the key to driving innovative automotive solutions. It is indeed appealing to be able to design and deploy systems whose properties may change radically without hardware changes, by reprogramming or reconfiguring them in software instead.

Powertrain applications of machine learning continue to be few and far between due to legacy issues. In contrast, autonomous driving applications of machine learning promise to break with this tradition by introducing a major new technology as essentially an add-on to existing vehicles. I will give an overview of machine learning R&D for automotive applications over the past 20 years. I will give an eyewitness account of several examples and the lessons learned from them. I will also discuss important directions for future research.

It is proposed that the evolution of cortical structures in the vertebrate brain (neocortex and hippocampus) introduced novel computational principles that complement those realized in multi-layered feed-forward networks. A hallmark of cortical architectures is recurrence, the dense and reciprocal coupling among distributed feature-specific neurons. Such networks engage in high-dimensional non-linear dynamics exhibiting oscillatory activity in widely differing frequency ranges and complex correlation structures. Analysis of massively parallel recordings of neuronal responses in cat and monkey visual cortex suggests that the cerebral cortex exploits the high-dimensional dynamic space offered by recurrent networks for the encoding, classification and storage of information. Evidence is presented that the recurrent connections among cortical neurons are susceptible to activity-dependent modifications of their synaptic gain, which allows the network to store priors about the statistical contingencies of the outer world. Matching of sensory evidence with stored priors is associated with fast transitions towards sub-states of reduced dimensionality that are well classifiable by linear classifiers. In addition, the network dynamics allow for the superposition and fast read-out of information about sequentially presented stimuli, facilitating the encoding and storage of information about sequences. It is proposed that computations in high-dimensional state space can account for the ultra-fast integration of sensory evidence with stored priors and the subsequent classification of the results of this matching operation.

Ichiro Tsuda

One of the most striking characteristics of the developing brain is the generation of functionally differentiated neural areas, while emerging interactions develop between networking areas, whereby the brain works as a whole. Functional differentiation is typically known in the form of Brodmann areas or as a functional map in which different areas represent different cognitive and behavioral functions. Recently, the functional parcellation of the human neocortex was observed by means of the functional connectivity of the dynamics involved in the corresponding neural networks, and was shown to consist of finer areas than the functional map. The presence of functional parcellation suggests that a self-organization of neural networks occurs rapidly, based on chaotic dynamics, under various behavioral constraints. A similar self-organization of neural networks may also occur during the ontogenetic development of the brain under constraints such as stimulation by light and sound from the external environment or the physical pressure stemming from the individual’s own skull. In this respect, we hypothesize the existence of a common principle of self-organization with constraints in both functional differentiation and functional parcellation.

To clarify the neural mechanism of functional differentiation, we constructed a mathematical model of self-organization with constraints. By imposing different constraints, we investigated the mathematical structures embedded in the process of functional differentiation at various stages of neuronal development and obtained the following dynamic behaviors. We observed the genesis of a neuron-like dynamical system in the developmental process of coupled dynamical systems. We found the genesis of neuron-like units that respond specifically to sensory stimuli. We also detected the genesis of functional modules from randomly uniform networks of oscillations, where one module can be interpreted as a “higher” level (such as a cognitive area) and the other can be interpreted as a “lower” level (such as a motor area interacting with the body). In all cases, the appearance of chaos and chaotic itinerancy in the whole network plays an essential role in the generation of functional elements.

The differentiation of both sensorimotor systems and memory systems is decisive for brain development. In this respect, we studied the neural networks of memory and their dynamics. By introducing inhibitory neurons into the recurrent networks of excitatory neurons, we found chaotic transitions between memories that were dynamically represented by attractors. In this situation, the transitions between attractors were described by chaotic itinerancy. This finding allowed the study of the dynamics of episodic memory formation in the hippocampus. We proposed a Cantor coding hypothesis, which was partially substantiated using hippocampal slices from rats. In my talk, I will first describe a theoretical framework of self-organization with constraints and compare it with conventional theories of self-organization. Next, I will deal with mathematical models at various levels of differentiation. Finally, I will discuss the memory system and its dynamics.