Jungsik Hwang

P-VMDNN @ ICDL-EPIROB 2017

Last September, I attended ICDL-EPIROB 2017, which was held in Lisbon, Portugal. I really enjoyed the conference and the beautiful city, and I also had a great time with my old & new friends :)

This time, I was lucky to have the opportunity to give an oral presentation. I presented my recent study on visuomotor learning, particularly the P-VMDNN (Predictive Visuo-Motor Deep Dynamic Neural Network).

The manuscript, the presentation slides (pdf) and the supplementary videos that I used during my talk can be downloaded below:

During and after my presentation, I received many insightful questions about the model. It might be helpful to share those questions (and my answers) with those who are interested in the model. Below you can find the questions and my short answers. Some of them might be examined more deeply in the future.

Questions from ICDL-EPIROB 2017

What’s the developmental aspect of imitation in my study?

In this study, I haven’t dealt with the developmental aspect of imitation. Instead, the robot was explicitly tutored by the experimenter (kinesthetic teaching). The developmental aspects will be considered in a future study.

How do you train the model?

During the training process, the model was trained to generate 1-step look-ahead predictions with the BPTT (backpropagation through time) algorithm, implemented in TensorFlow. The model’s learnable parameters (called variables in TensorFlow), including the weights, biases and initial states, were optimized to minimize the prediction error in both pathways. Note that different initial states are obtained for each training sequence.
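To make the idea concrete, here is a minimal NumPy sketch of 1-step look-ahead prediction training with BPTT, where the initial state is itself a learnable parameter, one per training sequence. This is only a toy single-layer RNN of my own construction, not the P-VMDNN; the sizes, learning rate and sine-wave data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny single-layer RNN trained for 1-step look-ahead prediction with BPTT.
# Sketch only: the initial state h0 is a learnable parameter, one per sequence.
H, D = 8, 1                       # hidden size, input/output dimension
W = rng.normal(0.0, 0.3, (H, H))  # recurrent weights
U = rng.normal(0.0, 0.3, (H, D))  # input weights
V = rng.normal(0.0, 0.3, (D, H))  # readout weights
b = np.zeros(H)

# Two toy training sequences (sine waves with different phases).
seqs = [np.sin(np.linspace(0, 2 * np.pi, 20) + p)[:, None] for p in (0.0, 1.5)]
h0s = [np.zeros(H) for _ in seqs]  # learnable initial states, one per sequence

def run_bptt(x, h0):
    """Forward pass and BPTT gradients for the 1-step prediction loss."""
    T = len(x) - 1
    hs, ys = [h0], []
    for t in range(T):
        hs.append(np.tanh(W @ hs[-1] + U @ x[t] + b))
        ys.append(V @ hs[-1])                # prediction of x[t + 1]
    loss = sum(float((ys[t] - x[t + 1]) @ (ys[t] - x[t + 1])) for t in range(T))
    dW, dU, dV, db = map(np.zeros_like, (W, U, V, b))
    dh_next = np.zeros(H)
    for t in reversed(range(T)):
        dy = 2 * (ys[t] - x[t + 1])          # d(loss)/d(prediction)
        dh = V.T @ dy + dh_next
        dz = dh * (1 - hs[t + 1] ** 2)       # back through tanh
        dV += np.outer(dy, hs[t + 1])
        dW += np.outer(dz, hs[t])
        dU += np.outer(dz, x[t])
        db += dz
        dh_next = W.T @ dz
    return loss, dW, dU, dV, db, dh_next     # dh_next is now the gradient w.r.t. h0

lr, history = 0.01, []
for epoch in range(300):
    total = 0.0
    for i, x in enumerate(seqs):
        loss, dW, dU, dV, db, dh0 = run_bptt(x, h0s[i])
        W -= lr * dW; U -= lr * dU; V -= lr * dV; b -= lr * db
        h0s[i] -= lr * dh0                   # the initial state is optimized too
        total += loss
    history.append(total)
```

The key point is the last gradient step: the backward pass is simply run one step further than usual, so the error signal reaches h0 and each sequence ends up with its own optimized initial state.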

Lateral connection between the two pathways

I received a lot of questions about this issue. Currently, the P-VMDNN model only has lateral connections at the highest-level layers. Recently, the model was extended to have lateral connections between the different levels. Since that work is still under review for publication, I can’t explain it in detail here. But according to Fuster’s model, such lateral connections also seem biologically plausible.

What’s the size of the model? Training time? Computational cost? And is online training possible?

Regarding computational cost: training is not demanding, but online prediction-error minimization is a bit difficult.

During the error regression, why don’t you just feed the observation to the input of the model?

If I just fed the visual observation to the input of the model, then the model’s neuronal activation would be driven by those values (i.e. sensory entrainment). We have a different perspective on visual perception. In our model, perception is considered an active process of updating the internal states while minimizing prediction error. In other words, the model’s neuronal activation is driven by the error-minimization mechanism. It would also be interesting to investigate the balance between these two approaches.
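The contrast can be sketched in a few lines. In the toy model below (my own construction, not the P-VMDNN), a fixed "decoder" maps an internal state to a prediction; instead of feeding the observation in as input, the internal state is updated by gradient descent so that the model's own prediction comes to match the observation. The matrix, learning rate and iteration count are illustrative assumptions.

```python
import numpy as np

# Toy sketch of the "error regression" idea: perception as updating the
# internal state h to minimize prediction error, rather than feeding the
# observation directly into the network input.
rng = np.random.default_rng(1)
W = np.eye(3) + 0.1 * rng.normal(size=(3, 3))   # stand-in for a trained decoder

def predict(h):
    return np.tanh(W @ h)                       # model's prediction from state h

observation = predict(np.array([0.5, -0.3, 0.8]))  # what the robot actually sees

h = np.zeros(3)                                 # current internal-state estimate
initial_error = float(np.sum((predict(h) - observation) ** 2))
for _ in range(500):                            # error-regression loop
    err = predict(h) - observation
    # gradient of 0.5 * ||err||^2 w.r.t. h, via the chain rule through tanh
    grad = W.T @ (err * (1 - predict(h) ** 2))
    h -= 0.2 * grad                             # perception = updating the state
final_error = float(np.sum((predict(h) - observation) ** 2))
```

With sensory entrainment, `h` would simply be overwritten by the input; here it converges to a state whose prediction explains the observation, which is the error-regression view of perception.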

How does the model generate the visual prediction?

The visual prediction is generated in the visual pathway, which is based on the P-MSTRNN (Predictive Multiple Spatio-Temporal Scales RNN). A bit more specifically, a convolution operation with padding is conducted. For more information about P-MSTRNN, please refer to this paper.
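As a small illustration of the "convolution with padding" step, here is a naive 2-D convolution with zero padding that keeps the output the same spatial size as the input, which is what you want when decoding a feature map back into a visual prediction. The 8×8 map and averaging kernel are illustrative, not taken from the model.

```python
import numpy as np

# Naive 2-D convolution with zero padding ('same' output size): a sketch of
# how a feature map can be decoded into a prediction of the same resolution.
def conv2d_same(x, k):
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))       # zero-pad the borders
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

feature_map = np.random.default_rng(2).random((8, 8))
kernel = np.ones((3, 3)) / 9.0                 # e.g. a (hypothetical) 3x3 kernel
prediction = conv2d_same(feature_map, kernel)  # same 8x8 shape as the input
```

Without the padding, each convolution would shrink the map by `kernel_size - 1` pixels, so the predicted image could not match the observed image pixel-for-pixel.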

Compression occurs in which pathway?

Both pathways. While the visuomotor pattern is processed along the hierarchy (from lower to higher levels), the model extracts the latent features in the pattern. That is, the lower-level layers encode specific information whereas the higher-level layers encode abstract information about the visuo-proprioceptive patterns.

What’s the meaning of internal states?

The internal states refer to the neurons’ values (the sum of their inputs) before the activation function is applied. In particular, the internal states at the very first time step are called the initial states. The initial states are obtained from the training process. After training, each training sequence has its own initial states. So if you set specific initial states, the model will generate the corresponding visuomotor sequence. Therefore, the initial states can sometimes be interpreted as an intention.
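In code, the distinction between a neuron's internal state and its activation is just the value before versus after the nonlinearity. A tiny illustration (the sizes and names here are mine, purely for illustration):

```python
import numpy as np

# Illustration of "internal state" vs. activation for one layer at one step.
rng = np.random.default_rng(3)
W, b = rng.normal(size=(4, 4)), np.zeros(4)
h_prev = rng.normal(size=4)

u = W @ h_prev + b   # internal state: the summed input, before the activation
h = np.tanh(u)       # activation: what the neuron actually outputs
```

The "initial states" are then simply the `u` values at the very first time step, treated as trainable parameters.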

Do I have to set the initial states in every layer?

Yes. If you set the initial states only in the higher-level layers, it might not work well. In the extension of this study, it was found that the higher-level layer encoded the type of gesture while the lower-level layer encoded the specific appearance of the human subject. In this case, setting the initial states in all layers is more natural. It is like telling the model what gesture to mentally simulate, and whose.

What would happen if I just choose random/neutral initial states that are far from the obtained ones?

It is highly likely that the model will not mentally simulate the visuo-proprioceptive patterns properly. But there is also a chance that some kind of novel visual & proprioceptive prediction can be made. I’ve been working on a similar topic, and this will be dealt with more thoroughly later.

Scaling the model to more complex tasks?

Well, I think that the model can be extended to more complex tasks. Indeed, the extension of this study is under review, so I can’t describe it in detail right now. But in that study, the model was successfully extended to a more complex task involving a number of different human subjects’ gestures.


Basic Information

When I train my neural network models, I often use a method called the “softmax transformation”. It is a method of representing the training data in a sparse form. When I first learned how to do it, I had some trouble because there weren’t enough examples of how to do it. And I still can’t find a nice explanation of the softmax transformation with examples. So here is a brief explanation along with sample code. Let’s see how the softmax transformation works step by step.
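A minimal sketch of the scheme as I understand it: each analog value in [0, 1] is mapped to a sparse vector by taking the softmax of the negative squared distances between the value and a set of fixed reference points. The number of reference points and the sharpness parameter `sigma` below are illustrative choices, not prescribed values.

```python
import numpy as np

# Softmax transformation sketch: a scalar x in [0, 1] becomes a sparse
# probability-like vector over fixed reference points; the expected value
# of the reference points recovers (approximately) the original scalar.
def softmax_transform(x, n_refs=10, sigma=0.01):
    refs = np.linspace(0.0, 1.0, n_refs)        # fixed reference points
    logits = -((x - refs) ** 2) / sigma         # closer reference -> larger logit
    e = np.exp(logits - logits.max())           # numerically stable softmax
    return e / e.sum()

def inverse_transform(q, n_refs=10):
    refs = np.linspace(0.0, 1.0, n_refs)
    return float(q @ refs)                      # expected value over references

x = 0.37
q = softmax_transform(x)        # sparse: mass concentrates near x
x_back = inverse_transform(q)   # close to the original 0.37
```

Smaller `sigma` makes the representation sparser (mass concentrates on the one or two nearest reference points) at the cost of a coarser reconstruction; larger `sigma` smears the mass over more units.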

One of the difficulties that I faced when I joined the Cognitive Neurorobotics lab was that I wasn’t familiar with the terms used in the lab. Some terms come from the field of dynamics, and some others were “coined” by my advisor (Prof. Tani). So it took me quite some time to understand them. I guess it might sometimes be even more difficult for other people.

So, I’d like to briefly explain the terms that frequently appear in my studies on “cognitive neurorobotics” and in Tani’s book (“Exploring Robotic Minds: Actions, Symbols, and Consciousness as Self-Organizing Dynamic Phenomena”). This post is targeted at a general audience (someone like me five years ago), so those terms won’t be explained in great detail. Instead, I’ll just try to give a general idea of them.