The idea of the workshop was to bring together mathematicians and machine learning researchers working on audio and deep learning problems. The following topics were proposed for discussion:

I would like to highlight some of the presented work and share my notes and impressions.

The first big question we discussed can be recapped as follows: "Can we formalize something that already exists in data and impose this information on DNNs?" Imposing domain knowledge on networks seems to be under great and active discussion right now. Some examples:

Irene Waldspurger presented her work entitled "Inversion of the wavelet transform modulus." Her talk was about audio reconstruction from scalograms (specifically, from the modulus of Cauchy wavelet transforms), as well as about scattering transforms and the possibility of using them as an initialization for CNNs.
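To give a flavor of the magnitude-inversion problem, here is a minimal sketch of the classic Griffin-Lim alternating-projection scheme for the related STFT-magnitude case. This is not her method (her work concerns Cauchy wavelets and comes with stronger theory); all parameters below are illustrative choices:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=50, nperseg=256, seed=0):
    """Alternating projections for magnitude-only inversion: impose the target
    magnitudes, synthesize a signal, re-analyze it, keep only the resulting
    phase, and repeat."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    x = None
    for _ in range(n_iter):
        _, x = istft(mag * phase, nperseg=nperseg)   # project onto realizable signals
        _, _, Z = stft(x, nperseg=nperseg)           # re-analyze the projection
        phase = np.exp(1j * np.angle(Z))             # keep phase, discard magnitude
    return x

# Build target magnitudes from a known 440 Hz tone, then reconstruct from them.
fs = 4096
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 440 * t)
_, _, Z = stft(sig, nperseg=256)
rec = griffin_lim(np.abs(Z))
_, _, Zr = stft(rec, nperseg=256)
spec_err = np.linalg.norm(np.abs(Zr) - np.abs(Z)) / np.linalg.norm(np.abs(Z))
```

The reconstruction is only defined up to a global phase, so quality is measured by how well the reconstructed signal's STFT magnitudes match the target.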

Fabio Anselmi gave an excellent talk on "Invariant and selective data representations with applications to Deep learning."

Other notable presentations were given by Joakim Andén and Vincent Lostanlen. They discussed the use of the joint time-frequency scattering transform for CNNs and scattering on the pitch spiral. They proposed a hierarchical CNN in which the filters of the first few layers are fixed and implemented as multiple scattering transforms. The joint scattering transform has been shown to be invariant to time shifts and frequency transpositions, and robust to time-warping deformations.
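As a rough illustration of why "wavelet convolution, then modulus, then averaging" buys time-shift invariance, here is a toy first-order scattering in NumPy. The filter bank and all constants are made up for the example; the joint scattering from the talk is considerably richer:

```python
import numpy as np

def morlet_filter(n, freq):
    """Complex exponential under a Gaussian window (a Morlet-like wavelet).
    The Gaussian std is set to roughly one cycle of the center frequency."""
    t = np.arange(n) - n // 2
    sigma = 1.0 / (2 * np.pi * freq)
    return np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma**2))

def first_order_scattering(x, n_filters=6, filter_len=64):
    """First-order scattering coefficients: wavelet convolution, complex
    modulus, then temporal averaging. Discarding the phase via the modulus
    is what makes the averaged output locally shift invariant."""
    coeffs = []
    for k in range(n_filters):
        freq = 0.4 / 2**k                     # dyadic grid of center frequencies
        psi = morlet_filter(filter_len, freq)
        u = np.abs(np.convolve(x, psi, mode="same"))
        coeffs.append(u[filter_len:-filter_len].mean())  # average the interior
    return np.array(coeffs)

# Shift-invariance check: a shifted copy yields (nearly) the same coefficients.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
s1 = first_order_scattering(x)
s2 = first_order_scattering(np.roll(x, 5))
```

In the hierarchical-CNN view, these fixed modulus-averaged filter responses play the role of the first layers, with learned layers stacked on top.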

To my great joy, the topics of probabilistic networks and deriving optimal architectures came up in the discussion several times:

Philipp Grohs discussed a variety of open theoretical questions in his talk "Deep Learning as a Mathematician" (you can find the slides here).

Antoine Deleforge presented the work "Reversed Mixture-of-Experts Networks for High- to Low-Dimensional Regression", about estimating low-dimensional quantities from high-dimensional data (for tasks such as sound source estimation or human pose estimation) by building inverse regression networks that combine a mixture of experts with a final gating network.
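For readers unfamiliar with the building block, a generic mixture-of-experts with a softmax gate can be sketched in a few lines. This is only the standard MoE pattern, not the reversed inverse-regression variant from the talk, and every dimension and weight below is made up:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class MixtureOfExperts:
    """K linear experts whose predictions are blended by a softmax gate."""
    def __init__(self, in_dim, out_dim, n_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.1 * rng.standard_normal((n_experts, out_dim, in_dim))  # expert maps
        self.G = 0.1 * rng.standard_normal((n_experts, in_dim))           # gating weights

    def __call__(self, x):
        gate = softmax(self.G @ x)                 # each expert's responsibility
        preds = np.einsum("koi,i->ko", self.W, x)  # every expert's prediction
        return gate @ preds                        # gated combination

# High-dimensional input (e.g. a long feature vector) mapped to 3 outputs.
moe = MixtureOfExperts(in_dim=100, out_dim=3, n_experts=4)
y = moe(np.ones(100))
```

The appeal for inverse problems is that each expert can specialize in one region of the high-dimensional input space while the gate decides which specialist to trust.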

Karen Ullrich gave a talk on Bayesian Networks with applications in sparcification and overconfidence evaluation.

The presentation entitled "Bayesian meter tracking on learned signal representations" given by Andre Holzapfel was related to probabilistic post-processing of the results obtained with CNNs.
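For intuition about this kind of post-processing, here is a minimal example: framewise class probabilities (as a CNN might output) smoothed by Viterbi decoding under a "sticky" transition model. This is an illustrative HMM-style stand-in with invented numbers, not the Bayesian meter-tracking model from the talk:

```python
import numpy as np

def viterbi(frame_probs, transition, initial):
    """Most likely state path given framewise class probabilities and a
    transition model; a simple form of probabilistic post-processing."""
    T, S = frame_probs.shape
    log_delta = np.log(initial) + np.log(frame_probs[0])
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = log_delta[:, None] + np.log(transition)  # (from, to)
        back[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + np.log(frame_probs[t])
    path = [int(log_delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Noisy framewise "CNN" outputs for 2 states; frame 3 flips spuriously to
# state 1, but sticky transitions smooth it back out.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.85, 0.15]])
A = np.array([[0.95, 0.05], [0.05, 0.95]])
path = viterbi(probs, A, initial=np.array([0.5, 0.5]))  # -> [0, 0, 0, 0, 0]
```

The raw argmax of the frame probabilities would flip state at the third frame; the decoder trades that momentary evidence against the cost of two unlikely transitions.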

Another topic that resonates well with my own work is the understanding and interpretability of learned data representations and networks. It was brought up in many of the talks in one form or another, but the following lectures were devoted entirely to this topic:

Grégoire Montavon, in his talk "Explaining the Predictions of Deep Neural Networks", presented several methods for network explanation, such as Taylor decomposition and layer-wise relevance propagation (LRP). It's worth mentioning that they have a great online demo (http://heatmapping.org/). They are also organizing a workshop at NIPS on the topic of interpretability: http://www.interpretable-ml.org/nips2017workshop.
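To make the LRP idea concrete, here is a minimal epsilon-rule implementation for a tiny dense ReLU network. All weights are made up for illustration; real LRP toolkits also handle convolutions, pooling, and several propagation rules:

```python
import numpy as np

def lrp_dense(a, w, b, relevance, eps=1e-9):
    """Epsilon-rule LRP for one dense layer: relevance flows back to each
    input in proportion to its contribution z_ij = a_i * w_ij to the outputs."""
    z = a @ w + b
    s = relevance / (z + eps * np.sign(z))  # stabilized per-output relevance
    return a * (w @ s)                      # per-input relevance

# Tiny two-layer ReLU network with made-up weights (illustrative only).
w1 = np.array([[0.5, -0.2, 0.1], [0.3, 0.4, -0.1],
               [-0.2, 0.1, 0.3], [0.1, 0.2, 0.2]])
w2 = np.array([[0.6, -0.3], [0.2, 0.5], [-0.4, 0.1]])
b1, b2 = np.zeros(3), np.zeros(2)

x = np.array([1.0, 0.5, -0.5, 2.0])
h = np.maximum(x @ w1 + b1, 0.0)   # hidden ReLU activations
y = h @ w2 + b2                    # output logits

r_out = np.where(np.arange(2) == y.argmax(), y, 0.0)  # explain the winning class
r_h = lrp_dense(h, w2, b2, r_out)
r_x = lrp_dense(x, w1, b1, r_h)    # per-input relevance scores (the "heatmap")
```

A useful sanity check is conservation: with zero biases, the relevance assigned to the inputs sums to the relevance injected at the output, so the heatmap redistributes the prediction rather than inventing evidence.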

I talked about my research on multimodal musical instrument recognition.

Many other presentations were left out of the scope of this short review, but they were nonetheless very interesting and of high scientific quality.

I would like to thank Monika and Arthur for organizing such a great event and inviting me, as well as my supervisors Emilia and Gloria for giving me an opportunity to participate. It was, undoubtedly, useful and extremely educational.