Coral species, with complex morphology and ambiguous boundaries, pose a great challenge for automated classification. CNN activations, which are extracted from fully connected layers of deep networks (FC features), have been successfully used as powerful universal representations in many visual tasks. In this paper, we investigate the transferability and combined performance of FC features and CONV features (extracted

In optimization-based signal processing, the so-called prior term models the desired signal, so its design is the key factor in achieving good performance. For audio signals, time-directional total variation applied to the spectrogram, combined with phase correction, has recently been proposed to model the sinusoidal components of the signal. Although this is a promising prior, its applicability may be limited when its underlying assumption does not match the signal.
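As a minimal illustration of the idea behind this prior, the sketch below computes the time-directional total variation of a magnitude spectrogram: the summed absolute change between adjacent time frames. The function name and the toy inputs are assumptions for illustration, not the authors' formulation; it only shows why slowly varying sinusoidal components make this quantity small.

```python
import numpy as np

def time_tv(S):
    """Time-directional total variation of a (freq x time) magnitude
    spectrogram: the summed absolute difference between adjacent time
    frames. Sinusoidal components with slowly varying magnitude keep
    this value small, while noise-like content makes it large."""
    return float(np.abs(np.diff(S, axis=1)).sum())

steady = np.ones((4, 8))                          # steady tone: TV is 0
noisy = np.random.default_rng(0).random((4, 8))   # noise: TV is large
```

A prior built from `time_tv` would thus penalize noise-like content more heavily than steady sinusoids, which is exactly the mismatch risk noted above when the signal is not dominated by sinusoidal components.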

Random distortion testing (RDT) addresses the problem of testing whether or not a random signal deviates by more than a specified tolerance from a fixed value. The test is non-parametric in the sense that the distribution of the signal under each hypothesis is assumed to be unknown. The signal is observed in independent and identically distributed (i.i.d.) additive noise. The need to control the probabilities of false alarm and missed detection while reducing the number of samples required to make a decision leads to the SeqRDT approach.
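The flavor of such a sequential scheme can be sketched as follows. The thresholds below (a fixed guard margin around the tolerance) are illustrative assumptions, not the SeqRDT thresholds from the paper; the sketch only shows how a sequential rule can stop early once the running sample mean is clearly inside or outside the tolerance region.

```python
import numpy as np

def seq_rdt(samples, xi0, tau, margin):
    """Illustrative sequential distortion test: decide H1 ("the signal
    deviates from xi0 by more than tau") once the running sample mean
    leaves the widened tolerance band, H0 once it falls inside the
    narrowed band, and otherwise request another sample."""
    total = 0.0
    for n, y in enumerate(samples, start=1):
        total += y
        dist = abs(total / n - xi0)      # distance of the mean from xi0
        if dist >= tau + margin:
            return "H1", n               # deviation exceeds the tolerance
        if dist <= tau - margin:
            return "H0", n               # deviation within the tolerance
    return "undecided", len(samples)     # sample budget exhausted
```

With a large deviation the test stops almost immediately, e.g. `seq_rdt(np.full(10, 5.0), xi0=0.0, tau=1.0, margin=0.5)` decides H1 after one sample, which is the sample-efficiency motivation stated above.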

In the context of Cued Speech (CS) recognition, the recognition of lip and hand movements is a key task. As is well known, a good temporal segmentation is necessary for a supervised recognition system. However, the lip and hand streams cannot share the same temporal segmentation, since they are not synchronized. In this work, we propose a hand preceding model to predict temporal segmentations of hand movements automatically by exploring the relationship between hand preceding time

The scarcity of emotional speech data is a bottleneck in developing automatic speech emotion recognition (ASER) systems. One way to alleviate this issue is to use unsupervised feature learning techniques to learn features from widely available general speech and use these features to train emotion classifiers. These unsupervised methods, such as the denoising autoencoder (DAE), variational autoencoder (VAE), adversarial autoencoder (AAE) and adversarial variational Bayes (AVB), can capture the intrinsic structure of the data distribution in the learned feature representation.
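The first of these, the denoising autoencoder, can be sketched minimally in plain NumPy: corrupt the input with Gaussian noise, train a one-hidden-layer network to reconstruct the clean input, and reuse the hidden code as the learned feature. The random data, layer sizes, and learning rate below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 8))           # stand-in for speech frames
W_enc = 0.1 * rng.standard_normal((8, 4))   # encoder weights
W_dec = 0.1 * rng.standard_normal((4, 8))   # decoder weights
lr = 0.05

def forward(Xin):
    H = np.tanh(Xin @ W_enc)    # hidden code = learned feature
    return H, H @ W_dec         # linear reconstruction

_, R = forward(X)
err_before = np.mean((R - X) ** 2)

for _ in range(500):
    Xn = X + 0.1 * rng.standard_normal(X.shape)   # corrupt the input
    H, R = forward(Xn)
    G = (R - X) / len(X)                 # gradient of 0.5*MSE w.r.t. R
    gH = G @ W_dec.T * (1.0 - H ** 2)    # backprop through tanh
    W_dec -= lr * (H.T @ G)
    W_enc -= lr * (Xn.T @ gH)

_, R = forward(X)
err_after = np.mean((R - X) ** 2)        # reconstruction error drops
features = np.tanh(X @ W_enc)            # features for the emotion classifier
```

Because the target is the clean frame while the input is corrupted, the hidden code is pushed toward structure that survives the noise, which is the "intrinsic structure" property these methods exploit.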

This paper presents the formulation and analysis of a novel distributed maximum likelihood algorithm that utilizes a first-order optimization scheme. The proposed approach uses a static average consensus algorithm to reach agreement on the initial condition of the iterative optimization scheme and a dynamic average consensus algorithm to reach agreement on the gradient direction. The resulting distributed algorithm is guaranteed to recover the performance of the centralized algorithm at an exponential rate.
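The static average consensus building block can be sketched on a toy network: each agent repeatedly averages its value with its neighbors' using a doubly stochastic weight matrix, so every local value converges to the network-wide mean. The four-agent ring and its weights below are illustrative assumptions; the paper additionally runs a dynamic consensus on the gradient direction, which is not shown here.

```python
import numpy as np

# Doubly stochastic weights for a 4-agent ring (each agent keeps half
# its own value and takes a quarter from each of its two neighbors).
W = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])
x = np.array([1.0, 5.0, 3.0, 7.0])   # each agent's local initial condition

for _ in range(100):
    x = W @ x                        # one round of neighbor averaging
# every entry of x converges to the average of the initial values (4.0)
```

Because the second-largest eigenvalue modulus of `W` is strictly below one, the disagreement shrinks geometrically at each round, which is the mechanism behind the exponential recovery guarantee stated above.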