Abstract: In this paper we present a database of fundamental frequency series
for singing performances to facilitate comparative analysis of algorithms
developed for singing assessment. A large number of recordings were
collected during conservatory entrance exams, which involve candidates
reproducing melodies (after listening to the target melody played on the
piano) in addition to other tasks related to rhythm and individual pitch
perception. Leaving out the samples on which the jury members’ grades did not
all agree, we obtained a collection of 1018 singing and 2599 piano performances as
instances of 40 distinct melodies. A state-of-the-art fundamental frequency (f0)
detection algorithm is used to extract f0 time series for each of these recordings
to form the dataset. The dataset is shared to support research in singing
assessment. Together with the dataset, we provide a flexible singing assessment
system that can serve as a baseline for comparison of assessment algorithms.

September 14, 2017

Abstract: Broadcast is a common operation in machine learning, widely used when calculating bias or subtracting the maximum
for normalization in convolutional neural networks. A broadcast
operation is required when two tensors, possibly with different
numbers of dimensions and hence different numbers of elements,
are input to an element-wise function. The tensors are expanded in the
process so that the two match in size and dimension.
In this research, we introduce a new broadcast functionality for
matrices to be used on CUDA-enabled GPU devices. We further
extend this operation to multidimensional arrays and measure its
performance against the implementation available in the Knet
deep learning framework. Our final implementation provides
up to a 2x improvement over the Knet broadcast implementation,
which only supports vector broadcast. Our implementation can
handle broadcast operations with any number of dimensions.
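
To make the operation concrete, here is a minimal CPU sketch of broadcasting in Python with NumPy. It is not the CUDA implementation described in the abstract. The helper names `broadcast_shape` and `broadcast_apply` are hypothetical, and the dimension-alignment rule (padding the shorter shape with leading 1s) is NumPy's convention, assumed here for illustration; the paper's implementation may align dimensions differently.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Return the shape that results from broadcasting two shapes (NumPy convention)."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same number of dimensions.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    out = []
    for da, db in zip(a, b):
        # Two sizes are compatible if they are equal or one of them is 1.
        if da != db and da != 1 and db != 1:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        out.append(db if da == 1 else da)
    return tuple(out)


def broadcast_apply(op, x, y):
    """Expand x and y to a common shape, then apply an element-wise function."""
    shape = broadcast_shape(x.shape, y.shape)
    x_expanded = np.broadcast_to(x, shape)  # read-only view, no data is copied
    y_expanded = np.broadcast_to(y, shape)
    return op(x_expanded, y_expanded)


if __name__ == "__main__":
    activations = np.arange(6, dtype=float).reshape(2, 3)
    bias = np.array([10.0, 20.0, 30.0])

    # Bias addition: a (3,) vector is expanded to match a (2, 3) matrix.
    print(broadcast_apply(np.add, activations, bias))

    # Subtracting the row-wise maximum before normalization: (2, 3) minus (2, 1).
    row_max = activations.max(axis=1, keepdims=True)
    print(broadcast_apply(np.subtract, activations, row_max))
```

The two calls at the bottom mirror the use cases named in the abstract: adding a bias vector and subtracting the maximum for normalization. A GPU kernel would typically compute the expanded index per thread rather than materialize the expanded tensors, which is why the sketch uses views instead of copies.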

September 04, 2017

Abstract: We address the problem of object recognition from RGB-D images using deep convolutional
neural networks (CNNs). We advocate the use of 3D CNNs to fully exploit the
3D spatial information in depth images as well as the use of pretrained 2D CNNs to learn
features from RGB-D images. Currently, no dataset with depth information is available
at a scale comparable to those for RGB data. Hence, transfer learning
from 2D source data is key to training deep 3D CNNs. To this end, we propose
a hybrid 2D/3D convolutional neural network that can be initialized with pretrained 2D
CNNs and can then be trained over a relatively small RGB-D dataset. We conduct experiments
on the Washington dataset involving RGB-D images of small household objects.
Our experiments show that the features learnt from this hybrid structure, when fused with
the features learnt from depth-only and RGB-only architectures, outperform the state of
the art on RGB-D category recognition.