Abstract: In this paper we present a database of fundamental frequency series
for singing performances to facilitate comparative analysis of algorithms
developed for singing assessment. A large number of recordings have been
collected during conservatory entrance exams, which involve candidates’
reproduction of melodies (after listening to the target melody played on the
piano) in addition to other tasks related to rhythm and individual pitch
perception. Leaving out the samples on which the jury members’ grades did not
all agree, we obtained a collection of 1018 singing and 2599 piano performances as
instances of 40 distinct melodies. A state-of-the-art fundamental frequency (f0)
detection algorithm is used to extract f0 time series for each of these recordings
to form the dataset. The dataset is shared to support research in singing
assessment. Together with the dataset, we provide a flexible singing assessment
system that can serve as a baseline for comparison of assessment algorithms.

September 14, 2017

Abstract: Broadcast is a common operation in machine learning, widely used for adding bias or subtracting the maximum
for normalization in convolutional neural networks. A broadcast
operation is required when two tensors, possibly with different
numbers of dimensions and hence different numbers of elements,
are input to an element-wise function. The tensors are expanded in the
process so that the two tensors match in size and dimension.
In this research, we introduce a new broadcast functionality for
matrices to be used on CUDA-enabled GPU devices. We further
extend this operation to multidimensional arrays and measure its
performance against the implementation available in the Knet
deep learning framework. Our final implementation provides
up to a 2x performance improvement over the Knet broadcast implementation,
which only supports vector broadcast. Our implementation can
handle broadcast operations with any number of dimensions.
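
To make the operation concrete, here is a minimal CPU sketch in plain Python. It is not the CUDA implementation described in the abstract. It assumes NumPy-style alignment of trailing dimensions and row-major flat storage, which may differ from the convention used in Knet, and the helper names (`broadcast_shape`, `broadcast_strides`, `broadcast_apply`) are hypothetical. Size-1 dimensions get a stride of 0, so the same element is reused instead of being copied.

```python
from itertools import product


def broadcast_shape(a, b):
    """Return the common shape of two shapes (assumed: trailing dims aligned)."""
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {a} and {b} cannot be broadcast")
        out.append(db if da == 1 else da)
    return tuple(out)


def broadcast_strides(shape, ndim):
    """Row-major strides padded to ndim; size-1 dims get stride 0 so they repeat."""
    shape = (1,) * (ndim - len(shape)) + tuple(shape)
    strides = [0] * ndim
    step = 1
    for i in reversed(range(ndim)):
        strides[i] = 0 if shape[i] == 1 else step
        step *= shape[i]
    return strides


def broadcast_apply(fn, x, xshape, y, yshape):
    """Apply an element-wise fn to two flat row-major arrays with broadcasting."""
    out_shape = broadcast_shape(xshape, yshape)
    sx = broadcast_strides(xshape, len(out_shape))
    sy = broadcast_strides(yshape, len(out_shape))
    out = []
    for idx in product(*(range(d) for d in out_shape)):
        ix = sum(i * s for i, s in zip(idx, sx))
        iy = sum(i * s for i, s in zip(idx, sy))
        out.append(fn(x[ix], y[iy]))
    return out, out_shape


# Adding a bias vector of shape (3,) to every row of a 2x3 matrix.
matrix = [1, 2, 3, 4, 5, 6]
bias = [10, 20, 30]
result, shape = broadcast_apply(lambda p, q: p + q, matrix, (2, 3), bias, (3,))
```

On a GPU, the loop over output indices would typically be replaced by one thread per output element, with each thread computing its own pair of input offsets.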

September 04, 2017

Abstract: We address the problem of object recognition from RGB-D images using deep convolutional
neural networks (CNNs). We advocate the use of 3D CNNs to fully exploit the
3D spatial information in depth images as well as the use of pretrained 2D CNNs to learn
features from RGB-D images. No large-scale dataset with depth information
currently exists that is comparable to those available for RGB data. Hence, transfer learning
from 2D source data is key to training deep 3D CNNs. To this end, we propose
a hybrid 2D/3D convolutional neural network that can be initialized with pretrained 2D
CNNs and can then be trained over a relatively small RGB-D dataset. We conduct experiments
on the Washington dataset involving RGB-D images of small household objects.
Our experiments show that the features learnt from this hybrid structure, when fused with
the features learnt from depth-only and RGB-only architectures, outperform the state of
the art on RGB-D category recognition.

Abstract: Symbol grounding is the problem of associating symbols from
language with a corresponding referent in the environment. Traditionally,
research has focused on identifying single objects
and their properties. The ReGround project hypothesizes that
the grounding process must consider the full context of the environment,
including multiple objects, their properties, and relationships
among these objects. ReGround targets the development
of a novel framework for “affordance grounding”, by
which an agent placed in a new environment can adapt to its
new setting and interpret possibly multi-modal input in order to
correctly carry out the requested tasks.

Abstract: We introduce context embeddings, dense vectors derived from a language model that represent the left/right context of a word instance, and demonstrate that context embeddings significantly improve the accuracy of our transition-based parser. Our model consists of a bidirectional LSTM (BiLSTM) based language model that is pre-trained to predict words in plain text, and a multi-layer perceptron (MLP) decision model that uses features from the language model to predict the correct actions for an ArcHybrid transition-based parser. We participated in the CoNLL 2017 UD Shared Task as the "Koç University" team and our system was ranked 7th out of 33 systems that parsed 81 treebanks in 49 languages.
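
For readers unfamiliar with the arc-hybrid transition system, the following is a minimal sketch of its three actions in plain Python. It is not the authors' parser: the action sequence is supplied by hand rather than predicted by an MLP, the function name `run_arc_hybrid` is hypothetical, and placing ROOT (index 0) on the stack at the start is an assumption, since conventions for the root vary between implementations.

```python
def run_arc_hybrid(n_words, actions):
    """Replay arc-hybrid actions and return a {dependent: head} mapping.

    Words are numbered 1..n_words; 0 is ROOT (assumed to start on the stack).
    """
    stack = [0]
    buffer = list(range(1, n_words + 1))
    heads = {}
    for action in actions:
        if action == "SHIFT":
            # Move the first buffer word onto the stack.
            if not buffer:
                raise ValueError("SHIFT needs a non-empty buffer")
            stack.append(buffer.pop(0))
        elif action == "LEFT":
            # Pop the stack top; its head is the first word in the buffer.
            if not buffer or len(stack) < 2:
                raise ValueError("LEFT needs a buffer word and a non-root stack top")
            heads[stack.pop()] = buffer[0]
        elif action == "RIGHT":
            # Pop the stack top; its head is the item below it on the stack.
            if len(stack) < 2:
                raise ValueError("RIGHT needs two items on the stack")
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


# "She eats fish": She <- eats, fish <- eats, eats <- ROOT
heads = run_arc_hybrid(3, ["SHIFT", "LEFT", "SHIFT", "SHIFT", "RIGHT", "RIGHT"])
```

In the system described above, the decision model would choose each action from features of the current stack and buffer; here the sequence is fixed only to show how the actions build the tree.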

May 17, 2017

Our neural net based dependency parser was number 7 overall out of 33 teams participating in the CoNLL 2017 Shared Task "Multilingual Parsing from Raw Text to Universal Dependencies" in which participating teams had to parse 68 corpora in 50 languages. I would like to thank Ömer Kırnap and Berkay Furkan Önder for their contributions and all-nighters.