Deep Neural Network Models for Image Classification and Regression

PhD Candidate Salim Malek

Abstract of Dissertation

Deep learning, a branch of machine learning, has been gaining ground in many research fields as well as in practical applications. This ongoing boom can be traced back mainly to the availability and affordability of powerful processing facilities, which were not widely accessible just a decade ago. Although it has demonstrated cutting-edge performance in computer vision, particularly in object recognition and detection, deep learning has yet to find its way into several other research areas. Furthermore, the performance of deep learning models depends strongly on how these models are designed and tailored to the problem at hand, which raises concerns about both accuracy and processing overhead. The success and applicability of a deep learning system rely jointly on both aspects. In this dissertation, we present innovative deep learning schemes, with application to interesting though less-addressed topics.

In this respect, the first topic covered is rough scene description for visually impaired individuals. The idea is to list the objects that likely exist in an image captured by a visually impaired person. To this end, we extract several features from the query image in order to capture the textural as well as the chromatic cues therein. Further, in order to improve the representativeness of the extracted features, we reinforce them with a feature-learning stage by means of an autoencoder model. The latter is topped with a logistic regression layer in order to detect the presence of objects, if any.
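As a rough sketch of this two-stage pipeline, the snippet below trains a single-hidden-layer autoencoder on synthetic feature vectors (stand-ins for the textural and chromatic features; all data, dimensions, and hyperparameters here are illustrative assumptions, not those used in the dissertation) and then tops the learned encoder with a multi-label logistic regression layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the handcrafted feature vectors: 200 images,
# 32-dim features, and 5 possible object classes (multi-label targets).
X = rng.normal(size=(200, 32))
Y = (rng.random(size=(200, 5)) < 0.3).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# --- Stage 1: train a single-hidden-layer autoencoder on the features ---
d, h, lr = X.shape[1], 16, 0.01
W1 = rng.normal(scale=0.1, size=(d, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=(h, d)); b2 = np.zeros(d)
for _ in range(500):
    H = np.tanh(X @ W1 + b1)            # encoder
    R = H @ W2 + b2                     # linear decoder (reconstruction)
    E = R - X                           # reconstruction error
    gW2 = H.T @ E / len(X); gb2 = E.mean(0)
    dH = (E @ W2.T) * (1 - H ** 2)      # backprop through tanh
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# --- Stage 2: top the encoder with a logistic regression layer ---
H = np.tanh(X @ W1 + b1)                # learned representation
Wo = np.zeros((h, Y.shape[1])); bo = np.zeros(Y.shape[1])
for _ in range(500):
    P = sigmoid(H @ Wo + bo)
    G = P - Y                           # gradient of the cross-entropy loss
    Wo -= lr * (H.T @ G / len(X)); bo -= lr * G.mean(0)

probs = sigmoid(H @ Wo + bo)            # per-object presence probabilities
print(probs.shape)
```

In a real deployment, each output probability would be thresholded to decide whether the corresponding object is reported to the user.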

In a second topic, we propose to exploit the same model, i.e., the autoencoder, in the context of cloud removal from remote sensing images. Briefly, the model is trained on a cloud-free image of a certain geographical area and applied afterwards to another, cloud-contaminated image of the same area acquired at a different time. Two reconstruction strategies are proposed, namely pixel-based and patch-based reconstruction.
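A much-simplified illustration of the pixel-based strategy follows, using synthetic co-registered images and a linear least-squares mapping as a stand-in for the trained autoencoder; the band count, image size, and cloud mask are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical co-registered 4-band images of the same area at two dates.
ref = rng.random(size=(32, 32, 4))                    # cloud-free acquisition
target = 0.8 * ref + 0.1 + rng.normal(scale=0.01, size=ref.shape)

cloud = np.zeros((32, 32), dtype=bool)
cloud[8:16, 8:16] = True                              # simulated cloud mask

# Pixel-based reconstruction: learn a spectral mapping ref -> target from
# the cloud-free pixels only (least squares here stands in for the
# autoencoder mapping described in the text).
Xc = ref[~cloud]                                      # clear pixels, (n, 4)
Yc = target[~cloud]
Xa = np.hstack([Xc, np.ones((len(Xc), 1))])           # add a bias column
W, *_ = np.linalg.lstsq(Xa, Yc, rcond=None)

# Apply the learned mapping to fill in the cloud-contaminated pixels.
Xq = np.hstack([ref[cloud], np.ones((cloud.sum(), 1))])
restored = target.copy()
restored[cloud] = Xq @ W

# Mean absolute error of the reconstruction vs. the noiseless ground truth.
err = np.abs(restored[cloud] - (0.8 * ref[cloud] + 0.1)).mean()
print(round(err, 4))
```

The patch-based variant would proceed in the same spirit, but the mapping would take flattened spatial patches rather than single-pixel spectra as inputs and outputs.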

From these first two topics, we quantitatively demonstrate that autoencoders can play a pivotal role in terms of both (i) feature learning and (ii) reconstruction and mapping of sequential data.

The convolutional neural network (CNN) is arguably the most widely used model in the computer vision community, owing to its remarkable performance in object and scene recognition with respect to traditional hand-crafted features. Nevertheless, the CNN is usually employed in its two-dimensional form, which raises questions about its applicability to unidimensional data. Thus, a third contribution of this thesis is devoted to the design of a unidimensional CNN architecture, which is applied to spectroscopic data. In other terms, the CNN is tailored for feature extraction from one-dimensional chemometric data, whilst the extracted features are fed into advanced regression methods to estimate the underlying chemical component concentrations. Experimental findings suggest that, similarly to their 2D counterparts, unidimensional CNNs can also outperform traditional methods.
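A toy sketch of this pipeline is given below. It uses fixed random 1-D convolution kernels as the feature extractor (whereas the thesis trains the kernels) and closed-form ridge regression as the downstream regressor; the spectra, target concentrations, and all dimensions are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic chemometric data: 150 spectra of 100 wavelengths each, with a
# "concentration" driven by a local absorption band (wavelengths 40-49).
spectra = rng.random(size=(150, 100))
conc = spectra[:, 40:50].mean(axis=1) + rng.normal(scale=0.01, size=150)

def conv1d(x, kernels):
    """Valid 1-D convolution of one spectrum with a bank of kernels."""
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k)  # (L-k+1, k)
    return windows @ kernels.T                                # (L-k+1, n_k)

# Tiny fixed 1-D convolutional extractor: 8 kernels of width 9, ReLU,
# then average pooling over 4 segments -> a 32-dim feature vector.
kernels = rng.normal(scale=0.3, size=(8, 9))
def features(x):
    fmap = np.maximum(conv1d(x, kernels), 0.0)     # (92, 8) after ReLU
    return fmap.reshape(4, 23, 8).mean(axis=1).ravel()

F = np.array([features(s) for s in spectra])       # (150, 32)

# Feed the extracted features into a regression method (ridge, closed form).
Fa = np.hstack([F, np.ones((len(F), 1))])
w = np.linalg.solve(Fa.T @ Fa + 1e-3 * np.eye(Fa.shape[1]), Fa.T @ conc)
pred = Fa @ w
print(F.shape)
```

In the dissertation the convolutional stage is learned end-to-end and the regressor is more advanced, but the division of labor (1-D convolutional feature extraction, then regression) is the same.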

The last contribution of this dissertation is a new method to estimate the connection weights of a CNN, based on training an SVM for each kernel of the network. This method has the advantage of being fast and well suited to applications characterized by small datasets.
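One plausible reading of this scheme can be sketched as follows: each convolution kernel is taken to be the weight vector of a linear SVM trained on labeled image patches. The patch data, patch size, kernel count, and Pegasos-style SVM solver below are all illustrative assumptions, not the dissertation's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical two-class patch data: flattened 5x5 patches cropped from
# images of two classes (synthetic here, with shifted means).
patches0 = rng.normal(loc=0.0, size=(200, 25))
patches1 = rng.normal(loc=0.5, size=(200, 25))
X = np.vstack([patches0, patches1])
y = np.r_[-np.ones(200), np.ones(200)]

def linear_svm(X, y, lam=0.01, epochs=20):
    """Pegasos-style subgradient training of a linear SVM; returns w."""
    w = np.zeros(X.shape[1]); t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w) < 1:                 # margin violation
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
    return w

# One SVM per kernel: each SVM sees a different random subsample of the
# patches, yielding a bank of diverse 5x5 convolution kernels.
kernels = []
for _ in range(6):
    idx = rng.choice(len(X), size=200, replace=False)
    kernels.append(linear_svm(X[idx], y[idx]).reshape(5, 5))
kernels = np.stack(kernels)
print(kernels.shape)                                  # prints (6, 5, 5)
```

Because each SVM is a convex problem solved on a modest set of patches, this initialization is cheap compared with end-to-end backpropagation, which is what makes it attractive for small datasets.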