Clustering algorithms are extensively used on patient tissue samples in order to group and visualize the
microarray data. The high dimensionality and probe-specific noise make the selection of an appropriate
clustering algorithm a difficult task. This study presents a large-scale analysis of three clustering algorithms:
k-means, hierarchical clustering (HC) and evidence accumulation clustering (EAC) on thirty-five cancer gene
expression data sets selected to benchmark the performance of the clustering algorithms. Separate
performance analyses were conducted on data sets from the Affymetrix and cDNA chip platforms to examine the
possible influence of the microarray technology. The study revealed that no consistent algorithm ranking can be
inferred, although in general EAC presented the best compromise between adjusted Rand index (ARI) and variance.
However, the results indicated that ARI variance under repeated k-means initializations offers useful
information on the need for more complex clustering techniques. If repeated k-means converges
to the same partition, and this partition is also confirmed by HC, there is no need to run EAC. However, when the
ARI is moderately or highly variable across repeated k-means runs, EAC should be used to reduce the uncertainty
of the clustering and unveil the data structure.
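The stability heuristic above can be sketched as follows, assuming scikit-learn and an illustrative synthetic dataset (the data, number of clusters, and number of runs are assumptions, not the study's setup). The ARI is computed between partitions from repeated k-means initializations; a high mean with low variance suggests a stable partition for which an ensemble step such as EAC may be unnecessary:

```python
# Sketch: ARI variability across repeated k-means initializations as a
# heuristic for whether an ensemble method (e.g. EAC) is warranted.
# Data and k are illustrative assumptions.
import numpy as np
from itertools import combinations
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = [KMeans(n_clusters=4, n_init=1, random_state=s).fit_predict(X)
          for s in range(10)]
# Pairwise ARI between runs: high mean / low variance indicates a
# stable partition across initializations.
aris = [adjusted_rand_score(a, b) for a, b in combinations(labels, 2)]
print(np.mean(aris), np.var(aris))
```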

Biometric identification is the task of recognizing an individual using biological or behavioral traits, and, recently, the electrocardiogram (ECG) has emerged as a prominent trait. In addition, deep learning is a fast-paced research field in which several models, training schemes, and applications are being actively investigated. In this paper, an ECG-based biometric system is proposed that uses a deep autoencoder to learn a lower-dimensional representation of heartbeat templates. A superior identification performance is achieved, validating the expressiveness of such a representation. A transfer learning setting is also explored, and the results show practically no loss of performance, suggesting that these deep learning methods can be deployed in systems with offline training.
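The general idea of learning a compact heartbeat representation and identifying subjects in that latent space can be sketched as follows. A single-hidden-layer autoencoder trained by plain gradient descent stands in for the paper's deep autoencoder, and synthetic vectors stand in for heartbeat templates; all sizes and data are assumptions for illustration only:

```python
# Sketch: autoencoder-based heartbeat representation + nearest-neighbour
# identification. One hidden layer trained with plain gradient descent
# replaces the deep autoencoder of the paper; data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_subj, per_subj, dim, latent = 4, 30, 40, 5
proto = rng.normal(size=(n_subj, dim))                 # subject prototypes
X = np.repeat(proto, per_subj, axis=0) + 0.2 * rng.normal(
    size=(n_subj * per_subj, dim))                     # noisy "heartbeats"
y = np.repeat(np.arange(n_subj), per_subj)

W1 = 0.1 * rng.normal(size=(dim, latent))              # encoder weights
W2 = 0.1 * rng.normal(size=(latent, dim))              # decoder weights
lr = 0.01
for _ in range(500):                                   # minimise MSE(X, R)
    H = np.tanh(X @ W1)                                # latent codes
    R = H @ W2                                         # reconstruction
    err = R - X
    gW2 = H.T @ err / len(X)
    gH = (err @ W2.T) * (1 - H**2)                     # backprop via tanh'
    gW1 = X.T @ gH / len(X)
    W2 -= lr * gW2
    W1 -= lr * gW1

Z = np.tanh(X @ W1)                                    # learned representation
# 1-NN identification in latent space (self-matches excluded)
d = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
np.fill_diagonal(d, np.inf)
acc = (y[d.argmin(axis=1)] == y).mean()
print(acc)
```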

In this work, we present view-point invariant person re-identification (Re-ID) by multi-modal feature fusion of 3D soft biometric cues. We use the Microsoft Kinect v2 sensor to collect skeleton points from walking subjects, and leverage both the anthropometric features and the gait features associated with each person. The key proposals of the paper are twofold: first, we conduct an extensive study of the influence of various features, both individually and jointly (via a fusion technique), on person Re-ID; second, we present an actual demonstration of the view-point invariant Re-ID paradigm by analysing subject data collected in different walking directions. Focusing on the latter, we further analyse three categories, which we term pseudo, quasi, and full view-point invariant scenarios, and evaluate our system's performance under each of them. Initial pilot studies were conducted on a new set of 20 people, collected at the host laboratory. We illustrate, for the first time, gait-based person re-identification with truly view-point invariant behaviour, i.e. with the walking direction of the probe sample not represented in the gallery samples.

The unprecedented collection and storage of data in electronic format has given rise to an interest in automated analysis for the generation of knowledge and new insights. Cluster analysis is a good candidate, since it makes as few assumptions about the data as possible. A vast body of work on clustering methods exists; yet, typically, no single method is able to respond to the specificities of all kinds of data. Evidence Accumulation Clustering (EAC) is a robust, state-of-the-art ensemble algorithm that has shown good results. However, this robustness comes at a higher computational cost: currently, its application is slow or restricted to small datasets. The objective of the present work is to scale EAC, allowing its application to big datasets with the technology available in a typical workstation. Three approaches for different parts of EAC are presented: a parallel GPU K-Means implementation, a novel strategy to build a sparse CSR matrix specialized to EAC, and Single-Link clustering based on Minimum Spanning Trees using an external-memory sorting algorithm. Combining these approaches, EAC was applied to much larger datasets than previously possible.
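The core EAC step that the sparse CSR strategy targets, accumulating co-association evidence across an ensemble of partitions, can be sketched as follows. The dataset, ensemble size, and the naive pair loop are illustrative assumptions; the actual work builds the sparse matrix with a specialized strategy rather than this simple construction:

```python
# Sketch of the EAC co-association step: count how often each pair of
# samples is clustered together across an ensemble of k-means runs,
# stored as a SciPy CSR matrix. Sizes and data are illustrative.
import numpy as np
from scipy.sparse import coo_matrix
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=1)
n = X.shape[0]
rows, cols = [], []
for seed in range(15):                                # ensemble of partitions
    k = np.random.RandomState(seed).randint(10, 30)   # random k per run
    labels = KMeans(n_clusters=k, n_init=1,
                    random_state=seed).fit_predict(X)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        for i in idx:                                 # co-occurring pairs, i < j
            for j in idx:
                if i < j:
                    rows.append(i)
                    cols.append(j)
# tocsr() sums duplicate (i, j) entries, yielding co-association counts
coassoc = coo_matrix((np.ones(len(rows)), (rows, cols)),
                     shape=(n, n)).tocsr()
print(coassoc.nnz)
```

A Single-Link step over this matrix (e.g. via a minimum spanning tree) would then extract the final partition.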

Electrocardiogram (ECG) biometrics are a relatively recent trend in biometric recognition, with at least 13 years of development in the peer-reviewed literature. Most of the proposed biometric techniques perform classification on features extracted either from heartbeats or from ECG-based transformed signals. The best representation is yet to be decided. This paper studies an alternative
representation, a dissimilarity space, based on the pairwise dissimilarity between templates and subjects' signals. Additionally, this representation can make use of ECG signals sourced from multiple leads. Configurations of three leads are tested and contrasted with single-lead experiments. Using the same k-NN classifier, the results proved superior to those obtained with a
similar algorithm that does not employ a dissimilarity representation. The best authentication EER went as low as 1.53% on a database of 503 subjects. However, the use of extra leads did not prove advantageous.
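The dissimilarity-space idea can be sketched as follows, assuming scikit-learn and synthetic vectors in place of real ECG heartbeat templates (names and sizes are illustrative, not the paper's setup). Each signal is represented by its distances to a set of reference templates, and a k-NN classifier operates on that representation:

```python
# Sketch: mapping signals into a dissimilarity space (distances to a set
# of reference templates), then classifying with k-NN. Synthetic vectors
# stand in for real ECG heartbeat templates.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
n_subjects, beats_per_subject, dim = 5, 20, 50
prototypes = rng.normal(size=(n_subjects, dim))
beats = np.repeat(prototypes, beats_per_subject, axis=0) + 0.3 * rng.normal(
    size=(n_subjects * beats_per_subject, dim))
y = np.repeat(np.arange(n_subjects), beats_per_subject)

templates = prototypes                       # one reference template per subject
D = pairwise_distances(beats, templates)     # dissimilarity representation
clf = KNeighborsClassifier(n_neighbors=3).fit(D, y)
print(clf.score(D, y))
```

With multiple leads, the per-lead dissimilarity columns would simply be concatenated into a wider representation.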

This paper proposes a patch-based method to address two of the core problems in image processing: denoising
and inpainting. The approach is based on a Gaussian mixture model estimated exclusively from the observed
image via the expectation-maximization algorithm, based on which the minimum mean squared error estimate
is computed in closed form. The results show that this simple method is able to perform on the same level as
other state-of-the-art algorithms.
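The closed-form MMSE estimate under a Gaussian mixture prior can be sketched as follows: for an observation y = x + noise, with x drawn from a mixture of Gaussians and the noise Gaussian with variance s2, the posterior mean is a responsibility-weighted combination of per-component Wiener estimates. Patch extraction and the EM fitting are omitted; the toy mixture below is an assumption for illustration:

```python
# Sketch of the closed-form MMSE estimate under a GMM prior:
# x_hat = sum_k beta_k(y) * (mu_k + C_k (C_k + s2 I)^-1 (y - mu_k)),
# where beta_k is the posterior responsibility of component k.
import numpy as np
from scipy.stats import multivariate_normal

def gmm_mmse(y, weights, means, covs, s2):
    d = y.shape[0]
    noise = s2 * np.eye(d)
    resp, est = [], []
    for w, mu, C in zip(weights, means, covs):
        # responsibility of component k given the noisy observation
        resp.append(w * multivariate_normal.pdf(y, mean=mu, cov=C + noise))
        # per-component posterior mean (Wiener-type) estimate
        est.append(mu + C @ np.linalg.solve(C + noise, y - mu))
    resp = np.array(resp) / np.sum(resp)
    return sum(r * e for r, e in zip(resp, est))

d = 4
means = [np.zeros(d), np.ones(d)]
covs = [np.eye(d), 0.5 * np.eye(d)]
x_hat = gmm_mmse(np.full(d, 0.9), [0.5, 0.5], means, covs, s2=0.1)
print(x_hat)
```

For denoising, this estimator is applied patch-wise; for inpainting, the same posterior logic is restricted to the observed pixels of each patch.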

Atrial fibrillation (AF) is the most common type of arrhythmia. This work presents a pattern analysis
approach to automatically classify electrocardiographic (ECG) records as normal sinus rhythm or AF. Both
spectral and time-domain features were extracted and their discriminative capability was assessed
individually and in combination. The spectral features were based on the wavelet decomposition of the signal,
while the time-domain parameters captured heart rate characteristics. The performance of three classifiers was evaluated:
k-nearest neighbour (kNN), artificial neural network (ANN), and support vector machine (SVM). The MIT-BIH
arrhythmia database was used for validation. The best results were obtained when a combination of
spectral and time-domain features was used. An overall accuracy of 99.08% was achieved with the SVM
classifier.
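The evaluation pipeline, combining the two feature families and comparing the three classifiers, can be sketched as follows. The features here are random placeholders for the wavelet and heart-rate descriptors, and the labels are synthetic, so this only illustrates the comparison setup, not the reported 99.08% result:

```python
# Sketch: combined spectral + time-domain features evaluated with the
# three classifiers compared in the study (kNN, ANN, SVM).
# Features and labels are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 200
spectral = rng.normal(size=(n, 6))       # stand-in wavelet features
time_dom = rng.normal(size=(n, 3))       # stand-in heart-rate features
y = (spectral[:, 0] + time_dom[:, 0] > 0).astype(int)  # NSR vs AF labels
X = np.hstack([spectral, time_dom])      # feature combination

for name, clf in [("kNN", KNeighborsClassifier()),
                  ("ANN", MLPClassifier(max_iter=500, random_state=0)),
                  ("SVM", SVC())]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(name, round(acc, 3))
```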

The potential of the electrocardiographic (ECG) signal as a biometric trait has been ascertained in the literature
over the past decade. The inherent characteristics of the ECG make it an interesting biometric modality,
given its universality, intrinsic aliveness detection, continuous availability, and inbuilt hidden nature. These
properties enable the development of novel applications, where non-intrusive and continuous authentication
are critical factors. Examples include, among others, electronic trading platforms, the gaming industry, and
the auto industry, in particular for car sharing programs and fleet management solutions. However, there are
still some challenges to overcome in order to make the ECG a widely accepted biometric. In particular, the
questions of uniqueness (inter-subject variability) and permanence over time (intra-subject variability) are still
largely unanswered. In this paper we focus on the uniqueness question, presenting a preliminary study of our
biometric recognition system, testing it on a database encompassing 618 subjects. We also performed tests
with subsets of this population. The results reinforce that the ECG is a viable trait for biometrics, having
obtained an Equal Error Rate of 9.01% and an Error of Identification of 15.64% for the entire test population.
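The Equal Error Rate reported above is the operating point where the false acceptance and false rejection rates coincide. A minimal sketch of its computation, assuming scikit-learn and synthetic genuine/impostor score distributions in place of real matcher outputs:

```python
# Sketch: Equal Error Rate (EER) from genuine and impostor score
# distributions, located on the ROC curve where FAR ~= FRR.
# Scores are synthetic assumptions.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(7)
genuine = rng.normal(0.7, 0.1, 500)     # similarity scores, same subject
impostor = rng.normal(0.4, 0.1, 5000)   # similarity scores, different subjects
scores = np.concatenate([genuine, impostor])
labels = np.concatenate([np.ones_like(genuine), np.zeros_like(impostor)])

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr                           # false rejection rate
eer = fpr[np.nanargmin(np.abs(fnr - fpr))]  # crossover point
print(round(eer * 100, 2), "%")
```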

Alzheimer's disease is a type of dementia that mainly affects elderly people, with unknown causes and no effective treatment to date. Diagnosing this disease at an early stage is crucial to improve patients' quality of life. Current techniques focus on the analysis of neuroimages, such as FDG-PET or MRI, to find changes in brain activity. While high accuracies can be obtained by combining the analysis of several types of neuroimages, these are expensive and not always available for medical analysis. Achieving similar results using only 3-D FDG-PET scans is therefore of great importance. While directly applying classifiers to the FDG-PET scan voxel intensities can lead to good prediction accuracies, it results in a problem that suffers from the curse of dimensionality. This paper thus proposes a methodology to identify regions of interest by segmenting 3-D FDG-PET scans and extracting features that represent each of those regions of interest, thereby reducing the dimensionality of the space. Experimental results show that the proposed methodology outperforms the one using voxel intensities, even though only a small number of features is needed to achieve that result.
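The dimensionality-reduction step, summarizing a segmented 3-D volume by one feature per region of interest instead of one value per voxel, can be sketched as follows. The volume, the cubic segmentation, and the mean-intensity summary are illustrative assumptions standing in for a real FDG-PET scan and its segmentation:

```python
# Sketch: reducing a 3-D scan to region-level features. A label volume
# partitions the scan into regions of interest; each region is then
# summarized by its mean intensity. Volume and labels are synthetic.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(3)
volume = rng.random((16, 16, 16))            # stand-in FDG-PET intensities
# toy segmentation: 8 cubic "regions of interest", labelled 1..8
labels = np.zeros(volume.shape, dtype=int)
for r, (i, j, k) in enumerate(np.ndindex(2, 2, 2), start=1):
    labels[i*8:(i+1)*8, j*8:(j+1)*8, k*8:(k+1)*8] = r

features = ndimage.mean(volume, labels=labels, index=np.arange(1, 9))
print(features.shape)   # 8 region features instead of 4096 voxel values
```

A classifier would then be trained on these per-region features rather than on the raw voxel intensities.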

In a previous work, we proposed a semantic relatedness measure between scientific concepts that uses the Wikipedia
category network as an ontology and is based on the length of the category path. After observing substantial differences
in the arc density of the category network across the whole graph, it was concluded that these
irregularities in the ontology density may lead to substantial errors in the computation of the semantic relatedness
measure. Here we attempt to correct for this bias and improve the measure by adding the notion
of ontology density and proposing a new semantic relatedness measure. The proposed measure computes a
weighted length of the category path between two concepts in the ontology graph, assigning a different weight
to each arc of the path depending on the ontology density in its region. This procedure has been extended to
measure semantic relatedness between entities, an entity being defined as a set of concepts.
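The density-weighted path idea can be sketched as follows, using only the standard library. The toy graph, the mean-degree proxy for local density, and the 1/density arc weighting are illustrative assumptions, not the paper's exact formula; the measure is then the weighted shortest-path length computed with Dijkstra's algorithm:

```python
# Sketch: density-weighted path length between two concepts in a
# category graph. Arcs in denser regions receive smaller weights, and
# relatedness is the weighted shortest-path length (Dijkstra).
import heapq

graph = {                       # toy undirected category graph (adjacency)
    "A": ["B", "C"], "B": ["A", "C", "D"],
    "C": ["A", "B", "D"], "D": ["B", "C", "E"], "E": ["D"],
}

def arc_weight(u, v):
    # local density approximated by the endpoints' mean degree
    density = (len(graph[u]) + len(graph[v])) / 2
    return 1.0 / density        # denser region -> shorter effective arc

def weighted_relatedness(src, dst):
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue            # stale heap entry
        for v in graph[u]:
            nd = d + arc_weight(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

print(weighted_relatedness("A", "E"))
```

For entities defined as sets of concepts, the measure would aggregate these pairwise path lengths over the two concept sets.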