Our newly funded Alzheimer's Research UK project on predicting the
risk of dementia is being developed in collaboration with the
University of Manchester. The project was recently covered by the
BBC. The research concerns the development of novel synergistic
approaches to predicting dementia based on Machine Learning (AI) and
Statistical methods, and the development of a prediction tool. There
are currently almost 1 million people in the UK living with dementia.
There is currently no cure, and the condition has higher health and
social care costs than cancer, stroke and chronic heart disease taken
together (dementia costs the UK £26 billion per year). Current
thinking suggests that 35% of cases of dementia could be prevented.
Our research project aims to contribute to prevention, and to help
improve diagnosis rates (currently at least one third of expected
patients do not receive a dementia diagnosis), by predicting the risk
of dementia with new machine learning and statistical approaches. Dr
Daniel Stamate leads on the Machine Learning aspects of the study.

Daniel Stamate secured 54,000 Euro in funding for Data Science
research and PG learning development collaboration, through an
Erasmus+ mobility with National Research Tomsk State University,
Russia, for 2017-19. The visiting Russian staff and PG students will
contribute to our Lab's research.

There has been an
increasing interest recently in examining the possible relationships
between emotions expressed online and stock markets. Most of the
previous studies claiming that emotions have predictive influence on
the stock market do so by developing various machine learning
predictive models, but do not validate their claims rigorously by
analysing the statistical significance of their findings. In turn,
the few works that attempt to statistically validate such claims
suffer from important limitations of their approaches.

A growing body of research analyses the relationship between
sentiment-filled online information and the stock market, and shows
a tendency for the former to predict the latter. But little is known
about whether this information's predictive power resolves
uncertainty. Rather, it is believed to induce volatility because
investors over-react or under-react to new information as a result
of sentimental contagion.

In particular, stock market data exhibit erratic volatility, and
this time-varying volatility makes any possible relationship between
these variables non-linear. Our work investigates and proposes novel
frameworks based on approaches that account for non-linearity and
heteroscedasticity. We also study the asymmetric nature of the
influences of positive and negative sentiments on stock market
volatility.
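
For illustration only, one standard way to write down time-varying
volatility with asymmetric sentiment effects is a GJR-GARCH(1,1)
variance equation extended with exogenous sentiment terms. This
specification is an assumption made here, not necessarily the
framework used in our work; the symbols S^+ and S^- (lagged positive
and negative sentiment scores) and their coefficients are
hypothetical names introduced for the sketch.

```latex
\begin{align}
r_t &= \mu + \varepsilon_t, \qquad
\varepsilon_t = \sigma_t z_t, \quad z_t \sim \text{i.i.d.}(0, 1) \\
\sigma_t^2 &= \omega + \alpha\,\varepsilon_{t-1}^2
  + \gamma\,\varepsilon_{t-1}^2\,\mathbb{1}[\varepsilon_{t-1} < 0]
  + \beta\,\sigma_{t-1}^2
  + \delta^{+} S^{+}_{t-1} + \delta^{-} S^{-}_{t-1}
\end{align}
```

Under this sketch, heteroscedasticity enters through the dependence
of the conditional variance on its own past, and an asymmetric
influence of sentiment would show up as a difference between the
estimated coefficients on positive and negative sentiment.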

Current research is also being extended towards financial fraud
detection with NLP and ML approaches. More details to follow.

A. Predicting risk of dementia

Participants: Daniel Stamate, Fionn Murtagh, in collaboration with Dr
David Reeves and the project team he leads at the Centre for Primary
Care in the Institute for Population Health, University of
Manchester, and other academic partners.

Our Lab's team leads on the Machine Learning aspects of the study,
based on our newly funded Alzheimer's Research UK project on
predicting the risk of dementia using routine primary care records,
which is being developed in collaboration with the University of
Manchester and other academic partners. The project received recent
media coverage from the BBC. The research concerns the development of
novel synergistic approaches to predicting dementia based on Machine
Learning (AI) and Statistical methods, and the development of a
prediction tool. There are currently almost 1 million people in the
UK living with dementia. There is currently no cure, and the
condition has higher health and social care costs than cancer, stroke
and chronic heart disease taken together (dementia costs the UK £26
billion per year). Current thinking suggests that 35% of cases of
dementia could be prevented. Our research project aims to contribute
to prevention, and to help improve diagnosis rates (currently at
least one third of expected patients do not receive a dementia
diagnosis), by predicting the risk of dementia with new machine
learning and statistical approaches. The main source of data to be
analysed in this project is the Clinical Practice Research Datalink
(CPRD).

Prediction Modelling and Pattern Detection Approaches for the
First-Episode Psychosis Associated with Cannabis Use

Recent studies show that cannabis is one of the most popular drugs in
the world. Many countries have started to legalise it. However,
recent research demonstrates that the consumption of cannabis is a
significant risk factor for various types of psychosis. As such,
research efforts are currently being made to improve the estimation
of the contribution of cannabis to the development of psychosis. In
this ongoing research we apply data science methodologies based on
scalable machine learning and statistical learning to devise novel
approaches to the prediction of first-episode psychosis attributable
to the use of high potency cannabis, and to the quantification of
risk factors, based on phenotype data. Genotype data is to be added
to the analysis in the next phase of the research. The work is
performed in collaboration with the teams of Dr Marta Di Forti, Prof
Sir Robin Murray, and Prof Daniel Stahl at the Institute of
Psychiatry, Psychology & Neuroscience, King's College London.

Machine learning approaches to predicting pattern formation and
agnostic clustering of the general population using clinical data and
data from mobile applications

Modern psychiatric classification systems categorize psychiatric
disorders (partly on evidence, largely pragmatically) based on
different combinations of the required number of symptom domains that
exceed the operational threshold of severity. This taxonomy endorses
unique phenotypes with precise boundaries. A prevailing trend in
psychiatry has been to reify these categorical diagnoses. However,
efforts to discriminate these psychiatric disorders, using modern
genetic and neuroimaging data, have thus far failed to deliver a
promising outcome. Evidence indicates commonality rather than
distinction. The Experience Sampling Method (ESM), a personal diary
method to assess mental states in real-time, provides a unique
opportunity to observe these subtle fluctuations of mental states.
It has various advantages over the conventional method of
cross-sectional assessment of psychopathology based on self-report
questionnaires: high ecological validity, high reliability, no
recall bias, high temporal resolution, and contextual information.
However, this intense assessment strategy produces a massive amount
of information at an individual level. As such, even modern
statistical approaches sometimes fail to provide optimal solutions
to deal with the complexity of data at this scale. Machine learning
offers enhanced solutions for this kind of research challenge when
synergistically combined with more traditional statistical methods.
The aim of this study is to propose a new, cutting-edge statistical
and machine learning approach to predicting pattern formation and
agnostic clustering of the general population using generic ESM data
collected with mobile apps. The new work is developed in
collaboration with Dr Sinan Guloksuz, Prof Jim van Os, and Dr
Philippe Delespaul in the Department of Psychiatry and
Neuropsychology at Maastricht University Medical Centre, and Prof
Daniel Stahl in the Department of Biostatistics and Health
Informatics, King's College London.

C. Machine and Statistical Learning Modelling to Understand
Heterogeneous Manifestations of Asthma in Early Life

Participants: Daniel Stamate, in collaboration with the team of Prof
Adnan Custovic, Department of Medicine at Imperial College London

Wheezing
is common among children and ~50% of those under 6 years of age are
thought to experience at least one episode of wheeze. However, due to
the heterogeneity of symptoms there are difficulties in treating and
diagnosing these children. ‘Phenotype specific therapy’
is one possible avenue of treatment, whereby we use significant
pathology and physiology to identify and treat pre-schoolers with
wheeze. By applying feature selection algorithms and predictive
modelling techniques, this study attempts to determine whether it is
possible to robustly distinguish patient diagnostic categories among
pre-school children. Univariate feature analysis identified more
objective variables, and recursive feature elimination a larger
number of subjective variables, as important in distinguishing
between patient categories. Predictive modelling sees a drop in
performance when subjective variables are removed from the analysis,
indicating that these variables are important in distinguishing
wheeze classes. Current results show 90%+ performance in AUC,
sensitivity, specificity, and accuracy, and 80%+ in the kappa
statistic, in distinguishing ill from healthy patients. Developed in
a synergistic statistical and machine learning approach, our
methodologies also propose a novel ROC Cross Evaluation method for
model post-processing and evaluation. The stability of the predictive
modelling is assessed in computationally intensive Monte Carlo
simulations.
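
As a reminder of how the performance measures quoted above are
defined, the following minimal sketch computes sensitivity,
specificity, accuracy, Cohen's kappa and AUC for a binary ill/healthy
classifier. It uses only the standard definitions; the function names
are hypothetical, the labels are assumed to be coded 1 (ill) and 0
(healthy), and it is not the study's code nor the ROC Cross Evaluation
method.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded 1 (ill) / 0 (healthy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Proportion of ill patients correctly identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of healthy patients correctly identified."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Proportion of all patients correctly classified."""
    return (tp + tn) / (tp + tn + fp + fn)


def cohen_kappa(tp, tn, fp, fn):
    """Agreement between predictions and truth, corrected for chance."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)


def auc(y_true, scores):
    """AUC as the probability that a random ill patient is scored above
    a random healthy one (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Kappa is reported alongside accuracy because it discounts the
agreement that would be expected by chance given the class
proportions, which matters when ill and healthy groups are of unequal
size.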

Forthcoming work concerns proposing and expanding a novel
methodology based on unsupervised learning / clustering to address
the heterogeneous nature of asthma and the identification of its
sub-categories. This work is to be developed in collaboration with
the team of Prof Adnan Custovic, Department of Medicine at Imperial
College London.

Soft
Computing involves various advances in Algorithmics which are
specific to the nature of this computing paradigm. This theme
addresses the need for efficiency in solving optimisation problems or
the need for offering tractable solutions for specific NP-hard
problems by employing Evolutionary Computing approaches, in
particular using hybrid evolutionary approaches or parallel
evolutionary approaches.

On the other hand, devising efficient algorithms for integrating,
querying and performing inferences with imperfect information
benefits from Soft Computing approaches such as those based on
multi-valued logics, and this is another direction we follow in our
research. We provide algorithms for computing the semantics of the
integrating, querying or inference rules that describe the result of
these processes, and for deciding the query equivalence problem,
which is useful in query optimisation.

Moreover, statistical simulations
are a useful Soft Computing tool that we employ for assessing new
algorithms we propose for improving the time-efficiency in blocking
expanding ring search for mobile ad hoc networks, or for various
concurrency problems.

In the process of
constructing decision trees, the criteria for selecting the splitting
attributes influence the performance of the model produced by the
decision tree algorithm. The most well-known criteria, such as
Shannon entropy and the Gini index, suffer from a lack of
adaptability to the datasets. This project investigates families of
parameterised impurities that we propose, to be used in the
construction of optimised decision trees. These criteria rely on
families of strictly concave functions that define the new
generalised parameterised impurity measures, which we applied in
devising and implementing our novel PIDT decision tree algorithm. We
also investigate novel statistical approaches for preventing
overfitting with pruning, and we proposed the so-called S-pruning
procedure. The PIDT algorithm was evaluated on a number of simulated
and benchmark datasets with good results. Experimental results
suggest that by tuning the parameters of the impurity measures and by
using our S-pruning method, we obtain better decision tree
classifiers. Ongoing work investigates the extension of these
techniques to ensemble predictive models based on parameterised
families of impurities.
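
To make the idea of a tunable splitting criterion concrete, the
sketch below contrasts the two fixed criteria named above with one
well-known parameterised family, the Tsallis entropy, which is
concave in the class probabilities and recovers the Gini index at
q = 2 and Shannon entropy (in nats) as q tends to 1. The Tsallis
family is used here purely as an assumed stand-in; it is not
necessarily one of the impurity families used in PIDT, and all
function names are hypothetical.

```python
import math
from collections import Counter


def class_probs(labels):
    """Empirical class proportions of a list of labels."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def shannon_entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gini_index(probs):
    """Gini impurity: 1 - sum of squared class proportions."""
    return 1.0 - sum(p * p for p in probs)


def tsallis_impurity(probs, q):
    """Tsallis entropy with parameter q > 0 (assumed example family).

    q = 2 gives the Gini index; q -> 1 gives Shannon entropy in nats.
    """
    if q <= 0:
        raise ValueError("q must be positive")
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def impurity_decrease(parent, children, impurity):
    """Impurity of the parent node minus the size-weighted impurity
    of the child nodes produced by a candidate split."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(class_probs(child))
                   for child in children)
    return impurity(class_probs(parent)) - weighted


# Example: score one candidate split under a chosen parameter value.
parent = ["a", "a", "b", "b"]
split = [["a", "a"], ["b", "b"]]
gain = impurity_decrease(parent, split, lambda p: tsallis_impurity(p, 1.5))
```

In a tree learner built this way, the parameter would be tuned per
dataset (for example by cross-validation), which is the kind of
adaptability that fixed criteria lack.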