Metabolomic modelling and applications in inflammatory bowel diseases

Abstract

Background: Metabolomics experiments typically produce high-dimensional data, and handling such data is an extremely important step in data pre-treatment. Metabolomics is an indispensable research tool for the identification and tracking of biomarkers in biological systems. In a typical metabolomics study, complex extracts or body fluids are analysed and compared by various methods to generate metabolic fingerprints. Crohn’s disease (CD) and ulcerative colitis (UC) are the major subtypes of inflammatory bowel disease (IBD), a multifactorial disorder most likely resulting from an altered immune response to commensal or pathogenic gut microbes under the influence of environmental factors such as diet. Exclusive enteral nutrition (EEN) is the most common treatment for paediatric CD in the UK and the rest of Europe. Non-invasive metabolomics approaches could be used to diagnose and differentiate between related diseases, which could enhance disease control, management and patient compliance. Gut microbiota may discriminate IBD subtypes from each other; therefore, metabolomics of faecal extracts was used to examine faecal metabolites, many of which result from the activity of gut microbiota, and thus to differentiate between IBD subtypes and healthy controls as well as among the IBD subtypes.

Methodology: This study investigated the effect of pre-treatment strategies on data sets derived from LC-MS based metabolomics experiments. Different methods of imputing missing values were examined in conjunction with various scaling and transformation methods. SIMCA-P 14 was used to evaluate the model parameters for each pre-treatment method. In this thesis, metabolomics was employed in various studies to assess metabolite biomarkers associated with healthy controls and IBD. All the studies employed liquid chromatography-mass spectrometry (LC-MS) on an Orbitrap Exactive mass analyser, using ZIC-pHILIC and/or C18 analytical columns.
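
The pre-treatment chain described above (missing-value imputation, then transformation, then scaling) can be illustrated with a minimal Python sketch. It is not the SIMCA-P workflow used in this work: the function names are hypothetical, and half-minimum imputation is swapped in as a simple stand-in for the imputation methods compared in the study, which include SIMCA-P's NIPALS handling of missing values. The sketch assumes a samples-by-features matrix of strictly positive peak intensities with missing values encoded as NaN.

```python
import numpy as np


def impute_half_minimum(X):
    """Fill NaNs in each feature column with half of that column's minimum.

    Assumption: a simple stand-in imputation, not the NIPALS approach.
    Each column must contain at least one observed value.
    """
    X = np.array(X, dtype=float)  # copy so the caller's data is untouched
    col_min = np.nanmin(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_min[cols] / 2.0
    return X


def log_transform(X):
    """Base-10 log transformation; assumes strictly positive intensities."""
    return np.log10(X)


def pareto_scale(X):
    """Mean-centre each column and divide by the square root of its SD.

    Assumption: no column is constant (a zero SD would divide by zero).
    """
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    return centered / np.sqrt(sd)


def pretreat(X):
    """Impute, log-transform, then Pareto-scale a samples x features matrix."""
    return pareto_scale(log_transform(impute_half_minimum(X)))


# Hypothetical toy data: 3 samples x 2 features, one missing value.
example = [[100.0, 10.0],
           [1000.0, np.nan],
           [10000.0, 40.0]]
imputed = impute_half_minimum(example)
treated = pretreat(example)
```

Pareto scaling divides by the square root of the standard deviation rather than the standard deviation itself, so high-variance features are down-weighted less aggressively than under unit-variance scaling.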
Data were acquired using Xcalibur software, and metabolites were identified based on accurate mass detection, retention time comparison with authentic external standards, and database searching. The acquired data were analysed using both unsupervised (PCA-X) and supervised (OPLS-DA) models in SIMCA in order to determine the discriminating metabolite biomarkers responsible for the observed clustering patterns.

Results: Of the imputation methods compared in this study, the NIPALS algorithm combined with suitable transformation and scaling performed significantly better according to the model parameter evaluation. Pareto (Par) scaling and log transformation were better able to explain the data. The OPLS-DA model was able to discriminate the CD samples from the controls at different time points after the commencement of treatment. The models were not able to differentiate the CD samples from one another at the different time points during treatment with exclusive enteral nutrition. The metabolites that varied between CD samples and controls included tyrosine, an ornithine isomer, arachidonic acid, eicosatrienoic acid, docosatetraenoic acid, a sphingomyelin, a ceramide, and dimethylsphinganine. Similarly, the OPLS-DA model was able to discriminate the CD samples from the UC samples. The top 10 metabolites ranked by VIP values were used in the OPLS-DA model, and there was a clear separation between CD and UC, with a CV-ANOVA p-value of 5.30541e-007.

Conclusion: The default algorithm of SIMCA-P 14 (NIPALS) was the only imputation method that generated a valid model according to the validity criteria (R2-Q2