
Non-Synonymous Variation of Cases and Controls Biology Essay

Published: 23 March 2015

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

In this study we present an analysis of 288 complete mitochondrial DNA (mtDNA) genome sequences, all from individuals of Japanese origin: Alzheimer's disease (AD) patients and Parkinson's disease (PD) patients (cases), and Japanese centenarians (JC) (controls). We looked for as much evidence as we could in support of the mtDNA mutational load hypothesis and its association with these neurodegenerative diseases, and obtained the following result: the multiple rare variants that appear in the mtDNA sequences of our cases and controls do not constitute a dominant risk factor for the etiology of AD and PD. None of the statistical tests we carried out gave significant results associating these rare variants with an increased susceptibility to either disease. Thus, our data do not indicate that the mutational load hypothesis is a strongly supported scenario that plays a leading role in the pathogenesis of AD and PD.

INTRODUCTION

Neurodegenerative disorders are of growing concern, not only to the scientific community but also to public health. Sporadic late-onset neurodegenerative diseases such as Alzheimer's disease (AD) and Parkinson's disease (PD) are becoming some of the dominant public health issues, particularly in Western populations over the age of 60. Several factors have been implicated in the etiology of these two diseases, such as the environment, behavior and, of course, the genetic background of the patients (Mayeux 2003, Joanna's paper). Despite all this, the picture in this field remains very complex. A great deal of remarkable research has been carried out over the last decade or more to discover the real pathogenic factors of AD and PD, yet one might say they are even more elusive than before. Age is strongly believed to be one of the main risk factors for AD and PD, which links these diseases directly to mitochondrial dysfunction, since declining mitochondrial function contributes significantly to the aging of the organism (Anita Lakatos, ADNI).

At the molecular level, mitochondrial DNA (mtDNA) is inherited asexually (maternally, without recombination) and has a very high mutation rate, and some of its mutations are an important cause of neuromuscular disease. mtDNA polymorphisms might therefore be one of the many interacting genetic factors that lead someone to develop AD or PD. Researchers currently investigate how these mtDNA polymorphisms act on the diseases in two ways. The first is the common disease-common variant hypothesis, which is based on the common polymorphisms that define the various mitochondrial haplogroups. The second is the mutational load hypothesis, which concerns the accumulation of multiple rare variants at various mtDNA loci and how these variants can confer extra susceptibility to the common neurodegenerative diseases.

In this project, our main aim is to shed light on the second hypothesis and to analyze, with the means available to us, whether the existing evidence that these rare variants affect protein function can be strongly supported scientifically.

AIMS

Data acquisition

The aim is to perform a case-control study using published mitochondrial DNA (mtDNA) sequences, to test whether the mtDNA variants of AD and PD cases differ from those of controls. This required obtaining data from databases that contain complete mtDNA sequences. The cases have to be sequences from patients who suffer from the disease under study, and the controls have to be sequences from healthy matched individuals; here age matching was the most important, since advanced age is the major risk factor for AD and PD. We also have to examine the quality of the data, in order to exclude sequences containing suspicious fragments and other errors that could alter the results of our research.

Examination of the non-synonymous variation of cases and controls

The sequences of the cases and controls have to be aligned to the revised Cambridge Reference Sequence (rCRS), which is the standard reference sequence in human mtDNA research (Andrews et al.; MITOMAP). We will sort the variants of the case and control sequences (relative to the rCRS) according to their locus. We will examine in detail those in the coding region and divide them into synonymous and non-synonymous, depending on whether or not they cause an amino acid change. Based on the frequencies of the variants within each case and control group, we will divide them into common and rare. To estimate the likely effect of the non-synonymous variants on protein function we will use appropriate computational programs.

Statistical analysis of the distribution of the variants

We will carry out statistical analyses of the variants in order to decide whether their distribution differs significantly between cases and controls. A variety of statistical tests will be used, depending on the hypothesis being tested and the data we are handling.

Statistical analysis of the variants in association with the dysfunction of proteins

We will carry out a statistical analysis of the prediction scores of the variants (whether each mutation is predicted to be deleterious or not), in order to determine whether the variants are associated with the generation of dysfunctional proteins.

The diseases we study are Alzheimer's disease (AD) and Parkinson's disease (PD).

BACKGROUND

Mitochondria and mtDNA mutations

Mitochondria are double-membrane organelles present in almost all eukaryotic cells. They supply most of the cell's energy, since they carry out aerobic respiration through oxidative phosphorylation (OXPHOS), which results in the synthesis of adenosine triphosphate (ATP) (Tuppen, et al., 2010).

The mammalian OXPHOS system involves five enzyme complexes and over 80 different proteins, of which only 13 are encoded by mtDNA. The remaining OXPHOS proteins are encoded by nuclear genes (Larsson, 2010).

The respiratory chain consists of complexes I-IV, coenzyme Q and cytochrome c, while complex V is the ATP synthase (Larsson, 2010). Electrons derived from fatty acid oxidation and the citric acid cycle are passed to the first four complexes of the respiratory chain (RC), where a series of reduction-oxidation reactions ends in the production of water. The resulting proton accumulation drives complex V (ATP synthase) to phosphorylate ADP into ATP (Tuppen, et al., 2010). During OXPHOS, electrons may leak from the RC (at complexes I or III), forming the reactive oxygen species (ROS) superoxide (Larsson, 2010). Besides OXPHOS, mitochondria regulate the concentration of cytosolic calcium and control programmed cell death. They also play an important role in essential metabolic functions such as the tricarboxylic acid (TCA) cycle and the urea cycle (Tuppen, et al., 2010).

Fig. 2. Schematic representation of the mechanism of the respiratory chain, according to Joseph-Horne et al., 2001.

Having transferred most of their genetic material to the nucleus since their origin as bacterial symbionts, mitochondria retain a double-stranded circular chromosome of 16,569 bp (Tuppen, et al., 2010). The two mtDNA strands are called the heavy (H, guanine-rich) and the light (L, cytosine-rich) strands. The mtDNA contains no introns between its genes, and its only major non-coding region is a 1.1 kbp segment called the displacement loop (D-loop). The D-loop contains the LSP and HSP, the transcription promoters for the L and H strands respectively, and one of the replication origins, that of the H strand (OH) (Larsson, 2010). The mitochondrial genome comprises 37 genes: 13 encode OXPHOS components, 22 encode the tRNAs needed to translate the mRNAs of those 13 genes, and 2 encode ribosomal RNAs (Mancuso, et al., 2009).

The mtDNA is inherited maternally, and a single mammalian cell contains hundreds or thousands of copies. The mutation rate of mtDNA is 10 times higher than that of nuclear DNA (nDNA), because it lacks protective histones (Mancuso, et al., 2009). There is also no recombination of mtDNA in mammals, so the variation arising from single nucleotide polymorphisms (SNPs) is inherited asexually and can be grouped into distinct lineages called haplogroups (Elson 2001 in the AJHG). Within a haplogroup, mtDNA sequences share one or more defining sequence changes, and new polymorphisms arising on top of these changes define sub-haplogroups. A restriction fragment length polymorphism (RFLP) analysis of European populations classified them into nine main haplogroups (H, V, T, J, U, K, I, X, W). Other major continental groupings are the African, Asian and Native American haplogroups, the African ones being the oldest (Elson and Samuels, 2012).

mtDNA is much more vulnerable to mutation (the frequency of SNPs is 1 per every 13 bp), approaching 70 times the frequency of nuclear SNPs (Maruszak, et al., 2006). Besides lacking histones, mtDNA is located near the electron transport chain (ETC), which exposes it to stronger oxidative stress from ROS. mtDNA also replicates more often than nDNA, so more mutations arise from copy errors. These factors promote frequent mtDNA alterations, which most of the time result in neutral polymorphisms. Nevertheless, mtDNA mutations are an important cause of genetic disease: the first pathogenic mtDNA mutations were detected in 1988, and more than 250 others (point mutations or rearrangements) have been identified since then (Tuppen, et al., 2010). To understand the role of pathogenic mtDNA mutations, we must bear in mind that an mtDNA mutation affects cell physiology only if the proportion of mutant mtDNA exceeds a specific threshold level (>60% for single large mtDNA deletions, >90% for point mutations) (Larsson, 2010). The threshold varies with the tissue type, the nature of the mutation, age and some environmental factors (Maruszak, et al., 2006). It is also well known that mutated mtDNA can coexist in the cell with normal mtDNA molecules, a situation called heteroplasmy (Howell, et al., 2005). Heteroplasmy is a key feature of inherited mtDNA disease, not of the sporadic late-onset diseases we study in this paper. mtDNA encodes only a small fraction (13) of the proteins that make up the mosaic respiratory chain (RC) of the mitochondrion. Nevertheless, all of these proteins play an important role in maintaining oxidative phosphorylation (OXPHOS) at physiological levels.
Consequently, the growing number of mtDNA mutations is considered one of the main causes of energy loss in the cell as the organism ages (Maruszak, et al., 2006). RC deficiency has been reported in many types of aging human tissue, e.g. heart, hippocampal neurons, midbrain dopaminergic neurons, skeletal muscle and colon (Larsson, 2010).

Mitochondria play a key role in apoptosis, so any abnormality in their mtDNA can lead to cell death and loss of tissue function (Maruszak, et al., 2006). Besides being an important cause of genetic disease, mtDNA variants have been suggested to affect susceptibility to common neurodegenerative diseases in people who are unaffected by primary mtDNA disease.

Neurodegenerative diseases

As mtDNA abnormalities accumulate, they tend to impair mitochondrial function, and they are thought to be one of the main causes of various phenotypically heterogeneous diseases with different ages of onset. Neurodegenerative diseases are among them. Neurodegeneration is the progressive loss of the structure or function of neurons, ultimately resulting in neuronal death. The most prevalent neurodegenerative disorders are Alzheimer's disease, Parkinson's disease and multiple sclerosis (which shares mechanisms with AD and PD, according to Joanna's paper on multiple sclerosis). The most typical characteristic of these disorders is induced apoptosis of the cell, which clearly links them to mitochondrial dysfunction.

The pathogenesis of AD and PD, two of the most prevalent neurodegenerative diseases, has been clearly associated with oxidative stress. It has been suggested that patients with AD and PD accumulate mtDNA mutations in brain tissue faster than healthy people. In AD and PD, neuronal death is mainly induced by apoptosis, following the disruption of calcium homeostasis, the production of free radicals, the activation of nitric oxide synthase, glutamate-related neurotoxicity and, of course, mitochondrial RC dysfunction (Maruszak, et al., 2006).

Whether mitochondrial dysfunction is a primary or a secondary event in the progression of neurodegeneration in AD and PD is still controversial. So far, OXPHOS damage has mostly been attributed to a deficiency of respiratory chain complex IV in AD and of complex I in PD. The genetic factors associated with these diseases still need to be elucidated, including whether inherited mtDNA mutations, besides somatic ones, could also play a key role in the etiology of neurodegeneration (Maruszak, et al., 2006).

Alzheimer's disease

Alzheimer's disease is the most prevalent late-onset neurodegenerative disorder and is clinically characterized by progressive cognitive decline and emotional disturbances. It is closely associated with synaptic degeneration and neuronal death in limbic structures such as the hippocampus, the amygdala and related areas of the cerebral cortex (Mattson, 2000). It is recognized as a major and growing public health problem (an estimated 35 million people are affected worldwide), owing to the increasing life expectancy of Western populations. About 90% of cases are sporadic and the remaining 10% familial (autosomal). The etiology of AD is very complex, since it involves interrelated genetic, environmental and behavioral factors (Elson and Samuels, 2012). Familial AD cases carry mutations in three genes, APP (amyloid precursor protein), PSEN1 (presenilin 1) and PSEN2 (presenilin 2), whereas the sporadic form of AD is associated with the APOE4 (apolipoprotein E ε4) risk allele (Grazina, et al., 2006).

A definitive diagnosis of AD is made on autopsied brain tissue, in which neuronal death, neurofibrillary tangles (NFT) and senile plaques (SP) are identified. These are regarded as the hallmarks of AD and support the amyloid cascade hypothesis. According to this hypothesis, the main event in AD neurodegeneration is the formation and aggregation of senile plaques containing amyloid beta (Aβ), together with neurofibrillary tangles (NFT) resulting from the hyperphosphorylation of the microtubule-associated protein tau (Mancuso, et al., 2009).

Although many genetic studies link familial AD to the Aβ cascade hypothesis, the role of APOE genetic variants remains an unresolved question. It still needs to be established whether APOE variants up-regulate or down-regulate Aβ production, which in turn damages the brains of sporadic AD (SAD) patients through oxidative stress. A mitochondrial cascade hypothesis has been proposed to link mitochondria with late-onset SAD, in which mitochondrial dysfunction is one of the main pathological events leading to the formation of amyloid plaques and NFT. Inherited polymorphisms of the mtDNA and nDNA genes that encode ETC subunits determine the level of ROS production. During aging, ROS damage mtDNA, so that more somatic mitochondrial mutations accumulate. ETC activity is thereby reduced even further, leading to oxidative stress. On the other hand, Aβ production is initially a consequence of ROS overproduction. Aβ acts as an antioxidant up to a specific threshold of ROS generation, beyond which it becomes a pro-oxidant. Thus, ROS formation and Aβ overproduction cause further ETC impairment (Maruszak, et al., 2006).

Evidence is emerging for the mitochondrial theory, according to which AD patients accumulate more and more mtDNA mutations in the cells of brain tissue. A 63% increase in the frequency of heteroplasmic mtDNA variants in the control region has been reported in AD brains (Maruszak, et al., 2006). In addition, mtSNPs and their haplotypes are considered one of the main sources of increased vulnerability to AD. The effect of inherited rare mtDNA variants on the pathogenesis of the disease is also under study. Both of these hypotheses on the etiology of AD need further investigation before a clear view of the real pathogenic mechanisms can be gained (Tanaka, et al., 2010).

Parkinson's disease

Parkinson's disease (PD) is the second most prevalent neurodegenerative disease after Alzheimer's disease (almost 2% of people over the age of 65 suffer from PD). Familial PD accounts for 10% of PD patients, and the remaining 90% have the sporadic form of the disease. Pathologically, PD involves the loss of large numbers of neurons in many tissues, the main ones being those of the substantia nigra. In both familial and sporadic PD, the hallmark of the disease is the formation of intracellular inclusions called Lewy bodies, whose main component is α-synuclein (SNCA) (Elson and Samuels, 2012). Autosomal recessive PD is associated with mutations in three nuclear genes: PARK2 (encoding parkin, a mitochondria-associated protein), PINK1 (another mitochondrial protein) and DJ-1 (encoding the DJ-1 protein) (Maruszak, et al., 2006).

Impaired complex I activity, together with oxidative and nitrosative stress, is reported in all forms of PD. Complex I comprises 45 subunits, 7 of which are essential polypeptides encoded by mtDNA. Mutations in these genes could impair the mitochondrial respiratory chain and lead to oxidative stress, contributing to the onset of PD (Elson and Samuels, 2012).

There is evidence that, in both the sporadic and the familial forms of PD, inhibition of complex I activity is one of the key factors in the mechanisms that lead to neurodegeneration (Maruszak, et al., 2006). In addition, oxidative and nitrosative stress are considered a significant cause of somatic mtDNA mutations, which increases the probability that mtDNA carries variants relevant to PD pathogenesis. Many studies have reported associations between haplogroups and PD, and others have investigated the frequency of heteroplasmic mtDNA variants in PD. As with AD, the search for the main pathogenic risk factors must continue.

Previous studies

The impact of mtDNA mutations on AD and PD has been investigated in four main ways: 1) cybrid analysis, 2) genetic epidemiological analysis, 3) case-control studies and 4) mitochondrial haplogroup association studies (Howell, et al., 2005). Case-control studies try to identify germline mtDNA mutations, whether common or rare variants. The two basic approaches used to study mtDNA polymorphisms are the following:

a) The common disease-common variant hypothesis, which suggests that a common disease is significantly associated with common polymorphisms carried by many individuals. These polymorphisms can be described as haplogroup-associated, since haplogroups are defined by their common polymorphisms.

b) The common disease-rare variant hypothesis, which suggests that the combined impact of multiple rare variants may be significant for the development of a common disease. This hypothesis is harder to test, because it requires identifying a large number of variants, each carried by a small number of individuals in the population (Elson and Samuels, 2012).

Dataset

We obtained our sequences from the mtSNP database (website: http://mtsnp.tmig.or.jp/mtsnp). The complete mitochondrial genomes of 672 Japanese individuals were sequenced by Tanaka et al. (2004). This set of sequences was used to build a phylogenetic network (Bandelt et al. 1999). The resulting network agreed entirely with previously published networks, both at the global level (Maca-Meyer et al.; Herrnstadt et al.) and at the local level (Kong et al. 2003). After re-sequencing the doubtful fragments, Kong et al. (2008) corrected these sequences, and the corrected sequences are available in the mtSNP database. For our study we used the complete mitochondrial genomes of 96 Japanese AD patients, 96 Japanese PD patients and 96 Japanese centenarians (Shigeru Takasaki). The Japanese centenarians (JC) formed our control (CTRL) group and the AD and PD patients our case (CASE) groups. Although the quality of this dataset was previously criticized, it underwent intensive curation by Kong et al. (2008), as mentioned above; we therefore decided not to check its quality again.

Technical background

This short introduction to the technical side of our project is needed to understand how the platforms we used to process our data work. The main platforms we used to obtain the results for our analysis were MitoTool and SIFT, which are the two topics of this technical background.

MitoTool for the analysis of human mitochondrial variations

For our study we used MitoTool (Long Fan) for a wide range of tasks. Since MitoTool can process different types of mtDNA data without requiring a user login, it was the best tool for our analysis. MitoTool provides four modules: 1) a database module, 2) a haplogroup classification module, 3) a detailed parsing module and 4) a statistical analysis module (Long Fan). Of these, we used the haplogroup classification module and the detailed parsing module. Our main goals in using MitoTool were: a) processing different types of mtDNA data, b) automatically obtaining the variants of each sample relative to the revised Cambridge Reference Sequence (rCRS) (Andrews et al.), c) automatically assigning each sample to a haplogroup, and d) reporting the location of each variant and its amino acid change status in the same report (Long Fan).

First, we uploaded our data as complete mtDNA sequences in FASTA format, one upload per group containing all the sequences of that group (Alzheimer's patients, Parkinson's patients, Japanese centenarians). Each sequence was aligned to the rCRS (Andrews et al.) by the ClustalW software running in the background (Larkin et al.), and the variants of each sequence were then exported from the ClustalW alignment. Each sequence was also assigned to a haplogroup based on haplogroup-specific variation motifs (van Oven and Kayser, 2009), using optimal exact matching and fuzzy (near) matching (Long Fan). The input form and the results page are shown in figure….
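The variant export step can be illustrated with a minimal Python sketch. It is not MitoTool's code: it assumes the pairwise alignment of one sample against the rCRS is already available as two gapped strings, and the function name `call_variants` and the output notation (loosely modelled on PhyloTree) are our own choices. Positions are counted on the reference row only, so gaps introduced by insertions do not shift the rCRS numbering.

```python
def call_variants(ref_aln: str, qry_aln: str) -> list:
    """Report differences between a query and the reference in reference (rCRS) coordinates.

    ref_aln and qry_aln are the two rows of one pairwise alignment, with '-' for gaps.
    Notation (hypothetical, loosely PhyloTree-like): 'G3T' substitution,
    '2d' deletion, '4.1C' first base inserted after reference position 4.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have equal length")
    variants = []
    ref_pos = 0    # last reference position consumed so far
    ins_index = 0  # bases inserted since that reference position
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-":
            if q != "-":  # base present only in the query: insertion
                ins_index += 1
                variants.append(f"{ref_pos}.{ins_index}{q}")
            continue
        ref_pos += 1
        ins_index = 0
        if q == "-":  # reference base missing from the query: deletion
            variants.append(f"{ref_pos}d")
        elif q != r and "N" not in (r, q):
            # Substitution; ambiguous N calls are skipped (the rCRS itself keeps
            # an N placeholder at position 3107).
            variants.append(f"{r}{ref_pos}{q}")
    return variants


print(call_variants("ACGT-A", "ACTTCA"))  # ['G3T', '4.1C']
print(call_variants("ACGT", "A-GT"))      # ['2d']
```

Counting on the reference row is what keeps variant positions comparable across all 288 samples, whatever insertions each one carries.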

After extracting the variants of each sample in each group, we used the detailed parsing module to sort the variants by location (control region, non-coding region, protein-coding region, and tRNA and rRNA coding regions). The input form and the report page are shown in figure….
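Sorting by location amounts to looking each rCRS position up in a gene annotation. The Python sketch below is our own illustration, not the detailed parsing module itself. Its coordinates follow the MITOMAP annotation of the rCRS (1-based, inclusive) and should be checked against MITOMAP before reuse. To keep it short, it lists only the control region, the two rRNA genes and the 13 protein-coding genes, and it lumps the tRNA genes and short spacers together.

```python
# 1-based, inclusive rCRS coordinates following the MITOMAP annotation (verify before reuse).
# tRNA genes and short intergenic spacers are deliberately not listed in this sketch.
CONTROL_REGION = (16024, 576)  # wraps around the circular genome (16569 -> 1)
RRNA_GENES = {"MT-RNR1": (648, 1601), "MT-RNR2": (1671, 3229)}
PROTEIN_GENES = {
    "MT-ND1": (3307, 4262), "MT-ND2": (4470, 5511),
    "MT-CO1": (5904, 7445), "MT-CO2": (7586, 8269),
    "MT-ATP8": (8366, 8572), "MT-ATP6": (8527, 9207),
    "MT-CO3": (9207, 9990), "MT-ND3": (10059, 10404),
    "MT-ND4L": (10470, 10766), "MT-ND4": (10760, 12137),
    "MT-ND5": (12337, 14148), "MT-ND6": (14149, 14673),
    "MT-CYB": (14747, 15887),
}


def genes_at(pos: int) -> list:
    """Listed rRNA/protein genes covering pos (overlaps such as ATP8/ATP6 give two names)."""
    return [name for table in (RRNA_GENES, PROTEIN_GENES)
            for name, (start, end) in table.items() if start <= pos <= end]


def classify_locus(pos: int) -> str:
    """Coarse locus class for one rCRS position."""
    if not 1 <= pos <= 16569:
        raise ValueError("position outside the 16,569 bp rCRS")
    if pos >= CONTROL_REGION[0] or pos <= CONTROL_REGION[1]:
        return "control region"
    hits = genes_at(pos)
    if any(name in RRNA_GENES for name in hits):
        return "rRNA"
    if hits:
        return "protein coding"
    return "tRNA or other non-coding"  # anything not covered by the tables above


for p in (73, 3010, 4833, 3250):
    print(p, classify_locus(p))
```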

Keeping only the coding-region variants, we entered them again, in FASTA-like format, into the input form of the detailed parsing module and selected the coding effect option. This produced a report of the amino acid change status of each variant (synonymous or non-synonymous), which MitoTool bases on the mitochondrial genomes of 43 primate species (Long Fan). The input form and the report page are shown in figure….
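The synonymous/non-synonymous decision can be sketched in a few lines of Python. This illustrates the principle and is not MitoTool's implementation. We assume the gene's coding sequence is supplied 5'→3' on its coding strand (MT-ND6, which is encoded on the L strand, would first need to be reverse-complemented) and that the variant position is given relative to the start of that coding sequence. The important detail is that human mtDNA is translated with the vertebrate mitochondrial code (NCBI translation table 2), not the standard code: TGA codes for Trp, ATA for Met, and AGA/AGG are stops.

```python
BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial code), codons ordered TTT, TTC, TTA, ...
MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: MITO_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def coding_effect(cds: str, cds_pos: int, alt: str) -> tuple:
    """Return (ref_aa, alt_aa, effect) for a substitution at 1-based position cds_pos of cds."""
    cds = cds.upper()
    start = (cds_pos - 1) // 3 * 3  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        # e.g. incomplete stop codons completed only by polyadenylation
        raise ValueError("position falls in an incomplete codon")
    offset = (cds_pos - 1) % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    effect = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, effect


print(coding_effect("ATGGCCTAA", 4, "A"))  # ('A', 'T', 'non-synonymous')
print(coding_effect("ATGTGG", 6, "A"))     # ('W', 'W', 'synonymous')
```

The second call shows why the choice of code matters. Under the standard code, TGG→TGA would be a nonsense (stop) change, but in mtDNA it is synonymous.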

SIFT for prediction of the amino acid changes that affect protein function

The next computational platform we used was SIFT (Sorting Intolerant From Tolerant) (Pauline C. Ng). To find out which of our variants might be involved in the diseases under study (AD, PD), SIFT was our first choice for identifying which of our non-synonymous SNPs are predicted to be deleterious to protein function. This allowed us to go on to study whether, and how, such deleterious substitutions could alter the phenotype (Pauline C. Ng).

The SIFT algorithm makes its predictions from sequence alone (Pauline C. Ng), yet performs similarly to structure-based tools (3,6-8 SIFT paper). Because a protein structure is not required, more substitutions can be predicted. Moreover, as more genomes are sequenced, more protein sequences become available, so SIFT will be able to predict more and more substitutions.

To predict the effect of an amino acid substitution, SIFT considers the position of the substitution and the type of amino acid change. SIFT assumes that important amino acids are conserved, so changes at well-conserved positions are predicted to be deleterious most of the time.

When a protein sequence is submitted to SIFT, it finds closely related protein sequences and builds an alignment of the query with them. Based on the amino acids observed at each position of the alignment, it then calculates the probability that an amino acid is tolerated at that position, scaled by the probability of the most frequent amino acid at that position. A substitution is predicted to be deleterious if this scaled probability is less than a threshold of 0.05 (2 from SIFT paper).
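The scoring idea can be sketched for a single alignment column. This Python sketch is a deliberate simplification and not SIFT's actual algorithm. It uses raw residue counts and ignores gaps, whereas real SIFT adds sequence weighting and pseudocounts, so residues absent from the column do not automatically score 0. The function names and the toy column are hypothetical. Each residue's frequency is divided by the frequency of the most common residue, and scores below 0.05 are called deleterious.

```python
from collections import Counter


def scaled_probabilities(column: list) -> dict:
    """Frequency of each residue in one alignment column, divided by the frequency
    of the most common residue (which therefore scores 1.0). Gaps are ignored."""
    counts = Counter(aa for aa in column if aa != "-")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains only gaps")
    top = max(counts.values()) / total
    return {aa: (n / total) / top for aa, n in counts.items()}


def predict(column: list, alt_aa: str, threshold: float = 0.05) -> tuple:
    """Return (score, call) for substituting alt_aa at this position."""
    score = scaled_probabilities(column).get(alt_aa, 0.0)
    return score, ("deleterious" if score < threshold else "tolerated")


column = ["L"] * 18 + ["I"] * 2  # toy column: 18 homologues carry Leu, 2 carry Ile
print(predict(column, "I"))
print(predict(column, "P"))      # (0.0, 'deleterious')
```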

SIFT is available at sift.bii.a-star.edu.sg, which links to the tools a user might need. We used the single-protein tools, specifically SIFT BLink. This tool returned SIFT predictions for our SNPs after we supplied the GI numbers of our protein sequences and the amino acid substitutions for each of those proteins. Examples of the SIFT BLink input form and results report are shown in figure….

METHODS

Identification and classification of mitochondrial SNPs for the cases and controls

In this case-control study we analyzed three datasets. Our cases were 96 samples from Japanese AD patients (the AD group) and 96 samples from Japanese PD patients (the PD group). Our healthy age-matched controls were 96 samples from Japanese centenarians (the JC group). We submitted each sequence to MitoTool (Long Fan) for whole-mtDNA-genome analysis. In this way, we compared the mtDNA sequences of cases and controls with the rCRS (Andrews et al.) and obtained a list of all the variants in the mtDNA samples of each group. We then classified the variants of each group as common or rare according to their frequency: common variants were SNPs that appeared in at least 30 samples (31.25% of 96), and rare variants SNPs that appeared in at most 3 samples (3.125% of 96). Using the detailed parsing module of MitoTool, we determined the locus of each common and rare variant. For the variants in the coding region of mtDNA, we used the detailed parsing module again to divide them into synonymous (silent) and non-synonymous (non-silent) SNPs. Keeping only the non-synonymous variants, both common and rare, of cases and controls, we submitted them to SIFT (Pauline C. Ng) to find which of them cause tolerated or intolerant amino acid substitutions. After obtaining the SIFT scores for all non-synonymous variants we proceeded with the statistical analysis described below.
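The common/rare split can be written down precisely as follows. In this Python sketch, which is our own illustration, each sample is assumed to be represented as the set of variants it carries relative to the rCRS, so a variant counts at most once per sample. The defaults reproduce the thresholds above for 96 samples. Variants falling between the two thresholds are labelled "intermediate" and take no part in the common-versus-rare comparisons. The variant names in the example are hypothetical.

```python
from collections import Counter


def classify_by_frequency(samples: dict, common_min: int = 30, rare_max: int = 3) -> dict:
    """Label each variant by the number of samples that carry it.

    samples maps a sample ID to the variants it carries relative to the rCRS.
    Defaults mirror the thresholds for 96 samples: >= 30 carriers (31.25%)
    is common, <= 3 carriers (3.125%) is rare.
    """
    carriers = Counter(v for variants in samples.values() for v in set(variants))
    labels = {}
    for variant, n in carriers.items():
        if n >= common_min:
            labels[variant] = "common"
        elif n <= rare_max:
            labels[variant] = "rare"
        else:
            labels[variant] = "intermediate"  # excluded from the common/rare analyses
    return labels


# Hypothetical group of 96 samples with three made-up variants.
samples = {f"s{i}": set() for i in range(96)}
for i in range(40):
    samples[f"s{i}"].add("v_common")
for i in range(2):
    samples[f"s{i}"].add("v_rare")
for i in range(10):
    samples[f"s{i}"].add("v_mid")
print(classify_by_frequency(samples))
```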

Haplogroup classification of the samples

As mutations accumulated over time, mtDNA variation grew, forming clusters called haplogroups, each characterized by a particular set of mutations (variants). The phylogenetic tree at www.phylotree.org shows how these haplogroups are hierarchically defined by those mutations. Using the haplogroup classification module of MitoTool, each sample was compared with the rCRS and assigned a haplogroup through the haplogroup variation motifs (van Oven and Kayser). After calculating the frequency of each haplogroup in each group, we proceeded with the statistical analysis described below.
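The motif matching used for classification can be illustrated with a simplified Python sketch. MitoTool's own scoring is described in the text only as optimal exact matching and fuzzy (near) matching, so the score used here is our own assumption: the fraction of each motif present in the sample, with ties broken in favour of the longer, more specific motif. The haplogroup names and motifs are invented for illustration and are not taken from PhyloTree. The sketch also ignores the tree hierarchy and back-mutations.

```python
def best_haplogroup(sample_variants, motifs: dict) -> tuple:
    """Return (haplogroup, fraction of its motif found in the sample).

    Exact matches score 1.0; partial (fuzzy) matches score below 1.0.
    Ties are broken in favour of the longer, more specific motif.
    """
    sample = set(sample_variants)

    def score(item):
        _, motif = item
        return (len(motif & sample) / len(motif), len(motif))

    name, motif = max(motifs.items(), key=score)
    return name, len(motif & sample) / len(motif)


# Invented motifs: "HG1a" is a hypothetical sub-haplogroup nested inside "HG1".
MOTIFS = {"HG1": {"101G", "202T"}, "HG1a": {"101G", "202T", "303C"}}
print(best_haplogroup({"101G", "202T", "303C", "999A"}, MOTIFS))
print(best_haplogroup({"101G", "303C"}, MOTIFS))
```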

Statistical analyses

We used Pearson's chi-squared test (χ²) to assess the statistical significance of differences in mitochondrial haplogroup frequencies between our cases (AD, PD) and controls (JC).
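For reference, the test statistic is the sum over all cells of (observed − expected)² / expected, where the expected count of a cell is (row total × column total) / N, with (rows − 1)(columns − 1) degrees of freedom. Our analysis was run in R. The Python sketch below only shows the arithmetic, applied to a hypothetical table with groups as rows and haplogroups as columns. The p-value uses the closed-form chi-squared upper tail for integer degrees of freedom. Unlike R's `chisq.test`, which applies Yates' continuity correction to 2×2 tables by default, this sketch applies no correction. The approximation is also unreliable when expected counts fall below about 5, which is common for rare haplogroups.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-squared variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{(df-1)/2} y^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df + 1) // 2):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def pearson_chi2(table: list) -> tuple:
    """Pearson chi-squared test of independence (no continuity correction).

    table: rows = groups, columns = categories; no row or column may sum to zero.
    Returns (statistic, degrees of freedom, p-value).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: two groups x two haplogroups.
stat, df, p = pearson_chi2([[10, 20], [20, 10]])
print(round(stat, 3), df)  # 6.667 1
print(p)
```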

We grouped our common and rare substitutions by mtDNA locus (D-loop, tRNA, rRNA, coding region), and further divided those in the coding region into synonymous and non-synonymous. We used Fisher's exact test to assess differences in the frequencies of these categorical data (common variants, rare variants) between our cases (AD, PD) and controls (JC). Statistical significance was assessed with a two-tailed test at α = 0.05. Had any result been significant, we would have applied a correction for multiple testing.
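Fisher's exact test on a 2×2 table, for example rows AD and JC and columns of rare non-synonymous and rare synonymous variant counts, can be computed directly from the hypergeometric distribution. This Python sketch is for illustration only, since our analysis used R. Following the convention of R's `fisher.test`, the two-tailed p-value is the sum of the probabilities of all tables with the same margins that are no more likely than the observed table, with a small relative tolerance for floating-point ties.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum every table at least as extreme (no more probable) as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
```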

After obtaining the SIFT scores for the non-silent substitutions (both common and rare), we conducted a two-way between-subjects analysis of variance (ANOVA) with two independent variables (factors), 1) group (AD, PD, JC) and 2) variant type (rare or common), and the SIFT score as the dependent variable.
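The partition of variance behind this design can be shown for the simplest case. The Python sketch assumes a complete, balanced design with the same number of SIFT scores in every group × variant-type cell. This is an assumption the real data do not meet, because each cell holds a different number of non-synonymous variants. For unbalanced data a regression-based ANOVA is needed instead; R's `aov`, for example, then reports sequential sums of squares, whose values depend on the order of the factors. The sketch reports sums of squares, degrees of freedom and F ratios. P-values would come from the F distribution, e.g. R's `pf`. The example scores are hypothetical.

```python
from statistics import fmean


def two_way_anova(cells: dict) -> dict:
    """Balanced two-way between-subjects ANOVA.

    cells maps (group, variant_type) -> list of scores; every cell must hold the
    same number n >= 2 of observations, and within-cell variance must be non-zero.
    Returns {effect: (sum of squares, degrees of freedom, F ratio)}.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()) or len(cells) != len(a_levels) * len(b_levels):
        raise ValueError("this sketch needs a complete, balanced design")
    grand = fmean(x for v in cells.values() for x in v)
    mean_a = {a: fmean(x for (ai, _), v in cells.items() if ai == a for x in v) for a in a_levels}
    mean_b = {b: fmean(x for (_, bi), v in cells.items() if bi == b for x in v) for b in b_levels}
    ss_a = len(b_levels) * n * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = len(a_levels) * n * sum((m - grand) ** 2 for m in mean_b.values())
    ss_cells = n * sum((fmean(v) - grand) ** 2 for v in cells.values())
    ss_ab = ss_cells - ss_a - ss_b  # interaction = between-cell variation not explained by main effects
    ss_w = sum((x - fmean(v)) ** 2 for v in cells.values() for x in v)  # within-cell (residual)
    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab, df_w = df_a * df_b, len(cells) * (n - 1)
    ms_w = ss_w / df_w
    return {
        "group": (ss_a, df_a, (ss_a / df_a) / ms_w),
        "variant_type": (ss_b, df_b, (ss_b / df_b) / ms_w),
        "interaction": (ss_ab, df_ab, (ss_ab / df_ab) / ms_w),
        "residual": (ss_w, df_w, float("nan")),
    }


# Hypothetical SIFT scores, two per cell.
example_cells = {
    ("AD", "common"): [0.2, 0.4], ("AD", "rare"): [0.1, 0.3],
    ("PD", "common"): [0.5, 0.7], ("PD", "rare"): [0.2, 0.6],
    ("JC", "common"): [0.3, 0.5], ("JC", "rare"): [0.4, 0.2],
}
for effect, (ss, df, f) in two_way_anova(example_cells).items():
    print(effect, round(ss, 4), df, f)
```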

To refine our analysis further, we extracted the unique common and rare variants of each group. First, we used Venn diagrams to obtain the number of unique variants in each group. Using these numbers as a guide, we then applied set differences to obtain the vector of variants specific to each of the AD, PD and JC groups.
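The set-difference step can be expressed compactly: a group's specific variants are its own variants minus the union of the other groups' variants, the same operation as chained `setdiff()` calls in R. This Python sketch uses hypothetical variant labels.

```python
def group_specific(variants_by_group: dict) -> dict:
    """Variants found in one group and in none of the others (own set minus union of the rest)."""
    specific = {}
    for group, variants in variants_by_group.items():
        others = set().union(*(v for g, v in variants_by_group.items() if g != group))
        specific[group] = set(variants) - others
    return specific


# Hypothetical rare-variant sets for the three groups.
groups = {"AD": {"a", "b", "c"}, "PD": {"b", "d"}, "JC": {"c", "e"}}
print(group_specific(groups))  # {'AD': {'a'}, 'PD': {'d'}, 'JC': {'e'}}
```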

In addition, we grouped the unique rare variants by respiratory chain complex: 1) complex I: seven mtDNA genes (ND), 2) complex III: the cytochrome b gene (CYB), 3) complex IV: three cytochrome c oxidase genes (COX), 4) complex V: two ATPase genes (ATP) (Joanna Elson). To assess the statistical significance of the distribution of the variants across the complexes, we used Pearson's chi-squared test (χ²).
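The grouping by complex is a simple lookup from gene to complex, following the four categories listed above. In this Python sketch the input is a hypothetical mapping from each unique rare variant to the gene it falls in. Computed per group, the resulting counts would form the groups × complexes table for the chi-squared test described in the text.

```python
from collections import Counter

# Gene-to-complex mapping for the 13 mtDNA protein-coding genes, as listed in the text.
COMPLEX_OF_GENE = {
    **{g: "Complex I" for g in ("ND1", "ND2", "ND3", "ND4", "ND4L", "ND5", "ND6")},
    "CYB": "Complex III",
    **{g: "Complex IV" for g in ("COX1", "COX2", "COX3")},
    **{g: "Complex V" for g in ("ATP6", "ATP8")},
}


def count_by_complex(variant_genes: dict) -> Counter:
    """variant_genes maps each unique rare variant to the gene it falls in."""
    return Counter(COMPLEX_OF_GENE[gene] for gene in variant_genes.values())


# Hypothetical unique rare variants of one group.
example = {"v1": "ND1", "v2": "ND5", "v3": "CYB", "v4": "COX2", "v5": "ATP6"}
print(dict(count_by_complex(example)))
# {'Complex I': 2, 'Complex III': 1, 'Complex IV': 1, 'Complex V': 1}
```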

Statistical packages

For the analyses we used the statistical package R: R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL: http://www.R-project.org/.

results

Association analysis between cases and controls

In this study, after submitting all 96 sample sequences of each group to MitoTool and aligning them to the rCRS, we obtained the following results: 1) 550 different SNPs were detected in the 96 AD cases, 2) 585 different SNPs in the 96 PD cases and 3) 532 different SNPs in the 96 Japanese centenarians. We then applied our threshold values (>31.25% for common variants, <3.125% for rare variants) so as to keep only the common and rare mutations. We thus found: 1) 34 common and 381 rare mutations in the AD cases, 2) 33 common and 433 rare mutations in the PD cases and 3) 33 common and 377 rare mutations in the JC controls.
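The frequency filter can be sketched as follows, assuming that a variant's frequency is the number of carrier samples divided by the 96 samples in a group. With 96 samples the thresholds correspond to more than 30 carriers (common) and fewer than 3 carriers (rare). This sketch takes both inequalities literally as written. If the intended rare cut-off was "at most 3.125%" (up to 3 carriers), `<` should become `<=`. The function name and the "intermediate" label for variants between the two thresholds are illustrative choices. Fractions are used so that boundary cases such as 30/96 = 31.25% are compared exactly rather than as rounded floats.

```python
from fractions import Fraction

COMMON_MIN = Fraction(3125, 10000)   # common: frequency > 31.25%
RARE_MAX = Fraction(3125, 100000)    # rare: frequency < 3.125%


def classify_variant(carriers, n_samples=96):
    """Label a variant by its carrier frequency (hypothetical helper)."""
    freq = Fraction(carriers, n_samples)
    if freq > COMMON_MIN:
        return "common"
    if freq < RARE_MAX:
        return "rare"
    return "intermediate"  # excluded from the common/rare analyses
```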

To depict the frequencies of all types of variants in cases and controls, we produced three histograms, one each for the AD patients, the PD patients and the Japanese centenarians. Figures 3, 4 and 5 show that the frequencies of the rare variants in the AD and PD cases are slightly higher than in the Japanese centenarians. We consider this entirely expected, since there are many different rare mutations in both cases and controls: it would take a long time for a rare mutation to become common through natural selection, and only if it were advantageous. On the other hand, we want to identify as many pathogenic rare mutations as possible, and the high mutation rate of mtDNA makes this task even more daunting. A simple histogram of the mutations, however useful as initial guidance, is not enough to support our hypothesis. We therefore followed a more detailed, step-by-step procedure to decide whether or not to accept it. Below we describe the results of each step and how we adapted the analysis to its needs.

Fig.3. Frequencies of 550 different SNPs in 96 AD patients (the bin size that was used is 3).

Fig. 4. Frequencies of 585 different SNPs in 96 PD patients (the bin size that was used is 3).

Fig. 5. Frequencies of 532 different SNPs in 96 Japanese centenarians (the bin size that was used is 3).

Case control differences in mtDNA haplogroup distributions

The primary aim of our project was to examine the mtDNA mutational load hypothesis (multiple rare mtDNA mutations as a risk factor), but since we were conducting a case-control analysis, we also decided to look at the traditional pattern of mtDNA haplogroup association with the diseases in our study (Alzheimer's, Parkinson's) (Herrnstadt and Howell 2004)(Joanna paper 253). The haplogroup distributions of the mtDNA of AD and PD patients (cases) and Japanese centenarians (controls) are shown in Table 1, and the overall distribution of mtDNA haplogroups in cases and controls is shown in figure 6. All three barplots (AD, PD and JC) make clear that haplogroup D4 predominates over all others, in the controls as well as in the cases, and the overall distribution of haplogroups looks quite similar across the three groups. Pearson's chi-squared test likewise revealed no statistically significant association between disease status and mtDNA haplogroup (χ² = 8.956, df = 14, p-value = 0.834). With these results in mind, we continued with our subsequent analyses, which focus on the hypothesis that multiple rare mutations have a cumulative effect as a risk factor in neurodegenerative diseases (here AD and PD).

Fig. 6 Overall distribution of mtDNA haplogroups in our cases (AD, PD) and controls (JC). Each haplogroup is represented by a different colour, as shown in the key.
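As a sanity check on a reported result, the p-value of a chi-squared test can be recomputed from the statistic and its degrees of freedom alone. The sketch below does this without external libraries. It uses the recurrence for the regularized upper incomplete gamma function, since the chi-squared upper tail with df degrees of freedom at x equals Q(df/2, x/2). In R the same value would come from `pchisq(8.956, 14, lower.tail = FALSE)`. The function name is our own.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    y = x / 2.0
    # Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1), started from
    # Q(1, y) = exp(-y) for even df, or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    while s < df / 2.0:
        q += y ** s * math.exp(-y) / math.gamma(s + 1.0)
        s += 1.0
    return q


print(round(chi2_sf(8.956, 14), 3))  # haplogroups vs disease status
```

This gives about 0.834, in line with the reported p-value. The same function gives about 0.222 for χ² = 8.221 with df = 6, the respiratory chain complex test reported later.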

Analysis of mutational load adjusted to mtDNA locus distribution

We wanted to check whether our cases carry more rare variants than the controls. We grouped the variants according to their mtDNA locus (D-loop, tRNA, rRNA, coding region) and further divided those in the coding region into synonymous and non-synonymous, depending on whether the amino acid changed. Building a two-by-two table (rare/common × case/control) for each locus, we assessed whether rare variants were more frequent in our cases (AD, PD) than in controls (JC). The results of the 10 Fisher's exact tests are shown in Table 2. None of the comparisons is statistically significant: every p-value for both AD and PD is greater than α = 0.05. In most of our tables the cases show a relative excess of rare mutations compared with the controls, but the tests do not allow us to reject the null hypothesis. We therefore have no evidence that the load of rare mutations, adjusted for mtDNA locus, differs between cases and controls.

Table 2 Distribution of variants according to mtDNA locus

       Variants          AD    JC    p-value(a)   PD    JC    p-value(a)
I      D-loop rare       88    104   0.634        106   104   1.000
       D-loop common     10    9                  9     9
II     tRNA rare         27    18    1.000        18    18    1.000
       tRNA common       0     0                  0     0
III    rRNA rare         28    20    0.713        30    20    0.706
       rRNA common       4     4                  4     4
IV     SYN rare          62    68    1.000        82    68    0.784
       SYN common        7     7                  7     7
V      NON-SYN rare      174   159   0.836        194   159   0.676
       NON-SYN common    12    12                 12    12

(a) Fisher's exact test was used to assess the difference in distribution for each locus.

Analysis of pathogenicity of mutations in cases and controls

Continuing with the same approach, we kept both the common and the rare variants of the coding region for our cases and controls. After checking the amino acid change status, we kept only the non-silent variants and obtained their SIFT scores, which distinguish tolerated from not tolerated substitutions. Candidate pathogenic mutations were those with a SIFT score below the threshold of 0.05. To associate the SIFT scores of putative pathogenicity with our rare and common variants in cases and controls, we conducted an analysis of variance (ANOVA). Our hypothesis was that rare variants have a lower mean SIFT score than common ones, and in particular that the rare variants of the cases score lower than those of the controls. The ANOVA revealed: 1) no statistically significant difference between the SIFT scores of the variants in the cases and the controls (p-value = 0.797), 2) no statistically significant difference in SIFT scores between rare and common variants (p-value = 0.192), and 3) no statistically significant difference when the factors Group (AD, PD, JC) and Type of variant (common or rare) are combined, i.e. no significant interaction (p-value = 0.693). The boxplot in figure 7 illustrates the means and the variance for each subgroup (AD.common, JC.common, PD.common, AD.rare, JC.rare, PD.rare). The SIFT scores of the common variants in all groups are slightly higher ("more tolerated", so to speak) than those of the rare variants. Moreover, the AD.common SIFT scores are slightly "more deleterious" than the common ones of JC and PD, since the mean SIFT score of AD.common is below 0.3 whereas the mean SIFT scores of PD.common and JC.common are above 0.4.
On the other hand, among the rare variants, JC.rare appears slightly "more deleterious" (mean SIFT score < 0.1), followed by PD.rare (mean SIFT score ≈ 0.1) and finally AD.rare (mean SIFT score > 0.1). In addition, the strip chart of the SIFT scores shows that PD.rare has more scores at 0.00 than all the others, followed by JC.rare and finally AD.rare. All of the above statements about slightly "more deleterious" subgroups assume that the lower the SIFT score of a variant, the less tolerated it is. We do, of course, bear in mind that a variant is classed as not tolerated only when its SIFT score is below 0.05 (SIFT paper ref).

Fig. 7 Mean SIFT scores and variance according to Group (AD, PD, JC) and Type of variant (common, rare).
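The subgroup summaries behind a plot like figure 7 can be sketched as follows: scores are grouped under labels such as `AD.rare`, then the mean and the number of scores below the 0.05 "not tolerated" cut-off are computed for each subgroup. This is an illustrative sketch. The function name and the (group, type, score) record layout are assumptions, and it does not reproduce the boxplot itself.

```python
from collections import defaultdict
from statistics import mean

SIFT_THRESHOLD = 0.05  # scores below this are predicted "not tolerated"


def summarise_sift(records):
    """records: iterable of (group, variant_type, sift_score) tuples."""
    by_subgroup = defaultdict(list)
    for group, vtype, score in records:
        by_subgroup[f"{group}.{vtype}"].append(score)  # e.g. "AD.rare"
    summary = {}
    for name, scores in sorted(by_subgroup.items()):
        summary[name] = {
            "n": len(scores),
            "mean": mean(scores),
            # A score of exactly 0.05 counts as tolerated.
            "not_tolerated": sum(1 for s in scores if s < SIFT_THRESHOLD),
        }
    return summary
```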

Detection of unique rare and common mutations in cases and controls

Taking into account the results of the above ANOVA of the SIFT scores, we tried to detect whether there are unique common and unique rare variants in each of our case groups and in the controls. Once more, we kept the non-synonymous mutations of each group and created six sets of variants: ADcommon, PDcommon and JCcommon, and ADrare, PDrare and JCrare. First we produced a Venn diagram for the common variants, shown in figure 8. The three subgroups (ADcommon, PDcommon, JCcommon) share all of their common variants, which are 8860, 8701, 8414, 14766, 15326, 5178A and 10398. We concluded that these common variants are unlikely to be deleterious or associated with AD and PD, since they are shared by all groups in our study. We continued with the subgroups ADrare, PDrare and JCrare, whose Venn diagram is shown in figure 9. Only 9 variants are shared by all three groups. The ADrare group has 35 unique variants, 10 variants that it shares with PDrare and 9 variants that it shares with JCrare. The PDrare group has 50 unique rare variants, 13 variants that it shares with JCrare and 10 variants that it shares with ADrare. The JCrare group has 38 unique rare variants, 9 variants that it shares with ADrare and 13 variants that it shares with PDrare. As we were only interested in the unique rare variants of each group, we used the set difference operation to extract them for further analysis. Tables 4, 5 and 6 present all the unique rare variants of the AD, PD and JC groups respectively, with the results we obtained after submitting each unique rare variant to MITOMAP (ref for MITOMAP).

Fig. 8. Venn diagram between the sets ADcommon, PDcommon and JCcommon (each set contains the non-synonymous common variants of its group)

Analysis of pathogenicity of the unique rare mutations in cases and controls

Following the extraction of the unique rare variants of the AD, PD and JC sets, we applied the same method that we had used for the full set of variants: an analysis of variance (ANOVA) of the SIFT scores, this time with Group as the only factor, since there is only one type of variant (rare). We assumed that, since the variants are unique to each group, they might be slightly more deleterious and perhaps closely associated with disease status. However, the ANOVA revealed no statistically significant difference between the SIFT scores of the unique rare variants in the cases and controls (p-value = 0.309). The boxplot in figure 10 illustrates the mean SIFT scores and the variance for each group. Contrary to our expectation, the mean score of the unique rare variants of JC is the lowest of all, surprisingly close to 0.00. This suggests that most of the unique rare variants in the Japanese centenarians are predicted to be deleterious (not tolerated). The mean SIFT scores of the AD and PD variants, on the other hand, are almost the same, with the AD ones slightly lower than the PD ones (both > 0.1).

Fig. 10. Mean SIFT scores and variance according to Group (AD, PD, JC) (each group contains only its unique rare variants, extracted with the set difference operation after producing the Venn diagram of figure 9)
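With Group as the only factor, the test reduces to a one-way ANOVA, which, unlike the two-way case, needs no balanced design. The sketch below computes the F statistic and its degrees of freedom from between-group and within-group sums of squares. The function name and the dictionary layout are assumptions; in practice R's `aov` (as used in the study) or SciPy's `scipy.stats.f_oneway` would also return the p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups maps a group label (e.g. "AD") to its list of SIFT scores;
    group sizes may differ.
    """
    all_scores = [x for scores in groups.values() for x in scores]
    grand = mean(all_scores)
    group_means = {name: mean(scores) for name, scores in groups.items()}
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(s) * (group_means[g] - grand) ** 2 for g, s in groups.items())
    ss_within = sum((x - group_means[g]) ** 2 for g, s in groups.items() for x in s)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```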

Analysis of mutational load adjusted to respiratory chain complexes

To make our analysis more specific, we further grouped the unique rare variants of each group according to the complexes of the mitochondrial respiratory chain (RC). We based this hypothesis on existing evidence that mitochondrial dysfunction is observed in both AD and PD, but in different complexes. In AD, reduced mitochondrial enzyme activity has been observed in complex IV (Ryan D. Readnower). In PD, there is evidence implicating reduced complex I function of the RC in the pathogenesis of the disease (Joanne Clark). Table 3 presents the distribution of the unique rare variants of each group according to the complex they belong to. Pearson's chi-squared test showed no statistically significant association between the RC complexes and disease status (χ² = 8.221, df = 6, p-value = 0.222).

Table 3 Distribution of the unique rare variants of AD, PD and JC groups according to the RC complex

         Respiratory chain complexes
Groups   complex I     complex III   complex IV
AD       21 (34.43%)   6 (27.27%)    4 (19.05%)
PD       27 (44.26%)   8 (36.36%)    6 (28.57%)
JC       13 (21.31%)   8 (36.36%)    11 (52.38%)
Total    61 (100%)     22 (100%)     21 (100%)

Figure 11 illustrates the distribution of the unique rare variants for each group according to the mitochondrial complexes. For the PD cases, most of the variants are found in complex I, as we expected. For the AD cases, most of the variants are also found in complex I, although we expected them in complex IV. For the JC controls, the variants are more or less evenly distributed across the complexes, with a slight excess in complex I and complex III (a random distribution, as we expected).

Fig. 11. Distribution of unique rare variants in AD, PD and JC according to the mitochondrial RC complexes
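The percentages in Table 3 are column percentages: each group's share of all unique rare variants falling in a given complex. The short sketch below recomputes them from the counts, which is a simple way to catch rounding slips. For example, 6 of 22 variants in complex III gives 27.27% for AD. The dictionary layout and function name are illustrative.

```python
# Counts of unique rare variants per RC complex, copied from Table 3.
counts = {
    "complex I": {"AD": 21, "PD": 27, "JC": 13},
    "complex III": {"AD": 6, "PD": 8, "JC": 8},
    "complex IV": {"AD": 4, "PD": 6, "JC": 11},
}


def column_percentages(table):
    """Express each group's count as a percentage of its complex (column) total."""
    result = {}
    for complex_name, by_group in table.items():
        total = sum(by_group.values())
        result[complex_name] = {g: round(100 * c / total, 2) for g, c in by_group.items()}
    return result


print(column_percentages(counts)["complex III"])
```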

discussion

As we described briefly in our introduction and background, it has been suggested that mitochondria play a pivotal role in the etiology of the two neurodegenerative diseases in this project: Alzheimer's disease and Parkinson's disease. More specifically, in this project we addressed the hypothesis that the mutational load of multiple rare mitochondrial DNA mutations might be one of the main risk factors for these diseases. To assess this hypothesis we carried out a case-control analysis, with the following results:

The sample sequences of our cases (AD, PD) contained slightly more variants that had arisen independently (1, 2 or 3 times in our samples) than did the controls. However, this difference was not statistically significant, and we certainly cannot claim that these mutations are more deleterious to protein function than those in the Japanese centenarians.

We found no evidence of a significant association between mtDNA haplogroup and disease status. Moreover, cases and controls were similarly distributed across the haplogroups defined in our study.

The distribution of the common and rare mutations across the mtDNA loci (split into non-synonymous and synonymous for the coding region) did not show a significant difference between cases and controls in the mutational load of rare versus common variants in any locus. In other words, the cumulative counts do not allow us to attribute an effect to the increased number of rare variants across the mtDNA loci rather than to the common mutations in the same loci.

The analysis of variance of the SIFT scores (of both common and rare mutations) did not show that the non-synonymous SNPs of the cases are on average more deleterious to protein function than those of the controls. Likewise, with type of variant (rare or common) as the factor, we again obtained no significant association between rare variants and a lower average SIFT score (that is, increased pathogenicity). Finally, even when we combined the factors Group (AD, PD, JC) and Type of variant (rare or common), we did not obtain significantly lower mean SIFT scores that would support our hypothesis. We note, however, that the common-variant subgroups were slightly "less deleterious" than the rare-variant subgroups, since their average SIFT scores were higher.

We confirmed that our cases and controls share their non-silent common variants with each other. The sets ADcommon, PDcommon and JCcommon overlap entirely, meaning that the common mutations in our study cannot be clearly associated with disease status. The sets containing the rare mutations (ADrare, PDrare, JCrare), on the other hand, do not overlap completely, so the Venn diagram of these three sets gave us the number of unique rare variants in each group: among the cases, 35 unique rare mutations for AD and 50 for PD, and for the controls (JC), 38.

The analysis of variance of the SIFT scores of the unique rare variants alone (ADrare, PDrare, JCrare) provided no substantial evidence linking the unique rare variants of our cases with increased pathogenicity ("more deleterious"). On the contrary, the unique rare variants of the Japanese centenarians had the lowest average SIFT score.

The distribution of the mutational load (unique rare mutations only) across the complexes of the mitochondrial respiratory chain is not statistically associated with disease status. Of our initial expectation that the AD variants would be concentrated in complex IV and the PD variants in complex I, only the PD part was confirmed, as the AD variants also accumulated mostly in complex I. As for the controls, we confirmed that their unique rare mutations are distributed randomly across the four complexes, as we initially assumed.

Taking all of these results into account, we cannot claim that there is sufficient evidence that the mutational load of multiple rare mtDNA variants is the dominant risk factor for the neurodegenerative diseases in our study. However, the possibility that multiple rare variants are pathogenic remains open, since such variants may occur together in small subsets of people. It is also possible that some unique rare mutations found in people who do not develop AD or PD act as protective factors against these diseases. This last point remains to be investigated, because protective mutations of this kind usually become common through natural selection of advantageous variants. In tables … we present a list of all the unique rare variants we found in each group of cases and controls. Some of them are associated with specific diseases such as LHON, Leigh disease, early-onset PD, obesity and complex mitochondriopathy, among others. Further information on these diseases and their links with each particular variant can be found on MITOMAP (ref).

We fully accept that further studies, without as many limitations as ours, are needed to come as close as possible to a definitive answer on whether mtDNA mutations contribute universally to the pathogenesis of Alzheimer's and Parkinson's disease (and other neurodegenerative diseases). Our analysis examined 288 published mtDNA sequences (96 from AD patients, 96 from PD patients and 96 from Japanese centenarians), which is a small number relative to the needs of this type of study. The sample size of a case-control study has to be considerably larger to detect and identify the mtDNA mutations that are substantial risk and pathogenic factors. A careful selection of cases and their respective controls is also essential for a successful study. In our case, we selected Japanese centenarians as controls for the AD and PD patients, since their main characteristic is age, which is also the main risk factor for AD and PD. Cases and controls also came from the same region (Japan), which helped to control for haplogroup-defining mutations.

For the needs of our project, we would recommend developing an automated pipeline that incorporates all the technical steps of the procedure we followed. However, we encountered several difficulties when we tried to include some of our tools in such a pipeline. For example, we initially used MutPred (ref) to obtain putative pathogenicity scores for our non-silent variants, but found that it did not handle our batch queries well. We then switched to SIFT, but afterwards had to deal with the limited time available for our study. An automated pipeline would allow us to carry out the procedure much faster and might even make it possible to study other phenotypes besides AD and PD.

Our hypothesis lacks the scientific evidence needed to support it, but even though it was not supported, we cannot dismiss the evidence that the etiology of neurodegenerative diseases like AD and PD lies in the observed mitochondrial dysfunction, and more specifically in the decrease in energy production by the mitochondrion.

Table 4 Unique rare mutations in patients with Alzheimer's disease

Nucleotide Position  Locus            Amino Acid Change
8764                 MT-ATP6          A-T
9038                 MT-ATP6          M-T
6040                 MT-CO1           N-S
7356                 MT-CO1           V-M
7664                 MT-CO2           A-T
14757                MT-CYB           M-T
14862                MT-CYB           A-V
14996                MT-CYB           A-T
15024                MT-CYB           C-Y
15221                MT-CYB           D-N
15459                MT-CYB           S-F
3338                 MT-ND1           V-A
3421                 MT-ND1           V-I
3736                 MT-ND1           V-I
3865                 MT-ND1           I-V
3943                 MT-ND1           I-V
4136                 MT-ND1           Y-C
4216                 MT-ND1           Y-H (hg JT)
4501                 MT-ND2           S-F
11255                MT-ND4           Y-H
10654                MT-ND4L          A-V
12338                MT-ND5           M-T
12451                MT-ND5           I-V
12469                MT-ND5           I-V
13942                MT-ND5           T-A
14129                MT-ND5           T-I
14178                MT-ND6           I-V
14393                MT-ND6           V-A
14502                MT-ND6           I-V

Table 5 Unique rare mutations in patients with Parkinson's disease

Nucleotide Position  Locus            Amino Acid Change
8572                 MT-ATP8,MT-ATP6  ATP6:G-S ATP8:Ter-Ter
8854                 MT-ATP6          A-T
9115                 MT-ATP6          I-V
8894                 MT-ATP6          N-I
8905                 MT-ATP6          H-Y
9041                 MT-ATP6          H-R
8537                 MT-ATP8,MT-ATP6  ATP6:N-S ATP8:I-V
7258                 MT-CO1           I-T
9288                 MT-CO3           T-A
9612                 MT-CO3           V-M
9921                 MT-CO3           A-T
15662                MT-CYB           I-V
15851                MT-CYB           I-V
14751                MT-CYB           T-I
15257                MT-CYB           D-N
15479                MT-CYB           F-L
15777                MT-CYB           S-N
3397                 MT-ND1           M-V
3434                 MT-ND1           Y-C
4232                 MT-ND1           I-T
4491                 MT-ND2           V-I
4924                 MT-ND2           S-N
4926                 MT-ND2           L-F
5128                 MT-ND2           N-S
5263                 MT-ND2           A-V
12092                MT-ND4           L-I
11016                MT-ND4           S-N
11087                MT-ND4           F-L
12030                MT-ND4           N-S
12084                MT-ND4           S-F
10609                MT-ND4L          M-T
10750                MT-ND4L          N-S
12406                MT-ND5           V-I
12361                MT-ND5           T-A
13651                MT-ND5           T-A
13708                MT-ND5           A-T (hg J, X2b)
12397                MT-ND5           T-A
13879                MT-ND5           S-P
14162                MT-ND6           A-V
14198                MT-ND6           T-M
14417                MT-ND6           V-A
14582                MT-ND6           V-A

Table 6 Unique rare mutations in Japanese centenarians

Nucleotide Position  Locus            Amino Acid Change
8557                 MT-ATP8,MT-ATP6  ATP6:A-T ATP8:syn
9017                 MT-ATP6          I-T
8812                 MT-ATP6          T-A
9099                 MT-ATP6          I-M
8489                 MT-ATP8          M-L
5979                 MT-CO1           A-T
6261                 MT-CO1           A-T
7389                 MT-CO1           Y-H
8265                 MT-CO2           L-P
7598                 MT-CO2           A-T
9804                 MT-CO3           A-T
14861                MT-CYB           A-T
15317                MT-CYB           A-T
15323                MT-CYB           A-T
15402                MT-CYB           T-I
15497                MT-CYB           G-S
15773                MT-CYB           V-M
15884                MT-CYB           A-T
15769                MT-CYB           Q-H
4612                 MT-ND2           M-T
4659                 MT-ND2           A-T
5127                 MT-ND2           N-D
5442                 MT-ND2           F-L
11969                MT-ND4           A-T
12622                MT-ND5           V-I
12634                MT-ND5           I-V
13810                MT-ND5           A-T
13225                MT-ND5           D-N
14318                MT-ND6           N-S

conclusion

As we stated in our background, it is widely considered that mtDNA polymorphisms might significantly change a person's susceptibility to some common complex diseases such as AD and PD (neurodegenerative diseases). In this study, we examined the mtDNA mutational load hypothesis (accumulated multiple rare mutations) in relation to Alzheimer's and Parkinson's disease. We used suitable mtDNA sequences for a case-control study and analyzed the possible effect of the various mutations we encountered in both cases (AD, PD) and controls (JC). Unfortunately, we cannot support our initial hypothesis with considerable evidence, since we could not reject the null hypothesis in any of the tests we conducted. Nonetheless, we feel that it is important for scientists to keep investing time and effort in investigating this hypothesis, because the actual truth about the link between mitochondrial dysfunction and neurodegenerative diseases lies somewhere in the middle.
