Eukaryotic genomes are vastly larger than those of prokaryotes. This is partly due to the complexity of eukaryotic organisms compared to prokaryotes. However, the size of a particular eukaryotic genome is not directly correlated to the organism's complexity. This is the result of the presence of a large amount of non-coding DNA. The functions of these non-coding nucleic acid sequences are only partly understood. Some sequences are involved in the control of gene expression while others may simply be present in the genome to act as an evolutionary buffer able to withstand nucleotide mutation without disrupting the integrity of the organism.

One abundant class of DNA is termed repetitive DNA. There are 2 different sub-classes of repetitive DNA: highly repetitive and moderately repetitive. Highly repetitive DNA consists of short sequences 6-10 bp long reiterated from 100,000 to 1,000,000 times. The DNA of the genome consisting of the genes (coding sequences) is identified as non-repetitive DNA since most genes occur only once in an organism's haploid genome. However, it should be pointed out that several genes exist as tandem clusters of multiple copies of the same gene, ranging from 50 to 10,000 copies, as is the case for the rRNA genes and the histone genes.

Another characteristic feature that distinguishes eukaryotic from prokaryotic genes is the presence of introns. Introns are stretches of nucleic acid sequences that separate the coding exons of a gene. The existence of introns in prokaryotes is extremely rare. Essentially all human genes contain introns. A notable exception is the histone genes, which are intronless. In many genes the presence of introns separates exons into coding regions exhibiting distinct functional domains.

Chromatin is a term designating the structure in which DNA exists within cells. The structure of chromatin is determined and stabilized through the interaction of the DNA with DNA-binding proteins. There are 2 classes of DNA-binding proteins. The histones are the major class of DNA-binding proteins involved in maintaining the compacted structure of chromatin. There are 5 different histone proteins identified as H1, H2A, H2B, H3 and H4.

The other class of DNA-binding proteins is a diverse group of proteins called simply non-histone proteins. This class of proteins includes the various transcription factors, polymerases, hormone receptors and other nuclear enzymes. In any given cell there are greater than 1000 different types of non-histone proteins bound to the DNA.

The binding of DNA by the histones generates a structure called the nucleosome. The nucleosome core contains an octamer protein structure consisting of 2 subunits each of H2A, H2B, H3 and H4. Histone H1 occupies the internucleosomal DNA and is identified as the linker histone. The nucleosome core contains approximately 150 bp of DNA. The linker DNA between each nucleosome can vary from 20 to more than 200 bp. These nucleosomal core structures would appear as beads on a string if the DNA were pulled into a linear structure.
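
The packing arithmetic described above can be sketched in a few lines of Python. This is only a rough illustration: the function name is hypothetical, and it assumes a single fixed linker length (50 bp by default, an arbitrary value within the 20 to 200 bp range), whereas real linker lengths vary along the chromosome.

```python
def estimate_nucleosomes(dna_bp, core_bp=150, linker_bp=50):
    """Estimate how many nucleosomes fit on dna_bp of DNA.

    Assumes every repeat is one core (core_bp) plus one linker (linker_bp).
    The 50 bp default linker is an arbitrary illustrative value.
    """
    if dna_bp < 0 or core_bp <= 0 or linker_bp < 0:
        raise ValueError("lengths must be non-negative and core_bp positive")
    repeat_bp = core_bp + linker_bp  # DNA consumed per bead plus its spacer
    return dna_bp // repeat_bp       # only whole repeats are counted


if __name__ == "__main__":
    # 200,000 bp at a 200 bp repeat length
    print(estimate_nucleosomes(200_000))  # prints 1000
```

Changing linker_bp between 20 and 200 shows how strongly the nucleosome count depends on linker length.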

The nucleosome cores themselves coil into a solenoid shape which itself coils to further compact the DNA. These final coils are compacted further into the characteristic chromatin seen in a karyotyping spread. The protein-DNA structure of chromatin is stabilized by attachment to a non-histone protein scaffold called the nuclear matrix.

The cell cycle is defined as the sequence of events that occurs during the lifetime of a cell. The eukaryotic cell cycle is divided into 4 major periods. During each period a specific sequence of events occurs. The ultimate conclusion of one cell cycle is cytokinesis resulting in two identical daughter cells.

The 4 phases of a typical cell cycle and the events occurring during each phase are outlined:

1. M phase is the period when cells prepare for and then undergo cytokinesis. M phase stands for mitotic phase or mitosis. During mitosis the chromosomes are paired and then divided prior to cell division. The events in this stage of the cell cycle leading to cell division are prophase, metaphase, anaphase and telophase.

2. G1 phase corresponds to the gap in the cell cycle that occurs following cytokinesis. During this phase cells make a decision to either exit the cell cycle and become quiescent or terminally differentiated or to continue dividing. Terminal differentiation is identified as a non-dividing state for a cell. Quiescent and terminally differentiated cells are identified as being in G0 phase. Cells in G0 can remain in this state for extended periods of time. Specific stimuli may induce the G0 cell to re-enter the cell cycle at the G1 phase or alternatively may induce permanent terminal differentiation.

During G1 cells begin synthesizing all the cellular components needed in order to generate two identically complemented daughter cells. As a result the size of cells begins to increase during G1.

3. S phase is the phase of the cell cycle during which the DNA is replicated. This is the DNA synthesis phase. Additionally, some specialized proteins are synthesized during S phase, particularly the histones.

4. G2 phase is reached following completion of DNA replication. During G2 the chromosomes begin condensing, the nucleoli disappear and two microtubule organizing centers begin polymerizing tubulins for eventual production of the spindle poles.

The cell cycle of typical eukaryotic cells grown in culture occupies approximately 16-24 hrs. However, in the context of the multicellular organization of organisms, cell cycle times can range from as short as 6-8 hrs to greater than 100 days. The high variability of cell cycle times is due to the variability of the G1 phase of the cycle.

Replication of DNA occurs during the process of normal cell division cycles. Because the genetic complement of the resultant daughter cells must be the same as the parental cell, DNA replication must possess a very high degree of fidelity. The entire process of DNA replication is complex and involves multiple enzymatic activities.

The mechanics of DNA replication were originally characterized in the bacterium E. coli, which contains 3 distinct enzymes capable of catalyzing the replication of DNA. These have been identified as DNA polymerase (pol) I, II, and III. Pol I is the most abundant replicating activity in E. coli, but its primary role is to ensure the fidelity of replication through the repair of damaged and mismatched DNA. Replication of the E. coli genome is the job of pol III. This enzyme is much less abundant than pol I; however, its activity is nearly 100 times that of pol I.

There have been 5 distinct eukaryotic DNA polymerases identified: α, β, γ, δ and ε. The identity of the individual enzymes relates to their subcellular localization as well as their primary replicative activity. The polymerase of eukaryotic cells that is the equivalent of E. coli pol III is pol-α. The pol I equivalent in eukaryotes is pol-β. Polymerase-γ is responsible for replication of mitochondrial DNA.

The ability of DNA polymerases to replicate DNA requires a number of additional accessory proteins. The combination of polymerases with several of the accessory proteins yields an activity identified as DNA polymerase holoenzyme. These accessory proteins include (not ordered with respect to importance):

Primase

Processivity accessory proteins

Single strand binding proteins

Helicase

DNA ligase

Topoisomerases

Uracil-DNA N-glycosylase

The process of DNA replication begins at specific sites in the chromosomes termed origins of replication, requires a primer bearing a free 3'-OH, proceeds specifically in the 5' → 3' direction on both strands of DNA concurrently, and results in the copying of the template strands in a semiconservative manner. The semiconservative nature of DNA replication means that the newly synthesized daughter strands remain associated with their respective parental template strands.
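
The semiconservative outcome can be illustrated with a minimal Python sketch. The function names are hypothetical and the model is deliberately bare: strands are plain strings written 5' to 3', and primers, origins and enzymes are ignored.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_daughter(template_5to3):
    """Return the new strand, written 5'->3', copied from a template written 5'->3'.

    The polymerase reads the template 3'->5', so the template is walked
    backwards while the new strand grows in the 5'->3' direction.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template_5to3))


def replicate(duplex):
    """Semiconservative replication of a duplex given as (top, bottom) strands.

    Each returned duplex pairs one parental strand with one new strand.
    """
    top, bottom = duplex
    return (top, synthesize_daughter(top)), (bottom, synthesize_daughter(bottom))


if __name__ == "__main__":
    top = "ATGCCG"
    parent = (top, synthesize_daughter(top))
    for old_strand, new_strand in replicate(parent):
        print("parental:", old_strand, "new:", new_strand)
```

Each daughter duplex keeps one of the two parental strands, which is the sense in which replication is semiconservative.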

The large size of eukaryotic chromosomes and the limits of nucleotide incorporation during DNA synthesis make it necessary for multiple origins of replication to exist in order to complete replication in a reasonable period of time. The precise nature of origins of replication in higher eukaryotic organisms is unclear. However, it is clear that at a replication origin the strands of DNA must dissociate and unwind in order to allow access to DNA polymerase. Unwinding of the duplex at the origin as well as along the strands as the replication process proceeds is carried out by helicases. The resultant regions of single-stranded DNA are stabilized by the binding of single-strand binding proteins. The stabilized single-stranded regions are then accessible to the enzymatic activities required for replication to proceed. The site of the unwound template strands is termed the replication fork.
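
Why multiple origins shorten replication time can be seen with a back-of-the-envelope calculation. The sketch below is an idealization, not a measured model: the function name is hypothetical, origins are assumed to be evenly spaced and to fire at the same moment, each origin is assumed to produce two forks, and the lengths and fork rate in the example are arbitrary illustrative numbers.

```python
def replication_time_seconds(chromosome_bp, n_origins, fork_rate_bp_per_s):
    """Idealized time to copy a chromosome from evenly spaced origins.

    Assumes all origins fire simultaneously and each launches two forks
    that move at the same constant rate.
    """
    if chromosome_bp < 0 or n_origins < 1 or fork_rate_bp_per_s <= 0:
        raise ValueError("need a non-negative length, >= 1 origin, positive rate")
    forks = 2 * n_origins  # bidirectional synthesis from every origin
    return chromosome_bp / (forks * fork_rate_bp_per_s)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the scaling with origin count.
    for origins in (1, 10, 100):
        print(origins, replication_time_seconds(1_000_000, origins, 50))
```

Under these assumptions the time falls in direct proportion to the number of origins.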

In order for DNA polymerases to synthesize DNA they must encounter a free 3'-OH which is the substrate for attachment of the 5'-phosphate of the incoming nucleotide. During repair of damaged DNA the 3'-OH can arise from the hydrolysis of the backbone of one of the two strands. During replication the 3'-OH is supplied through the use of an RNA primer, synthesized by the primase activity. The primase utilizes the DNA strands as templates and synthesizes a short stretch of RNA generating a primer for DNA polymerase.

Synthesis of DNA proceeds in the 5' → 3' direction through the attachment of the 5'-phosphate of an incoming dNTP to the existing 3'-OH in the elongating DNA strands with the concomitant release of pyrophosphate. Initiation of synthesis, at origins of replication, occurs simultaneously on both strands of DNA. Synthesis then proceeds bidirectionally, with one strand in each direction being copied continuously and one strand in each direction being copied discontinuously. While DNA polymerases incorporate dNTPs into DNA in the 5' → 3' direction, they are moving in the 3' → 5' direction with respect to the template strand. In order for DNA synthesis to occur simultaneously on both template strands as well as bidirectionally, one strand appears to be synthesized in the 3' → 5' direction. In actuality, that strand of newly synthesized DNA is produced discontinuously.

The strand of DNA synthesized continuously is termed the leading strand and the discontinuous strand is termed the lagging strand. The lagging strand of DNA is composed of short stretches of RNA primer plus newly synthesized DNA approximately 100-200 bases long (the approximate distance between adjacent nucleosomes). These short stretches of the lagging strand are called Okazaki fragments. The concept of continuous strand synthesis is somewhat of a misnomer since DNA polymerases do not remain associated with a template strand indefinitely. The ability of a particular polymerase to remain associated with the template strand is termed its processivity. The longer it associates, the higher the processivity of the enzyme. DNA polymerase processivity is enhanced by additional protein activities of the replisome identified as processivity accessory proteins.
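
Discontinuous synthesis can be sketched in Python to show that fragments made one stretch at a time still add up to a complete daughter strand. The function names are hypothetical, RNA primers are left out for brevity, and the template is assumed to be written 5' to 3' in the direction of fork movement, so its 5' end is unwound first.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq_5to3):
    """Complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq_5to3))


def okazaki_fragments(lagging_template_5to3, fragment_len=150):
    """Return fragments in the order they are synthesized, each written 5'->3'.

    Each newly unwound stretch of template is copied backwards, away from
    the fork, so that synthesis itself is always 5'->3'.
    """
    if fragment_len < 1:
        raise ValueError("fragment_len must be at least 1")
    fragments = []
    for start in range(0, len(lagging_template_5to3), fragment_len):
        window = lagging_template_5to3[start:start + fragment_len]
        fragments.append(reverse_complement(window))
    return fragments


def ligate(fragments_in_synthesis_order):
    """Join fragments into one strand written 5'->3'.

    The fragment made last lies at the 5' end of the finished strand.
    """
    return "".join(reversed(fragments_in_synthesis_order))


if __name__ == "__main__":
    template = "ATGCCGTTAGCA"
    pieces = okazaki_fragments(template, fragment_len=4)
    print(pieces)
    print(ligate(pieces) == reverse_complement(template))
```

The ligated product is identical to what continuous copying of the same template would give, which is why the discontinuity leaves no trace in the finished strand.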

How is it that DNA polymerase can copy both strands of DNA in the 5' → 3' direction simultaneously? A model has been proposed where DNA polymerases exist as dimers associated with the other necessary proteins at the replication fork, a complex identified as the replisome. The template for the lagging strand is temporarily looped through the replisome such that the DNA polymerases are moving along both strands in the 3' → 5' direction simultaneously for short distances, the distance of an Okazaki fragment. As the replication forks progress along the template strands the newly synthesized daughter strands and parental template strands reform a DNA double helix. This means that only a small stretch of the template duplex is single-stranded at any given time.

The progression of the replication fork requires that the DNA ahead of the fork be continuously unwound. Due to the fact that eukaryotic chromosomal DNA is attached to a protein scaffold the progressive movement of the replication fork introduces severe torsional stress into the duplex ahead of the fork. This torsional stress is relieved by DNA topoisomerases. Topoisomerases relieve torsional stresses in duplexes of DNA by introducing either double- (topoisomerases II) or single-stranded (topoisomerases I) breaks into the backbone of the DNA. These breaks allow unwinding of the duplex and removal of the replication-induced torsional strain. The nicks are then resealed by the topoisomerases.

The RNA primers of the leading strands and Okazaki fragments are removed by the repair DNA polymerases, which simultaneously replace the ribonucleotides with deoxyribonucleotides. The gaps that exist between the 3'-OH of one leading strand and the 5'-phosphate of another, as well as between one Okazaki fragment and another, are repaired by DNA ligases, thereby completing the process of replication.

The main enzymatic activity of DNA polymerases is the 5' → 3' synthetic activity. However, DNA polymerases possess two additional activities of importance for both replication and repair. These additional activities include a 5' → 3' exonuclease function and a 3' → 5' exonuclease function. The 5' → 3' exonuclease activity allows the removal of ribonucleotides of the RNA primer, utilized to initiate DNA synthesis, along with their simultaneous replacement with deoxyribonucleotides by the 5' → 3' polymerase activity. The 5' → 3' exonuclease activity is also utilized during the repair of damaged DNA. The 3' → 5' exonuclease function is utilized during replication to allow DNA polymerase to remove mismatched bases. It is possible (but rare) for DNA polymerases to incorporate an incorrect base during replication. These mismatched bases are recognized by the polymerase immediately due to the lack of Watson-Crick base-pairing. The mismatched base is then removed by the 3' → 5' exonuclease activity and the correct base inserted prior to progression of replication.
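
The proofreading step can be pictured as an add, check, remove, re-add loop. The Python sketch below is a toy model only: the function name and the idea of feeding in a list of "proposed" bases (to force an error at a known position) are assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_with_proofreading(template_3to5, proposed_bases):
    """Extend a new strand 5'->3' and correct any mismatched base.

    template_3to5  : template bases in the order the polymerase meets them
    proposed_bases : bases the polymerase attempts to add, in the same order
    Returns the finished strand (5'->3') and the number of corrections.
    """
    new_strand = []
    corrections = 0
    for template_base, base in zip(template_3to5, proposed_bases):
        new_strand.append(base)               # 5'->3' polymerase activity
        if COMPLEMENT[template_base] != base:  # no Watson-Crick pairing
            new_strand.pop()                   # 3'->5' exonuclease removes it
            new_strand.append(COMPLEMENT[template_base])  # correct base inserted
            corrections += 1
    return "".join(new_strand), corrections


if __name__ == "__main__":
    # The fourth proposed base (A opposite G) is a deliberate mismatch.
    print(extend_with_proofreading("TACGG", "ATGAC"))  # prints ('ATGCC', 1)
```

The correction happens before the next base is added, mirroring the point that the mismatch is removed prior to progression of replication.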

One of the major post-replicative reactions that modifies the DNA is methylation. The sites of natural methylation (i.e. not chemically induced) of eukaryotic DNA are always on cytosine residues that are present in CpG dinucleotides. However, it should be noted that not all CpG dinucleotides are methylated at the C residue. The cytidine is methylated at the 5 position of the pyrimidine ring generating 5-methylcytidine.

Methylation of DNA in prokaryotic cells also occurs. The function of this methylation is to prevent degradation of host DNA in the presence of enzymatic activities synthesized by bacteria called restriction endonucleases. These enzymes recognize specific nucleotide sequences of DNA. The role of this system in prokaryotic cells (called the restriction-modification system) is to degrade invading viral DNAs. Since the viral DNAs are not modified by methylation they are degraded by the host restriction enzymes. The methylated host genome is resistant to the action of these enzymes.
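
The logic of the restriction-modification system can be sketched as a simple lookup. In the Python below the function name is hypothetical, GAATTC (the EcoRI recognition sequence) is used only as a familiar example, and protection is simplified to a per-site flag rather than methylation of a specific base within the site.

```python
def restriction_cut_sites(dna, site, protected_starts=frozenset()):
    """Return start positions of recognition sites that would be cut.

    protected_starts holds the start positions of sites the host has
    methylated; those sites are skipped by the enzyme.
    """
    cut_sites = []
    start = dna.find(site)
    while start != -1:
        if start not in protected_starts:
            cut_sites.append(start)
        start = dna.find(site, start + 1)  # keep scanning past this hit
    return cut_sites


if __name__ == "__main__":
    sequence = "TTGAATTCAAGAATTCGG"
    # Invading viral DNA: no methylation, so both sites are cut.
    print(restriction_cut_sites(sequence, "GAATTC"))           # prints [2, 10]
    # Host DNA with both sites methylated: nothing is cut.
    print(restriction_cut_sites(sequence, "GAATTC", {2, 10}))  # prints []
```

The same sequence is degraded or spared depending only on its methylation state, which is how the host distinguishes its own genome from viral DNA.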

The precise role of methylation in eukaryotic DNA is unclear. It was originally thought that methylated DNA would be less transcriptionally active. Indeed, experiments have demonstrated that this is true for certain genes. For example, under-methylation of the MyoD gene (a master control gene regulating the differentiation of muscle cells through the control of the expression of muscle-specific genes) results in the conversion of fibroblasts to myoblasts. The experiments were carried out by allowing replicating fibroblasts to incorporate 5-azacytidine into their newly synthesized DNA. This analog of cytidine prevents methylation. The net result is that the maternal pattern of methylation is lost and numerous genes become under-methylated. However, neither the lack of methylation nor the presence of methylation is a clear indicator of whether a gene will be transcriptionally active or silent.

The pattern of methylation is copied post-replicatively by the maintenance methylase system. This activity recognizes the pattern of methylated C residues in the maternal DNA strand following replication and methylates the C residue present in the corresponding CpG dinucleotide of the daughter strand.
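
How a methylation pattern can be copied from one strand to the other follows from the symmetry of the CpG dinucleotide: a CpG on one strand lies opposite a CpG on the other. The Python sketch below illustrates only the coordinate bookkeeping; the function names and the use of 0-based positions on strands written 5' to 3' are assumptions made for illustration.

```python
def cpg_sites(seq_5to3):
    """0-based positions of the C in every CpG dinucleotide."""
    return [i for i in range(len(seq_5to3) - 1) if seq_5to3[i:i + 2] == "CG"]


def maintenance_methylate(parent_5to3, parent_methylated):
    """Positions of C residues to methylate on the daughter strand.

    parent_methylated : positions of methylated C residues on the parental strand
    Returned positions index the daughter strand written 5'->3'.
    """
    n = len(parent_5to3)
    daughter_marks = set()
    for i in parent_methylated:
        if parent_5to3[i:i + 2] != "CG":
            raise ValueError(f"position {i} is not the C of a CpG")
        # The daughter C pairs with the parental G at i + 1; on the
        # antiparallel daughter strand that is position n - 1 - (i + 1).
        daughter_marks.add(n - 2 - i)
    return daughter_marks


if __name__ == "__main__":
    parent = "ACGTTCGA"
    print(cpg_sites(parent))                   # prints [1, 5]
    print(maintenance_methylate(parent, {1}))  # prints {5}
```

Only CpG sites already methylated on the parental strand are marked on the daughter, so unmethylated CpGs stay unmethylated through replication.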

The phenomenon of genomic imprinting refers to the fact that the expression of some genes depends on whether they are inherited maternally or paternally. Insulin-like growth factor-2 (Igf2) is a gene whose expression is required for normal fetal development and growth. Expression of Igf2 occurs exclusively from the paternal copy of the gene. Imprinted genes are "marked" by their state of methylation. In the case of Igf2 an element in the paternal locus, called an insulator element, is methylated, blocking its function. The function of the un-methylated insulator is to bind a protein that, when bound, blocks activation of Igf2 expression. When the insulator is methylated the protein cannot bind it, thus allowing a distant enhancer element to drive expression of the Igf2 gene. In the maternal genome the insulator is not methylated, thus the protein binds to it, blocking the action of the distant enhancer element.

DNA recombination refers to the phenomenon whereby two parental strands of DNA are spliced together resulting in an exchange of portions of their respective strands. This process leads to new molecules of DNA that contain a mix of genetic information from each parental strand. There are 3 main forms of genetic recombination. These are homologous recombination, site-specific recombination and transposition.

Homologous recombination is the process of genetic exchange that occurs between any two molecules of DNA that share a region (or regions) of homologous DNA sequences. This form of recombination occurs frequently while sister chromatids are paired during meiosis. Indeed, it is the process of homologous recombination between the maternal and paternal chromosomes that imparts genetic diversity to an organism. Homologous recombination generally involves exchange of large regions of the chromosomes.

Site-specific recombination involves exchange between much smaller regions of DNA sequence (approximately 20-200 base pairs) and requires the recognition of specific sequences by the proteins involved in the recombination process. Site-specific recombination events occur primarily as a mechanism to alter the program of genes expressed at specific stages of development. The most significant site-specific recombinational events in humans are the somatic cell gene rearrangements that take place in the immunoglobulin genes during B-cell differentiation in response to antigen presentation. These gene rearrangements in the immunoglobulin genes result in an extremely diverse potential for antibody production. A typical antibody molecule is composed of both heavy and light chains. The genes for both these peptide chains undergo somatic cell rearrangement, yielding the potential for approximately 3000 different light chain combinations and approximately 5000 heavy chain combinations. Because any given heavy chain can combine with any given light chain, the potential diversity exceeds 10,000,000 possible different antibody molecules.
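
The combinatorial estimate above is a simple product, shown here as a short Python check. The variable names are illustrative; the two counts are the approximate figures quoted in the text.

```python
# Approximate numbers of rearranged chains, as quoted in the text.
light_chain_combinations = 3000
heavy_chain_combinations = 5000

# Any heavy chain can pair with any light chain, so the counts multiply.
antibody_combinations = light_chain_combinations * heavy_chain_combinations

print(antibody_combinations)               # prints 15000000
print(antibody_combinations > 10_000_000)  # prints True
```

The product is 15,000,000, consistent with the statement that the potential diversity exceeds 10,000,000.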

Transposition is a unique form of recombination where mobile genetic elements can virtually move from one region to another within one chromosome or to another chromosome entirely. There is no requirement for sequence homology for a transpositional event to occur. Because the potential exists for the disruption of a vitally important gene by a transposition event this process must be tightly regulated. The exact nature of how transpositional events are controlled is unclear.

Transposition occurs with a higher frequency in bacteria and yeasts than it does in humans. The identification of the occurrence of transposition in the human genome resulted when it was found that certain processed genes were present in the genome. These processed genes are nearly identical to the mRNA encoded by the normal gene. The processed genes contain the poly(A) tail that would have been present in the RNA and they lack the introns of the normal gene. These particular forms of genes must have arisen through a reverse transcription event, similar to the life cycle of retroviral genomes, and then been incorporated into the genome by a transpositional event. Since most of the processed genes that have been identified are non-functional they have been termed pseudogenes.

Cancer, in most non-viral induced cases, is the severe medically relevant consequence of the inability to repair damaged DNA. It is clear that multiple somatic cell mutations in DNA can lead to the genesis of the transformed phenotype. Therefore, it should be obvious that complete understanding of DNA repair mechanisms would be invaluable in the design of potential therapeutic agents in the treatment of cancer.

DNA damage can occur as the result of exposure to environmental stimuli such as alkylating chemicals or ultraviolet or radioactive irradiation and free radicals generated spontaneously in the oxidizing environment of the cell. These phenomena can, and do, lead to the introduction of mutations in the coding capacity of the DNA. Mutations in DNA can also, but rarely, arise from the spontaneous tautomerization of the bases.

Modification of the DNA bases by alkylation (predominantly the incorporation of -CH3 groups) occurs mainly on purine residues. Methylation of G residues allows them to base pair with T instead of C. A unique activity called O6-alkylguanine transferase removes the alkyl group from G residues. The protein itself becomes alkylated and is no longer active; thus, a single protein molecule can remove only one alkyl group.

Mutations in DNA are of two types. Transition mutations result from the exchange of one purine for another purine, or of one pyrimidine for another pyrimidine. Transversion mutations result from the exchange of a purine for a pyrimidine or vice versa.
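
The two definitions translate directly into a small classifier. The Python sketch below is illustrative only; the function name is hypothetical and only the four standard DNA bases are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(original, mutated):
    """Classify a single-base substitution as a transition or a transversion."""
    bases = PURINES | PYRIMIDINES
    if original not in bases or mutated not in bases:
        raise ValueError("bases must be one of A, C, G, T")
    if original == mutated:
        raise ValueError("no substitution: the two bases are identical")
    same_class = (original in PURINES) == (mutated in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    print(classify_substitution("A", "G"))  # prints transition
    print(classify_substitution("C", "T"))  # prints transition
    print(classify_substitution("A", "C"))  # prints transversion
```

Of the twelve possible single-base substitutions, four are transitions and eight are transversions under this definition.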

The prominent by-product of UV irradiation of DNA is the formation of thymine dimers. These form from two adjacent T residues in the DNA. Repair of thymine dimers is best understood from consideration of the mechanisms used in E. coli. However, several mechanisms are common to both prokaryotes and eukaryotes.

Thymine dimers are removed by several mechanisms. Specific glycohydrolases recognize the dimer as abnormal and cleave the N-glycosidic bond of the bases in the dimer. This results in the base leaving and generates an apyrimidinic site in the DNA. This is repaired by DNA polymerase and ligase. Glycohydrolases are also responsible for the removal of other abnormal bases, not just thymine dimers.

Another widely distributed activity is DNA photolyase, or photoreactivating enzyme. This protein binds to thymine dimers in the dark. In response to visible light stimulation the enzyme cleaves the bonds joining the two pyrimidine rings. The chromophore associated with this enzyme that allows visible light activation is FADH2.

Humans with autosomal recessive genetic defects in DNA repair (in particular the repair of UV-induced thymine dimers) suffer from the disease xeroderma pigmentosum (XP). There are at least nine distinct genetic defects associated with this disease. One of these is due to a defect in the gene coding for the glycohydrolase that cleaves the N-glycosidic bond of the thymine dimers. There are two major clinical forms of XP: one leads to progressive degenerative changes in the eyes and skin, and the other also includes progressive neurological degeneration.

Another inherited disorder affecting DNA repair in which patients suffer from sun sensitivity, short stature and progressive neurological degeneration without an increased incidence of skin cancer is Cockayne Syndrome.

Ataxia telangiectasia (AT) is an autosomal recessive disorder resulting in neurological disability and suppressed immune function. Patients develop a disabling cerebellar ataxia early in life and have recurrent infections. Patients suffering from AT have an increased sensitivity to x-irradiation suggesting a role for the AT gene in DNA repair.