INTRODUCTION

• Genetic diseases occur because of mutations in DNA. Many of these mutations affect the repair of other mutations that occur during DNA replication or at other times, which in turn affect the flow of genetic information from DNA to RNA (transcription and processing) and from RNA to protein synthesis (translation). Many of these mutations also affect the structures of the resulting proteins, affecting their functions.

DNA Structure and Chemistry
a). Evidence that DNA is the genetic information
   i). DNA transformation – know this term
   ii). Transgenic experiments – know this process
   iii). Mutation alters phenotype – be able to define genotype and phenotype
b). Structure of DNA
   i). Structure of the bases, nucleosides, and nucleotides
   ii). Structure of the DNA double helix
   iii). Complementarity of the DNA strands
c). Chemistry of DNA
   i). Forces contributing to the stability of the double helix
   ii). Denaturation of DNA


DNA transformation experiments show that DNA is the carrier of the genetic information. These experiments have been carried out both in vivo (in animals) and in vitro (in cell culture). The in vivo experiments were carried out by injecting mice with both a heat-killed virulent strain of Streptococcus and a non-heated, nonvirulent strain of Streptococcus. The experiments showed that something (DNA) from the heat-killed virulent strain was able to alter the (still viable) nonvirulent strain, converting some of the cells to virulent bacteria and killing the host. We now know that purified DNA confers this virulence. In vitro experiments have shown that purified DNA from Type S (smooth colony) Strep cells can be taken up by Type R (rough colony) Strep cells. The process of getting functionally active DNA into cells is called DNA transformation. Transformation by Type S DNA alters the "genotype" of host cells, since new genes are introduced into these cells, thus altering their genetic constitution. The expression of this Type S DNA changes the "phenotype" of the transformed cells, making their colonies look "smooth" instead of "rough." Genotype is an organism's genetic constitution. Phenotype is the observed characteristics of an organism as determined by the genetic makeup and the environment.


Transgenic experiments, which are usually carried out in mice, involve the transfer of a specific gene into the nucleus of a fertilized egg. The gene integrates randomly into chromosomal DNA and can be engineered to be expressed in every cell, or only in certain cells at certain times. For example, introduction of the growth hormone gene into transgenic mice alters their genotype and confers a phenotype characterized by increased growth and therefore size. Transgenic experiments show that specific phenotypic traits can be conferred by specific genes, and thus that DNA is the carrier of genetic information. Other types of transgenic experiments involve mutation of specific genes in the mouse to determine the functions of those genes and to create mouse models of human genetic disease. A mutation in a transgenic mouse that eliminates a gene's function is called a knockout mutation, and the mouse carrying that mutation is called a knockout mouse.


Phenotypic differences between individuals are due in large measure to differences between genes. Evidence suggests that at least one-third of our genes are polymorphic, in other words that there are differences in the nucleotide sequences in one-third of our genes when these genes are compared from one individual to another individual. It is most likely that these differences occurred by mutation of DNA over many hundreds of thousands of years of human evolution. It is also clear that new DNA mutations give rise to phenotypic differences between individuals, the most dramatic being those that give rise to genetic diseases. All of this evidence indicates that DNA is the carrier of the genetic information. Genetic differences between individuals can have a myriad of clinical implications. Some inherited differences, which may be less severe, can confer a predisposition to certain medical problems. Other examples are individual rates of aging or individual rates of drug metabolism, both of which probably have an underlying genetic basis. More severe genetic differences can be the causes of debilitating inherited diseases.

Structures of the bases
Purines: adenine (A), guanine (G)
Pyrimidines: thymine (T), cytosine (C), 5-methylcytosine (5mC)

Be familiar with the structures of the purine bases, adenine (A) and guanine (G); and the pyrimidine bases, thymine (T) and cytosine (C). A common base modification in DNA results from the methylation of cytosine, giving rise to 5-methylcytosine (5mC). As we shall see subsequently, 5mC is highly mutagenic. It is believed that this methylation functions to regulate gene expression because 5mC residues are often clustered near the promoters of genes in so-called "CpG islands." (Along one strand of DNA the nucleotides are sometimes indicated by the base followed by a phosphate or "p", such as ApTpCpCpGpApCpTpGpGp; this sequence contains one CpG site.) The problem that arises from these methylations is that subsequent deamination of a 5mC produces thymine, which is a normal DNA base and therefore is not readily recognized as damage. As such, 5'-mCG-3' sites (or mCpG sites) are "hot-spots" for mutation, and when mutated are a common cause of cancer.
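The "p" notation and the deamination step can be illustrated with a small Python sketch. It is only an illustration: the function names are hypothetical, and treating the cytosine of the CpG site as methylated is an assumption made for the example, not something the sequence itself tells us.

```python
def parse_p_notation(seq: str) -> str:
    """Strip the 'p' (phosphate) separators from a sequence such as 'ApTpCp'."""
    return seq.replace("p", "").upper()


def cpg_positions(bases: str) -> list[int]:
    """Return the 0-based position of the C in every 5'-CG-3' dinucleotide."""
    return [i for i in range(len(bases) - 1) if bases[i:i + 2] == "CG"]


def deaminate_5mc(bases: str, position: int) -> str:
    """Model deamination of a methylated C at `position`: 5mC becomes T."""
    if bases[position] != "C":
        raise ValueError("position does not hold a cytosine")
    return bases[:position] + "T" + bases[position + 1:]


# The example sequence from the notes, written with phosphates.
bases = parse_p_notation("ApTpCpCpGpApCpTpGpGp")
sites = cpg_positions(bases)
print(bases, sites)

# Assumption for illustration: the C of the CpG site is methylated.
mutated = deaminate_5mc(bases, sites[0])
print(mutated)
```

The mutated strand now carries a TpG where the CpG used to be, which is why such a change can escape notice as ordinary sequence.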

Nucleoside

[structure of deoxyadenosine]

Nucleotide

This table lists the common bases and their corresponding names when in the nucleoside or nucleotide form. Hypoxanthine (inosine) is seen in DNA following deamination of adenine (adenosine). It is also seen in transfer RNA as a common, functionally important posttranscriptional modification. Uracil (uridine) is found in RNA, instead of thymine (thymidine), which is specific for DNA.

Nomenclature

                Base            Nucleoside (+deoxyribose)
Purines         adenine         adenosine
                guanine         guanosine
                hypoxanthine    inosine
Pyrimidines     thymine         thymidine
                cytosine        cytidine
(+ribose)       uracil          uridine

Nucleotide: nucleoside + phosphate

When a base, such as adenine, is linked to a deoxyribose sugar through a glycosidic bond, the structure is a nucleoside, in this case deoxyadenosine. The deoxyribose sugar lacks a hydroxyl group on the 2' carbon, hence "deoxy." This is in contrast to the presence of a hydroxyl at that position in the ribose sugar found in RNA. When the deoxyribose sugar is phosphorylated, on either the 3' or the 5' position (or both), the structure is a nucleotide, in this case deoxyadenosine-5'-phosphate. The precursors of DNA synthesis are deoxynucleoside-5'-triphosphates, or dNTPs.

G-C base pair
Chargaff’s rule: The content of A equals the content of T, and the content of G equals the content of C in double-stranded DNA from any species

The DNA double helix requires that the two polynucleotide chains be base-paired to each other. This slide shows an adenine-thymine (A-T) base pair (which is the A and which is the T?); and a guanine-cytosine (G-C) base pair (which is the G and which is the C?). Because of base pairing, the polynucleotide chains in double-stranded DNA are complementary to each other.
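Complementarity and Chargaff's rule can be checked on a toy sequence. The following Python sketch builds the antiparallel complementary strand and counts the bases in the resulting duplex; the sequence and function names are illustrative assumptions.

```python
from collections import Counter

# Watson-Crick pairing partners.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(strand))


def duplex_base_counts(strand: str) -> Counter:
    """Count each base across both strands of the duplex."""
    return Counter(strand) + Counter(reverse_complement(strand))


top = "ATCCGACTGG"            # hypothetical 5'->3' strand
bottom = reverse_complement(top)
counts = duplex_base_counts(top)
print(bottom)
print(dict(counts))
```

Because every A on one strand pairs with a T on the other (and every G with a C), the duplex totals satisfy A = T and G = C regardless of the sequence chosen, while a single strand on its own need not.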

Double-stranded DNA ("B" DNA)
Figure labels: major groove, minor groove, and the 5' and 3' ends of each antiparallel strand

This slide shows double-stranded DNA, which is composed of two base-paired, complementary polynucleotide chains. Base pairing between the complementary strands is required for two important functions of DNA: 1) DNA replication involves an unwinding of the double helix (right) followed by synthesis of a complementary strand from each of the unpaired template strands, and 2) DNA serves as a template for RNA synthesis by utilizing the information in one strand to code for a complementary RNA strand.

DNA in the "B" form has a major groove and a minor groove, and has 10 base pairs per one turn of the double helix. DNA that is overwound or underwound, with fewer than or more than 10 base pairs per turn, is said to be "supercoiled". It should also be noted that the complementary strands in double helical DNA are antiparallel with respect to each other. Each polynucleotide chain has a 5' end and a 3' end. Deoxyribonucleases (or DNases) are enzymes that cleave phosphodiester bonds. Some are used for constructive purposes, such as proofreading during DNA replication, whereas others are used to degrade DNA. There are two basic classes of DNases: exonucleases and endonucleases. Exonucleases remove only the terminal nucleotide, whereas endonucleases cleave anywhere within the DNA double helix.

Chemistry of DNA
Forces affecting the stability of the DNA double helix

Three types of forces contribute to maintaining the stability of the DNA double helix: 1) hydrophobic interactions, 2) stacking interactions, and 3) hydrogen bonding. The base pairs in the interior of the DNA molecule create a hydrophobic environment, with the negatively charged phosphates along the backbone being exposed to the solvent. Thus, in an aqueous environment, the double-stranded structure is stabilized by the hydrophobic interior. Reagents that solubilize the DNA bases (e.g., methanol) destabilize the double helix. Stacking interactions and hydrogen bonding interactions are relatively weak but additive. Reagents that disrupt hydrogen bonding [e.g., formamide, urea, and solutions with very low pH (pH <2.3) or very high pH (pH >10)] destabilize the double helix.

Electrostatic repulsion between the negatively charged phosphates along the DNA backbone destabilizes the double helix. For example, if the phosphates are left unshielded, as when DNA is dissolved in distilled water, the DNA strands will separate at room temperature. Neutralizing these negative charges by the addition of NaCl (which contributes positively charged sodium ions) to the DNA solution will prevent strand separation. In the cell, the phosphates also interact with positively charged (magnesium, potassium, or sodium) ions and with positively charged (basic) proteins.

Model of double-stranded DNA showing three base pairs
Figure labels: stacking interactions, charge repulsion

This slide shows a side view of three base pairs in the DNA double helix. Note the base-pair stacking interactions, the hydrophobic interior, and the phosphates on the exterior.

The forces stabilizing the DNA double helix can be overcome by heating the DNA in solution or by treating it with very high or very low pH (low pH will also damage the DNA, whereas high pH will simply separate the polynucleotide chains). When the strands of DNA separate, the DNA is said to be denatured (when high temperature is used to denature DNA, the DNA is said to be melted). Because some of the forces stabilizing the DNA double helix are contributed by base pairing interactions, and because A-T base pairs have only two hydrogen bonds in contrast to G-C base pairs which have three hydrogen bonds, regions of the DNA duplex that are A-T rich will denature first. Once denaturation has begun, there is a cooperative unwinding of the double helix that ultimately results in complete strand separation.

Electron micrograph of partially melted DNA

Double-stranded, G-C rich DNA has not yet melted

A-T rich region of DNA has melted into a single-stranded bubble

• A-T rich regions melt first, followed by G-C rich regions

This slide shows an electron micrograph tracing of a DNA molecule that is only partially melted. The thicker regions are double-stranded and probably more G-C rich. The A-T rich regions are more prone to denaturation and, as seen here, form single-stranded "bubbles."
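One rough way to anticipate where such bubbles might form is to scan a sequence for A-T rich windows. The Python sketch below does this with a sliding window; the window size, the 70% threshold, and the toy sequence are all assumptions chosen for illustration, and real melting behavior also depends on factors such as ionic strength and stacking.

```python
def at_fraction(window: str) -> float:
    """Fraction of bases in the window that are A or T."""
    return sum(base in "AT" for base in window) / len(window)


def at_rich_windows(seq: str, size: int = 10,
                    threshold: float = 0.7) -> list[tuple[int, float]]:
    """Return (start, AT fraction) for each window at or above the threshold."""
    hits = []
    for start in range(len(seq) - size + 1):
        frac = at_fraction(seq[start:start + size])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


# Hypothetical sequence: a G-C rich flank, an A-T rich core, a G-C rich flank.
seq = "GCGCGGCCGC" + "ATTATAATTA" + "GGCCGCGGCC"
hits = at_rich_windows(seq)
for start, frac in hits:
    print(start, frac)
```

In this toy case only the windows overlapping the central A-T block are flagged, which mirrors the picture of a bubble flanked by intact G-C rich duplex.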

The absorbance at 260 nm of a DNA solution increases when the double helix is melted into single strands.

Hyperchromicity can be used to follow the denaturation of DNA as a function of increasing temperature. As the temperature of a DNA solution gradually rises above 50 degrees C, the A-T regions will melt first giving rise to an increase in the UV absorbance. As the temperature increases further, more of the DNA will become single-stranded, further increasing the UV absorbance, until the DNA is fully denatured above 90 degrees C. The temperature at the mid-point of the melting curve is termed "melting temperature" and is abbreviated Tm. The Tm for a DNA depends on its average G+C content: the higher the G+C content, the higher the Tm. Note: G+C content, G-C content, and GC content are equivalent terms.

DNA melting curve
Figure axes: percent hyperchromicity (0 to 100) versus temperature (50 to 90 °C)

• Tm is the temperature at the midpoint of the transition
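Reading Tm off a melting curve amounts to finding the temperature at which half of the total hyperchromic shift has occurred. A minimal Python sketch, assuming the curve is given as a few (temperature, percent hyperchromicity) points and that linear interpolation between neighboring points is acceptable; the data values are invented for illustration.

```python
def melting_temperature(temps, hyperchromicity):
    """Interpolate the temperature at 50% of the full hyperchromic shift."""
    half = 50.0
    points = list(zip(temps, hyperchromicity))
    for (t0, h0), (t1, h1) in zip(points, points[1:]):
        # Find the segment of the curve that crosses the midpoint.
        if h0 <= half <= h1 and h1 > h0:
            return t0 + (half - h0) * (t1 - t0) / (h1 - h0)
    raise ValueError("curve never crosses 50% hyperchromicity")


# Hypothetical melting data (not taken from the slide).
temps = [50, 60, 65, 70, 75, 80, 90]      # degrees C
percent = [0, 5, 20, 60, 90, 98, 100]     # percent of maximal hyperchromicity

tm = melting_temperature(temps, percent)
print(f"Tm is about {tm:.2f} degrees C")
```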

When a solution of double-stranded DNA is placed in a spectrophotometer cuvette and the absorbance of the DNA is determined across the electromagnetic spectrum, it characteristically shows an absorbance maximum at 260 nm (in the UV region of the spectrum). If the same DNA solution is melted, the absorbance at 260 nm increases approximately 40%. This property is termed "hyperchromicity." The hyperchromic shift is due to the fact that unstacked bases absorb more light than stacked bases.

Tm is dependent on the G-C content of the DNA
Figure axes: percent hyperchromicity versus temperature (60 to 80 °C) for three DNAs; E. coli DNA is 50% G-C

Average base composition (G-C content) can be determined from the melting temperature of DNA

This slide shows the dependence of Tm on average G+C content of three different DNAs. Under the conditions used in this experiment, E. coli DNA which has an average G+C content of about 50%, melted with a Tm of 69 degrees C. The curve on the left represents a DNA with a lower G+C content and the curve on the right represents a DNA with a higher G+C content. Tm is dependent on the ionic strength of the solution. At a fixed ionic strength there is a linear relation between Tm and G+C content.
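The linear relation can be sketched as a straight line through the one data point given here (E. coli DNA, about 50% G+C, Tm of 69 degrees C). The notes do not state a slope, so the value of 0.41 degrees C per percent G+C used below is an assumption borrowed from the commonly cited Marmur-Doty relation; a different ionic strength shifts the line, so treat the numbers as illustrative only.

```python
# Reference point from the notes: E. coli DNA under the conditions of the slide.
REF_GC_PERCENT = 50.0
REF_TM_C = 69.0

# ASSUMPTION: slope in degrees C per percent G+C (not given in the notes).
SLOPE_C_PER_PERCENT_GC = 0.41


def tm_from_gc(gc_percent: float) -> float:
    """Predict Tm from G+C content at the reference ionic strength."""
    return REF_TM_C + SLOPE_C_PER_PERCENT_GC * (gc_percent - REF_GC_PERCENT)


def gc_from_tm(tm_c: float) -> float:
    """Invert the line: estimate G+C content from a measured Tm."""
    return REF_GC_PERCENT + (tm_c - REF_TM_C) / SLOPE_C_PER_PERCENT_GC


for gc in (40.0, 50.0, 60.0):
    print(gc, round(tm_from_gc(gc), 1))
```

Inverting the same line is what allows average base composition to be estimated from a measured melting temperature.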

This illustrates the concept of how sequence complexity affects the rate of DNA reassociation. Imagine two different DNA sequences in a genome, one present one time per haploid genome (right) and the other present 1,000,000 times per haploid genome (left). They would be present at a 1:1,000,000 ratio with respect to each other. If these sequences were mixed together (which is what would happen if total genomic DNA was isolated for analysis), then fragmented, denatured and allowed to reassociate, the repeated sequences would reassociate much more rapidly because it would be much easier for them to find complementary strands to base pair with. The repeated sequences would reassociate with a very low Cot1/2 and therefore with a very high k2, consistent with a rapid rate of reassociation.
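The inverse relation between Cot1/2 and k2 follows from ideal second-order reassociation kinetics, in which the fraction of DNA remaining single-stranded is 1 / (1 + k2 * Cot). The Python sketch below uses that textbook form. The rate constant for the single-copy sequence is a made-up number, and scaling k2 in proportion to copy number is a simplifying assumption.

```python
def fraction_single_stranded(cot: float, k2: float) -> float:
    """Ideal second-order kinetics: C/C0 = 1 / (1 + k2 * C0 * t)."""
    return 1.0 / (1.0 + k2 * cot)


def cot_half(k2: float) -> float:
    """Cot value at which half of the DNA has reassociated."""
    return 1.0 / k2


# Hypothetical rate constant for a single-copy sequence (arbitrary units).
K2_SINGLE_COPY = 1e-3
COPIES_OF_REPEAT = 1_000_000

# ASSUMPTION: the observed k2 scales with the copy number of the sequence.
k2_repeat = K2_SINGLE_COPY * COPIES_OF_REPEAT

print("single copy Cot1/2:", cot_half(K2_SINGLE_COPY))
print("repeat Cot1/2:     ", cot_half(k2_repeat))
```

Under these assumptions the repeated sequence reaches half-reassociation at a Cot value a million-fold lower than the single-copy sequence.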

The human genome consists of three populations of DNA: the fast and intermediate fractions make up about 10% and 15% of the genome, respectively, and the slow fraction makes up about 75% of the genome. Most of the genes in the human genome are in the slow (single-copy) fraction. As shown in the next slide, repeated sequences can be of two types: those that are interspersed throughout the genome or those that are tandemly repeated satellite DNAs. Among the interspersed repetitive sequences are so-called "Alu" sequences, which are about 300 base pairs in length and are repeated about 300,000 times in the genome. They can be found adjacent to or within genes, and as illustrated later, their presence can occasionally lead to the disruption of genes. The interspersed repetitive sequences also include VNTRs (variable numbers of tandem repeats), which are composed of short repeated sequences of only a few base pairs, but of variable lengths. They, too, are interspersed throughout the genome, and are quite useful as landmarks for mapping genes because they are highly polymorphic (they differ in length or number of repeats from individual to individual).

Type of DNA                      % of Genome   Features
Single-copy (unique)             ~75%          Includes most genes [1]
Repetitive, interspersed         ~15%          Interspersed throughout genome between and within genes; includes Alu sequences [2] and VNTRs or mini (micro) satellites
Repetitive, satellite (tandem)   ~10%          Highly repeated, low complexity sequences usually located in centromeres and telomeres

[1] Some genes are repeated a few times to thousands-fold and thus would be in the repetitive DNA fraction.
[2] Alu sequences are about 300 bp in length and are repeated about 300,000 times in the genome. They can be found adjacent to or within genes in introns or nontranslated regions.

Reassociation curve (figure; vertical axis 0 to 100): fast ~10%, intermediate ~15%, slow (single-copy) ~75%

Knowing the complete sequence of the human genome will allow medical researchers to more easily find disease-causing genes. In addition, it should become possible to understand how differences in our DNA sequences from individual to individual may affect our predisposition to diseases and our ability to metabolize drugs. Because the haploid human genome has ~3 billion bp of DNA distributed over 23 chromosomes (diploid human cells have 23 pairs), the average metaphase chromosome has ~130 million bp of DNA.

This slide shows the structure of a typical human gene and its corresponding messenger RNA (mRNA). Most genes in the human genome are called "split genes" because they are composed of "exons" separated by "introns." The exons are the regions of genes that encode information that ends up in mRNA. The transcribed region of a gene (double-ended arrow) starts at the +1 nucleotide at the 5' end of the first exon and includes all of the exons and introns (initiation of transcription is regulated by the promoter region of a gene, which is upstream of the +1 site). RNA processing (the subject of another lecture) then removes the intron sequences, "splicing" together the exon sequences to produce the mature mRNA. The translated region of the mRNA (the region that encodes the protein) is indicated in blue. Note that there are untranslated regions at the 5' and 3' ends of mRNAs that are encoded by exon sequence but are not directly translated.
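The bookkeeping behind splicing can be pictured as cutting the exon segments out of the primary transcript and joining them in order. The Python sketch below does only that; the toy transcript, the exon coordinates, and the function name are hypothetical, and the real process is carried out by the splicing machinery recognizing sequence signals rather than fixed coordinates.

```python
def splice(transcript: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in order, dropping introns."""
    return "".join(transcript[start:end] for start, end in exons)


# Toy primary transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuuag" + "GAUUAC" + "guccccag" + "UGA"

# Hypothetical exon coordinates within the primary transcript.
exons = [(0, 6), (17, 23), (31, 34)]

mrna = splice(pre_mrna, exons)
print(len(pre_mrna), "nt primary transcript ->", len(mrna), "nt mRNA")
print(mrna)
```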

This figure shows examples of the wide variety of gene structures seen in the human genome. Some (very few) genes do not have introns. One example is the histone genes, which encode the small DNA-binding proteins, histones H1, H2A, H2B, H3, and H4. Shown here is a histone gene that is only 400 base pairs (bp) in length and is composed of only one exon. The beta-globin gene has three exons and two introns. The hypoxanthine-guanine phosphoribosyl transferase (HGPRT or HPRT) gene has nine exons and is over 100 times larger than the histone gene, yet has an mRNA that is only about 3 times larger than the histone mRNA (total exon length is 1,263 bp). This is due to the fact that introns can be very long, while exons are usually relatively short. An extreme example of this is the factor VIII gene, which has numerous exons (the blue boxes and blue vertical lines).

The rather common (~1 in 500) autosomal dominant disease, familial hypercholesterolemia (FH), is caused by mutations in the LDL (low density lipoprotein) receptor gene (for more information about FH, look at pages 218-222 of Thompson & Thompson and at Case 9). Plasma LDL, which carries circulating cholesterol, is cleared from the serum by binding to the LDL receptor on liver cells and is internalized. Normal plasma cholesterol levels average below 200 mg/dl. Individuals who have one defective LDL receptor gene (heterozygous) have approximately double this amount, and those with two defective genes (homozygous) have approximately four times this amount. Heterozygous individuals are predisposed to cardiovascular disease, with males having a 50% risk of myocardial infarction by age 50. There are many ways that the LDL receptor gene has been mutated rendering it inactive or abnormal. As shown in the next figure, one mechanism has involved Alu sequences.

LDL receptor gene (figure): Alu repeats present within introns and in exons
Unequal crossing over between misaligned copies of the exon 4 - Alu - exon 5 - Alu - exon 6 region
One product has a deleted exon 5 (exon 4 - Alu - exon 6); the other product is not shown

Here you see the structure of the LDL receptor gene (which has 18 exons). Six Alu sequences are present within three of the introns and two of the exons. Because of the close proximity of the two Alu repeats located within introns 4 and 5, unequal crossing over can occur during meiosis. Crossing over (the topic of a future lecture) requires homologous sequences, which base pair with each other during the process of meiosis. The homologous sequences can be provided by the Alu repeats, which can cause an out-of-register misalignment and subsequent crossing over, deleting exon 5 from one of the two products of crossing over. This exon 5 in-frame deletion can be inherited and is currently a cause of FH. This deletion affects the LDL binding region of the receptor. Thus, while Alu sequences have no known function in our genomes, there are a lot of them scattered throughout our genomes, within and around genes, and they can be quite disruptive.
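The outcome of an out-of-register exchange can be traced with a toy model in which each chromosome is a list of named elements. This Python sketch is a simplification under stated assumptions: the element names and the `crossover` helper are hypothetical, and a single clean exchange immediately after the paired Alu repeats is assumed.

```python
def crossover(chrom_a, chrom_b, index_a, index_b):
    """Exchange the segments downstream of the two paired elements."""
    if chrom_a[index_a] != chrom_b[index_b]:
        raise ValueError("crossing over needs homologous elements")
    product_1 = chrom_a[:index_a + 1] + chrom_b[index_b + 1:]
    product_2 = chrom_b[:index_b + 1] + chrom_a[index_a + 1:]
    return product_1, product_2


# The region around exon 5, with an Alu repeat in intron 4 and in intron 5.
region = ["exon4", "Alu", "exon5", "Alu", "exon6"]

# In-register pairing (intron 4 Alu with intron 4 Alu) changes nothing.
normal_1, normal_2 = crossover(region, region, 1, 1)

# Out-of-register pairing: intron 4 Alu on one homolog with intron 5 Alu
# on the other.
deleted, reciprocal = crossover(region, region, 1, 3)
print(deleted)
print(reciprocal)
```

In this model the first product lacks exon 5, matching the figure, and the reciprocal product (the one the figure does not show) carries exon 5 twice.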

Chromatin structure

EM of chromatin shows presence of nucleosomes as “beads on a string”

Each nucleosome is composed of a core (left) consisting of two each of the histones, H2A, H2B, H3, and H4, around which the DNA winds 1 3/4 times. The DNA undergoes negative supercoiling as a consequence of being wound around the core histones. Histones are positively charged proteins and thus interact with the negatively charged phosphates along the backbone of the DNA double helix. While the core has 146 bp of DNA, the nucleosome proper (right) has approximately 200 bp of DNA and also includes one histone H1 monomer lying on the outside of the structure. Nucleosomes are regularly spaced along eukaryotic chromosomal DNA every ~200 bp, giving rise to the "beads on a string" structure.

Histones are small, positively charged proteins that can be extensively modified posttranslationally, in general to make them less positively charged. Histone deacetylases (HDACs) are associated with transcriptional repression because they make histones better able to bind DNA, thus making DNA less accessible to the transcription machinery. Histone deacetylases are recruited to the chromosome by transcriptional repressors such as the retinoblastoma (Rb) protein (the subject of another lecture). Histone acetylases are recruited to chromosomes by transcription factors (TFs). Histone acetylases reduce the positive charges on histones, causing them to loosen their grip on the DNA to allow transcription factors to bind.

The orderly packaging of DNA in the cell is essential for the process of DNA replication, as well as for the process of transcription. Packaging of DNA into nucleosomes is only the first step, foreshortening chromosomal DNA somewhat by virtue of its being wrapped around the core histones 1 3/4 times. However, the average human chromosomal DNA molecule, at ~130 million bp, would be an astounding 44 mm long if fully extended. All of this DNA, times 23 chromosomes per haploid set, has to be packaged in the nucleus of a cell that is too small to be seen with the unaided eye. Thus, the DNA needs to be packaged in higher-order structures such as shown above, first into closely packed arrays of nucleosomes called nucleofilaments, which are then coiled into thicker and thicker filaments.
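The 44 mm figure can be checked with a short calculation. The Python sketch below takes the genome size, chromosome number, and ~200 bp nucleosome spacing from these notes; the rise of 0.34 nm per base pair for B-form DNA is a standard textbook value that the notes do not state, so it is flagged as an assumption.

```python
GENOME_BP = 3_000_000_000      # haploid human genome, from the notes
CHROMOSOMES = 23               # chromosomes per haploid set, from the notes
NUCLEOSOME_SPACING_BP = 200    # one nucleosome every ~200 bp, from the notes

# ASSUMPTION: axial rise per base pair in B-form DNA (textbook value).
RISE_NM_PER_BP = 0.34


def average_chromosome_bp() -> float:
    """Average number of base pairs per chromosome."""
    return GENOME_BP / CHROMOSOMES


def extended_length_mm(bp: float) -> float:
    """Length of fully extended B-form DNA; 1 nm = 1e-6 mm."""
    return bp * RISE_NM_PER_BP * 1e-6


avg_bp = average_chromosome_bp()
length_mm = extended_length_mm(avg_bp)
nucleosomes = avg_bp / NUCLEOSOME_SPACING_BP

print(f"average chromosome: {avg_bp / 1e6:.0f} million bp")
print(f"extended length:    {length_mm:.0f} mm")
print(f"nucleosomes:        about {nucleosomes:,.0f}")
```

Multiplying the per-chromosome length by 23 gives roughly one meter of DNA per haploid genome, which is the scale of the packaging problem described above.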

Nucleofilament structure

The interphase nucleus contains loosely packed, filamentous chromosomes, whose DNA is available for gene transcription. During each round of cell division, the chromosomal DNA is replicated and then condensed into metaphase chromosomes for segregation into the daughter cells, followed by decondensation as the interphase nucleus is formed.

Condensation and decondensation of a chromosome in the cell cycle

The chromosome contains a single, long molecule of double-stranded DNA, and thus has two ends. These ends create two problems: they are difficult to replicate, and they have a tendency to fuse with other chromosome ends, causing karyotypic rearrangements. To prevent these problems, chromosomes have protective ends called "telomeres" that are composed of tandemly repeated 5-8 bp sequences, up to 12 kb in total length. In germline cells and in the cells of young individuals, telomeres are of maximal length, but with every round of somatic cell division telomeres get a little shorter. After many rounds of replication and cell division, telomeres become too short to protect the chromosome ends from fusing with other chromosomes. At this stage, cells are said to be "senescent." Telomere length is maintained in germline cells by an enzyme called "telomerase," which can restore any shortening that has occurred. Tumor cells also have telomerase and thus are immortal and can grow indefinitely.